[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87000/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/537/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, I'll update this PR and we can talk about passing configuration on the dev list. And as a reminder, please close #20445.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I tried and can't figure out how to do it with PhysicalOperation, that's why I build something new for data source v2 pushdown. The problem is that we should get DSv2 working independently

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20387 +1 for @cloud-fan 's suggestion.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 Hi @rdblue, I think we all agree that the plan should be immutable, but other parts are still under discussion. Can you send a new PR that focuses on making the plan immutable, so that we can

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 > We can add things like limit pushdown later, by adding it properly to the existing code. I tried and can't figure out how to do it with `PhysicalOperation`, that's why I built

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 Currently `DataSourceOptions` is the major way for Spark and users to pass information to the data source. It's very flexible and only defines one rule: the option key lookup should be
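
In Spark 2.3, `DataSourceOptions` is documented as an immutable string-to-string map whose keys are case-insensitive. The sketch below is a hypothetical stand-in (not Spark's class) showing how such a lookup rule can be enforced by normalizing keys once at construction time:

```scala
import java.util.Locale

// Hypothetical stand-in for an options map with case-insensitive keys.
// Keys are lower-cased once, so every lookup uses the same normalization.
final class CaseInsensitiveOptions(raw: Map[String, String]) {
  private val normalized: Map[String, String] =
    raw.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))
}

val opts = new CaseInsensitiveOptions(Map("Path" -> "/data/events", "mergeSchema" -> "true"))
println(opts.get("path"))        // Some(/data/events)
println(opts.get("MERGESCHEMA")) // Some(true)
```

Using `Locale.ROOT` keeps the normalization independent of the JVM's default locale.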

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @dongjoon-hyun, @gatorsmile, could you guys weigh in on some of this discussion? I'd like to get additional perspectives on the changes I'm proposing.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > Let's keep it general and let the data source to interprete it. I think this is the wrong approach. The reason why we are using a special `DataSourceOptions` object is to ensure that data

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, to your point about push-down order, I'm not saying that order doesn't matter at all, I'm saying that the push-down can run more than once and it should push the closest operators. That
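
To illustrate a push-down rule that runs more than once and only pushes the closest operator each time, here is a toy sketch. The plan nodes (`Scan`, `Filter`, `Project`) are hypothetical, not Catalyst classes, and moving a `Filter` beneath a `Project` is assumed safe because the toy ignores column aliasing:

```scala
// Hypothetical toy plan nodes, not Spark's Catalyst classes.
sealed trait Plan
final case class Scan(pushedFilters: List[String]) extends Plan
final case class Filter(cond: String, child: Plan) extends Plan
final case class Project(cols: List[String], child: Plan) extends Plan

// One pass: a Filter directly on a Scan is absorbed into the Scan;
// a Filter on a Project is moved beneath it (assumed safe in this toy).
def pushOnce(p: Plan): Plan = p match {
  case Filter(c, Scan(fs))          => Scan(fs :+ c)
  case Filter(c, Project(cols, ch)) => Project(cols, Filter(c, ch))
  case Filter(c, ch)                => Filter(c, pushOnce(ch))
  case Project(cols, ch)            => Project(cols, pushOnce(ch))
  case s: Scan                      => s
}

// Re-run the single-step rule until the plan stops changing (fixed point).
@scala.annotation.tailrec
def pushToFixedPoint(p: Plan): Plan = {
  val next = pushOnce(p)
  if (next == p) p else pushToFixedPoint(next)
}

val plan = Filter("a > 1", Project(List("a"), Filter("a < 10", Scan(Nil))))
println(pushToFixedPoint(plan))
```

Because each pass pushes only the operator nearest the scan, the filter closest to the source (`a < 10`) is pushed first and the outer one follows on a later pass.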

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > `spark.read.format("iceberg").table("db.table").load()` I'm fine with this if you think it is confusing to parse the path as a table name in load. I think it is reasonable. I'd

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @felixcheung, yes, we do already have a `table` option. That creates an `UnresolvedRelation` with the parsed table name as a `TableIdentifier`, which is not currently compatible with `DataSourceV2`

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20387 don't we already have table in DataFrameReader? http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sql.DataFrameReader.table

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 > I thought it was a good thing to push a single node down at a time and not depend on order. The order must be taken care. For example, we can't push down a limit through Filter, unless
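
A minimal illustration (plain Scala collections, not Spark operators) of why a limit cannot simply be pushed below a filter: the two orders return different rows.

```scala
val rows = (1 to 10).toList

// Query semantics: filter first, then limit.
val correct = rows.filter(_ % 2 == 0).take(3) // List(2, 4, 6)

// Unsafe rewrite: applying the limit below the filter changes the answer.
val limitPushedThroughFilter = rows.take(3).filter(_ % 2 == 0) // List(2)

println(correct)
println(limitPushedThroughFilter)
```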

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 > This is a new API... Are you saying you wanna add a new method in `DataFrameReader` that is different than `load`? In Scala, parameter name is part of the method signature, so for

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > It's hard to improve PhysicalOperation to support more operators and specific push down orders, so I created the new one. I'm concerned about the new one. The projection support seems

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > [The push-down rule may be run more than once if filters are not pushed through projections] looks weird, do you have a query to reproduce this issue? One of the DataSourceV2 tests hit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I'd suggest that we just propogate the paths parameter to options, and data source implementations are free to interprete the path option to whatever they want, e.g. table and database names.
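
To make the quoted proposal concrete, here is a hypothetical sketch of a source that receives only a `path` option and must decide by itself whether it names a file location or a `db.table`. The `/` heuristic, the naive dot split (no backtick quoting), and all type names are assumptions for illustration only:

```scala
// Hypothetical targets a source might derive from a single "path" option.
sealed trait Target
final case class FilePath(uri: String) extends Target
final case class TableName(database: Option[String], table: String) extends Target

// Naive interpretation: anything containing '/' is a file path;
// otherwise "db.table" splits into two parts and anything else is a bare table name.
def interpretPath(options: Map[String, String]): Option[Target] =
  options.get("path").map { p =>
    if (p.contains("/")) FilePath(p)
    else p.split('.') match {
      case Array(db, tbl) => TableName(Some(db), tbl)
      case _              => TableName(None, p)
    }
  }

println(interpretPath(Map("path" -> "db.events")))
println(interpretPath(Map("path" -> "/data/events")))
```

The heuristic shows where the ambiguity lands under this proposal: every source would need its own rule for telling a path from a table name.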

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I'm ok to make it immutable if there is an significant benefit. Mutable nodes violate a basic assumption of catalyst, that trees are immutable. Here's a good quote from the SIGMOD paper
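
A small sketch of the immutable-node style referenced here, using a hypothetical `Relation` case class (not Spark's `DataSourceV2Relation`): applying a push-down produces a new node via `copy`, and default parameter values keep existing call sites working when fields are added.

```scala
// Hypothetical immutable relation node; defaults let new fields be added
// without changing callers that do not use them.
final case class Relation(
    source: String,
    pushedFilters: Seq[String] = Nil,
    requiredColumns: Seq[String] = Nil)

val original = Relation("iceberg")
val pushed = original.copy(pushedFilters = Seq("id > 5"), requiredColumns = Seq("id", "ts"))

// The original node is untouched, so any plan still referencing it stays valid.
println(original)
println(pushed)
```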

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 I dug into the commit history and recalled why I made these decisions: * having a mutable `DataSourceV2Relation`. This is mostly to avoid having to keep adding more constructor parameters to

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 overall I think it's a good idea to make the plan immutable.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86602/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/201/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86601/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/200/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86600/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)** for PR 20387 at commit

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/199/

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, please have a look at these changes. This will require follow-up for the Streaming side. I have yet to review the streaming interfaces for `DataSourceV2`, so I haven't made any changes

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.