Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@cloud-fan, I'll update this PR and we can talk about passing configuration
on the dev list.
And as a reminder, please close #20445.
---
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> I tried and can't figure out how to do it with `PhysicalOperation`; that's
why I built something new for data source v2 pushdown.

The problem is that we should get DSv2 working independently
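For readers unfamiliar with the `PhysicalOperation` being debated: it is an extractor that collapses a chain of projections and filters sitting above a relation so they can be pushed down together. A rough, self-contained Python sketch of that idea (node shapes invented for illustration; this is not Spark's implementation):

```python
# Rough sketch of a PhysicalOperation-style matcher: walk a chain of
# Project/Filter nodes above a relation, collecting the projected columns
# and filter predicates so they can be pushed down together.
class Relation:
    def __init__(self, name):
        self.name = name

class Filter:
    def __init__(self, condition, child):
        self.condition, self.child = condition, child

class Project:
    def __init__(self, columns, child):
        self.columns, self.child = columns, child

def collect_projects_and_filters(plan):
    columns, filters = None, []
    while True:
        if isinstance(plan, Project):
            columns = columns or plan.columns  # keep the outermost projection
            plan = plan.child
        elif isinstance(plan, Filter):
            filters.append(plan.condition)
            plan = plan.child
        else:
            return columns, filters, plan  # plan is the underlying relation

plan = Project(["a"], Filter("b > 1", Relation("t")))
```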
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/20387
+1 for @cloud-fan's suggestion.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
Hi @rdblue, I think we all agree that the plan should be immutable, but
other parts are still under discussion. Can you send a new PR that focuses on
making the plan immutable, so that we can
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
> We can add things like limit pushdown later, by adding it properly to the
existing code.
I tried and can't figure out how to do it with `PhysicalOperation`; that's
why I built something new for data source v2 pushdown.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
Currently `DataSourceOptions` is the major way for Spark and users to pass
information to the data source. It's very flexible and only defines one rule:
the option key lookup should be case-insensitive.
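That one rule can be sketched in a few lines. A minimal, illustrative options map with case-insensitive key lookup (not Spark's actual implementation; the class name is invented):

```python
# Minimal sketch of an options map whose key lookup is case-insensitive,
# the one rule the comment attributes to DataSourceOptions.
class CaseInsensitiveOptions:
    def __init__(self, options):
        # Normalize keys once at construction time.
        self._options = {k.lower(): v for k, v in options.items()}

    def get(self, key, default=None):
        # Lookup ignores the caller's key casing.
        return self._options.get(key.lower(), default)

opts = CaseInsensitiveOptions({"Path": "/data/t1", "table": "db.t1"})
```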
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@dongjoon-hyun, @gatorsmile, could you weigh in on this discussion? I'd
like to get additional perspectives on the changes I'm proposing.
---
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> Let's keep it general and let the data source interpret it.

I think this is the wrong approach. The reason why we are using a special
`DataSourceOptions` object is to ensure that data
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@cloud-fan, to your point about push-down order: I'm not saying that order
doesn't matter at all; I'm saying that the push-down can run more than once and
it should push the closest operators. That
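The "run more than once, push the closest operator" idea can be modeled in a few lines. An illustrative fixed-point loop (all names invented for the sketch; not Spark's rule machinery):

```python
# Illustrative model: a push-down rule that runs in multiple passes, and
# on each pass pushes only the operator closest to the source.
def push_down_once(operators, source_pushed, supported):
    # operators is ordered outermost-first; the last element is the one
    # closest to the source.
    if operators and operators[-1] in supported:
        source_pushed.append(operators.pop())
        return True
    return False

def push_down_to_fixed_point(operators, supported):
    pushed = []
    while push_down_once(operators, pushed, supported):
        pass  # keep re-running the rule until nothing more can be pushed
    return operators, pushed

remaining, pushed = push_down_to_fixed_point(["limit", "filter"],
                                             {"filter", "limit"})
```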
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> `spark.read.format("iceberg").table("db.table").load()`
I'm fine with this if you think it is confusing to parse the path as a
table name in `load`. I think it is reasonable.

I'd
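To make the two spellings under discussion concrete, here is a toy builder contrasting a dedicated `table()` method with an option-based call chain. This is purely hypothetical illustration; `ReaderSketch` is not Spark's `DataFrameReader`:

```python
# Toy builder contrasting the two spellings discussed above: passing a
# table name through a dedicated table() method versus overloading the
# load() path.
class ReaderSketch:
    def __init__(self, fmt):
        self.fmt, self.options = fmt, {}

    def table(self, name):
        # A dedicated method makes the intent explicit instead of asking
        # the source to guess whether a "path" is really a table name.
        self.options["table"] = name
        return self

    def load(self, path=None):
        if path is not None:
            self.options["path"] = path
        return self.options

opts = ReaderSketch("iceberg").table("db.table").load()
```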
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@felixcheung, yes, we do already have a `table` option. That creates an
`UnresolvedRelation` with the parsed table name as a `TableIdentifier`, which
is not currently compatible with `DataSourceV2`
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/20387
Don't we already have `table` in `DataFrameReader`?
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sql.DataFrameReader.table
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
> I thought it was a good thing to push a single node down at a time and
not depend on order.
The order must be taken care of. For example, we can't push down a limit
through Filter, unless
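A small concrete example of why the limit can't blindly move below the filter, using plain Python lists to stand in for the two operators:

```python
# Applying LIMIT below a filter changes the result, so a limit cannot
# blindly be pushed through a Filter.
rows = [1, -2, 3, -4, 5]

def keep_positive(rs):  # the Filter
    return [r for r in rs if r > 0]

def limit2(rs):  # the Limit
    return rs[:2]

correct = limit2(keep_positive(rows))  # filter first, then limit
pushed = keep_positive(limit2(rows))   # limit wrongly pushed below the filter
```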
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
> This is a new API...
Are you saying you wanna add a new method in `DataFrameReader` that is
different than `load`? In Scala, a parameter name is part of the method
signature, so for
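The point about parameter names being part of the signature has a direct Python analogue: callers can pass arguments by keyword, so renaming a parameter is a breaking change. A sketch with invented function names:

```python
# Renaming a parameter breaks every caller that passed it by keyword,
# which is why a parameter name is effectively part of the public API.
def load_v1(path=None):
    return {"path": path}

def load_v2(location=None):  # same behavior, renamed parameter
    return {"path": location}

ok = load_v1(path="/data/t1")
try:
    load_v2(path="/data/t1")  # the old keyword is no longer accepted
    broke = False
except TypeError:
    broke = True
```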
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> It's hard to improve `PhysicalOperation` to support more operators and
specific push down orders, so I created the new one
I'm concerned about the new one. The projection support seems
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> [The push-down rule may be run more than once if filters are not pushed
through projections] looks weird, do you have a query to reproduce this issue?
One of the DataSourceV2 tests hit
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> I'd suggest that we just propagate the paths parameter to options, and
data source implementations are free to interpret the path option to whatever
they want, e.g. table and database names.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
> I'm ok to make it immutable if there is a significant benefit.

Mutable nodes violate a basic assumption of Catalyst, that trees are
immutable. Here's a good quote from the SIGMOD paper
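The immutability convention being appealed to can be sketched briefly: Catalyst-style rewrites build a new tree rather than mutating nodes in place. A simplified stand-in (not Spark's classes):

```python
# Sketch of immutable plan nodes: a rewrite returns a new node and the
# original is untouched, so different planning phases can't observe
# each other's in-place mutations.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Relation:
    table: str
    pushed_filters: tuple = ()

def with_pushed_filter(rel, condition):
    # replace() builds a new frozen instance instead of mutating rel.
    return replace(rel, pushed_filters=rel.pushed_filters + (condition,))

before = Relation("db.t1")
after = with_pushed_filter(before, "a > 1")
```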
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
I dug into the commit history and recalled why I made these decisions:

* having a mutable `DataSourceV2Relation`. This is mostly to avoid having to
keep adding more constructor parameters to
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20387
Overall, I think it's a good idea to make the plan immutable.
---
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@cloud-fan, please have a look at these changes. This will require
follow-up for the Streaming side. I have yet to review the streaming interfaces
for `DataSourceV2`, so I haven't made any changes