[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)** for PR 20387 at commit [`9bb0141`](https://github.com/apache/spark/commit/9bb01416d68e9e2b7ed34745ba0a4b92721d98dd). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87000/ Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/537/ Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)** for PR 20387 at commit [`9bb0141`](https://github.com/apache/spark/commit/9bb01416d68e9e2b7ed34745ba0a4b92721d98dd).
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, I'll update this PR and we can talk about passing configuration on the dev list. And as a reminder, please close #20445.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> I tried and couldn't figure out how to do it with PhysicalOperation, which is why I built something new for data source v2 pushdown.

The problem is that we should get DSv2 working independently of a redesign of the push-down rules. Throwing an untested push-down rule into changes for DSv2 makes the new API less reliable and hurts people who want to try it out and start using it. There is no benefit to doing this for 2.3.0.

I also think a redesign of push-down should be properly designed, thought out, and tested. I'm all for fixing this if you can make the case that we need to, but we shouldn't needlessly mix together major changes.

@cloud-fan, there's more discussion about this on #20476 that I encourage you to read.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20387 +1 for @cloud-fan's suggestion.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 Hi @rdblue, I think we all agree that the plan should be immutable, but other parts are still under discussion. Can you send a new PR that focuses on making the plan immutable, so that we can merge that one first and continue to discuss the other parts in this PR?
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387

> We can add things like limit pushdown later, by adding it properly to the existing code.

I tried and couldn't figure out how to do it with `PhysicalOperation`, which is why I built something new for data source v2 pushdown. I'm OK with reusing it if you can convince me that `PhysicalOperation` is extensible, e.g. that it can support limit push-down.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387

Currently `DataSourceOptions` is the main way for Spark and users to pass information to the data source. It's very flexible and defines only one rule: option key lookup should be case-insensitive.

I agree with your point that more consistency is better. It's annoying if every data source needs to define its own option keys for table and database and tell users about them. It would be good if Spark could define some rules about which option keys should be used for common information. My proposal:

```
class DataSourceOptions {
  ...
  def getPath(): String = get("path")
  def getTimeZone(): String = get("timeZone")
  def getTableName(): String = get("table")
}
```

We can keep adding these options, since this won't break binary compatibility. Then we just need to document them and tell both users and data source developers how to specify and retrieve these common options.

Then I think we don't need to add `table` and `database` parameters to `DataSourceV2Relation`, because we can easily do `relation.options.getTableName`.

BTW, this doesn't change the API, so I think it's fine to do it after 2.3.
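A minimal sketch of the proposal above, assuming a stand-in class named `OptionsSketch` (hypothetical, not Spark's actual `DataSourceOptions`) and `Option[String]` return types chosen here for illustration. It shows the two ideas under discussion: case-insensitive key lookup, and well-known keys (`path`, `timeZone`, `table`) defined once rather than by each data source.

```scala
import java.util.Locale

// Hypothetical stand-in for DataSourceOptions; not Spark's actual class.
final class OptionsSketch(options: Map[String, String]) {
  // Normalize keys once so that every lookup is case-insensitive.
  private val normalized: Map[String, String] =
    options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))

  // Well-known keys defined in one place instead of per data source.
  def getPath: Option[String] = get("path")
  def getTimeZone: Option[String] = get("timeZone")
  def getTableName: Option[String] = get("table")
}
```

With this shape, a user who writes `.option("TABLE", "db.events")` and a source that calls `getTableName` agree on the key without either side documenting its own spelling.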
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @dongjoon-hyun, @gatorsmile, could you weigh in on some of this discussion? I'd like to get additional perspectives on the changes I'm proposing.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> Let's keep it general and let the data source interpret it.

I think this is the wrong approach. The reason we use a special `DataSourceOptions` object is to ensure that data sources consistently ignore case when reading **their own options**. Consistency across data sources matters, and we should be pushing for more consistency, not less.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

@cloud-fan, to your point about push-down order: I'm not saying that order doesn't matter at all. I'm saying that the push-down can run more than once and that it should push the closest operators. That way, if you have a situation where operators can't be reordered but can all be pushed, they all get pushed through multiple runs of the rule, each one further refining the relation.

If we do it this way, then we don't need to traverse the logical plan to find out what to push down. We continue pushing projections until the plan stops changing. This is how the rest of the optimizer works, so I think it is a better approach from a design standpoint.

My implementation also reuses more existing code that we have higher confidence in, which is a good thing. We can add things like limit pushdown later, by adding it properly to the existing code. I don't see a compelling reason to toss out the existing implementation, especially without the same level of testing.
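The "push the closest operator, then re-run until the plan stops changing" approach can be sketched on a toy plan. The node types below (`Plan`, `Relation`, `Filter`, `Project`) and the `PushDownSketch` object are illustrative assumptions, not Catalyst classes, and filters are plain strings to keep the example small.

```scala
import scala.annotation.tailrec

// Toy logical plan; these node types are illustrative, not Catalyst classes.
sealed trait Plan
case class Relation(columns: Seq[String], pushedFilters: Seq[String] = Nil) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan

object PushDownSketch {
  // One pass: push only an operator that sits directly on top of the relation.
  def pushOnce(plan: Plan): Plan = plan match {
    case Filter(cond, r: Relation)  => r.copy(pushedFilters = r.pushedFilters :+ cond)
    case Project(cols, r: Relation) => r.copy(columns = cols)
    case Filter(cond, child)        => Filter(cond, pushOnce(child))
    case Project(cols, child)       => Project(cols, pushOnce(child))
    case r: Relation                => r
  }

  // Re-run the rule until the plan stops changing (a fixed point).
  @tailrec
  def toFixedPoint(plan: Plan): Plan = {
    val next = pushOnce(plan)
    if (next == plan) plan else toFixedPoint(next)
  }
}
```

For `Project(Seq("a"), Filter("b > 1", Relation(Seq("a", "b", "c"))))`, the first pass absorbs the filter, the second absorbs the projection, and the third changes nothing, so no pass needs to inspect more than the operator adjacent to the relation.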
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> `spark.read.format("iceberg").table("db.table").load()`

I'm fine with this if you think it is confusing to parse the path as a table name in `load`. I think it is reasonable.

I'd still like to keep the `Option[TableIdentifier]` parameter on the relation, so that we can support `table` or `insertInto` on the write path.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

@felixcheung, yes, we do already have `table` on `DataFrameReader`. It creates an `UnresolvedRelation` with the parsed table name as a `TableIdentifier`, which is not currently compatible with `DataSourceV2` because there is no standard way to pass the identifier's db and table name.

Part of the intent here is to add support in `DataSourceV2Relation` for cases where we have a `TableIdentifier`, so that we can add a resolver rule that replaces `UnresolvedRelation` with `DataSourceV2Relation`. This is what we do in our Spark branch.

@cloud-fan, what is your objection to support like this?
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20387

Don't we already have `table` in `DataFrameReader`?

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sql.DataFrameReader.table

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@table(tableName:String):org.apache.spark.sql.DataFrame
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387

> I thought it was a good thing to push a single node down at a time and not depend on order.

The order must be taken care of. For example, we can't push a limit down through a `Filter` unless the entire filter is pushed into the data source. Generally, if we push multiple operators down into a data source, we should clearly define the order in which the data source applies them.
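The limit-through-filter constraint can be seen with plain Scala collections standing in for a data source. This is only an illustration of the ordering issue; the `LimitOrderSketch` object and its sample rows are made up for the example.

```scala
object LimitOrderSketch {
  val rows: Seq[Int] = Seq(1, 2, 3, 4, 5, 6)
  val isEven: Int => Boolean = _ % 2 == 0

  // Query semantics: filter first, then keep the first two matches.
  val filterThenLimit: Seq[Int] = rows.filter(isEven).take(2) // Seq(2, 4)

  // Pushing the limit below the filter changes the answer.
  val limitThenFilter: Seq[Int] = rows.take(2).filter(isEven) // Seq(2)
}
```

The limit is only safe to push once the source itself applies the whole filter before limiting, which is why the order of pushed operators has to be defined.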
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387

> This is a new API...

Are you saying you want to add a new method to `DataFrameReader` that is different from `load`? In Scala, parameter names are part of a method's public signature (callers can pass arguments by name), so for `def load(path: String)` we can't change the semantics: the parameter is a path. It's fine if a data source implementation teaches its users that it interprets the path as a database/table, but this should not be a contract in Spark.

I do agree that Spark should set a standard for specifying database and table, as it's very common. We can even argue that a path is not a general concept for data sources, yet we still provide special APIs for paths.

My proposal: how about we add a new method `table` to `DataFrameReader`? The usage would look like `spark.read.format("iceberg").table("db.table").load()`. What do you think?

We should not specify `database` separately, because we may have catalog federation, in which case a table name may have three parts: `catalog.db.table`. Let's keep it general and let the data source interpret it.
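As a rough sketch of why a single dotted name is more general than separate `database` and `table` keys, the hypothetical helper below (`TableNameSketch.parts`, not a Spark API) splits a name into however many parts it has and leaves their interpretation to the caller. It assumes unquoted identifiers.

```scala
object TableNameSketch {
  // Split a dotted name into its parts without assuming how many there are.
  // Quoted identifiers that contain dots are deliberately not handled here.
  def parts(name: String): List[String] = name.split('.').toList
}
```

A source without catalog support would expect two parts from `db.table`, while a federated setup could accept three from `catalog.db.table`.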
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> It's hard to improve PhysicalOperation to support more operators and specific push down orders, so I created the new one

I'm concerned about the new one. The projection support seems really brittle because it calls out specific logical nodes and scans the entire plan. If we are doing push-down wrong on the current v1 and Hive code paths, then I'd like to see a proposal for fixing that without these drawbacks.

I like that this PR pushes projections and filters just like the other paths. We should start there and add additional push-down as necessary.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> [The push-down rule may be run more than once if filters are not pushed through projections] looks weird, do you have a query to reproduce this issue?

One of the DataSourceV2 tests hit this. I thought it was a good thing to push a single node down at a time and not depend on order.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> I'd suggest that we just propagate the paths parameter to options, and data source implementations are free to interpret the path option however they want, e.g. as table and database names.

What about code paths that expect table names? In our branch, we've added support for converting Hive relations (which have a `TableIdentifier`, not a path) and for using `insertInto`. Table names and paths are the two main ways to identify tables, and I think both should be supported.

This is a new API, so it doesn't matter that `load` and `save` currently use paths. We can easily update them to support tables. If we don't, then there will be no common way to refer to tables: some implementations will use `table`, some will pass `db` separately, and some might use `database`. Standardizing this and adding support in Spark will produce more consistent behavior across data sources.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

> I'm OK with making it immutable if there is a significant benefit.

Mutable nodes violate a basic assumption of Catalyst: that trees are immutable. Here's a good quote from the SIGMOD paper (by @rxin, @yhuai, and @marmbrus et al.):

> In our experience, functional transformations on immutable trees make the whole optimizer very easy to reason about and debug. They also enable parallelization in the optimizer, although we do not yet exploit this.

Mixing mutable nodes into supposedly immutable trees is a bad idea. Other nodes in the tree assume that children do not change.
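To make the immutability argument concrete, here is a small sketch using made-up node types (`Node`, `Scan`, `Select`); Catalyst's real classes are richer. The rule returns a new tree built with `copy`, so any code still holding the original tree sees it unchanged.

```scala
// Illustrative node types only; not Catalyst's actual classes.
sealed trait Node
case class Scan(source: String, filters: Seq[String]) extends Node
case class Select(columns: Seq[String], child: Node) extends Node

object ImmutableTreeSketch {
  // A rule returns a new tree; the input tree is left untouched.
  def addFilter(plan: Node, filter: String): Node = plan match {
    case s: Scan             => s.copy(filters = s.filters :+ filter)
    case Select(cols, child) => Select(cols, addFilter(child, filter))
  }
}
```

Had `Scan` kept its filters in a mutable field instead, a parent `Select` created before the rule ran would silently observe different child behavior afterwards, which is the hazard described above.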
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387

I dug into the commit history and recalled why I made these decisions:

* Having a mutable `DataSourceV2Relation`. This is mostly to avoid adding more and more constructor parameters to `DataSourceV2Relation`, which keeps the code easy to maintain. I'm OK with making it immutable if there is a significant benefit.
* Not using `PhysicalOperation`. This is because we will add more push-down optimizations (e.g. limit, aggregate, join), and we need a specific push-down order for them. It's hard to improve `PhysicalOperation` to support more operators and specific push-down orders, so I created the new one. Eventually all data sources will be implemented as data source v2, so `PhysicalOperation` will go away.

> The output of DataSourceV2Relation should be what is returned by the reader, in case the reader can only partially satisfy the requested schema projection

Good catch! Since `DataSourceV2Reader` is mutable, the output can't be a fixed value, as it may change when we apply data source optimizations. Using `lazy val output ...` can fix this.

> The requested projection passed to the DataSourceV2Reader should include filter columns

I did this intentionally. If a column is only referred to by pushed filters, Spark doesn't need this column. Even if we required this column from the data source, we would just read it out and wait for it to be pruned by the next operator.

> The push-down rule may be run more than once if filters are not pushed through projections

This looks weird. Do you have a query to reproduce this issue?

> This updates DataFrameReader to parse locations that do not look like paths as table names and pass the result as "database" and "table" keys in v2 options.

Personally I'd suggest using `spark.read.format("iceberg").option("table", "db.table").load()`, as `load` is defined as `def load(paths: String*)`, but I think your usage looks better. The communication protocol between Spark and a data source is options, so I'd suggest that we just propagate the `paths` parameter to options, and data source implementations are free to interpret the path option however they want, e.g. as table and database names.
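The column-pruning behavior described in this comment (columns referenced only by pushed filters are not requested from the source) can be sketched as follows. `RequiredColumnsSketch` and its signature are hypothetical, and each remaining filter is modeled simply as the set of column names it references.

```scala
object RequiredColumnsSketch {
  // Filters that were pushed run inside the source, so they are not passed in:
  // their columns are requested only if the projection or a filter that
  // still runs in Spark (a post-scan filter) also needs them.
  def requiredColumns(
      projected: Seq[String],
      postScanFilterColumns: Seq[Set[String]]): Seq[String] =
    (projected ++ postScanFilterColumns.flatten).distinct
}
```

For a query that selects `a`, pushes `b > 1` into the source, and keeps a filter on `c` in Spark, this yields `a` and `c`; whether `b` should also be requested is exactly the point under debate in the thread.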
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 Overall, I think it's a good idea to make the plan immutable.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86602/ Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)** for PR 20387 at commit [`83203a6`](https://github.com/apache/spark/commit/83203a6e117f180b1839c815e4c3b3ef539f6b2b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)** for PR 20387 at commit [`83203a6`](https://github.com/apache/spark/commit/83203a6e117f180b1839c815e4c3b3ef539f6b2b).
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/201/ Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)** for PR 20387 at commit [`ac58844`](https://github.com/apache/spark/commit/ac58844118d543030fadfeda0a64b52ad659cf31). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86601/ Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)** for PR 20387 at commit [`ac58844`](https://github.com/apache/spark/commit/ac58844118d543030fadfeda0a64b52ad659cf31).
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/200/ Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)** for PR 20387 at commit [`9c4dcb5`](https://github.com/apache/spark/commit/9c4dcb5b693e729e89ddd7daa54b19c8f8eb3571). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class StreamingDataSourceV2Relation(`
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86600/ Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #86600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)** for PR 20387 at commit [`9c4dcb5`](https://github.com/apache/spark/commit/9c4dcb5b693e729e89ddd7daa54b19c8f8eb3571).
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/199/ Test PASSed.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387

@cloud-fan, please have a look at these changes. This will require follow-up for the Streaming side. I have yet to review the streaming interfaces for `DataSourceV2`, so I haven't made any changes there.

In our Spark build, I've also moved the write path to use `DataSourceV2Relation`, which I intend to do in a follow-up to this issue.

@rxin FYI.
[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.