[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 Let's move the high level discussion to the doc: https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 @cloud-fan, what are you suggesting to use as a design? If you think this shouldn't mirror the read side, then let's be clear on what it should look like. Maybe that's a design doc, or maybe that's a discussion thread on the mailing list. Whatever option we go for, we still need to have a plan for exposing the replace-by-filter and replace-dynamic-partitions methods, whatever they end up being. We also need the life-cycle to match. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 @rdblue I tried to add `WriteBuilder`, but there is a difference between read and write: 1. for read, the `ScanBuilder` can collect many information, like column pruning, filter pushdown, etc. together, and create a `Scan` 2. for write, it's just different branches, not a combination. e.g. you can't do append and replaceWhere at the same time. Because of this, I feel we don't need `WriterBuilder`, but just different mixin traits to create `Write` for different purposes. Let me know if you have other ideas. Thanks for your review! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 @cloud-fan, I see that this adds `Table` and uses `TableProvider`, but I was expecting this to also update the write side to mirror the read side, like PR #22190 for [SPARK-25188](https://issues.apache.org/jira/browse/SPARK-25188) (originally proposed in [discussion on SPARK-24882](https://issues.apache.org/jira/browse/SPARK-24882?focusedCommentId=16581725=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16581725)). The main parts that we discussed there were: * Mirror the read side structure by adding WriteConfig. Now, that would be adding a WriteBuilder. * Mirroring the read life-cycle of ScanBuilder and Scan, to enable use cases like acquiring and holding a write lock, for example. * Using the WriteBuilder to expose more write configuration to support overwrite and dynamic partition overwrite. We don't need to add the overwrite mix-ins here, but I would expect to see a WriteBuilder that returns a Writer. (`Table -> WriteBuilder -> Write` matches `Table -> ScanBulder -> Scan`.) The Write would expose BatchWrite and StreamWrite (if they are different) or could directly expose the WriteFactory, commit, abort, etc. WriteBuilder would be extensible so that SupportsOverwrite and SupportsDynamicOverwrite can be added as mix-ins at some point. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 Thanks for posting this PR @cloud-fan! I'll have a look in the next day or so. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99620/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23208 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23208 **[Test build #99620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99620/testReport)** for PR 23208 at commit [`00fc34f`](https://github.com/apache/spark/commit/00fc34fa793b922a48a4bf8e9f9cd0e3b688800b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23208 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5676/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23208 **[Test build #99620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99620/testReport)** for PR 23208 at commit [`00fc34f`](https://github.com/apache/spark/commit/00fc34fa793b922a48a4bf8e9f9cd0e3b688800b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 cc @rdblue @rxin @jose-torres @gatorsmile @HyukjinKwon @gengliangwang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org