[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23208
  
Let's move the high level discussion to the doc: 
https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-07 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/23208
  
@cloud-fan, what are you suggesting to use as a design? If you think this 
shouldn't mirror the read side, then let's be clear on what it should look 
like. Maybe that's a design doc, or maybe that's a discussion thread on the 
mailing list.

Whatever option we go for, we still need to have a plan for exposing the 
replace-by-filter and replace-dynamic-partitions methods, whatever they end up 
being. We also need the life-cycle to match.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23208
  
@rdblue I tried to add `WriteBuilder`, but there is a difference between 
read and write:
1. for read, the `ScanBuilder` can collect many information, like column 
pruning, filter pushdown, etc. together, and create a `Scan`
2. for write, it's just different branches, not a combination. e.g. you 
can't do append and replaceWhere at the same time.

Because of this, I feel we don't need `WriterBuilder`, but just different 
mixin traits to create `Write` for different purposes.

Let me know if you have other ideas. Thanks for your review!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-06 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/23208
  
@cloud-fan, I see that this adds `Table` and uses `TableProvider`, but I 
was expecting this to also update the write side to mirror the read side, like 
PR #22190 for [SPARK-25188](https://issues.apache.org/jira/browse/SPARK-25188) 
(originally proposed in [discussion on 
SPARK-24882](https://issues.apache.org/jira/browse/SPARK-24882?focusedCommentId=16581725=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16581725)).

The main parts that we discussed there were:
* Mirror the read side structure by adding WriteConfig. Now, that would be 
adding a WriteBuilder.
* Mirroring the read life-cycle of ScanBuilder and Scan, to enable use 
cases like acquiring and holding a write lock, for example.
* Using the WriteBuilder to expose more write configuration to support 
overwrite and dynamic partition overwrite.

We don't need to add the overwrite mix-ins here, but I would expect to see 
a WriteBuilder that returns a Writer. (`Table -> WriteBuilder -> Write` matches 
`Table -> ScanBulder -> Scan`.)

The Write would expose BatchWrite and StreamWrite (if they are different) 
or could directly expose the WriteFactory, commit, abort, etc.

WriteBuilder would be extensible so that SupportsOverwrite and 
SupportsDynamicOverwrite can be added as mix-ins at some point.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/23208
  
Thanks for posting this PR @cloud-fan! I'll have a look in the next day or 
so.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99620/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23208
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23208
  
**[Test build #99620 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99620/testReport)**
 for PR 23208 at commit 
[`00fc34f`](https://github.com/apache/spark/commit/00fc34fa793b922a48a4bf8e9f9cd0e3b688800b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23208
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5676/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23208
  
**[Test build #99620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99620/testReport)**
 for PR 23208 at commit 
[`00fc34f`](https://github.com/apache/spark/commit/00fc34fa793b922a48a4bf8e9f9cd0e3b688800b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23208
  
cc @rdblue @rxin @jose-torres  @gatorsmile @HyukjinKwon @gengliangwang 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org