[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-06 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 I still maintain that it's sensible to say a batch query is a query that has only one epoch, and that the ship has sailed on passing useless information. But I'm bikeshedding here. Created

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 Epoch ID is not a valid part of the logical place in a query for batch. I think we should separate batch and streaming, as they are already coming from different interfaces. There's no need to pass

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 As you say, there's no strict semantic need to have createDataWriter() take arguments. We could simply have each DataWriter identify itself by a random UUID, and require upstream components to

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 > Data source writers need to be able to reason about what progress they've made, which is impossible in the streaming case if each epoch is its own disconnected query. I don't think the

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 Partitions are a better example than task attempts, but it's still roughly the same idea. Data source writers need to be able to reason about what progress they've made, which is impossible in

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 For either case. Any streaming execution model has to know that epoch 1 and epoch 2 are part of the same query, for the same reasons it has to know that task attempt 0 and task attempt 1 are

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 My question is: why can't we use a batch interface for batch and micro-batch (which behaves like batch) and add a separate streaming interface for continuous streaming? I see no reason to have epoch

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 I'm not certain I understand the question. From the perspective of query plan execution, the non-continuous streaming mode does just use the batch interface. The motivation of adding

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 Could the non-continuous streaming mode just use the batch interface, since each write is basically separate?

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 There isn't currently a distinction between streaming and batch in the places where this interface is called, except in the experimental continuous processing streaming mode. The streaming

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 @jose-torres, can you explain that more for me? Why would callers only use one interface but not the other? Wouldn't streaming use one and batch the other? Why would batch need to know about

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 My primary concern with splitting the interfaces is that it makes it easy for Spark changes to accidentally do the wrong thing. Callers of DataWriterFactory.createDataWriter() won't necessarily

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20710 @tdas, thanks for letting us know. I'm really wondering if we should be using the same interfaces between batch and streaming. The epoch id strikes me as strange for data sources that won't support

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20710 @rdblue @jose-torres arrgh... I didn't notice that you guys were still commenting before I merged it. Feel free to continue discussion and if any change is needed we will deal with this

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20710 LGTM

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87917/ Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87917/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87918/ Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87918/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87915/ Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87915/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87912/ Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test PASSed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87912/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87918/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87917/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87915/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87912/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test FAILed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87909/ Test FAILed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87909/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87909/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Merged build finished. Test FAILed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87862/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87862/ Test FAILed.

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20710 **[Test build #87862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87862/testReport)** for PR 20710 at commit

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20710 Can one of the admins verify this patch?

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20710 @tdas @rdblue @cloud-fan I haven't forgotten that we need a design doc before finalization; SPARK-23556 tracks that.