Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
I still maintain that it's sensible to say a batch query is a query that
has only one epoch, and that the ship has sailed on passing useless
information. But I'm bikeshedding here. Created
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
Epoch ID is not a valid part of the logical place in a query for batch. I
think we should separate batch and streaming, as they are already coming from
different interfaces. There's no need to pass
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
As you say, there's no strict semantic need to have createDataWriter() take
arguments. We could simply have each DataWriter identify itself by a random
UUID, and require upstream components to
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
> Data source writers need to be able to reason about what progress they've
made, which is impossible in the streaming case if each epoch is its own
disconnected query.
I don't think the
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
Partitions are a better example than task attempts, but it's still roughly
the same idea. Data source writers need to be able to reason about what
progress they've made, which is impossible in
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
For either case. Any streaming execution model has to know that epoch 1 and
epoch 2 are part of the same query, for the same reasons it has to know that
task attempt 0 and task attempt 1 are
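To make the progress argument concrete, here is a minimal Python sketch. It assumes a hypothetical sink that records which epochs it has already committed for each query. The names `EpochAwareSink` and `commit` are illustrative only and are not part of Spark. Every epoch of one streaming query carries the same query identifier, so a replayed epoch after a restart can be recognized and skipped. If each epoch were instead a disconnected query with its own random ID, the same replay would look like new data and be written twice.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EpochAwareSink:
    """Hypothetical sink that tracks committed progress per (query_id, epoch_id)."""
    committed: Dict[str, Set[int]] = field(default_factory=dict)
    rows: List[str] = field(default_factory=list)

    def commit(self, query_id: str, epoch_id: int, data: List[str]) -> bool:
        done = self.committed.setdefault(query_id, set())
        if epoch_id in done:
            # Same query, same epoch: this is a replay after a failure, so skip it.
            return False
        self.rows.extend(data)
        done.add(epoch_id)
        return True


# Epochs 0 and 1 of query "q1", then epoch 1 replayed after a restart.
sink = EpochAwareSink()
sink.commit("q1", 0, ["a", "b"])
sink.commit("q1", 1, ["c"])
replayed = sink.commit("q1", 1, ["c"])  # recognized as already committed

# Counter-example: each epoch treated as its own disconnected query with a
# fresh random id, so the sink cannot tell the replay from new data.
disconnected = EpochAwareSink()
disconnected.commit(str(uuid.uuid4()), 1, ["c"])
disconnected.commit(str(uuid.uuid4()), 1, ["c"])
```

The key in the committed-epoch log is what links epochs together. Whatever form it takes, the writer needs some stable identity shared across epochs to reason about what it has already written.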
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
My question is: why can't we use a batch interface for batch and
micro-batch (which behaves like batch) and add a separate streaming interface
for continuous streaming? I see no reason to have epoch
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
I'm not certain I understand the question.
From the perspective of query plan execution, the non-continuous streaming
mode does just use the batch interface. The motivation of adding
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
Could the non-continuous streaming mode just use the batch interface, since
each write is basically separate?
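The two designs under discussion can be compared side by side. This is a simplified Python model, not Spark's Java `DataWriterFactory` API. `UnifiedWriterFactory`, `BatchWriterFactory`, `ContinuousWriterFactory`, `BATCH_EPOCH`, and the in-memory classes are all hypothetical. In the unified design, a batch query is treated as a query with a single epoch, so the caller passes a fixed epoch ID. In the split design, batch and micro-batch writes go through an epoch-free factory, and only continuous streaming sees epochs.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class DataWriter:
    """Hypothetical, simplified per-task writer (not Spark's API)."""

    def __init__(self, partition_id: int, attempt_number: int, epoch_id: Optional[int] = None):
        self.partition_id = partition_id
        self.attempt_number = attempt_number
        self.epoch_id = epoch_id  # None when the factory has no notion of epochs
        self.rows: List[str] = []

    def write(self, row: str) -> None:
        self.rows.append(row)

    def commit(self) -> Tuple:
        return (self.partition_id, self.attempt_number, self.epoch_id, tuple(self.rows))


# Unified design: one signature for every execution mode.
class UnifiedWriterFactory(ABC):
    @abstractmethod
    def create_data_writer(self, partition_id: int, attempt_number: int, epoch_id: int) -> DataWriter: ...


# Split design: batch and micro-batch never see epochs.
class BatchWriterFactory(ABC):
    @abstractmethod
    def create_data_writer(self, partition_id: int, attempt_number: int) -> DataWriter: ...


# Split design: only continuous streaming is epoch-aware.
class ContinuousWriterFactory(ABC):
    @abstractmethod
    def create_data_writer(self, partition_id: int, attempt_number: int, epoch_id: int) -> DataWriter: ...


class InMemoryUnified(UnifiedWriterFactory):
    def create_data_writer(self, partition_id, attempt_number, epoch_id):
        return DataWriter(partition_id, attempt_number, epoch_id)


class InMemoryBatch(BatchWriterFactory):
    def create_data_writer(self, partition_id, attempt_number):
        return DataWriter(partition_id, attempt_number)


BATCH_EPOCH = 0  # hypothetical: the single epoch of a batch query


def run_batch_unified(factory: UnifiedWriterFactory, partitions: List[List[str]]) -> List[Tuple]:
    """Unified design: the caller supplies an epoch id even for batch."""
    results = []
    for pid, rows in enumerate(partitions):
        writer = factory.create_data_writer(pid, 0, BATCH_EPOCH)
        for row in rows:
            writer.write(row)
        results.append(writer.commit())
    return results


def run_batch_split(factory: BatchWriterFactory, partitions: List[List[str]]) -> List[Tuple]:
    """Split design: a micro-batch trigger would call this once per trigger, as an independent write."""
    results = []
    for pid, rows in enumerate(partitions):
        writer = factory.create_data_writer(pid, 0)
        for row in rows:
            writer.write(row)
        results.append(writer.commit())
    return results
```

The trade-off matches the thread. The split design keeps epochs out of batch-only sources. The unified design gives every caller a single signature, which avoids a caller choosing the wrong interface, at the cost of passing an epoch ID that batch sources ignore.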
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
There isn't currently a distinction between streaming and batch in the
places where this interface is called, except in the experimental continuous
processing streaming mode. The streaming
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
@jose-torres, can you explain that more for me? Why would callers only use
one interface but not the other? Wouldn't streaming use one and batch the
other? Why would batch need to know about
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
My primary concern with splitting the interfaces is that it makes it easy
for Spark changes to accidentally do the wrong thing. Callers of
DataWriterFactory.createDataWriter() won't necessarily
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20710
@tdas, thanks for letting us know. I'm really wondering if we should be
using the same interfaces between batch and streaming. The epoch id strikes me
as strange for data sources that won't support
Github user tdas commented on the issue:
https://github.com/apache/spark/pull/20710
@rdblue @jose-torres Argh... I didn't notice that you guys were still
commenting before I merged it.
Feel free to continue the discussion, and if any change is needed we will deal
with this
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20710
LGTM
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87917/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87917 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87917/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87918/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87918 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87918/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87915/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87915 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87915/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87912/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87912 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87912/testReport)**
for PR 20710 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87918 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87918/testReport)**
for PR 20710 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87917 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87917/testReport)**
for PR 20710 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87915 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87915/testReport)**
for PR 20710 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87912 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87912/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87909/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87909 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87909/testReport)**
for PR 20710 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87909 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87909/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87862 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87862/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87862/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20710
**[Test build #87862 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87862/testReport)**
for PR 20710 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20710
Can one of the admins verify this patch?
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
@tdas @rdblue @cloud-fan
I haven't forgotten that we need a design doc before finalization;
SPARK-23556 tracks that.