[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-11-04 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
@HeartSaVioR Thanks for your reply, and sorry for only just seeing your 
comment. Yep, I will keep tracking this feature after we support shuffled 
stateful operators.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-07-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/20675
  
It looks like the patch is outdated, and once continuous queries support 
shuffled stateful operators, implementing task-level retry will not be trivial. 
To get correct aggregation results, when one task fails at epoch N, all 
tasks and state should be restored to epoch N. 

I definitely agree that it would be ideal to have stable task-level retry; 
I'm just wondering whether this patch would work with the follow-up tasks for 
continuous mode.
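
To make the epoch-alignment concern above concrete, here is a minimal toy sketch. It is not Spark code: `ToyTask`, `begin_epoch`, and `restore_all` are hypothetical names invented for illustration, and the model assumes each task records its offset and aggregation state at the start of every epoch. The point it shows is that a failure in one task during epoch N rolls every task back to the start of epoch N, not only the failed one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyTask:
    """Hypothetical stand-in for one continuous-processing task (not a Spark class)."""
    partition: int
    offset: int = 0        # next source offset to read
    running_sum: int = 0   # toy aggregation state
    # epoch -> (offset, running_sum) as they were when that epoch began
    snapshots: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def begin_epoch(self, epoch: int) -> None:
        # Record where this task stood when the epoch started.
        self.snapshots[epoch] = (self.offset, self.running_sum)

    def process(self, values: List[int]) -> None:
        # Consume records, advancing both the offset and the state.
        for v in values:
            self.running_sum += v
            self.offset += 1

    def restore(self, epoch: int) -> None:
        # Reset offset and state to the start of the given epoch.
        self.offset, self.running_sum = self.snapshots[epoch]
        # Snapshots of later epochs are no longer valid after a rollback.
        self.snapshots = {e: s for e, s in self.snapshots.items() if e <= epoch}


def restore_all(tasks: List[ToyTask], epoch: int) -> None:
    """Roll every task back to the start of `epoch`, not only the failed one."""
    for task in tasks:
        task.restore(epoch)
```

In this toy model, if two tasks have both begun epoch 1 and one of them fails, calling `restore_all(tasks, 1)` discards the epoch 1 progress of both, so a downstream aggregation never mixes partial results from different epochs. A real implementation would also have to coordinate shuffle and state stores, which this sketch deliberately ignores.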


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-27 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
> it just means that for very long-running streams task restarts will 
eventually run out.

Ah, I see what you mean. Yeah, if we support task-level retry, we should also 
make the number of task retries unlimited.

> But if you're worried that the current implementation of task restart 
will become incorrect as more complex scenarios are supported, I'd definitely 
lean towards deferring it until continuous processing is more feature-complete.

Yep, the "complex scenarios" I mentioned mainly include shuffle and 
aggregation, as discussed in the comments at 
https://issues.apache.org/jira/browse/SPARK-20928?focusedCommentId=16245556=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16245556.
In those scenarios task-level retry may need to consider epoch alignment, but 
I think the current implementation of task restart is complete for map-only 
continuous processing.

I agree with you about deferring it. Should I just leave a comment in 
SPARK-23033 and close this, or do you think it should be reviewed by others?

> Do you want to spin that off into a separate PR? (I can handle it 
otherwise.)

Of course, #20689 added a new interface `ContinuousDataReaderFactory`, as 
discussed in our comments.




---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-27 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20675
  
It's not semantically wrong that the attempt number is never reset; it just 
means that for very long-running streams task restarts will eventually run out.

You make a good point that in high parallelism cases we might need to be 
able to restart only a single task, although I think we'd still need 
query-level restart on top of that. But if you're worried that the current 
implementation of task restart will become incorrect as more complex scenarios 
are supported, I'd definitely lean towards deferring it until continuous 
processing is more feature-complete.

I was working on getting basic aggregation working, and I think we 
definitely will need some kind of setOffset-like functionality. Do you want to 
spin that off into a separate PR? (I can handle it otherwise.)
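
The point about the attempt number can be illustrated with a small toy model. This is a sketch under stated assumptions, not the Spark scheduler: `ToyLongRunningTask`, `RetryBudgetExhausted`, and `restarts_survived` are hypothetical names, and `max_failures` is only loosely modeled on a per-task failure limit such as `spark.task.maxFailures`. Because the failure counter is never reset, a stream that runs long enough exhausts its budget no matter how far apart the failures are.

```python
from typing import Iterable


class RetryBudgetExhausted(Exception):
    """Raised when the toy task has used up its fixed failure budget."""


class ToyLongRunningTask:
    """Hypothetical task whose failure count lives as long as the query."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures
        self.failures = 0  # never reset, even after long healthy periods

    def on_failure(self) -> int:
        """Record a failure and return the attempt number for the restart."""
        self.failures += 1
        if self.failures >= self.max_failures:
            raise RetryBudgetExhausted(
                f"task failed {self.failures} times; no restarts left"
            )
        return self.failures


def restarts_survived(max_failures: int, failure_days: Iterable[int]) -> int:
    """Count how many failures the task restarts from before giving up."""
    task = ToyLongRunningTask(max_failures)
    survived = 0
    for _day in failure_days:
        try:
            task.on_failure()
        except RetryBudgetExhausted:
            break
        survived += 1
    return survived
```

For example, with a budget of 4 failures and failures on days 1, 30, 90, 200 and 365, the toy task restarts three times and gives up on the fourth failure, even though the failures are months apart.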


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
Many thanks for your detailed reply!
> The semantics aren't quite right. Task-level retry can happen a fixed 
number of times for the lifetime of the task, which is the lifetime of the 
query - even if it runs for days after, the attempt number will never be reset.
- I think the attempt number never being reset is not a problem, as long as 
the task starts with the right epoch and offset. Maybe I don't understand what 
you mean by the semantics; could you please explain further?
- As far as I can tell, with a higher degree of parallelism a whole-stage 
restart is too heavy an operation and will lead to data shaking.
- One further thought: after CP supports shuffle and more complex scenarios, 
task-level retry will need more work to ensure the data is correct. But it may 
still be a useful feature? I just want to leave this patch here and start a 
discussion about it :)


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20675
  
Now that I think about it, we may eventually need a way to set the starting 
partition offset after creation for other reasons, so I'm less confident in 
those second and third reasons. But on the whole I still think converting to 
global restarts makes sense.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87666/
Test PASSed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20675
  
**[Test build #87666 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87666/testReport)**
 for PR 20675 at commit 
[`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20675
  
**[Test build #87666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87666/testReport)**
 for PR 20675 at commit 
[`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502).


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1054/
Test PASSed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
retest this please


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87665/
Test FAILed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20675
  
**[Test build #87665 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87665/testReport)**
 for PR 20675 at commit 
[`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-25 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
cc @tdas and @jose-torres 
#20225 gives a quick fix for task-level retry; this is just an attempt 
at a possibly better implementation. Please let me know if I've done something 
wrong or have misunderstood Continuous Processing. Thanks :)


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20675
  
**[Test build #87665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87665/testReport)**
 for PR 20675 at commit 
[`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502).


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1053/
Test PASSed.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20675
  
Merged build finished. Test PASSed.


---
