[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2018-01-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 Done.

2018-01-04 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/19848 @steveloughran can you bring this up on dev@? We should move this discussion off of this PR. (Sorry, I haven't had a chance to look yet, but I appreciate you doing this.)

2018-01-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 WiP: [a_zero_rename_committer.pdf](https://github.com/steveloughran/zero-rename-committer/files/1604894/a_zero_rename_committer.pdf) I would really like some early review of the

2017-12-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 > I actually feel like this is something hadoop should be documenting ... we are talking about how committers we happen to know work, rather than talking about the general contract of

2017-12-27 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/19848 +1 to Marcelo's comment about having this conversation somewhere archived. I actually feel like this is something hadoop should be documenting ... we are talking about how committers we

2017-12-15 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19848 Side note: this would be a great conversation to have recorded in our dev mailing list or in JIRA, instead of lost in PR comments on github...

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848
> Check if the same jobId already is committed and then remove existing files and commit again.
If your job doesn't allow overwrite, that's mostly implicit; it's only in

2017-12-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19848 @rezasafi That is equivalent to two different executions of the same/similar app (concurrently or sequentially), right? If yes, that is something @steveloughran already covered above and does

2017-12-15 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 @mridulm what I meant by same rdd was to run the same job two times on the same cluster but in different spark contexts. So it is not the same rdd, but since sparkContext will start rdd ids from

2017-12-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19848 @rezasafi What do you mean by "different copies of an rdd at different times"? If they are two different jobs to save, even if of the same rdd, they are two different jobs (save rdd with different

2017-12-15 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 @steveloughran Thank you very much for your detailed comment. I really appreciate it. I think in the above list, when you reach step 6, for Stage2 you will have a different JobId and it cannot be

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 Thought some more on this. Here's a possible workflow for failures which can arise from job attempt recycling:
1. Stage 1, Job ID 0, attempt 1, kicks off task 0 attempt 1,

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 The job ID is only used in the normal FileOutputCommitter to generate unique paths, using `s" _temporary/$jobid_$job-attempt"` for the file (i.e. the job-attempt ID, which is jobID+attempt). When

2017-12-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19848 I found that Spark SQL always uses 0 as the job ID... How do Hadoop committers work with the job ID? Only for recovery?

2017-12-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848
> I was hoping you would know the hadoop committer semantics better than me
I might, but that's only because I spent time with a debugger and asking people the history of things,

2017-12-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19848 @steveloughran Any thoughts on @squito's comment? It might be a valid corner case that some committer is leveraging (in the context of a single user session, for example)?

2017-12-14 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/19848 I dunno what the requirements are -- I was hoping you would know the hadoop committer semantics better than me! I suppose a uuid is really the only way to get something globally unique, as you could even

2017-12-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19848 @squito Is there a requirement that it should be globally unique? I am not sure whether (some?) committers make this assumption, and the few I did take a look at did not seem to care. If

2017-12-13 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/19848 I have one concern about this -- there is a case where you are not giving a unique id to the hadoop committers. You could save one rdd twice, and even have both of those operations running

2017-12-04 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 The branch 2.2 PR for this fix is here: https://github.com/apache/spark/pull/19886

2017-12-04 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 Thank you very much @vanzin, @mridulm and @jiangxb1987. I really appreciate it. I will create PR for branch 2.2 ASAP.

2017-12-04 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19848 (There was a conflict in 2.2, open a new PR if you want it there.)

2017-12-04 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19848 Merging to master / 2.2.

2017-12-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19848 Thanks for fixing this @rezasafi ! This looks cleaner than my suggestion to generate a unique jobId. LGTM @vanzin

2017-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84380/

2017-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.

2017-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84380/testReport)** for PR 19848 at commit

2017-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84380/testReport)** for PR 19848 at commit

2017-12-01 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19848 LGTM but I'll leave it here a bit for others to take a look.

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84362/

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84362/testReport)** for PR 19848 at commit

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84362/testReport)** for PR 19848 at commit

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84355/

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)** for PR 19848 at commit

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)** for PR 19848 at commit

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84321/

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84321/testReport)** for PR 19848 at commit

2017-11-29 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 Thank you very much, @vanzin. I changed the code per your comment and pushed the changes.

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84321/testReport)** for PR 19848 at commit

2017-11-29 Thread rezasafi
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 @vanzin, @mridulm, @jiangxb1987 let me know if you have any comments here. Thank you in advance. I appreciate it.

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84315/

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84315/testReport)** for PR 19848 at commit

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84315/testReport)** for PR 19848 at commit

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test FAILed.

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84313/testReport)** for PR 19848 at commit

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84313/

2017-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84313/testReport)** for PR 19848 at commit

2017-11-29 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19848 ok to test

2017-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Can one of the admins verify this patch?