Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Done.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19848
@steveloughran can you bring this up on dev@? we should move this
discussion off of this PR.
(sorry haven't had a chance to look yet, but I appreciate you doing this)
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
WiP:
[a_zero_rename_committer.pdf](https://github.com/steveloughran/zero-rename-committer/files/1604894/a_zero_rename_committer.pdf)
I would really like some early review of the
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> I actually feel like this is something hadoop should be documenting ...
we are talking about how committers we happen to know work, rather than talking
about the general contract of
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19848
+1 to Marcelo's comment about having this conversation somewhere archived.
I actually feel like this is something hadoop should be documenting ... we
are talking about how committers we
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19848
Side note: this would be a great conversation to have recorded in our dev
mailing list or in JIRA, instead of lost in PR comments on github...
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> Check if the same jobId already is committed and then remove existing
files and commit again.
if your job doesn't allow overwrite, that's mostly implicit; it's only in
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19848
@rezasafi That is equivalent to two different executions of the
same/similar app (concurrently or sequentially), right?
If yes, that is something @steveloughran already covered above and does
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
@mridulm what I meant by same rdd was to run the same job two times on the
same cluster but in different spark contexts. So it is not the same rdd, but
since sparkContext will start rdd ids from
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19848
@rezasafi What do you mean by "different copies of an rdd at different
times" ? If they are two different jobs to save, even if of the same rdd, they
are two different jobs (save rdd with different
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
@steveloughran Thank you very much for your detailed comment. I really
appreciate it. I think in the above list, when you reach step 6, for Stage2 you
will have a different JobId and it cannot be
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Thought some more on this.
Here's a possible workflow for failures which can arise from job attempt
recycling
1. Stage 1, Job ID 0, attempt 1, kicks off task 0 attempt 1,
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Job ID is only used in the normal FileOutputCommitter to generate unique
paths, using `s" _temporary/$jobid_$job-attempt"` for the file (i.e.
job-attempt-ID, which is jobID+attempt).
When
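As an illustration of the path scheme described in this comment, here is a minimal sketch in Python. It is not the FileOutputCommitter code: the helper name `job_attempt_path`, the output directory, and the sample IDs are all hypothetical, and the layout simply follows the `_temporary/$jobid_$job-attempt` pattern quoted above. It shows why two saves that reuse the same job ID and attempt end up sharing one working directory.

```python
from pathlib import PurePosixPath


def job_attempt_path(output_dir: str, job_id: str, job_attempt: int) -> PurePosixPath:
    """Hypothetical helper following the "_temporary/$jobid_$job-attempt" pattern."""
    return PurePosixPath(output_dir) / "_temporary" / f"{job_id}_{job_attempt}"


# Two independent saves that both use job ID 0 and attempt 0 (assumed values)
# resolve to the same working directory, so their intermediate files can collide.
first = job_attempt_path("/data/out", "0", 0)
second = job_attempt_path("/data/out", "0", 0)
collides = first == second

# A distinct job ID keeps the working directories apart.
third = job_attempt_path("/data/out", "1", 0)
```

Under these assumptions, any scheme that hands out a distinct job ID per save avoids the shared directory; whether a given committer needs global uniqueness is the open question in this thread.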
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19848
I found that Spark SQL always uses 0 as the job ID... How do Hadoop committers
work with the job ID? Only for recovery?
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> I was hoping you would know the hadoop committer semantics better than me
I might, but that's only because I spent time with a debugger and asking
people the history of things,
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19848
@steveloughran Any thoughts on @squito's comment? It might be a valid
corner case that some committer is leveraging (in the context of a single user
session, for example)?
---
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19848
I dunno what the requirements are -- I was hoping you would know the hadoop
committer semantics better than me! I suppose a uuid is really the only way to
get something globally unique, as you could even
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19848
@squito Is there a requirement that it should be globally unique? I am not
sure whether (some?) committers make this assumption, and the few I did take a
look at did not seem to care.
If
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19848
I have one concern about this -- there is a case where you are not giving a
unique id to the hadoop committers. You could save one rdd twice, and even
have both of those operations running
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
The branch 2.2 PR for this fix is here:
https://github.com/apache/spark/pull/19886
---
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
Thank you very much @vanzin, @mridulm and @jiangxb1987. I really appreciate
it. I will create PR for branch 2.2 ASAP.
---
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19848
(There was a conflict in 2.2, open a new PR if you want it there.)
---
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19848
Merging to master / 2.2.
---
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19848
Thanks for fixing this @rezasafi !
This looks cleaner than my suggestion to generate a unique jobId. LGTM
@vanzin
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84380/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84380 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84380/testReport)**
for PR 19848 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84380 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84380/testReport)**
for PR 19848 at commit
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19848
LGTM but I'll leave it here a bit for others to take a look.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84362/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84362 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84362/testReport)**
for PR 19848 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84362 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84362/testReport)**
for PR 19848 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84355/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84355 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)**
for PR 19848 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84355 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)**
for PR 19848 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84321/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84321 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84321/testReport)**
for PR 19848 at commit
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
Thank you very much, @vanzin. I changed the code per your comment and
pushed the changes.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84321 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84321/testReport)**
for PR 19848 at commit
Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
@vanzin , @mridulm , @jiangxb1987 let me know if you have any comment here.
Thank you in advance. I appreciate it.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84315/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84315 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84315/testReport)**
for PR 19848 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84315 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84315/testReport)**
for PR 19848 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84313 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84313/testReport)**
for PR 19848 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84313/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19848
**[Test build #84313 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84313/testReport)**
for PR 19848 at commit
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19848
ok to test
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19848
Can one of the admins verify this patch?
---