Github user rezasafi commented on the issue:

    https://github.com/apache/spark/pull/19848
  
    @steveloughran Thank you very much for your detailed comment. I really 
appreciate it. I think in the above list, when you reach step 6, Stage2 will 
have a different jobId, and it cannot be zero with the current fix. That is 
because the jobId is rdd.id, and within a SparkContext each new RDD gets a 
new rddId (nextRddId.getAndIncrement()).
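    The within-one-context uniqueness (and the cross-context collision) can be 
sketched minimally; the class below is a hypothetical stand-in for the counter 
in SparkContext, not Spark's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RddIdSketch {
    // Hypothetical stand-in for SparkContext's internal RDD id counter.
    // Within one context every new RDD gets a fresh, monotonically
    // increasing id, so two stages of the same execution never collide
    // on a jobId derived from rdd.id.
    private final AtomicInteger nextRddId = new AtomicInteger(0);

    public int newRddId() {
        return nextRddId.getAndIncrement();
    }

    public static void main(String[] args) {
        RddIdSketch ctx1 = new RddIdSketch();
        System.out.println(ctx1.newRddId()); // 0
        System.out.println(ctx1.newRddId()); // 1 -- unique within this context

        // A second "context" (e.g. a new Spark application) restarts at 0,
        // which is how two executions can end up with the same jobId.
        RddIdSketch ctx2 = new RddIdSketch();
        System.out.println(ctx2.newRddId()); // 0 again -- cross-context collision
    }
}
```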
    Across different executions (with different SparkContexts) we may still 
hit the same jobId with this fix. From your detailed analysis, I understand 
we have two options to resolve that:
    1) Check whether the same jobId has already been committed; if so, remove 
the existing files and commit again.
    2) Use a UUID so that each run creates a new, unique jobId, even across 
different executions.
    Option 2 can be problematic, since we may not want to keep different 
copies of an RDD from different times; we probably just want the latest one. 
So the first option is likely better.
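    A rough sketch of option 1, assuming a hypothetical layout where committed 
output lives under a per-jobId directory (the path scheme and helper name here 
are illustrative, not the actual Hadoop commit-protocol API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecommitSketch {
    // Hypothetical helper: before committing output for a jobId, remove any
    // previously committed files for the same jobId, so only the latest
    // copy survives (option 1 above).
    static void commitJob(Path outputRoot, int jobId, Path stagedOutput) throws IOException {
        Path committed = outputRoot.resolve("job-" + jobId);
        if (Files.exists(committed)) {
            // Same jobId was committed by an earlier execution: delete it first,
            // children before parents.
            try (Stream<Path> walk = Files.walk(committed)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
            }
        }
        // Commit by renaming the staged directory into place.
        Files.move(stagedOutput, committed);
    }
}
```

The point of the sketch is only that a re-run with a colliding jobId replaces 
the earlier output rather than leaving two copies behind.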

