GitHub user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21606#discussion_r197540970

--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging {
     // Try to write all RDD partitions as a Hadoop OutputFormat.
     try {
       val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
+        // SPARK-24552: Generate a unique "attempt ID" based on the stage and task attempt numbers.
+        // Assumes that there won't be more than Short.MaxValue attempts, at least not concurrently.
+        val attemptId = (context.stageAttemptNumber << 16) | context.attemptNumber
--- End diff --

I don't think we should generate an ID this way. We already have a unique ID that is exposed in the Spark UI. I'd much rather make it clear that the TID passed to committers as an attempt ID is the same as the TID in the stage view. That makes debugging easier. Going with this approach just introduces yet another number to track an attempt.
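For reference, here is a minimal standalone sketch of what the bit-packing in the diff does, outside of Spark: the stage attempt number goes in the high 16 bits of a single Int and the task attempt number in the low 16 bits, assuming both stay below Short.MaxValue as the diff's comment notes. The `AttemptIdPacking` object and the `pack`/`unpack` names are hypothetical, for illustration only; only the packing expression itself comes from the diff.

```scala
object AttemptIdPacking {
  // Pack two counters into one Int: stage attempt in the high 16 bits,
  // task attempt in the low 16 bits. Mirrors the expression in the diff above.
  // Hypothetical helper, not part of the PR; assumes both values fit in 16 bits.
  def pack(stageAttemptNumber: Int, taskAttemptNumber: Int): Int =
    (stageAttemptNumber << 16) | taskAttemptNumber

  // Recover the two counters from a packed ID.
  def unpack(attemptId: Int): (Int, Int) =
    (attemptId >>> 16, attemptId & 0xFFFF)

  def main(args: Array[String]): Unit = {
    val id = pack(stageAttemptNumber = 2, taskAttemptNumber = 5)
    println(f"attemptId = $id%d (0x$id%08x)") // attemptId = 131077 (0x00020005)
    println(unpack(id))                       // (2,5)
  }
}
```

Note the overflow caveat the diff's own comment raises: if either counter ever exceeds 0xFFFF, the packed IDs stop being unique, which is part of the case for reusing a pre-existing unique ID such as the TID instead.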