Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20490#discussion_r166395788

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala ---
    @@ -117,20 +118,43 @@ object DataWritingSparkTask extends Logging {
           writeTask: DataWriterFactory[InternalRow],
           context: TaskContext,
           iter: Iterator[InternalRow]): WriterCommitMessage = {
    -    val dataWriter = writeTask.createDataWriter(context.partitionId(), context.attemptNumber())
    +    val stageId = context.stageId()
    +    val partId = context.partitionId()
    +    val attemptId = context.attemptNumber()
    +    val dataWriter = writeTask.createDataWriter(partId, attemptId)

         // write the data and commit this writer.
         Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
           iter.foreach(dataWriter.write)
    -      logInfo(s"Writer for partition ${context.partitionId()} is committing.")
    -      val msg = dataWriter.commit()
    -      logInfo(s"Writer for partition ${context.partitionId()} committed.")
    +
    +      val msg = if (writeTask.useCommitCoordinator) {
    +        val coordinator = SparkEnv.get.outputCommitCoordinator
    --- End diff --

    Since we have a workaround (calling the coordinator in `DataWriter.commit`), I don't think this should block the 2.3 release, but we can definitely get it into branch-2.3 if there is no breaking change to the public APIs.

    I also wouldn't treat it as a correctness bug. The default no-coordinator behavior is well documented in the current APIs; see the classdoc of `DataWriter`. We never guarantee that, for an RDD partition, only one task can commit successfully.

    > What do you have in mind to "introduce the concept"?

    I never thought about it before; I'll think about it over the next few days.
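    For context, the coordination the diff is wiring in can be sketched roughly as follows. This is a hypothetical, self-contained simplification (not Spark's actual `OutputCommitCoordinator`, which runs on the driver and also tracks stages and failed attempts): for each partition, the first task attempt that asks to commit wins, and every other attempt is denied.

    ```scala
    import scala.collection.concurrent.TrieMap

    // Minimal sketch of commit coordination: first-asker-wins per partition.
    // The names here (CommitCoordinatorSketch, canCommit) are illustrative,
    // not Spark's real API.
    object CommitCoordinatorSketch {
      // partitionId -> attempt number that holds the right to commit
      private val authorized = TrieMap.empty[Int, Int]

      def canCommit(partitionId: Int, attemptNumber: Int): Boolean =
        authorized.putIfAbsent(partitionId, attemptNumber) match {
          case None         => true                    // first attempt to ask wins
          case Some(winner) => winner == attemptNumber // only the winner may commit
        }
    }
    ```

    Under this scheme a speculative or retried attempt for the same partition gets `false` and must abort its writer, which is exactly the guarantee the default no-coordinator path does not provide.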