Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20490#discussion_r166395788

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala ---
    @@ -117,20 +118,43 @@ object DataWritingSparkTask extends Logging {
           writeTask: DataWriterFactory[InternalRow],
           context: TaskContext,
           iter: Iterator[InternalRow]): WriterCommitMessage = {
    -    val dataWriter = writeTask.createDataWriter(context.partitionId(), context.attemptNumber())
    +    val stageId = context.stageId()
    +    val partId = context.partitionId()
    +    val attemptId = context.attemptNumber()
    +    val dataWriter = writeTask.createDataWriter(partId, attemptId)

         // write the data and commit this writer.
         Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
           iter.foreach(dataWriter.write)
    -      logInfo(s"Writer for partition ${context.partitionId()} is committing.")
    -      val msg = dataWriter.commit()
    -      logInfo(s"Writer for partition ${context.partitionId()} committed.")
    +
    +      val msg = if (writeTask.useCommitCoordinator) {
    +        val coordinator = SparkEnv.get.outputCommitCoordinator
    --- End diff --

    Since we have a workaround (calling the coordinator in `DataWriter.commit`), I don't think this should block the 2.3 release, but we can definitely get it into branch-2.3 if there is no breaking change to the public APIs.

    I also wouldn't treat it as a correctness bug. The default no-coordinator behavior is well documented in the current APIs; see the classdoc of `DataWriter`. We never guarantee that, for an RDD partition, only one task can commit successfully.

    > What do you have in mind to "introduce the concept"?

    I never thought about it before; I'll think about it over the next few days.
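    For context, the coordination the diff is wiring in can be sketched roughly as follows. This is a hypothetical, self-contained simplification (not Spark's actual `OutputCommitCoordinator`, which runs on the driver and also tracks stages and failed attempts): for each partition, the first task attempt that asks to commit wins, and every other attempt is denied.

    ```scala
    import scala.collection.concurrent.TrieMap

    // Minimal sketch of commit coordination: first-asker-wins per partition.
    // The names here (CommitCoordinatorSketch, canCommit) are illustrative,
    // not Spark's real API.
    object CommitCoordinatorSketch {
      // partitionId -> attempt number that holds the right to commit
      private val authorized = TrieMap.empty[Int, Int]

      def canCommit(partitionId: Int, attemptNumber: Int): Boolean =
        authorized.putIfAbsent(partitionId, attemptNumber) match {
          case None         => true                    // first attempt to ask wins
          case Some(winner) => winner == attemptNumber // only the winner may commit
        }
    }
    ```

    Under this scheme a speculative or retried attempt for the same partition gets `false` and must abort its writer, which is exactly the guarantee the default no-coordinator path does not provide.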