GitHub user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14292#discussion_r72125645
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
    @@ -91,18 +92,30 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String)
         serializer.deserialize[T](ByteBuffer.wrap(bytes))
       }
     
    +  /**
    +   * Store the metadata for the specified batchId and return `true` if successful. If the batchId's
    +   * metadata has already been stored, this method will return `false`.
    +   *
    +   * Note that this method must be called on a [[org.apache.spark.util.UninterruptibleThread]]
    +   * so that interrupts can be disabled while writing the batch file. This is because there is a
    +   * potential dead-lock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622). If the thread
    +   * running "Shell.runCommand" is interrupted, then the thread can get deadlocked. In our
    +   * case, `writeBatch` creates a file using HDFS API and calls "Shell.runCommand" to set the
    +   * file permissions, and can get deadlocked is the stream execution thread is stopped by
    +   * interrupt. Hence, we make sure that this method is called on UninterruptibleThread which
    +   * allows use disable interrupts. Also see SPARK-14131.
    --- End diff --
    
    "allows us to disable interrupts here"

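For readers following the quoted doc comment, here is a minimal sketch of the "defer interrupts while writing" idea it describes. It is not Spark's `UninterruptibleThread`: `DeferringThread` and `DeferDemo` are hypothetical names, re-entrant calls are not handled, and the "write" is simulated with latches rather than an HDFS call.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified stand-in for the thread type named in the doc comment.
// While `runUninterruptibly` is active, interrupt() is recorded instead of delivered.
class DeferringThread(name: String)(body: DeferringThread => Unit) extends Thread(name) {
  private val lock = new Object
  private var uninterruptible = false   // guarded by lock
  private var pendingInterrupt = false  // guarded by lock

  override def run(): Unit = body(this)

  /** Runs `f` with interrupts deferred. Must be called on this thread. */
  def runUninterruptibly[T](f: => T): T = {
    require(Thread.currentThread() eq this, "must be called on this thread")
    lock.synchronized {
      // Clear an interrupt that already arrived and remember it for later.
      pendingInterrupt = Thread.interrupted() || pendingInterrupt
      uninterruptible = true
    }
    try f
    finally lock.synchronized {
      uninterruptible = false
      if (pendingInterrupt) {
        pendingInterrupt = false
        super.interrupt() // deliver the deferred interrupt now
      }
    }
  }

  override def interrupt(): Unit = lock.synchronized {
    if (uninterruptible) pendingInterrupt = true // defer until the block exits
    else super.interrupt()
  }
}

object DeferDemo {
  /** Returns (simulated write completed, interrupt observed after the block). */
  def run(): (Boolean, Boolean) = {
    val inside = new CountDownLatch(1)
    val release = new CountDownLatch(1)
    val wrote = new AtomicBoolean(false)
    val interruptedAfter = new AtomicBoolean(false)

    val t = new DeferringThread("writer")({ self =>
      self.runUninterruptibly {
        inside.countDown()
        release.await()   // would throw if the interrupt were delivered here
        wrote.set(true)   // stands in for writing the batch file
      }
      interruptedAfter.set(Thread.currentThread().isInterrupted)
    })
    t.start()
    inside.await()        // wait until the thread is inside the protected block
    t.interrupt()         // recorded as pending, not delivered yet
    release.countDown()
    t.join()
    (wrote.get, interruptedAfter.get)
  }
}
```

In this sketch the interrupt sent mid-write is only recorded, so the simulated write finishes, and the interrupt flag is set once the protected block exits. That is the behavior the doc comment relies on to keep "Shell.runCommand" from being interrupted.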

