[GitHub] spark pull request #14292: [SPARK-14131][SQL][STREAMING] Improved fix for avo...

jaceklaskowski Mon, 25 Jul 2016 14:26:02 -0700

Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14292#discussion_r72148009
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -269,19 +273,11 @@ class StreamExecution(
        * batchId counter is incremented and a new log entry is written with the newest offsets.
        */
       private def constructNextBatch(): Unit = {
    -    // There is a potential dead-lock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622).
    -    // If we interrupt some thread running Shell.runCommand, we may hit this issue.
    -    // As "FileStreamSource.getOffset" will create a file using HDFS API and call "Shell.runCommand"
    -    // to set the file permission, we should not interrupt "microBatchThread" when running this
    -    // method. See SPARK-14131.
    -    //
         // Check to see what new data is available.
         val hasNewData = {
           awaitBatchLock.lock()
           try {
    -        val newData = microBatchThread.runUninterruptibly {
    -          uniqueSources.flatMap(s => s.getOffset.map(o => s -> o))
    -        }
    +        val newData = uniqueSources.flatMap(s => s.getOffset.map(o => s -> o))
    --- End diff --
    
    I gave this some more thought. I don't use for comprehensions very often, but when I do... What do you think about this?
    
    ```
            val newData = for {
              source <- uniqueSources
              offset <- source.getOffset
            } yield (source, offset)
    ```
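
For readers comparing the two forms, here is a minimal, self-contained sketch showing that the `flatMap`/`map` chain from the diff and the suggested for comprehension produce the same pairs. `Source` and `Offset` below are hypothetical stand-ins, not the real Spark types; the only assumption carried over is that `getOffset` returns an `Option`, so sources without new data simply drop out of the result.

```scala
// Hypothetical stand-ins for the Spark types; only the shapes matter here.
final case class Offset(value: Long)
final case class Source(name: String, getOffset: Option[Offset])

object ForComprehensionEquivalence {
  // Sample data (made up): source "b" has no new offset.
  val uniqueSources: Seq[Source] = Seq(
    Source("a", Some(Offset(1L))),
    Source("b", None),
    Source("c", Some(Offset(3L)))
  )

  // Form used in the diff: flatMap over the sources, map over each Option.
  val viaFlatMap: Seq[(Source, Offset)] =
    uniqueSources.flatMap(s => s.getOffset.map(o => s -> o))

  // Suggested form: the compiler desugars this into the same flatMap/map chain.
  val viaFor: Seq[(Source, Offset)] = for {
    source <- uniqueSources
    offset <- source.getOffset
  } yield (source, offset)

  def main(args: Array[String]): Unit = {
    // Both forms keep only the sources whose getOffset is defined.
    assert(viaFlatMap == viaFor)
    println(viaFor.map { case (s, o) => s"${s.name} -> ${o.value}" }.mkString(", "))
  }
}
```

Since the for comprehension is only syntactic sugar over `flatMap` and `map`, choosing between the two is a readability question rather than a behavioral one.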



