GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/20150

    [SPARK-22956][SS] Bug fix for 2 streams union failover scenario

    ## What changes were proposed in this pull request?
    
    This problem was reported by @yanlin-Lynn, @ivoson and @LiangchangZ. Thanks!
    
    When we union 2 streams from Kafka or other sources, and one of them has 
no continuous data coming in while a task restarts at the same time, this causes an 
`IllegalStateException`. This is mainly caused by the code in 
[MicroBatchExecution](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L190).
When one stream has no continuous data, its committedOffset is the same as its 
availableOffset during `populateStartOffsets`, and `currentPartitionOffsets` is 
not properly handled in KafkaSource. Maybe we should also consider this 
scenario in other Sources.
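
To make the scenario above easier to follow, here is a toy model in plain Python. It is only a sketch and not Spark code: `ToySource`, `recover`, `restored_positions` and the `skip_unchanged` flag are hypothetical names, and the model assumes that recovery replays the last batch per source and may skip a source whose committed offset equals its available offset. It does not model where Spark actually raises the `IllegalStateException`; it only shows the idle stream ending up without a restored in-memory position.

```python
from typing import Dict, List, Optional


class ToySource:
    """Hypothetical stand-in for a streaming source (not Spark's KafkaSource)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # In-memory read position; assumed to be lost when the task restarts.
        self.current_offset: Optional[int] = None

    def get_batch(self, start: int, end: int) -> List[int]:
        # Record the position even when the requested range is empty.
        self.current_offset = end
        return list(range(start, end))


def recover(
    sources: List[ToySource],
    committed: Dict[str, int],
    available: Dict[str, int],
    skip_unchanged: bool,
) -> None:
    """Replay the last batch after a restart so each source restores its position."""
    for src in sources:
        start, end = committed[src.name], available[src.name]
        if skip_unchanged and start == end:
            # Idle stream: nothing is replayed, so its position stays unset.
            continue
        src.get_batch(start, end)


def restored_positions(skip_unchanged: bool) -> Dict[str, Optional[int]]:
    active, idle = ToySource("active"), ToySource("idle")
    committed = {"active": 5, "idle": 3}
    available = {"active": 8, "idle": 3}  # the idle stream received no new data
    recover([active, idle], committed, available, skip_unchanged)
    return {s.name: s.current_offset for s in (active, idle)}
```

Calling `restored_positions(True)` leaves the idle source with no position, while `restored_positions(False)` restores both sources. Under these assumptions, the idea is that recovery should leave every source with a known position, including one whose offset range is empty.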
    
    ## How was this patch tested?
    
    Added a unit test in KafkaSourceSuite.scala.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-22956

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20150
    
----
commit aa3d7b73ed5221bdc2aee9dea1f6db45b4a626d7
Author: Yuanjian Li <xyliyuanjian@...>
Date:   2018-01-04T11:52:23Z

    SPARK-22956: Bug fix for 2 streams union failover scenario

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
