GitHub user gaborgsomogyi opened a pull request:

    https://github.com/apache/spark/pull/20620

    [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes

    ## What changes were proposed in this pull request?
    
    There is a race condition introduced in SPARK-11141 which could cause data 
loss.
    The problem is that ReceivedBlockTracker.insertAllocatedBatch function 
assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the 
batch and clears the queue.
    
    In this PR only the allocated blocks will be removed from the queue which 
will prevent data loss.
    
    ## How was this patch tested?
    
    Additional unit test + manually.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborgsomogyi/spark SPARK-23438

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20620
    
----
commit 152fec431218161e538c377a6cb82753100dc70b
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date:   2018-02-09T08:30:19Z

    [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to