GitHub user koeninger opened a pull request:

    https://github.com/apache/spark/pull/15387

    [SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice

    ## What changes were proposed in this pull request?
    
    Kafka consumers can't subscribe or maintain a heartbeat without polling, 
but polling ordinarily consumes messages and advances the consumer position.  
We don't want this on the driver, so we poll with a timeout of 0 and pause all 
topic partitions.
    
    Some consumer strategies that seek to particular positions have to poll 
first, but they weren't pausing immediately thereafter.  Thus, there was a race 
condition where the second poll() in the DStream start method might actually 
adjust consumer position.
    
    This change eliminates (or at least drastically reduces the chance of) the 
race condition by pausing in the relevant consumer strategies, and it asserts 
on startup that no messages were consumed.
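    
    The ordering problem described above can be illustrated with a small, 
self-contained sketch.  This is a toy model only, not the Kafka client API and 
not the code in this patch: ToyConsumer, deliver, and PollRaceSketch are 
hypothetical names, and the offsets (42, 5) are made up for illustration.  It 
assumes that a poll advances the position of every unpaused partition by the 
number of messages that have become available.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a consumer. NOT the Kafka client API; all names are
// hypothetical and exist only to illustrate the ordering problem.
final class ToyConsumer {
    private final Map<String, Long> positions = new HashMap<>();
    private final Map<String, Long> pending = new HashMap<>();
    private final Set<String> paused = new HashSet<>();

    ToyConsumer(String... partitions) {
        for (String p : partitions) {
            positions.put(p, 0L);
            pending.put(p, 0L);
        }
    }

    // Simulates messages becoming fetchable for a partition.
    void deliver(String partition, long count) {
        pending.merge(partition, count, Long::sum);
    }

    // Consumes pending messages on unpaused partitions and advances positions.
    long poll() {
        long consumed = 0;
        for (String p : positions.keySet()) {
            if (paused.contains(p)) {
                continue;
            }
            long n = pending.get(p);
            positions.put(p, positions.get(p) + n);
            pending.put(p, 0L);
            consumed += n;
        }
        return consumed;
    }

    void seek(String partition, long offset) { positions.put(partition, offset); }

    void pauseAll() { paused.addAll(positions.keySet()); }

    long position(String partition) { return positions.get(partition); }
}

final class PollRaceSketch {
    // Returns the position seen after startup, given whether the strategy
    // pauses right after its own poll-and-seek.
    static long startPosition(boolean pauseInStrategy) {
        ToyConsumer consumer = new ToyConsumer("topic-0");

        // Consumer strategy: poll first, then seek to the desired offset.
        consumer.poll();
        consumer.seek("topic-0", 42L);
        if (pauseInStrategy) {
            consumer.pauseAll(); // pause immediately after seeking
        }

        // Race window: messages become available before the stream starts.
        consumer.deliver("topic-0", 5L);

        // Stream start: second poll, then pause everything.
        long consumed = consumer.poll();
        consumer.pauseAll();

        // Mirrors the startup check: with the early pause, nothing is consumed.
        if (pauseInStrategy && consumed != 0) {
            throw new IllegalStateException("messages consumed on startup");
        }
        return consumer.position("topic-0");
    }

    public static void main(String[] args) {
        System.out.println("without early pause: " + startPosition(false)); // 47
        System.out.println("with early pause: " + startPosition(true));     // 42
    }
}
```

    In this model the unpaused variant drifts from the sought offset 42 to 47, 
while the variant that pauses inside the strategy stays at 42 regardless of 
what arrives before the second poll.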
    
    ## How was this patch tested?
    
    I reliably reproduced the intermittent test failure by inserting a 
Thread.sleep directly before returning from SubscribePattern.  With the 
proposed fix applied, the failure no longer occurred.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 SPARK-17782

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15387
    
----
commit 1fc5863db88cac9dfd0be09318c4ca8779a51682
Author: cody koeninger <c...@koeninger.org>
Date:   2016-10-07T01:08:01Z

    [SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll being 
called twice and moving position

----

