GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15387
[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice

## What changes were proposed in this pull request?

Kafka consumers can't subscribe or maintain a heartbeat without polling, but polling ordinarily consumes messages and adjusts the consumer's position. We don't want this on the driver, so we poll with a timeout of 0 and pause all TopicPartitions.

Some consumer strategies that seek to particular positions have to poll first, but they weren't pausing immediately thereafter. Thus there was a race condition in which the second poll() in the DStream start method might actually adjust the consumer position.

This change eliminates (or at least drastically reduces the chance of) the race condition by pausing in the relevant consumer strategies, and asserts on startup that no messages were consumed.

## How was this patch tested?

I reliably reproduced the intermittent test failure by inserting a Thread.sleep directly before returning from SubscribePattern. The suggested fix eliminated the failure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 SPARK-17782

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15387

----
commit 1fc5863db88cac9dfd0be09318c4ca8779a51682
Author: cody koeninger <c...@koeninger.org>
Date:   2016-10-07T01:08:01Z

    [SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll being called twice and moving position

----
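For readers who want the ordering spelled out, here is a minimal toy model of the poll / seek / pause sequence described above. It is only a sketch: `ToyConsumer`, `onStart`, and `driverStart` are hypothetical names invented for illustration, and none of this is the Kafka client API or the actual Spark patch. The model assumes that a poll on an unpaused consumer returns every record from the current position onward and advances the position, which is the behavior the pull request is guarding against on the driver.

```java
import java.util.ArrayList;
import java.util.List;

public class PausedPollSketch {

    /** Hypothetical in-memory stand-in for a consumer; NOT the Kafka client API. */
    public static class ToyConsumer {
        private final List<String> log;   // records available in a single partition
        private long position = 0;        // next offset to read
        private boolean paused = false;

        public ToyConsumer(List<String> log) { this.log = log; }

        /** Returns all records from the current position and advances it, unless paused. */
        public List<String> poll() {
            List<String> out = new ArrayList<>();
            if (paused) {
                return out;               // a paused consumer returns nothing and does not move
            }
            while (position < log.size()) {
                out.add(log.get((int) position));
                position++;
            }
            return out;
        }

        public void seek(long offset) { position = offset; }
        public void pause() { paused = true; }
        public long position() { return position; }
    }

    /** Strategy step: poll first, seek to the requested offset, then optionally pause. */
    public static ToyConsumer onStart(List<String> log, long startOffset, boolean pauseAfterSeek) {
        ToyConsumer c = new ToyConsumer(log);
        c.poll();                         // first poll, required before seeking
        c.seek(startOffset);
        if (pauseAfterSeek) {
            c.pause();                    // the fix: pause before anyone can poll again
        }
        return c;
    }

    /** Driver start step: the second poll. Returns how many records it consumed. */
    public static int driverStart(ToyConsumer c) {
        int consumed = c.poll().size();
        c.pause();
        return consumed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");

        ToyConsumer fixed = onStart(log, 1L, true);
        int fixedConsumed = driverStart(fixed);
        System.out.println("paused:   consumed=" + fixedConsumed + " position=" + fixed.position());

        ToyConsumer racy = onStart(log, 1L, false);
        int racyConsumed = driverStart(racy);
        System.out.println("unpaused: consumed=" + racyConsumed + " position=" + racy.position());
    }
}
```

In this model the paused consumer stays at the offset it was seeked to (consumed=0, position=1), while the unpaused one is moved by the second poll (consumed=2, position=3). The startup assertion mentioned in the description corresponds to checking that the count returned by `driverStart` is zero. In the real system the unpaused case is intermittent, since it depends on whether records arrive before the second poll; the toy model makes it deterministic only to show the ordering.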