[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092088#comment-16092088 ]

Shixiong Zhu commented on SPARK-21378:
--------------------------------------

The data must already be in Kafka when executors try to poll. So if an executor 
cannot poll any records, it means either the timeout is too short or something 
is wrong. If it's just a timeout issue, you can simply increase the timeout. 
Otherwise, IMO, blindly retrying on an unknown failure is pretty bad. One 
example: if you type a wrong broker address, the Kafka consumer will just run 
forever.
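
If it really is a timeout issue, a minimal sketch of raising the executor-side 
poll timeout via SparkConf (the key {{spark.streaming.kafka.consumer.poll.ms}} 
is read by the cached consumer when fetching a batch; the 30000 ms value and 
app name below are just illustrative):

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch: raise the executor-side Kafka poll timeout before creating the
// streaming context. 30000 ms is an arbitrary example value.
SparkConf conf = new SparkConf()
    .setAppName("kafka-direct-stream")
    .set("spark.streaming.kafka.consumer.poll.ms", "30000");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(3));
{code}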

> Spark Poll timeout when specific offsets are passed
> ---------------------------------------------------
>
>                 Key: SPARK-21378
>                 URL: https://issues.apache.org/jira/browse/SPARK-21378
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.2
>            Reporter: Ambud Sharma
>
> Kafka direct stream fails with poll timeout:
> {code:java}
> JavaInputDStream<ConsumerRecord<String, String>> stream =
>     KafkaUtils.createDirectStream(
>         ssc,
>         LocationStrategies.PreferConsistent(),
>         ConsumerStrategies.<String, String>Subscribe(topicsCollection, kafkaParams, fromOffsets));
> {code}
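> For context, a minimal sketch of how the specific starting offsets would be 
> built for the {{fromOffsets}} argument above (the topic name, partition 
> numbers, and offset values here are hypothetical placeholders):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.kafka.common.TopicPartition;
>
> // Sketch: map each TopicPartition to the offset the stream should start from.
> Map<TopicPartition, Long> fromOffsets = new HashMap<>();
> fromOffsets.put(new TopicPartition("events", 0), 100L);
> fromOffsets.put(new TopicPartition("events", 1), 100L);
> {code}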
> Digging deeper shows that there is an assert statement that fails the task 
> whenever a poll returns no records, even though an empty poll can be a valid 
> outcome:
> https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala#L75
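> The failing check behaves roughly like the following (a Java paraphrase of 
> the linked Scala assert, not the actual source; {{consumer}}, 
> {{pollTimeoutMs}}, {{groupId}}, and {{topicPartition}} stand in for the 
> cached consumer's state):
> {code:java}
> // Paraphrase: after seeking to the requested offset, the cached consumer
> // polls once and asserts that records came back. An empty poll (e.g. a slow
> // broker or a too-short timeout) fails the whole task.
> ConsumerRecords<String, String> records = consumer.poll(pollTimeoutMs);
> if (records.isEmpty()) {
>     throw new AssertionError(
>         "Failed to get records for " + groupId + " " + topicPartition
>         + " after polling for " + pollTimeoutMs);
> }
> {code}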
> With the fix from https://issues.apache.org/jira/browse/SPARK-19275 in 
> place, the driver keeps logging "Added jobs for time" and eventually fails 
> with "Failed to get records for spark-xxxxx after polling for 3000"; the 
> batch interval in this case is 3 seconds.
> The timeout can be raised to an even bigger number, but that eventually 
> leads to an OOM.


