[
https://issues.apache.org/jira/browse/CRUNCH-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742438#comment-15742438
]
Andrew Olson commented on CRUNCH-630:
-------------------------------------
The current workaround for this bug is to set auto.offset.reset=earliest in the
Kafka connection properties when creating the KafkaSource (or alternatively
org.apache.crunch.kafka.connection.properties.auto.offset.reset=earliest in the
Pipeline's Configuration).
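
For illustration, here is a minimal sketch of the two forms of the workaround,
using only java.util.Properties so it runs on its own. The class and method
names and the broker address are hypothetical; only the auto.offset.reset
setting and the org.apache.crunch.kafka.connection.properties. prefix come from
the comment above. Passing the resulting Properties to KafkaSource, or the
prefixed key to the Pipeline's Configuration, is assumed and not shown.

```java
import java.util.Properties;

// Hypothetical helper, not part of Crunch: builds the settings described above.
final class OffsetResetWorkaround {
    // Prefix for Kafka connection properties set through the Pipeline's Configuration.
    static final String CRUNCH_KAFKA_PREFIX =
        "org.apache.crunch.kafka.connection.properties.";

    // Form 1: Kafka connection properties to hand to the KafkaSource.
    static Properties kafkaConnectionProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Fall back to the earliest retained offset when the requested one is gone.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    // Form 2: the key to set in the Pipeline's Configuration instead.
    static String configurationKey(String kafkaProperty) {
        return CRUNCH_KAFKA_PREFIX + kafkaProperty;
    }

    public static void main(String[] args) {
        // "broker1:9092" is a placeholder address.
        Properties props = kafkaConnectionProperties("broker1:9092");
        System.out.println("auto.offset.reset=" + props.getProperty("auto.offset.reset"));
        System.out.println(configurationKey("auto.offset.reset") + "=earliest");
    }
}
```

Either form should be enough on its own; the second is convenient when the
source is constructed by code you do not control.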
We might consider making that a config override like the serializers [1], or at
least flipping the default from latest to earliest if it's not specified.
[1]
https://github.com/apache/crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/KafkaSource.java#L156-L165
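
As a rough sketch of the second suggestion (defaulting to earliest only when
the user has not specified a value), something along these lines could work.
The class and method names are hypothetical and this is not the actual
KafkaSource code; it only shows the "set if absent" idea on a plain
java.util.Properties.

```java
import java.util.Properties;

// Hypothetical helper, not part of Crunch.
final class OffsetResetDefault {
    // Returns a copy of the user's properties with auto.offset.reset
    // defaulted to "earliest" when the user did not set it explicitly.
    static Properties withEarliestDefault(Properties userProps) {
        Properties copy = new Properties();
        copy.putAll(userProps); // leave the caller's object untouched
        if (copy.getProperty("auto.offset.reset") == null) {
            copy.setProperty("auto.offset.reset", "earliest");
        }
        return copy;
    }
}
```

An explicit user setting (for example latest) is preserved, which is how this
differs from a forced override like the one applied to the serializers.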
> KafkaRecordReader keeps retrying to poll data when the offset is reset to
> latest offset
> ---------------------------------------------------------------------------------------
>
> Key: CRUNCH-630
> URL: https://issues.apache.org/jira/browse/CRUNCH-630
> Project: Crunch
> Issue Type: Bug
> Reporter: Pooja Dhondge
>
> We recently saw this behavior: if the offset the reader is trying to read from
> no longer exists on Kafka due to the retention policy, the offset gets reset to
> latest (the default) and the KafkaRecordReader keeps retrying beyond the limit
> set by KAFKA_EMPTY_RETRY_ATTEMPTS_KEY:
> {noformat}
> ...crunch.kafka.inputformat.KafkaRecordReader: No records retrieved but
> pending offsets to consume therefore polling again. Attempt 17/10
> {noformat}