[https://issues.apache.org/jira/browse/KAFKA-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803036#comment-16803036]
ASF GitHub Bot commented on KAFKA-6399:
---------------------------------------
vvcephei commented on pull request #6509: KAFKA-6399: Reduce Streams max.poll.interval
URL: https://github.com/apache/kafka/pull/6509
Since we now call `poll` during restore, we can decrease the
`max.poll.interval.ms` timeout to a reasonable value, which should help
Streams make progress if threads get stuck.
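For illustration only (this is not code from the PR): a Streams user who wants a bounded timeout can also set it explicitly. The minimal sketch below builds the `Properties` with plain string keys so it compiles without the Kafka jars. The `consumer.` prefix is what `StreamsConfig.consumerPrefix(...)` prepends to a consumer setting. The helper name `streamsProps`, the application id, and the broker address are hypothetical.

```java
import java.util.Properties;

// Sketch: Kafka Streams properties with a bounded max.poll.interval.ms
// instead of Integer.MAX_VALUE. Keys are literal strings (assumption: no Kafka jars).
class StreamsPollIntervalConfig {

    // In real code these correspond to StreamsConfig.APPLICATION_ID_CONFIG,
    // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG and
    // StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG).
    static final String APPLICATION_ID = "application.id";
    static final String BOOTSTRAP_SERVERS = "bootstrap.servers";
    static final String CONSUMER_MAX_POLL_INTERVAL_MS = "consumer.max.poll.interval.ms";

    // Hypothetical helper: returns Streams properties with the given poll timeout.
    static Properties streamsProps(String appId, String bootstrap, int maxPollIntervalMs) {
        if (maxPollIntervalMs <= 0) {
            throw new IllegalArgumentException("max.poll.interval.ms must be positive");
        }
        Properties props = new Properties();
        props.put(APPLICATION_ID, appId);
        props.put(BOOTSTRAP_SERVERS, bootstrap);
        // Bounded timeout: a consumer that does not call poll() within this
        // interval leaves the group, triggering a rebalance of its partitions.
        props.put(CONSUMER_MAX_POLL_INTERVAL_MS, Integer.toString(maxPollIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical application id and broker address.
        Properties props = streamsProps("my-streams-app", "localhost:9092", 300_000);
        System.out.println(props.getProperty(CONSUMER_MAX_POLL_INTERVAL_MS)); // prints 300000
    }
}
```

Passing 300000 here matches the plain consumer default for `max.poll.interval.ms` (5 minutes); any positive value works the same way.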
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Consider reducing "max.poll.interval.ms" default for Kafka Streams
> ------------------------------------------------------------------
>
> Key: KAFKA-6399
> URL: https://issues.apache.org/jira/browse/KAFKA-6399
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 1.0.0
> Reporter: Matthias J. Sax
> Assignee: John Roesler
> Priority: Minor
>
> In Kafka {{0.10.2.1}} we changed the default value of
> {{max.poll.interval.ms}} for Kafka Streams to {{Integer.MAX_VALUE}}. The
> reason was that long state restore phases during a rebalance could cause
> "rebalance storms": consumers dropped out of the consumer group even though
> they were healthy, because they did not call {{poll()}} during the state
> restore phase.
>
> In versions {{0.11}} and {{1.0}} the state restore logic was improved
> substantially, and Kafka Streams now calls {{poll()}} even during the
> restore phase. Therefore, we could consider setting a smaller value for
> {{max.poll.interval.ms}} to detect misbehaving Kafka Streams applications
> (i.e., problems in user code) that no longer make progress during regular
> operation.
>
> The open question is what a good default would be. Maybe the regular
> consumer default of 300 seconds (5 minutes) would be sufficient. During one
> {{poll()}} round trip, we would call {{restoreConsumer.poll()}} only once and
> restore a single batch of records, which should take far less time than
> that.
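To make the bound described above concrete, here is a hypothetical simulation (not Kafka Streams code) of the longest gap between two main-consumer poll() calls under two simplified restore strategies: restoring every batch before polling again, versus restoring one batch per poll round trip. The batch count and per-batch duration are invented, and the model ignores the time spent inside poll() and in processing.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical simulation: longest gap (ms) between consecutive poll() calls.
class RestorePollGap {

    // Simplified "restore everything, then poll": the thread does not poll
    // until all batches are restored, so the gap is the sum of all batches.
    static long gapRestoreAllThenPoll(List<Long> batchMillis) {
        long total = 0;
        for (long b : batchMillis) {
            total += b;
        }
        return total;
    }

    // Simplified "one batch per poll round trip": poll() runs right after each
    // batch, so the gap is bounded by the slowest single batch.
    static long gapOneBatchPerPoll(List<Long> batchMillis) {
        long maxGap = 0;
        for (long b : batchMillis) {
            maxGap = Math.max(maxGap, b);
        }
        return maxGap;
    }

    public static void main(String[] args) {
        // Invented changelog: 2,000 restore batches of 200 ms each.
        List<Long> batches = Collections.nCopies(2_000, 200L);
        long maxPollIntervalMs = 300_000L; // plain consumer default of max.poll.interval.ms

        long oldGap = gapRestoreAllThenPoll(batches); // 400000 ms
        long newGap = gapOneBatchPerPoll(batches);    // 200 ms
        System.out.println("restore all, then poll: " + oldGap + " ms, within limit: "
                + (oldGap <= maxPollIntervalMs)); // false
        System.out.println("one batch per poll: " + newGap + " ms, within limit: "
                + (newGap <= maxPollIntervalMs)); // true
    }
}
```

Under these assumptions the first strategy exceeds a 5-minute limit purely because the changelog is long, while the second stays within it regardless of how many batches there are, which is the property that makes a bounded default viable.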
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)