[ https://issues.apache.org/jira/browse/KAFKA-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152108#comment-17152108 ]
Randall Hauch edited comment on KAFKA-10218 at 7/6/20, 3:44 PM:
----------------------------------------------------------------

Thanks, [~ChrisEgerton]. You didn't really describe the impact of this bug, but IIUC the herder still behaves correctly despite it and simply does extra work on every herder tick.

Specifically, the config log's `producer.flush()` will have work to do only if there were recent new, removed, or changed connector configs or state change requests for that herder. And although the config log's consumer does have to fetch end offsets on every tick, the consumer might have to block while it consumes records only when the producer has flushed records. (I say "might" because it's possible, though less likely, that the consumer had just finished consuming those records and does not have to block.)

IOW, most of the time the only impact of this bug is that the Connect worker's herder unnecessarily fetches the config topic's end offsets on every tick. (Normally, the herder's config log already reads to the end any time anything of interest happens anyway.) But in some cases a lossy network or a transient Kafka cluster outage, coupled with the short default timeout (3 seconds), could cause the worker to leave the group when it otherwise would not need to.

Is my analysis correct? (This isn't to suggest we not fix this; I'm just trying to understand the impact of the bug and to properly prioritize the fix.)

> DistributedHerder's canReadConfigs field is never reset to true
> ---------------------------------------------------------------
>
>                 Key: KAFKA-10218
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10218
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.0.2, 2.3.0, 2.1.2,
> 2.2.1, 2.2.2, 2.4.0, 2.3.1, 2.2.3, 2.5.0, 2.3.2, 2.4.1, 2.6.0, 2.4.2, 2.5.1,
> 2.7.0, 2.5.2, 2.6.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> If the {{DistributedHerder}} encounters issues reading to the end of the
> config topic, it [takes note of this
> fact|https://github.com/apache/kafka/blob/7db52a46b00eed652e791dd4eae809d590626a1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1109]
> by setting a field {{canReadConfigs}} to {{false}} and then acts accordingly
> at the [start of its tick
> loop|https://github.com/apache/kafka/blob/7db52a46b00eed652e791dd4eae809d590626a1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L319]
> by trying again to read to the end of the config topic.
> However, if a subsequent attempt to read to the end of the config topic
> succeeds, the {{canReadConfigs}} field is never set back to {{true}} again,
> so no matter what, the herder will always attempt to read to the end of the
> config topic at the beginning of each tick.
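
To make the flag handling described above concrete, here is a minimal, self-contained sketch. It is not the actual {{DistributedHerder}} code: {{ConfigLogSketch}}, {{HerderTickSketch}}, {{readAttempts}}, and the simulated timeout are hypothetical names introduced only for illustration. It models just the one behavior at issue, namely that a successful read to the end of the config topic should set {{canReadConfigs}} back to {{true}} so that later ticks skip the extra read.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the herder's config log; not the real Connect API. */
interface ConfigLogSketch {
    void readToEnd(long timeoutMs) throws TimeoutException;
}

/** Illustrative model of the canReadConfigs handling only; all names are assumptions. */
class HerderTickSketch {
    private final ConfigLogSketch configLog;
    private final long timeoutMs;
    private boolean canReadConfigs = true;
    int readAttempts = 0; // counts read-to-end attempts, for the demo only

    HerderTickSketch(ConfigLogSketch configLog, long timeoutMs) {
        this.configLog = configLog;
        this.timeoutMs = timeoutMs;
    }

    boolean canReadConfigs() {
        return canReadConfigs;
    }

    /** Tries to read to the end of the config topic and records the outcome in the flag. */
    boolean readConfigToEnd() {
        readAttempts++;
        try {
            configLog.readToEnd(timeoutMs);
            canReadConfigs = true;  // the reset that the issue reports as missing
            return true;
        } catch (TimeoutException e) {
            canReadConfigs = false; // remember the failure so the next tick retries
            return false;
        }
    }

    /** One herder tick: retry the read only if the previous attempt failed. */
    void tick() {
        if (!canReadConfigs && !readConfigToEnd()) {
            return; // still unable to read; skip the rest of this tick
        }
        // ... the remainder of the tick would run here ...
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated config log: the first read times out, later reads succeed.
        ConfigLogSketch flaky = timeout -> {
            if (calls[0]++ == 0) {
                throw new TimeoutException("simulated timeout");
            }
        };
        HerderTickSketch herder = new HerderTickSketch(flaky, 3000L);
        herder.readConfigToEnd(); // fails, flag becomes false
        herder.tick();            // retries and succeeds, flag is reset to true
        herder.tick();            // no extra read
        herder.tick();            // no extra read
        System.out.println("read attempts: " + herder.readAttempts);
    }
}
```

In this sketch, deleting the {{canReadConfigs = true}} line reproduces the reported behavior: every later call to {{tick()}} performs another read to the end, even though the earlier retry succeeded.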