[ https://issues.apache.org/jira/browse/KAFKA-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152108#comment-17152108 ]

Randall Hauch edited comment on KAFKA-10218 at 7/6/20, 3:44 PM:
----------------------------------------------------------------

Thanks, [~ChrisEgerton]. You didn't really describe the impact of this bug, but
IIUC the herder still behaves correctly; it just does extra work on every
herder tick. Specifically, the config log's `producer.flush()` only has work to
do if there were recent new, removed, or changed connector configs or state
change requests for that herder. And although the config log's consumer does
have to fetch end offsets on every tick, the consumer might have to block while
consuming records only when the producer has just flushed records. (I say
"might" because it's possible, though less likely, that the consumer has
already finished consuming those records and does not have to block.)
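
The reasoning above can be made concrete with a minimal, Kafka-free sketch of
what one read-to-end of the config log costs. It is a hypothetical model, not
the real `KafkaBasedLog`: the class and its methods (`send`, `readToEnd`) are
illustrative names, the topic is an in-memory list, and the background consumer
thread is ignored. It only shows that the end-offset lookup happens on every
call, while consuming (and therefore possibly blocking) happens only when
flushed records are pending.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Kafka-free model of a config log's read-to-end step.
 * Not the real KafkaBasedLog: it only mirrors the three costs described above.
 */
public class ReadToEndModel {
    private final List<String> topic = new ArrayList<>();     // records visible to the consumer
    private final List<String> unflushed = new ArrayList<>(); // records buffered by the producer
    private int consumerPosition = 0;

    int endOffsetFetches = 0; // round trips to look up the end offset
    int recordsConsumed = 0;  // records the consumer had to read before returning

    void send(String record) {
        unflushed.add(record);
    }

    /** Returns true if the consumer had to consume (i.e. potentially block) to reach the end. */
    boolean readToEnd() {
        // 1. Flush: only does work if records were sent since the last flush.
        topic.addAll(unflushed);
        unflushed.clear();
        // 2. The end offset is fetched unconditionally, once per call.
        endOffsetFetches++;
        int endOffset = topic.size();
        // 3. Consume only if the consumer is behind the end offset.
        boolean hadToConsume = consumerPosition < endOffset;
        while (consumerPosition < endOffset) {
            consumerPosition++;
            recordsConsumed++;
        }
        return hadToConsume;
    }

    public static void main(String[] args) {
        ReadToEndModel log = new ReadToEndModel();
        log.send("connector-foo-config");
        System.out.println(log.readToEnd()); // true: a new record had to be consumed
        System.out.println(log.readToEnd()); // false: nothing new, only the end-offset fetch
        System.out.println(log.endOffsetFetches + " " + log.recordsConsumed); // 2 1
    }
}
```

Calling `readToEnd()` twice with no new configs in between costs only the extra
end-offset fetch, which is the per-tick overhead described above.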

IOW, most of the time the only impact of this bug is that the Connect worker's
herder unnecessarily fetches the config topic's end offsets on every tick.
(Normally, the herder's config log already reads to the end whenever anything
of interest happens.) But in some cases, a lossy network or a transient Kafka
cluster outage, combined with the short default timeout (3 seconds), could
cause the worker to leave the group when it otherwise would not need to.

Is my analysis correct?

(This isn't to suggest we not fix this; I'm just trying to understand the 
impact of the bug and to properly prioritize the fix.)


> DistributedHerder's canReadConfigs field is never reset to true
> ---------------------------------------------------------------
>
>                 Key: KAFKA-10218
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10218
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.0.2, 2.3.0, 2.1.2, 
> 2.2.1, 2.2.2, 2.4.0, 2.3.1, 2.2.3, 2.5.0, 2.3.2, 2.4.1, 2.6.0, 2.4.2, 2.5.1, 
> 2.7.0, 2.5.2, 2.6.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> If the {{DistributedHerder}} encounters issues reading to the end of the
> config topic, it [takes note of this
> fact|https://github.com/apache/kafka/blob/7db52a46b00eed652e791dd4eae809d590626a1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1109]
> by setting a field {{canReadConfigs}} to {{false}} and then acts accordingly
> at the [start of its tick
> loop|https://github.com/apache/kafka/blob/7db52a46b00eed652e791dd4eae809d590626a1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L319]
> by trying again to read to the end of the config topic. However, if a
> subsequent attempt to read to the end of the config topic succeeds, the
> {{canReadConfigs}} field is never set back to {{true}} again, so no matter
> what, the herder will always attempt to read to the end of the config topic
> at the beginning of each tick.
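
The flag logic described in the issue can be sketched as a small state machine.
This is a hypothetical model rather than the actual {{DistributedHerder}} code:
`onReadFailure`, `tick(boolean)`, and the `resetOnSuccess` switch are
illustrative names, and everything else the herder does per tick is omitted.
With the reset missing, every tick after a single failure reads to the end of
the config topic; with the reset, only the ticks up to the first successful
retry do.

```java
/**
 * Hypothetical model of the canReadConfigs flag described in KAFKA-10218.
 * The real DistributedHerder does much more per tick; only the flag logic is kept.
 */
public class CanReadConfigsModel {
    private final boolean resetOnSuccess; // false = behavior described in the bug, true = fix
    private boolean canReadConfigs = true;
    int readsToEnd = 0;

    CanReadConfigsModel(boolean resetOnSuccess) {
        this.resetOnSuccess = resetOnSuccess;
    }

    // Stand-in for reading to the end of the config topic; `succeeds` simulates the outcome.
    private boolean readConfigToEnd(boolean succeeds) {
        readsToEnd++;
        return succeeds;
    }

    // Hypothetical hook: a read to the end of the config topic failed.
    void onReadFailure() {
        canReadConfigs = false;
    }

    // Start of the tick loop: retry the read if the previous one failed.
    void tick(boolean readSucceeds) {
        if (!canReadConfigs) {
            if (!readConfigToEnd(readSucceeds)) {
                return; // still behind; try again on the next tick
            }
            if (resetOnSuccess) {
                canReadConfigs = true; // the missing step: clear the flag once caught up
            }
        }
        // ... rest of the tick (omitted)
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            CanReadConfigsModel herder = new CanReadConfigsModel(fixed);
            herder.onReadFailure();
            for (int i = 0; i < 10; i++) {
                herder.tick(true); // every retry succeeds
            }
            System.out.println((fixed ? "fixed" : "buggy") + ": readsToEnd=" + herder.readsToEnd);
        }
        // prints "buggy: readsToEnd=10" then "fixed: readsToEnd=1"
    }
}
```

The one-line reset after a successful retry is what bounds the extra reads to
the ticks immediately following a failure.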



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
