[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
[ https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282158#comment-17282158 ]

Randall Hauch commented on KAFKA-10021:
---------------------------------------

Including in 2.6.2 due to reports of rebalance storms because of the logs getting stuck. cc [~ableegoldman].

> When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
>
>                 Key: KAFKA-10021
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10021
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>            Reporter: Sanjana Kaundinya
>            Assignee: Randall Hauch
>            Priority: Major
>             Fix For: 2.5.2, 2.8.0, 2.7.1, 2.6.2
>
> Currently in the Connect code in DistributedHerder.java, we see the following piece of code:
>
> {{if (!canReadConfigs && !readConfigToEnd(workerSyncTimeoutMs))
>     return; // Safe to return and tick immediately because readConfigToEnd will do the backoff for us}}
>
> where the workerSyncTimeoutMs passed in is the timeout given to read to the end of the config log.
> This is a bug: we should check whether fetch.max.wait.ms is greater than worker.sync.timeout.ms and,
> if it is, use worker.sync.timeout.ms as the fetch.max.wait.ms. A better fix would be to use the
> AdminClient to read to the end of the log, but at a minimum we should check the configs.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
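The minimal check proposed in the issue description (cap fetch.max.wait.ms at worker.sync.timeout.ms) could look roughly like the sketch below. This is an illustration only and not the patch that was merged: the class `FetchWaitClamp` and its `clamp` method are hypothetical names, and the only facts taken from Kafka are the config key and its documented default of 500 ms.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Kafka Connect. */
final class FetchWaitClamp {
    static final String FETCH_MAX_WAIT_MS = "fetch.max.wait.ms";
    // Kafka's documented default for fetch.max.wait.ms.
    static final int DEFAULT_FETCH_MAX_WAIT_MS = 500;

    /**
     * Returns a copy of the consumer properties in which fetch.max.wait.ms
     * is lowered to workerSyncTimeoutMs if it was configured higher.
     * The input map is left unchanged.
     */
    static Map<String, Object> clamp(Map<String, Object> consumerProps,
                                     long workerSyncTimeoutMs) {
        Map<String, Object> result = new HashMap<>(consumerProps);
        Object raw = result.get(FETCH_MAX_WAIT_MS);
        long configured = (raw == null)
                ? DEFAULT_FETCH_MAX_WAIT_MS
                : Long.parseLong(raw.toString().trim());
        if (configured > workerSyncTimeoutMs) {
            // A fetch may otherwise block on the broker longer than the
            // worker is willing to wait when reading to the end of the log.
            result.put(FETCH_MAX_WAIT_MS, Math.toIntExact(workerSyncTimeoutMs));
        }
        return result;
    }
}
```

With a configured value of 10000 and a sync timeout of 3000, the returned map carries 3000; a map without the key is returned as is, because the default is already below the timeout.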
[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
[ https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111584#comment-17111584 ]

Arjun Satish commented on KAFKA-10021:
--------------------------------------

Thanks, [~hachikuji]!
[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
[ https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111581#comment-17111581 ]

Jason Gustafson commented on KAFKA-10021:
-----------------------------------------

[~wicknicks] Each consumer instance has a separate connection to the brokers. This issue wouldn't be a problem if we used a separate AdminClient to query the end offsets.
[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
[ https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111493#comment-17111493 ]

Arjun Satish commented on KAFKA-10021:
--------------------------------------

[~hachikuji] [~skaundinya] Do consumers in a JVM share a connection to a broker, or does each one create its own connections? If they share a connection, then this problem can occur when a connector/task has consumer overrides that set a high `fetch.max.wait.ms`. In that case, the worker should not allow a connector to override this value to more than what workerSyncTimeoutMs allows.
[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
[ https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111338#comment-17111338 ]

Jason Gustafson commented on KAFKA-10021:
-----------------------------------------

For a little more detail, the problem today with the `readToLogEnd` function in `KafkaBasedLog` is that it can get blocked by `fetch.max.wait.ms`. This is because the connection that is used for finding the end offset is also shared by the consumer fetching from the log. If the topic has low volume, then it is in fact likely that the ListOffset request gets stuck behind a Fetch which is blocking on the broker. This can cause a timeout when syncing configs or even just slowness when reading offsets using `OffsetStorageReader`.

The simplest fix would be to use a shared `AdminClient` to fetch the end offset instead of the consumer.
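To make the blocking concrete, here is a toy model of a connection that serves requests strictly in order. It is not Kafka code: `ConnectionModel`, `send`, and `readToEndSucceeds` are hypothetical names, the times are made-up inputs, and the in-order queue is a simplifying assumption standing in for the shared consumer connection described in this comment.

```java
/** Toy model for illustration; not Kafka code. */
final class ConnectionModel {
    // Time (ms) at which the connection has finished everything queued so far.
    private long busyUntilMs = 0;

    /**
     * Queues a request at nowMs that occupies the connection for serviceMs
     * once it reaches the head of the queue. Returns its completion time.
     */
    long send(long nowMs, long serviceMs) {
        long start = Math.max(nowMs, busyUntilMs);
        busyUntilMs = start + serviceMs;
        return busyUntilMs;
    }

    /**
     * Models the end-offset lookup done while reading to the end of the log:
     * true if the lookup completes within syncTimeoutMs of being sent.
     */
    static boolean readToEndSucceeds(ConnectionModel offsetsConn, long nowMs,
                                     long listOffsetsMs, long syncTimeoutMs) {
        long done = offsetsConn.send(nowMs, listOffsetsMs);
        return done - nowMs <= syncTimeoutMs;
    }
}
```

In this model, suppose a Fetch on an idle topic holds the shared connection for a fetch.max.wait.ms of 10000. An end-offset lookup sent at t=100 then completes at t=10005 and misses a 3000 ms sync timeout. The same lookup on a dedicated connection, which is the role the `AdminClient` would play, completes at t=105.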