[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2021-02-09 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282158#comment-17282158
 ] 

Randall Hauch commented on KAFKA-10021:
---

Including in 2.6.2 due to reports of rebalance storms because of the logs 
getting stuck. cc [~ableegoldman].

> When reading to the end of the config log, check if fetch.max.wait.ms is 
> greater than worker.sync.timeout.ms
>
>              Key: KAFKA-10021
>              URL: https://issues.apache.org/jira/browse/KAFKA-10021
>          Project: Kafka
>       Issue Type: Bug
>       Components: KafkaConnect
> Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>         Reporter: Sanjana Kaundinya
>         Assignee: Randall Hauch
>         Priority: Major
>          Fix For: 2.5.2, 2.8.0, 2.7.1, 2.6.2
>
> Currently, the Connect code in DistributedHerder.java contains the following 
> piece of code:
>  
> {{if (!canReadConfigs && !readConfigToEnd(workerSyncTimeoutMs))
>     return; // Safe to return and tick immediately because readConfigToEnd will do the backoff for us}}
>
> where the workerSyncTimeoutMs passed in is the timeout given to read to the 
> end of the config log. This is a bug: we should check whether 
> fetch.max.wait.ms is greater than worker.sync.timeout.ms and, if it is, use 
> worker.sync.timeout.ms as the fetch.max.wait.ms. A better fix would be to use 
> the AdminClient to read to the end of the log, but at a minimum we should 
> check the configs.
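
As a rough illustration of the minimal check described in the issue, here is a sketch that caps fetch.max.wait.ms at worker.sync.timeout.ms before the consumer for the config log is built. The class and method names (FetchWaitCap, capFetchMaxWait) are hypothetical and not part of Kafka Connect; the only assumption taken from the consumer itself is its documented default of 500 ms for fetch.max.wait.ms. This is not presented as the change that was actually committed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Kafka Connect. */
final class FetchWaitCap {
    static final String FETCH_MAX_WAIT_MS = "fetch.max.wait.ms";
    // Documented consumer default, used when the property is not set.
    static final long DEFAULT_FETCH_MAX_WAIT_MS = 500L;

    /**
     * Returns a copy of consumerProps in which fetch.max.wait.ms is lowered to
     * workerSyncTimeoutMs whenever the effective value exceeds it.
     * The input map is never modified.
     */
    static Map<String, Object> capFetchMaxWait(Map<String, Object> consumerProps,
                                               long workerSyncTimeoutMs) {
        Map<String, Object> result = new HashMap<>(consumerProps);
        Object raw = result.get(FETCH_MAX_WAIT_MS);
        long effective = raw == null
                ? DEFAULT_FETCH_MAX_WAIT_MS
                : Long.parseLong(raw.toString().trim());
        if (effective > workerSyncTimeoutMs) {
            // Stored as a string; numeric consumer configs accept string values.
            result.put(FETCH_MAX_WAIT_MS, Long.toString(workerSyncTimeoutMs));
        }
        return result;
    }
}
```

With fetch.max.wait.ms set to 10000 and a worker.sync.timeout.ms of 3000, the returned map carries 3000; a value that is already below the timeout, or an unset value whose default is below it, is left untouched.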



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2020-05-19 Thread Arjun Satish (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111584#comment-17111584
 ] 

Arjun Satish commented on KAFKA-10021:
--

Thanks, [~hachikuji]!



[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2020-05-19 Thread Jason Gustafson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111581#comment-17111581
 ] 

Jason Gustafson commented on KAFKA-10021:
-

[~wicknicks] Each consumer instance has a separate connection to the brokers. 
This issue wouldn't be a problem if we used a separate AdminClient to query the 
end offsets.



[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2020-05-19 Thread Arjun Satish (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111493#comment-17111493
 ] 

Arjun Satish commented on KAFKA-10021:
--

[~hachikuji] [~skaundinya] Do consumers in a JVM share a connection to a broker, 
or does each create its own connections? If they share a connection, then 
this problem can occur when a connector/task has consumer overrides that set 
a high `fetch.max.wait.ms`. In that case, the worker should not allow a 
connector to override this value to more than workerSyncTimeoutMs allows.



[jira] [Commented] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2020-05-19 Thread Jason Gustafson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111338#comment-17111338
 ] 

Jason Gustafson commented on KAFKA-10021:
-

For a little more detail, the problem today with the `readToLogEnd` function in 
`KafkaBasedLog` is that it can get blocked by `fetch.max.wait.ms`. This is 
because the connection that is used for finding the end offset is also shared 
by the consumer fetching from the log. If the topic has low volume, then it is 
in fact likely that the ListOffset request gets stuck behind a Fetch which is 
blocking on the broker. This can cause a timeout when syncing configs or even 
just slowness when reading offsets using `OffsetStorageReader`. The simplest 
fix would be to use a shared `AdminClient` to fetch the end offset instead of 
the consumer.
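
The blocking described above can be pictured with a toy model. In this sketch a single-threaded executor stands in for one broker connection that serves requests in order; it does not use the Kafka client at all, and the class and method names (SharedConnectionDemo, listOffsetsLatencyMs) are made up for illustration. A "Fetch" that finds no data parks for the full fetch.max.wait.ms, and the "ListOffsets" request either queues behind it on the same connection or goes out on a separate one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model only: one single-threaded executor represents one in-order connection. */
final class SharedConnectionDemo {

    /**
     * Sends a simulated Fetch that waits fetchMaxWaitMs, then a simulated
     * ListOffsets, and returns the milliseconds until the ListOffsets completes.
     */
    static long listOffsetsLatencyMs(long fetchMaxWaitMs, boolean separateConnection)
            throws Exception {
        ExecutorService fetchConn = Executors.newSingleThreadExecutor();
        ExecutorService offsetsConn =
                separateConnection ? Executors.newSingleThreadExecutor() : fetchConn;
        try {
            long start = System.nanoTime();
            // Low-volume topic: no data arrives, so the fetch waits the full time.
            fetchConn.submit(() -> {
                Thread.sleep(fetchMaxWaitMs);
                return null;
            });
            // The end-offset lookup itself is instantaneous in this model.
            Future<Long> endOffset = offsetsConn.submit(() -> 42L);
            endOffset.get(10, TimeUnit.SECONDS);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            fetchConn.shutdownNow();
            offsetsConn.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared connection:   " + listOffsetsLatencyMs(500, false) + " ms");
        System.out.println("separate connection: " + listOffsetsLatencyMs(500, true) + " ms");
    }
}
```

On the shared connection the lookup takes at least as long as the simulated fetch wait, while on a separate connection it returns almost immediately, which is the motivation for moving the end-offset query off the consumer's connection.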
