[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-09 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552107#comment-17552107
 ] 

Luke Chen commented on KAFKA-13959:
---

So I think the proposed solution should fix the issue.

One solution to this problem is to require the broker to only catch up to the 
last committed offset when they last sent the heartbeat. For example:
 # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
offset is {{{}X{}}}. The controller remember this last commit offset, call it 
{{X'}}
 # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence the 
broker if {{Z >= X}} or {{{}Z >= X'{}}}.

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-09 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552106#comment-17552106
 ] 

Luke Chen commented on KAFKA-13959:
---

[~dengziming] [~jagsancio] , I did some investigation today, and here's my 
finding:
 # broker heartbeat to active controller won't fetch any data or increase the 
offset. broker just sends the current offset and some broker info to the 
controller. So, even if we have small interval of heartbeat, it still won't 
help.
 # So, when will the broker offset increased? It only happened in broker 
metadataListener. the metadataListener is listening to raftClient. And 
raftClient is polling metadata from active controller.
 # About when the highwatermark will be updated in active controller: Whenever 
there's record append to the active controller log, it won't update the 
highwatermark, until there are voters fetch records from active controller and 
also update the highwatermark. ex: current active controller is in 
highwatermark 9, and a record append to active controller log to offset 10, 
it'll wait, until voters send fetch request to active controller to update 
highwatermark, and then, commit the offset 10 record, update the new 
highwatermark to 10, to make sure the record is replicated to a majority of the 
voters.
 # So, that explains what we saw in the issue:
 ## active controller send no-op message to metadata topic, active controller 
append into log, but don't update highwatermark (still 9)
 ## broker raftClient fetch records from active controller,
 ## active controller return the records to offset 9, and then update the 
highwatermark to 10
 ## broker metaListener will operate the records
 ## broker send heartbeat to active controller with offest 9
 ## since offset 9 is < active controller highwatermark 10
 ## keep trying, and in the meantime, no-op message sent again, and back to 
step 1

 

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-08 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551663#comment-17551663
 ] 

Jose Armando Garcia Sancio commented on KAFKA-13959:


[~dengziming], If you haven't, maybe looking at the KRaft side of the 
implementation may help. Specially at the LEOs reported by KRaft for both the 
controller and the broker(s), any pending FETCH request(s) and how often 
brokers sends FETCH requests.

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-08 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551534#comment-17551534
 ] 

dengziming commented on KAFKA-13959:


I haven't find the root cause, I just print the brokerOffset and 
controllerOffset when heartbeat, I find that every time the brokerOffset bump, 
the controllerOffset will also bump. 

```

time: 1654679131904 broker 0 brokerOffset:27 controllerOffset:28
time: 1654679132115 broker 0 brokerOffset:27 controllerOffset:28
time: 1654679132381 broker 0 brokerOffset:28 controllerOffset:29
time: 1654679132592 broker 0 brokerOffset:28 controllerOffset:29
time: 1654679132878 broker 0 brokerOffset:29 controllerOffset:30
time: 1654679133089 broker 0 brokerOffset:29 controllerOffset:30
time: 1654679133299 broker 0 brokerOffset:30 controllerOffset:31
time: 1654679133509 broker 0 brokerOffset:30 controllerOffset:31

```

I try to increase the interval of heartbeats but got the same result, and if I 
set numberControllerNodes to 1,  this problem disappear. I think this may be 
related to the logic of how we compute leader hw and follower hw.  

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-07 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550943#comment-17550943
 ] 

Luke Chen commented on KAFKA-13959:
---

Sorry, I didn't see your last sentence. Thanks for the investigation! Looking 
forward to knowing the root cause! :)

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-07 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550941#comment-17550941
 ] 

Luke Chen commented on KAFKA-13959:
---

[~dengziming] , if it's 10 ms heartbeat, how could it not be able to catch up 
with 500ms no-op records?

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-07 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550912#comment-17550912
 ] 

dengziming commented on KAFKA-13959:


When BrokerLifecycleManager is starting up, it will send heartbeat every 10 
milliseconds rather than 2000 milliseconds: 

`scheduleNextCommunication(NANOSECONDS.convert(10, MILLISECONDS))`

which is already smaller than 500ms, so the reason for this bug is more 
complex, I need more time to investigate.

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-05 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550173#comment-17550173
 ] 

Luke Chen commented on KAFKA-13959:
---

Thanks [~dengziming] !

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-04 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548215#comment-17548215
 ] 

dengziming commented on KAFKA-13959:


I will take a look at this if no one assigned.

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546695#comment-17546695
 ] 

Jose Armando Garcia Sancio commented on KAFKA-13959:


[~dengziming] [~showuon] Are you interested in working on this issue?

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> last committed offset when they last sent the heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{{}Y{}}}. The last commit 
> offset is {{{}X{}}}. The controller remember this last commit offset, call it 
> {{X'}}
>  # Broker sends another heartbeat with current offset of {{{}Z{}}}. Unfence 
> the broker if {{Z >= X}} or {{{}Z >= X'{}}}.
>  
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)