Calvin Liu created KAFKA-18966:
----------------------------------
Summary: Don't honor controller_num_nodes_override in combined
controller test mode
Key: KAFKA-18966
URL: https://issues.apache.org/jira/browse/KAFKA-18966
Project: Kafka
Issue Type: Bug
Reporter: Calvin Liu
Assignee: Calvin Liu
I found some flaky tests caused by the following test setup:
# Using combined controller mode, which means the broker also hosts the
controller.
# Using a single controller node. This is very common among the tests.
# Testing a hard bounce of the broker.
When the broker that hosts the controller is down, the whole controller
service is down as well. Electing a new partition leader can then take a long
time, even if the ISR has good candidates. This downtime adds unnecessary test
time (due to unavailable partitions) and forces some timeouts (such as the
transaction timeout) to be set longer.
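Why a single controller is fragile can be seen from the majority rule that a Raft-style quorum such as KRaft relies on. The following is a minimal sketch, assuming the standard majority rule (n // 2 + 1 voters); the function names are hypothetical and no Kafka API is used. With 1 controller, bouncing its broker leaves no majority, so no leader can be elected until that broker returns. With 3 controllers, the remaining 2 still form a majority.

```python
def majority(n: int) -> int:
    """Votes needed for a Raft-style majority quorum of n voters."""
    return n // 2 + 1


def survives_one_bounce(n: int) -> bool:
    """True if the remaining voters still form a majority while one is down."""
    return n - 1 >= majority(n)


if __name__ == "__main__":
    for n in (1, 3):
        # Prints: voter count, votes needed, whether one bounce is tolerated
        print(n, majority(n), survives_one_bounce(n))
```

Running it prints `1 1 False` and `3 2 True`.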
Proposal: set the number of controller nodes to at least 3 in combined
controller test mode, to:
# Avoid flakiness caused by having no valid leader during the broker restart.
# Reduce the test time.
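One possible reading of the proposal, sketched as a hypothetical helper rather than the actual test-framework change: in combined mode the `controller_num_nodes_override` value is ignored, and the controller count is 3, capped at the number of brokers (an assumption, since in combined mode each controller shares a node with a broker). The names `combined_controller_count` and `MIN_COMBINED_CONTROLLERS` are illustrative only.

```python
# Hypothetical constant taken from the proposal ("at least 3").
MIN_COMBINED_CONTROLLERS = 3


def combined_controller_count(num_brokers: int, override: int = 0) -> int:
    """Hypothetical controller count for combined controller test mode.

    `override` stands in for controller_num_nodes_override and is
    deliberately not honored here. The result is capped at num_brokers
    because, in combined mode, every controller runs on a broker node
    (an assumption for this sketch).
    """
    if num_brokers < 1:
        raise ValueError("combined mode needs at least one broker")
    return min(num_brokers, MIN_COMBINED_CONTROLLERS)
```

For example, a test with 3 brokers and `override=1` would get 3 controllers instead of 1, so a hard bounce of any single broker still leaves a controller majority.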
--
This message was sent by Atlassian Jira
(v8.20.10#820010)