Justin Chen created KAFKA-19354:
-----------------------------------

             Summary: KRaft observer unable to recover after re-bootstrapping to follower
                 Key: KAFKA-19354
                 URL: https://issues.apache.org/jira/browse/KAFKA-19354
             Project: Kafka
          Issue Type: Bug
          Components: kraft
    Affects Versions: 4.0.0
            Reporter: Justin Chen
[Original dev mail thread|https://lists.apache.org/thread/ws3390khsxhdg2b8cnv2mzv8slz5xq7q]

If an observer's FETCH request to the quorum leader fails or times out, the observer re-bootstraps and may connect to a follower node, since the bootstrap target is chosen at random. From then on, the observer continually sends FETCH requests to that follower and receives responses with a "partitionError" errorCode=6 (NOT_LEADER_OR_FOLLOWER), which does not trigger a re-bootstrap. The observer is therefore stuck sending FETCH requests to the follower instead of the leader, halting metadata replication and causing it to fall out of sync (see the simplified sketch at the end of this description). To recover from this state, the affected observer (or the follower it is fetching from) has to be restarted, possibly repeatedly, until the observer re-bootstraps to the current leader.

*Steps to reproduce:*
1. Spin up a Kafka cluster with 3 or 5 controllers (ideally 5, to increase the likelihood of bootstrapping to a follower instead of the leader).
2. Enable a network delay on a particular observer broker (e.g. `tc qdisc add dev eth0 root netem delay 2500ms`). 2500ms was chosen because the default timeout for `controller.quorum.fetch.timeout.ms`/`controller.quorum.request.timeout.ms` is 2s. After a few seconds, remove the delay (e.g. `tc qdisc del dev eth0 root netem`).
3. The observer node re-bootstraps, potentially to a follower instead of the leader. If so, it continuously sends FETCH requests to the follower, receives `NOT_LEADER_OR_FOLLOWER` in response, and no longer replicates metadata.

*Debug logs demonstrating this scenario:*
- https://gist.github.com/justin-chen/1f3eee79d9a5066a467818a0b1bc006f
- kraftcontroller-3 (leader), kraftcontroller-4 (follower), kafka-0 (observer)
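To make the failure mode concrete, below is a minimal, self-contained sketch of the fetch loop described above. It is not Kafka's actual KRaft client code: every class and method name is hypothetical, the controller names are borrowed from the attached debug logs, and the quorum is reduced to two nodes. It only illustrates the reported behaviour, namely that a fetch timeout triggers a re-bootstrap to a randomly selected controller, while a NOT_LEADER_OR_FOLLOWER response does not, leaving the observer stuck on the follower.

```java
import java.util.List;
import java.util.Random;

/**
 * Minimal model of the observer fetch loop described in this issue.
 * All names are hypothetical; this is not Kafka's KRaft client code,
 * only an illustration of the reported behaviour.
 */
public class ObserverFetchSketch {

    enum FetchResult { OK, TIMEOUT, NOT_LEADER_OR_FOLLOWER }

    // Controller names taken from the attached debug logs; quorum simplified to two nodes.
    static final List<String> CONTROLLERS = List.of("kraftcontroller-3", "kraftcontroller-4");
    static final String LEADER = "kraftcontroller-3";
    static final Random RANDOM = new Random();

    // Re-bootstrapping picks a controller at random, so it may land on a follower.
    static String bootstrap() {
        return CONTROLLERS.get(RANDOM.nextInt(CONTROLLERS.size()));
    }

    // Simulated FETCH: the leader replies OK, a follower replies with
    // errorCode=6 (NOT_LEADER_OR_FOLLOWER), as seen in the debug logs.
    static FetchResult sendFetch(String target, boolean networkDelayed) {
        if (networkDelayed) return FetchResult.TIMEOUT;
        return target.equals(LEADER) ? FetchResult.OK : FetchResult.NOT_LEADER_OR_FOLLOWER;
    }

    public static void main(String[] args) {
        String target = LEADER;          // observer starts out fetching from the leader
        boolean networkDelayed = true;   // step 2 of the reproduction: injected delay

        for (int i = 0; i < 10; i++) {
            FetchResult result = sendFetch(target, networkDelayed);
            networkDelayed = false;      // the delay is removed after a few seconds
            System.out.printf("fetch -> %s : %s%n", target, result);

            switch (result) {
                case OK:
                    // Metadata replicated; keep fetching from the same node.
                    break;
                case TIMEOUT:
                    // A fetch timeout triggers re-bootstrapping, which may pick a follower.
                    target = bootstrap();
                    break;
                case NOT_LEADER_OR_FOLLOWER:
                    // Reported behaviour: this error does NOT trigger a re-bootstrap,
                    // so the observer keeps fetching from the same follower indefinitely.
                    // Re-bootstrapping here (target = bootstrap();) would allow recovery.
                    break;
            }
        }
    }
}
```

Running the sketch, if the re-bootstrap after the injected timeout happens to pick the follower, the output shows the observer repeatedly receiving NOT_LEADER_OR_FOLLOWER without ever returning to the leader, which mirrors the stuck state described above.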