To correct my original description:

We have observed that KRaft observers (process.roles=broker), which
typically send FETCH requests to the quorum leader node, can enter a state
of indefinitely **sending FETCH requests to a voter (follower) node**. We
believe this happens after a re-bootstrap triggered by some sort of request
failure or timeout.
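To illustrate the gap described in the quoted report below (top-level FETCH
error code 0, but `response.partitions[0].errorCode` 6 with the real leader
carried in `currentLeader`), here is a minimal, hypothetical sketch — not
Kafka's actual code — of the redirect decision we would have expected the
client to make:

```java
// Hypothetical sketch (NOT Kafka's actual implementation): shows why checking
// only the top-level FETCH error code can miss a partition-level
// NOT_LEADER_OR_FOLLOWER (code 6) that advertises the real leader.
public class FetchLeaderCheck {
    static final short NONE = 0;
    static final short NOT_LEADER_OR_FOLLOWER = 6;

    /**
     * Returns the node id the client should fetch from next.
     *
     * @param currentNode     the voter the FETCH was sent to
     * @param topLevelError   top-level error code from the response
     * @param partitionError  response.partitions[0].errorCode
     * @param currentLeaderId response.partitions[0].currentLeader.leaderId
     */
    static int nextFetchTarget(int currentNode, short topLevelError,
                               short partitionError, int currentLeaderId) {
        // The common-error path only sees the top-level code; 0 means "success".
        if (topLevelError != NONE) {
            return currentNode; // a real client would back off / re-bootstrap here
        }
        // Partition-level check: on NOT_LEADER_OR_FOLLOWER, redirect to the
        // leader advertised in the response instead of retrying the same voter.
        if (partitionError == NOT_LEADER_OR_FOLLOWER && currentLeaderId >= 0) {
            return currentLeaderId;
        }
        return currentNode;
    }

    public static void main(String[] args) {
        // Mirrors the log below: topLevelError=0, partitionError=6, leaderId=7002.
        System.out.println(nextFetchTarget(7001, (short) 0, (short) 6, 7002));
        // prints 7002
    }
}
```

In the behavior we observed, the client instead keeps fetching from the
same voter, as if only the top-level code were consulted.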


On Thu, May 29, 2025 at 3:16 PM Justin Chen <justin.c...@shopify.com> wrote:

> Hello,
>
> In our Kafka 4.0 cluster (dynamic quorum, 5 controller nodes), we have
> observed that KRaft observers (process.roles=broker) that typically send
> FETCH requests to the quorum leader node can enter a state of indefinitely
> re-bootstrapping to a voter (follower) node, likely after some sort of
> request failure or timeout. Subsequently, the observer node’s high water
> mark/metadata offset does not update, causing issues such as out-of-sync
> replicas during partition reassignments. We also observe a high rate of
> NOT_LEADER_OR_FOLLOWER errors
> (kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=FETCH,error=NOT_LEADER_OR_FOLLOWER)
> on the voter node that is receiving the FETCH requests.
>
> The observer only recovers after we restart the voter it (re-)bootstrapped
> to, which triggers another re-bootstrap to a random voter node. If by
> chance the observer connects to the correct leader node, metadata
> replication recovers and the errors stop.
>
> With DEBUG logs enabled on the KRaft controllers, we repeatedly see the
> following log on the voter node that is incorrectly receiving the FETCH
> requests:
>
> ```
> Completed
> request:{"isForwarded":false,"requestHeader":{"requestApiKey":1,"requestApiVersion":17,"correlationId":11131463,"clientId":"raft-client-81","requestApiKeyName":"FETCH"},"request":{"clusterId":"vD8YJMbtQMyOnzTfZ5RK4g","replicaState":{"replicaId":81,"replicaEpoch":-1},"maxWaitMs":500,"minBytes":0,"maxBytes":8388608,"isolationLevel":0,"sessionId":0,"sessionEpoch":-1,"topics":[{"topicId":"AAAAAAAAAAAAAAAAAAAAAQ","partitions":[{"partition":0,"currentLeaderEpoch":69,"fetchOffset":4793550,"lastFetchedEpoch":69,"logStartOffset":-1,"partitionMaxBytes":0,"replicaDirectoryId":"sJjyY5zzN1XLxEmDoWwVig"}]}],"forgottenTopicsData":[],"rackId":""},"response":{"throttleTimeMs":0,"errorCode":0,"sessionId":0,"responses":[{"topicId":"AAAAAAAAAAAAAAAAAAAAAQ","partitions":[{"partitionIndex":0,"errorCode":6,"highWatermark":-1,"lastStableOffset":-1,"logStartOffset":4782613,"currentLeader":{"leaderId":7002,"leaderEpoch":69},"abortedTransactions":[],"preferredReadReplica":-1,"recordsSizeInBytes":0}]}],"nodeEndpoints":[{"nodeId":7002,"host":"kraftcontroller-7002.kafka.<redacted>.com.","port":9095,"rack":null}]},"connection":"<redacted_serverIp>:<serverPort>-<redacted_clientIp>:<clientPort>","totalTimeMs":0.458,"requestQueueTimeMs":0.113,"localTimeMs":0.086,"remoteTimeMs":0.203,"throttleTimeMs":0,"responseQueueTimeMs":0.021,"sendTimeMs":0.032,"securityProtocol":"PLAINTEXT","principal":"User:ANONYMOUS","listener":"KRAFT_CONTROLLER","clientInformation":{"softwareName":"apache-kafka-java","softwareVersion":"4.0.0"}}
> ```
>
> Note that the top-level error code is 0 (success), but
> `response.partitions[0].errorCode` is 6 (NOT_LEADER_OR_FOLLOWER). Tracing
> through the FETCH logic of the KafkaRaftClient, it seems the response is
> handled “successfully” by the `maybeHandleCommonResponse` method (
> https://github.com/apache/kafka/blob/4.0.0/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1707-L1717),
> yet the correct leader in the response (node 7002) is not used for
> subsequent requests. The Raft client continues sending to the incorrect
> voter node and never re-bootstraps or backs off (
> https://github.com/apache/kafka/blob/4.0.0/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L3224-L3225)
> until a voter node is restarted.
>
> We configure each node’s `controller.quorum.bootstrap.servers` to a single
> host that load balances to 1 of 5 KRaft controllers, but I do not believe
> that explicitly listing all 5 host:port strings would prevent this issue.
>
> I wanted to confirm whether this is indeed a bug within KRaft or a
> potential misconfiguration on our end.
>
> Thank you!
> Justin C
>
>

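For reference, explicitly listing the voters in each node's configuration
would look something like the fragment below (hostnames hypothetical; as
the quoted report notes, I do not believe this would prevent the issue):

```
# Hypothetical example: list every controller instead of a load-balanced host.
controller.quorum.bootstrap.servers=kraftcontroller-7001.example.com:9095,kraftcontroller-7002.example.com:9095,kraftcontroller-7003.example.com:9095,kraftcontroller-7004.example.com:9095,kraftcontroller-7005.example.com:9095
```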
-- 
Regards,
Justin Chen
