[GitHub] [kafka] KarboniteKream commented on pull request #13679: KAFKA-14291: KRaft controller should return right finalized features in ApiVersionResponse
KarboniteKream commented on PR #13679:
URL: https://github.com/apache/kafka/pull/13679#issuecomment-1562213887

Thank you, I'll check today!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] KarboniteKream commented on pull request #13679: KAFKA-14291: KRaft controller should return right finalized features in ApiVersionResponse
KarboniteKream commented on PR #13679:
URL: https://github.com/apache/kafka/pull/13679#issuecomment-1560737054

Correct, the issue happens after shutting down and restarting the leader. Here's the example configuration:
- [`1.properties`](https://gist.github.com/KarboniteKream/6ec0378f9b057f9e15467177e58539c6)
- [`2.properties`](https://gist.github.com/KarboniteKream/4bd198db1db77e7b43c6bb37cecd9b68)

Reproduction:
1. Initialize the storage:
   - `./bin/kafka-storage.sh format --config 1.properties --cluster-id 9N8QxiJfRoe1rPZ1vpjd2w`
   - `./bin/kafka-storage.sh format --config 2.properties --cluster-id 9N8QxiJfRoe1rPZ1vpjd2w`
2. Start both controllers:
   - `./bin/kafka-server-start.sh 1.properties`
   - `./bin/kafka-server-start.sh 2.properties`
3. Wait for quorum to be established, then stop the leader with Ctrl-C.
4. Start the stopped controller back up.

After a few moments, the restarted controller will start logging the following:
```
[2023-05-24 18:00:02,980] WARN [RaftManager id=3001] Received error UNKNOWN_SERVER_ERROR from node 3002 when making an ApiVersionsRequest with correlation id 4. Disconnecting. (org.apache.kafka.clients.NetworkClient)
[2023-05-24 18:00:03,030] INFO [MetadataLoader id=3001] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
```

On the other controller (the new leader), you'll see the following in `logs/controller.log`:
```
[2023-05-24 18:00:02,898] WARN [QuorumController id=3002] getFinalizedFeatures: failed with unknown server exception RuntimeException in 271 us. The controller is already in standby mode. (org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: No in-memory snapshot for epoch 9. Snapshot epochs are:
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.iterator(SnapshotRegistry.java:131)
    at org.apache.kafka.timeline.TimelineObject.get(TimelineObject.java:69)
    at org.apache.kafka.controller.FeatureControlManager.finalizedFeatures(FeatureControlManager.java:303)
    at org.apache.kafka.controller.QuorumController.lambda$finalizedFeatures$16(QuorumController.java:2016)
    at org.apache.kafka.controller.QuorumController$ControllerReadEvent.run(QuorumController.java:546)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:829)
```
[GitHub] [kafka] KarboniteKream commented on pull request #13679: KAFKA-14291: KRaft controller should return right finalized features in ApiVersionResponse
KarboniteKream commented on PR #13679:
URL: https://github.com/apache/kafka/pull/13679#issuecomment-1559681643

This PR seems to have introduced a regression (confirmed using bisect). In a simple setup of two controllers using `config/kraft/controller.properties`, after the leader is shut down and restarted, its `ApiVersionsRequest` fails with `UNKNOWN_SERVER_ERROR`. The other controller logs the following exception:
```
[2023-05-19 15:50:18,834] WARN [QuorumController id=0] getFinalizedFeatures: failed with unknown server exception RuntimeException in 28 us. The controller is already in standby mode. (org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: No in-memory snapshot for epoch 84310. Snapshot epochs are: 61900
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.iterator(SnapshotRegistry.java:131)
    at org.apache.kafka.timeline.TimelineObject.get(TimelineObject.java:69)
    at org.apache.kafka.controller.FeatureControlManager.finalizedFeatures(FeatureControlManager.java:303)
    at org.apache.kafka.controller.QuorumController.lambda$finalizedFeatures$16(QuorumController.java:2016)
    at org.apache.kafka.controller.QuorumController$ControllerReadEvent.run(QuorumController.java:546)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:829)
```

A similar issue was reported in [KAFKA-14996](https://issues.apache.org/jira/browse/KAFKA-14996). I'll report this in Jira once my account is approved.
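
The stack trace above shows the read failing in `SnapshotRegistry.getSnapshot` because no in-memory snapshot exists at the requested epoch. As a rough illustration of that failure shape only, here is a minimal toy sketch. `ToySnapshotRegistry`, its methods, and the string payload are hypothetical and are not Kafka's implementation; only the exception message format is taken from the logs quoted in this thread, and the separator between multiple epochs is an assumption.

```java
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy model only: a hypothetical stand-in, not Kafka's SnapshotRegistry.
final class ToySnapshotRegistry {
    // Epoch -> snapshot payload. A plain string stands in for real state.
    private final TreeMap<Long, String> snapshots = new TreeMap<>();

    void createSnapshot(long epoch, String state) {
        snapshots.put(epoch, state);
    }

    // Returns the snapshot taken at exactly this epoch, or throws.
    String getSnapshot(long epoch) {
        String snapshot = snapshots.get(epoch);
        if (snapshot == null) {
            // Message format copied from the quoted logs; the ", " separator
            // between several epochs is an assumption.
            String known = snapshots.keySet().stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
            throw new RuntimeException("No in-memory snapshot for epoch " + epoch
                + ". Snapshot epochs are: " + known);
        }
        return snapshot;
    }

    // Demo helper: returns the snapshot, or the failure message if the lookup throws.
    static String describeLookup(ToySnapshotRegistry registry, long epoch) {
        try {
            return registry.getSnapshot(epoch);
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToySnapshotRegistry registry = new ToySnapshotRegistry();
        registry.createSnapshot(61900L, "features@61900");
        // A read at a known epoch succeeds.
        System.out.println(describeLookup(registry, 61900L));
        // A read at an epoch with no snapshot fails like the logged exception.
        System.out.println(describeLookup(registry, 84310L));
    }
}
```

In this toy, a lookup against a registry holding no snapshots at all yields an empty epoch list, which matches the shape of the `epoch 9` message in the later report. It says nothing about why the real controller asks for an epoch it does not hold.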