[jira] [Assigned] (KAFKA-14941) Document which configuration options are applicable only to processes with broker role or controller role
[ https://issues.apache.org/jira/browse/KAFKA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14941: - Assignee: Gantigmaa Selenge > Document which configuration options are applicable only to processes with > broker role or controller role > - > > Key: KAFKA-14941 > URL: https://issues.apache.org/jira/browse/KAFKA-14941 > Project: Kafka > Issue Type: Improvement >Reporter: Jakub Scholz >Assignee: Gantigmaa Selenge >Priority: Major > > When running in KRaft mode, some of the configuration options are applicable > only to nodes with the broker process role and some are applicable only to > nodes with the controller process role. It would be great if this > information was part of the documentation (e.g. in the [Broker > Configs|https://kafka.apache.org/documentation/#brokerconfigs] table on the > website), and also part of the config classes so that it can be > used in situations where the configuration is generated dynamically, for > example to filter the options applicable to different nodes. This would allow > having configuration files with only the actually used configuration options > and, for example, help reduce unnecessary restarts when rolling out new > configurations. > For some options, this is clear and the Kafka node would refuse to start if > they are set - for example the configurations of the non-controller listeners > in controller-only nodes. For others, it is less clear (does the > {{compression.type}} option apply to controller-only nodes? What about the > configurations for the offsets topic? etc.). -- This message was sent by Atlassian Jira (v8.20.10#820010)
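The dynamic-filtering idea described in the ticket above could be sketched in plain Java. Note the role-applicability map below is purely illustrative (Kafka's config classes do not expose this metadata today, which is the point of the ticket); the RoleConfigFilter class and its entries are assumptions for demonstration only:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class RoleConfigFilter {
    public enum Role { BROKER, CONTROLLER }

    // Illustrative subset only; the real broker/controller applicability
    // would have to come from Kafka's config classes once documented.
    private static final Map<String, Set<Role>> APPLICABILITY = Map.of(
            "advertised.listeners", Set.of(Role.BROKER),
            "controller.quorum.fetch.timeout.ms", Set.of(Role.CONTROLLER),
            "compression.type", Set.of(Role.BROKER),
            "process.roles", Set.of(Role.BROKER, Role.CONTROLLER));

    /** Keeps only the entries of props that apply to the given role. */
    public static Properties filterFor(Properties props, Role role) {
        Properties out = new Properties();
        for (String name : props.stringPropertyNames()) {
            Set<Role> roles = APPLICABILITY.get(name);
            // Unknown keys are kept, so an incomplete map never drops config.
            if (roles == null || roles.contains(role)) {
                out.setProperty(name, props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("advertised.listeners", "PLAINTEXT://host:9092");
        all.setProperty("controller.quorum.fetch.timeout.ms", "2000");
        // Only the controller-applicable option survives for a controller-only node.
        System.out.println(filterFor(all, Role.CONTROLLER).stringPropertyNames());
    }
}
```

Such a filter would let tooling generate per-role property files and, as the report suggests, avoid unnecessary restarts for options a node never reads.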
[jira] [Commented] (KAFKA-16781) Expose advertised.listeners in controller node
[ https://issues.apache.org/jira/browse/KAFKA-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860554#comment-17860554 ] Gantigmaa Selenge commented on KAFKA-16781: --- [~showuon] I think we can close this issue as it's already implemented in [https://github.com/apache/kafka/pull/16235] > Expose advertised.listeners in controller node > -- > > Key: KAFKA-16781 > URL: https://issues.apache.org/jira/browse/KAFKA-16781 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Assignee: Gantigmaa Selenge >Priority: Major > Labels: need-kip, newbie, newbie++ > > After > [KIP-919|https://cwiki.apache.org/confluence/display/KAFKA/KIP-919%3A+Allow+AdminClient+to+Talk+Directly+with+the+KRaft+Controller+Quorum+and+add+Controller+Registration], > we allow clients to talk to the KRaft controller node directly. But unlike a > broker node, we don't allow users to configure advertised.listeners for clients > to connect to. Without this config, the client cannot connect to the > controller node if the controller is behind a NAT network while the > client is in an external network. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16211) Inconsistent config values in CreateTopicsResult and DescribeConfigsResult
[ https://issues.apache.org/jira/browse/KAFKA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855584#comment-17855584 ] Gantigmaa Selenge commented on KAFKA-16211: --- This issue also causes a KRaft test to fail (e.g. SaslSslAdminIntegrationTest.testCreateTopicsResponseMetadataAndConfig) if brokers and controllers are set with different configuration values, because the test compares the configurations returned from the create topic and describe topic requests. As a workaround, brokers and controllers have the same configuration [values|https://github.com/apache/kafka/pull/15175/files#diff-4ffa9190a8da4f602f2022e81c87bf041e79655a8ff2d5be673cd8238eced132R369] set. > Inconsistent config values in CreateTopicsResult and DescribeConfigsResult > -- > > Key: KAFKA-16211 > URL: https://issues.apache.org/jira/browse/KAFKA-16211 > Project: Kafka > Issue Type: Bug > Components: controller >Reporter: Gantigmaa Selenge >Assignee: Dung Ha >Priority: Minor > > When creating a topic in a KRaft cluster, a config value returned in > CreateTopicsResult is different from what you get from describing topic > configs, if the config was set in broker.properties or controller.properties > or in both but with different values. > > For example, start a broker with `segment.bytes` set to 573741824 in the > properties file and then create a topic; the CreateTopicsResult contains: > ConfigEntry(name=segment.bytes, value=1073741824, source=DEFAULT_CONFIG, > isSensitive=false, isReadOnly=false, synonyms=[], type=INT, > documentation=null) > because the controller was started without setting this config. 
> However, when you describe configurations for the same topic, the config value > set by the broker is returned: > ConfigEntry(name=segment.bytes, value=573741824, > source=STATIC_BROKER_CONFIG, isSensitive=false, isReadOnly=false, > synonyms=[], type=null, documentation=null) > > Vice versa, if the controller is started with this config set to a different > value, the create topic request returns the value set by the controller, and > when you describe the config for the same topic, you get the value set > by the broker. This makes it confusing to understand which value is being > used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
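The mechanism behind this inconsistency can be modeled in a few lines of plain Java. This is a simplified sketch, not the Admin API: the ConfigViewMismatch class, the resolvedValue helper, and the hard-coded default are illustrative assumptions. It shows why the two answers diverge: the controller resolves the topic config from its own static configuration when serving the create-topics request, while describeConfigs is answered by a broker from its static configuration:

```java
import java.util.Map;
import java.util.Optional;

public class ConfigViewMismatch {
    // Kafka's default for segment.bytes (1 GiB), as in the example above.
    static final String DEFAULT_SEGMENT_BYTES = "1073741824";

    /** Value a node reports for segment.bytes: its static override, else the default. */
    static String resolvedValue(Map<String, String> staticConfig) {
        return Optional.ofNullable(staticConfig.get("segment.bytes"))
                       .orElse(DEFAULT_SEGMENT_BYTES);
    }

    public static void main(String[] args) {
        // Broker carries the override from the report; controller was started without it.
        Map<String, String> brokerStatic = Map.of("segment.bytes", "573741824");
        Map<String, String> controllerStatic = Map.of();

        String createTopicsView = resolvedValue(controllerStatic);  // controller answers
        String describeConfigsView = resolvedValue(brokerStatic);   // broker answers
        System.out.println(createTopicsView.equals(describeConfigsView)
                ? "consistent"
                : "inconsistent: " + createTopicsView + " vs " + describeConfigsView);
        // prints: inconsistent: 1073741824 vs 573741824
    }
}
```

The same model also explains the "vice versa" case: swap which map holds the override and the two views flip.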
[jira] [Commented] (KAFKA-16781) Expose advertised.listeners in controller node
[ https://issues.apache.org/jira/browse/KAFKA-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854690#comment-17854690 ] Gantigmaa Selenge commented on KAFKA-16781: --- Thanks [~frankvicky], I will reassign it to myself :) > Expose advertised.listeners in controller node > -- > > Key: KAFKA-16781 > URL: https://issues.apache.org/jira/browse/KAFKA-16781 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Assignee: TengYao Chi >Priority: Major > Labels: need-kip, newbie, newbie++ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16781) Expose advertised.listeners in controller node
[ https://issues.apache.org/jira/browse/KAFKA-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-16781: - Assignee: Gantigmaa Selenge (was: TengYao Chi) > Expose advertised.listeners in controller node > -- > > Key: KAFKA-16781 > URL: https://issues.apache.org/jira/browse/KAFKA-16781 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Assignee: Gantigmaa Selenge >Priority: Major > Labels: need-kip, newbie, newbie++ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16781) Expose advertised.listeners in controller node
[ https://issues.apache.org/jira/browse/KAFKA-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854408#comment-17854408 ] Gantigmaa Selenge commented on KAFKA-16781: --- [~frankvicky] Hi, are you still planning to work on this? If not, I would like to work on it, creating a KIP and implementing it. This might be blocking some users from being able to talk to controllers directly via the Admin API. > Expose advertised.listeners in controller node > -- > > Key: KAFKA-16781 > URL: https://issues.apache.org/jira/browse/KAFKA-16781 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Assignee: TengYao Chi >Priority: Major > Labels: need-kip, newbie, newbie++ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16865) Admin.describeTopics behavior change after KIP-966
[ https://issues.apache.org/jira/browse/KAFKA-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-16865: - Assignee: Gantigmaa Selenge > Admin.describeTopics behavior change after KIP-966 > -- > > Key: KAFKA-16865 > URL: https://issues.apache.org/jira/browse/KAFKA-16865 > Project: Kafka > Issue Type: Task > Components: admin, clients >Affects Versions: 3.8.0 >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > Running the following code produces different behavior between ZooKeeper and > KRaft: > {code:java} > DescribeTopicsOptions options = new > DescribeTopicsOptions().includeAuthorizedOperations(false); > TopicCollection topics = > TopicCollection.ofTopicNames(Collections.singletonList(topic)); > DescribeTopicsResult describeTopicsResult = admin.describeTopics(topics, > options); > TopicDescription topicDescription = > describeTopicsResult.topicNameValues().get(topic).get(); > System.out.println(topicDescription.authorizedOperations()); > {code} > With ZooKeeper this prints null, and with KRaft it prints [ALTER, READ, > DELETE, ALTER_CONFIGS, CREATE, DESCRIBE_CONFIGS, WRITE, DESCRIBE]. > Admin.getTopicDescriptionFromDescribeTopicsResponseTopic does not take > into account the options provided to describeTopics() and always populates > the authorizedOperations field. -- This message was sent by Atlassian Jira (v8.20.10#820010)
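A minimal sketch of the fix direction, assuming the helper only needs to consult the option when building the description (the DescribeTopicsSketch class and method below are hypothetical stand-ins, not Kafka's actual internals):

```java
import java.util.Set;

public class DescribeTopicsSketch {
    /** Mirrors the ZooKeeper behavior: null unless the caller asked for the operations. */
    static Set<String> authorizedOperations(boolean includeAuthorizedOperations,
                                            Set<String> operationsFromResponse) {
        return includeAuthorizedOperations ? operationsFromResponse : null;
    }

    public static void main(String[] args) {
        // With includeAuthorizedOperations(false), the field should stay null.
        System.out.println(authorizedOperations(false, Set.of("READ", "WRITE"))); // prints: null
    }
}
```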
[jira] [Updated] (KAFKA-16620) KRaft quorum cannot be formed if all controllers are restarted at the same time
[ https://issues.apache.org/jira/browse/KAFKA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16620: -- Summary: KRaft quorum cannot be formed if all controllers are restarted at the same time (was: Kraft quorum cannot be formed if all controllers are restarted at the same time) > KRaft quorum cannot be formed if all controllers are restarted at the same > time > --- > > Key: KAFKA-16620 > URL: https://issues.apache.org/jira/browse/KAFKA-16620 > Project: Kafka > Issue Type: Bug >Reporter: Gantigmaa Selenge >Assignee: Luke Chen >Priority: Major > > Controller quorum cannot seem to form at all after accidentally restarting > all controller nodes at the same time in a test environment. This is > reproducible; it happens almost every time when restarting all controller nodes > of the cluster. > Started a cluster with 3 controller nodes and 3 broker nodes. After > restarting the controller nodes, one of them becomes the active controller > but resigns due to fetch timeout. The quorum leadership bounces like this > between the nodes indefinitely. > The controller.quorum.fetch.timeout.ms was set to the default of 2 seconds. > Logs from an active controller: > {code:java} > 2024-04-17 14:00:48,250 INFO [QuorumController id=0] Becoming the active > controller at epoch 34, next write offset 1116. > (org.apache.kafka.controller.QuorumController) > [quorum-controller-0-event-handler] > 2024-04-17 14:00:48,250 WARN [QuorumController id=0] Performing controller > activation. Loaded ZK migration state of NONE. > (org.apache.kafka.controller.QuorumController) > [quorum-controller-0-event-handler] > 2024-04-17 14:00:48,701 INFO [RaftManager id=0] Node 1 disconnected. 
> (org.apache.kafka.clients.NetworkClient) > [kafka-0-raft-outbound-request-thread] > 2024-04-17 14:00:48,701 WARN [RaftManager id=0] Connection to node 1 > (my-cluster-controller-1.my-cluster-kafka-brokers.roller.svc.cluster.local/10.244.0.68:9090) > could not be established. Node may not be available. > (org.apache.kafka.clients.NetworkClient) > [kafka-0-raft-outbound-request-thread] > 2024-04-17 14:00:48,776 DEBUG [UnifiedLog partition=__cluster_metadata-0, > dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1117 > (exclusive)with recovery point 1117, last flushed: 1713362448239, current > time: 1713362448776,unflushed: 1 (kafka.log.UnifiedLog) > [kafka-0-raft-io-thread] > 2024-04-17 14:00:49,277 DEBUG [UnifiedLog partition=__cluster_metadata-0, > dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1118 > (exclusive)with recovery point 1118, last flushed: 1713362448777, current > time: > ... > 2024-04-17 14:01:35,934 DEBUG [UnifiedLog partition=__cluster_metadata-0, > dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1200 > (exclusive)with recovery point 1200, last flushed: 1713362489371, current > time: 1713362495934,unflushed: 1 (kafka.log.UnifiedLog) > [kafka-0-raft-io-thread] > 2024-04-17 14:01:36,121 INFO [RaftManager id=0] Did not receive fetch request > from the majority of the voters within 3000ms. Current fetched voters are []. > (org.apache.kafka.raft.LeaderState) [kafka-0-raft-io-thread] > 2024-04-17 14:01:36,223 WARN [QuorumController id=0] Renouncing the > leadership due to a metadata log event. We were the leader at epoch 34, but > in the new epoch 35, the leader is (none). Reverting to last stable offset > 1198. (org.apache.kafka.controller.QuorumController) > [quorum-controller-0-event-handler] > 2024-04-17 14:01:36,223 INFO [QuorumController id=0] > failAll(NotControllerException): failing writeNoOpRecord(152156824). 
> (org.apache.kafka.deferred.DeferredEventQueue) > [quorum-controller-0-event-handler] > 2024-04-17 14:01:36,223 INFO [QuorumController id=0] writeNoOpRecord: event > failed with NotControllerException in 6291037 microseconds. > (org.apache.kafka.controller.QuorumController) > [quorum-controller-0-event-handler]{code} > Logs from the follower: > {code:java} > 024-04-17 14:00:48,242 INFO [RaftManager id=2] Completed transition to > FollowerState(fetchTimeoutMs=2000, epoch=34, leaderId=0, voters=[0, 1, 2], > highWatermark=Optional[LogOffsetMetadata(offset=1113, > metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from > Voted(epoch=34, votedId=0, voters=[0, 1, 2], electionTimeoutMs=1794) > (org.apache.kafka.raft.QuorumState) [kafka-2-raft-io-thread] > 2024-04-17 14:00:48,242 INFO [QuorumController id=2] In the new epoch 34, the > leader is 0. (org.apache.kafka.controller.QuorumController) > [quorum-controller-2-event-handler] > 2024-04-17 14:00:48,247 DEBUG [UnifiedLog partition=__cluster_metadata-0, > dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset
[jira] [Updated] (KAFKA-16620) Kraft quorum cannot be formed if all controllers are restarted at the same time
[ https://issues.apache.org/jira/browse/KAFKA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16620: -- Description: Controller quorum cannot seem to form at all after accidentally restarting all controller nodes at the same time in a test environment. This is reproducible; it happens almost every time when restarting all controller nodes of the cluster. Started a cluster with 3 controller nodes and 3 broker nodes. After restarting the controller nodes, one of them becomes the active controller but resigns due to fetch timeout. The quorum leadership bounces like this between the nodes indefinitely. The controller.quorum.fetch.timeout.ms was set to the default of 2 seconds.
[jira] [Assigned] (KAFKA-16620) Kraft quorum cannot be formed if all controllers are restarted at the same time
[ https://issues.apache.org/jira/browse/KAFKA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-16620: - Assignee: Luke Chen (was: Gantigmaa Selenge) > Kraft quorum cannot be formed if all controllers are restarted at the same > time > --- > > Key: KAFKA-16620 > URL: https://issues.apache.org/jira/browse/KAFKA-16620 > Project: Kafka > Issue Type: Bug >Reporter: Gantigmaa Selenge >Assignee: Luke Chen >Priority: Major
[jira] [Assigned] (KAFKA-16620) Kraft quorum cannot be formed if all controllers are restarted at the same time
[ https://issues.apache.org/jira/browse/KAFKA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-16620: - Assignee: Gantigmaa Selenge > Kraft quorum cannot be formed if all controllers are restarted at the same > time > --- > > Key: KAFKA-16620 > URL: https://issues.apache.org/jira/browse/KAFKA-16620 > Project: Kafka > Issue Type: Bug >Reporter: Gantigmaa Selenge >Assignee: Gantigmaa Selenge >Priority: Major
[jira] [Updated] (KAFKA-16620) Kraft quorum cannot be formed if all controllers are restarted at the same time
[ https://issues.apache.org/jira/browse/KAFKA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16620: -- Description: Controller quorum cannot seem to form at all after accidentally restarting all controller nodes at the same time in a test environment. This is reproducible, happens almost everytime when restarting all controller nodes of the cluster. Started a cluster with 3 controller nodes and 3 broker nodes. After restarting the controller nodes, one of them becomes the active controller but resigns due to fetch timeout. The quorum leadership bounces off like this between the nodes indefinitely. The controller.quorum.fetch.timeout.ms was set to the default of 2 seconds. Logs from an active controller: ``` 2024-04-17 14:00:48,250 INFO [QuorumController id=0] Becoming the active controller at epoch 34, next write offset 1116. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,250 WARN [QuorumController id=0] Performing controller activation. Loaded ZK migration state of NONE. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,701 INFO [RaftManager id=0] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,701 WARN [RaftManager id=0] Connection to node 1 (my-cluster-controller-1.my-cluster-kafka-brokers.roller.svc.cluster.local/10.244.0.68:9090) could not be established. Node may not be available. 
(org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,776 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448239, current time: 1713362448776,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:00:49,277 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1118 (exclusive)with recovery point 1118, last flushed: 1713362448777, current time: ... 2024-04-17 14:01:35,934 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1200 (exclusive)with recovery point 1200, last flushed: 1713362489371, current time: 1713362495934,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,121 INFO [RaftManager id=0] Did not receive fetch request from the majority of the voters within 3000ms. Current fetched voters are []. (org.apache.kafka.raft.LeaderState) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,223 WARN [QuorumController id=0] Renouncing the leadership due to a metadata log event. We were the leader at epoch 34, but in the new epoch 35, the leader is (none). Reverting to last stable offset 1198. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] failAll(NotControllerException): failing writeNoOpRecord(152156824). (org.apache.kafka.deferred.DeferredEventQueue) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] writeNoOpRecord: event failed with NotControllerException in 6291037 microseconds. 
(org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] ``` Logs from the follower: ``` 2024-04-17 14:00:48,242 INFO [RaftManager id=2] Completed transition to FollowerState(fetchTimeoutMs=2000, epoch=34, leaderId=0, voters=[0, 1, 2], highWatermark=Optional[LogOffsetMetadata(offset=1113, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from Voted(epoch=34, votedId=0, voters=[0, 1, 2], electionTimeoutMs=1794) (org.apache.kafka.raft.QuorumState) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,242 INFO [QuorumController id=2] In the new epoch 34, the leader is 0. (org.apache.kafka.controller.QuorumController) [quorum-controller-2-event-handler] 2024-04-17 14:00:48,247 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1116 (exclusive)with recovery point 1116, last flushed: 1713362442238, current time: 1713362448247,unflushed: 2 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,777 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448249, current time: 1713362448777,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:49,278 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1118 (exclusive)with recovery point 1118, last flushed: 1713362448811, current time: 1713362449278,unflushed ... 2024-04-17 14:01:29,371 DEBUG [UnifiedLog
[jira] [Created] (KAFKA-16620) Kraft quorum cannot be formed if all controllers are restarted at the same time
Gantigmaa Selenge created KAFKA-16620: - Summary: Kraft quorum cannot be formed if all controllers are restarted at the same time Key: KAFKA-16620 URL: https://issues.apache.org/jira/browse/KAFKA-16620 Project: Kafka Issue Type: Bug Reporter: Gantigmaa Selenge The controller quorum cannot form at all after all controller nodes were accidentally restarted at the same time in a test environment. This is reproducible; it happens almost every time all of the cluster's controller nodes are restarted. A cluster was started with 3 controller nodes and 3 broker nodes. After the controller nodes restart, one of them becomes the active controller but then resigns due to a fetch timeout, and the quorum leadership bounces between the nodes like this indefinitely. controller.quorum.fetch.timeout.ms was set to the default of 2 seconds. Logs from an active controller: ``` 2024-04-17 14:00:48,250 INFO [QuorumController id=0] Becoming the active controller at epoch 34, next write offset 1116. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,250 WARN [QuorumController id=0] Performing controller activation. Loaded ZK migration state of NONE. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,701 INFO [RaftManager id=0] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,701 WARN [RaftManager id=0] Connection to node 1 (my-cluster-controller-1.my-cluster-kafka-brokers.roller.svc.cluster.local/10.244.0.68:9090) could not be established. Node may not be available. 
(org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,776 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448239, current time: 1713362448776,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:00:49,277 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1118 (exclusive)with recovery point 1118, last flushed: 1713362448777, current time: ... 2024-04-17 14:01:35,934 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1200 (exclusive)with recovery point 1200, last flushed: 1713362489371, current time: 1713362495934,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,121 INFO [RaftManager id=0] Did not receive fetch request from the majority of the voters within 3000ms. Current fetched voters are []. (org.apache.kafka.raft.LeaderState) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,223 WARN [QuorumController id=0] Renouncing the leadership due to a metadata log event. We were the leader at epoch 34, but in the new epoch 35, the leader is (none). Reverting to last stable offset 1198. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] failAll(NotControllerException): failing writeNoOpRecord(152156824). (org.apache.kafka.deferred.DeferredEventQueue) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] writeNoOpRecord: event failed with NotControllerException in 6291037 microseconds. 
(org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] ``` Logs from the follower: ``` 2024-04-17 14:00:48,242 INFO [RaftManager id=2] Completed transition to FollowerState(fetchTimeoutMs=2000, epoch=34, leaderId=0, voters=[0, 1, 2], highWatermark=Optional[LogOffsetMetadata(offset=1113, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from Voted(epoch=34, votedId=0, voters=[0, 1, 2], electionTimeoutMs=1794) (org.apache.kafka.raft.QuorumState) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,242 INFO [QuorumController id=2] In the new epoch 34, the leader is 0. (org.apache.kafka.controller.QuorumController) [quorum-controller-2-event-handler] 2024-04-17 14:00:48,247 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1116 (exclusive)with recovery point 1116, last flushed: 1713362442238, current time: 1713362448247,unflushed: 2 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,777 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448249, current time: 1713362448777,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:49,278 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset
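For context, a minimal controller.properties sketch of the kind of node involved above (node IDs and hostnames are placeholders inferred from the log output, not taken from the reporter's actual configuration):
```
# KRaft controller node (illustrative values only)
process.roles=controller
node.id=0
controller.quorum.voters=0@my-cluster-controller-0:9090,1@my-cluster-controller-1:9090,2@my-cluster-controller-2:9090
controller.listener.names=CONTROLLER
listeners=CONTROLLER://:9090
# Default is 2000 ms: the active controller resigns if a majority of the
# voters do not send fetch requests within this window.
controller.quorum.fetch.timeout.ms=2000
```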
[jira] [Created] (KAFKA-16612) Talking to controllers via AdminClient requires reconfiguring controller listener
Gantigmaa Selenge created KAFKA-16612: - Summary: Talking to controllers via AdminClient requires reconfiguring controller listener Key: KAFKA-16612 URL: https://issues.apache.org/jira/browse/KAFKA-16612 Project: Kafka Issue Type: Improvement Reporter: Gantigmaa Selenge After KIP-919, Kafka controllers register themselves with the active controller once they start up. This registration includes the endpoints the controller listener is configured with. These endpoints are then sent to admin clients (via DescribeClusterResponse) so that clients can send requests to the active controller. If the controller listener is configured with "CONTROLLER://0.0.0.0:9093", this results in admin client requests failing (trying to connect to localhost). This was not clearly stated in the KIP or the documentation. When clients talk to brokers, advertised.listeners is used; however, advertised.listeners is forbidden for controllers. Should we allow advertised.listeners for controllers so that the admin client can use it to talk to controllers, in the same way it uses it to talk to brokers? Or should the endpoints provided in controller.quorum.voters be returned to the admin client? If the intention is for clients to use the controller's regular "listeners" configuration, this should be clearly documented. -- This message was sent by Atlassian Jira (v8.20.10#820010)
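As a sketch of the two behaviours described above (hostnames are hypothetical):
```
# Broker: the bind address and the address advertised to clients are separate,
# so binding to 0.0.0.0 is harmless.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker-0.example.com:9092

# Controller: advertised.listeners is forbidden, so the listener endpoint
# itself is registered (per KIP-919) and returned to admin clients.
# Binding to 0.0.0.0 therefore advertises an unroutable address:
# listeners=CONTROLLER://0.0.0.0:9093   # admin client requests fail
listeners=CONTROLLER://controller-0.example.com:9093
```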
[jira] [Commented] (KAFKA-15752) KRaft support in SaslSslAdminIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829457#comment-17829457 ] Gantigmaa Selenge commented on KAFKA-15752: --- The PR for this (https://github.com/apache/kafka/pull/15175) is blocked behind https://github.com/apache/kafka/pull/15377. > KRaft support in SaslSslAdminIntegrationTest > > > Key: KAFKA-15752 > URL: https://issues.apache.org/jira/browse/KAFKA-15752 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in SaslSslAdminIntegrationTest in > core/src/test/scala/integration/kafka/api/SaslSslAdminIntegrationTest.scala > need to be updated to support KRaft > 95 : def testAclOperations(): Unit = { > 116 : def testAclOperations2(): Unit = { > 142 : def testAclDescribe(): Unit = { > 169 : def testAclDelete(): Unit = { > 219 : def testLegacyAclOpsNeverAffectOrReturnPrefixed(): Unit = { > 256 : def testAttemptToCreateInvalidAcls(): Unit = { > 351 : def testAclAuthorizationDenied(): Unit = { > 383 : def testCreateTopicsResponseMetadataAndConfig(): Unit = { > Scanned 527 lines. Found 0 KRaft tests out of 8 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16240) Flaky test PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft
[ https://issues.apache.org/jira/browse/KAFKA-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16240: -- Description: Failed run [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15300/8/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/Build___JDK_17_and_Scala_2_13___testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords_String__quorum_kraft_2/] Stack trace {code:java} org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: deleteRecords(api=DELETE_RECORDS) at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) at kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(PlaintextAdminIntegrationTest.scala:860) {code} was: Failed run [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15300/8/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/Build___JDK_17_and_Scala_2_13___testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords_String__quorum_kraft_2/] Stack trace ``` org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. 
Call: deleteRecords(api=DELETE_RECORDS) at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) at kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(PlaintextAdminIntegrationTest.scala:860) ``` > Flaky test > PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft > - > > Key: KAFKA-16240 > URL: https://issues.apache.org/jira/browse/KAFKA-16240 > Project: Kafka > Issue Type: Test >Reporter: Gantigmaa Selenge >Priority: Minor > > Failed run > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15300/8/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/Build___JDK_17_and_Scala_2_13___testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords_String__quorum_kraft_2/] > Stack trace > {code:java} > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node > assignment. Call: deleteRecords(api=DELETE_RECORDS) at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) > at > kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(PlaintextAdminIntegrationTest.scala:860) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16240) Flaky test PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft
Gantigmaa Selenge created KAFKA-16240: - Summary: Flaky test PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft Key: KAFKA-16240 URL: https://issues.apache.org/jira/browse/KAFKA-16240 Project: Kafka Issue Type: Test Reporter: Gantigmaa Selenge Failed run [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15300/8/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/Build___JDK_17_and_Scala_2_13___testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords_String__quorum_kraft_2/] Stack trace ``` org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: deleteRecords(api=DELETE_RECORDS) at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) at kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(PlaintextAdminIntegrationTest.scala:860) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16240) Flaky test PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft
[ https://issues.apache.org/jira/browse/KAFKA-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16240: -- Priority: Minor (was: Major) > Flaky test > PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(String).quorum=kraft > - > > Key: KAFKA-16240 > URL: https://issues.apache.org/jira/browse/KAFKA-16240 > Project: Kafka > Issue Type: Test >Reporter: Gantigmaa Selenge >Priority: Minor > > Failed run > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15300/8/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/Build___JDK_17_and_Scala_2_13___testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords_String__quorum_kraft_2/] > Stack trace > ``` > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a > node assignment. Call: deleteRecords(api=DELETE_RECORDS) at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) > at > kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords(PlaintextAdminIntegrationTest.scala:860) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15104) Flaky test MetadataQuorumCommandTest for method testDescribeQuorumReplicationSuccessful
[ https://issues.apache.org/jira/browse/KAFKA-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15104: - Assignee: Gantigmaa Selenge > Flaky test MetadataQuorumCommandTest for method > testDescribeQuorumReplicationSuccessful > --- > > Key: KAFKA-15104 > URL: https://issues.apache.org/jira/browse/KAFKA-15104 > Project: Kafka > Issue Type: Bug > Components: tools >Affects Versions: 3.5.0 >Reporter: Josep Prat >Assignee: Gantigmaa Selenge >Priority: Major > Labels: flaky-test > > The MetadataQuorumCommandTest has become flaky on CI, I saw this failing: > org.apache.kafka.tools.MetadataQuorumCommandTest.[1] Type=Raft-Combined, > Name=testDescribeQuorumReplicationSuccessful, MetadataVersion=3.6-IV0, > Security=PLAINTEXT > Link to the CI: > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13865/2/testReport/junit/org.apache.kafka.tools/MetadataQuorumCommandTest/Build___JDK_8_and_Scala_2_121__Type_Raft_Combined__Name_testDescribeQuorumReplicationSuccessful__MetadataVersion_3_6_IV0__Security_PLAINTEXT/ > > h3. Error Message > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Received > a fatal error while waiting for the controller to acknowledge that we are > caught up{code} > h3. 
Stacktrace > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Received > a fatal error while waiting for the controller to acknowledge that we are > caught up at java.util.concurrent.FutureTask.report(FutureTask.java:122) at > java.util.concurrent.FutureTask.get(FutureTask.java:192) at > kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:419) at > kafka.test.junit.RaftClusterInvocationContext.lambda$getAdditionalExtensions$5(RaftClusterInvocationContext.java:115) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeTestExecutionCallbacks$5(TestMethodTestDescriptor.java:191) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$6(TestMethodTestDescriptor.java:202) > at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:202) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeTestExecutionCallbacks(TestMethodTestDescriptor.java:190) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:136){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16211) Inconsistent static config values in CreateTopicsResult and DescribeConfigsResult
Gantigmaa Selenge created KAFKA-16211: - Summary: Inconsistent static config values in CreateTopicsResult and DescribeConfigsResult Key: KAFKA-16211 URL: https://issues.apache.org/jira/browse/KAFKA-16211 Project: Kafka Issue Type: Bug Components: controller Reporter: Gantigmaa Selenge When creating a topic in a KRaft cluster, the config value returned in CreateTopicsResult differs from what you get from describing the topic configs, if the config was set in broker.properties, in controller.properties, or in both but with different values. For example, start a broker with `segment.bytes` set to 573741824 in the properties file and then create a topic; the CreateTopicsResult contains: ConfigEntry(name=segment.bytes, value=1073741824, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[], type=INT, documentation=null) because the controller was started without this config set. However, when you describe the configurations for the same topic, the config value set by the broker is returned: ConfigEntry(name=segment.bytes, value=573741824, source=STATIC_BROKER_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[], type=null, documentation=null) Vice versa, if the controller is started with this config set to a different value, the create topic request returns the value set by the controller, and when you then describe the config for the same topic, you get the value set by the broker. This makes it confusing to understand which value is being used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
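The setup described can be sketched as two static config files (values taken from the report, layout illustrative):
```
# broker.properties
process.roles=broker
segment.bytes=573741824

# controller.properties -- segment.bytes is NOT set here, so per the report
# the controller answers CreateTopics with the default (1073741824), while
# describing the topic configs returns the broker's static value (573741824).
process.roles=controller
```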
[jira] [Updated] (KAFKA-16211) Inconsistent config values in CreateTopicsResult and DescribeConfigsResult
[ https://issues.apache.org/jira/browse/KAFKA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-16211: -- Summary: Inconsistent config values in CreateTopicsResult and DescribeConfigsResult (was: Inconsistent static config values in CreateTopicsResult and DescribeConfigsResult) > Inconsistent config values in CreateTopicsResult and DescribeConfigsResult > -- > > Key: KAFKA-16211 > URL: https://issues.apache.org/jira/browse/KAFKA-16211 > Project: Kafka > Issue Type: Bug > Components: controller >Reporter: Gantigmaa Selenge >Priority: Minor > > When creating a topic in KRaft cluster, a config value returned in > CreateTopicsResult is different than what you get from describe topic > configs, if the config was set in broker.properties or controller.properties > or in both but with different values. > > For example, start a broker with `segment.bytes` set to 573741824 in the > properties file and then create a topic, the CreateTopicsResult contains: > ConfigEntry(name=segment.bytes, value=1073741824, source=DEFAULT_CONFIG, > isSensitive=false, isReadOnly=false, synonyms=[], type=INT, > documentation=null) > because the controller was started without setting this config. > However when you describe configurations for the same topic, the config value > set by the broker is returned: > Create topic configsConfigEntry(name=segment.bytes, value=573741824, > source=STATIC_BROKER_CONFIG, isSensitive=false, isReadOnly=false, > synonyms=[], type=null, documentation=null) > > Vice versa, if the controller is started with this config set to a different > value, the create topic request returns the value set by the controller and > then when you describe the config for the same topic, you get the value set > by the broker. This makes it confusing to understand which value being is > used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15752) KRaft support in SaslSslAdminIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15752: - Assignee: Gantigmaa Selenge > KRaft support in SaslSslAdminIntegrationTest > > > Key: KAFKA-15752 > URL: https://issues.apache.org/jira/browse/KAFKA-15752 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in SaslSslAdminIntegrationTest in > core/src/test/scala/integration/kafka/api/SaslSslAdminIntegrationTest.scala > need to be updated to support KRaft > 95 : def testAclOperations(): Unit = { > 116 : def testAclOperations2(): Unit = { > 142 : def testAclDescribe(): Unit = { > 169 : def testAclDelete(): Unit = { > 219 : def testLegacyAclOpsNeverAffectOrReturnPrefixed(): Unit = { > 256 : def testAttemptToCreateInvalidAcls(): Unit = { > 351 : def testAclAuthorizationDenied(): Unit = { > 383 : def testCreateTopicsResponseMetadataAndConfig(): Unit = { > Scanned 527 lines. Found 0 KRaft tests out of 8 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15718) KRaft support in UncleanLeaderElectionTest
[ https://issues.apache.org/jira/browse/KAFKA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794616#comment-17794616 ] Gantigmaa Selenge commented on KAFKA-15718: --- Unclean leader election is currently not supported in KRaft, therefore we cannot add KRaft support for this test. https://issues.apache.org/jira/browse/KAFKA-12670 > KRaft support in UncleanLeaderElectionTest > -- > > Key: KAFKA-15718 > URL: https://issues.apache.org/jira/browse/KAFKA-15718 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in UncleanLeaderElectionTest in > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > need to be updated to support KRaft > 103 : def testUncleanLeaderElectionEnabled(): Unit = { > 116 : def testUncleanLeaderElectionDisabled(): Unit = { > 127 : def testUncleanLeaderElectionEnabledByTopicOverride(): Unit = { > 142 : def testUncleanLeaderElectionDisabledByTopicOverride(): Unit = { > 157 : def testUncleanLeaderElectionInvalidTopicOverride(): Unit = { > 286 : def testTopicUncleanLeaderElectionEnable(): Unit = { > Scanned 358 lines. Found 0 KRaft tests out of 6 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-15718) KRaft support in UncleanLeaderElectionTest
[ https://issues.apache.org/jira/browse/KAFKA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794616#comment-17794616 ] Gantigmaa Selenge edited comment on KAFKA-15718 at 12/8/23 9:49 AM: Unclean leader election is currently not supported in KRaft therefore we cannot add KRaft support for this test. https://issues.apache.org/jira/browse/KAFKA-12670 was (Author: JIRAUSER298404): Unclean leader election is currently not supported in KRaft therefore we can add KRaft support for this test. https://issues.apache.org/jira/browse/KAFKA-12670 > KRaft support in UncleanLeaderElectionTest > -- > > Key: KAFKA-15718 > URL: https://issues.apache.org/jira/browse/KAFKA-15718 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in UncleanLeaderElectionTest in > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > need to be updated to support KRaft > 103 : def testUncleanLeaderElectionEnabled(): Unit = { > 116 : def testUncleanLeaderElectionDisabled(): Unit = { > 127 : def testUncleanLeaderElectionEnabledByTopicOverride(): Unit = { > 142 : def testUncleanLeaderElectionDisabledByTopicOverride(): Unit = { > 157 : def testUncleanLeaderElectionInvalidTopicOverride(): Unit = { > 286 : def testTopicUncleanLeaderElectionEnable(): Unit = { > Scanned 358 lines. Found 0 KRaft tests out of 6 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15751) KRaft support in BaseAdminIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15751: - Assignee: Gantigmaa Selenge > KRaft support in BaseAdminIntegrationTest > - > > Key: KAFKA-15751 > URL: https://issues.apache.org/jira/browse/KAFKA-15751 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in BaseAdminIntegrationTest in > core/src/test/scala/integration/kafka/api/BaseAdminIntegrationTest.scala need > to be updated to support KRaft > 70 : def testCreateDeleteTopics(): Unit = { > 163 : def testAuthorizedOperations(): Unit = { > Scanned 259 lines. Found 0 KRaft tests out of 2 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15718) KRaft support in UncleanLeaderElectionTest
[ https://issues.apache.org/jira/browse/KAFKA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15718: - Assignee: Gantigmaa Selenge > KRaft support in UncleanLeaderElectionTest > -- > > Key: KAFKA-15718 > URL: https://issues.apache.org/jira/browse/KAFKA-15718 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in UncleanLeaderElectionTest in > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > need to be updated to support KRaft > 103 : def testUncleanLeaderElectionEnabled(): Unit = { > 116 : def testUncleanLeaderElectionDisabled(): Unit = { > 127 : def testUncleanLeaderElectionEnabledByTopicOverride(): Unit = { > 142 : def testUncleanLeaderElectionDisabledByTopicOverride(): Unit = { > 157 : def testUncleanLeaderElectionInvalidTopicOverride(): Unit = { > 286 : def testTopicUncleanLeaderElectionEnable(): Unit = { > Scanned 358 lines. Found 0 KRaft tests out of 6 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15720) KRaft support in DeleteTopicTest
[ https://issues.apache.org/jira/browse/KAFKA-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15720: - Assignee: Gantigmaa Selenge > KRaft support in DeleteTopicTest > > > Key: KAFKA-15720 > URL: https://issues.apache.org/jira/browse/KAFKA-15720 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in DeleteTopicTest in > core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala need to be updated > to support KRaft > 53 : def testDeleteTopicWithAllAliveReplicas(): Unit = { > 62 : def testResumeDeleteTopicWithRecoveredFollower(): Unit = { > 86 : def testResumeDeleteTopicOnControllerFailover(): Unit = { > 112 : def testPartitionReassignmentDuringDeleteTopic(): Unit = { > 191 : def testIncreasePartitionCountDuringDeleteTopic(): Unit = { > 253 : def testDeleteTopicDuringAddPartition(): Unit = { > 281 : def testAddPartitionDuringDeleteTopic(): Unit = { > 298 : def testRecreateTopicAfterDeletion(): Unit = { > 314 : def testDeleteNonExistingTopic(): Unit = { > 332 : def testDeleteTopicWithCleaner(): Unit = { > 362 : def testDeleteTopicAlreadyMarkedAsDeleted(): Unit = { > 403 : def testDisableDeleteTopic(): Unit = { > 421 : def testDeletingPartiallyDeletedTopic(): Unit = { > Scanned 451 lines. Found 0 KRaft tests out of 13 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15711) KRaft support in LogRecoveryTest
[ https://issues.apache.org/jira/browse/KAFKA-15711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15711: - Assignee: Gantigmaa Selenge > KRaft support in LogRecoveryTest > > > Key: KAFKA-15711 > URL: https://issues.apache.org/jira/browse/KAFKA-15711 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Gantigmaa Selenge >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in LogRecoveryTest in > core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala need to be > updated to support KRaft > 103 : def testHWCheckpointNoFailuresSingleLogSegment(): Unit = { > 120 : def testHWCheckpointWithFailuresSingleLogSegment(): Unit = { > 180 : def testHWCheckpointNoFailuresMultipleLogSegments(): Unit = { > 196 : def testHWCheckpointWithFailuresMultipleLogSegments(): Unit = { > Scanned 247 lines. Found 0 KRaft tests out of 4 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15667) preCheck the invalid configuration for tiered storage replication factor
[ https://issues.apache.org/jira/browse/KAFKA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15667: - Assignee: Gantigmaa Selenge > preCheck the invalid configuration for tiered storage replication factor > > > Key: KAFKA-15667 > URL: https://issues.apache.org/jira/browse/KAFKA-15667 > Project: Kafka > Issue Type: Improvement > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Luke Chen >Assignee: Gantigmaa Selenge >Priority: Major > > `remote.log.metadata.topic.replication.factor` is a config to set the > Replication factor of remote log metadata topic. For the > `min.insync.replicas`, we'll use the broker config. Today, if the > `remote.log.metadata.topic.replication.factor` < `min.insync.replicas` value, > everything still works until new remote log metadata records created. We > should be able to identify it when broker startup to notify users to fix the > invalid config. > ref: > https://kafka.apache.org/documentation/#remote_log_metadata_manager_remote.log.metadata.topic.replication.factor -- This message was sent by Atlassian Jira (v8.20.10#820010)
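For illustration, the invalid combination the issue proposes to detect at broker startup would look like this (values hypothetical):
```
# Broker-wide durability requirement, which also applies to the
# __remote_log_metadata topic:
min.insync.replicas=2
# Invalid: a replication factor below min.insync.replicas means produces to
# the metadata topic fail once new remote log metadata records are created.
remote.log.metadata.topic.replication.factor=1
```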
[jira] [Assigned] (KAFKA-15566) Flaky tests in FetchRequestTest.scala in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15566: - Assignee: Gantigmaa Selenge > Flaky tests in FetchRequestTest.scala in KRaft mode > --- > > Key: KAFKA-15566 > URL: https://issues.apache.org/jira/browse/KAFKA-15566 > Project: Kafka > Issue Type: Improvement >Reporter: Deng Ziming >Assignee: Gantigmaa Selenge >Priority: Major > > |[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/#showFailuresLink] > [Build / JDK 11 and Scala 2.13 / > kafka.server.FetchRequestTest.testLastFetchedEpochValidation(String).quorum=kraft|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/testReport/junit/kafka.server/FetchRequestTest/Build___JDK_11_and_Scala_2_13___testLastFetchedEpochValidation_String__quorum_kraft/] > [Build / JDK 11 and Scala 2.13 / > kafka.server.FetchRequestTest.testLastFetchedEpochValidationV12(String).quorum=kraft|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/testReport/junit/kafka.server/FetchRequestTest/Build___JDK_11_and_Scala_2_13___testLastFetchedEpochValidationV12_String__quorum_kraft/] > [Build / JDK 11 and Scala 2.13 / > kafka.server.FetchRequestTest.testFetchWithPartitionsWithIdError(String).quorum=kraft|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/testReport/junit/kafka.server/FetchRequestTest/Build___JDK_11_and_Scala_2_13___testFetchWithPartitionsWithIdError_String__quorum_kraft_2/] > [Build / JDK 11 and Scala 2.13 / > kafka.server.FetchRequestTest.testLastFetchedEpochValidation(String).quorum=kraft|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/testReport/junit/kafka.server/FetchRequestTest/Build___JDK_11_and_Scala_2_13___testLastFetchedEpochValidation_String__quorum_kraft_2/] > [Build / JDK 11 and Scala 2.13 / > 
kafka.server.FetchRequestTest.testLastFetchedEpochValidationV12(String).quorum=kraft|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14295/4/testReport/junit/kafka.server/FetchRequestTest/Build___JDK_11_and_Scala_2_13___testLastFetchedEpochValidationV12_String__quorum_kraft_2/]| > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15507) adminClient should not throw retriable exception when closing instance
[ https://issues.apache.org/jira/browse/KAFKA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15507: - Assignee: Gantigmaa Selenge > adminClient should not throw retriable exception when closing instance > -- > > Key: KAFKA-15507 > URL: https://issues.apache.org/jira/browse/KAFKA-15507 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 3.5.1 >Reporter: Luke Chen >Assignee: Gantigmaa Selenge >Priority: Major > > When adminClient is closing the instance, it'll first set > `hardShutdownTimeMs` to a positive timeout value, and then wait for > existing threads to complete within the timeout. However, during this > wait, when a new caller tries to invoke a new command on the adminClient, it'll > immediately get a > {code:java} > TimeoutException("The AdminClient thread is not accepting new calls.") > {code} > There are some issues with this design: > 1. Since the `TimeoutException` is a retriable exception, the caller will > enter a tight loop and keep retrying > 2. The error message is confusing. What does "the adminClient is not > accepting new calls" mean? > We should improve it by throwing a non-retriable error (ex: > IllegalStateException), and the error message should clearly state that the > adminClient is closing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
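The fix proposed in KAFKA-15507 could look something like this minimal sketch; the `closing` flag and `call()` method are illustrative stand-ins for the real AdminClient internals:

```java
// Sketch of the proposed fix: while the client is shutting down, reject new
// calls with a non-retriable IllegalStateException and a clear message,
// instead of a retriable TimeoutException that sends callers into a retry
// loop. The closing flag and call() method are illustrative stand-ins.
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosableClientSketch {
    private final AtomicBoolean closing = new AtomicBoolean(false);

    public void close() {
        closing.set(true);
        // ... real code would wait for in-flight calls within hardShutdownTimeMs ...
    }

    public void call(Runnable request) {
        if (closing.get()) {
            // Non-retriable: the caller fails immediately with an explanation.
            throw new IllegalStateException("The AdminClient is being closed; no new calls are accepted.");
        }
        request.run();
    }

    public static void main(String[] args) {
        ClosableClientSketch client = new ClosableClientSketch();
        client.close();
        try {
            client.call(() -> System.out.println("should not run"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because `IllegalStateException` is not retriable, the caller's retry machinery gives up at once instead of spinning until the shutdown timeout expires.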
[jira] [Assigned] (KAFKA-15201) When git fails, script goes into a loop
[ https://issues.apache.org/jira/browse/KAFKA-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15201: - Assignee: (was: Gantigmaa Selenge) > When git fails, script goes into a loop > --- > > Key: KAFKA-15201 > URL: https://issues.apache.org/jira/browse/KAFKA-15201 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Priority: Major > > When the git push to remote fails (let's say with an unauthenticated exception), > then the script runs into a loop. It should not retry, and should fail gracefully > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15158) Add metrics for RemoteRequestsPerSec
[ https://issues.apache.org/jira/browse/KAFKA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15158: - Assignee: Gantigmaa Selenge > Add metrics for RemoteRequestsPerSec > > > Key: KAFKA-15158 > URL: https://issues.apache.org/jira/browse/KAFKA-15158 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Gantigmaa Selenge >Priority: Major > Fix For: 3.6.0 > > > Add the following metrics for better observability into the RemoteLog related > activities inside the broker. > 1. RemoteWriteRequestsPerSec > 2. RemoteDeleteRequestsPerSec > 3. BuildRemoteLogAuxStateRequestsPerSec > > These metrics will be calculated at topic level (we can add them at > brokerTopicStats) > (struck through, already covered by KAFKA-14953: *RemoteWriteRequestsPerSec* will be marked on every call to > RemoteLogManager#copyLogSegmentsToRemote()) > > *RemoteDeleteRequestsPerSec* will be marked on every call to > RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced > in [https://github.com/apache/kafka/pull/13561] > *BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to > ReplicaFetcherTierStateMachine#buildRemoteLogAuxState() > > (Note: For all the above, add Error metrics as well such as > RemoteDeleteErrorPerSec) > (Note: This requires a change in KIP-405 and hence, must be approved by KIP > author [~satishd] ) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
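A simplified picture of the per-topic marking KAFKA-15158 asks for. Kafka's real broker metrics are Yammer meters registered on brokerTopicStats; this illustrative stand-in only counts marks per topic:

```java
// Simplified per-topic request counting for the proposed metrics. The real
// implementation would register Yammer meters on brokerTopicStats; this
// stand-in just accumulates marks so the marking points are visible.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class RemoteRequestMetrics {
    private final Map<String, LongAdder> remoteDeleteRequests = new ConcurrentHashMap<>();

    // Would be called on every RemoteLogManager#cleanupExpiredRemoteLogSegments() invocation.
    public void markRemoteDelete(String topic) {
        remoteDeleteRequests.computeIfAbsent(topic, t -> new LongAdder()).increment();
    }

    public long remoteDeleteCount(String topic) {
        LongAdder adder = remoteDeleteRequests.get(topic);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        RemoteRequestMetrics metrics = new RemoteRequestMetrics();
        metrics.markRemoteDelete("orders");
        metrics.markRemoteDelete("orders");
        System.out.println(metrics.remoteDeleteCount("orders")); // prints 2
    }
}
```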
[jira] [Assigned] (KAFKA-15294) Make remote storage related configs as public (i.e. non-internal)
[ https://issues.apache.org/jira/browse/KAFKA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15294: - Assignee: Gantigmaa Selenge > Make remote storage related configs as public (i.e. non-internal) > - > > Key: KAFKA-15294 > URL: https://issues.apache.org/jira/browse/KAFKA-15294 > Project: Kafka > Issue Type: Sub-task >Reporter: Luke Chen >Assignee: Gantigmaa Selenge >Priority: Blocker > Fix For: 3.6.0 > > > We should publish all the remote storage related configs in v3.6.0. It can be > verified by: > > {code:java} > ./gradlew releaseTarGz > # The build output is stored in > ./core/build/distributions/kafka_2.13-3.x.x-site-docs.tgz. Untar the file and > verify it{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15201) When git fails, script goes into a loop
[ https://issues.apache.org/jira/browse/KAFKA-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754643#comment-17754643 ] Gantigmaa Selenge commented on KAFKA-15201: --- Hi [~divijvaidya] I could not reproduce the behaviour described in this issue. The script failed immediately on a git error and exited for me. Can you please provide more information on this? > When git fails, script goes into a loop > --- > > Key: KAFKA-15201 > URL: https://issues.apache.org/jira/browse/KAFKA-15201 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Gantigmaa Selenge >Priority: Major > > When the git push to remote fails (let's say with an unauthenticated exception), > then the script runs into a loop. It should not retry, and should fail gracefully > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15201) When git fails, script goes into a loop
[ https://issues.apache.org/jira/browse/KAFKA-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15201: - Assignee: Gantigmaa Selenge > When git fails, script goes into a loop > --- > > Key: KAFKA-15201 > URL: https://issues.apache.org/jira/browse/KAFKA-15201 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Gantigmaa Selenge >Priority: Major > > When the git push to remote fails (let's say with an unauthenticated exception), > then the script runs into a loop. It should not retry, and should fail gracefully > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15210) Mention vote should be open for at least 72 hours
[ https://issues.apache.org/jira/browse/KAFKA-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-15210: - Assignee: Gantigmaa Selenge > Mention vote should be open for at least 72 hours > --- > > Key: KAFKA-15210 > URL: https://issues.apache.org/jira/browse/KAFKA-15210 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Gantigmaa Selenge >Priority: Minor > > The voting deadline should be at least 3 days from the time the VOTE email is > posted. Hence, the script should mention that the date should be at least 72 > hours from now. The change needs to be done at the line below: > *** Please download, test and vote by 28, 9am PT> -- This message was sent by Atlassian Jira (v8.20.10#820010)
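The 72-hour rule from KAFKA-15210 is easy to encode. The actual release script is Python; this Java sketch only illustrates the computation and check:

```java
// Encodes the rule: the suggested vote deadline must be at least 72 hours
// after the VOTE email is posted. Illustrative only; the real release
// script is Python.
import java.time.Duration;
import java.time.Instant;

public class VoteDeadline {
    // The earliest acceptable deadline is 72 hours after the VOTE email is posted.
    public static Instant earliestDeadline(Instant votePosted) {
        return votePosted.plus(Duration.ofHours(72));
    }

    public static boolean isValidDeadline(Instant votePosted, Instant proposed) {
        return !proposed.isBefore(earliestDeadline(votePosted));
    }

    public static void main(String[] args) {
        Instant posted = Instant.parse("2023-07-20T09:00:00Z");
        System.out.println(earliestDeadline(posted)); // 2023-07-23T09:00:00Z
    }
}
```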
[jira] [Assigned] (KAFKA-14823) Clean up ConfigProvider API
[ https://issues.apache.org/jira/browse/KAFKA-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14823: - Assignee: (was: Gantigmaa Selenge) > Clean up ConfigProvider API > --- > > Key: KAFKA-14823 > URL: https://issues.apache.org/jira/browse/KAFKA-14823 > Project: Kafka > Issue Type: Improvement >Reporter: Mickael Maison >Priority: Major > > The ConfigProvider interface exposes several methods that are not used: > - ConfigData get(String path) > - default void subscribe(String path, Set<String> keys, ConfigChangeCallback > callback) > - default void unsubscribe(String path, Set<String> keys, > ConfigChangeCallback callback) > - default void unsubscribeAll() > We should either build mechanisms to support them or deprecate them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14823) Clean up ConfigProvider API
[ https://issues.apache.org/jira/browse/KAFKA-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14823: - Assignee: Gantigmaa Selenge > Clean up ConfigProvider API > --- > > Key: KAFKA-14823 > URL: https://issues.apache.org/jira/browse/KAFKA-14823 > Project: Kafka > Issue Type: Improvement >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > The ConfigProvider interface exposes several methods that are not used: > - ConfigData get(String path) > - default void subscribe(String path, Set<String> keys, ConfigChangeCallback > callback) > - default void unsubscribe(String path, Set<String> keys, > ConfigChangeCallback callback) > - default void unsubscribeAll() > We should either build mechanisms to support them or deprecate them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13478) KIP-802: Validation Support for Kafka Connect SMT Options
[ https://issues.apache.org/jira/browse/KAFKA-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727860#comment-17727860 ] Gantigmaa Selenge commented on KAFKA-13478: --- Hi [~gunnar.morling], are you still working on the KIP? If not, I would be keen on working on it as it looks quite interesting. > KIP-802: Validation Support for Kafka Connect SMT Options > - > > Key: KAFKA-13478 > URL: https://issues.apache.org/jira/browse/KAFKA-13478 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Gunnar Morling >Priority: Major > > Implement > [KIP-802|https://cwiki.apache.org/confluence/display/KAFKA/KIP-802%3A+Validation+Support+for+Kafka+Connect+SMT+Options], > adding validation support for SMT options. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-8982) Admin.deleteRecords should retry when failing to fetch metadata
[ https://issues.apache.org/jira/browse/KAFKA-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-8982: Assignee: Gantigmaa Selenge > Admin.deleteRecords should retry when failing to fetch metadata > --- > > Key: KAFKA-8982 > URL: https://issues.apache.org/jira/browse/KAFKA-8982 > Project: Kafka > Issue Type: Improvement >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > Currently deleteRecords() does not retry fetching metadata and immediately > fails all futures if metadata contains any errors. It should instead attempt > to refresh metadata. > https://github.com/apache/kafka/pull/7296#discussion_r330808723 -- This message was sent by Atlassian Jira (v8.20.10#820010)
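The retry behaviour KAFKA-8982 requests, reduced to a hedged sketch; the helper below is hypothetical and stands in for logic that would live inside KafkaAdminClient's deleteRecords() handling:

```java
// Hypothetical sketch: instead of failing all futures on the first metadata
// error, retry the metadata fetch a bounded number of times and only fail
// once retries are exhausted. Not the actual KafkaAdminClient code.
import java.util.function.Supplier;

public class MetadataRetry {
    /** Runs fetch, retrying up to maxRetries additional times on failure. */
    public static <T> T fetchWithRetry(Supplier<T> fetch, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a retriable metadata error; try again
            }
        }
        throw last; // retries exhausted: only now fail the caller's futures
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Fails once, then succeeds on the retry.
        String result = fetchWithRetry(() -> {
            if (attempts[0]++ == 0) throw new RuntimeException("transient metadata error");
            return "metadata";
        }, 2);
        System.out.println(result); // prints "metadata"
    }
}
```

A real fix would also respect the client's retry backoff and overall request timeout rather than looping immediately.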
[jira] [Assigned] (KAFKA-14662) ACL listings in documentation are out of date
[ https://issues.apache.org/jira/browse/KAFKA-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14662: - Assignee: Gantigmaa Selenge > ACL listings in documentation are out of date > - > > Key: KAFKA-14662 > URL: https://issues.apache.org/jira/browse/KAFKA-14662 > Project: Kafka > Issue Type: Bug > Components: core, docs >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > ACLs listed in > https://kafka.apache.org/documentation/#operations_resources_and_protocols > are out of date. They only cover API keys up to 47 (OffsetDelete) and don't > include DescribeClientQuotas, AlterClientQuotas, > DescribeUserScramCredentials, AlterUserScramCredentials, DescribeQuorum, > AlterPartition, UpdateFeatures, DescribeCluster, DescribeProducers, > UnregisterBroker, DescribeTransactions, ListTransactions, AllocateProducerIds. > This is hard to keep up to date so we should consider whether this could be > generated automatically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
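The auto-generation idea from KAFKA-14662 could work along these lines; the enum is a tiny illustrative stand-in for Kafka's real ApiKeys/AclOperation metadata:

```java
// Miniature version of the auto-generation idea: derive the docs table from
// code so newly added API keys cannot be forgotten. The enum is an
// illustrative stand-in, not Kafka's actual ApiKeys class.
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AclDocGenerator {
    enum Api { DESCRIBE_QUORUM, ALTER_PARTITION, DESCRIBE_CLUSTER }

    // One HTML table row per API key; regenerating the docs keeps them current.
    public static String toHtmlRows() {
        return Stream.of(Api.values())
                .map(api -> "<tr><td>" + api.name() + "</td></tr>")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(toHtmlRows());
    }
}
```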
[jira] [Assigned] (KAFKA-14669) Include MirrorMaker connector configurations in docs
[ https://issues.apache.org/jira/browse/KAFKA-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14669: - Assignee: Gantigmaa Selenge > Include MirrorMaker connector configurations in docs > > > Key: KAFKA-14669 > URL: https://issues.apache.org/jira/browse/KAFKA-14669 > Project: Kafka > Issue Type: Improvement > Components: docs >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > In the https://kafka.apache.org/documentation/#georeplication-flow-configure > section we list some of the MirrorMaker connector configurations. These are > hardcoded in the docs: > https://github.com/apache/kafka/blob/trunk/docs/ops.html#L768-L788 > Instead we should use the generated docs (added as part of > https://github.com/apache/kafka/commit/40af3a74507cce9155f4fb4fca317d3c68235d78) > like we do for the file connectors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14709) Move content in connect/mirror/README.md to the docs
[ https://issues.apache.org/jira/browse/KAFKA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14709: - Assignee: Gantigmaa Selenge > Move content in connect/mirror/README.md to the docs > > > Key: KAFKA-14709 > URL: https://issues.apache.org/jira/browse/KAFKA-14709 > Project: Kafka > Issue Type: Improvement > Components: docs, mirrormaker >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > > We should move all the content in > https://github.com/apache/kafka/blob/trunk/connect/mirror/README.md to the > relevant doc sections. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14763) Add integration test for DelegationTokenCommand tool
Gantigmaa Selenge created KAFKA-14763: - Summary: Add integration test for DelegationTokenCommand tool Key: KAFKA-14763 URL: https://issues.apache.org/jira/browse/KAFKA-14763 Project: Kafka Issue Type: Task Reporter: Gantigmaa Selenge When moving DelegationTokenCommand from core to tools module in [https://github.com/apache/kafka/pull/13172], the existing integration test could not be migrated because there is no {{BaseRequestTest}} or {{SaslSetup}} to help set up integration tests in the tools module. We will need to create a similar setup in the tools module and create an integration test for the command tool. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14581) Move GetOffsetShell to tools
[ https://issues.apache.org/jira/browse/KAFKA-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14581: - Assignee: (was: Gantigmaa Selenge) > Move GetOffsetShell to tools > > > Key: KAFKA-14581 > URL: https://issues.apache.org/jira/browse/KAFKA-14581 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14590) Move DelegationTokenCommand to tools
[ https://issues.apache.org/jira/browse/KAFKA-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge updated KAFKA-14590: -- Fix Version/s: 3.5.0 Affects Version/s: (was: 3.5.0) > Move DelegationTokenCommand to tools > > > Key: KAFKA-14590 > URL: https://issues.apache.org/jira/browse/KAFKA-14590 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14592) Move FeatureCommand to tools
[ https://issues.apache.org/jira/browse/KAFKA-14592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14592: - Assignee: Gantigmaa Selenge > Move FeatureCommand to tools > > > Key: KAFKA-14592 > URL: https://issues.apache.org/jira/browse/KAFKA-14592 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14581) Move GetOffsetShell to tools
[ https://issues.apache.org/jira/browse/KAFKA-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14581: - Assignee: Gantigmaa Selenge > Move GetOffsetShell to tools > > > Key: KAFKA-14581 > URL: https://issues.apache.org/jira/browse/KAFKA-14581 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14590) Move DelegationTokenCommand to tools
[ https://issues.apache.org/jira/browse/KAFKA-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14590: - Assignee: Gantigmaa Selenge > Move DelegationTokenCommand to tools > > > Key: KAFKA-14590 > URL: https://issues.apache.org/jira/browse/KAFKA-14590 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Gantigmaa Selenge >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14371) quorum-state file contains empty/unused clusterId field
[ https://issues.apache.org/jira/browse/KAFKA-14371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gantigmaa Selenge reassigned KAFKA-14371: - Assignee: Gantigmaa Selenge > quorum-state file contains empty/unused clusterId field > --- > > Key: KAFKA-14371 > URL: https://issues.apache.org/jira/browse/KAFKA-14371 > Project: Kafka > Issue Type: Improvement >Reporter: Ron Dagostino >Assignee: Gantigmaa Selenge >Priority: Minor > > The KRaft controller's quorum-state file > `$LOG_DIR/__cluster_metadata-0/quorum-state` contains an empty clusterId > value. This value is never non-empty, and it is never used after it is > written and then subsequently read. This is a cosmetic issue; it would be > best if this value did not exist there. The cluster ID already exists in the > `$LOG_DIR/meta.properties` file. -- This message was sent by Atlassian Jira (v8.20.10#820010)