[ https://issues.apache.org/jira/browse/KAFKA-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352518#comment-17352518 ]
Omnia Ibrahim edited comment on KAFKA-12465 at 5/27/21, 3:40 PM:
-----------------------------------------------------------------

I have been testing KRaft and tried a scenario in which I set up a cluster with 3 combined nodes (broker, controller) and 3 broker-only nodes, and later added 2 extra nodes to the KRaft cluster with a different cluster id. In a real production deployment, I would expect these 2 nodes with the wrong cluster id to crash immediately, so that we can tell something went wrong during the deployment.

The scenario I tested is the following:
* Set up a cluster with 3 combined raft nodes (broker, controller mode) + 3 broker nodes with cluster id {{CLUSTER_ID_1}}; they elected {{raft-node-1}} as the leader.
* Later added 2 extra nodes to the raft cluster with a different cluster id, {{WRONG_CLUSTER_ID}}.
* The extra nodes don't crash; they stay running and keep throwing this error:
{code:java}
{"level":"ERROR","message":"[RaftManager nodeId=8] Unexpected error INCONSISTENT_CLUSTER_ID in FETCH response: InboundResponse(correlationId=16699, data=FetchResponseData(throttleTimeMs=0, errorCode=104, sessionId=0, responses=[]), sourceId=2)","logger":"org.apache.kafka.raft.KafkaRaftClient"}
{code}
* {{raft-node-1}} doesn't throw errors, only warnings about connection issues:
{code:java}
{"level":"WARN","message":"[RaftManager nodeId=1] Error connecting to node raft-node-4:9093 (id: 8 rack: null)","logger":"org.apache.kafka.clients.NetworkClient","throwable":{"class":"java.net.UnknownHostException","msg":"raft-node-4","stack":["java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)","java.net.InetAddress.getAllByName0(InetAddress.java:1505)","java.net.InetAddress.getAllByName(InetAddress.java:1364)","java.net.InetAddress.getAllByName(InetAddress.java:1298)","org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)","org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466)","org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172)","org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985)","org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)","scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)","scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)","scala.collection.AbstractIterable.foreach(Iterable.scala:920)","kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)","kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)","kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94)","kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)"]}}
{code}

If this is a real deployment, the `INCONSISTENT_CLUSTER_ID` error should be fatal at all times; otherwise, how can we tell that these nodes are failing to join the active raft quorum?

> Decide whether inconsistent cluster id error are fatal
> ------------------------------------------------------
>
>                 Key: KAFKA-12465
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12465
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: dengziming
>            Priority: Major
>
> Currently, we just log an error when an inconsistent cluster id is detected. We
> should set a window during startup in which these errors are fatal; after that
> window, we no longer treat them as fatal. See
> https://github.com/apache/kafka/pull/10289#discussion_r592853088
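
The startup-window idea in the issue description could look roughly like the sketch below. This is only an illustration, not Kafka's actual implementation: the class `ClusterIdErrorPolicy`, the `Action` enum, and the method `onInconsistentClusterId` are hypothetical names, and the window length is an assumed parameter. The "always fatal" behaviour requested in the comment is the special case of a window that never expires.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of a startup-window policy for INCONSISTENT_CLUSTER_ID.
 * All names here are illustrative and do not exist in Kafka.
 */
final class ClusterIdErrorPolicy {

    /** What the caller should do when the error is observed. */
    enum Action { SHUTDOWN, LOG_ONLY }

    private final long startupTimeMs;
    private final long fatalWindowMs;

    /**
     * @param startupTimeMs wall-clock time (ms) at which the node started
     * @param fatalWindow   assumed length of the window in which the error is fatal
     */
    ClusterIdErrorPolicy(long startupTimeMs, Duration fatalWindow) {
        this.startupTimeMs = startupTimeMs;
        this.fatalWindowMs = fatalWindow.toMillis();
    }

    /** Returns SHUTDOWN while still inside the startup window, LOG_ONLY afterwards. */
    Action onInconsistentClusterId(long nowMs) {
        long elapsedMs = nowMs - startupTimeMs;
        return elapsedMs < fatalWindowMs ? Action.SHUTDOWN : Action.LOG_ONLY;
    }
}
```

With a 30-second window, an error seen 500 ms after startup would map to `SHUTDOWN`, while the same error seen 30 seconds or more after startup would only be logged. Passing `Duration.ofMillis(Long.MAX_VALUE)` approximates the always-fatal policy.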