[ 
https://issues.apache.org/jira/browse/KAFKA-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352518#comment-17352518
 ] 

Omnia Ibrahim edited comment on KAFKA-12465 at 5/27/21, 3:40 PM:
-----------------------------------------------------------------

I have been testing KRaft with the following scenario: I set up a cluster with 
3 combined nodes (broker, controller) and 3 broker-only nodes, then at some 
later point added 2 extra nodes to the cluster with a different cluster id. In 
a real production deployment, I would expect these 2 nodes with the wrong 
cluster id to crash immediately, so we can tell that something went wrong 
during the deployment. 

The scenario I was testing is the following:
 * Set up a cluster with 3 combined raft nodes (broker, controller mode) + 3 
broker nodes with cluster id {{CLUSTER_ID_1}}; they elected {{raft-node-1}} 
as the leader.
 * Later added 2 extra nodes to the raft cluster with a different cluster id 
{{WRONG_CLUSTER_ID}}.
 * The extra nodes don't crash; they stay running and keep logging this 
error:
{code:java}
 {"level":"ERROR","message":"[RaftManager nodeId=8] Unexpected error 
INCONSISTENT_CLUSTER_ID in FETCH response: InboundResponse(correlationId=16699, 
data=FetchResponseData(throttleTimeMs=0, errorCode=104, sessionId=0, 
responses=[]), 
sourceId=2)","logger":"org.apache.kafka.raft.KafkaRaftClient"}{code}

 * {{raft-node-1}} doesn't log any errors, only warnings about connection 
issues:

{code:java}
{"level":"WARN","message":"[RaftManager nodeId=1] Error connecting to node 
raft-node-4:9093 (id: 8 rack: 
null)","logger":"org.apache.kafka.clients.NetworkClient","throwable":{"class":"java.net.UnknownHostException","msg":"raft-node-4","stack":["java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)","java.net.InetAddress.getAllByName0(InetAddress.java:1505)","java.net.InetAddress.getAllByName(InetAddress.java:1364)","java.net.InetAddress.getAllByName(InetAddress.java:1298)","org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)","org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512)","org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466)","org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172)","org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985)","org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103)","kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)","scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)","scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)","scala.collection.AbstractIterable.foreach(Iterable.scala:920)","kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)","kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)","kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94)","kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)"]}}{code}
If this is a real deployment, the error `INCONSISTENT_CLUSTER_ID` should be 
fatal at all times; otherwise, how can we tell that these nodes are failing to 
join the active raft quorum? 


> Decide whether inconsistent cluster id error are fatal
> ------------------------------------------------------
>
>                 Key: KAFKA-12465
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12465
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: dengziming
>            Priority: Major
>
> Currently, we just log an error when an inconsistent cluster id is detected. 
> We should set a window during startup in which these errors are fatal; after 
> that window, we no longer treat them as fatal. See 
> https://github.com/apache/kafka/pull/10289#discussion_r592853088
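
The two options under discussion (fatal only during a startup window, as in the 
issue description, or fatal at all times, as the comment argues) can be 
compared with a small sketch. This is only an illustration under stated 
assumptions: {{ClusterIdMismatchPolicy}}, {{Action}}, {{withStartupWindow}}, 
{{alwaysFatal}} and {{onInconsistentClusterId}} are hypothetical names that do 
not exist in Kafka, and the 30-second window is an arbitrary example value.

```java
import java.time.Duration;

// Hypothetical helper, not part of Kafka: decides how a node could react
// when a response carries INCONSISTENT_CLUSTER_ID.
final class ClusterIdMismatchPolicy {
    enum Action { FATAL, LOG_ONLY }

    private final long fatalWindowMs;

    private ClusterIdMismatchPolicy(long fatalWindowMs) {
        this.fatalWindowMs = fatalWindowMs;
    }

    // Mismatches are fatal only while the node has been up for less than `window`.
    static ClusterIdMismatchPolicy withStartupWindow(Duration window) {
        return new ClusterIdMismatchPolicy(window.toMillis());
    }

    // Mismatches are fatal no matter how long the node has been running.
    static ClusterIdMismatchPolicy alwaysFatal() {
        return new ClusterIdMismatchPolicy(Long.MAX_VALUE);
    }

    // `elapsedSinceStartMs` is the time since the node started, in milliseconds.
    Action onInconsistentClusterId(long elapsedSinceStartMs) {
        return elapsedSinceStartMs < fatalWindowMs ? Action.FATAL : Action.LOG_ONLY;
    }
}
```

With a 30-second window, a mismatch seen 10 seconds after startup would map to 
{{FATAL}}, while one seen after 60 seconds would only be logged. The 
{{alwaysFatal()}} variant returns {{FATAL}} in both cases, which is the 
behaviour the comment asks for.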



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
