[jira] [Comment Edited] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668672#comment-16668672 ]

Jonathan Santilli edited comment on KAFKA-7165 at 10/30/18 12:59 PM:

Thanks for your reply [~junrao]. So far, that was the only time we hit the _org.apache.zookeeper.KeeperException$SessionExpiredException_. We are now on version 2.0 of Kafka and run into the *NODEEXISTS* issue from time to time. The two errors may be related, but that is difficult to confirm; hopefully, with the fix, we can get rid of the *NODEEXISTS* error.

Cheers!

was (Author: pachilo):
Thanks for your reply [~junrao] , that was the only time we had that issue of the _org.apache.zookeeper.KeeperException$SessionExpiredException_ so far. Now we are in version 2.0 of Kafka and from time to time suffering the *NODEEXISTS* issue. Maybe the errors were related but difficult to ensure that, hopefully with the fix, we can get rid of the *NODEEXISTS* error*.*

Cheers!

> Error while creating ephemeral at /brokers/ids/BROKER_ID
>
>                 Key: KAFKA-7165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7165
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, zkclient
>    Affects Versions: 1.1.0
>            Reporter: Jonathan Santilli
>            Assignee: Jonathan Santilli
>            Priority: Major
>
> Kafka version: 1.1.0
> Zookeeper version: 3.4.12
> 4 Kafka Brokers
> 4 Zookeeper servers
>
> In one of the 4 brokers of the cluster, we detect the following error:
> [2018-07-14 04:38:23,784] INFO Unable to read additional data from server sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:24,509] INFO Opening socket connection to server *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:24,510] INFO Socket connection established to *ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:24,513] INFO Unable to read additional data from server sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:25,287] INFO Opening socket connection to server *ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:25,287] INFO Socket connection established to *ZOOKEEPER_SERVER_2:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
> [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, session 0x3000c2420cb458d has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. (kafka.zookeeper.ZooKeeperClient)
> [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session to *ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*. (kafka.zookeeper.ZooKeeperClient)
> [2018-07-14 04:38:26,459] INFO Initiating client connection, connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT* sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 (org.apache.zookeeper.ZooKeeper)
> [2018-07-14 04:38:26,465] INFO Opening socket connection to server *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,477] INFO Socket connection established to *ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,484] INFO Session establishment complete on server *ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? false) (kafka.zk.KafkaZkClient)
> [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes (kafka.common.ZkNodeChangeNotificationListener)
> *[2018-07-14 04:38:26,547] ERROR Error while
[jira] [Comment Edited] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661931#comment-16661931 ]

Jonathan Santilli edited comment on KAFKA-7165 at 10/24/18 8:31 AM:

Hello Jun Rao,

I would like to continue working on this bug. I hope you can find some time to elaborate a little more on your proposal from [https://github.com/apache/kafka/pull/5575#issuecomment-416419017]:
{noformat}
An alternative approach is to retry the creation of the ephemeral node up to sth like twice the session timeout. It may take a bit long for the broker to be re-registered. However, it seems it's a bit safer and simpler, until ZOOKEEPER-2985 is fixed.{noformat}
Cheers,
– Jonathan

was (Author: pachilo):
Hello Juan Rao,

I would like to continue working on this bug, hope you can have some time to elaborate a little bit more your proposal from [https://github.com/apache/kafka/pull/5575#issuecomment-416419017:]
{noformat}
An alternative approach is to retry the creation of the ephemeral node up to sth like twice the session timeout. It may take a bit long for the broker to be re-registered. However, it seems it's a bit safer and simpler, until ZOOKEEPER-2985 is fixed.{noformat}
Cheers,
-- Jonathan
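
The retry approach quoted in the comment above (keep retrying the creation of the ephemeral node for up to roughly twice the session timeout) could be sketched as follows. This is only an illustration under stated assumptions, not code from the pull request or from KafkaZkClient: the class `EphemeralCreateRetry`, the method `createWithRetry`, the `tryCreate` callback and the `backoffMs` parameter are hypothetical names, and the callback stands in for the real ZooKeeper create call, returning false when the node already exists.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "retry for up to twice the session timeout"; not Kafka code. */
final class EphemeralCreateRetry {
    private EphemeralCreateRetry() {}

    /**
     * Calls tryCreate until it reports success or about 2 * sessionTimeoutMs has elapsed.
     * tryCreate is assumed to return true when the ephemeral node was created and
     * false when creation failed because the node already exists (NODEEXISTS).
     */
    static boolean createWithRetry(BooleanSupplier tryCreate, long sessionTimeoutMs, long backoffMs) {
        long deadlineNanos = System.nanoTime() + 2 * sessionTimeoutMs * 1_000_000L;
        while (true) {
            if (tryCreate.getAsBoolean()) {
                return true; // node created, broker is registered
            }
            if (System.nanoTime() - deadlineNanos >= 0) {
                return false; // stale node outlived the retry window, give up
            }
            try {
                Thread.sleep(backoffMs); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated create call: fails twice with NODEEXISTS, then succeeds.
        int[] attempts = {0};
        boolean created = createWithRetry(() -> ++attempts[0] >= 3, 6000, 10);
        System.out.println("created=" + created + " after " + attempts[0] + " attempts");
    }
}
```

With the sessionTimeout=6000 shown in the log, such a loop would give up after about 12 seconds, which matches the remark in the proposal that re-registration may take a while.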
[jira] [Comment Edited] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563458#comment-16563458 ]

Manikumar edited comment on KAFKA-7165 at 7/31/18 10:46 AM:

[~pachilo] your solution may work, but we should be careful not to remove ephemeral nodes created by another broker. If someone starts a broker with the same brokerId, then the registration should fail.

Another option is to keep the previous ZooKeeper session ID and do a check [here|https://github.com/apache/kafka/blob/90e0bbec94dd85e1c5b1af0b6426df0a02e5da3f/core/src/main/scala/kafka/zk/KafkaZkClient.scala#L1512]. If the owner matches the previous session ID, we can delete and recreate the node.

[~cthunes], since you have analyzed ZOOKEEPER-2985, do you have any thoughts on handling this on the Kafka side? Also, can you share the code to reproduce this issue?

was (Author: omkreddy):
[~pachilo] your solution may work, but we should be careful not to remove the ephemeral nodes created by another broker. if someone starts a broker with same brokerId, then the registration should fail.

Another option is to maintain previous zk session id and do a check [here|https://github.com/apache/kafka/blob/90e0bbec94dd85e1c5b1af0b6426df0a02e5da3f/core/src/main/scala/kafka/zk/KafkaZkClient.scala#L1512 If the owner matches with previous sessionID, we can delete and recreate the node.

[~cthunes] Since you have analyzed the ZOOKEEPER-2985, any thoughts on handling this on Kafka side. also can you share the code to reproduce this this issue?
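
The ownership check suggested in the comment above (remember the previous session ID, and replace the node only if that session still owns it) can be illustrated with a small in-memory model. This is a sketch under assumptions, not the KafkaZkClient implementation: the class `BrokerRegistrationSketch` and its `register` method are hypothetical, and a plain map stands in for ZooKeeper, where the owning session of an ephemeral node is exposed through `Stat.getEphemeralOwner()`. The session IDs in the demo are the two from the log in this issue plus an invented third one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the "previous session owns the node" check; not Kafka code. */
final class BrokerRegistrationSketch {
    /** Path to owning session ID; stands in for ZooKeeper's ephemeral owner. */
    private final Map<String, Long> ephemeralOwners = new HashMap<>();

    /**
     * Tries to register path for newSessionId.
     * previousSessionId is the broker's expired session, or null if it had none.
     * Returns true if the path is owned by newSessionId afterwards.
     */
    boolean register(String path, long newSessionId, Long previousSessionId) {
        Long owner = ephemeralOwners.get(path);
        if (owner == null) {
            ephemeralOwners.put(path, newSessionId); // free path, plain create
            return true;
        }
        if (owner.equals(previousSessionId)) {
            // Stale node left behind by our own expired session: delete and recreate.
            ephemeralOwners.remove(path);
            ephemeralOwners.put(path, newSessionId);
            return true;
        }
        // Owned by this session already, or by another broker: never delete it.
        return owner == newSessionId;
    }

    public static void main(String[] args) {
        long oldSession = 0x3000c2420cb458dL; // expired session from the log
        long newSession = 0x4005b59eb6aL;     // new session from the log
        long otherBroker = 0x1L;              // invented ID for a second broker

        BrokerRegistrationSketch zk = new BrokerRegistrationSketch();
        System.out.println(zk.register("/brokers/ids/1", oldSession, null));
        System.out.println(zk.register("/brokers/ids/1", newSession, oldSession));
        System.out.println(zk.register("/brokers/ids/1", otherBroker, null));
    }
}
```

The last call in the demo fails, which is the behaviour asked for when a different broker starts with the same brokerId. A real implementation would also have to deal with races between the check and the delete, which the map does not model.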