[ 
https://issues.apache.org/jira/browse/KAFKA-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793567#comment-16793567
 ] 

Steven McDonald commented on KAFKA-7898:
----------------------------------------

Hi!

We encountered this as well and I did some investigation into the problem. It 
is a bug in Kafka 2.1.x that is fixed in 2.2.x (though not explicitly; the fix 
is the result of [a 
refactor|https://github.com/apache/kafka/commit/2155c6d54b087206b6aa1d58747f141761394eaf#diff-8bcd2c427556f434e33cf22abec548c2R217]).

The underlying problem is with the Zookeeper client library's MultiCallback 
interface. The 
[documentation|https://zookeeper.apache.org/doc/r3.4.13/api/org/apache/zookeeper/AsyncCallback.MultiCallback.html]
 for this says that "all opResults are OpResult.ErrorResult", but [some error 
conditions|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689]
 will pass the callback a null pointer in place of a list. Kafka 2.1.x is 
implemented according to the documentation, so the null pointer case is not 
handled, leading to this bug.

I also consider it a bug that this NullPointerException leaves the Kafka 
cluster in a state that it does not recover from automatically. In our case, 
this bug was hit during a controller election, resulting in a node that was 
designated as controller but unable to function as such. It would be sufficient 
for this exception to simply kill the Kafka node so that the remaining nodes 
can recover, but I think that is a separate bug (which I will raise with 
Zookeeper first, as the exception is currently caught there).

I can provide additional information on our experience if it's of any interest, 
but since this is already fixed in Kafka 2.2.x I don't see much point expanding 
here.

> ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> -------------------------------------------------------------------
>
>                 Key: KAFKA-7898
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7898
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Gabriel Lukacs
>            Priority: Major
>
> We observed a NullPointerException on one of our broker in 3 broker cluster 
> environment. If I list the processes and open ports it seems that the faulty 
> broker is running, but the kafka-connect (we used it also) periodically 
> restarts due to fact that it can not connect to the kafka cluster (configured 
> ssl & plaintext mode too). Is it a bug in kafka/zookeeper?
>  
> [2019-02-05 14:28:11,359] WARN Client session timed out, have not heard from 
> server in 4141ms for sessionid 0x3000010166e0000 
> (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable 
> (org.apache.zookeeper.ClientCnxn)
> java.lang.NullPointerException
>  at 
> kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> [2019-02-05 14:28:12,526] ERROR Caught unexpected throwable 
> (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:22,701] WARN Client session timed out, have not heard from 
> server in 4004ms for sessionid 0x3000010166e0000 
> (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:28,670] WARN Client session timed out, have not heard from 
> server in 4049ms for sessionid 0x3000010166e0000 
> (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 15:05:20,601] WARN [GroupCoordinator 1]: Failed to write empty 
> metadata for group 
> encodable-emvTokenAccess-delta-encoder-group-emvIssuerAccess-v2-2-0: The 
> group is rebalancing, so a rejoin is needed. 
> (kafka.coordinator.group.GroupCoordinator)
> kafka 7381 1 0 14:22 ? 00:00:19 java -Xmx512M -Xms512M -server -XX:+UseG1GC 
> -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 
> -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true 
> -Xloggc:/opt/kafka/bin/../logs/zookeeper-gc.log -verbose:gc 
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dkafka.logs.dir=/opt/kafka/bin/../logs 
> -Dlog4j.configuration=file:/opt/kafka/config/zoo-log4j.properties -cp 
> /opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/compileScala.mapping:/opt/kafka/bin/../libs/compileScala.mapping.asc:/opt/kafka/bin/../libs/connect-api-2.1.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.1.0.jar:/opt/kafka/bin/../libs/connect-file-2.1.0.jar:/opt/kafka/bin/../libs/connect-json-2.1.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.1.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.1.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.7.jar:/opt/kafka/bin/../libs/jackson-core-2.9.7.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.7.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.7.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-http-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-io-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-security-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-server-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-util-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0-sources.jar:/opt/kafka/bin/../libs/kafka-clients-2.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.1.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.1.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.14.2.jar:/opt/kafka/bin/../libs/scala-library-2.12.7.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.7.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.
> kafka 9806 1 2 14:22 ? 00:02:41 java -Xmx1G -Xms1G -server -XX:+UseG1GC 
> -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 
> -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true 
> -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc 
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.port=9997 
> -Dkafka.logs.dir=/opt/kafka/bin/../logs 
> -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp 
> /opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/compileScala.mapping:/opt/kafka/bin/../libs/compileScala.mapping.asc:/opt/kafka/bin/../libs/connect-api-2.1.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.1.0.jar:/opt/kafka/bin/../libs/connect-file-2.1.0.jar:/opt/kafka/bin/../libs/connect-json-2.1.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.1.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.1.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.7.jar:/opt/kafka/bin/../libs/jackson-core-2.9.7.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.7.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.7.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-http-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-io-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-security-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-server-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-util-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0-sources.jar:/opt/kafka/bin/../libs/kafka-clients-2.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.1.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.1.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.14.2.jar:/opt/kafka/bin/../libs/scala-library-2.12.7.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.7.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar
> [root@litest-gateway1 ~]$ netstat -tupnl | grep 7381
> tcp 0 0 0.0.0.0:48097 0.0.0.0:* LISTEN 7381/java
> tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 7381/java
> tcp 0 0 172.31.32.194:3888 0.0.0.0:* LISTEN 7381/java
> [root@litest-gateway1 ~]$ netstat -tupnl | grep 9806
> tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:9093 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:37062 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:9997 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:37948 0.0.0.0:* LISTEN 9806/java
> [root@litest-gateway1 ~]$ cat /etc/kafka/server.properties
> broker.id=1
> listeners=PLAINTEXT://:9092,SSL://:9093
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> auto.create.topics.enable=false
> log.dirs=/var/kafka
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=2
> log.retention.hours=336
> log.segment.bytes=1073741824
> log.roll.hours=24
> log.cleaner.enable=false
> offsets.retention.minutes=20160
> zookeeper.connect=litest-gateway1:2181,litest-gateway2:2181,litest-gateway3:2181
> zookeeper.connection.timeout.ms=6000
> min.insync.replicas=2
> group.initial.rebalance.delay.ms=0
> ssl.keystore.location=<some location>
> ssl.keystore.password=<some pwd>
> ssl.key.password=<some pwd>
> ssl.truststore.location=<some location>
> ssl.truststore.password=<some pwd>
> ssl.client.auth=requested
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
> ssl.keystore.type=JKS
> ssl.truststore.type=JKS
> messages, plus replication traffic, per-broker)
> ssl.secure.random.implementation=SHA1PRNG
> security.inter.broker.protocol=SSL
> [root@litest-gateway1 ~]$ cat /etc/zookeeper/zookeeper.properties
> tickTime=2000
> initLimit=10
> syncLimit=5
> maxClientCnxns=200
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=24
> dataDir = /var/lib/zookeeper
> clientPort = 2181
> maxClientCnxns = 60
> logDir = /var/log/zookeeper
> server.1=litest-gateway1:2888:3888
> server.2=litest-gateway2:2888:3888
> server.3=litest-gateway3:2888:3888
> [root@litest-gateway1 ~]$ cat /etc/system-release
> Oracle Linux Server release 7.2
> [root@litest-gateway1 ~]$ uname -a
> Linux litest-gateway1 3.8.13-118.10.2.el7uek.x86_64 #2 SMP Fri Aug 12 
> 15:08:18 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to