[ 
https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780189#comment-16780189
 ] 

Naga commented on KAFKA-8008:
-----------------------------

I'm also observing the similar issue.

> Clients unable to connect and replicas are not able to connect to each other
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-8008
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8008
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Java 11
>            Reporter: Abhi
>            Priority: Critical
>
> Hi,
> I upgrade to Kafka v2.1.1 recently and seeing the below exceptions in all the 
> servers. The kafka-network-thread-1-ListenerName are all consuming full cpu 
> cycles. Lots of TCP connections are in CLOSE_WAIT state.
> My broker setup is using kerberos authentication with 
> -Dsun.security.jgss.native=true.
> I am not sure how to handle this? Will increasing the kafka-network thread 
> count help if it is possible?
> Does this seem like a bug? I am happy to help in anyway I can as this issue 
> blocking our production usage and would like to get it resolved as early as 
> possible.
> *server.log snippet from one of the servers:*
> [2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 
> 2 with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
> (id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
> SEND_APIVERSIONS_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
> client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
>  (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
> SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
> RECEIVE_APIVERSIONS_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Completed connection to node 2. Ready. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 
> 5 with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
> to node 5: java.net.SocketTimeoutException: Failed to connect within 30000 
> ms. (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Error in response for fetch request (type=FetchRequest, 
> replicaId=1, maxWait=10000, minBytes=1, maxBytes=10485760, 
> fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-newtp-5-64-0=(fetchOffset=8936, logStartOffset=8936, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-tp3-58-0=(fetchOffset=264495, logStartOffset=264495, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
> currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
> logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, logStartOffset=2, 
> maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
> reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-988-0=(fetchOffset=3173185, logStartOffset=3173185, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-409-0=(fetchOffset=3173194, logStartOffset=3173194, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.priveq.reins-0=(fetchOffset=12, logStartOffset=6, maxBytes=1048576, 
> currentLeaderEpoch=Optional[5]), reddyvel-353-0=(fetchOffset=3173197, 
> logStartOffset=3173197, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-tp10-63-0=(fetchOffset=220652, logStartOffset=220652, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-newtp-5-86-0=(fetchOffset=8935, logStartOffset=8935, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> reddyvel-878-0=(fetchOffset=3173187, logStartOffset=3173187, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-621-0=(fetchOffset=3173190, logStartOffset=3173190, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.agg.uopt.opt-0=(fetchOffset=28297, logStartOffset=28297, 
> maxBytes=1048576, currentLeaderEpoch=Optional[8]), 
> reddyvel-661-0=(fetchOffset=3173190, logStartOffset=3173190, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.guas.jpeq.fe_marb-0=(fetchOffset=532, logStartOffset=10, 
> maxBytes=1048576, currentLeaderEpoch=Optional[3]), 
> reddyvel-607-0=(fetchOffset=3173191, logStartOffset=3173191, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.seed.ornt.desim_ornt-0=(fetchOffset=4060, logStartOffset=2433, 
> maxBytes=1048576, currentLeaderEpoch=Optional[5]), 
> reddyvel-962-0=(fetchOffset=3173185, logStartOffset=3173185, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.agg.ornt.desim_ornt-0=(fetchOffset=1177, logStartOffset=1177, 
> maxBytes=1048576, currentLeaderEpoch=Optional[8]), 
> reddyvel-tp6-71-0=(fetchOffset=256309, logStartOffset=256309, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> fps.trsy.macro-0=(fetchOffset=324, logStartOffset=106, maxBytes=1048576, 
> currentLeaderEpoch=Optional[3]), fps.agg.dist.treas-0=(fetchOffset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[8]), 
> reddyvel-newtp-8-111-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, 
> currentLeaderEpoch=Optional[18]), reddyvel-94-0=(fetchOffset=3173198, 
> logStartOffset=3173198, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-544-0=(fetchOffset=3173192, logStartOffset=3173192, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-newtp-2-72-0=(fetchOffset=8679, logStartOffset=8679, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> fps.guas.useq.eq_stat_arb_useq-0=(fetchOffset=6071629, 
> logStartOffset=4470145, maxBytes=1048576, currentLeaderEpoch=Optional[4]), 
> reddyvel-newtp-10-75-0=(fetchOffset=9768, logStartOffset=9768, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> reddyvel-tp8-51-0=(fetchOffset=249873, logStartOffset=249873, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-newtp-8-122-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, 
> currentLeaderEpoch=Optional[18]), fps.seed.trsy.pe_china-0=(fetchOffset=24, 
> logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[3]), 
> fps.seed.trsy.jcas-0=(fetchOffset=93, logStartOffset=31, maxBytes=1048576, 
> currentLeaderEpoch=Optional[3]), reddyvel-tp8-99-0=(fetchOffset=249871, 
> logStartOffset=249871, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-643-0=(fetchOffset=3173190, logStartOffset=3173190, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-577-0=(fetchOffset=3173191, logStartOffset=3173191, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-tp6-97-0=(fetchOffset=256307, logStartOffset=256307, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-newtp-6-72-0=(fetchOffset=7652, logStartOffset=7652, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> reddyvel-959-0=(fetchOffset=3173185, logStartOffset=3173185, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-newtp-4-70-0=(fetchOffset=8828, logStartOffset=8828, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> fps.seed.trsy.opt-0=(fetchOffset=114, logStartOffset=38, maxBytes=1048576, 
> currentLeaderEpoch=Optional[4])}, isolationLevel=READ_UNCOMMITTED, toForget=, 
> metadata=(sessionId=397037945, epoch=INITIAL)) 
> (kafka.server.ReplicaFetcherThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:95)
>         at 
> kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:192)
>         at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:274)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>         at scala.Option.foreach(Option.scala:257)
>         at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>         at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 
> 5 with 45 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 
> (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:04,008] DEBUG Set SASL client state to 
> SEND_APIVERSIONS_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG Creating SaslClient: 
> client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI]
>  (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
> SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:04,008] DEBUG Set SASL client state to 
> RECEIVE_APIVERSIONS_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Completed connection to node 5. Ready. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:04,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 
> with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 
> with 48 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 
> (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:05,318] DEBUG Set SASL client state to 
> SEND_APIVERSIONS_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,318] DEBUG Creating SaslClient: 
> client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI]
>  (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
> SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:05,319] DEBUG Set SASL client state to 
> RECEIVE_APIVERSIONS_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Completed connection to node 5. Ready. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:05,477] DEBUG Set SASL server state to AUTHENTICATE 
> (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG [SocketServer brokerId=1] Connection with 
> /10.236.30.31 disconnected (org.apache.kafka.common.network.Selector)
> java.io.EOFException
>         at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
>         at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:237)
>         at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:132)
>         at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:532)
>         at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
>         at kafka.network.Processor.poll(SocketServer.scala:689)
>         at kafka.network.Processor.run(SocketServer.scala:594)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> [2019-02-27 00:00:05,477] DEBUG Handling Kafka request SASL_HANDSHAKE 
> (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG Using SASL mechanism 'GSSAPI' provided by 
> client 
> (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:05,477] DEBUG Creating SaslServer for 
> kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com with mechanism GSSAPI 
> (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
> [2019-02-27 00:00:06,056] INFO [ReplicaFetcher replicaId=1, leaderId=4, 
> fetcherId=2] Error sending fetch request (sessionId=373847113, epoch=INITIAL) 
> to node 4: java.net.SocketTimeoutException: Failed to connect within 30000 
> ms. (org.apache.kafka.clients.FetchSessionHandler)
> *Lsof output:*
> kafka...@mwkafka-prod-01.nyc[toa]:/local/kafka/logs> lsof -P -p 103485 | grep 
> TCP | grep CLOSE
> java    103485 kafkagod  635u     IPv4           86522305       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:46014 
> (CLOSE_WAIT)
> java    103485 kafkagod  639u     IPv4           86519040       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45926 
> (CLOSE_WAIT)
> java    103485 kafkagod  642u     IPv4           86519057       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47424 
> (CLOSE_WAIT)
> java    103485 kafkagod  643u     IPv4           86519058       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47428 
> (CLOSE_WAIT)
> java    103485 kafkagod  683u     IPv4           86509505       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57856 
> (CLOSE_WAIT)
> java    103485 kafkagod  684u     IPv4           86522910       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57894 
> (CLOSE_WAIT)
> java    103485 kafkagod  688u     IPv4           86522176       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45966 
> (CLOSE_WAIT)
> java    103485 kafkagod  690u     IPv4           86522306       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35326 
> (CLOSE_WAIT)
> java    103485 kafkagod  695u     IPv4           86522192       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45968 
> (CLOSE_WAIT)
> java    103485 kafkagod  696u     IPv4           86509516       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.dr.foo.com:39676 
> (CLOSE_WAIT)
> java    103485 kafkagod  697u     IPv4           86522307       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35328 
> (CLOSE_WAIT)
> java    103485 kafkagod  705u     IPv4           86519026       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57838 
> (CLOSE_WAIT)
> java    103485 kafkagod  726u     IPv4           86509517       0t0       TCP 
> mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35258 
> (CLOSE_WAIT)
> *contoller logs below show that no broker is able connect to each other:*
> [2019-02-27 03:39:16,135] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-02.dr.foo.com:9092 (id: 4 rack: 
> dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:20,915] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-02.tbd.foo.com:9092 (id: 6 rack: 
> tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:25,839] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-01.dr.foo.com:9092 (id: 3 rack: 
> dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:31,371] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-01.tbd.foo.com:9092 (id: 5 rack: 
> tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:40,440] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-01.nyc.foo.com:9092 (id: 1 rack: 
> nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-02-27 03:39:42,149] WARN [RequestSendThread controllerId=1] Controller 
> 1's connection to broker mwkafka-prod-02.nyc.foo.com:9092 (id: 2 rack: 
> nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread)
> java.net.SocketTimeoutException: Failed to connect within 30000 ms
>         at 
> kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> *Client consumer logs: consumer is not able to connect to the brokers*
> [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to 
> node -6 (org.apache.kafka.common.network.Selector)
> [2019-02-27 03:37:37,587] DEBUG Set SASL client state to 
> RECEIVE_APIVERSIONS_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Completed connection to node -6. Fetching API versions. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 
> (id: -6 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:08,618] DEBUG Set SASL client state to INITIAL 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:08,619] DEBUG Set SASL client state to INTERMEDIATE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:29,288] DEBUG Set SASL client state to 
> SEND_HANDSHAKE_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:29,288] DEBUG Set SASL client state to 
> RECEIVE_HANDSHAKE_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 
> (id: -4 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 
> (id: -3 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-01.nyc.deshaw.com:9092 
> (id: -1 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:39:33,050] DEBUG Set SASL client state to 
> SEND_HANDSHAKE_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:33,051] DEBUG Set SASL client state to 
> RECEIVE_HANDSHAKE_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:34,683] DEBUG Set SASL client state to CLIENT_COMPLETE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:39:41,887] DEBUG Set SASL client state to CLIENT_COMPLETE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 
> (id: -6 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:31,269] DEBUG Set SASL client state to COMPLETE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 03:40:31,269] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Initiating API versions fetch from node -1. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] 
> Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 
> (id: -4 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2019-02-27 03:40:42,438] DEBUG Set SASL client state to COMPLETE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to