[
https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anil reopened KAFKA-6281:
-------------------------
Thanks [~omkreddy],
We've got this issue again today, twice.
I see that both the times, there is GC running in my server1 , which I'm
connecting to. But it seems to be of short duration.
Please suggest, if that could be the cause. Regarding the network connection
issue(s), b/w which components there could be issue ? I mean the zoo-keeper of
server1, couldnot connect to zoo-keeper of server2 ? . We haven't found any
connectivity issues, but still want to make sure on the ports and availablitiy.
{code:java}
Exception 1 :
-2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR
-org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition.
GC 1:
2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause)
(young), 0.0158735 secs]
[Parallel Time: 15.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 1465075225.5,
Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4]
[Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5]
[Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41]
[Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.4]
[GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.7,
Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap:
262.0M(1024.0M)->217.0M(1024.0M)]
[Times: user=0.03 sys=0.00, real=0.02 secs]
--Exception 2 --
-2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR
-org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition.
--GC--2
2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause)
(young), 0.0205628 secs]
[Parallel Time: 20.0 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, Diff:
0.0]
[Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3]
[Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1]
[Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37]
[Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.8]
[GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Diff:
0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap:
425.0M(1024.0M)->364.5M(1024.0M)]
[Times: user=0.04 sys=0.00, real=0.02 secs]
{code}
> Kafka JavaAPI Producer failed with NotLeaderForPartitionException
> -----------------------------------------------------------------
>
> Key: KAFKA-6281
> URL: https://issues.apache.org/jira/browse/KAFKA-6281
> Project: Kafka
> Issue Type: Bug
> Reporter: Anil
> Priority: Major
> Attachments: server1-controller.log, server2-controller.log
>
>
> We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We
> have 2 producers (Java API) acting on different topics. Each topic has single
> partition. The topic where we had this issue, has one consumer running. This
> set up has been running fine for 3 months, and we saw this issue. All the
> suggested cases/solutions for this issue in other forums don't seem to apply
> for my scenario.
> Exception at producer;
> {code}
> -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR
> client.producer.BingLogProducerCallback - Encountered exception in sending
> message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException:
> This server is not the leader for that topic-partition.
> {code}
> We haven't enabled retries for the messages, because this is transactional
> data and we want to maintain the order.
> Producer config:
> {code}
> bootstrap.servers : server1ip:9092
> acks :all
> retries : 0
> linger.ms :0
> buffer.memory :10240000
> max.request.size :1024000
> key.serializer : org.apache.kafka.common.serialization.StringSerializer
> value.serializer : org.apache.kafka.common.serialization.StringSerializer
> {code}
> We are connecting to server1 at both producer and consumer. The controller
> log at server2 indicates there is some shutdown happened at during sametime,
> but I dont understand why this happened.
> {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in
> preferred replica Map() (kafka.controller.KafkaController) [2017-11-25
> 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is
> 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG
> [Controller 2]: topics not in preferred replica Map()
> (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE
> [Controller 2]: leader imbalance ratio for broker 1 is 0.000000
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO
> [SessionExpirationListener on 2], ZK expired; shut down all controller
> components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener) [2017-11-25
> 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] DEBUG
> [Controller 2]: De-registering IsrChangeNotificationListener
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO
> [delete-topics-thread-2], Shutting down
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,317] INFO [delete-topics-thread-2], Stopped
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped
> partition state machine (kafka.controller.PartitionStateMachine) [2017-11-25
> 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped replica
> state machine (kafka.controller.ReplicaStateMachine) [2017-11-25
> 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO
> [Controller-2-to-broker-2-send-thread], Stopped
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-2-send-thread], Shutdown completed
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Shutting down
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Stopped
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Shutdown completed
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller 2]: Broker 2 resigned as the controller
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,353] DEBUG
> [IsrChangeNotificationListener] Fired!!!
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,353]
> DEBUG [IsrChangeNotificationListener] Fired!!!
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,354]
> INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for
> path /brokers/ids with children 1,2
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener) [2017-11-25
> 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired
> for topics to be deleted
> (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25
> 17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification
> triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/ESQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[1]}} for path
> /brokers/topics/Test1
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[2]}} for path
> /brokers/topics/ImageQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition
> modification triggered
> {"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}}
> for path /brokers/topics/NMS_NotifyQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[1]}} for path
> /brokers/topics/TempBinLogReqQ @{color}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)