[ 
https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anil reopened KAFKA-6281:
-------------------------

Thanks [~omkreddy], 

We've got this issue again today, twice. 

I see that both the times, there is GC running in my server1 , which I'm 
connecting to. But it seems to be of short duration. 

Please suggest, if that could be the cause. Regarding the network connection 
issue(s), b/w which components there could be issue ? I mean the zoo-keeper of 
server1, couldnot connect to zoo-keeper of server2 ? . We haven't found any 
connectivity issues, but still want to make sure on the ports and availablitiy.

 

 
{code:java}

Exception 1 :
-2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR 
-org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.

GC 1:
2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause) 
(young), 0.0158735 secs]
[Parallel Time: 15.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 1465075225.5, 
Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4]
[Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5]
[Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41]
[Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.4]
[GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.7, 
Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap: 
262.0M(1024.0M)->217.0M(1024.0M)]
[Times: user=0.03 sys=0.00, real=0.02 secs]




--Exception 2 --
-2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR 
-org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.

--GC--2

2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause) 
(young), 0.0205628 secs]
[Parallel Time: 20.0 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, Diff: 
0.0]
[Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3]
[Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1]
[Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37]
[Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.8]
[GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Diff: 
0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap: 
425.0M(1024.0M)->364.5M(1024.0M)]
[Times: user=0.04 sys=0.00, real=0.02 secs]


{code}

> Kafka JavaAPI Producer failed with NotLeaderForPartitionException
> -----------------------------------------------------------------
>
>                 Key: KAFKA-6281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6281
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Anil
>            Priority: Major
>         Attachments: server1-controller.log, server2-controller.log
>
>
> We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We 
> have 2 producers (Java API) acting on different topics. Each topic has single 
> partition. The topic where we had this issue, has one consumer running. This 
> set up has been running fine for 3 months, and we saw this issue. All the 
> suggested cases/solutions for this issue in other forums don't seem to apply 
> for my scenario.
> Exception at producer;
> {code}
> -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR 
> client.producer.BingLogProducerCallback - Encountered exception in sending 
> message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException: 
> This server is not the leader for that topic-partition.
> {code}
> We haven't enabled retries for the messages, because this is transactional 
> data and we want to maintain the order.
> Producer config:
> {code}
> bootstrap.servers : server1ip:9092
> acks :all
> retries : 0
> linger.ms :0
> buffer.memory :10240000
> max.request.size :1024000
> key.serializer : org.apache.kafka.common.serialization.StringSerializer
> value.serializer : org.apache.kafka.common.serialization.StringSerializer
> {code}
> We are connecting to server1 at both producer and consumer. The controller 
> log at server2 indicates there is some shutdown happened at during sametime, 
> but I dont understand why this happened.
> {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in 
> preferred replica Map() (kafka.controller.KafkaController) [2017-11-25 
> 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is 
> 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG 
> [Controller 2]: topics not in preferred replica Map() 
> (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE 
> [Controller 2]: leader imbalance ratio for broker 1 is 0.000000 
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO 
> [SessionExpirationListener on 2], ZK expired; shut down all controller 
> components and try to re-elect 
> (kafka.controller.KafkaController$SessionExpirationListener) [2017-11-25 
> 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2 
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] DEBUG 
> [Controller 2]: De-registering IsrChangeNotificationListener 
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO 
> [delete-topics-thread-2], Shutting down 
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 
> 17:34:18,317] INFO [delete-topics-thread-2], Stopped 
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 
> 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed 
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 
> 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped 
> partition state machine (kafka.controller.PartitionStateMachine) [2017-11-25 
> 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped replica 
> state machine (kafka.controller.ReplicaStateMachine) [2017-11-25 
> 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO 
> [Controller-2-to-broker-2-send-thread], Stopped 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO 
> [Controller-2-to-broker-2-send-thread], Shutdown completed 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO 
> [Controller-2-to-broker-1-send-thread], Shutting down 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO 
> [Controller-2-to-broker-1-send-thread], Stopped 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO 
> [Controller-2-to-broker-1-send-thread], Shutdown completed 
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO 
> [Controller 2]: Broker 2 resigned as the controller 
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,353] DEBUG 
> [IsrChangeNotificationListener] Fired!!! 
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,353] 
> DEBUG [IsrChangeNotificationListener] Fired!!! 
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,354] 
> INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for 
> path /brokers/ids with children 1,2 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener) [2017-11-25 
> 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired 
> for topics to be deleted 
> (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25 
> 17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification 
> triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/ESQ 
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener) 
> [2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition 
> modification triggered {"version":1,"partitions":{"0":[1]}} for path 
> /brokers/topics/Test1 
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener) 
> [2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition 
> modification triggered {"version":1,"partitions":{"0":[2]}} for path 
> /brokers/topics/ImageQ 
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener) 
> [2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition 
> modification triggered 
> {"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}}
>  for path /brokers/topics/NMS_NotifyQ 
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener) 
> [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition 
> modification triggered {"version":1,"partitions":{"0":[1]}} for path 
> /brokers/topics/TempBinLogReqQ @{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to