[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2020-12-03 Thread zhangzhisheng (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241383#comment-17241383
 ] 

zhangzhisheng edited comment on KAFKA-3042 at 12/3/20, 8:27 AM:


using kafka_2.12-2.4.1, zookeeper-3.5.7

3 ZKs, 3-broker cluster, topic replication factor is 2
 linux (RedHat), XFS, kafka logs on a single local disk

error info:

{code:java}
// server info

kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:36,636] INFO 
[GroupCoordinator 2]: Assignment received from leader for group 
money-repayment-cmd-listener-1606227494097 for generation 126 
(kafka.coordinator.group.GroupCoordinator)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,220] INFO 
[Partition __consumer_offsets-13 broker=2] Shrinking ISR from 2,0,1 to 
2. Leader: (highWatermark: 500993846, endOffset: 500993972). Out of sync 
replicas: (brokerId: 0, endOffset: 500993846) (brokerId: 1, endOffset: 
500993967). (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,223] INFO 
[Partition __consumer_offsets-13 broker=2] Cached zkVersion 131 not equal to 
that in zookeeper, skip updating ISR (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,223] INFO 
[Partition __consumer_offsets-46 broker=2] Shrinking ISR from 2,0,1 to 
2. Leader: (highWatermark: 281523643, endOffset: 281523684). Out of sync 
replicas: (brokerId: 0, endOffset: 281523643) (brokerId: 1, endOffset: 
281523683). (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,224] INFO 
[Partition __consumer_offsets-46 broker=2] Cached zkVersion 123 not equal to 
that in zookeeper, skip updating ISR (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,224] INFO 
[Partition fcp-FFF-LOANFILE-201806271059-2 broker=2] Shrinking ISR from 2,0,1 
to 2. Leader: (highWatermark: 9302797, endOffset: 9302806). Out of sync 
replicas: (brokerId: 0, endOffset: 9302797) (brokerId: 1, endOffset: 9302804). 
(kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,227] INFO 
[Partition fcp-FFF-LOANFILE-201806271059-2 broker=2] Cached zkVersion 125 not 
equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
{code}
{code:java}
// state info
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,073] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition __consumer_offsets-22 from OnlinePartition to OnlinePartition 
(state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,073] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition fcp-FFF-account-201807131719-2 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition fcp-PCP-INSTRANSACTIONPOLICY-2018079116-0 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition LOAN_FAIL_MANAGE-202011231831270534-1 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition fcp-PFINMONEYMONITOR-LOANTXNSUB-201806271129-0 from 
OnlinePartition to OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition __consumer_offsets-4 from OnlinePartition to OnlinePartition 
(state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition __consumer_command_request-5 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition fcp-creditcore-loan-trans-201809112022-2 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state 
for partition fcp-CREDITCORE-LOAN-TRANS-20180791126-0 from OnlinePartition to 
OnlinePartition (state.change.logger) 
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] 
ERROR [Controller id=0 ...
{code}

{code:java}
// controller info
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition __consumer_offsets-22 (kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition fcp-FFF-account-201807131719-2 (kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition fcp-PCP-INSTRANSACTIONPOLICY-2018079116-0 
(kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition LOAN_FAIL_MANAGE-202011231831270534-1 
(kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition fcp-FFF-LOANTXNSUB-201806271129-0 
(kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition __consumer_offsets-4 (kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition __consumer_command_request-5 (kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition fcp-creditcore-loan-trans-201809112022-2 
(kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,079] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition fcp-CREDITCORE-LOAN-TRANS-20180791126-0 
(kafka.controller.KafkaController) 
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,079] 
ERROR [Controller id=0] Error completing replica leader election (PREFERRED) 
for partition __consumer_offsets-7 (kafka.controller.KafkaController)
{code}


was (Author: zhangzs):
using kafka_2.12-2.4.1,zookeeper-3.5.7

3 ZKs 3 Broker cluster, topic replication factor is 2
linux (redhat) xfs kafka logs on single local disk

error info 
{code:java}
// code placeholder
[2020-12-01 15:38:22,237] INFO [Partition  topic-cs-201907181035-0 broker=2] 
Cached zkVersion 59 not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2020-12-01 15:38:22,237] INFO [Partition  
topic-repay-plan-detail-201809112057-5 broker=2] Shrinking ISR from 2,0 to 2. 
Leader: (highWatermark: 173252090, endOffset: 173426233). Out of sync replicas: 
(brokerId: 0, endOffset: 173252090). (kafka.cluster.Partition)
[2020-12-01 15:38:22,238] INFO [Partition  
topic-repay-plan-detail-201809112057-5 broker=2] Cached zkVersion 81 not equal 
to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2020-12-01 15:38:22,239] INFO [Partition  topic-pay-flow-201810181631-1 
broker=2] Shrinking ISR from 2,0 to 2. Leader: (highWatermark: 334799502, 
endOffset: 335281045). Out of sync replicas: (brokerId: 0, endOffset: 
334799502). (kafka.cluster.Partition)
[2020-12-01 15:38:22,240] INFO [Partition  topic-pay-flow-201810181631-1 
broker=2] Cached zkVersion 85 not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2020-12-01 15:38:22,240] INFO [Partition  
topic-repay-plan-detail-201809112057-1 broker=2] Shrinking ISR from 2,0 to 2. 
Leader: (highWatermark: 302761557, endOffset: 302935719). Out of sync replicas: 
(brokerId: 0, endOffset: 302761557). (kafka.cluster.Partition)
[2020-12-01 15:38:22,242] INFO [Partition  
topic-repay-plan-detail-201809112057-1 broker=2] Cached zkVersion 90 not equal 
to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
{code}


[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2018-07-25 Thread Don (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555485#comment-16555485
 ] 

Don edited comment on KAFKA-3042 at 7/25/18 10:21 AM:
--

We tried reproducing the issue on the Confluent Docker image 4.1.0 in one of 
our environments by commenting out:

# - name: KAFKA_REPLICA_LAG_TIME_MAX_MS
#   value: "14000"
# - name: KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS
#   value: "21000"

Fortunately, we haven't observed "Cached zkVersion 54 not equal to that in 
zookeeper, skip updating ISR" for almost two months. 

It could be that we had a configuration issue, e.g. we used an older version of 
Kafka when I reported "We still observed ...".
Sorry for the false alarm. 
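For reference, the Confluent Docker images translate `KAFKA_*` environment variables into broker properties (lowercased, underscores replaced by dots), so the two variables above should correspond to the following broker settings — shown here only as a sketch, with the values taken from the snippet above:

```properties
# How long a follower may lag before being dropped from the ISR
replica.lag.time.max.ms=14000
# The broker's ZooKeeper session timeout
zookeeper.session.timeout.ms=21000
```

Commenting the environment variables out simply reverts both settings to the broker defaults.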


was (Author: donis):
We tried reproducing the issue on confluent docker image 4.1.0 by commenting 
out:

# - name: KAFKA_REPLICA_LAG_TIME_MAX_MS
#   value: "14000"
# - name: KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS
#   value: "21000"

In one of our environments. 
Fortunately? we haven't observed "Cached zkVersion 54 not equal to that in 
zookeeper, skip updating ISR" for almost two months. 

It could be that we had a configuration issue, e.g. we used an older version of 
Kafka when I reported "We still observed ...".
Sorry for the false alarm. 

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.10.0.0
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
>Assignee: Dong Lin
>Priority: Major
>  Labels: reliability
> Fix For: 2.1.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR".
> I think this is because the broker considers itself the leader when in fact
> it is a follower.
> So after several failed tries, it needs to find out who the leader is.
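The bounded-retry behaviour the ticket asks for can be sketched as follows. This is a minimal, hypothetical illustration of a version-checked update that gives up after a few attempts; `VersionedStore` and `updateWithRetry` are invented names, not Kafka's actual ISR or ZooKeeper client code:

```java
// Sketch: a conditional (version-checked) update that stops after a bounded
// number of attempts instead of logging "Cached zkVersion ... not equal
// to that in zookeeper" forever. Illustrative only, not Kafka's API.
public class BoundedIsrUpdate {

    // Simulates a ZooKeeper znode: a write succeeds only when the caller's
    // expected version matches, and each successful write bumps the version.
    static class VersionedStore {
        private int version;

        VersionedStore(int version) { this.version = version; }

        synchronized boolean setData(int expectedVersion) {
            if (expectedVersion != version) {
                return false; // analogous to ZooKeeper's BADVERSION result
            }
            version++;
            return true;
        }

        synchronized int currentVersion() { return version; }
    }

    // Attempt the update at most maxAttempts times, re-reading the current
    // version after each failure rather than retrying with a stale cache.
    static boolean updateWithRetry(VersionedStore store, int cachedVersion,
                                   int maxAttempts) {
        int expected = cachedVersion;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (store.setData(expected)) {
                return true;
            }
            expected = store.currentVersion(); // refresh instead of looping blindly
        }
        return false; // give up after several failed tries, as the ticket suggests
    }

    public static void main(String[] args) {
        // The broker cached version 54, but ZooKeeper is at 131 (stale cache);
        // the first attempt fails, and the refreshed second attempt succeeds.
        VersionedStore store = new VersionedStore(131);
        System.out.println(updateWithRetry(store, 54, 3) ? "updated" : "gave up");
    }
}
```

The point of the sketch is only the bounded loop with a refresh: a stale cached version fails once and then succeeds after re-reading, while a persistent mismatch makes the caller give up instead of retrying indefinitely.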



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

