[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454611#comment-17454611 ]
Shivakumar Kedlaya edited comment on KAFKA-13077 at 12/7/21, 12:05 PM:
-----------------------------------------------------------------------
[~causton] We are seeing a similar issue in our cluster. Have you found any temporary workaround so far? [~junrao] Could you please help us here?

> Replication failing after unclean shutdown of ZK and all brokers
> ----------------------------------------------------------------
>
> Key: KAFKA-13077
> URL: https://issues.apache.org/jira/browse/KAFKA-13077
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Christopher Auston
> Priority: Minor
>
> I am submitting this in the spirit of documenting what can go wrong when an operator violates the constraints Kafka depends on. I don't know whether Kafka could or should handle this more gracefully. I decided to file this issue because it was easy to hit the problem I'm reporting with Kubernetes StatefulSets (STS). By "easy" I mean that I did not go out of my way to corrupt anything; I simply was not careful when restarting ZK and the brokers.
> I violated the constraints of keeping ZooKeeper stable and keeping at least one in-sync replica running.
> I am running the bitnami/kafka Helm chart on Amazon EKS.
> {quote}% kubectl get po kaf-kafka-0 -ojson | jq '.spec.containers[].image'
> "docker.io/bitnami/kafka:2.8.0-debian-10-r43"
> {quote}
> I started with 3 ZK instances and 3 brokers (both STS). I changed the cpu/memory requests on both STS, and Kubernetes proceeded to restart the ZK and Kafka instances at the same time. If I recall correctly there were some crashes and several restarts, but eventually all the instances were running again. It's possible that all the ZK nodes and all the brokers were unavailable at various points.
> The problem I noticed was that two of the brokers were just continually spitting out messages like:
> {quote}% kubectl logs kaf-kafka-0 --tail 10
> [2021-07-13 14:26:08,871] INFO [ProducerStateManager partition=__transaction_state-0] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-0/00000000000000000001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,871] WARN [Log partition=__transaction_state-0, dir=/bitnami/kafka/data] *Non-monotonic update of high watermark from (offset=2744 segment=[0:1048644]) to (offset=1 segment=[0:169])* (kafka.log.Log)
> [2021-07-13 14:26:08,874] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Truncating to offset 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Loading producer state till offset 2 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [ProducerStateManager partition=__transaction_state-10] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-10/00000000000000000002.snapshot,2)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,877] WARN [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2930 segment=[0:1048717]) to (offset=2 segment=[0:338]) (kafka.log.Log)
> [2021-07-13 14:26:08,880] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Truncating to offset 1 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Loading producer state till offset 1 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [ProducerStateManager partition=__transaction_state-20] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-20/00000000000000000001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,883] WARN [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2956 segment=[0:1048608]) to (offset=1 segment=[0:169]) (kafka.log.Log)
> {quote}
> If I describe that topic I can see that several partitions have a leader of 2 and an ISR of just 2 (NOTE: I added two more brokers and tried to reassign the topic onto brokers 2,3,4, as you can see below). The new brokers also spit out the "non-monotonic update" messages just like the original followers. This describe output is from the following day.
> {{% kafka-topics.sh ${=BS} -topic __transaction_state -describe}}
> {{Topic: __transaction_state TopicId: i7bBNCeuQMWl-ZMpzrnMAw PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=uncompressed,min.insync.replicas=3,cleanup.policy=compact,flush.ms=1000,segment.bytes=104857600,flush.messages=10000,max.message.bytes=1000012,unclean.leader.election.enable=false,retention.bytes=1073741824}}
> {{ Topic: __transaction_state Partition: 0 Leader: 2 Replicas: 4,3,2,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 1 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 2 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 3 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 4 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 5 Leader: 2 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 6 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 7 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 8 Leader: 2 Replicas: 3,2,4,0,1 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 9 Leader: 2 Replicas: 4,2,3,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 10 Leader: 2 Replicas: 2,3,4,1,0 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 11 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 12 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 13 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 14 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 15 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 16 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 17 Leader: 2 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 18 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 19 Leader: 2 Replicas: 2,4,3,0,1 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 20 Leader: 2 Replicas: 3,2,4,0,1 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 21 Leader: 2 Replicas: 4,2,3,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 22 Leader: 2 Replicas: 2,3,4,1,0 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 23 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 24 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 25 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 26 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 27 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 28 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 29 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 30 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 31 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 32 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 33 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 34 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 35 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 36 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 37 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 38 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 39 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 40 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 41 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 42 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 43 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 44 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 45 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 46 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 47 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 48 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 49 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
>
> It seems something got corrupted and the followers will never make progress.
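[Editor's note] A throwaway way to pull the stuck partitions out of describe output like the above. This is a hypothetical helper, not part of the ticket; the two sample lines are copied from the listing, and the threshold of 3 comes from the topic's min.insync.replicas=3 config.

```shell
# Hypothetical helper: scan `kafka-topics.sh --describe` output and flag
# partitions whose ISR has shrunk below min.insync.replicas (3 for this topic).
# The two sample lines mirror partitions 0 and 1 from the describe output above.
describe_output='Topic: __transaction_state Partition: 0 Leader: 2 Replicas: 4,3,2,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0
Topic: __transaction_state Partition: 1 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4'

under=$(echo "$describe_output" | awk -v min_isr=3 '
/Partition:/ {
  # find the token after "Isr:" and count its comma-separated broker ids
  for (i = 1; i <= NF; i++) if ($i == "Isr:") isr = $(i + 1)
  if (split(isr, a, ",") < min_isr) print "under min ISR ->", $0
}')
echo "$under"
```

Recent kafka-topics.sh releases also accept an `--under-min-isr-partitions` flag that does this filtering server-side (availability depends on the Kafka version), which would avoid parsing the text output at all.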
> Even worse, the original followers appear to have truncated their copies, so if the remaining leader replica is the one that is corrupted, it may have caused followers that held more valid data to truncate it away.
> Anyway, for what it's worth, this is something that happened to me. I plan to change the StatefulSets to require manual restarts so that I can control rolling upgrades. It also seems to underscore the value of keeping a separate Kafka cluster for disaster recovery.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
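[Editor's note] The "require manual restarts" plan above maps to the StatefulSet's updateStrategy field. A sketch of the change, assuming the chart lets you set this on the broker (and ZK) StatefulSets; the field names are standard Kubernetes, everything else here is an assumption:

```yaml
# Sketch: with OnDelete, editing the StatefulSet spec (e.g. cpu/memory
# requests) no longer triggers an automatic rolling restart; pods are
# replaced only when an operator deletes them explicitly.
spec:
  updateStrategy:
    type: OnDelete   # the default is RollingUpdate
```

With this in place, an operator can roll the brokers one at a time by deleting pods by hand and waiting for each replica to rejoin the ISR before moving on, which avoids the simultaneous ZK-and-broker restarts that triggered this incident.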