[jira] [Commented] (KAFKA-15407) Not able to connect to kafka from the Private NLB from outside the VPC account
[ https://issues.apache.org/jira/browse/KAFKA-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759487#comment-17759487 ]

Shivakumar commented on KAFKA-15407:
------------------------------------

[~viktorsomogyi] can you please help us with this issue?

> Not able to connect to kafka from the Private NLB from outside the VPC account
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-15407
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15407
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, connect, consumer, producer, protocol
>         Environment: Staging, PROD
>            Reporter: Shivakumar
>            Priority: Blocker
>         Attachments: image-2023-08-28-12-37-33-100.png
>
> !image-2023-08-28-12-37-33-100.png|width=768,height=223!
> Problem statement:
> We are trying to connect to Kafka from another account/VPC.
> Our Kafka runs in an EKS cluster, with a Service pointing to the broker pods for connections.
> We created a PrivateLink endpoint in Account B to connect to our NLB, which fronts our Kafka in Account A.
> We see connection resets from both the client and the target (Kafka) in the NLB monitoring tab of AWS.
> We tried various combinations of listeners and advertised listeners, none of which helped.
> We assume we are missing some combination of listener and network-level configs with which this connection can be made.
> Can you please guide us, as we are blocked on a major migration.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
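For reference, NLB/PrivateLink connection resets of this kind often come down to `advertised.listeners`: the broker must advertise an address that is resolvable and routable from the client's VPC (typically the PrivateLink endpoint DNS name), usually with a distinct NLB listener port per broker. A minimal `server.properties` sketch for one broker; every hostname and port below is a placeholder, not taken from this ticket:

```properties
# Hypothetical sketch -- all names/ports are placeholders.
# INTERNAL is used inside the cluster; EXTERNAL is what PrivateLink clients see.
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
advertised.listeners=INTERNAL://kafka-0.kafka-headless.ns.svc.cluster.local:9092,EXTERNAL://vpce-0123-example.vpce-svc-0123.us-east-1.vpce.amazonaws.com:9440
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```

In this kind of setup each broker advertises a unique EXTERNAL port (9440, 9441, ...) and the NLB forwards each port to the matching broker; otherwise the bootstrap connection succeeds but the client's follow-up connection to the advertised address is reset.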
[jira] [Created] (KAFKA-15407) Not able to connect to kafka from the Private NLB from outside the VPC account
Shivakumar created KAFKA-15407:
----------------------------------

             Summary: Not able to connect to kafka from the Private NLB from outside the VPC account
                 Key: KAFKA-15407
                 URL: https://issues.apache.org/jira/browse/KAFKA-15407
             Project: Kafka
          Issue Type: Bug
          Components: clients, connect, consumer, producer, protocol
         Environment: Staging, PROD
            Reporter: Shivakumar
         Attachments: image-2023-08-28-12-37-33-100.png

!image-2023-08-28-12-37-33-100.png|width=768,height=223!

Problem statement:
We are trying to connect to Kafka from another account/VPC.
Our Kafka runs in an EKS cluster, with a Service pointing to the broker pods for connections.
We created a PrivateLink endpoint in Account B to connect to our NLB, which fronts our Kafka in Account A.
We see connection resets from both the client and the target (Kafka) in the NLB monitoring tab of AWS.
We tried various combinations of listeners and advertised listeners, none of which helped.
We assume we are missing some combination of listener and network-level configs with which this connection can be made.
Can you please guide us, as we are blocked on a major migration.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (KAFKA-13805) Upgrade vulnerable dependencies march 2022
[ https://issues.apache.org/jira/browse/KAFKA-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-13805:
-------------------------------
    Reviewer: Abhishek

> Upgrade vulnerable dependencies march 2022
> ------------------------------------------
>
>                 Key: KAFKA-13805
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13805
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1, 3.0.1
>            Reporter: Shivakumar
>            Priority: Blocker
>              Labels: security
>
> https://nvd.nist.gov/vuln/detail/CVE-2020-36518
> ||Packages||Package Version||CVSS||Fix Status||
> |com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.13.2.1|
> |com.fasterxml.jackson.core_jackson-databind|2.13.1|7.5|fixed in 2.13.2.1|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
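The scan tables above pair each package's current version with the version carrying the fix, so triage reduces to a version comparison. A small sketch using `sort -V` (GNU/busybox version ordering; not part of strict POSIX) with the jackson-databind versions from the table:

```shell
# ver_lt A B -> success (exit 0) when version A sorts strictly before version B.
ver_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# jackson-databind 2.10.5.1 sorts before the 2.13.2.1 fix, so it is affected.
if ver_lt 2.10.5.1 2.13.2.1; then
  echo "com.fasterxml.jackson.core:jackson-databind 2.10.5.1 needs upgrade"
fi
```

The same check applies to any row of the scan output: compare the "Package Version" column against the version in "Fix Status".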
[jira] [Updated] (KAFKA-14100) Upgrade vulnerable dependencies Kafka version 3.1.1 July 2022
[ https://issues.apache.org/jira/browse/KAFKA-14100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-14100:
-------------------------------
    Description: 
||CVE ID||Type||Severity||Packages||Package Version||CVSS||Fix Status||
|CVE-2022-2048|java|high|org.eclipse.jetty_jetty-io|9.4.44.v20210927|7.5|fixed in 11.0.9, 10.0.9, 9.4.47|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

  was:
|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

> Upgrade vulnerable dependencies Kafka version 3.1.1 July 2022
> -------------------------------------------------------------
>
>                 Key: KAFKA-14100
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14100
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Shivakumar
>            Priority: Major
>              Labels: security
>             Fix For: 3.0.1, 3.2.0, 3.1.1
>
> ||CVE ID||Type||Severity||Packages||Package Version||CVSS||Fix Status||
> |CVE-2022-2048|java|high|org.eclipse.jetty_jetty-io|9.4.44.v20210927|7.5|fixed in 11.0.9, 10.0.9, 9.4.47|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-14100) Upgrade vulnerable dependencies Kafka version 3.1.1 July 2022
Shivakumar created KAFKA-14100:
----------------------------------

             Summary: Upgrade vulnerable dependencies Kafka version 3.1.1 July 2022
                 Key: KAFKA-14100
                 URL: https://issues.apache.org/jira/browse/KAFKA-14100
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.8.1
            Reporter: Shivakumar
             Fix For: 3.0.1, 3.2.0, 3.1.1

|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (KAFKA-13726) Fix Vulnerability CVE-2022-23181 -Upgrade org.apache.tomcat.embed_tomcat-embed-core
[ https://issues.apache.org/jira/browse/KAFKA-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554353#comment-17554353 ]

Shivakumar commented on KAFKA-13726:
------------------------------------

We are getting this vulnerability in the *3.1.1 Kafka* version as well.

> Fix Vulnerability CVE-2022-23181 -Upgrade org.apache.tomcat.embed_tomcat-embed-core
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-13726
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13726
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Chris Sabelstrom
>            Priority: Major
>
> Our security scanner detected the following vulnerability. Please upgrade to the version noted in the Fix Status column.
> |CVE ID|Severity|Packages|Package Version|CVSS|Fix Status|
> |CVE-2022-23181|high|org.apache.tomcat.embed_tomcat-embed-core|9.0.54|7|fixed in 10.0.0, 9.0.1|

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Updated] (KAFKA-13805) Upgrade vulnerable dependencies march 2022
[ https://issues.apache.org/jira/browse/KAFKA-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-13805:
-------------------------------
    Description: 
|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.13.0|
|com.fasterxml.jackson.core_jackson-databind|2.13.1|7.5|fixed in 2.13.0|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

  was:
|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

> Upgrade vulnerable dependencies march 2022
> ------------------------------------------
>
>                 Key: KAFKA-13805
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13805
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Shivakumar
>            Priority: Major
>              Labels: security
>             Fix For: 3.0.1, 3.2.0, 3.1.1
>
> |Packages|Package Version|CVSS|Fix Status|
> |com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.13.0|
> |com.fasterxml.jackson.core_jackson-databind|2.13.1|7.5|fixed in 2.13.0|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Created] (KAFKA-13805) Upgrade vulnerable dependencies march 2022
Shivakumar created KAFKA-13805:
----------------------------------

             Summary: Upgrade vulnerable dependencies march 2022
                 Key: KAFKA-13805
                 URL: https://issues.apache.org/jira/browse/KAFKA-13805
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.8.1
            Reporter: Shivakumar
             Fix For: 3.0.1, 3.2.0, 3.1.1

|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (KAFKA-13658) Upgrade vulnerable dependencies jan 2022
[ https://issues.apache.org/jira/browse/KAFKA-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-13658:
-------------------------------
    Description: 
|Packages|Package Version|CVSS|Fix Status|
|com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

  was:
|Packages|Package Version|CVSS|Fix Status|
|io.netty_netty-codec (CVE-2021-43797)|4.1.62.Final|6.5|fixed in 4.1.71|
|org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
|org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

> Upgrade vulnerable dependencies jan 2022
> ----------------------------------------
>
>                 Key: KAFKA-13658
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13658
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Shivakumar
>            Assignee: Luke Chen
>            Priority: Major
>              Labels: security
>
> |Packages|Package Version|CVSS|Fix Status|
> |com.fasterxml.jackson.core_jackson-databind|2.10.5.1|7.5|fixed in 2.14, 2.13.1, 2.12.6|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Created] (KAFKA-13658) Upgrade vulnerable dependencies jan 2022
Shivakumar created KAFKA-13658:
----------------------------------

             Summary: Upgrade vulnerable dependencies jan 2022
                 Key: KAFKA-13658
                 URL: https://issues.apache.org/jira/browse/KAFKA-13658
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.8.1
            Reporter: Shivakumar
            Assignee: Luke Chen

|Packages|Package Version|CVSS|Fix Status|
|io.netty_netty-codec (CVE-2021-43797)|4.1.62.Final|6.5|fixed in 4.1.71|
|org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
|org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
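The flagged artifact/version pairs in tables like the one above can be cross-checked against a build's resolved dependency list. A sketch with an inlined, hypothetical report; in a real build the list would come from something like `./gradlew dependencies`:

```shell
# Hypothetical resolved-dependency report -- only the filtering matters here.
report='org.eclipse.jetty:jetty-server:9.4.43.v20210629
org.eclipse.jetty:jetty-servlet:9.4.43.v20210629
io.netty:netty-codec:4.1.62.Final
org.apache.kafka:kafka-clients:2.8.1'

# Keep only the artifacts named in the scan table.
printf '%s\n' "$report" | grep -E 'jetty-(server|servlet)|netty-codec'
```

Each surviving line can then be compared against the "Fix Status" version from the scan to decide whether an upgrade is still pending.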
[jira] [Updated] (KAFKA-13579) Upgrade vulnerable dependencies
[ https://issues.apache.org/jira/browse/KAFKA-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-13579:
-------------------------------
    Summary: Upgrade vulnerable dependencies  (was: Upgrade vulnerable dependencies for jetty packages)

> Upgrade vulnerable dependencies
> -------------------------------
>
>                 Key: KAFKA-13579
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13579
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Shivakumar
>            Assignee: Luke Chen
>            Priority: Major
>              Labels: security
>
> |Packages|Package Version|CVSS|Fix Status|
> |io.netty_netty-codec (CVE-2021-43797)|4.1.62.Final|6.5|fixed in 4.1.71|
> |org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
> |org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (KAFKA-13579) Upgrade vulnerable dependencies for jetty packages
[ https://issues.apache.org/jira/browse/KAFKA-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivakumar updated KAFKA-13579:
-------------------------------
    Description: 
|Packages|Package Version|CVSS|Fix Status|
|io.netty_netty-codec (CVE-2021-43797)|4.1.62.Final|6.5|fixed in 4.1.71|
|org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
|org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

  was:
|Packages|Package Version|CVSS|Fix Status|
|org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
|org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

> Upgrade vulnerable dependencies for jetty packages
> --------------------------------------------------
>
>                 Key: KAFKA-13579
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13579
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Shivakumar
>            Priority: Major
>              Labels: security
>
> |Packages|Package Version|CVSS|Fix Status|
> |io.netty_netty-codec (CVE-2021-43797)|4.1.62.Final|6.5|fixed in 4.1.71|
> |org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
> |org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|
> Our security scan detected the above vulnerabilities.
> Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Created] (KAFKA-13579) Upgrade vulnerable dependencies for jetty packages
Shivakumar created KAFKA-13579:
----------------------------------

             Summary: Upgrade vulnerable dependencies for jetty packages
                 Key: KAFKA-13579
                 URL: https://issues.apache.org/jira/browse/KAFKA-13579
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.8.1
            Reporter: Shivakumar

|Packages|Package Version|CVSS|Fix Status|
|org.eclipse.jetty_jetty-server|9.4.43.v20210629|5.5|fixed in 9.4.44|
|org.eclipse.jetty_jetty-servlet|9.4.43.v20210629|5.5|fixed in 9.4.44|

Our security scan detected the above vulnerabilities.
Please upgrade to the correct versions to fix them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (KAFKA-13077) Replication failing after unclean shutdown of ZK and all brokers
[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461952#comment-17461952 ]

Shivakumar commented on KAFKA-13077:
------------------------------------

[~junrao]: We have restarted ZK and also the Kafka brokers, but we still end up with ISR=2 and Leader=2, and the other brokers stay out of the ISR. This scenario is not resolved by waiting or by a rolling restart of the brokers. We are looking for a solution that does not involve data loss by deleting the data directory.

> Replication failing after unclean shutdown of ZK and all brokers
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13077
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13077
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Christopher Auston
>            Priority: Minor
>
> I am submitting this in the spirit of what can go wrong when an operator violates the constraints Kafka depends on. I don't know if Kafka could or should handle this more gracefully. I decided to file this issue because it was easy to get the problem I'm reporting with Kubernetes StatefulSets (STS). By "easy" I mean that I did not go out of my way to corrupt anything, I just was not careful when restarting ZK and brokers.
> I violated the constraints of keeping Zookeeper stable and at least one running in-sync replica.
> I am running the bitnami/kafka helm chart on Amazon EKS.
> {quote}% kubectl get po kaf-kafka-0 -ojson | jq .spec.containers'[].image'
> "docker.io/bitnami/kafka:2.8.0-debian-10-r43"
> {quote}
> I started with 3 ZK instances and 3 brokers (both STS). I changed the cpu/memory requests on both STS and Kubernetes proceeded to restart the ZK and Kafka instances at the same time. If I recall correctly there were some crashes and several restarts, but eventually all the instances were running again. It's possible all ZK nodes and all brokers were unavailable at various points.
> The problem I noticed was that two of the brokers were just continually spitting out messages like:
> {quote}% kubectl logs kaf-kafka-0 --tail 10
> [2021-07-13 14:26:08,871] INFO [ProducerStateManager partition=__transaction_state-0] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-0/0001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,871] WARN [Log partition=__transaction_state-0, dir=/bitnami/kafka/data] *Non-monotonic update of high watermark from (offset=2744 segment=[0:1048644]) to (offset=1 segment=[0:169])* (kafka.log.Log)
> [2021-07-13 14:26:08,874] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Truncating to offset 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Loading producer state till offset 2 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [ProducerStateManager partition=__transaction_state-10] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-10/0002.snapshot,2)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,877] WARN [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2930 segment=[0:1048717]) to (offset=2 segment=[0:338]) (kafka.log.Log)
> [2021-07-13 14:26:08,880] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Truncating to offset 1 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Loading producer state till offset 1 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [ProducerStateManager partition=__transaction_state-20] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-20/0001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,883] WARN [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2956 segment=[0:1048608]) to (offset=1 segment=[0:169]) (kafka.log.Log)
> {quote}
> If I describe that topic I can see that several partitions have a leader of 2 and the ISR is just 2 (NOTE I added two more brokers and tried to reassign the topic onto brokers 2,3,4 which you can see below). The new brokers also spit out the messages about "non-monotonic update" just like the original followers. This describe output is from the following day.
> {{% kafka-topics.sh ${=BS} -topic __transaction_state -describe}}
> {{Topic: __transaction_state TopicId: i7bBNCeuQMWl-ZMpzrnMAw PartitionCount: 50 ReplicationFactor: 3 Configs:
[jira] [Commented] (KAFKA-13077) Replication failing after unclean shutdown of ZK and all brokers
[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461241#comment-17461241 ]

Shivakumar commented on KAFKA-13077:
------------------------------------

[~junrao] could you please suggest what we can do, or recommend any changes to fix this?

> Replication failing after unclean shutdown of ZK and all brokers
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13077
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13077
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Christopher Auston
>            Priority: Minor
[jira] [Commented] (KAFKA-13077) Replication failing after unclean shutdown of ZK and all brokers
[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461239#comment-17461239 ]

Shivakumar commented on KAFKA-13077:
------------------------------------

Hi [~junrao], we did not save the DumpLogSegments output during the incident, but we were able to reproduce the error and got this output; it should be the same as in the case of our error above.

{quote}
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ ls -l
total 32
-rw-rw-r-- 1 kafka kafka        0 Dec  9 11:40 .index
-rw-rw-r-- 1 kafka kafka      877 Dec  9 11:40 .log
-rw-rw-r-- 1 kafka kafka       12 Dec  9 11:40 .timeindex
-rw-rw-r-- 1 kafka kafka 10485760 Dec 13 13:20 04804743.index
-rw-rw-r-- 1 kafka kafka      207 Dec 11 21:01 04804743.log
-rw-rw-r-- 1 kafka kafka       10 Dec 11 21:01 04804743.snapshot
-rw-rw-r-- 1 kafka kafka 10485756 Dec 13 13:20 04804743.timeindex
-rw-rw-r-- 1 kafka kafka       10 Dec 13 13:11 04804745.snapshot
-rw-r--r-- 1 kafka kafka      132 Dec 13 13:20 leader-epoch-checkpoint
-rw-rw-r-- 1 kafka kafka       43 Dec  1 11:13 partition.metadata

kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 04804743.index
Dumping 04804743.index
offset: 4804743 position: 0
Mismatches in :/var/lib/kafka/data/__consumer_offsets-46/04804743.index
  Index offset: 4804743, log offset: 4804744

kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 04804743.log
Dumping 04804743.log
Starting offset: 4804743
baseOffset: 4804743 lastOffset: 4804744 count: 2 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 368 isTransactional: false isControl: false position: 0 CreateTime: 1639256468568 size: 207 magic: 2 compresscodec: NONE crc: 2267717758 isvalid: true

kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 04804743.timeindex
Dumping 04804743.timeindex
timestamp: 1639256468568 offset: 4804744

kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 04804745.snapshot
Dumping 04804745.snapshot
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$
{quote}

> Replication failing after unclean shutdown of ZK and all brokers
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13077
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13077
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Christopher Auston
>            Priority: Minor
[jira] [Comment Edited] (KAFKA-13077) Replication failing after unclean shutdown of ZK and all brokers
[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456515#comment-17456515 ]

Shivakumar edited comment on KAFKA-13077 at 12/9/21, 3:44 PM:
--------------------------------------------------------------

Hi [~junrao], here is the summary of our issue; hope you can help us here.

Kafka (2.8.1) and ZooKeeper (3.6.3) in EKS, Kubernetes 1.19
kafka cluster size = 3
zk cluster size = 3

1) After a rolling restart of ZK, sometimes all the partitions of the topic become out of sync, especially for broker 2: ISR=2, Leader=2, and the other brokers are out of the ISR.

Topic: __consumer_offsets  PartitionCount: 50  ReplicationFactor: 3  Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
  Topic: __consumer_offsets  Partition: 0   Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 1   Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 2   Leader: 2  Replicas: 2,0,1  Isr: 2,1,0
  Topic: __consumer_offsets  Partition: 3   Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 4   Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 5   Leader: 2  Replicas: 2,1,0  Isr: 2
  Topic: __consumer_offsets  Partition: 6   Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 7   Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 8   Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 9   Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 10  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 11  Leader: 2  Replicas: 2,1,0  Isr: 2
  Topic: __consumer_offsets  Partition: 12  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 13  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 14  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 15  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 16  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 17  Leader: 2  Replicas: 2,1,0  Isr: 2
  Topic: __consumer_offsets  Partition: 18  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 19  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 20  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 21  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 22  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 23  Leader: 2  Replicas: 2,1,0  Isr: 2
  Topic: __consumer_offsets  Partition: 24  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 25  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 26  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 27  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 28  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 29  Leader: 2  Replicas: 2,1,0  Isr: 2,1,0
  Topic: __consumer_offsets  Partition: 30  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 31  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 32  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 33  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 34  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 35  Leader: 2  Replicas: 2,1,0  Isr: 2
  Topic: __consumer_offsets  Partition: 36  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 37  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 38  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 39  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 40  Leader: 2  Replicas: 1,0,2  Isr: 2
  Topic: __consumer_offsets  Partition: 41  Leader: 2  Replicas: 2,1,0  Isr: 2,1,0
  Topic: __consumer_offsets  Partition: 42  Leader: 2  Replicas: 0,1,2  Isr: 2
  Topic: __consumer_offsets  Partition: 43  Leader: 2  Replicas: 1,2,0  Isr: 2
  Topic: __consumer_offsets  Partition: 44  Leader: 2  Replicas: 2,0,1  Isr: 2
  Topic: __consumer_offsets  Partition: 45  Leader: 2  Replicas: 0,2,1  Isr: 2
  Topic: __consumer_offsets  Partition: 46  Leader: 2
[jira] [Commented] (KAFKA-13077) Replication failing after unclean shutdown of ZK and all brokers
[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456515#comment-17456515 ] Shivakumar commented on KAFKA-13077:

Hi [~junrao], here is a summary of our issue; we hope you can help us.

Environment: Kafka 2.8.1 and ZooKeeper 3.6.3 on EKS (Kubernetes 1.19); Kafka cluster size = 3, ZooKeeper cluster size = 3.

1) After a rolling restart of ZooKeeper, all partitions of a topic sometimes become out of sync. In particular, broker 2 ends up as leader and the only in-sync replica (Leader: 2, Isr: 2) while the other brokers drop out of the ISR:

Topic: __consumer_offsets PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets Partition: 0 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 2 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
Topic: __consumer_offsets Partition: 3 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 4 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 5 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: __consumer_offsets Partition: 6 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 7 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 8 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 9 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 10 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 11 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: __consumer_offsets Partition: 12 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 13 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 14 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 15 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 16 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 17 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: __consumer_offsets Partition: 18 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 19 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 20 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 21 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 22 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 23 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: __consumer_offsets Partition: 24 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 25 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 26 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 27 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 28 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 29 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
Topic: __consumer_offsets Partition: 30 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 31 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 32 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 33 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 34 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 35 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: __consumer_offsets Partition: 36 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 37 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 38 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 39 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 40 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: __consumer_offsets Partition: 41 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
Topic: __consumer_offsets Partition: 42 Leader: 2 Replicas: 0,1,2 Isr: 2
Topic: __consumer_offsets Partition: 43 Leader: 2 Replicas: 1,2,0 Isr: 2
Topic: __consumer_offsets Partition: 44 Leader: 2 Replicas: 2,0,1 Isr: 2
Topic: __consumer_offsets Partition: 45 Leader: 2 Replicas: 0,2,1 Isr: 2
Topic: __consumer_offsets Partition: 46 Leader: 2 Replicas: 1,0,2 Isr: 2
Topic: