[jira] [Resolved] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2
[ https://issues.apache.org/jira/browse/KAFKA-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16757.
---------------------------------
    Resolution: Fixed

> Fix broker re-registration issues around MV 3.7-IV2
> ---------------------------------------------------
>
>                 Key: KAFKA-16757
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16757
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Colin McCabe
>            Assignee: Colin McCabe
>            Priority: Major
>             Fix For: 3.7.1, 3.8
>
> When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend
> the broker registration, so that the controller can record the storage
> directories. The current code for doing this has several problems, however.
> One is that it tends to trigger even in cases where we don't actually need
> it. Another is that when re-registering the broker, the broker is marked as
> fenced.
>
> This PR moves the handling of the re-registration case out of
> BrokerMetadataPublisher and into BrokerRegistrationTracker. The
> re-registration code there will only trigger in the case where the broker
> sees an existing registration for itself with no directories set. This is
> much more targeted than the original code.
>
> Additionally, in ClusterControlManager, when re-registering the same broker,
> we now preserve its fencing and shutdown state, rather than clearing those.
> (There isn't any good reason re-registering the same broker should clear
> these things... this was purely an oversight.) Note that we can tell the
> broker is "the same" because it has the same IncarnationId.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
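The trigger condition described in the fix above can be sketched as follows. This is an illustrative standalone sketch, not the actual `BrokerRegistrationTracker` code: the type and method names (`Registration`, `shouldReRegister`) are hypothetical, but the decision logic follows the ticket's description — re-register only when the controller's existing registration carries the same IncarnationId and no directories.

```java
import java.util.List;
import java.util.UUID;

class RegistrationCheck {
    // Hypothetical stand-in for the controller's recorded broker registration.
    record Registration(UUID incarnationId, List<UUID> directories) {}

    static boolean shouldReRegister(Registration existing, UUID ourIncarnationId) {
        if (existing == null) {
            return false; // no prior registration: the normal startup path registers anyway
        }
        if (!existing.incarnationId().equals(ourIncarnationId)) {
            return false; // a different incarnation; not the "same broker" case from the ticket
        }
        // Only re-register when no directories were recorded, i.e. the broker
        // registered under a pre-3.7-IV2 metadata version.
        return existing.directories().isEmpty();
    }
}
```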
[jira] [Resolved] (KAFKA-16645) CVEs in 3.7.0 docker image
[ https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16645.
---------------------------------
    Resolution: Resolved

> CVEs in 3.7.0 docker image
> --------------------------
>
>                 Key: KAFKA-16645
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16645
>             Project: Kafka
>          Issue Type: Task
>    Affects Versions: 3.7.0
>            Reporter: Mickael Maison
>            Assignee: Igor Soarez
>            Priority: Blocker
>             Fix For: 3.8.0, 3.7.1
>
> Our [Docker Image CVE
> Scanner|https://github.com/apache/kafka/actions/runs/874393] GitHub
> action reports 2 high CVEs in our base image:
>
> apache/kafka:3.7.0 (alpine 3.19.1)
> ==================================
> Total: 2 (HIGH: 2, CRITICAL: 0)
>
> Library  | Vulnerability  | Severity | Status | Installed Version | Fixed Version | Title
> ---------|----------------|----------|--------|-------------------|---------------|------------------------------------------------------------
> libexpat | CVE-2023-52425 | HIGH     | fixed  | 2.5.0-r2          | 2.6.0-r0      | expat: parsing large tokens can trigger a denial of service
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2023-52425
> libexpat | CVE-2024-28757 | HIGH     | fixed  | 2.5.0-r2          | 2.6.2-r0      | expat: XML Entity Expansion
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2024-28757
>
> Looking at the
> [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka#KIP975:DockerImageforApacheKafka-WhatifweobserveabugoracriticalCVEinthereleasedApacheKafkaDockerImage?]
> that introduced the docker images, it seems we should release a bugfix when
> high CVEs are detected. It would be good to investigate and assess whether
> Kafka is impacted or not.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Reopened] (KAFKA-16645) CVEs in 3.7.0 docker image
[ https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez reopened KAFKA-16645:
---------------------------------

Need to re-open to change the resolution, release_notes.py doesn't like the one I picked

> CVEs in 3.7.0 docker image
> --------------------------
>
>                 Key: KAFKA-16645
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16645
>             Project: Kafka
>          Issue Type: Task
>    Affects Versions: 3.7.0
>            Reporter: Mickael Maison
>            Assignee: Igor Soarez
>            Priority: Blocker
>             Fix For: 3.8.0, 3.7.1

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Reopened] (KAFKA-16692) InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka 3.5 to 3.6
[ https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez reopened KAFKA-16692:
---------------------------------

Re-opening as 3.6 backport is still missing

> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not
> enabled when upgrading from kafka 3.5 to 3.6
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-16692
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16692
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.7.0, 3.6.1, 3.8
>            Reporter: Johnson Okorie
>            Assignee: Justine Olshan
>            Priority: Major
>             Fix For: 3.7.1, 3.8
>
> We have a kafka cluster running on version 3.5.2 that we are upgrading to
> 3.6.1. This cluster has a lot of clients with exactly-once semantics enabled
> and hence creating transactions. As we replaced brokers with the new
> binaries, we observed lots of clients in the cluster experiencing the
> following error:
>
> {code:java}
> 2024-05-07T09:08:10.039Z "tid": "" -- [Producer clientId=,
> transactionalId=] Got error produce response with
> correlation id 6402937 on topic-partition , retrying
> (2147483512 attempts left). Error: NETWORK_EXCEPTION.
> Error Message: The server disconnected before a response was received.{code}
>
> On inspecting the broker, we saw the following errors on brokers still
> running Kafka version 3.5.2:
>
> {code:java}
> message:
> Closing socket for because of error
> exception_exception_class:
> org.apache.kafka.common.errors.InvalidRequestException
> exception_exception_message:
> Received request api key ADD_PARTITIONS_TO_TXN with version 4 which is not
> enabled
> exception_stacktrace:
> org.apache.kafka.common.errors.InvalidRequestException: Received request api
> key ADD_PARTITIONS_TO_TXN with version 4 which is not enabled
> {code}
>
> On the new brokers running 3.6.1 we saw the following errors:
>
> {code:java}
> [AddPartitionsToTxnSenderThread-1055]: AddPartitionsToTxnRequest failed for
> node 1043 with a network exception.{code}
>
> I can also see this:
>
> {code:java}
> [AddPartitionsToTxnManager broker=1055] Cancelled in-flight
> ADD_PARTITIONS_TO_TXN request with correlation id 21120 due to node 1043
> being disconnected (elapsed time since creation: 11ms, elapsed time since
> send: 4ms, request timeout: 3ms){code}
>
> We started investigating this issue and, digging through the changes in 3.6,
> we came across some changes introduced as part of KAFKA-14402 that we thought
> might lead to this behaviour.
>
> First we could see that _transaction.partition.verification.enable_ is
> enabled by default and enables a new code path that culminates in sending
> version 4 ADD_PARTITIONS_TO_TXN requests to other brokers, generated
> [here|https://github.com/apache/kafka/blob/29f3260a9c07e654a28620aeb93a778622a5233d/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L269].
>
> From a
> [discussion|https://lists.apache.org/thread/4895wrd1z92kjb708zck4s1f62xq6r8x]
> on the mailing list, [~jolshan] pointed out that this scenario shouldn't be
> possible, as the following code paths should prevent version 4
> ADD_PARTITIONS_TO_TXN requests being sent to other brokers:
>
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/clients/src/main/java/org/apache/kafka/clients/NodeApiVersions.java#L130]
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L195]
>
> However, these requests are still sent to other brokers in our environment.
> On further inspection of the code, I am wondering if the following code path
> could lead to this issue:
>
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L500]
>
> In this scenario, we don't have any _NodeApiVersions_ available for the
> specified nodeId, potentially skipping the _latestUsableVersion_ check. I
> am wondering if it is possible that because _discoverBrokerVersions_ is set
> to _false_ for the network client of the {_}AddPartitionsToTxnManager{_}, it
> skips fetching {_}NodeApiVersions{_}? I can see that we create the network
> client here:
>
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/core/src/main/scala/kafka/server/KafkaServer.scala#L641]
>
> The _NetworkUtils.buildNetworkClient_ method seems to create a network client
> that has _discoverBrokerVersions_ set to {_}false{_}.
>
> I was hoping I could get some assistance debugging this issue. Happy to
> provide any additional information needed.

--
This message was sent by Atlassian Jira
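The gating the reporter describes can be illustrated with a small standalone sketch. This is not the actual Kafka `NodeApiVersions`/`NetworkClient` code; the names here (`VersionGate`, `usableVersion`) are hypothetical. It shows the shape of the bug being hypothesized: a client should clamp the request version to what the destination node advertises, but if it never fetched that node's ApiVersions (as when `discoverBrokerVersions` is false), there is nothing to clamp against and the latest version can leak through.

```java
import java.util.Map;

class VersionGate {
    // nodeMaxVersions: the max ADD_PARTITIONS_TO_TXN version each node is
    // known to support, keyed by node id; absent means ApiVersions was never
    // fetched for that node.
    static short usableVersion(Map<Integer, Short> nodeMaxVersions, int nodeId, short latestVersion) {
        Short nodeMax = nodeMaxVersions.get(nodeId);
        if (nodeMax == null) {
            // No NodeApiVersions for this node: the clamp is skipped and the
            // latest version (e.g. v4) is used, even against an older broker.
            return latestVersion;
        }
        return (short) Math.min(latestVersion, nodeMax);
    }
}
```

Under this reading, a 3.5.2 broker that only supports v3 would still receive a v4 request whenever its entry is missing from the map.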
[jira] [Resolved] (KAFKA-16692) InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka 3.5 to 3.6
[ https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16692.
---------------------------------
    Resolution: Fixed

> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not
> enabled when upgrading from kafka 3.5 to 3.6
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-16692
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16692
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.7.0, 3.6.1, 3.8
>            Reporter: Johnson Okorie
>            Assignee: Justine Olshan
>            Priority: Major
>             Fix For: 3.7.1, 3.8

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (KAFKA-16645) CVEs in 3.7.0 docker image
[ https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16645.
---------------------------------
    Assignee: Igor Soarez
    Resolution: Won't Fix

The vulnerability has already been addressed in the base image, under the same
image tag, so the next published Kafka images will not ship the vulnerability.
We do not republish previous releases, so we're not taking any action here.

> CVEs in 3.7.0 docker image
> --------------------------
>
>                 Key: KAFKA-16645
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16645
>             Project: Kafka
>          Issue Type: Task
>    Affects Versions: 3.7.0
>            Reporter: Mickael Maison
>            Assignee: Igor Soarez
>            Priority: Blocker
>             Fix For: 3.8.0, 3.7.1

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (KAFKA-16688) SystemTimer leaks resources on close
[ https://issues.apache.org/jira/browse/KAFKA-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16688.
---------------------------------
    Resolution: Fixed

> SystemTimer leaks resources on close
> ------------------------------------
>
>                 Key: KAFKA-16688
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16688
>             Project: Kafka
>          Issue Type: Test
>    Affects Versions: 3.8.0
>            Reporter: Gaurav Narula
>            Assignee: Gaurav Narula
>            Priority: Major
>
> We observe some thread leaks with thread name {{executor-client-metrics}}.
> This may happen because {{SystemTimer}} doesn't attempt to shut down its
> executor service properly.
>
> Refer to
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15885/1/tests
> and the tests with {{initializationError}} in them for the stacktrace.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
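The kind of fix this ticket calls for follows the standard `ExecutorService` shutdown pattern. The sketch below is illustrative only (the class name `LeakFreeTimer` and the 5-second grace period are assumptions, not the actual `SystemTimer` code): a timer that owns an executor must shut it down in `close()`, otherwise its threads outlive the timer and show up as leaks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LeakFreeTimer implements AutoCloseable {
    private final ExecutorService executor =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "executor-client-metrics"));

    boolean isTerminated() {
        return executor.isTerminated();
    }

    @Override
    public void close() {
        executor.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // grace period elapsed: cancel stragglers
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```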
[jira] [Resolved] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV
[ https://issues.apache.org/jira/browse/KAFKA-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16624.
---------------------------------
    Resolution: Fixed

> Don't generate useless PartitionChangeRecord on older MV
> --------------------------------------------------------
>
>                 Key: KAFKA-16624
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16624
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Colin McCabe
>            Assignee: Colin McCabe
>            Priority: Minor
>
> Fix a case where we could generate useless PartitionChangeRecords on metadata
> versions older than 3.6-IV0. This could happen in the case where we had an
> ISR with only one broker in it, and we were trying to go down to a fully
> empty ISR. In this case, PartitionChangeBuilder would prevent the ISR from
> going down to fully empty (since that is not valid in these pre-KIP-966
> metadata versions), but it would still emit the record, even though it had
> no effect.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
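The bug pattern above can be reduced to a small standalone sketch (hypothetical types, not the actual `PartitionChangeBuilder`): once a proposed change has been rejected by version-specific validation, the resulting record carries no fields, and emitting it is pure overhead; the fix is to detect and drop such no-op records.

```java
import java.util.List;

class ChangeBuilder {
    // Minimal stand-in for a change record where null means "field unchanged".
    record PartitionChangeRecord(List<Integer> isr, Integer leader) {}

    // A record that changes nothing should not be emitted at all.
    static boolean isNoOp(PartitionChangeRecord r) {
        return r.isr() == null && r.leader() == null;
    }
}
```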
[jira] [Created] (KAFKA-16636) Flaky test - testStickyTaskAssignorLargePartitionCount – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
Igor Soarez created KAFKA-16636:
-----------------------------------

             Summary: Flaky test - testStickyTaskAssignorLargePartitionCount – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
                 Key: KAFKA-16636
                 URL: https://issues.apache.org/jira/browse/KAFKA-16636
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez
         Attachments: log (1).txt

testStickyTaskAssignorLargePartitionCount – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest

{code:java}
java.lang.AssertionError: The first assignment took too long to complete at 131680ms.
	at org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.completeLargeAssignment(StreamsAssignmentScaleTest.java:220)
	at org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testStickyTaskAssignorLargePartitionCount(StreamsAssignmentScaleTest.java:102)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16635) Flaky test "shouldThrottleOldSegments(String).quorum=kraft" – kafka.server.ReplicationQuotasTest
Igor Soarez created KAFKA-16635:
-----------------------------------

             Summary: Flaky test "shouldThrottleOldSegments(String).quorum=kraft" – kafka.server.ReplicationQuotasTest
                 Key: KAFKA-16635
                 URL: https://issues.apache.org/jira/browse/KAFKA-16635
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez

"shouldThrottleOldSegments(String).quorum=kraft" – kafka.server.ReplicationQuotasTest

{code:java}
org.opentest4j.AssertionFailedError: Throttled replication of 2203ms should be > 3600.0ms ==> expected: but was:
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
	at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
	at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
	at app//kafka.server.ReplicationQuotasTest.shouldThrottleOldSegments(ReplicationQuotasTest.scala:260)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16634) Flaky test - testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest
Igor Soarez created KAFKA-16634:
-----------------------------------

             Summary: Flaky test - testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest
                 Key: KAFKA-16634
                 URL: https://issues.apache.org/jira/browse/KAFKA-16634
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez
         Attachments: output.txt

testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest

Error:
{code:java}
java.util.concurrent.TimeoutException: testFenceMultipleBrokers() timed out after 40 seconds
{code}

Test logs in attached output.txt

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16633) Flaky test - testDescribeExistingGroupWithNoMembers(String, String).quorum=kraft+kip848.groupProtocol=consumer – org.apache.kafka.tools.consumer.group.DescribeConsumerGr
Igor Soarez created KAFKA-16633:
-----------------------------------

             Summary: Flaky test - testDescribeExistingGroupWithNoMembers(String, String).quorum=kraft+kip848.groupProtocol=consumer – org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest
                 Key: KAFKA-16633
                 URL: https://issues.apache.org/jira/browse/KAFKA-16633
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez

testDescribeExistingGroupWithNoMembers(String, String).quorum=kraft+kip848.groupProtocol=consumer – org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest

{code:java}
org.opentest4j.AssertionFailedError: Condition not met within timeout 15000. Expected no active member in describe group results with describe type --offsets ==> expected: but was:
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
	at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
	at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
	at app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:396)
	at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
	at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:393)
	at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)
	at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:350)
	at app//org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest.testDescribeExistingGroupWithNoMembers(DescribeConsumerGroupTest.java:430)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16632) Flaky test testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer
Igor Soarez created KAFKA-16632:
-----------------------------------

             Summary: Flaky test testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest
                 Key: KAFKA-16632
                 URL: https://issues.apache.org/jira/browse/KAFKA-16632
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez

testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest

{code:java}
org.opentest4j.AssertionFailedError: expected: not equal but was: <0>
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277)
	at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:94)
	at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:86)
	at app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:1981)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.withConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:213)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testWithConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:179)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testDeleteOffsetsOfStableConsumerGroupWithTopicPartition(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:96)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16631) Flaky test - testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.gr
Igor Soarez created KAFKA-16631:
-----------------------------------

             Summary: Flaky test - testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest
                 Key: KAFKA-16631
                 URL: https://issues.apache.org/jira/browse/KAFKA-16631
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez

testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest

{code:java}
org.opentest4j.AssertionFailedError: expected: not equal but was: <0>
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277)
	at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:94)
	at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:86)
	at app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:1981)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.withConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:213)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testWithConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:179)
	at app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testDeleteOffsetsOfStableConsumerGroupWithTopicOnly(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:105)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-16630) Flaky test "testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – org.apache.kafka.clients.consumer.KafkaConsumerTest
Igor Soarez created KAFKA-16630:
-----------------------------------

             Summary: Flaky test "testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – org.apache.kafka.clients.consumer.KafkaConsumerTest
                 Key: KAFKA-16630
                 URL: https://issues.apache.org/jira/browse/KAFKA-16630
             Project: Kafka
          Issue Type: Test
            Reporter: Igor Soarez

"testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – org.apache.kafka.clients.consumer.KafkaConsumerTest

{code:java}
org.opentest4j.AssertionFailedError: expected: <0> but was: <5>
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
	at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
	at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
	at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
	at app//org.apache.kafka.clients.consumer.KafkaConsumerTest.testPollReturnsRecords(KafkaConsumerTest.java:289)
{code}

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (KAFKA-16610) Replace "Map#entrySet#forEach" by "Map#forEach"
[ https://issues.apache.org/jira/browse/KAFKA-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-16610.
---------------------------------
    Resolution: Resolved

> Replace "Map#entrySet#forEach" by "Map#forEach"
> -----------------------------------------------
>
>                 Key: KAFKA-16610
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16610
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Chia-Ping Tsai
>            Assignee: TengYao Chi
>            Priority: Minor
>             Fix For: 3.8.0
>
> {quote}
> Targets
>     Occurrences of 'entrySet().forEach' in Project
> Found occurrences in Project (16 usages found)
>     Unclassified (16 usages found)
>         kafka.core.main (9 usages found)
>             kafka.server (4 usages found)
>                 ControllerApis.scala (2 usages found)
>                     ControllerApis (2 usages found)
>                         handleIncrementalAlterConfigs (1 usage found)
>                             774 controllerResults.entrySet().forEach(entry => response.responses().add(
>                         handleLegacyAlterConfigs (1 usage found)
>                             533 controllerResults.entrySet().forEach(entry => response.responses().add(
>                 ControllerConfigurationValidator.scala (2 usages found)
>                     ControllerConfigurationValidator (2 usages found)
>                         validate (2 usages found)
>                             99 config.entrySet().forEach(e => {
>                             114 config.entrySet().forEach(e => properties.setProperty(e.getKey, e.getValue))
>             kafka.server.metadata (5 usages found)
>                 AclPublisher.scala (1 usage found)
>                     AclPublisher (1 usage found)
>                         onMetadataUpdate (1 usage found)
>                             73 aclsDelta.changes().entrySet().forEach(e =>
>                 ClientQuotaMetadataManager.scala (3 usages found)
>                     ClientQuotaMetadataManager (3 usages found)
>                         handleIpQuota (1 usage found)
>                             119 quotaDelta.changes().entrySet().forEach { e =>
>                         update (2 usages found)
>                             54 quotasDelta.changes().entrySet().forEach { e =>
>                             99 quotaDelta.changes().entrySet().forEach { e =>
>                 KRaftMetadataCache.scala (1 usage found)
>                     KRaftMetadataCache (1 usage found)
>                         getClusterMetadata (1 usage found)
>                             491 topic.partitions().entrySet().forEach { entry =>
>         kafka.core.test (1 usage found)
>             unit.kafka.integration (1 usage found)
>                 KafkaServerTestHarness.scala (1 usage found)
>                     KafkaServerTestHarness (1 usage found)
>                         getTopicNames (1 usage found)
>                             349 controllerServer.controller.findAllTopicIds(ANONYMOUS_CONTEXT).get().entrySet().forEach {
>         kafka.metadata.main (3 usages found)
>             org.apache.kafka.controller (2 usages found)
>                 QuorumFeatures.java (1 usage found)
>                     toString() (1 usage found)
>                         144 localSupportedFeatures.entrySet().forEach(f -> features.add(f.getKey() + ": " + f.getValue()));
>                 ReplicationControlManager.java (1 usage found)
>                     createTopic(ControllerRequestContext, CreatableTopic, List, Map, List, boolean) (1 usage found)
>                         732 newParts.entrySet().forEach(e -> assignments.put(e.getKey(),
>             org.apache.kafka.metadata.properties (1 usage found)
>                 MetaPropertiesEnsemble.java (1 usage found)
>                     toString() (1 usage found)
>                         610 logDirProps.entrySet().forEach(
>         kafka.metadata.test (1 usage found)
>             org.apache.kafka.controller (1 usage found)
>                 ReplicationControlManagerTest.java (1 usage found)
>                     createTestTopic(String, int[][], Map, short) (1 usage found)
>                         307 configs.entrySet().forEach(e -> topic.configs().add(
>         kafka.streams.main (1 usage found)
>             org.apache.kafka.streams.processor.internals (1 usage found)
>                 StreamsMetadataState.java (1 usage found)
>                     onChange(Map>, Map>, Map) (1 usage found)
>                         317 topicPartitionInfo.entrySet().forEach(entry -> this.partitionsByTopic
>         kafka.tools.main (1 usage found)
>             org.apache.kafka.tools (1 usage found)
>                 LeaderElectionCommand.java (1 usage found)
>                     electLeaders(Admin, ElectionType,
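The refactor tracked by this ticket, applied to each of the call sites above, is a simple before/after. This standalone illustration is not Kafka code (the class and method names are made up for the example); it mirrors the `config.entrySet().forEach(e -> properties.setProperty(...))` pattern from the listing: `Map#forEach` hands the key and value to the lambda directly, so the call site no longer goes through `Map.Entry`.

```java
import java.util.Map;
import java.util.Properties;

class ForEachRefactor {
    // Before: iterate the entry set and unpack each Map.Entry by hand.
    static Properties before(Map<String, String> config) {
        Properties p = new Properties();
        config.entrySet().forEach(e -> p.setProperty(e.getKey(), e.getValue()));
        return p;
    }

    // After: Map#forEach passes key and value directly; a method reference
    // suffices when the consumer's signature matches.
    static Properties after(Map<String, String> config) {
        Properties p = new Properties();
        config.forEach(p::setProperty);
        return p;
    }
}
```

Both variants produce identical results; the change is purely about brevity and readability at the call site.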
[jira] [Reopened] (KAFKA-16606) JBOD support in KRaft does not seem to be gated by the metadata version
[ https://issues.apache.org/jira/browse/KAFKA-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Soarez reopened KAFKA-16606: - Assignee: Igor Soarez > JBOD support in KRaft does not seem to be gated by the metadata version > --- > > Key: KAFKA-16606 > URL: https://issues.apache.org/jira/browse/KAFKA-16606 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Jakub Scholz >Assignee: Igor Soarez >Priority: Major > > JBOD support in KRaft should be supported since Kafka 3.7.0. The Kafka > [source > code|https://github.com/apache/kafka/blob/1b301b30207ed8fca9f0aea5cf940b0353a1abca/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L194-L195] > suggests that it is supported with the metadata version {{{}3.7-IV2{}}}. > However, it seems to be possible to run KRaft cluster with JBOD even with > older metadata versions such as {{{}3.6{}}}. For example, I have a cluster > using the {{3.6}} metadata version: > {code:java} > bin/kafka-features.sh --bootstrap-server localhost:9092 describe > Feature: metadata.version SupportedMinVersion: 3.0-IV1 > SupportedMaxVersion: 3.7-IV4 FinalizedVersionLevel: 3.6-IV2 Epoch: 1375 > {code} > Yet a KRaft cluster with JBOD seems to run fine: > {code:java} > bin/kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe > Querying brokers for log directories information > Received log directory information from brokers 2000,3000,1000 >
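The gating the report expects can be sketched as follows. This is a hedged, standalone model of how Kafka expresses MetadataVersion feature gates (the enum values and the `isDirectoryAssignmentSupported` name mirror `server-common`'s `MetadataVersion`, but this is an illustration, not the actual class):

```java
public class MetadataVersionGateDemo {
    enum MetadataVersion {
        IBP_3_6_IV2, IBP_3_7_IV0, IBP_3_7_IV1, IBP_3_7_IV2, IBP_3_7_IV4;

        boolean isAtLeast(MetadataVersion other) {
            // Enum declaration order doubles as version order.
            return this.ordinal() >= other.ordinal();
        }

        // JBOD directory assignment should only be permitted from 3.7-IV2 on;
        // the bug report is that a check like this was not being enforced.
        boolean isDirectoryAssignmentSupported() {
            return isAtLeast(IBP_3_7_IV2);
        }
    }

    public static void main(String[] args) {
        System.out.println(MetadataVersion.IBP_3_6_IV2.isDirectoryAssignmentSupported()); // false
        System.out.println(MetadataVersion.IBP_3_7_IV4.isDirectoryAssignmentSupported()); // true
    }
}
```

With a finalized level of 3.6-IV2, a gate like this would reject multi-directory registrations, which is what the reporter observed was not happening.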
[jira] [Created] (KAFKA-16602) Flaky test – org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord()
Igor Soarez created KAFKA-16602: --- Summary: Flaky test – org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord() Key: KAFKA-16602 URL: https://issues.apache.org/jira/browse/KAFKA-16602 Project: Kafka Issue Type: Test Components: controller Reporter: Igor Soarez org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord() failed with: h4. Error {code:java} org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: exception while renouncing leadership: Attempt to resign from epoch 1 which is larger than the current epoch 0{code} h4. Stacktrace {code:java} org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: exception while renouncing leadership: Attempt to resign from epoch 1 which is larger than the current epoch 0 at app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) at app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270) at app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547) at app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138) at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211) at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182) at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is larger than the current epoch 0{code} Source: 
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/3/tests/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16601) Flaky test – org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics()
Igor Soarez created KAFKA-16601: --- Summary: Flaky test – org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics() Key: KAFKA-16601 URL: https://issues.apache.org/jira/browse/KAFKA-16601 Project: Kafka Issue Type: Test Components: controller Reporter: Igor Soarez org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics() failed with: h4. Error {code:java} org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: exception while renouncing leadership: Attempt to resign from epoch 1 which is larger than the current epoch 0{code} h4. Stacktrace {code:java} org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: exception while renouncing leadership: Attempt to resign from epoch 1 which is larger than the current epoch 0 at app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) at app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270) at app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547) at app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138) at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211) at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182) at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is larger than 
the current epoch 0{code} Source: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/3/tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16597) Flaky test - org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads()
Igor Soarez created KAFKA-16597: --- Summary: Flaky test - org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads() Key: KAFKA-16597 URL: https://issues.apache.org/jira/browse/KAFKA-16597 Project: Kafka Issue Type: Test Components: streams Reporter: Igor Soarez org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads() failed with: {code:java} Error org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist. Stacktrace org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist. at app//org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63) at app//org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53) at app//org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads(StoreQueryIntegrationTest.java:411) {code} https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/2/tests/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16596) Flaky test – org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
Igor Soarez created KAFKA-16596: --- Summary: Flaky test – org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup() Key: KAFKA-16596 URL: https://issues.apache.org/jira/browse/KAFKA-16596 Project: Kafka Issue Type: Test Reporter: Igor Soarez org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup() failed in the following way: {code:java} org.opentest4j.AssertionFailedError: Unexpected addresses [93.184.215.14, 2606:2800:21f:cb07:6820:80da:af6b:8b2c] ==> expected: <true> but was: <false> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214) at app//org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup(ClientUtilsTest.java:65) {code} As a result of the following assertions: {code:java} // With lookup of example.com, either one or two addresses are expected depending on // whether ipv4 and ipv6 are enabled List<InetSocketAddress> validatedAddresses = checkWithLookup(asList("example.com:1")); assertTrue(validatedAddresses.size() >= 1, "Unexpected addresses " + validatedAddresses); List<String> validatedHostNames = validatedAddresses.stream().map(InetSocketAddress::getHostName) .collect(Collectors.toList()); List<String> expectedHostNames = asList("93.184.216.34", "2606:2800:220:1:248:1893:25c8:1946"); {code} It seems that the DNS result for example.com has changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
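The failure pattern above is a test pinned to live DNS data: example.com's records changed (93.184.216.34 became 93.184.215.14), so the hardcoded expected list rotted. A hedged sketch of a more robust check, asserting structural properties of the resolved list rather than exact IPs (the helper name is illustrative, not Kafka's API):

```java
import java.util.List;

public class DnsAssertionDemo {
    // Instead of comparing against a pinned list of addresses, require only
    // that resolution produced at least one non-empty result. Exact IPs are
    // operator data that can change under the test at any time.
    static boolean looksResolved(List<String> addresses) {
        return !addresses.isEmpty() && addresses.stream().noneMatch(String::isEmpty);
    }

    public static void main(String[] args) {
        System.out.println(looksResolved(List.of("93.184.215.14"))); // true
        System.out.println(looksResolved(List.of()));                // false
    }
}
```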
[jira] [Reopened] (KAFKA-15793) Flaky test ZkMigrationIntegrationTest.testMigrateTopicDeletions
[ https://issues.apache.org/jira/browse/KAFKA-15793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Soarez reopened KAFKA-15793: - This has come up again: {code:java} [2024-04-09T21:06:17.307Z] Gradle Test Run :core:test > Gradle Test Executor 97 > DeleteTopicsRequestTest > testTopicDeletionClusterHasOfflinePartitions(String) > "testTopicDeletionClusterHasOfflinePartitions(String).quorum=zk" STARTED [2024-04-09T21:06:17.307Z] kafka.zk.ZkMigrationIntegrationTest.testMigrateTopicDeletions(ClusterInstance)[7] failed, log available in /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15522/core/build/reports/testOutput/kafka.zk.ZkMigrationIntegrationTest.testMigrateTopicDeletions(ClusterInstance)[7].test.stdout [2024-04-09T21:06:17.307Z] [2024-04-09T21:06:17.307Z] Gradle Test Run :core:test > Gradle Test Executor 96 > ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > testMigrateTopicDeletions [7] Type=ZK, MetadataVersion=3.7-IV4, Security=PLAINTEXT FAILED [2024-04-09T21:06:17.307Z] org.apache.kafka.server.fault.FaultHandlerException: nonFatalFaultHandler: Unhandled error in MetadataChangeEvent: Check op on KRaft Migration ZNode failed. Expected zkVersion = 5. This indicates that another KRaft controller is making writes to ZooKeeper. 
[2024-04-09T21:06:17.307Z] at app//kafka.zk.KafkaZkClient.handleUnwrappedMigrationResult$1(KafkaZkClient.scala:2001) [2024-04-09T21:06:17.307Z] at app//kafka.zk.KafkaZkClient.unwrapMigrationResponse$1(KafkaZkClient.scala:2027) [2024-04-09T21:06:17.307Z] at app//kafka.zk.KafkaZkClient.$anonfun$retryMigrationRequestsUntilConnected$2(KafkaZkClient.scala:2052) [2024-04-09T21:06:17.307Z] at app//scala.collection.StrictOptimizedIterableOps.map(StrictOptimizedIterableOps.scala:100) [2024-04-09T21:06:17.307Z] at app//scala.collection.StrictOptimizedIterableOps.map$(StrictOptimizedIterableOps.scala:87) [2024-04-09T21:06:17.307Z] at app//scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:43) [2024-04-09T21:06:17.307Z] at app//kafka.zk.KafkaZkClient.retryMigrationRequestsUntilConnected(KafkaZkClient.scala:2052) [2024-04-09T21:06:17.307Z] at app//kafka.zk.migration.ZkTopicMigrationClient.$anonfun$updateTopicPartitions$1(ZkTopicMigrationClient.scala:265) [2024-04-09T21:06:17.307Z] at app//kafka.zk.migration.ZkTopicMigrationClient.updateTopicPartitions(ZkTopicMigrationClient.scala:255) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.lambda$handleTopicsDelta$20(KRaftMigrationZkWriter.java:334) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$300(KRaftMigrationDriver.java:62) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationDriver$MetadataChangeEvent.lambda$run$1(KRaftMigrationDriver.java:532) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.lambda$countingOperationConsumer$6(KRaftMigrationDriver.java:845) [2024-04-09T21:06:17.307Z] at 
app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.lambda$handleTopicsDelta$21(KRaftMigrationZkWriter.java:331) [2024-04-09T21:06:17.307Z] at java.base@17.0.7/java.util.HashMap.forEach(HashMap.java:1421) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.handleTopicsDelta(KRaftMigrationZkWriter.java:297) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.handleDelta(KRaftMigrationZkWriter.java:112) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.metadata.migration.KRaftMigrationDriver$MetadataChangeEvent.run(KRaftMigrationDriver.java:531) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:128) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211) [2024-04-09T21:06:17.307Z] at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182) [2024-04-09T21:06:17.307Z] at java.base@17.0.7/java.lang.Thread.run(Thread.java:833) [2024-04-09T21:06:17.307Z] [2024-04-09T21:06:17.307Z] Caused by: [2024-04-09T21:06:17.307Z] java.lang.RuntimeException: Check op on KRaft Migration ZNode failed. Expected zkVersion = 5. This indicates that another KRaft controller is making writes to ZooKeeper. [2024-04-09T21:06:17.307Z] at kafka.zk.KafkaZkClient.handleUnwrappedMigrationResult$1(KafkaZkClient.scala:2001) [2024-04-09T21:06:17.307Z] ... 22
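The "Check op on KRaft Migration ZNode failed. Expected zkVersion = 5" error describes a versioned compare-and-set: migration writes to ZooKeeper are guarded by the expected version of the migration ZNode, so a write from a second controller bumps the version and makes the first controller's batch fail. A hedged, standalone model of that guard (this models the idea only; it is not Kafka or ZooKeeper code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VersionedCasDemo {
    // Stands in for the migration ZNode's (value, version) pair.
    private String value = "state-0";
    private final AtomicInteger version = new AtomicInteger(0);

    synchronized boolean setData(String newValue, int expectedVersion) {
        if (version.get() != expectedVersion) {
            // The "Expected zkVersion = N" check failed: another writer got
            // there first, so this batch must be rejected.
            return false;
        }
        value = newValue;
        version.incrementAndGet();
        return true;
    }

    public static void main(String[] args) {
        VersionedCasDemo znode = new VersionedCasDemo();
        System.out.println(znode.setData("state-1", 0)); // true: version matched
        System.out.println(znode.setData("state-2", 0)); // false: stale expected version
    }
}
```

In the flaky run, this guard firing indicates two KRaft controllers briefly both believed they held the migration lease.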
[jira] [Created] (KAFKA-16504) Flaky test org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations
Igor Soarez created KAFKA-16504: --- Summary: Flaky test org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations Key: KAFKA-16504 URL: https://issues.apache.org/jira/browse/KAFKA-16504 Project: Kafka Issue Type: Test Components: controller Reporter: Igor Soarez {code:java} [2024-04-09T20:26:55.840Z] Gradle Test Run :metadata:test > Gradle Test Executor 54 > QuorumControllerTest > testConfigurationOperations() STARTED [2024-04-09T20:26:55.840Z] org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations() failed, log available in /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15522/metadata/build/reports/testOutput/org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations().test.stdout [2024-04-09T20:26:55.840Z] [2024-04-09T20:26:55.840Z] Gradle Test Run :metadata:test > Gradle Test Executor 54 > QuorumControllerTest > testConfigurationOperations() FAILED [2024-04-09T20:26:55.840Z] java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotControllerException: No controller appears to be active. [2024-04-09T20:26:55.840Z] at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) [2024-04-09T20:26:55.840Z] at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) [2024-04-09T20:26:55.840Z] at org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations(QuorumControllerTest.java:202) [2024-04-09T20:26:55.840Z] [2024-04-09T20:26:55.840Z] Caused by: [2024-04-09T20:26:55.840Z] org.apache.kafka.common.errors.NotControllerException: No controller appears to be active. {code} [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15522/5/tests/] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16403) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords
[ https://issues.apache.org/jira/browse/KAFKA-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Soarez resolved KAFKA-16403. - Resolution: Not A Bug > Flaky test > org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords > - > > Key: KAFKA-16403 > URL: https://issues.apache.org/jira/browse/KAFKA-16403 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Priority: Major > > {code:java} > org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords() > failed, log available in > /home/jenkins/workspace/Kafka_kafka-pr_PR-14903/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords().test.stdout > Gradle Test Run :streams:examples:test > Gradle Test Executor 82 > > WordCountDemoTest > testCountListOfWords() FAILED > org.apache.kafka.streams.errors.ProcessorStateException: Error opening > store KSTREAM-AGGREGATE-STATE-STORE-03 at location > /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03 > at > org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329) > at > org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69) > at > org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254) > at > org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175) > at > org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) > at > org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56) > at > org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) > at > org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151) > at > 
org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872) > at > org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151) > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232) > at > org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102) > at > org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258) > at > org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530) > at > org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373) > at > org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300) > at > org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276) > at > org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60) > Caused by: > org.rocksdb.RocksDBException: Corruption: IO error: No such file or > directory: While open a file for random read: > /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/10.ldb: > No such file or directory in file > /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/MANIFEST-05 > at org.rocksdb.RocksDB.open(Native Method) > at org.rocksdb.RocksDB.open(RocksDB.java:307) > at > org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
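Both WordCountDemoTest failures point at a RocksDB store under the shared default location /tmp/kafka-streams, where concurrent test JVMs can race on the same LOCK and MANIFEST files. A hedged sketch of the usual mitigation, assuming the test configures its TopologyTestDriver explicitly: give each run a unique state directory via the real Streams config key "state.dir" (StreamsConfig.STATE_DIR_CONFIG):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.Properties;

public class UniqueStateDirDemo {
    static Properties streamsProps() throws IOException {
        Properties props = new Properties();
        props.put("application.id", "streams-wordcount");
        props.put("bootstrap.servers", "localhost:9092"); // required config; unused by a test driver
        // A fresh temp dir per invocation instead of the shared default
        // /tmp/kafka-streams, so parallel test runs cannot collide.
        props.put("state.dir", Files.createTempDirectory("streams-test-").toString());
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties a = streamsProps();
        Properties b = streamsProps();
        // Two runs get two distinct state directories.
        System.out.println(!a.getProperty("state.dir").equals(b.getProperty("state.dir"))); // true
    }
}
```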
[jira] [Resolved] (KAFKA-16404) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig
[ https://issues.apache.org/jira/browse/KAFKA-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Soarez resolved KAFKA-16404. - Resolution: Not A Bug Same as KAFKA-16403, this only failed once. It was likely the result of a testing infrastructure problem. We can always re-open if we see this again and suspect otherwise. > Flaky test > org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig > - > > Key: KAFKA-16404 > URL: https://issues.apache.org/jira/browse/KAFKA-16404 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Priority: Major > > > {code:java} > org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig() > failed, log available in > /home/jenkins/workspace/Kafka_kafka-pr_PR-14903@2/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig().test.stdout > Gradle Test Run :streams:examples:test > Gradle Test Executor 87 > > WordCountDemoTest > testGetStreamsConfig() FAILED > org.apache.kafka.streams.errors.ProcessorStateException: Error opening > store KSTREAM-AGGREGATE-STATE-STORE-03 at location > /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03 > at > app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329) > at > app//org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69) > at > app//org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254) > at > app//org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175) > at > app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) > at > app//org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56) > at > 
app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) > at > app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151) > at > app//org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872) > at > app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151) > at > app//org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232) > at > app//org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102) > at > app//org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258) > at > app//org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530) > at > app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373) > at > app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300) > at > app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276) > at > app//org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60) > Caused by: > org.rocksdb.RocksDBException: While lock file: > /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK: > Resource temporarily unavailable > at app//org.rocksdb.RocksDB.open(Native Method) > at app//org.rocksdb.RocksDB.open(RocksDB.java:307) > at > app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16422) Flaky test org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest."testFailingOverIncrementsNewActiveControllerCount(boolean).true"
Igor Soarez created KAFKA-16422: --- Summary: Flaky test org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest."testFailingOverIncrementsNewActiveControllerCount(boolean).true" Key: KAFKA-16422 URL: https://issues.apache.org/jira/browse/KAFKA-16422 Project: Kafka Issue Type: Bug Reporter: Igor Soarez {code:java} [2024-03-22T10:39:59.911Z] Gradle Test Run :metadata:test > Gradle Test Executor 92 > QuorumControllerMetricsIntegrationTest > testFailingOverIncrementsNewActiveControllerCount(boolean) > "testFailingOverIncrementsNewActiveControllerCount(boolean).true" FAILED [2024-03-22T10:39:59.912Z] org.opentest4j.AssertionFailedError: expected: <1> but was: <2> [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166) [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161) [2024-03-22T10:39:59.912Z] at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:632) [2024-03-22T10:39:59.912Z] at app//org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.lambda$testFailingOverIncrementsNewActiveControllerCount$1(QuorumControllerMetricsIntegrationTest.java:107) [2024-03-22T10:39:59.912Z] at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444) [2024-03-22T10:39:59.912Z] at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:412) [2024-03-22T10:39:59.912Z] at app//org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testFailingOverIncrementsNewActiveControllerCount(QuorumControllerMetricsIntegrationTest.java:105) {code} -- 
This message was sent by Atlassian Jira (v8.20.10#820010)
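The stack trace above runs the metric assertion through TestUtils.retryOnExceptionWithTimeout, i.e. the standard retry-until-deadline pattern for eventually-consistent assertions. A hedged, standalone sketch of that pattern (the names here are illustrative, not Kafka's TestUtils API):

```java
import java.util.function.BooleanSupplier;

public class RetryDemo {
    // Re-evaluate a flaky condition until it holds or the deadline passes.
    static boolean retryUntil(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true; // condition eventually held
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50ms, well within the 1s budget.
        boolean ok = retryUntil(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(ok); // true
    }
}
```

When a retried assertion still fails, as in KAFKA-16422, the value genuinely settled on the wrong answer (here the newActiveControllerCount metric reached 2 instead of 1) rather than merely lagging.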
[jira] [Created] (KAFKA-16404) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig
Igor Soarez created KAFKA-16404: --- Summary: Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig Key: KAFKA-16404 URL: https://issues.apache.org/jira/browse/KAFKA-16404 Project: Kafka Issue Type: Bug Reporter: Igor Soarez {code:java} org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig() failed, log available in /home/jenkins/workspace/Kafka_kafka-pr_PR-14903@2/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig().test.stdout Gradle Test Run :streams:examples:test > Gradle Test Executor 87 > WordCountDemoTest > testGetStreamsConfig() FAILED org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-AGGREGATE-STATE-STORE-03 at location /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03 at app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329) at app//org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69) at app//org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254) at app//org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175) at app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) at app//org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56) at app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) at app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151) at app//org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872) at app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151) at 
app//org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232) at app//org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102) at app//org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258) at app//org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530) at app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373) at app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300) at app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276) at app//org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60) Caused by: org.rocksdb.RocksDBException: While lock file: /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK: Resource temporarily unavailable at app//org.rocksdb.RocksDB.open(Native Method) at app//org.rocksdb.RocksDB.open(RocksDB.java:307) at app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323) ... 17 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16403) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords
Igor Soarez created KAFKA-16403: --- Summary: Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords Key: KAFKA-16403 URL: https://issues.apache.org/jira/browse/KAFKA-16403 Project: Kafka Issue Type: Bug Reporter: Igor Soarez {code:java} org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords() failed, log available in /home/jenkins/workspace/Kafka_kafka-pr_PR-14903/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords().test.stdout Gradle Test Run :streams:examples:test > Gradle Test Executor 82 > WordCountDemoTest > testCountListOfWords() FAILED org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-AGGREGATE-STATE-STORE-03 at location /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03 at org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329) at org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69) at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254) at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175) at org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56) at org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71) at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151) at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872) at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151) at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232) at org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102) at org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258) at org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530) at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373) at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300) at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276) at org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60) Caused by: org.rocksdb.RocksDBException: Corruption: IO error: No such file or directory: While open a file for random read: /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/10.ldb: No such file or directory in file /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/MANIFEST-05 at org.rocksdb.RocksDB.open(Native Method) at org.rocksdb.RocksDB.open(RocksDB.java:307) at org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323) ... 17 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16402) Flaky test org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad
Igor Soarez created KAFKA-16402: --- Summary: Flaky test org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad Key: KAFKA-16402 URL: https://issues.apache.org/jira/browse/KAFKA-16402 Project: Kafka Issue Type: Bug Reporter: Igor Soarez {code:java} org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad() failed, log available in /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-14903/metadata/build/reports/testOutput/org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad().test.stdout Gradle Test Run :metadata:test > Gradle Test Executor 93 > QuorumControllerTest > testSnapshotSaveAndLoad() FAILED java.lang.IllegalArgumentException: Self-suppression not permitted at java.base/java.lang.Throwable.addSuppressed(Throwable.java:1072) at org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad(QuorumControllerTest.java:645) Caused by: org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: exception while renouncing leadership: Attempt to resign from epoch 1 which is larger than the current epoch 0 at app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) at app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270) at app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547) at app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881) at app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149) at app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138) at app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211) at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182) at java.base@17.0.7/java.lang.Thread.run(Thread.java:833) Caused by: java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is larger than the current epoch 0 at org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) ... 10 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16365) AssignmentsManager mismanages completion notifications
Igor Soarez created KAFKA-16365: --- Summary: AssignmentsManager mismanages completion notifications Key: KAFKA-16365 URL: https://issues.apache.org/jira/browse/KAFKA-16365 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez When moving replicas between directories in the same broker, future replica promotion hinges on an acknowledgment from the controller of the change in directory assignment. ReplicaAlterLogDirsThread relies on AssignmentsManager for a completion notification of the directory assignment change. In its current form, under certain assignment scheduling, AssignmentsManager can both miss completion notifications and trigger them prematurely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
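The failure mode described above (a completion callback firing for a superseded assignment, or never firing at all) can be sketched with a minimal tracker. All names below are illustrative, not the actual AssignmentsManager API; the invariant is that a callback completes only when the controller acknowledges the latest requested directory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Minimal sketch of per-partition assignment tracking: a completion
// callback must fire only when the controller acknowledges the *latest*
// requested directory, not an earlier, superseded one.
class AssignmentTracker {
    private final Map<String, String> requestedDir = new HashMap<>();
    private final Map<String, Consumer<String>> callbacks = new HashMap<>();

    // Record a new desired directory for a partition; supersedes any
    // earlier in-flight request for the same partition.
    void request(String partition, String dir, Consumer<String> onComplete) {
        requestedDir.put(partition, dir);
        callbacks.put(partition, onComplete);
    }

    // Controller acknowledged an assignment. Only notify if it matches
    // the latest request; a stale ack must neither complete nor drop
    // the pending callback.
    void ack(String partition, String dir) {
        String wanted = requestedDir.get(partition);
        if (wanted != null && wanted.equals(dir)) {
            Consumer<String> cb = callbacks.remove(partition);
            requestedDir.remove(partition);
            if (cb != null) cb.accept(dir);
        }
        // else: stale ack for a superseded assignment; ignore it.
    }

    boolean pending(String partition) {
        return requestedDir.containsKey(partition);
    }
}
```

A stale acknowledgment must neither complete nor discard the pending callback: discarding it is what produces a missed notification, and completing it is what produces a premature one.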
[jira] [Created] (KAFKA-16363) Storage crashes if dir is unavailable
Igor Soarez created KAFKA-16363: --- Summary: Storage crashes if dir is unavailable Key: KAFKA-16363 URL: https://issues.apache.org/jira/browse/KAFKA-16363 Project: Kafka Issue Type: Sub-task Components: tools Affects Versions: 3.7.0 Reporter: Igor Soarez The storage tool crashes if one of the configured log directories is unavailable. {code:java} sh-4.4# ./bin/kafka-storage.sh format --ignore-formatted -t $KAFKA_CLUSTER_ID -c server.properties [2024-03-11 17:51:05,391] ERROR Error while reading meta.properties file /data/d2/meta.properties (org.apache.kafka.metadata.properties.MetaPropertiesEnsemble) java.nio.file.AccessDeniedException: /data/d2/meta.properties at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) at java.base/java.nio.file.Files.newByteChannel(Files.java:380) at java.base/java.nio.file.Files.newByteChannel(Files.java:432) at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) at java.base/java.nio.file.Files.newInputStream(Files.java:160) at org.apache.kafka.metadata.properties.PropertiesUtils.readPropertiesFile(PropertiesUtils.java:77) at org.apache.kafka.metadata.properties.MetaPropertiesEnsemble$Loader.load(MetaPropertiesEnsemble.java:135) at kafka.tools.StorageTool$.formatCommand(StorageTool.scala:431) at kafka.tools.StorageTool$.main(StorageTool.scala:95) at kafka.tools.StorageTool.main(StorageTool.scala) metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/data/d1: MetaProperties(version=1, clusterId=RwO2UIkmTBWltwRllP05aA, nodeId=101, directoryId=zm7fSw3zso9aR0AtuzsI_A), /data/metadata: MetaProperties(version=1, clusterId=RwO2UIkmTBWltwRllP05aA, nodeId=101, 
directoryId=eRO8vOP7ddbpx_W2ZazjLw), /data/d2: ERROR}) I/O error trying to read log directory /data/d2. {code} When configured with multiple directories, Kafka tolerates some of them (but not all) being inaccessible, so this tool should be able to handle the same scenarios without crashing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
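The tolerance the ticket asks for amounts to a simple rule, sketched below with illustrative names (not the actual StorageTool internals): skip log directories that cannot be read, as the broker does, and fail only when none of the configured directories is accessible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: tolerate some inaccessible log directories, but not all.
class LogDirScan {
    static List<String> usableDirs(List<String> configured, Predicate<String> accessible) {
        List<String> usable = new ArrayList<>();
        for (String dir : configured) {
            if (accessible.test(dir)) {
                usable.add(dir);
            } else {
                // Mirror the broker's behavior: log and continue.
                System.err.println("Skipping inaccessible log directory: " + dir);
            }
        }
        if (usable.isEmpty()) {
            throw new IllegalStateException("No configured log directory is accessible");
        }
        return usable;
    }
}
```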
[jira] [Created] (KAFKA-16297) Race condition while promoting future replica can lead to partition unavailability.
Igor Soarez created KAFKA-16297: --- Summary: Race condition while promoting future replica can lead to partition unavailability. Key: KAFKA-16297 URL: https://issues.apache.org/jira/browse/KAFKA-16297 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez KIP-858 proposed that when a directory failure occurs after changing the assignment of a replica that's moved between two directories in the same broker, but before the future replica promotion completes, the broker should reassign the replica to inform the controller of its correct status. But this hasn't yet been implemented, and without it this failure may lead to indefinite partition unavailability. Example scenario: # A broker which leads partition P receives a request to move the replica from directory A to directory B. # The broker creates a future replica in directory B and starts a replica fetcher. # Once the future replica first catches up, the broker queues a reassignment to inform the controller of the directory change. # The next time the replica catches up, the broker briefly blocks appends and promotes the replica. However, before the promotion is attempted, directory A fails. # The controller was informed that P is now in directory B before it received the notification that directory A has failed, so it does not elect a new leader, and as long as the broker is online, partition P remains unavailable. As per KIP-858, the broker should detect this scenario and queue a reassignment of P into directory ID {{{}DirectoryId.LOST{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
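The recovery rule the ticket calls for can be reduced to a small decision, sketched here with illustrative names (only DirectoryId.LOST comes from KIP-858): if the source directory fails before promotion completes, the replica must be reported as lost rather than left assigned to the destination, so the controller knows to elect a new leader.

```java
import java.util.Set;

// Sketch of the KIP-858 recovery rule: a replica mid-move whose source
// directory fails before promotion must be reassigned to the LOST
// sentinel, since the assignment already recorded by the controller
// (the destination) does not reflect where the data actually was.
class PromotionRecovery {
    static final String LOST = "DirectoryId.LOST";

    static String directoryAfterFailure(String sourceDir, String destDir,
                                        boolean promotionCompleted,
                                        Set<String> failedDirs) {
        if (!promotionCompleted && failedDirs.contains(sourceDir)) {
            return LOST;
        }
        return destDir;
    }
}
```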
[jira] [Created] (KAFKA-15955) Migrating ZK brokers send dir assignments
Igor Soarez created KAFKA-15955: --- Summary: Migrating ZK brokers send dir assignments Key: KAFKA-15955 URL: https://issues.apache.org/jira/browse/KAFKA-15955 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Brokers in ZooKeeper mode, while in migration mode, should start sending directory assignments to the KRaft Controller using AssignmentsManager. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15893) Bump MetadataVersion for directory assignments
Igor Soarez created KAFKA-15893: --- Summary: Bump MetadataVersion for directory assignments Key: KAFKA-15893 URL: https://issues.apache.org/jira/browse/KAFKA-15893 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15886) Always specify directories for new partition registrations
Igor Soarez created KAFKA-15886: --- Summary: Always specify directories for new partition registrations Key: KAFKA-15886 URL: https://issues.apache.org/jira/browse/KAFKA-15886 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez When creating partition registrations, directories must always be defined. If creating a partition from a PartitionRecord or PartitionChangeRecord from an older version that does not support directory assignments, then DirectoryId.MIGRATING is assumed. If creating a new partition, or triggering a change in assignment, DirectoryId.UNASSIGNED should be specified, unless the target broker has a single online directory registered, in which case the replica should be assigned directly to that single directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
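The selection rule described above can be sketched as follows. The string constants are illustrative stand-ins for Kafka's Uuid-based DirectoryId values:

```java
import java.util.List;

// Sketch of the directory choice for a new partition registration:
// old-version records get MIGRATING, otherwise UNASSIGNED, unless the
// broker has exactly one online registered directory.
class DirectoryChoice {
    static final String MIGRATING = "DirectoryId.MIGRATING";
    static final String UNASSIGNED = "DirectoryId.UNASSIGNED";

    static String forReplica(boolean recordFromOldVersion, List<String> onlineDirs) {
        if (recordFromOldVersion) return MIGRATING;
        if (onlineDirs.size() == 1) return onlineDirs.get(0);
        return UNASSIGNED;
    }
}
```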
[jira] [Created] (KAFKA-15858) Broker stays fenced until all assignments are correct
Igor Soarez created KAFKA-15858: --- Summary: Broker stays fenced until all assignments are correct Key: KAFKA-15858 URL: https://issues.apache.org/jira/browse/KAFKA-15858 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Until the broker has caught up with metadata AND corrected any incorrect directory assignments, it should continue to request to stay fenced. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?
Igor Soarez created KAFKA-15650: --- Summary: Data-loss on leader shutdown right after partition creation? Key: KAFKA-15650 URL: https://issues.apache.org/jira/browse/KAFKA-15650 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez As per KIP-858, when a replica is created, the broker selects a log directory to host the replica and queues the propagation of the directory assignment to the controller. The replica becomes active immediately; it isn't blocked until the controller confirms the metadata change. If the replica is the leader replica, it can immediately start accepting writes. Consider the following scenario: # A partition is created in some selected log directory, and some produce traffic is accepted # Before the broker is able to notify the controller of the directory assignment, the broker shuts down # Upon coming back online, the broker has an offline directory, the same directory which was chosen to host the replica # The broker assumes leadership for the replica, but cannot find it in any available directory and has no way of knowing it was already created because the directory assignment is still missing # The replica is created anew and the previously produced records are lost Step 4 may seem unlikely due to ISR membership gating leadership, but even assuming acks=all and replicas>1, if all other replicas are also offline the broker may still gain leadership. Perhaps KIP-966 is relevant here. We may need to delay new replica activation until the assignment is propagated successfully. -- This message was sent by Atlassian Jira (v8.20.10#820010)
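The mitigation suggested at the end (delaying activation until the assignment is propagated) can be modeled as a small gate. This is an illustrative sketch, not Kafka code:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a newly created leader replica may not accept produce traffic
// until the controller has acknowledged its directory assignment.
class ReplicaActivationGate {
    private final Set<String> pendingAssignments = new HashSet<>();

    void onReplicaCreated(String partition) { pendingAssignments.add(partition); }

    void onAssignmentAcked(String partition) { pendingAssignments.remove(partition); }

    boolean canAcceptWrites(String partition) {
        return !pendingAssignments.contains(partition);
    }
}
```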
[jira] [Created] (KAFKA-15649) Handle directory failure timeout
Igor Soarez created KAFKA-15649: --- Summary: Handle directory failure timeout Key: KAFKA-15649 URL: https://issues.apache.org/jira/browse/KAFKA-15649 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez If a broker with an offline log directory continues to fail to notify the controller of either: * the fact that the directory is offline; or * of any replica assignment into a failed directory then the controller will not check if a leadership change is required, and this may lead to partitions remaining indefinitely offline. KIP-858 proposes that the broker should shut down after a configurable timeout to force a leadership change. Alternatively, the broker could also request to be fenced, as long as there's a path for it to later become unfenced. While this unavailability is possible in theory, in practice it's not easy to entertain a scenario where a broker continues to appear as healthy before the controller, but fails to send this information. So it's not clear if this is a real problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-15355) Message schema changes
[ https://issues.apache.org/jira/browse/KAFKA-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Soarez reopened KAFKA-15355: - closed by mistake > Message schema changes > -- > > Key: KAFKA-15355 > URL: https://issues.apache.org/jira/browse/KAFKA-15355 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > Fix For: 3.7.0 > > > Metadata records changes: > * BrokerRegistrationChangeRecord > * PartitionChangeRecord > * PartitionRecord > * RegisterBrokerRecord > New RPCs > * AssignReplicasToDirsRequest > * AssignReplicasToDirsResponse > RPC changes: > * BrokerHeartbeatRequest > * BrokerRegistrationRequest -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15514) Controller-side replica management changes
Igor Soarez created KAFKA-15514: --- Summary: Controller-side replica management changes Key: KAFKA-15514 URL: https://issues.apache.org/jira/browse/KAFKA-15514 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez The new "Assignments" field replaces the "Replicas" field in PartitionRecord and PartitionChangeRecord. On the controller side, any changes to partitions need to consider both fields. * ISR updates * Partition reassignments & reverts * Partition creation -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15451) Include offline dirs in BrokerHeartbeatRequest
Igor Soarez created KAFKA-15451: --- Summary: Include offline dirs in BrokerHeartbeatRequest Key: KAFKA-15451 URL: https://issues.apache.org/jira/browse/KAFKA-15451 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15426) Process and persist directory assignments
Igor Soarez created KAFKA-15426: --- Summary: Process and persist directory assignments Key: KAFKA-15426 URL: https://issues.apache.org/jira/browse/KAFKA-15426 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez * Handle AssignReplicasToDirsRequest * Persist metadata changes -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15368) Test ZK JBOD to KRaft migration
Igor Soarez created KAFKA-15368: --- Summary: Test ZK JBOD to KRaft migration Key: KAFKA-15368 URL: https://issues.apache.org/jira/browse/KAFKA-15368 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez A ZK cluster running JBOD should be able to migrate to KRaft mode without issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15367) Test KRaft JBOD enabling migration
Igor Soarez created KAFKA-15367: --- Summary: Test KRaft JBOD enabling migration Key: KAFKA-15367 URL: https://issues.apache.org/jira/browse/KAFKA-15367 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez A cluster running in KRaft without JBOD should be able to transition into JBOD mode without issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15366) Log directory failure integration test
Igor Soarez created KAFKA-15366: --- Summary: Log directory failure integration test Key: KAFKA-15366 URL: https://issues.apache.org/jira/browse/KAFKA-15366 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15365) Replica management changes
Igor Soarez created KAFKA-15365: --- Summary: Replica management changes Key: KAFKA-15365 URL: https://issues.apache.org/jira/browse/KAFKA-15365 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15364) Handle log directory failure in the Controller
Igor Soarez created KAFKA-15364: --- Summary: Handle log directory failure in the Controller Key: KAFKA-15364 URL: https://issues.apache.org/jira/browse/KAFKA-15364 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15363) Broker log directory failure changes
Igor Soarez created KAFKA-15363: --- Summary: Broker log directory failure changes Key: KAFKA-15363 URL: https://issues.apache.org/jira/browse/KAFKA-15363 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15362) Resolve offline replicas in metadata cache
Igor Soarez created KAFKA-15362: --- Summary: Resolve offline replicas in metadata cache Key: KAFKA-15362 URL: https://issues.apache.org/jira/browse/KAFKA-15362 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Considering each broker's offline log directories and replica-to-directory assignments -- This message was sent by Atlassian Jira (v8.20.10#820010)
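The resolution the ticket describes combines two pieces of metadata: per-replica directory assignments and per-broker offline directory sets. A minimal sketch with illustrative types (Kafka uses Uuid directory ids and richer partition metadata):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: a replica is offline when the directory it is assigned to is
// among its broker's offline log directories.
class OfflineReplicaResolver {
    // assignments: broker id -> directory id hosting this partition's replica.
    // offlineDirs: broker id -> that broker's offline directory ids.
    static List<Integer> offlineReplicas(Map<Integer, String> assignments,
                                         Map<Integer, Set<String>> offlineDirs) {
        List<Integer> offline = new ArrayList<>();
        for (Map.Entry<Integer, String> e : assignments.entrySet()) {
            if (offlineDirs.getOrDefault(e.getKey(), Set.of()).contains(e.getValue())) {
                offline.add(e.getKey());
            }
        }
        return offline;
    }
}
```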
[jira] [Created] (KAFKA-15361) Process and persist dir info with broker registration
Igor Soarez created KAFKA-15361: --- Summary: Process and persist dir info with broker registration Key: KAFKA-15361 URL: https://issues.apache.org/jira/browse/KAFKA-15361 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Controllers should process and persist directory information from the broker registration request -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15360) Include directory info in BrokerRegistration
Igor Soarez created KAFKA-15360: --- Summary: Include directory info in BrokerRegistration Key: KAFKA-15360 URL: https://issues.apache.org/jira/browse/KAFKA-15360 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez Brokers should correctly set the OnlineLogDirs and OfflineLogDirs fields in each BrokerRegistrationRequest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15359) log.dir.failure.timeout.ms configuration
Igor Soarez created KAFKA-15359: --- Summary: log.dir.failure.timeout.ms configuration Key: KAFKA-15359 URL: https://issues.apache.org/jira/browse/KAFKA-15359 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez If the broker repeatedly fails to communicate a log directory failure within a configurable amount of time ({{log.dir.failure.timeout.ms}}), and it is the leader for any replicas in the failed log directory, the broker will shut down, as that is the only other way to guarantee that the controller will elect a new leader for those partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
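The shutdown condition above combines three facts: the failure is still unacknowledged, the timeout has elapsed, and the broker still leads replicas in the failed directory. A sketch with illustrative names (only the config key {{log.dir.failure.timeout.ms}} comes from the ticket):

```java
// Sketch of the log.dir.failure.timeout.ms rule: shut down only when a
// dir failure stays uncommunicated past the timeout AND leadership of
// replicas in that directory makes the shutdown necessary.
class DirFailureTimeout {
    static boolean shouldShutdown(long failureTimeMs, long nowMs, long timeoutMs,
                                  boolean acknowledgedByController,
                                  boolean leadsReplicasInFailedDir) {
        return !acknowledgedByController
                && leadsReplicasInFailedDir
                && nowMs - failureTimeMs >= timeoutMs;
    }
}
```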
[jira] [Created] (KAFKA-15358) QueuedReplicaToDirAssignments metric
Igor Soarez created KAFKA-15358: --- Summary: QueuedReplicaToDirAssignments metric Key: KAFKA-15358 URL: https://issues.apache.org/jira/browse/KAFKA-15358 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15357) Propagates assignments and logdir failures to controller
Igor Soarez created KAFKA-15357: --- Summary: Propagates assignments and logdir failures to controller Key: KAFKA-15357 URL: https://issues.apache.org/jira/browse/KAFKA-15357 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez LogDirEventManager accumulates, batches, and sends assignments or failure events to the Controller, prioritizing assignments to ensure the Controller has the correct assignment view before processing log dir failures. Assignments are sent via AssignReplicasToDirs, logdir failures are sent via BrokerHeartbeat. -- This message was sent by Atlassian Jira (v8.20.10#820010)
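The ordering guarantee described above (assignments before log-dir failures) can be modeled with two queues drained in priority order. This is an illustrative model, not the actual LogDirEventManager:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: assignment events are always drained before failure events,
// so the controller sees the correct assignment view before it
// processes a log directory failure.
class EventQueue {
    private final Deque<String> assignments = new ArrayDeque<>();
    private final Deque<String> failures = new ArrayDeque<>();

    void enqueueAssignment(String event) { assignments.add(event); }

    void enqueueFailure(String event) { failures.add(event); }

    // Next event to send, or null when both queues are empty.
    String poll() {
        if (!assignments.isEmpty()) return assignments.poll();
        return failures.poll();
    }
}
```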
[jira] [Created] (KAFKA-15356) Generate and persist log directory UUIDs
Igor Soarez created KAFKA-15356: --- Summary: Generate and persist log directory UUIDs Key: KAFKA-15356 URL: https://issues.apache.org/jira/browse/KAFKA-15356 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Assignee: Igor Soarez * The StorageTool format command should ensure each dir has a generated UUID * BrokerMetadataCheckpoint should parse directory.id from meta.properties, or generate and persist if missing -- This message was sent by Atlassian Jira (v8.20.10#820010)
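The read-or-generate rule for {{directory.id}} can be sketched in a few lines. java.util.UUID stands in for Kafka's own base64-encoded Uuid type, and the caller is assumed to persist the properties file afterwards:

```java
import java.util.Properties;
import java.util.UUID;

// Sketch: parse directory.id from meta.properties if present; otherwise
// generate one and record it so it can be persisted back to disk.
class DirectoryIdLoader {
    static String loadOrCreate(Properties metaProps) {
        String id = metaProps.getProperty("directory.id");
        if (id == null) {
            id = UUID.randomUUID().toString();
            metaProps.setProperty("directory.id", id);
        }
        return id;
    }
}
```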
[jira] [Created] (KAFKA-15355) Update metadata records
Igor Soarez created KAFKA-15355: --- Summary: Update metadata records Key: KAFKA-15355 URL: https://issues.apache.org/jira/browse/KAFKA-15355 Project: Kafka Issue Type: Sub-task Reporter: Igor Soarez Includes: * BrokerRegistrationChangeRecord * PartitionChangeRecord * PartitionRecord * RegisterBrokerRecord -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14303) Producer.send without record key and batch.size=0 goes into infinite loop
Igor Soarez created KAFKA-14303: --- Summary: Producer.send without record key and batch.size=0 goes into infinite loop Key: KAFKA-14303 URL: https://issues.apache.org/jira/browse/KAFKA-14303 Project: Kafka Issue Type: Bug Components: clients, producer Affects Versions: 3.3.1, 3.3.0 Reporter: Igor Soarez 3.3 has broken previous producer behavior. A call to {{producer.send(record)}} with a record without a key and configured with {{batch.size=0}} never returns. Reproducer:
{code:java}
import java.util.Properties
import java.util.concurrent.TimeUnit

import kafka.api.IntegrationTestHarness
import org.apache.kafka.clients.producer.{ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import org.junit.jupiter.api.Test

class ProducerIssueTest extends IntegrationTestHarness {
  override protected def brokerCount = 1

  @Test
  def testProduceWithBatchSizeZeroAndNoRecordKey(): Unit = {
    val topicName = "foo"
    createTopic(topicName)
    val overrides = new Properties
    overrides.put(ProducerConfig.BATCH_SIZE_CONFIG, 0)
    val producer = createProducer(keySerializer = new StringSerializer, valueSerializer = new StringSerializer, overrides)
    val record = new ProducerRecord[String, String](topicName, null, "hello")
    val future = producer.send(record) // goes into infinite loop here
    future.get(10, TimeUnit.SECONDS)
  }
}
{code}
[Documentation for producer configuration|https://kafka.apache.org/documentation/#producerconfigs_batch.size] states {{batch.size=0}} as a valid value: {quote}Valid Values: [0,...] {quote} and recommends its use directly: {quote}A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft
Igor Soarez created KAFKA-14127: --- Summary: KIP-858: Handle JBOD broker disk failure in KRaft Key: KAFKA-14127 URL: https://issues.apache.org/jira/browse/KAFKA-14127 Project: Kafka Issue Type: Improvement Components: jbod, kraft Reporter: Igor Soarez Tracking for KIP-858 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-13906) Invalid replica state transition
Igor Soarez created KAFKA-13906: --- Summary: Invalid replica state transition Key: KAFKA-13906 URL: https://issues.apache.org/jira/browse/KAFKA-13906 Project: Kafka Issue Type: Bug Components: controller, core, replication Affects Versions: 3.1.1, 3.0.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.0.2, 3.1.2, 3.2.1 Reporter: Igor Soarez The controller runs into an IllegalStateException when reacting to changes in broker membership status if there are topics that are pending deletion. How to reproduce: # Setup cluster with 3 brokers # Create a topic with a partition being led by each broker and produce some data # Kill one of the brokers that is not the controller, and keep that broker down # Delete the topic # Restart the other broker that is not the controller Logs and stacktrace: {{[2022-05-16 11:53:25,482] ERROR [Controller id=1 epoch=1] Controller 1 epoch 1 initiated state change of replica 3 for partition test-topic-2 from ReplicaDeletionSuccessful to ReplicaDeletionIneligible failed (state.change.logger)}} {{java.lang.IllegalStateException: Replica [Topic=test-topic,Partition=2,Replica=3] should be in the OfflineReplica,ReplicaDeletionStarted states before moving to ReplicaDeletionIneligible state. 
Instead it is in ReplicaDeletionSuccessful state}} {{ at kafka.controller.ZkReplicaStateMachine.logInvalidTransition(ReplicaStateMachine.scala:442)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2(ReplicaStateMachine.scala:164)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2$adapted(ReplicaStateMachine.scala:164)}} {{ at scala.collection.immutable.List.foreach(List.scala:333)}} {{ at kafka.controller.ZkReplicaStateMachine.doHandleStateChanges(ReplicaStateMachine.scala:164)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2(ReplicaStateMachine.scala:112)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2$adapted(ReplicaStateMachine.scala:111)}} {{ at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)}} {{ at scala.collection.immutable.HashMap.foreachEntry(HashMap.scala:1092)}} {{ at kafka.controller.ZkReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:111)}} {{ at kafka.controller.TopicDeletionManager.failReplicaDeletion(TopicDeletionManager.scala:157)}} {{ at kafka.controller.KafkaController.onReplicasBecomeOffline(KafkaController.scala:638)}} {{ at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:599)}} {{ at kafka.controller.KafkaController.processBrokerChange(KafkaController.scala:1623)}} {{ at kafka.controller.KafkaController.process(KafkaController.scala:2534)}} {{ at kafka.controller.QueuedEvent.process(ControllerEventManager.scala:52)}} {{ at kafka.controller.ControllerEventManager$ControllerEventThread.process$1(ControllerEventManager.scala:130)}} {{--}} {{[2022-05-16 11:53:40,726] ERROR [Controller id=1 epoch=1] Controller 1 epoch 1 initiated state change of replica 3 for partition test-topic-2 from ReplicaDeletionSuccessful to OnlineReplica failed (state.change.logger)}} {{java.lang.IllegalStateException: Replica [Topic=test-topic,Partition=2,Replica=3] should be in the 
NewReplica,OnlineReplica,OfflineReplica,ReplicaDeletionIneligible states before moving to OnlineReplica state. Instead it is in ReplicaDeletionSuccessful state}} {{ at kafka.controller.ZkReplicaStateMachine.logInvalidTransition(ReplicaStateMachine.scala:442)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2(ReplicaStateMachine.scala:164)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2$adapted(ReplicaStateMachine.scala:164)}} {{ at scala.collection.immutable.List.foreach(List.scala:333)}} {{ at kafka.controller.ZkReplicaStateMachine.doHandleStateChanges(ReplicaStateMachine.scala:164)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2(ReplicaStateMachine.scala:112)}} {{ at kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2$adapted(ReplicaStateMachine.scala:111)}} {{ at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)}} {{ at scala.collection.immutable.HashMap.foreachEntry(HashMap.scala:1092)}} {{ at kafka.controller.ZkReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:111)}} {{ at kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:543)}} {{ at kafka.controller.KafkaController.processBrokerChange(KafkaController.scala:1607)}} {{ at kafka.controller.KafkaController.process(KafkaController.scala:2534)}}
[jira] [Created] (KAFKA-13382) Automatic storage formatting
Igor Soarez created KAFKA-13382: --- Summary: Automatic storage formatting Key: KAFKA-13382 URL: https://issues.apache.org/jira/browse/KAFKA-13382 Project: Kafka Issue Type: Improvement Components: core, jbod, kraft Reporter: Igor Soarez Assignee: Igor Soarez It should be possible to configure a KRaft server to detect the need to, and format storage directories on startup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12938) Fix and reenable testChrootExistsAndRootIsLocked
Igor Soarez created KAFKA-12938: --- Summary: Fix and reenable testChrootExistsAndRootIsLocked Key: KAFKA-12938 URL: https://issues.apache.org/jira/browse/KAFKA-12938 Project: Kafka Issue Type: Task Reporter: Igor Soarez In core/src/test/scala/unit/kafka/zk/KafkaZkClientTest.scala Disabled in https://github.com/apache/kafka/pull/10820 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12866) Kafka requires ZK root access even when using a chroot
Igor Soarez created KAFKA-12866: --- Summary: Kafka requires ZK root access even when using a chroot Key: KAFKA-12866 URL: https://issues.apache.org/jira/browse/KAFKA-12866 Project: Kafka Issue Type: Bug Components: core, zkclient Affects Versions: 2.6.2, 2.7.1, 2.8.0, 2.6.1 Reporter: Igor Soarez When a Zookeeper chroot is configured, users do not expect Kafka to need Zookeeper access outside of that chroot. h1. Why is this important? A zookeeper cluster may be shared with other Kafka clusters or even other applications. It is an expected security practice to restrict each cluster/application's access to its own Zookeeper chroot. h1. Steps to reproduce h2. Zookeeper setup Using the zkCli, create a chroot for Kafka, make it available to Kafka but lock the root znode. {{ [zk: localhost:2181(CONNECTED) 1] create /somechroot }} {{ Created /some}} {{ [zk: localhost:2181(CONNECTED) 2] setAcl /somechroot world:anyone:cdrwa}} {{ [zk: localhost:2181(CONNECTED) 3] addauth digest test:12345}} {{ [zk: localhost:2181(CONNECTED) 4] setAcl / digest:test:Mx1uO9GLtm1qaVAQ20Vh9ODgACg=:cdrwa}} h2. Kafka setup Configure the chroot in broker.properties: {{zookeeper.connect=localhost:2181/somechroot}} h2. Expected behavior The expected behavior here is that Kafka will use the chroot without issues. h2.
Actual result Kafka fails to start with a fatal exception: {{ org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /chroot}} {{ at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)}} {{ at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)}} {{ at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)}} {{ at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)}} {{ at kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1627)}} {{ at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1957)}} {{ at kafka.zk.ZkClientAclTest.testChrootExistsAndRootIsLocked(ZkClientAclTest.scala:60)}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12763) NoSuchElementException during kafka-log-start-offset-checkpoint
Igor Soarez created KAFKA-12763:
-----------------------------------

Summary: NoSuchElementException during kafka-log-start-offset-checkpoint
Key: KAFKA-12763
URL: https://issues.apache.org/jira/browse/KAFKA-12763
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 2.6.2
Reporter: Igor Soarez
Assignee: Igor Soarez

The following exception was observed in a cluster running Kafka version 2.6.2.

{code:java}
{
  "class": "java.util.NoSuchElementException",
  "msg": null,
  "stack": [
    "java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2123)",
    "scala.collection.convert.JavaCollectionWrappers$JIteratorWrapper.next(JavaCollectionWrappers.scala:38)",
    "scala.collection.IterableOps.head(Iterable.scala:218)",
    "scala.collection.IterableOps.head$(Iterable.scala:218)",
    "scala.collection.AbstractIterable.head(Iterable.scala:920)",
    "kafka.log.LogManager$$anonfun$4.applyOrElse(LogManager.scala:640)",
    "kafka.log.LogManager$$anonfun$4.applyOrElse(LogManager.scala:639)",
    "scala.collection.Iterator$$anon$7.hasNext(Iterator.scala:516)",
    "scala.collection.mutable.Growable.addAll(Growable.scala:61)",
    "scala.collection.mutable.Growable.addAll$(Growable.scala:59)",
    "scala.collection.mutable.HashMap.addAll(HashMap.scala:111)",
    "scala.collection.mutable.HashMap$.from(HashMap.scala:549)",
    "scala.collection.mutable.HashMap$.from(HashMap.scala:542)",
    "scala.collection.MapFactory$Delegate.from(Factory.scala:425)",
    "scala.collection.MapOps.collect(Map.scala:283)",
    "scala.collection.MapOps.collect$(Map.scala:282)",
    "scala.collection.AbstractMap.collect(Map.scala:375)",
    "kafka.log.LogManager.$anonfun$checkpointLogStartOffsetsInDir$2(LogManager.scala:639)",
    "kafka.log.LogManager.$anonfun$checkpointLogStartOffsetsInDir$1(LogManager.scala:636)",
    "kafka.log.LogManager.checkpointLogStartOffsetsInDir(LogManager.scala:635)",
    "kafka.log.LogManager.$anonfun$checkpointLogStartOffsets$1(LogManager.scala:600)",
    "kafka.log.LogManager.$anonfun$checkpointLogStartOffsets$1$adapted(LogManager.scala:600)",
    "scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)",
    "scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)",
    "scala.collection.AbstractIterable.foreach(Iterable.scala:920)",
    "kafka.log.LogManager.checkpointLogStartOffsets(LogManager.scala:600)",
    "kafka.log.LogManager.$anonfun$startup$6(LogManager.scala:426)",
    "kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)",
    "java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
    "java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)",
    "java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)",
    "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)",
    "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)",
    "java.lang.Thread.run(Thread.java:834)"
  ]
}
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
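The top of the trace shows Scala's `IterableOps.head` calling `next()` on a `ConcurrentSkipListMap` value iterator without a preceding `hasNext` check, so if the log map for a directory becomes empty concurrently (e.g. while logs are being deleted), `head` throws instead of returning nothing. A minimal Java sketch of the failure mode and a `headOption`-style alternative (the class and method names are hypothetical, not Kafka code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

public class HeadOnEmptyMap {
    // headOption-style accessor: an empty map yields Optional.empty() instead of throwing.
    static Optional<Integer> safeHead(ConcurrentSkipListMap<String, Integer> m) {
        return m.values().stream().findFirst();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, Integer> logs = new ConcurrentSkipListMap<>();

        // Scala's head ultimately calls Iterator.next() unconditionally; on an
        // empty values() iterator this throws, matching the top frame of the trace.
        boolean threw = false;
        Iterator<Integer> it = logs.values().iterator();
        try {
            it.next();
        } catch (NoSuchElementException e) {
            threw = true;
        }
        System.out.println("head on empty threw: " + threw); // prints true

        System.out.println("safeHead on empty: " + safeHead(logs));
    }
}
```

The same empty-check applies in Scala by replacing `.head` with `.headOption` at the `LogManager.scala:640` call site.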
[jira] [Resolved] (KAFKA-10303) kafka producer says connect failed in cluster mode
[ https://issues.apache.org/jira/browse/KAFKA-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez resolved KAFKA-10303.
-
Resolution: Incomplete

> kafka producer says connect failed in cluster mode
> --
>
> Key: KAFKA-10303
> URL: https://issues.apache.org/jira/browse/KAFKA-10303
> Project: Kafka
> Issue Type: Bug
> Reporter: Yogesh BG
> Priority: Major
>
> Hi,
>
> I am using Kafka broker version 2.3.0.
> We have two setups: standalone (one node) and a 3-node cluster.
> We push a large volume of data: ~25 MB/s, ~80K messages per second.
> Everything works well in single-node mode, but in the cluster setup the producer (librdkafka) starts throwing "connect failed" errors; after some time it reconnects and resumes sending traffic.
> What could be the issue? Some of the configurations are:
>
> replica.fetch.max.bytes=10485760
> num.network.threads=12
> num.replica.fetchers=6
> queued.max.requests=5
> # The number of threads doing disk I/O
> num.io.threads=12
> replica.socket.receive.buffer.bytes=1
>

--
This message was sent by Atlassian Jira (v8.3.4#803005)
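Two of the quoted values sit far below Kafka's documented defaults (`queued.max.requests` defaults to 500, and the socket receive buffer to 64 KiB), which may be why the report was resolved as Incomplete. A sketch of a sanity check over such a properties fragment; the class and method names are made up for illustration:

```java
import java.io.StringReader;
import java.util.Properties;

public class BrokerConfigCheck {
    // Flags settings that are drastically below Kafka's documented defaults:
    // queued.max.requests defaults to 500, the socket receive buffer to 64 KiB.
    static boolean looksSuspicious(Properties p) {
        int recvBuf = Integer.parseInt(
            p.getProperty("replica.socket.receive.buffer.bytes", "65536"));
        int queued = Integer.parseInt(p.getProperty("queued.max.requests", "500"));
        return recvBuf < 1024 || queued < 50;
    }

    public static void main(String[] args) throws Exception {
        // The configuration fragment exactly as quoted in the report.
        String cfg = String.join("\n",
            "replica.fetch.max.bytes=10485760",
            "num.network.threads=12",
            "num.replica.fetchers=6",
            "queued.max.requests=5",
            "num.io.threads=12",
            "replica.socket.receive.buffer.bytes=1");
        Properties p = new Properties();
        p.load(new StringReader(cfg));
        System.out.println(looksSuspicious(p)); // prints true: the 1-byte buffer trips the check
    }
}
```

A one-byte replica receive buffer and a five-deep request queue would throttle a broker long before ~25 MB/s, though whether those values are real or a paste error cannot be told from the report.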
[jira] [Created] (KAFKA-8505) Limit the maximum number of connections per client ID
Igor Soarez created KAFKA-8505:
--------------------------------

Summary: Limit the maximum number of connections per client ID
Key: KAFKA-8505
URL: https://issues.apache.org/jira/browse/KAFKA-8505
Project: Kafka
Issue Type: New Feature
Reporter: Igor Soarez

As highlighted by KAFKA-1512 back in 2014, it is important to be able to limit the number of client connections to brokers in order to maintain service availability. With multiple use-cases sharing the same cluster, it is important to prevent one misconfigured use-case from affecting the others.

Cloud infrastructure technology has come a long way since then. Nowadays, in a private network using container orchestration technology, IPs come cheap, so limiting connections solely by origin IP is no longer sufficient. Kafka needs to support connection limits based on client identity.

A new configuration property - {{max.connections.per.clientid}} - should work similarly to {{max.connections.per.ip}}, using ConnectionQuotas and enforced straight after parsing the request header in SocketServer.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
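The proposed accounting could mirror the per-IP bookkeeping the broker already does: increment a counter keyed by client ID once the request header is parsed, and reject when the quota is exceeded. The sketch below is illustrative only; the class and method names are hypothetical and not part of Kafka:

```java
import java.util.HashMap;
import java.util.Map;

public class ClientIdConnectionQuota {
    private final int maxPerClientId; // would be read from max.connections.per.clientid
    private final Map<String, Integer> counts = new HashMap<>();

    public ClientIdConnectionQuota(int maxPerClientId) {
        this.maxPerClientId = maxPerClientId;
    }

    // Called once the request header has been parsed and the clientId is known.
    public synchronized boolean tryAccept(String clientId) {
        int current = counts.getOrDefault(clientId, 0);
        if (current >= maxPerClientId) {
            return false; // over quota: the connection would be closed
        }
        counts.put(clientId, current + 1);
        return true;
    }

    // Called when a connection closes, returning the slot to the client's quota.
    public synchronized void release(String clientId) {
        counts.merge(clientId, -1, Integer::sum);
        counts.remove(clientId, 0); // drop entries that reach zero
    }
}
```

Unlike the per-IP limit, which can act at accept() time, a client-ID limit can only act after the first request header arrives, so a rejected connection has already consumed a socket briefly; that trade-off is inherent to identity-based limiting.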