[jira] [Resolved] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2

2024-06-01 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16757.
-
Resolution: Fixed

> Fix broker re-registration issues around MV 3.7-IV2
> ---
>
> Key: KAFKA-16757
> URL: https://issues.apache.org/jira/browse/KAFKA-16757
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend 
> the broker registration, so that the controller can record the storage 
> directories. The current code for doing this has several problems, however. 
> One is that it tends to trigger even in cases where we don't actually need 
> it. Another is that when re-registering the broker, the broker is marked as 
> fenced.
> This PR moves the handling of the re-registration case out of 
> BrokerMetadataPublisher and into BrokerRegistrationTracker. The 
> re-registration code there will only trigger in the case where the broker 
> sees an existing registration for itself with no directories set. This is 
> much more targeted than the original code.
> Additionally, in ClusterControlManager, when re-registering the same broker, 
> we now preserve its fencing and shutdown state, rather than clearing those. 
> (There isn't any good reason re-registering the same broker should clear 
> these things... this was purely an oversight.) Note that we can tell the 
> broker is "the same" because it has the same IncarnationId.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16645) CVEs in 3.7.0 docker image

2024-05-20 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16645.
-
Resolution: Resolved

> CVEs in 3.7.0 docker image
> --
>
> Key: KAFKA-16645
> URL: https://issues.apache.org/jira/browse/KAFKA-16645
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.7.0
>Reporter: Mickael Maison
>Assignee: Igor Soarez
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> Our [Docker Image CVE 
> Scanner|https://github.com/apache/kafka/actions/runs/874393] GitHub 
> action reports 2 high CVEs in our base image:
> apache/kafka:3.7.0 (alpine 3.19.1)
> ==
> Total: 2 (HIGH: 2, CRITICAL: 0)
> Library  | Vulnerability  | Severity | Status | Installed Version | Fixed Version | Title
> ---------|----------------|----------|--------|-------------------|---------------|------------------------------------------------------------
> libexpat | CVE-2023-52425 | HIGH     | fixed  | 2.5.0-r2          | 2.6.0-r0      | expat: parsing large tokens can trigger a denial of service
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2023-52425
> libexpat | CVE-2024-28757 | HIGH     | fixed  | 2.5.0-r2          | 2.6.2-r0      | expat: XML Entity Expansion
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2024-28757
> Looking at the 
> [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka#KIP975:DockerImageforApacheKafka-WhatifweobserveabugoracriticalCVEinthereleasedApacheKafkaDockerImage?]
>  that introduced the docker images, it seems we should release a bugfix when 
> high CVEs are detected. It would be good to investigate and assess whether 
> Kafka is impacted or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16645) CVEs in 3.7.0 docker image

2024-05-20 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-16645:
-

Need to re-open to change the resolution; release_notes.py doesn't like the one 
I picked.

> CVEs in 3.7.0 docker image
> --
>
> Key: KAFKA-16645
> URL: https://issues.apache.org/jira/browse/KAFKA-16645
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.7.0
>Reporter: Mickael Maison
>Assignee: Igor Soarez
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> Our [Docker Image CVE 
> Scanner|https://github.com/apache/kafka/actions/runs/874393] GitHub 
> action reports 2 high CVEs in our base image:
> apache/kafka:3.7.0 (alpine 3.19.1)
> ==
> Total: 2 (HIGH: 2, CRITICAL: 0)
> Library  | Vulnerability  | Severity | Status | Installed Version | Fixed Version | Title
> ---------|----------------|----------|--------|-------------------|---------------|------------------------------------------------------------
> libexpat | CVE-2023-52425 | HIGH     | fixed  | 2.5.0-r2          | 2.6.0-r0      | expat: parsing large tokens can trigger a denial of service
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2023-52425
> libexpat | CVE-2024-28757 | HIGH     | fixed  | 2.5.0-r2          | 2.6.2-r0      | expat: XML Entity Expansion
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2024-28757
> Looking at the 
> [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka#KIP975:DockerImageforApacheKafka-WhatifweobserveabugoracriticalCVEinthereleasedApacheKafkaDockerImage?]
>  that introduced the docker images, it seems we should release a bugfix when 
> high CVEs are detected. It would be good to investigate and assess whether 
> Kafka is impacted or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16692) InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka 3.5 to 3.6

2024-05-20 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-16692:
-

Re-opening as the 3.6 backport is still missing

> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled when upgrading from kafka 3.5 to 3.6 
> 
>
> Key: KAFKA-16692
> URL: https://issues.apache.org/jira/browse/KAFKA-16692
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.7.0, 3.6.1, 3.8
>Reporter: Johnson Okorie
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> We have a Kafka cluster running on version 3.5.2 that we are upgrading to 
> 3.6.1. This cluster has a lot of clients with exactly-once semantics enabled, 
> and hence creating transactions. As we replaced brokers with the new 
> binaries, we observed lots of clients in the cluster experiencing the 
> following error:
> {code:java}
> 2024-05-07T09:08:10.039Z "tid": "" -- [Producer clientId=, 
> transactionalId=] Got error produce response with 
> correlation id 6402937 on topic-partition , retrying 
> (2147483512 attempts left). Error: NETWORK_EXCEPTION. Error Message: The 
> server disconnected before a response was received.{code}
> On inspecting the broker, we saw the following errors on brokers still 
> running Kafka version 3.5.2:
>  
> {code:java}
> message:     
> Closing socket for  because of error
> exception_exception_class:    
> org.apache.kafka.common.errors.InvalidRequestException
> exception_exception_message:    
> Received request api key ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled
> exception_stacktrace:    
> org.apache.kafka.common.errors.InvalidRequestException: Received request api 
> key ADD_PARTITIONS_TO_TXN with version 4 which is not enabled
> {code}
> On the new brokers running 3.6.1 we saw the following errors:
>  
> {code:java}
> [AddPartitionsToTxnSenderThread-1055]: AddPartitionsToTxnRequest failed for 
> node 1043 with a network exception.{code}
>  
> I can also see this:
> {code:java}
> [AddPartitionsToTxnManager broker=1055]Cancelled in-flight 
> ADD_PARTITIONS_TO_TXN request with correlation id 21120 due to node 1043 
> being disconnected (elapsed time since creation: 11ms, elapsed time since 
> send: 4ms, request timeout: 3ms){code}
> While investigating this issue and digging through the changes in 3.6, we 
> came across some changes introduced as part of KAFKA-14402 that we thought 
> might lead to this behaviour.
> First, we could see that _transaction.partition.verification.enable_ is 
> enabled by default, enabling a new code path that culminates in us sending 
> version 4 ADD_PARTITIONS_TO_TXN requests to other brokers, generated 
> [here|https://github.com/apache/kafka/blob/29f3260a9c07e654a28620aeb93a778622a5233d/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L269].
> From a 
> [discussion|https://lists.apache.org/thread/4895wrd1z92kjb708zck4s1f62xq6r8x] 
> on the mailing list, [~jolshan] pointed out that this scenario shouldn't be 
> possible as the following code paths should prevent version 4 
> ADD_PARTITIONS_TO_TXN requests from being sent to other brokers:
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/clients/src/main/java/org/apache/kafka/clients/NodeApiVersions.java#L130]
>  
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L195]
> However, these requests are still sent to other brokers in our environment.
> On further inspection of the code, I am wondering if the following code path 
> could lead to this issue:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L500]
> In this scenario, we don't have any _NodeApiVersions_ available for the 
> specified nodeId, potentially skipping the _latestUsableVersion_ check. I 
> am wondering if it is possible that, because _discoverBrokerVersions_ is set 
> to _false_ for the network client of the {_}AddPartitionsToTxnManager{_}, it 
> skips fetching {_}NodeApiVersions{_}? I can see that we create the network 
> client here:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/core/src/main/scala/kafka/server/KafkaServer.scala#L641]
> The _NetworkUtils.buildNetworkClient_ method seems to create a network client 
> that has _discoverBrokerVersions_ set to {_}false{_}. 
> I was hoping I could get some assistance debugging this issue. Happy to 
> provide any additional information needed.
>  
>  
>  
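> A simplified sketch (editor's illustration, not the actual NetworkClient code) 
> of the suspected gap: with _discoverBrokerVersions_ set to _false_, no 
> _NodeApiVersions_ is ever recorded for the node, so the version clamp never 
> applies:
> {code:java}
> // Hypothetical illustration of how a missing NodeApiVersions could let a
> // too-new request version through.
> short versionToUse(ApiKeys api, NodeApiVersions nodeVersions) {
>     if (nodeVersions == null) {
>         // No ApiVersions handshake for this node: nothing to clamp against,
>         // so we fall back to our own latest supported version.
>         return api.latestVersion();
>     }
>     // Normal path: the highest version supported by both sides.
>     return nodeVersions.latestUsableVersion(api);
> }
> {code}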



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Resolved] (KAFKA-16692) InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka 3.5 to 3.6

2024-05-20 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16692.
-
Resolution: Fixed

> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled when upgrading from kafka 3.5 to 3.6 
> 
>
> Key: KAFKA-16692
> URL: https://issues.apache.org/jira/browse/KAFKA-16692
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.7.0, 3.6.1, 3.8
>Reporter: Johnson Okorie
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> We have a Kafka cluster running on version 3.5.2 that we are upgrading to 
> 3.6.1. This cluster has a lot of clients with exactly-once semantics enabled, 
> and hence creating transactions. As we replaced brokers with the new 
> binaries, we observed lots of clients in the cluster experiencing the 
> following error:
> {code:java}
> 2024-05-07T09:08:10.039Z "tid": "" -- [Producer clientId=, 
> transactionalId=] Got error produce response with 
> correlation id 6402937 on topic-partition , retrying 
> (2147483512 attempts left). Error: NETWORK_EXCEPTION. Error Message: The 
> server disconnected before a response was received.{code}
> On inspecting the broker, we saw the following errors on brokers still 
> running Kafka version 3.5.2:
>  
> {code:java}
> message:     
> Closing socket for  because of error
> exception_exception_class:    
> org.apache.kafka.common.errors.InvalidRequestException
> exception_exception_message:    
> Received request api key ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled
> exception_stacktrace:    
> org.apache.kafka.common.errors.InvalidRequestException: Received request api 
> key ADD_PARTITIONS_TO_TXN with version 4 which is not enabled
> {code}
> On the new brokers running 3.6.1 we saw the following errors:
>  
> {code:java}
> [AddPartitionsToTxnSenderThread-1055]: AddPartitionsToTxnRequest failed for 
> node 1043 with a network exception.{code}
>  
> I can also see this:
> {code:java}
> [AddPartitionsToTxnManager broker=1055]Cancelled in-flight 
> ADD_PARTITIONS_TO_TXN request with correlation id 21120 due to node 1043 
> being disconnected (elapsed time since creation: 11ms, elapsed time since 
> send: 4ms, request timeout: 3ms){code}
> While investigating this issue and digging through the changes in 3.6, we 
> came across some changes introduced as part of KAFKA-14402 that we thought 
> might lead to this behaviour.
> First, we could see that _transaction.partition.verification.enable_ is 
> enabled by default, enabling a new code path that culminates in us sending 
> version 4 ADD_PARTITIONS_TO_TXN requests to other brokers, generated 
> [here|https://github.com/apache/kafka/blob/29f3260a9c07e654a28620aeb93a778622a5233d/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L269].
> From a 
> [discussion|https://lists.apache.org/thread/4895wrd1z92kjb708zck4s1f62xq6r8x] 
> on the mailing list, [~jolshan] pointed out that this scenario shouldn't be 
> possible as the following code paths should prevent version 4 
> ADD_PARTITIONS_TO_TXN requests from being sent to other brokers:
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/clients/src/main/java/org/apache/kafka/clients/NodeApiVersions.java#L130]
>  
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L195]
> However, these requests are still sent to other brokers in our environment.
> On further inspection of the code, I am wondering if the following code path 
> could lead to this issue:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L500]
> In this scenario, we don't have any _NodeApiVersions_ available for the 
> specified nodeId, potentially skipping the _latestUsableVersion_ check. I 
> am wondering if it is possible that, because _discoverBrokerVersions_ is set 
> to _false_ for the network client of the {_}AddPartitionsToTxnManager{_}, it 
> skips fetching {_}NodeApiVersions{_}? I can see that we create the network 
> client here:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/core/src/main/scala/kafka/server/KafkaServer.scala#L641]
> The _NetworkUtils.buildNetworkClient_ method seems to create a network client 
> that has _discoverBrokerVersions_ set to {_}false{_}. 
> I was hoping I could get some assistance debugging this issue. Happy to 
> provide any additional information needed.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16645) CVEs in 3.7.0 docker image

2024-05-20 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16645.
-
  Assignee: Igor Soarez
Resolution: Won't Fix

The vulnerability has already been addressed in the base image, under the same 
image tag, so the next published Kafka images will not ship the 
vulnerability.

We do not republish previous releases, so we're not taking any action here.

> CVEs in 3.7.0 docker image
> --
>
> Key: KAFKA-16645
> URL: https://issues.apache.org/jira/browse/KAFKA-16645
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.7.0
>Reporter: Mickael Maison
>Assignee: Igor Soarez
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> Our [Docker Image CVE 
> Scanner|https://github.com/apache/kafka/actions/runs/874393] GitHub 
> action reports 2 high CVEs in our base image:
> apache/kafka:3.7.0 (alpine 3.19.1)
> ==
> Total: 2 (HIGH: 2, CRITICAL: 0)
> Library  | Vulnerability  | Severity | Status | Installed Version | Fixed Version | Title
> ---------|----------------|----------|--------|-------------------|---------------|------------------------------------------------------------
> libexpat | CVE-2023-52425 | HIGH     | fixed  | 2.5.0-r2          | 2.6.0-r0      | expat: parsing large tokens can trigger a denial of service
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2023-52425
> libexpat | CVE-2024-28757 | HIGH     | fixed  | 2.5.0-r2          | 2.6.2-r0      | expat: XML Entity Expansion
>          |                |          |        |                   |               | https://avd.aquasec.com/nvd/cve-2024-28757
> Looking at the 
> [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka#KIP975:DockerImageforApacheKafka-WhatifweobserveabugoracriticalCVEinthereleasedApacheKafkaDockerImage?]
>  that introduced the docker images, it seems we should release a bugfix when 
> high CVEs are detected. It would be good to investigate and assess whether 
> Kafka is impacted or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16688) SystemTimer leaks resources on close

2024-05-10 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16688.
-
Resolution: Fixed

> SystemTimer leaks resources on close
> 
>
> Key: KAFKA-16688
> URL: https://issues.apache.org/jira/browse/KAFKA-16688
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 3.8.0
>Reporter: Gaurav Narula
>Assignee: Gaurav Narula
>Priority: Major
>
> We observe some thread leaks with thread name {{executor-client-metrics}}.
> This may happen because {{SystemTimer}} doesn't attempt to shut down its 
> executor service properly.
> Refer to 
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15885/1/tests
>  and the tests with {{initializationError}} in them for the stacktrace.
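> A minimal sketch (assumed shape, not the actual SystemTimer code) of the 
> close() behaviour the fix needs:
> {code:java}
> // Shut the executor down and wait briefly so its threads don't leak.
> @Override
> public void close() {
>     executor.shutdown();
>     try {
>         if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
>             executor.shutdownNow(); // force-stop any stragglers
>         }
>     } catch (InterruptedException e) {
>         executor.shutdownNow();
>         Thread.currentThread().interrupt();
>     }
> }
> {code}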



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV

2024-05-04 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16624.
-
Resolution: Fixed

> Don't generate useless PartitionChangeRecord on older MV
> 
>
> Key: KAFKA-16624
> URL: https://issues.apache.org/jira/browse/KAFKA-16624
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Minor
>
> Fix a case where we could generate useless PartitionChangeRecords on metadata 
> versions older than 3.6-IV0. This could happen when the ISR contained only 
> one broker and we were trying to go down to a fully empty ISR. In this case, 
> PartitionChangeBuilder would block the transition to a fully empty ISR (since 
> that is not valid in these pre-KIP-966 metadata versions), but it would still 
> emit the record, even though it had no effect.
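> A minimal sketch (hypothetical names, not the actual PartitionChangeBuilder 
> code) of the fix described above:
> {code:java}
> // After the unsafe empty-ISR transition has been blocked on pre-3.6-IV0
> // metadata versions, only emit a record that still changes something.
> Optional<ApiMessageAndVersion> buildIfEffective(PartitionChangeRecord record) {
>     boolean noEffect = record.isr() == null
>         && record.leader() == NO_LEADER_CHANGE
>         && record.replicas() == null;
>     return noEffect ? Optional.empty()
>         : Optional.of(new ApiMessageAndVersion(record, (short) 0));
> }
> {code}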



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16636) Flaky test - testStickyTaskAssignorLargePartitionCount – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16636:
---

 Summary: Flaky test - testStickyTaskAssignorLargePartitionCount – 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
 Key: KAFKA-16636
 URL: https://issues.apache.org/jira/browse/KAFKA-16636
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez
 Attachments: log (1).txt

testStickyTaskAssignorLargePartitionCount – 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
{code:java}
java.lang.AssertionError: The first assignment took too long to complete at 
131680ms.   at 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.completeLargeAssignment(StreamsAssignmentScaleTest.java:220)
 at 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testStickyTaskAssignorLargePartitionCount(StreamsAssignmentScaleTest.java:102)
 {code}
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/]

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16635) Flaky test "shouldThrottleOldSegments(String).quorum=kraft" – kafka.server.ReplicationQuotasTest

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16635:
---

 Summary: Flaky test 
"shouldThrottleOldSegments(String).quorum=kraft" – 
kafka.server.ReplicationQuotasTest
 Key: KAFKA-16635
 URL: https://issues.apache.org/jira/browse/KAFKA-16635
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


"shouldThrottleOldSegments(String).quorum=kraft" – 
kafka.server.ReplicationQuotasTest
{code:java}
org.opentest4j.AssertionFailedError: Throttled replication of 2203ms should be 
> 3600.0ms ==> expected: <true> but was: <false>   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) 
   at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at 
app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)at 
app//kafka.server.ReplicationQuotasTest.shouldThrottleOldSegments(ReplicationQuotasTest.scala:260)
 {code}
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16634) Flaky test - testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16634:
---

 Summary: Flaky test - testFenceMultipleBrokers() – 
org.apache.kafka.controller.QuorumControllerTest
 Key: KAFKA-16634
 URL: https://issues.apache.org/jira/browse/KAFKA-16634
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez
 Attachments: output.txt

testFenceMultipleBrokers() – org.apache.kafka.controller.QuorumControllerTest

Error:
{code:java}
java.util.concurrent.TimeoutException: testFenceMultipleBrokers() timed out 
after 40 seconds {code}
Test logs in attached output.txt

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16633) Flaky test - testDescribeExistingGroupWithNoMembers(String, String).quorum=kraft+kip848.groupProtocol=consumer – org.apache.kafka.tools.consumer.group.DescribeConsumerGr

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16633:
---

 Summary: Flaky test - 
testDescribeExistingGroupWithNoMembers(String, 
String).quorum=kraft+kip848.groupProtocol=consumer – 
org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest
 Key: KAFKA-16633
 URL: https://issues.apache.org/jira/browse/KAFKA-16633
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


testDescribeExistingGroupWithNoMembers(String, 
String).quorum=kraft+kip848.groupProtocol=consumer – 
org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest
{code:java}
org.opentest4j.AssertionFailedError: Condition not met within timeout 15000. 
Expected no active member in describe group results with describe type 
--offsets ==> expected: <true> but was: <false>   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) 
   at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at 
app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)at 
app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:396)
   at 
app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
 at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:393)   
 at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)   
 at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:350)   
 at 
app//org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest.testDescribeExistingGroupWithNoMembers(DescribeConsumerGroupTest.java:430)
 {code}
 

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16632) Flaky test testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16632:
---

 Summary: Flaky test 
testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] 
Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – 
org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest
 Key: KAFKA-16632
 URL: https://issues.apache.org/jira/browse/KAFKA-16632
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1] 
Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – 
org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest

 
{code:java}
org.opentest4j.AssertionFailedError: expected: not equal but was: <0>   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at 
app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277)  
 at 
app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:94)
  at 
app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:86)
  at 
app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:1981)  at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.withConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:213)
 at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testWithConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:179)
 at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testDeleteOffsetsOfStableConsumerGroupWithTopicPartition(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:96)
{code}
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16631) Flaky test - testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT – org.apache.kafka.tools.consumer.gr

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16631:
---

 Summary: Flaky test - 
testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, 
MetadataVersion=3.8-IV0, Security=PLAINTEXT – 
org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest
 Key: KAFKA-16631
 URL: https://issues.apache.org/jira/browse/KAFKA-16631
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated, 
MetadataVersion=3.8-IV0, Security=PLAINTEXT – 
org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest

 
{code:java}
org.opentest4j.AssertionFailedError: expected: not equal but was: <0>   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at 
app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277)  
 at 
app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:94)
  at 
app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:86)
  at 
app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:1981)  at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.withConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:213)
 at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testWithConsumerGroup(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:179)
 at 
app//org.apache.kafka.tools.consumer.group.DeleteOffsetsConsumerGroupCommandIntegrationTest.testDeleteOffsetsOfStableConsumerGroupWithTopicOnly(DeleteOffsetsConsumerGroupCommandIntegrationTest.java:105)
{code}
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16630) Flaky test "testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – org.apache.kafka.clients.consumer.KafkaConsumerTest

2024-04-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16630:
---

 Summary: Flaky test 
"testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – 
org.apache.kafka.clients.consumer.KafkaConsumerTest
 Key: KAFKA-16630
 URL: https://issues.apache.org/jira/browse/KAFKA-16630
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


"testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" – 
org.apache.kafka.clients.consumer.KafkaConsumerTest

 
{code:java}
org.opentest4j.AssertionFailedError: expected: <0> but was: <5> at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)  at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)  at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)  at 
app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)  at 
app//org.apache.kafka.clients.consumer.KafkaConsumerTest.testPollReturnsRecords(KafkaConsumerTest.java:289)
 {code}
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15816/1/tests]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16610) Replace "Map#entrySet#forEach" by "Map#forEach"

2024-04-24 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16610.
-
Resolution: Resolved
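
A quick illustration of the replacement (editor's sketch):

{code:java}
Map<String, Integer> counts = Map.of("a", 1, "b", 2);

// Before: iterate over the entry set explicitly.
counts.entrySet().forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));

// After: Map#forEach hands key and value over directly.
counts.forEach((k, v) -> System.out.println(k + ": " + v));
{code}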

> Replace "Map#entrySet#forEach" by "Map#forEach"
> ---
>
> Key: KAFKA-16610
> URL: https://issues.apache.org/jira/browse/KAFKA-16610
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Minor
> Fix For: 3.8.0
>
>
> {quote}
> Targets
>     Occurrences of 'entrySet().forEach' in Project
> Found occurrences in Project  (16 usages found)
>     Unclassified  (16 usages found)
>         kafka.core.main  (9 usages found)
>             kafka.server  (4 usages found)
>                 ControllerApis.scala  (2 usages found)
>                     ControllerApis  (2 usages found)
>                         handleIncrementalAlterConfigs  (1 usage found)
>                             774 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                         handleLegacyAlterConfigs  (1 usage found)
>                             533 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                 ControllerConfigurationValidator.scala  (2 usages found)
>                     ControllerConfigurationValidator  (2 usages found)
>                         validate  (2 usages found)
>                             99 config.entrySet().forEach(e => {
>                             114 config.entrySet().forEach(e => 
> properties.setProperty(e.getKey, e.getValue))
>             kafka.server.metadata  (5 usages found)
>                 AclPublisher.scala  (1 usage found)
>                     AclPublisher  (1 usage found)
>                         onMetadataUpdate  (1 usage found)
>                             73 aclsDelta.changes().entrySet().forEach(e =>
>                 ClientQuotaMetadataManager.scala  (3 usages found)
>                     ClientQuotaMetadataManager  (3 usages found)
>                         handleIpQuota  (1 usage found)
>                             119 quotaDelta.changes().entrySet().forEach { e =>
>                         update  (2 usages found)
>                             54 quotasDelta.changes().entrySet().forEach { e =>
>                             99 quotaDelta.changes().entrySet().forEach { e =>
>                 KRaftMetadataCache.scala  (1 usage found)
>                     KRaftMetadataCache  (1 usage found)
>                         getClusterMetadata  (1 usage found)
>                             491 topic.partitions().entrySet().forEach { entry 
> =>
>         kafka.core.test  (1 usage found)
>             unit.kafka.integration  (1 usage found)
>                 KafkaServerTestHarness.scala  (1 usage found)
>                     KafkaServerTestHarness  (1 usage found)
>                         getTopicNames  (1 usage found)
>                             349 
> controllerServer.controller.findAllTopicIds(ANONYMOUS_CONTEXT).get().entrySet().forEach
>  {
>         kafka.metadata.main  (3 usages found)
>             org.apache.kafka.controller  (2 usages found)
>                 QuorumFeatures.java  (1 usage found)
>                     toString()  (1 usage found)
>                         144 localSupportedFeatures.entrySet().forEach(f -> 
> features.add(f.getKey() + ": " + f.getValue()));
>                 ReplicationControlManager.java  (1 usage found)
>                     createTopic(ControllerRequestContext, CreatableTopic, 
> List, Map, 
> List, boolean)  (1 usage found)
>                         732 newParts.entrySet().forEach(e -> 
> assignments.put(e.getKey(),
>             org.apache.kafka.metadata.properties  (1 usage found)
>                 MetaPropertiesEnsemble.java  (1 usage found)
>                     toString()  (1 usage found)
>                         610 logDirProps.entrySet().forEach(
>         kafka.metadata.test  (1 usage found)
>             org.apache.kafka.controller  (1 usage found)
>                 ReplicationControlManagerTest.java  (1 usage found)
>                     createTestTopic(String, int[][], Map, 
> short)  (1 usage found)
>                         307 configs.entrySet().forEach(e -> 
> topic.configs().add(
>         kafka.streams.main  (1 usage found)
>             org.apache.kafka.streams.processor.internals  (1 usage found)
>                 StreamsMetadataState.java  (1 usage found)
>                     onChange(Map>, 
> Map>, Map)  (1 
> usage found)
>                         317 topicPartitionInfo.entrySet().forEach(entry -> 
> this.partitionsByTopic
>         kafka.tools.main  (1 usage found)
>             org.apache.kafka.tools  (1 usage found)
>                 LeaderElectionCommand.java  (1 usage found)
>                     electLeaders(Admin, ElectionType, 
> 

[jira] [Reopened] (KAFKA-16606) JBOD support in KRaft does not seem to be gated by the metadata version

2024-04-24 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-16606:
-
  Assignee: Igor Soarez
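
A sketch (hypothetical check, not necessarily the final fix) of the kind of gate 
this issue calls for:

{code:java}
// Refuse to run with multiple log directories unless the finalized
// metadata.version supports directory assignments (3.7-IV2 or newer).
if (logDirs.size() > 1 && !metadataVersion.isDirectoryAssignmentSupported()) {
    throw new IllegalStateException(
        "Multiple log directories (JBOD) require metadata.version 3.7-IV2 or newer");
}
{code}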

> JBOD support in KRaft does not seem to be gated by the metadata version
> ---
>
> Key: KAFKA-16606
> URL: https://issues.apache.org/jira/browse/KAFKA-16606
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Jakub Scholz
>Assignee: Igor Soarez
>Priority: Major
>
> JBOD in KRaft should be supported since Kafka 3.7.0. The Kafka 
> [source 
> code|https://github.com/apache/kafka/blob/1b301b30207ed8fca9f0aea5cf940b0353a1abca/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L194-L195]
>  suggests that it is supported with the metadata version {{{}3.7-IV2{}}}. 
> However, it seems to be possible to run KRaft cluster with JBOD even with 
> older metadata versions such as {{{}3.6{}}}. For example, I have a cluster 
> using the {{3.6}} metadata version:
> {code:java}
> bin/kafka-features.sh --bootstrap-server localhost:9092 describe
> Feature: metadata.version       SupportedMinVersion: 3.0-IV1    
> SupportedMaxVersion: 3.7-IV4    FinalizedVersionLevel: 3.6-IV2  Epoch: 1375 
> {code}
> Yet a KRaft cluster with JBOD seems to run fine:
> {code:java}
> bin/kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe
> Querying brokers for log directories information
> Received log directory information from brokers 2000,3000,1000
> 

[jira] [Created] (KAFKA-16602) Flaky test – org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord()

2024-04-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16602:
---

 Summary: Flaky test – 
org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord()
 Key: KAFKA-16602
 URL: https://issues.apache.org/jira/browse/KAFKA-16602
 Project: Kafka
  Issue Type: Test
  Components: controller
Reporter: Igor Soarez


 
org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord()
 failed with:

 
h4. Error
{code:java}
org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: 
exception while renouncing leadership: Attempt to resign from epoch 1 which is 
larger than the current epoch 0{code}
h4. Stacktrace
{code:java}
org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: 
exception while renouncing leadership: Attempt to resign from epoch 1 which is 
larger than the current epoch 0  at 
app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)  
at 
app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270)
  at 
app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547)
  at 
app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179)
  at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881)
  at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182)
  at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829) Caused by: 
java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is 
larger than the current epoch 0{code}
 

Source: 
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/3/tests/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16601) Flaky test – org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics()

2024-04-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16601:
---

 Summary: Flaky test – 
org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics()
 Key: KAFKA-16601
 URL: https://issues.apache.org/jira/browse/KAFKA-16601
 Project: Kafka
  Issue Type: Test
  Components: controller
Reporter: Igor Soarez


 
org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics()
 failed with:
h4. Error
{code:java}
org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: 
exception while renouncing leadership: Attempt to resign from epoch 1 which is 
larger than the current epoch 0{code}
h4. Stacktrace
{code:java}
org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: 
exception while renouncing leadership: Attempt to resign from epoch 1 which is 
larger than the current epoch 0  at 
app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)  
at 
app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270)
  at 
app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547)
  at 
app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179)
  at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881)
  at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211)
  at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182)
  at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829) Caused by: 
java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is 
larger than the current epoch 0{code}
 

Source:

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/3/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16597) Flaky test - org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads()

2024-04-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16597:
---

 Summary: Flaky test - 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads()
 Key: KAFKA-16597
 URL: https://issues.apache.org/jira/browse/KAFKA-16597
 Project: Kafka
  Issue Type: Test
  Components: streams
Reporter: Igor Soarez


org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads()
 failed with:
{code:java}
Error

org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The 
specified partition 1 for store source-table does not exist.

Stacktrace

org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The 
specified partition 1 for store source-table does not exist.   at 
app//org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63)
at 
app//org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53)
 at 
app//org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads(StoreQueryIntegrationTest.java:411)
 {code}
 

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15701/2/tests/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16596) Flaky test – org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()

2024-04-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16596:
---

 Summary: Flaky test – 
org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
 
 Key: KAFKA-16596
 URL: https://issues.apache.org/jira/browse/KAFKA-16596
 Project: Kafka
  Issue Type: Test
Reporter: Igor Soarez


org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
 failed in the following way:

 
{code:java}
org.opentest4j.AssertionFailedError: Unexpected addresses [93.184.215.14, 
2606:2800:21f:cb07:6820:80da:af6b:8b2c] ==> expected: <true> but was: <false>  
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
   at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
   at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) 
   at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at 
app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)at 
app//org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup(ClientUtilsTest.java:65)
 {code}
As a result of the following assertions:

 
{code:java}
// With lookup of example.com, either one or two addresses are expected 
depending on
// whether ipv4 and ipv6 are enabled
List<InetSocketAddress> validatedAddresses = 
checkWithLookup(asList("example.com:1"));
assertTrue(validatedAddresses.size() >= 1, "Unexpected addresses " + 
validatedAddresses);
List<String> validatedHostNames = 
validatedAddresses.stream().map(InetSocketAddress::getHostName)
.collect(Collectors.toList());
List<String> expectedHostNames = asList("93.184.216.34", 
"2606:2800:220:1:248:1893:25c8:1946"); {code}
It seems that the DNS result has changed for example.com.
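
A possible hardening (editor's sketch): derive the expectation from a live 
lookup instead of hard-coded addresses, so the test survives DNS changes for 
example.com:

{code:java}
// Resolve example.com at test time and compare against those addresses,
// rather than pinning 93.184.216.34 / 2606:2800:220:1:248:1893:25c8:1946.
List<String> expectedAddresses = Arrays.stream(InetAddress.getAllByName("example.com"))
        .map(InetAddress::getHostAddress)
        .collect(Collectors.toList());
{code}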

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15793) Flaky test ZkMigrationIntegrationTest.testMigrateTopicDeletions

2024-04-10 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-15793:
-

This has come up again:

 
{code:java}
[2024-04-09T21:06:17.307Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> DeleteTopicsRequestTest > 
testTopicDeletionClusterHasOfflinePartitions(String) > 
"testTopicDeletionClusterHasOfflinePartitions(String).quorum=zk" STARTED
[2024-04-09T21:06:17.307Z] 
kafka.zk.ZkMigrationIntegrationTest.testMigrateTopicDeletions(ClusterInstance)[7]
 failed, log available in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15522/core/build/reports/testOutput/kafka.zk.ZkMigrationIntegrationTest.testMigrateTopicDeletions(ClusterInstance)[7].test.stdout
[2024-04-09T21:06:17.307Z] 
[2024-04-09T21:06:17.307Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [7] Type=ZK, MetadataVersion=3.7-IV4, 
Security=PLAINTEXT FAILED
[2024-04-09T21:06:17.307Z]     
org.apache.kafka.server.fault.FaultHandlerException: nonFatalFaultHandler: 
Unhandled error in MetadataChangeEvent: Check op on KRaft Migration ZNode 
failed. Expected zkVersion = 5. This indicates that another KRaft controller is 
making writes to ZooKeeper.
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.KafkaZkClient.handleUnwrappedMigrationResult$1(KafkaZkClient.scala:2001)
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.KafkaZkClient.unwrapMigrationResponse$1(KafkaZkClient.scala:2027)
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.KafkaZkClient.$anonfun$retryMigrationRequestsUntilConnected$2(KafkaZkClient.scala:2052)
[2024-04-09T21:06:17.307Z]         at 
app//scala.collection.StrictOptimizedIterableOps.map(StrictOptimizedIterableOps.scala:100)
[2024-04-09T21:06:17.307Z]         at 
app//scala.collection.StrictOptimizedIterableOps.map$(StrictOptimizedIterableOps.scala:87)
[2024-04-09T21:06:17.307Z]         at 
app//scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:43)
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.KafkaZkClient.retryMigrationRequestsUntilConnected(KafkaZkClient.scala:2052)
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.migration.ZkTopicMigrationClient.$anonfun$updateTopicPartitions$1(ZkTopicMigrationClient.scala:265)
[2024-04-09T21:06:17.307Z]         at 
app//kafka.zk.migration.ZkTopicMigrationClient.updateTopicPartitions(ZkTopicMigrationClient.scala:255)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.lambda$handleTopicsDelta$20(KRaftMigrationZkWriter.java:334)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$300(KRaftMigrationDriver.java:62)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationDriver$MetadataChangeEvent.lambda$run$1(KRaftMigrationDriver.java:532)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationDriver.lambda$countingOperationConsumer$6(KRaftMigrationDriver.java:845)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.lambda$handleTopicsDelta$21(KRaftMigrationZkWriter.java:331)
[2024-04-09T21:06:17.307Z]         at 
java.base@17.0.7/java.util.HashMap.forEach(HashMap.java:1421)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.handleTopicsDelta(KRaftMigrationZkWriter.java:297)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationZkWriter.handleDelta(KRaftMigrationZkWriter.java:112)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.metadata.migration.KRaftMigrationDriver$MetadataChangeEvent.run(KRaftMigrationDriver.java:531)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:128)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211)
[2024-04-09T21:06:17.307Z]         at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182)
[2024-04-09T21:06:17.307Z]         at 
java.base@17.0.7/java.lang.Thread.run(Thread.java:833)
[2024-04-09T21:06:17.307Z] 
[2024-04-09T21:06:17.307Z]         Caused by:
[2024-04-09T21:06:17.307Z]         java.lang.RuntimeException: Check op on 
KRaft Migration ZNode failed. Expected zkVersion = 5. This indicates that 
another KRaft controller is making writes to ZooKeeper.
[2024-04-09T21:06:17.307Z]             at 
kafka.zk.KafkaZkClient.handleUnwrappedMigrationResult$1(KafkaZkClient.scala:2001)
[2024-04-09T21:06:17.307Z]             ... 22 

[jira] [Created] (KAFKA-16504) Flaky test org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations

2024-04-10 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16504:
---

 Summary: Flaky test 
org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations
 Key: KAFKA-16504
 URL: https://issues.apache.org/jira/browse/KAFKA-16504
 Project: Kafka
  Issue Type: Test
  Components: controller
Reporter: Igor Soarez


{code:java}
[2024-04-09T20:26:55.840Z] Gradle Test Run :metadata:test > Gradle Test 
Executor 54 > QuorumControllerTest > testConfigurationOperations() STARTED
[2024-04-09T20:26:55.840Z] 
org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations() 
failed, log available in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15522/metadata/build/reports/testOutput/org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations().test.stdout
[2024-04-09T20:26:55.840Z] 
[2024-04-09T20:26:55.840Z] Gradle Test Run :metadata:test > Gradle Test 
Executor 54 > QuorumControllerTest > testConfigurationOperations() FAILED
[2024-04-09T20:26:55.840Z]     java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.NotControllerException: No controller appears to 
be active.
[2024-04-09T20:26:55.840Z]         at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
[2024-04-09T20:26:55.840Z]         at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
[2024-04-09T20:26:55.840Z]         at 
org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations(QuorumControllerTest.java:202)
[2024-04-09T20:26:55.840Z] 
[2024-04-09T20:26:55.840Z]         Caused by:
[2024-04-09T20:26:55.840Z]         
org.apache.kafka.common.errors.NotControllerException: No controller appears to 
be active. {code}
 

[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15522/5/tests/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16403) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords

2024-03-27 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16403.
-
Resolution: Not A Bug

> Flaky test 
> org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords
> -
>
> Key: KAFKA-16403
> URL: https://issues.apache.org/jira/browse/KAFKA-16403
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Priority: Major
>
> {code:java}
> org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords()
>  failed, log available in 
> /home/jenkins/workspace/Kafka_kafka-pr_PR-14903/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords().test.stdout
> Gradle Test Run :streams:examples:test > Gradle Test Executor 82 > 
> WordCountDemoTest > testCountListOfWords() FAILED
>     org.apache.kafka.streams.errors.ProcessorStateException: Error opening 
> store KSTREAM-AGGREGATE-STATE-STORE-03 at location 
> /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
>         at 
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56)
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151)
>         at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872)
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151)
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232)
>         at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258)
>         at 
> org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530)
>         at 
> org.apache.kafka.streams.TopologyTestDriver.(TopologyTestDriver.java:373)
>         at 
> org.apache.kafka.streams.TopologyTestDriver.(TopologyTestDriver.java:300)
>         at 
> org.apache.kafka.streams.TopologyTestDriver.(TopologyTestDriver.java:276)
>         at 
> org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60)
>         Caused by:
>         org.rocksdb.RocksDBException: Corruption: IO error: No such file or 
> directory: While open a file for random read: 
> /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/10.ldb:
>  No such file or directory in file 
> /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/MANIFEST-05
>             at org.rocksdb.RocksDB.open(Native Method)
>             at org.rocksdb.RocksDB.open(RocksDB.java:307)
>             at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323)
>             ... 17 more
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16404) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig

2024-03-27 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16404.
-
Resolution: Not A Bug

Same as KAFKA-16403, this only failed once. It was likely the result of a 
testing infrastructure problem. We can always re-open if we see this again and 
suspect otherwise.

> Flaky test 
> org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig
> -
>
> Key: KAFKA-16404
> URL: https://issues.apache.org/jira/browse/KAFKA-16404
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Priority: Major
>
>  
> {code:java}
> org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig()
>  failed, log available in 
> /home/jenkins/workspace/Kafka_kafka-pr_PR-14903@2/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig().test.stdout
> Gradle Test Run :streams:examples:test > Gradle Test Executor 87 > 
> WordCountDemoTest > testGetStreamsConfig() FAILED
>     org.apache.kafka.streams.errors.ProcessorStateException: Error opening 
> store KSTREAM-AGGREGATE-STATE-STORE-03 at location 
> /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
>         at 
> app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329)
>         at 
> app//org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69)
>         at 
> app//org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254)
>         at 
> app//org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175)
>         at 
> app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
>         at 
> app//org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56)
>         at 
> app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
>         at 
> app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151)
>         at 
> app//org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872)
>         at 
> app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151)
>         at 
> app//org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232)
>         at 
> app//org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102)
>         at 
> app//org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258)
>         at 
> app//org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530)
>         at 
> app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373)
>         at 
> app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300)
>         at 
> app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276)
>         at 
> app//org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60)
>         Caused by:
>         org.rocksdb.RocksDBException: While lock file: 
> /tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK:
>  Resource temporarily unavailable
>             at app//org.rocksdb.RocksDB.open(Native Method)
>             at app//org.rocksdb.RocksDB.open(RocksDB.java:307)
>             at 
> app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323)
>             ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16422) Flaky test org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest."testFailingOverIncrementsNewActiveControllerCount(boolean).true"

2024-03-25 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16422:
---

 Summary: Flaky test 
org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest."testFailingOverIncrementsNewActiveControllerCount(boolean).true"
 Key: KAFKA-16422
 URL: https://issues.apache.org/jira/browse/KAFKA-16422
 Project: Kafka
  Issue Type: Bug
Reporter: Igor Soarez


{code:java}
[2024-03-22T10:39:59.911Z] Gradle Test Run :metadata:test > Gradle Test 
Executor 92 > QuorumControllerMetricsIntegrationTest > 
testFailingOverIncrementsNewActiveControllerCount(boolean) > 
"testFailingOverIncrementsNewActiveControllerCount(boolean).true" FAILED
[2024-03-22T10:39:59.912Z]     org.opentest4j.AssertionFailedError: expected: 
<1> but was: <2>
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
[2024-03-22T10:39:59.912Z]         at 
app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:632)
[2024-03-22T10:39:59.912Z]         at 
app//org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.lambda$testFailingOverIncrementsNewActiveControllerCount$1(QuorumControllerMetricsIntegrationTest.java:107)
[2024-03-22T10:39:59.912Z]         at 
app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
[2024-03-22T10:39:59.912Z]         at 
app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:412)
[2024-03-22T10:39:59.912Z]         at 
app//org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testFailingOverIncrementsNewActiveControllerCount(QuorumControllerMetricsIntegrationTest.java:105)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16404) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig

2024-03-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16404:
---

 Summary: Flaky test 
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig
 Key: KAFKA-16404
 URL: https://issues.apache.org/jira/browse/KAFKA-16404
 Project: Kafka
  Issue Type: Bug
Reporter: Igor Soarez


 
{code:java}
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig()
 failed, log available in 
/home/jenkins/workspace/Kafka_kafka-pr_PR-14903@2/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig().test.stdout
Gradle Test Run :streams:examples:test > Gradle Test Executor 87 > 
WordCountDemoTest > testGetStreamsConfig() FAILED
    org.apache.kafka.streams.errors.ProcessorStateException: Error opening 
store KSTREAM-AGGREGATE-STATE-STORE-03 at location 
/tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
        at 
app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329)
        at 
app//org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69)
        at 
app//org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254)
        at 
app//org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175)
        at 
app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
        at 
app//org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56)
        at 
app//org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
        at 
app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151)
        at 
app//org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872)
        at 
app//org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151)
        at 
app//org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232)
        at 
app//org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102)
        at 
app//org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258)
        at 
app//org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530)
        at 
app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373)
        at 
app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300)
        at 
app//org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276)
        at 
app//org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60)
        Caused by:
        org.rocksdb.RocksDBException: While lock file: 
/tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/LOCK:
 Resource temporarily unavailable
            at app//org.rocksdb.RocksDB.open(Native Method)
            at app//org.rocksdb.RocksDB.open(RocksDB.java:307)
            at 
app//org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323)
            ... 17 more
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16403) Flaky test org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords

2024-03-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16403:
---

 Summary: Flaky test 
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords
 Key: KAFKA-16403
 URL: https://issues.apache.org/jira/browse/KAFKA-16403
 Project: Kafka
  Issue Type: Bug
Reporter: Igor Soarez


{code:java}
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords()
 failed, log available in 
/home/jenkins/workspace/Kafka_kafka-pr_PR-14903/streams/examples/build/reports/testOutput/org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords().test.stdout
Gradle Test Run :streams:examples:test > Gradle Test Executor 82 > 
WordCountDemoTest > testCountListOfWords() FAILED
    org.apache.kafka.streams.errors.ProcessorStateException: Error opening 
store KSTREAM-AGGREGATE-STATE-STORE-03 at location 
/tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03
        at 
org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:329)
        at 
org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:69)
        at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:254)
        at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:175)
        at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
        at 
org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:56)
        at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:71)
        at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$3(MeteredKeyValueStore.java:151)
        at 
org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872)
        at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:151)
        at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:232)
        at 
org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:102)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:258)
        at 
org.apache.kafka.streams.TopologyTestDriver.setupTask(TopologyTestDriver.java:530)
        at 
org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:373)
        at 
org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:300)
        at 
org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:276)
        at 
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.setup(WordCountDemoTest.java:60)
        Caused by:
        org.rocksdb.RocksDBException: Corruption: IO error: No such file or 
directory: While open a file for random read: 
/tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/10.ldb:
 No such file or directory in file 
/tmp/kafka-streams/streams-wordcount/0_0/rocksdb/KSTREAM-AGGREGATE-STATE-STORE-03/MANIFEST-05
            at org.rocksdb.RocksDB.open(Native Method)
            at org.rocksdb.RocksDB.open(RocksDB.java:307)
            at 
org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:323)
            ... 17 more
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16402) Flaky test org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad

2024-03-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16402:
---

 Summary: Flaky test 
org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad
 Key: KAFKA-16402
 URL: https://issues.apache.org/jira/browse/KAFKA-16402
 Project: Kafka
  Issue Type: Bug
Reporter: Igor Soarez


{code:java}
org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad() 
failed, log available in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-14903/metadata/build/reports/testOutput/org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad().test.stdout
Gradle Test Run :metadata:test > Gradle Test Executor 93 > QuorumControllerTest 
> testSnapshotSaveAndLoad() FAILED
    java.lang.IllegalArgumentException: Self-suppression not permitted
        at java.base/java.lang.Throwable.addSuppressed(Throwable.java:1072)
        at 
org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad(QuorumControllerTest.java:645)
        Caused by:
        org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: 
exception while renouncing leadership: Attempt to resign from epoch 1 which is 
larger than the current epoch 0
            at 
app//org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)
            at 
app//org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1270)
            at 
app//org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:547)
            at 
app//org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179)
            at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:881)
            at 
app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
            at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:149)
            at 
app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:138)
            at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:211)
            at 
app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:182)
            at java.base@17.0.7/java.lang.Thread.run(Thread.java:833)
            Caused by:
            java.lang.IllegalArgumentException: Attempt to resign from epoch 1 
which is larger than the current epoch 0
                at 
org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)
                ... 10 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16365) AssignmentsManager mismanages completion notifications

2024-03-11 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16365:
---

 Summary: AssignmentsManager mismanages completion notifications
 Key: KAFKA-16365
 URL: https://issues.apache.org/jira/browse/KAFKA-16365
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez


When moving replicas between directories in the same broker, future replica 
promotion hinges on acknowledgment from the controller of a change in the 
directory assignment.
 
ReplicaAlterLogDirsThread relies on AssignmentsManager for a completion 
notification of the directory assignment change.
 
In its current form, under certain assignment scheduling, AssignmentsManager 
can both miss completion notifications and trigger them prematurely.
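
For illustration, the expected bookkeeping could look roughly like this. All 
names here are hypothetical, not the actual AssignmentsManager API; the point 
is that a completion callback should fire only once the controller has 
acknowledged the latest requested directory for that partition:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class AssignmentTracker {
    static final class Assignment {
        final String directoryId;
        final Runnable onComplete;
        Assignment(String directoryId, Runnable onComplete) {
            this.directoryId = directoryId;
            this.onComplete = onComplete;
        }
    }

    // Latest in-flight assignment per topic-partition.
    private final Map<String, Assignment> inFlight = new ConcurrentHashMap<>();

    // A newer assignment for the same partition supersedes the older one;
    // the superseded callback must never fire (the premature-trigger bug).
    void submit(String topicPartition, Assignment a) {
        inFlight.put(topicPartition, a);
    }

    // Complete only if the ack matches the latest requested directory.
    // Firing on a stale ack triggers the notification prematurely; dropping
    // the entry without running the callback is how notifications get missed.
    void onControllerAck(String topicPartition, String ackedDirectoryId) {
        Assignment a = inFlight.get(topicPartition);
        if (a != null && a.directoryId.equals(ackedDirectoryId)) {
            inFlight.remove(topicPartition);
            a.onComplete.run();
        }
    }
}
{code}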



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16363) Storage crashes if dir is unavailable

2024-03-11 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16363:
---

 Summary: Storage crashes if dir is unavailable
 Key: KAFKA-16363
 URL: https://issues.apache.org/jira/browse/KAFKA-16363
 Project: Kafka
  Issue Type: Sub-task
  Components: tools
Affects Versions: 3.7.0
Reporter: Igor Soarez


The storage tool crashes if one of the configured log directories is 
unavailable. 

 
{code:java}
sh-4.4# ./bin/kafka-storage.sh format --ignore-formatted -t $KAFKA_CLUSTER_ID 
-c server.properties
[2024-03-11 17:51:05,391] ERROR Error while reading meta.properties file 
/data/d2/meta.properties 
(org.apache.kafka.metadata.properties.MetaPropertiesEnsemble)
java.nio.file.AccessDeniedException: /data/d2/meta.properties
        at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
        at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at 
java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
        at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
        at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
        at 
java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
        at java.base/java.nio.file.Files.newInputStream(Files.java:160)
        at 
org.apache.kafka.metadata.properties.PropertiesUtils.readPropertiesFile(PropertiesUtils.java:77)
        at 
org.apache.kafka.metadata.properties.MetaPropertiesEnsemble$Loader.load(MetaPropertiesEnsemble.java:135)
        at kafka.tools.StorageTool$.formatCommand(StorageTool.scala:431)
        at kafka.tools.StorageTool$.main(StorageTool.scala:95)
        at kafka.tools.StorageTool.main(StorageTool.scala)
metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, 
dirs={/data/d1: MetaProperties(version=1, clusterId=RwO2UIkmTBWltwRllP05aA, 
nodeId=101, directoryId=zm7fSw3zso9aR0AtuzsI_A), /data/metadata: 
MetaProperties(version=1, clusterId=RwO2UIkmTBWltwRllP05aA, nodeId=101, 
directoryId=eRO8vOP7ddbpx_W2ZazjLw), /data/d2: ERROR})
I/O error trying to read log directory /data/d2.
 {code}
When configured with multiple directories, Kafka tolerates some of them (but 
not all) being inaccessible, so this tool should be able to handle the same 
scenarios without crashing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16297) Race condition while promoting future replica can lead to partition unavailability.

2024-02-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-16297:
---

 Summary: Race condition while promoting future replica can lead to 
partition unavailability.
 Key: KAFKA-16297
 URL: https://issues.apache.org/jira/browse/KAFKA-16297
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


KIP-858 proposed that when a directory failure occurs after changing the 
assignment of a replica that's moved between two directories in the same 
broker, but before the future replica promotion completes, the broker should 
reassign the replica to inform the controller of its correct status. But this 
hasn't yet been implemented, and without it this failure may lead to indefinite 
partition unavailability.

Example scenario:
 # A broker which leads partition P receives a request to alter the replica 
from directory A to directory B.
 # The broker creates a future replica in directory B and starts a replica 
fetcher.
 # Once the future replica first catches up, the broker queues a reassignment 
to inform the controller of the directory change.
 # The next time the replica catches up, the broker briefly blocks appends and 
promotes the replica. However, before the promotion is attempted, directory A 
fails.
 # The controller was informed that P is now in directory B before it received 
the notification that directory A has failed, so it does not elect a new 
leader, and as long as the broker is online, partition P remains unavailable.

As per KIP-858, the broker should detect this scenario and queue a reassignment 
of P into directory ID {{DirectoryId.LOST}}.
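
For illustration, the detection could take roughly this shape. 
{{DirectoryId.LOST}} is the real sentinel in {{org.apache.kafka.common}}; the 
rest of this sketch is hypothetical:

{code:java}
import org.apache.kafka.common.DirectoryId;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;

final class PromotionFailureHandler {
    // Hypothetical seam into the broker's assignment propagation path.
    interface AssignmentQueue { void queue(TopicPartition tp, Uuid dir); }

    private final AssignmentQueue queue;

    PromotionFailureHandler(AssignmentQueue queue) { this.queue = queue; }

    // Called on a log directory failure. sourceDir is the directory the
    // replica was being moved out of; promoted is whether the future replica
    // already took over.
    void onDirectoryFailure(Uuid failedDir, Uuid sourceDir, TopicPartition tp,
                            boolean promoted) {
        if (!promoted && failedDir.equals(sourceDir)) {
            // The controller already believes tp lives in the destination
            // directory, but the promotion never happened: reassign to LOST
            // so the controller re-evaluates leadership.
            queue.queue(tp, DirectoryId.LOST);
        }
    }
}
{code}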

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15955) Migrating ZK brokers send dir assignments

2023-12-01 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15955:
---

 Summary: Migrating ZK brokers send dir assignments
 Key: KAFKA-15955
 URL: https://issues.apache.org/jira/browse/KAFKA-15955
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


Broker in ZooKeeper mode, while in migration mode, should start sending 
directory assignments to the KRaft Controller using AssignmentsManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15893) Bump MetadataVersion for directory assignments

2023-11-24 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15893:
---

 Summary: Bump MetadataVersion for directory assignments
 Key: KAFKA-15893
 URL: https://issues.apache.org/jira/browse/KAFKA-15893
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15886) Always specify directories for new partition registrations

2023-11-22 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15886:
---

 Summary: Always specify directories for new partition registrations
 Key: KAFKA-15886
 URL: https://issues.apache.org/jira/browse/KAFKA-15886
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


When creating partition registrations, directories must always be defined.

If creating a partition from a PartitionRecord or PartitionChangeRecord from an 
older version that does not support directory assignments, then 
DirectoryId.MIGRATING is assumed.

If creating a new partition, or triggering a change in assignment, 
DirectoryId.UNASSIGNED should be specified, unless the target broker has a 
single online directory registered, in which case the replica should be 
assigned directly to that single directory.
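
The selection rule above could be sketched roughly as follows 
({{DirectoryId.MIGRATING}} and {{DirectoryId.UNASSIGNED}} are the real 
sentinels; the helper itself is illustrative):

{code:java}
import java.util.List;
import org.apache.kafka.common.DirectoryId;
import org.apache.kafka.common.Uuid;

final class DirectoryDefaults {
    // Pick the directory ID to record for a replica on the given broker.
    static Uuid defaultDirectory(boolean fromOldRecordVersion, List<Uuid> onlineDirs) {
        if (fromOldRecordVersion) {
            // Record predates directory assignments: we don't know where it lives.
            return DirectoryId.MIGRATING;
        }
        if (onlineDirs.size() == 1) {
            // Only one possible home, so assign directly.
            return onlineDirs.get(0);
        }
        // Otherwise let the broker report the actual directory later.
        return DirectoryId.UNASSIGNED;
    }
}
{code}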



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15858) Broker stays fenced until all assignments are correct

2023-11-20 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15858:
---

 Summary: Broker stays fenced until all assignments are correct
 Key: KAFKA-15858
 URL: https://issues.apache.org/jira/browse/KAFKA-15858
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


Until the broker has caught up with metadata AND corrected any incorrect 
directory assignments, it should continue to want to stay fenced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?

2023-10-19 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15650:
---

 Summary: Data-loss on leader shutdown right after partition 
creation?
 Key: KAFKA-15650
 URL: https://issues.apache.org/jira/browse/KAFKA-15650
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


As per KIP-858, when a replica is created, the broker selects a log directory 
to host the replica and queues the propagation of the directory assignment to 
the controller. The replica becomes active immediately; it does not wait for 
the controller to confirm the metadata change. If the replica is the leader 
replica, it can immediately start accepting writes. 

Consider the following scenario:
 # A partition is created in some selected log directory, and some produce 
traffic is accepted
 # Before the broker is able to notify the controller of the directory 
assignment, the broker shuts down
 # Upon coming back online, the broker has an offline directory, the same 
directory which was chosen to host the replica
 # The broker assumes leadership for the replica, but cannot find it in any 
available directory and has no way of knowing it was already created because 
the directory assignment is still missing
 # The replica is created and the previously produced records are lost

Step 4. may seem unlikely due to ISR membership gating leadership, but even 
assuming acks=all and replicas>1, if all other replicas are also offline the 
broker may still gain leadership. Perhaps KIP-966 is relevant here.

We may need to delay new replica activation until the assignment is propagated 
successfully.
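
One possible shape for that mitigation, sketched with hypothetical names: hold 
the replica back from accepting writes until the directory assignment has 
round-tripped to the controller.

{code:java}
import java.util.concurrent.CompletableFuture;

final class ReplicaActivation {
    // Hypothetical: completed once the controller persists the dir assignment.
    private final CompletableFuture<Void> assignmentAcked = new CompletableFuture<>();

    void onControllerAck() {
        assignmentAcked.complete(null);
    }

    // Only start accepting produce traffic once the assignment is durable, so
    // a restart with an offline directory can no longer orphan fresh records.
    void activateWhenSafe(Runnable startAcceptingWrites) {
        assignmentAcked.thenRun(startAcceptingWrites);
    }
}
{code}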



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15649) Handle directory failure timeout

2023-10-19 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15649:
---

 Summary: Handle directory failure timeout 
 Key: KAFKA-15649
 URL: https://issues.apache.org/jira/browse/KAFKA-15649
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


If a broker with an offline log directory continues to fail to notify the 
controller of either:
 * the fact that the directory is offline; or
 * any replica assignment into a failed directory

then the controller will not check if a leadership change is required, and this 
may lead to partitions remaining indefinitely offline.

KIP-858 proposes that the broker should shut down after a configurable timeout 
to force a leadership change. Alternatively, the broker could also request to 
be fenced, as long as there's a path for it to later become unfenced.

While this unavailability is possible in theory, in practice it's hard to 
construct a scenario where a broker continues to appear healthy to the 
controller but fails to send this information. So it's not clear whether this 
is a real problem. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15355) Message schema changes

2023-10-16 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-15355:
-

closed by mistake

> Message schema changes
> --
>
> Key: KAFKA-15355
> URL: https://issues.apache.org/jira/browse/KAFKA-15355
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
> Fix For: 3.7.0
>
>
> Metadata records changes:
>  * BrokerRegistrationChangeRecord
>  * PartitionChangeRecord
>  * PartitionRecord
>  * RegisterBrokerRecord
> New RPCs
>  * AssignReplicasToDirsRequest
>  * AssignReplicasToDirsResponse
> RPC changes:
>  * BrokerHeartbeatRequest
>  * BrokerRegistrationRequest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15514) Controller-side replica management changes

2023-09-27 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15514:
---

 Summary: Controller-side replica management changes
 Key: KAFKA-15514
 URL: https://issues.apache.org/jira/browse/KAFKA-15514
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


The new "Assignments" field replaces the "Replicas" field in PartitionRecord 
and PartitionChangeRecord.

On the controller side, any changes to partitions need to consider both fields.
 * ISR updates
 * Partition reassignments & reverts
 * Partition creation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15451) Include offline dirs in BrokerHeartbeatRequest

2023-09-11 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15451:
---

 Summary: Include offline dirs in BrokerHeartbeatRequest
 Key: KAFKA-15451
 URL: https://issues.apache.org/jira/browse/KAFKA-15451
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15426) Process and persist directory assignments

2023-08-31 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15426:
---

 Summary: Process and persist directory assignments
 Key: KAFKA-15426
 URL: https://issues.apache.org/jira/browse/KAFKA-15426
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


* Handle AssignReplicasToDirsRequest
 * Persist metadata changes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15368) Test ZK JBOD to KRaft migration

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15368:
---

 Summary: Test ZK JBOD to KRaft migration
 Key: KAFKA-15368
 URL: https://issues.apache.org/jira/browse/KAFKA-15368
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


A ZK cluster running JBOD should be able to migrate to KRaft mode without issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15367) Test KRaft JBOD enabling migration

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15367:
---

 Summary: Test KRaft JBOD enabling migration
 Key: KAFKA-15367
 URL: https://issues.apache.org/jira/browse/KAFKA-15367
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


A cluster running in KRaft without JBOD should be able to transition into JBOD 
mode without issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15366) Log directory failure integration test

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15366:
---

 Summary: Log directory failure integration test
 Key: KAFKA-15366
 URL: https://issues.apache.org/jira/browse/KAFKA-15366
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15365) Replica management changes

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15365:
---

 Summary: Replica management changes
 Key: KAFKA-15365
 URL: https://issues.apache.org/jira/browse/KAFKA-15365
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15364) Handle log directory failure in the Controller

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15364:
---

 Summary: Handle log directory failure in the Controller
 Key: KAFKA-15364
 URL: https://issues.apache.org/jira/browse/KAFKA-15364
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15363) Broker log directory failure changes

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15363:
---

 Summary: Broker log directory failure changes
 Key: KAFKA-15363
 URL: https://issues.apache.org/jira/browse/KAFKA-15363
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15362) Resolve offline replicas in metadata cache

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15362:
---

 Summary: Resolve offline replicas in metadata cache 
 Key: KAFKA-15362
 URL: https://issues.apache.org/jira/browse/KAFKA-15362
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


Considering the broker's offline log directories and replica-to-directory 
assignments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15361) Process and persist dir info with broker registration

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15361:
---

 Summary: Process and persist dir info with broker registration
 Key: KAFKA-15361
 URL: https://issues.apache.org/jira/browse/KAFKA-15361
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


Controllers should process and persist directory information from the broker 
registration request



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15360) Include directory info in BrokerRegistration

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15360:
---

 Summary: Include directory info in BrokerRegistration
 Key: KAFKA-15360
 URL: https://issues.apache.org/jira/browse/KAFKA-15360
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez


Brokers should correctly set the OnlineLogDirs and OfflineLogDirs fields in 
each BrokerRegistrationRequest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15359) log.dir.failure.timeout.ms configuration

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15359:
---

 Summary: log.dir.failure.timeout.ms configuration
 Key: KAFKA-15359
 URL: https://issues.apache.org/jira/browse/KAFKA-15359
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


If the broker repeatedly fails to communicate a log directory failure within a 
configurable amount of time ({{log.dir.failure.timeout.ms}}), and it is the 
leader for any replicas in the failed log directory, the broker will shut down, 
as that is the only other way to guarantee that the controller will elect a new 
leader for those partitions.
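
A rough sketch of the check this implies (the property name comes from this 
ticket; everything else is illustrative):

{code:java}
import java.time.Duration;
import java.time.Instant;

final class DirFailureTimeout {
    private final Duration timeout; // log.dir.failure.timeout.ms

    DirFailureTimeout(long timeoutMs) {
        this.timeout = Duration.ofMillis(timeoutMs);
    }

    // Called periodically while a dir-failure notification remains
    // unacknowledged by the controller.
    boolean shouldShutdown(Instant failureNotifiedAt, boolean leadsReplicasInFailedDir) {
        boolean expired = Instant.now().isAfter(failureNotifiedAt.plus(timeout));
        // Shutting down forces the controller to elect new leaders for the
        // partitions this broker can no longer serve.
        return expired && leadsReplicasInFailedDir;
    }
}
{code}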



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15358) QueuedReplicaToDirAssignments metric

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15358:
---

 Summary: QueuedReplicaToDirAssignments metric
 Key: KAFKA-15358
 URL: https://issues.apache.org/jira/browse/KAFKA-15358
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15357) Propagates assignments and logdir failures to controller

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15357:
---

 Summary: Propagates assignments and logdir failures to controller
 Key: KAFKA-15357
 URL: https://issues.apache.org/jira/browse/KAFKA-15357
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez


LogDirEventManager accumulates, batches, and sends assignments or failure 
events to the Controller, prioritizing assignments to ensure the Controller has 
the correct assignment view before processing log dir failures.

Assignments are sent via AssignReplicasToDirs, logdir failures are sent via 
BrokerHeartbeat.
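
Roughly, the prioritization could look like this (class and method names are 
hypothetical, not the actual implementation):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

final class PrioritizedDispatch {
    private final Deque<Runnable> assignments = new ArrayDeque<>(); // AssignReplicasToDirs
    private final Deque<Runnable> dirFailures = new ArrayDeque<>(); // BrokerHeartbeat

    synchronized void enqueueAssignment(Runnable send) {
        assignments.add(send);
    }

    synchronized void enqueueDirFailure(Runnable send) {
        dirFailures.add(send);
    }

    // Drain assignments first so the controller sees the up-to-date
    // replica-to-dir view before it processes any failure event.
    synchronized void dispatchNext() {
        Runnable next = !assignments.isEmpty() ? assignments.poll() : dirFailures.poll();
        if (next != null) {
            next.run();
        }
    }
}
{code}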



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15356) Generate and persist log directory UUIDs

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15356:
---

 Summary: Generate and persist log directory UUIDs
 Key: KAFKA-15356
 URL: https://issues.apache.org/jira/browse/KAFKA-15356
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez
Assignee: Igor Soarez


* The StorageTool format command should ensure each dir has a generated UUID
 * BrokerMetadataCheckpoint should parse directory.id from meta.properties, or 
generate and persist if missing
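
For illustration, a formatted directory's meta.properties would then look 
roughly like this (values mirror the example output in KAFKA-16363 above; 
{{directory.id}} is the new entry):

{code}
version=1
cluster.id=RwO2UIkmTBWltwRllP05aA
node.id=101
directory.id=zm7fSw3zso9aR0AtuzsI_A
{code}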

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15355) Update metadata records

2023-08-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-15355:
---

 Summary: Update metadata records
 Key: KAFKA-15355
 URL: https://issues.apache.org/jira/browse/KAFKA-15355
 Project: Kafka
  Issue Type: Sub-task
Reporter: Igor Soarez


Includes:
 * BrokerRegistrationChangeRecord
 * PartitionChangeRecord
 * PartitionRecord
 * RegisterBrokerRecord



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14303) Producer.send without record key and batch.size=0 goes into infinite loop

2022-10-14 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-14303:
---

 Summary: Producer.send without record key and batch.size=0 goes 
into infinite loop
 Key: KAFKA-14303
 URL: https://issues.apache.org/jira/browse/KAFKA-14303
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 3.3.1, 3.3.0
Reporter: Igor Soarez


3.3 has broken previous producer behavior.

A call to {{producer.send(record)}} with a record without a key and configured 
with {{batch.size=0}} never returns.

Reproducer:
{code:java}
import java.util.Properties
import java.util.concurrent.TimeUnit

import kafka.api.IntegrationTestHarness
import org.apache.kafka.clients.producer.{ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import org.junit.jupiter.api.Test

class ProducerIssueTest extends IntegrationTestHarness {
  override protected def brokerCount = 1

  @Test
  def testProduceWithBatchSizeZeroAndNoRecordKey(): Unit = {
    val topicName = "foo"
    createTopic(topicName)
    val overrides = new Properties
    overrides.put(ProducerConfig.BATCH_SIZE_CONFIG, 0)
    val producer = createProducer(keySerializer = new StringSerializer,
      valueSerializer = new StringSerializer, overrides)
    val record = new ProducerRecord[String, String](topicName, null, "hello")
    val future = producer.send(record) // goes into infinite loop here
    future.get(10, TimeUnit.SECONDS)
  }
} {code}
 

[Documentation for producer 
configuration|https://kafka.apache.org/documentation/#producerconfigs_batch.size]
 lists {{batch.size=0}} as a valid value:
{quote}Valid Values: [0,...]
{quote}
and describes its effect directly:
{quote}A small batch size will make batching less common and may reduce 
throughput (a batch size of zero will disable batching entirely).
{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-01 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-14127:
---

 Summary: KIP-858: Handle JBOD broker disk failure in KRaft
 Key: KAFKA-14127
 URL: https://issues.apache.org/jira/browse/KAFKA-14127
 Project: Kafka
  Issue Type: Improvement
  Components: jbod, kraft
Reporter: Igor Soarez


Tracking for KIP-858



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-13906) Invalid replica state transition

2022-05-16 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-13906:
---

 Summary: Invalid replica state transition
 Key: KAFKA-13906
 URL: https://issues.apache.org/jira/browse/KAFKA-13906
 Project: Kafka
  Issue Type: Bug
  Components: controller, core, replication
Affects Versions: 3.1.1, 3.0.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.0.2, 3.1.2, 
3.2.1
Reporter: Igor Soarez


The controller runs into an IllegalStateException when reacting to changes in 
broker membership status if there are topics that are pending deletion.

 

How to reproduce:
 # Setup cluster with 3 brokers
 # Create a topic with a partition being led by each broker and produce some 
data
 # Kill one of the brokers that is not the controller, and keep that broker down
 # Delete the topic
 # Restart the other broker that is not the controller

 

Logs and stacktrace:

{{[2022-05-16 11:53:25,482] ERROR [Controller id=1 epoch=1] Controller 1 epoch 
1 initiated state change of replica 3 for partition test-topic-2 from 
ReplicaDeletionSuccessful to ReplicaDeletionIneligible failed 
(state.change.logger)}}
{{java.lang.IllegalStateException: Replica 
[Topic=test-topic,Partition=2,Replica=3] should be in the 
OfflineReplica,ReplicaDeletionStarted states before moving to 
ReplicaDeletionIneligible state. Instead it is in ReplicaDeletionSuccessful 
state}}
{{        at 
kafka.controller.ZkReplicaStateMachine.logInvalidTransition(ReplicaStateMachine.scala:442)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2(ReplicaStateMachine.scala:164)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2$adapted(ReplicaStateMachine.scala:164)}}
{{        at scala.collection.immutable.List.foreach(List.scala:333)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.doHandleStateChanges(ReplicaStateMachine.scala:164)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2(ReplicaStateMachine.scala:112)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2$adapted(ReplicaStateMachine.scala:111)}}
{{        at 
kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)}}
{{        at 
scala.collection.immutable.HashMap.foreachEntry(HashMap.scala:1092)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:111)}}
{{        at 
kafka.controller.TopicDeletionManager.failReplicaDeletion(TopicDeletionManager.scala:157)}}
{{        at 
kafka.controller.KafkaController.onReplicasBecomeOffline(KafkaController.scala:638)}}
{{        at 
kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:599)}}
{{        at 
kafka.controller.KafkaController.processBrokerChange(KafkaController.scala:1623)}}
{{        at 
kafka.controller.KafkaController.process(KafkaController.scala:2534)}}
{{        at 
kafka.controller.QueuedEvent.process(ControllerEventManager.scala:52)}}
{{        at 
kafka.controller.ControllerEventManager$ControllerEventThread.process$1(ControllerEventManager.scala:130)}}
{{--}}
{{[2022-05-16 11:53:40,726] ERROR [Controller id=1 epoch=1] Controller 1 epoch 
1 initiated state change of replica 3 for partition test-topic-2 from 
ReplicaDeletionSuccessful to OnlineReplica failed (state.change.logger)}}
{{java.lang.IllegalStateException: Replica 
[Topic=test-topic,Partition=2,Replica=3] should be in the 
NewReplica,OnlineReplica,OfflineReplica,ReplicaDeletionIneligible states before 
moving to OnlineReplica state. Instead it is in ReplicaDeletionSuccessful 
state}}
{{        at 
kafka.controller.ZkReplicaStateMachine.logInvalidTransition(ReplicaStateMachine.scala:442)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2(ReplicaStateMachine.scala:164)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$doHandleStateChanges$2$adapted(ReplicaStateMachine.scala:164)}}
{{        at scala.collection.immutable.List.foreach(List.scala:333)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.doHandleStateChanges(ReplicaStateMachine.scala:164)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2(ReplicaStateMachine.scala:112)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.$anonfun$handleStateChanges$2$adapted(ReplicaStateMachine.scala:111)}}
{{        at 
kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)}}
{{        at 
scala.collection.immutable.HashMap.foreachEntry(HashMap.scala:1092)}}
{{        at 
kafka.controller.ZkReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:111)}}
{{        at 
kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:543)}}
{{        at 
kafka.controller.KafkaController.processBrokerChange(KafkaController.scala:1607)}}
{{        at 
kafka.controller.KafkaController.process(KafkaController.scala:2534)}}

[jira] [Created] (KAFKA-13382) Automatic storage formatting

2021-10-18 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-13382:
---

 Summary: Automatic storage formatting
 Key: KAFKA-13382
 URL: https://issues.apache.org/jira/browse/KAFKA-13382
 Project: Kafka
  Issue Type: Improvement
  Components: core, jbod, kraft
Reporter: Igor Soarez
Assignee: Igor Soarez


It should be possible to configure a KRaft server to detect the need for 
formatting and to format storage directories on startup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12938) Fix and reenable testChrootExistsAndRootIsLocked

2021-06-11 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-12938:
---

 Summary: Fix and reenable testChrootExistsAndRootIsLocked
 Key: KAFKA-12938
 URL: https://issues.apache.org/jira/browse/KAFKA-12938
 Project: Kafka
  Issue Type: Task
Reporter: Igor Soarez


In core/src/test/scala/unit/kafka/zk/KafkaZkClientTest.scala

 

Disabled in https://github.com/apache/kafka/pull/10820



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12866) Kafka requires ZK root access even when using a chroot

2021-05-30 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-12866:
---

 Summary: Kafka requires ZK root access even when using a chroot
 Key: KAFKA-12866
 URL: https://issues.apache.org/jira/browse/KAFKA-12866
 Project: Kafka
  Issue Type: Bug
  Components: core, zkclient
Affects Versions: 2.6.2, 2.7.1, 2.8.0, 2.6.1
Reporter: Igor Soarez


When a Zookeeper chroot is configured, users do not expect Kafka to need 
Zookeeper access outside of that chroot.
h1. Why is this important?

A zookeeper cluster may be shared with other Kafka clusters or even other 
applications. It is an expected security practice to restrict each 
cluster/application's access to its own Zookeeper chroot.
h1. Steps to reproduce
h2. Zookeeper setup

Using the zkCli, create a chroot for Kafka, make it available to Kafka but lock 
the root znode.

{{ [zk: localhost:2181(CONNECTED) 1] create /somechroot }}
{{ Created /some}}
{{ [zk: localhost:2181(CONNECTED) 2] setAcl /somechroot world:anyone:cdrwa}}
{{ [zk: localhost:2181(CONNECTED) 3] addauth digest test:12345}}
{{ [zk: localhost:2181(CONNECTED) 4] setAcl / 
digest:test:Mx1uO9GLtm1qaVAQ20Vh9ODgACg=:cdrwa}}
h2. Kafka setup

Configure the chroot in broker.properties:

{{zookeeper.connect=localhost:2181/somechroot}}
h2. Expected behavior

The expected behavior here is that Kafka will use the chroot without issues.
h2. Actual result

Kafka fails to start with a fatal exception:

{{ org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
NoAuth for /chroot}}
{{ at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)}}
{{ at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)}}
{{ at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)}}
{{ at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)}}
{{ at 
kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1627)}}
{{ at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1957)}}
{{ at 
kafka.zk.ZkClientAclTest.testChrootExistsAndRootIsLocked(ZkClientAclTest.scala:60)}}

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12763) NoSuchElementException during kafka-log-start-offset-checkpoint

2021-05-07 Thread Igor Soarez (Jira)
Igor Soarez created KAFKA-12763:
---

 Summary: NoSuchElementException during 
kafka-log-start-offset-checkpoint
 Key: KAFKA-12763
 URL: https://issues.apache.org/jira/browse/KAFKA-12763
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.6.2
Reporter: Igor Soarez
Assignee: Igor Soarez


The following exception was observed in a cluster running Kafka version 2.6.2.

 

{code:java}
{
  "class": "java.util.NoSuchElementException",
  "msg": null,
  "stack": [

"java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2123)",

"scala.collection.convert.JavaCollectionWrappers$JIteratorWrapper.next(JavaCollectionWrappers.scala:38)",
"scala.collection.IterableOps.head(Iterable.scala:218)",
"scala.collection.IterableOps.head$(Iterable.scala:218)",
"scala.collection.AbstractIterable.head(Iterable.scala:920)",
"kafka.log.LogManager$$anonfun$4.applyOrElse(LogManager.scala:640)",
"kafka.log.LogManager$$anonfun$4.applyOrElse(LogManager.scala:639)",
"scala.collection.Iterator$$anon$7.hasNext(Iterator.scala:516)",
"scala.collection.mutable.Growable.addAll(Growable.scala:61)",
"scala.collection.mutable.Growable.addAll$(Growable.scala:59)",
"scala.collection.mutable.HashMap.addAll(HashMap.scala:111)",
"scala.collection.mutable.HashMap$.from(HashMap.scala:549)",
"scala.collection.mutable.HashMap$.from(HashMap.scala:542)",
"scala.collection.MapFactory$Delegate.from(Factory.scala:425)",
"scala.collection.MapOps.collect(Map.scala:283)",
"scala.collection.MapOps.collect$(Map.scala:282)",
"scala.collection.AbstractMap.collect(Map.scala:375)",

"kafka.log.LogManager.$anonfun$checkpointLogStartOffsetsInDir$2(LogManager.scala:639)",

"kafka.log.LogManager.$anonfun$checkpointLogStartOffsetsInDir$1(LogManager.scala:636)",
"kafka.log.LogManager.checkpointLogStartOffsetsInDir(LogManager.scala:635)",

"kafka.log.LogManager.$anonfun$checkpointLogStartOffsets$1(LogManager.scala:600)",

"kafka.log.LogManager.$anonfun$checkpointLogStartOffsets$1$adapted(LogManager.scala:600)",
"scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)",
"scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)",
"scala.collection.AbstractIterable.foreach(Iterable.scala:920)",
"kafka.log.LogManager.checkpointLogStartOffsets(LogManager.scala:600)",
"kafka.log.LogManager.$anonfun$startup$6(LogManager.scala:426)",
"kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)",
"java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
"java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)",

"java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)",

"java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)",

"java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)",
"java.lang.Thread.run(Thread.java:834)"
  ]
}{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10303) kafka producer says connect failed in cluster mode

2020-08-02 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-10303.
-
Resolution: Incomplete

> kafka producer says connect failed in cluster mode
> --
>
> Key: KAFKA-10303
> URL: https://issues.apache.org/jira/browse/KAFKA-10303
> Project: Kafka
>  Issue Type: Bug
>Reporter: Yogesh BG
>Priority: Major
>
> Hi
>  
> I am using kafka broker version 2.3.0
> We have two setups with standalone(one node) and 3 nodes cluster
> we pump huge data ~25MBPS, ~80K messages per second
> It all works well in one node mode
> but in case of cluster, producer start throwing connect failed(librd kafka)
> after sometime again able to connect start sending traffic.
> What could be the issue? some of the configurations are
>  
> replica.fetch.max.bytes=10485760
> num.network.threads=12
> num.replica.fetchers=6
> queued.max.requests=5
> # The number of threads doing disk I/O
> num.io.threads=12
> replica.socket.receive.buffer.bytes=1
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8505) Limit the maximum number of connections per ip client ID

2019-06-07 Thread Igor Soarez (JIRA)
Igor Soarez created KAFKA-8505:
--

 Summary: Limit the maximum number of connections per ip client ID
 Key: KAFKA-8505
 URL: https://issues.apache.org/jira/browse/KAFKA-8505
 Project: Kafka
  Issue Type: New Feature
Reporter: Igor Soarez


As highlighted by KAFKA-1512 back in 2014, limiting the number of client 
connections to brokers is important to maintain service availability. With 
multiple use-cases on the same cluster, one misconfigured use-case must not be 
able to affect the others.

Cloud infrastructure technology has come a long way since then. Nowadays, in a 
private network using container orchestration technology, IPs come cheap. 
Limiting connections solely on origin IP is no longer sufficient.

Kafka needs to support connection limits based on client identity.

A new configuration property - {{max.connections.per.clientid}} - should work 
similarly to {{max.connections.per.ip}} using ConnectionQuotas, managed 
straight after parsing the request header in SocketServer.
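
For illustration, configuration could mirror the existing per-IP limit (the 
first property below is the proposal in this ticket, not an existing Kafka 
setting):

{code}
# Proposed in this ticket (hypothetical):
max.connections.per.clientid=100
# Existing per-IP limit, for comparison:
max.connections.per.ip=2147483647
{code}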



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)