[jira] [Created] (KAFKA-15654) Address Transactions Errors

2023-10-19 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15654:
--

 Summary: Address Transactions Errors 
 Key: KAFKA-15654
 URL: https://issues.apache.org/jira/browse/KAFKA-15654
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


In addition to the work in KIP-691, I propose we clean up and improve 
transactional error handling. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15655) Consider making transactional apis more compatible with topic IDs

2023-10-19 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15655:
--

 Summary: Consider making transactional apis more compatible with 
topic IDs
 Key: KAFKA-15655
 URL: https://issues.apache.org/jira/browse/KAFKA-15655
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


Some ideas include adding the topic ID to AddPartitions and other 
topic-partition-specific APIs, and adding the topic ID as a tagged field in the 
transactional state logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15784) Ensure atomicity of in memory update and write when transactionally committing offsets

2023-11-03 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15784:
--

 Summary: Ensure atomicity of in memory update and write when 
transactionally committing offsets
 Key: KAFKA-15784
 URL: https://issues.apache.org/jira/browse/KAFKA-15784
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 3.7.0
Reporter: Justine Olshan
Assignee: Justine Olshan


[https://github.com/apache/kafka/pull/14370] (KAFKA-15449) removed the locking 
around validating, updating state, and writing transactional offset commits to 
the log. (The verification causes us to release the lock.)

This was discovered in the discussion of 
[https://github.com/apache/kafka/pull/14629] (KAFKA-15653).

Since KAFKA-15653 is needed for 3.5.1, it makes sense to address the locking 
issue separately with this ticket. 
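For context, a minimal sketch of what "holding the lock across validate, update, and write" means; the class and method names are hypothetical placeholders, not the actual group coordinator code:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Illustrative placeholder types -- not the real group coordinator classes.
class OffsetCommit { }

class TxnOffsetCommitter {
    private final ReentrantLock lock = new ReentrantLock();

    void commitTransactionalOffsets(OffsetCommit commit) {
        lock.lock();
        try {
            // All three steps run under the same lock, so no other thread can
            // interleave between the in-memory update and the log append.
            validate(commit);
            updateInMemoryState(commit);
            appendToLog(commit);
        } finally {
            lock.unlock();
        }
    }

    private void validate(OffsetCommit commit) { /* ... */ }
    private void updateInMemoryState(OffsetCommit commit) { /* ... */ }
    private void appendToLog(OffsetCommit commit) { /* ... */ }
}
{code}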



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15797) Flaky test EosV2UpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosV2[true]

2023-11-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15797:
--

 Summary: Flaky test 
EosV2UpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosV2[true] 
 Key: KAFKA-15797
 URL: https://issues.apache.org/jira/browse/KAFKA-15797
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan


I found two recent failures:

[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14629/22/testReport/junit/org.apache.kafka.streams.integration/EosV2UpgradeIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldUpgradeFromEosAlphaToEosV2_true_/]
[https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/EosV2UpgradeIntegrationTest/Build___JDK_21_and_Scala_2_13___shouldUpgradeFromEosAlphaToEosV2_true__2/]
 

Output generally looks like:


{code:java}
java.lang.AssertionError: Did not receive all 138 records from topic 
multiPartitionOutputTopic within 6 ms, currently accumulated data is 
[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 
10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28), KeyValue(0, 36), 
KeyValue(0, 45), KeyValue(0, 55), KeyValue(0, 66), KeyValue(0, 78), KeyValue(0, 
91), KeyValue(0, 55), KeyValue(0, 66), KeyValue(0, 78), KeyValue(0, 91), 
KeyValue(0, 105), KeyValue(0, 120), KeyValue(0, 136), KeyValue(0, 153), 
KeyValue(0, 171), KeyValue(0, 190), KeyValue(3, 0), KeyValue(3, 1), KeyValue(3, 
3), KeyValue(3, 6), KeyValue(3, 10), KeyValue(3, 15), KeyValue(3, 21), 
KeyValue(3, 28), KeyValue(3, 36), KeyValue(3, 45), KeyValue(3, 55), KeyValue(3, 
66), KeyValue(3, 78), KeyValue(3, 91), KeyValue(3, 105), KeyValue(3, 120), 
KeyValue(3, 136), KeyValue(3, 153), KeyValue(3, 171), KeyValue(3, 190), 
KeyValue(3, 190), KeyValue(3, 210), KeyValue(3, 231), KeyValue(3, 253), 
KeyValue(3, 276), KeyValue(3, 300), KeyValue(3, 325), KeyValue(3, 351), 
KeyValue(3, 378), KeyValue(3, 406), KeyValue(3, 435), KeyValue(1, 0), 
KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 6), KeyValue(1, 10), KeyValue(1, 
15), KeyValue(1, 21), KeyValue(1, 28), KeyValue(1, 36), KeyValue(1, 45), 
KeyValue(1, 55), KeyValue(1, 66), KeyValue(1, 78), KeyValue(1, 91), KeyValue(1, 
105), KeyValue(1, 120), KeyValue(1, 136), KeyValue(1, 153), KeyValue(1, 171), 
KeyValue(1, 190), KeyValue(1, 120), KeyValue(1, 136), KeyValue(1, 153), 
KeyValue(1, 171), KeyValue(1, 190), KeyValue(1, 210), KeyValue(1, 231), 
KeyValue(1, 253), KeyValue(1, 276), KeyValue(1, 300), KeyValue(1, 325), 
KeyValue(1, 351), KeyValue(1, 378), KeyValue(1, 406), KeyValue(1, 435), 
KeyValue(2, 0), KeyValue(2, 1), KeyValue(2, 3), KeyValue(2, 6), KeyValue(2, 
10), KeyValue(2, 15), KeyValue(2, 21), KeyValue(2, 28), KeyValue(2, 36), 
KeyValue(2, 45), KeyValue(2, 55), KeyValue(2, 66), KeyValue(2, 78), KeyValue(2, 
91), KeyValue(2, 105), KeyValue(2, 55), KeyValue(2, 66), KeyValue(2, 78), 
KeyValue(2, 91), KeyValue(2, 105), KeyValue(2, 120), KeyValue(2, 136), 
KeyValue(2, 153), KeyValue(2, 171), KeyValue(2, 190), KeyValue(2, 210), 
KeyValue(2, 231), KeyValue(2, 253), KeyValue(2, 276), KeyValue(2, 300), 
KeyValue(2, 325), KeyValue(2, 351), KeyValue(2, 378), KeyValue(2, 406), 
KeyValue(0, 210), KeyValue(0, 231), KeyValue(0, 253), KeyValue(0, 276), 
KeyValue(0, 300), KeyValue(0, 325), KeyValue(0, 351), KeyValue(0, 378), 
KeyValue(0, 406), KeyValue(0, 435)] Expected: is a value equal to or greater 
than <138> but: <134> was less than <138>{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15798) Flaky Test NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()

2023-11-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15798:
--

 Summary: Flaky Test 
NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
 Key: KAFKA-15798
 URL: https://issues.apache.org/jira/browse/KAFKA-15798
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan


I saw a few examples recently. Two have the same error, but the third is different:

[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14629/22/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology___2/]

[https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_21_and_Scala_2_13___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]
 

The failure looks like:


{code:java}
java.lang.AssertionError: Did not receive all 5 records from topic 
output-stream-1 within 6 ms, currently accumulated data is [] Expected: is 
a value equal to or greater than <5> but: <0> was less than <5>{code}

The other failure was
[https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]


{code:java}
java.lang.AssertionError: Expected: <[0, 1]> but: was <[0]>{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15758) Always schedule wrapped callbacks

2023-10-30 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15758:
--

 Summary: Always schedule wrapped callbacks
 Key: KAFKA-15758
 URL: https://issues.apache.org/jira/browse/KAFKA-15758
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


As part of 
[https://github.com/apache/kafka/commit/08aa33127a4254497456aa7a0c1646c7c38adf81]
 the coordinator lookup was moved into the AddPartitionsToTxnManager. In 
the case of an error, we return the error on the wrapped callback. 

This seemed to cause issues in the tests, and we found that executing the 
callback directly, rather than rescheduling it on the request channel, 
resolved some of them. 

One theory was that scheduling the callback before the request returned caused 
issues.

Ideally we wouldn't have this special handling. This ticket is to remove it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15757) Do not advertise v4 AddPartitionsToTxn to clients

2023-10-30 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15757:
--

 Summary: Do not advertise v4 AddPartitionsToTxn to clients
 Key: KAFKA-15757
 URL: https://issues.apache.org/jira/browse/KAFKA-15757
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


v4+ is intended to be a broker-side API. Thus, we should not return it as a 
valid version to clients.
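A minimal sketch of the intent, with hypothetical constants and helper names rather than the broker's real ApiVersions handling:

{code:java}
// Illustrative only: the broker implements the newer AddPartitionsToTxn versions,
// but the range advertised to clients is capped at the last client-facing version.
final class AddPartitionsToTxnVersions {
    static final short MAX_BROKER_VERSION = 4; // hypothetical constant
    static final short MAX_CLIENT_VERSION = 3; // hypothetical constant

    static short maxAdvertisedVersion(boolean requestFromBroker) {
        return requestFromBroker ? MAX_BROKER_VERSION : MAX_CLIENT_VERSION;
    }
}
{code}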



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15449) Verify transactional offset commits (KIP-890 part 1)

2023-10-02 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15449.

Resolution: Fixed

> Verify transactional offset commits (KIP-890 part 1)
> 
>
> Key: KAFKA-15449
> URL: https://issues.apache.org/jira/browse/KAFKA-15449
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Critical
>
> We verify on produce requests but not offset commits. We should fix this to 
> avoid hanging transactions on consumer offset partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15545) Update Request metrics in ops.html to reflect all the APIs

2023-10-04 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15545:
--

 Summary: Update Request metrics in ops.html to reflect all the APIs
 Key: KAFKA-15545
 URL: https://issues.apache.org/jira/browse/KAFKA-15545
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


When updating for KAFKA-15530, I noticed that the request metrics only mention 
Produce|FetchConsumer|FetchFollower. These request metrics apply to all APIs, 
so we should update the documentation to make this clearer.
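For reference, the per-request metrics live under the kafka.network:type=RequestMetrics group with a request=<ApiName> tag. A small sketch (JMX against a broker, assuming the standard MBean naming) that lists every API the TotalTimeMs metric is reported for:

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: enumerate the request names that RequestMetrics reports TotalTimeMs for.
// Run inside the broker JVM (or swap in a JMXConnector to the broker's JMX port).
public class ListRequestMetrics {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern =
            new ObjectName("kafka.network:type=RequestMetrics,name=TotalTimeMs,request=*");
        server.queryNames(pattern, null)
              .forEach(name -> System.out.println(name.getKeyProperty("request")));
    }
}
{code}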



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15546) Transactions tool duration field confusing for completed transactions

2023-10-04 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15546:
--

 Summary: Transactions tool duration field confusing for completed 
transactions
 Key: KAFKA-15546
 URL: https://issues.apache.org/jira/browse/KAFKA-15546
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
Assignee: Justine Olshan


When using the transactions tool to describe transactions, if the transaction 
is completed, its duration will still increase based on when it started. This 
value is not correct. Instead, we can leave the duration field blank (since we 
don't have the data for the completed transaction in the describe response).

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15589) Flaky kafka.server.FetchRequestTest

2023-10-11 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15589.

Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-15566

> Flaky  kafka.server.FetchRequestTest
> 
>
> Key: KAFKA-15589
> URL: https://issues.apache.org/jira/browse/KAFKA-15589
> Project: Kafka
>  Issue Type: Task
>    Reporter: Justine Olshan
>Priority: Major
> Attachments: image-2023-10-11-13-19-37-012.png
>
>
> I've been seeing a lot of test failures recently for  
> kafka.server.FetchRequestTest
> Specifically: !image-2023-10-11-13-19-37-012.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15626) Replace verification guard object with a specific type

2023-10-20 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15626.

Resolution: Fixed

> Replace verification guard object with a specific type
> ---
>
> Key: KAFKA-15626
> URL: https://issues.apache.org/jira/browse/KAFKA-15626
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
>
>  https://github.com/apache/kafka/pull/13787#discussion_r1361468169



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15674) Consider making RequestLocal thread safe

2023-10-23 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15674:
--

 Summary: Consider making RequestLocal thread safe
 Key: KAFKA-15674
 URL: https://issues.apache.org/jira/browse/KAFKA-15674
 Project: Kafka
  Issue Type: Improvement
Reporter: Justine Olshan


KAFKA-15653 found an issue with using a RequestLocal on multiple threads. 
The RequestLocal object was originally designed in a non-thread-safe manner for 
performance.

It is passed around to methods that write to the log, and KAFKA-15653 showed 
that it is not too hard to accidentally share it between different threads.

Given all this, and the new changes and dependencies in the project compared to 
when it was first introduced, we may want to reconsider the thread safety of 
RequestLocal.
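One possible direction, sketched with a hypothetical per-thread buffer cache rather than the real RequestLocal/BufferSupplier classes: keep the mutable state inside a ThreadLocal so that accidentally sharing the holder object across threads no longer shares unsynchronized state.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Illustrative only. Each thread that touches the holder gets its own deque of
// cached buffers, so cross-thread sharing of the holder is harmless (at the cost
// of some extra memory and ThreadLocal lookups).
final class ThreadSafeRequestLocal {
    private final ThreadLocal<ArrayDeque<ByteBuffer>> cache =
            ThreadLocal.withInitial(ArrayDeque::new);

    ByteBuffer get(int size) {
        ByteBuffer cached = cache.get().pollFirst();
        if (cached != null && cached.capacity() >= size) {
            cached.clear();
            return cached;
        }
        return ByteBuffer.allocate(size);
    }

    void release(ByteBuffer buffer) {
        cache.get().addFirst(buffer);
    }
}
{code}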



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15380) Try complete actions after callback

2023-08-18 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15380:
--

 Summary: Try complete actions after callback
 Key: KAFKA-15380
 URL: https://issues.apache.org/jira/browse/KAFKA-15380
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


KIP-890 part 1 introduced the callback request type. It is used to execute a 
callback after KafkaApis.handle has returned. We did not account for 
tryCompleteActions at the end of handle when making this change.

In tests, we saw produce p99 increase dramatically (likely because we have to 
wait for another request before we can complete DelayedProduce). As a result, 
we should add the tryCompleteActions after the callback as well. In testing, 
this improved the produce performance.
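A minimal sketch of the ordering described above, with hypothetical handler and queue types standing in for KafkaApis and its action queue:

{code:java}
// Illustrative only: run the deferred callback, then immediately try to complete
// any delayed operations it may have unblocked (e.g. a DelayedProduce-style
// purgatory entry), instead of waiting for the next unrelated request.
interface ActionQueue {
    void tryCompleteActions();
}

final class CallbackRequestHandler {
    private final ActionQueue actionQueue;

    CallbackRequestHandler(ActionQueue actionQueue) {
        this.actionQueue = actionQueue;
    }

    void handleCallbackRequest(Runnable callback) {
        callback.run();
        // The step this ticket adds: mirror the tryCompleteActions call that
        // already happens at the end of KafkaApis.handle.
        actionQueue.tryCompleteActions();
    }
}
{code}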



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14984) DynamicBrokerReconfigurationTest.testThreadPoolResize() test is flaky

2023-08-25 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14984.

Resolution: Duplicate

> DynamicBrokerReconfigurationTest.testThreadPoolResize() test is flaky 
> --
>
> Key: KAFKA-14984
> URL: https://issues.apache.org/jira/browse/KAFKA-14984
> Project: Kafka
>  Issue Type: Test
>Reporter: Manyanda Chitimbo
>Priority: Major
>  Labels: flaky-test
>
> The test sometimes fails with the below log 
> {code:java}
> kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize() failed, 
> log available in 
> .../core/build/reports/testOutput/kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize().test.stdout
> Gradle Test Run :core:test > Gradle Test Executor 6 > 
> DynamicBrokerReconfigurationTest > testThreadPoolResize() FAILED
>     org.opentest4j.AssertionFailedError: Invalid threads: expected 6, got 8: 
> List(data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
> data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0) ==> 
> expected: <true> but was: <false>
>         at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>         at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>         at 
> app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>         at 
> app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>         at 
> app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:211)
>         at 
> app//kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1634)
>         at 
> app//kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:872)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15404) Failing Test DynamicBrokerReconfigurationTest#testThreadPoolResize

2023-08-24 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15404:
--

 Summary: Failing Test 
DynamicBrokerReconfigurationTest#testThreadPoolResize
 Key: KAFKA-15404
 URL: https://issues.apache.org/jira/browse/KAFKA-15404
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan


I've seen this failing on all builds pretty consistently.


{{org.opentest4j.AssertionFailedError: Invalid threads: expected 6, got 8: 
List(data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0) ==> expected: 
<true> but was: <false>}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14097) Separate configuration for producer ID expiry

2022-07-21 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14097:
--

 Summary:  Separate configuration for producer ID expiry
 Key: KAFKA-14097
 URL: https://issues.apache.org/jira/browse/KAFKA-14097
 Project: Kafka
  Issue Type: Improvement
Reporter: Justine Olshan


Ticket to track KIP-854. Currently time-based producer ID expiration is 
controlled by `transactional.id.expiration.ms` but we want to create a separate 
config. This can give us finer control over memory usage – especially since 
producer IDs will be more common with idempotency becoming the default.


See 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
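For illustration, KIP-854 adds a broker config named producer.id.expiration.ms alongside the existing transactional one. A minimal sketch of setting both; the values are examples only (the defaults are 7 days and 1 day respectively):

{code:java}
import java.util.Properties;

// Example broker settings only -- not recommendations. producer.id.expiration.ms
// (added by KIP-854) controls producer ID state expiry independently of
// transactional.id.expiration.ms.
public class ProducerIdExpiryConfigExample {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        brokerProps.setProperty("transactional.id.expiration.ms", "604800000"); // 7 days
        brokerProps.setProperty("producer.id.expiration.ms", "86400000");       // 1 day
        brokerProps.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
{code}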



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14140) Ensure a fenced or in-controlled-shutdown replica is not eligible to become leader in ZK mode

2022-08-03 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14140:
--

 Summary: Ensure a fenced or in-controlled-shutdown replica is not 
eligible to become leader in ZK mode
 Key: KAFKA-14140
 URL: https://issues.apache.org/jira/browse/KAFKA-14140
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
 Fix For: 3.3.0


KIP-841 introduced fencing on ISR in KRaft. We should also provide some of 
these protections in ZK mode, since most of the groundwork is already there. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-10550) Update AdminClient and kafka-topics.sh to support topic IDs

2023-01-03 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-10550.

Resolution: Fixed

I think the scope of the KIP (describe and delete) has been completed, so I will 
mark this as resolved for now.

> Update AdminClient and kafka-topics.sh to support topic IDs
> ---
>
> Key: KAFKA-10550
> URL: https://issues.apache.org/jira/browse/KAFKA-10550
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>Assignee: Deng Ziming
>Priority: Major
>
> Change describe topics AdminClient method to expose and support topic IDs 
>  
>  Make changes to kafka-topics.sh --describe so a user can specify a topic 
> name to describe with the --topic parameter, or alternatively the user can 
> supply a topic ID with the --topic_id parameter



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14561) Improve transactions experience for older clients by ensuring ongoing transaction

2023-01-03 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14561:
--

 Summary: Improve transactions experience for older clients by 
ensuring ongoing transaction
 Key: KAFKA-14561
 URL: https://issues.apache.org/jira/browse/KAFKA-14561
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


This is part 3 of KIP-890:

3. *To cover older clients, we will ensure a transaction is ongoing before we 
write to a transaction. We can do this by querying the transaction coordinator 
and caching the result.*

See KIP-890 for more details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14562) Implement epoch bump after every transaction

2023-01-03 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14562:
--

 Summary: Implement epoch bump after every transaction
 Key: KAFKA-14562
 URL: https://issues.apache.org/jira/browse/KAFKA-14562
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


This is part 1 of KIP-890


 # *Uniquely identify transactions by bumping the producer epoch after every 
commit/abort marker. That way, each transaction can be identified by (producer 
id, epoch).* 



See KIP-890 for more information: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14563) Remove AddPartitionsToTxn call for newer clients as optimization

2023-01-03 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14563:
--

 Summary: Remove AddPartitionsToTxn call for newer clients as 
optimization
 Key: KAFKA-14563
 URL: https://issues.apache.org/jira/browse/KAFKA-14563
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


This is part 2 of KIP-890:

{*}2. Remove the addPartitionsToTxn call and implicitly just add partitions to 
the transaction on the first produce request during a transaction{*}.

See KIP-890 for more information: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14439) Specify returned errors for various APIs and versions

2022-12-02 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14439:
--

 Summary: Specify returned errors for various APIs and versions
 Key: KAFKA-14439
 URL: https://issues.apache.org/jira/browse/KAFKA-14439
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


Kafka is known for supporting various clients and being compatible across 
different versions. But one thing that is a bit unclear is what errors each 
response can send. 

Knowing what errors can come from each version helps those who implement 
clients have a more defined spec for what errors they need to handle. When new 
errors are added, it is clearer to the clients that changes need to be made.

It also helps contributors get a better understanding of how clients are 
expected to react, and potentially find and prevent gaps like the one found in 
https://issues.apache.org/jira/browse/KAFKA-14417

I briefly synced offline with [~hachikuji] about this, and he suggested adding 
values to the schema definitions of APIs that specify the error codes and which 
versions they are returned on. One idea was creating some enum type to 
accomplish this. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14402) Transactions Server Side Defense

2022-11-18 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14402:
--

 Summary: Transactions Server Side Defense
 Key: KAFKA-14402
 URL: https://issues.apache.org/jira/browse/KAFKA-14402
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
Assignee: Justine Olshan


We have seen hanging transactions in Kafka where the last stable offset (LSO) 
does not update, we can’t clean the log (if the topic is compacted), and 
read_committed consumers get stuck.

This can happen when a message gets stuck or delayed due to networking issues 
or a network partition, the transaction aborts, and then the delayed message 
finally comes in. The delayed message case can also violate EOS if the delayed 
message comes in after the next addPartitionsToTxn request comes in. 
Effectively we may see a message from a previous (aborted) transaction become 
part of the next transaction.

Another way hanging transactions can occur is that a client is buggy and may 
somehow try to write to a partition before it adds the partition to the 
transaction. In both of these cases, we want the server to have some control to 
prevent these incorrect records from being written and either causing hanging 
transactions or violating Exactly once semantics (EOS) by including records in 
the wrong transaction.

The best way to avoid this issue is to:
 # *Uniquely identify transactions by bumping the producer epoch after every 
commit/abort marker. That way, each transaction can be identified by (producer 
id, epoch).* 

 # {*}Remove the addPartitionsToTxn call and implicitly just add partitions to 
the transaction on the first produce request during a transaction{*}.

We avoid the late arrival case because the transaction is uniquely identified 
and fenced AND we avoid the buggy client case because we remove the need for 
the client to explicitly add partitions to begin the transaction.

Of course, 1 and 2 require client-side changes, so for older clients, those 
approaches won’t apply.

3. *To cover older clients, we will ensure a transaction is ongoing before we 
write to a transaction. We can do this by querying the transaction coordinator 
and caching the result.*

 

See KIP-890 for more information: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense
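A minimal sketch of the fencing enabled by point 1, once every commit/abort marker bumps the epoch; the types here are illustrative, not broker code:

{code:java}
// Illustrative only. With an epoch bump on every commit/abort marker, a late
// record carrying an older epoch for the same producer ID can be recognized as
// belonging to an already-finished transaction and rejected instead of being
// appended to the next transaction.
record TransactionIdentity(long producerId, short producerEpoch) {}

final class EpochFencing {
    static boolean isFromEarlierTransaction(TransactionIdentity currentOnPartition,
                                            TransactionIdentity incomingRecord) {
        return incomingRecord.producerId() == currentOnPartition.producerId()
                && incomingRecord.producerEpoch() < currentOnPartition.producerEpoch();
    }
}
{code}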



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14417) Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest

2022-11-22 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14417:
--

 Summary: Producer doesn't handle REQUEST_TIMED_OUT for 
InitProducerIdRequest
 Key: KAFKA-14417
 URL: https://issues.apache.org/jira/browse/KAFKA-14417
 Project: Kafka
  Issue Type: Task
Affects Versions: 3.3.0, 3.2.0, 3.0.0, 3.1.0
Reporter: Justine Olshan


In TransactionManager we have a handler for InitProducerIdRequests 
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14]

However, we have the potential to return a REQUEST_TIMED_OUT error in 
ProducerIdManager when the BrokerToControllerChannel manager times out: 
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236]
 

or when the poll returns null: 
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170]

Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a fatal 
error. With idempotent producers now being the default, this can cause more issues.

Seems like the commit that introduced the changes was this one: 
[https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958]
 so we are vulnerable when the server code is IBP 3.0 and beyond.
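A sketch of the shape of a client-side fix (assuming kafka-clients on the classpath): treat REQUEST_TIMED_OUT as retriable in the InitProducerId path instead of letting it fall through to the fatal-error branch. This is not the TransactionManager's actual handler, just the classification it is missing.

{code:java}
import org.apache.kafka.common.protocol.Errors;

// Illustrative only: error classification for InitProducerId responses.
final class InitProducerIdErrorHandling {
    static boolean shouldRetry(Errors error) {
        switch (error) {
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR:
            case COORDINATOR_LOAD_IN_PROGRESS:
            case REQUEST_TIMED_OUT: // the case this ticket says is unhandled
                return true;
            default:
                return false;
        }
    }
}
{code}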
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14640) Update AddPartitionsToTxn protocol to batch and handle verifyOnly requests

2023-01-19 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14640:
--

 Summary: Update AddPartitionsToTxn protocol to batch and handle 
verifyOnly requests
 Key: KAFKA-14640
 URL: https://issues.apache.org/jira/browse/KAFKA-14640
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


As part of KIP-890 we are making some changes to this protocol.

1. We can send a request to verify a partition is added to a transaction

2. We can batch multiple transactional IDs
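A rough sketch of the shape of the batched request, using plain data structures rather than the generated protocol classes (field names are illustrative; TopicPartition assumes kafka-clients on the classpath):

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Illustrative only: one request can now carry several transactional IDs, and
// each entry can either add its partitions or merely verify they were added.
record TxnPartitionEntry(List<TopicPartition> partitions, boolean verifyOnly) {}

record BatchedAddPartitionsToTxn(Map<String, TxnPartitionEntry> byTransactionalId) {}
{code}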



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14359) Idempotent Producer continues to retry on OutOfOrderSequence error when first batch fails

2022-11-04 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14359:
--

 Summary: Idempotent Producer continues to retry on 
OutOfOrderSequence error when first batch fails
 Key: KAFKA-14359
 URL: https://issues.apache.org/jira/browse/KAFKA-14359
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


When the idempotent producer does not have any state, it can fall into a loop 
where it keeps retrying an out-of-order sequence. Consider the following 
scenario where an idempotent producer has retries and delivery timeout set to 
int max (a configuration used in Streams):

1. A producer sends out several batches (up to 5), with the first one starting at 
sequence 0.
2. The first batch with sequence 0 fails due to a transient error (e.g., 
NOT_LEADER_OR_FOLLOWER or a timeout error).
3. The second batch, say with sequence 200, comes in. Since there is no previous 
state to invalidate it, it gets written to the log.
4. The original batch is retried and will get an out-of-order sequence number error.
5. The current Java client will continue to retry this batch, but it will never 
resolve. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14790) Add more AddPartitionsToTxn tests in KafkaApis and Authorizer tests

2023-03-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14790:
--

 Summary: Add more AddPartitionsToTxn tests in KafkaApis and 
Authorizer tests
 Key: KAFKA-14790
 URL: https://issues.apache.org/jira/browse/KAFKA-14790
 Project: Kafka
  Issue Type: Test
Reporter: Justine Olshan
Assignee: Justine Olshan


Followup from [https://github.com/apache/kafka/pull/13231]

We should add authorizer tests for the new version.

We should add some more tests to KafkaApis to cover auth and validation 
failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14640) Update AddPartitionsToTxn protocol to batch and handle verifyOnly requests

2023-03-07 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14640.

Resolution: Fixed

> Update AddPartitionsToTxn protocol to batch and handle verifyOnly requests
> --
>
> Key: KAFKA-14640
> URL: https://issues.apache.org/jira/browse/KAFKA-14640
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
>
> As part of KIP-890 we are making some changes to this protocol.
> 1. We can send a request to verify a partition is added to a transaction
> 2. We can batch multiple transactional IDs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14790) Add more AddPartitionsToTxn tests in KafkaApis and Authorizer tests

2023-04-14 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14790.

Resolution: Fixed

> Add more AddPartitionsToTxn tests in KafkaApis and Authorizer tests
> ---
>
> Key: KAFKA-14790
> URL: https://issues.apache.org/jira/browse/KAFKA-14790
> Project: Kafka
>  Issue Type: Test
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Minor
>
> Followup from [https://github.com/apache/kafka/pull/13231]
> We should add authorizer tests for the new version.
> We should add some more tests to KafkaApis to cover auth and validation 
> failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14916) Fix code that assumes transactional ID implies all records are transactional

2023-04-17 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14916:
--

 Summary: Fix code that assumes transactional ID implies all 
records are transactional
 Key: KAFKA-14916
 URL: https://issues.apache.org/jira/browse/KAFKA-14916
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


KAFKA-14561 wrote code that assumed that if a transactional ID was included, 
all record batches were transactional and had the same producer ID.

This work will improve validation and fix the code that assumes all batches are 
transactional.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14917) Producer write while transaction is pending.

2023-04-17 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14917:
--

 Summary: Producer write while transaction is pending.
 Key: KAFKA-14917
 URL: https://issues.apache.org/jira/browse/KAFKA-14917
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan
Assignee: Justine Olshan


As discovered in KAFKA-14904, we seem to get into a state where we try to write 
to a partition while the ongoing state is still pending.

This is likely a bigger issue than the test and worth looking into.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14899) Revisit Action Queue

2023-04-12 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14899:
--

 Summary: Revisit Action Queue
 Key: KAFKA-14899
 URL: https://issues.apache.org/jira/browse/KAFKA-14899
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


With KAFKA-14561 we introduced the notion of callback requests. It would be nice 
to standardize and combine action queue usage here. However, the current 
implementation of the callback request assumes local time is computed upon 
response send. 

This same paradigm may not be the case with the action queue. We should follow 
up and see what changes need to be made to combine the two.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14917) Producer write while transaction is pending.

2023-04-18 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14917.

Resolution: Won't Fix

> Producer write while transaction is pending.
> 
>
> Key: KAFKA-14917
> URL: https://issues.apache.org/jira/browse/KAFKA-14917
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
>
> As discovered in KAFKA-14904, we seem to get into a state where we try to 
> write to a partition while the ongoing state is still pending.
> This is likely a bigger issue than the test and worth looking in to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14920) Address timeouts and out of order sequences

2023-04-18 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14920:
--

 Summary: Address timeouts and out of order sequences
 Key: KAFKA-14920
 URL: https://issues.apache.org/jira/browse/KAFKA-14920
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


KAFKA-14844 showed the destructive nature of a timeout on the first produce 
request for a topic partition (i.e., one that has no state in the producer 
state manager).

Since we currently don't validate the first sequence (we will in part 2 of 
KIP-890), any transient error on the first produce can lead to out-of-order 
sequences that never recover.

Originally, KAFKA-14561 relied on the producer's retry mechanism for these 
transient issues, but until that is fixed, we may need to retry from within the 
AddPartitionsManager instead. We addressed the concurrent transactions case, but 
there are other errors, like coordinator loading, that we could run into and 
that would increase out-of-order issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14884) Include check transaction is still ongoing right before append

2023-04-20 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14884.

Resolution: Fixed

> Include check transaction is still ongoing right before append 
> ---
>
> Key: KAFKA-14884
> URL: https://issues.apache.org/jira/browse/KAFKA-14884
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.5.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> Even after checking via AddPartitionsToTxn, the transaction could be aborted 
> after the response. We can add one more check before appending.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-14884) Include check transaction is still ongoing right before append

2023-04-20 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan reopened KAFKA-14884:


I'm confused by all my blockers 🤦‍♀️

> Include check transaction is still ongoing right before append 
> ---
>
> Key: KAFKA-14884
> URL: https://issues.apache.org/jira/browse/KAFKA-14884
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.5.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> Even after checking via AddPartitionsToTxn, the transaction could be aborted 
> after the response. We can add one more check before appending.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14904) Flaky Test kafka.api.TransactionsBounceTest.testWithGroupId()

2023-04-20 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14904.

Resolution: Fixed

> Flaky Test  kafka.api.TransactionsBounceTest.testWithGroupId()
> --
>
> Key: KAFKA-14904
> URL: https://issues.apache.org/jira/browse/KAFKA-14904
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 3.5.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> After merging KAFKA-14561 I noticed this test still occasionally failed via 
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 6ms while awaiting EndTxn(true)
> I will investigate the cause. 
> Note: This error occurs when we are waiting for the transaction to be 
> committed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14931) Revert KAFKA-14561 in 3.5

2023-04-24 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14931:
--

 Summary: Revert KAFKA-14561 in 3.5
 Key: KAFKA-14931
 URL: https://issues.apache.org/jira/browse/KAFKA-14931
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
Assignee: Justine Olshan


We have too many blockers for this commit to work well, so in the interest of 
code quality, we should revert in 3.5 and fix the issues for 3.6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14931) Revert KAFKA-14561 in 3.5

2023-04-25 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14931.

Resolution: Fixed

> Revert KAFKA-14561 in 3.5
> -
>
> Key: KAFKA-14931
> URL: https://issues.apache.org/jira/browse/KAFKA-14931
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.5.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> We have too many blockers for this commit to work well, so in the interest of 
> code quality, we should revert 
> https://issues.apache.org/jira/browse/KAFKA-14561 in 3.5 and fix the issues 
> for 3.6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14958) Investigate enforcing all batches have the same producer ID

2023-05-02 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14958:
--

 Summary: Investigate enforcing all batches have the same producer 
ID
 Key: KAFKA-14958
 URL: https://issues.apache.org/jira/browse/KAFKA-14958
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


KAFKA-14916 was created after I incorrectly assumed transaction ID in the 
produce request indicated all batches were transactional.

Originally KAFKA-14916 had an action item to ensure all the producer IDs in the 
batches are the same, since we send a single transactional ID, but we decided 
this could be done in a follow-up, as we still need to assess whether we can 
enforce this without breaking workloads.

This ticket is that follow-up. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14854) Refactor inter broker send thread to handle all interbroker requests on one thread

2023-03-27 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14854:
--

 Summary: Refactor inter broker send thread to handle all 
interbroker requests on one thread
 Key: KAFKA-14854
 URL: https://issues.apache.org/jira/browse/KAFKA-14854
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


Currently we create a new thread for each interbroker request type that 
implements InterBrokerSendThread. It would be better to implement a single 
thread that multiple request types can use with their custom logic. 

I propose creating a single thread that takes a collection of "managers" for 
each request and sends the requests generated. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14884) Include check transaction is still ongoing right before append

2023-04-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14884:
--

 Summary: Include check transaction is still ongoing right before 
append 
 Key: KAFKA-14884
 URL: https://issues.apache.org/jira/browse/KAFKA-14884
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


Even after checking via AddPartitionsToTxn, the transaction could be aborted 
after the response. We can add one more check before appending.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14895) Move AddPartitionsToTxnManager files to java

2023-04-11 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14895:
--

 Summary: Move AddPartitionsToTxnManager files to java
 Key: KAFKA-14895
 URL: https://issues.apache.org/jira/browse/KAFKA-14895
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan


Followup task to move the files from scala to java.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14896) TransactionsBounceTest causes a thread leak

2023-04-11 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14896:
--

 Summary: TransactionsBounceTest causes a thread leak
 Key: KAFKA-14896
 URL: https://issues.apache.org/jira/browse/KAFKA-14896
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan
Assignee: Justine Olshan


On several PR builds I see a test fail with ["Producer closed forcefully" 
|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13391/21/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_8_and_Scala_2_12___testWithGroupId__/]
and then many other tests fail with initialization errors due to 
[controller-event-thread,daemon-broker-bouncer-EventThread|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13391/21/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_8_and_Scala_2_12___executionError/]

In TransactionsBounceTest.testBrokerFailure, we create this thread to bounce 
the brokers. There is a finally block to shut it down but it seems to not be 
working. We should shut it down correctly.

Examples of failures:
[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13391/21/#showFailuresLink]
[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13391/17/#showFailuresLink]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14904) Flaky Test kafka.api.TransactionsBounceTest.testWithGroupId()

2023-04-13 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14904:
--

 Summary: Flaky Test  
kafka.api.TransactionsBounceTest.testWithGroupId()
 Key: KAFKA-14904
 URL: https://issues.apache.org/jira/browse/KAFKA-14904
 Project: Kafka
  Issue Type: Test
Reporter: Justine Olshan
Assignee: Justine Olshan


After merging KAFKA-14561 I noticed this test still occasionally failed via 

org.apache.kafka.common.errors.TimeoutException: Timeout expired after 6ms 
while awaiting EndTxn(true)

I will investigate the cause. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14884) Include check transaction is still ongoing right before append

2023-07-17 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14884.

Resolution: Fixed

> Include check transaction is still ongoing right before append 
> ---
>
> Key: KAFKA-14884
> URL: https://issues.apache.org/jira/browse/KAFKA-14884
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.6.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> Even after checking via AddPartitionsToTxn, the transaction could be aborted 
> after the response. We can add one more check before appending.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15044) Snappy v.1.1.9.1 NoClassDefFound on ARM machines

2023-05-31 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15044.

Resolution: Fixed

> Snappy v.1.1.9.1 NoClassDefFound on ARM machines
> 
>
> Key: KAFKA-15044
> URL: https://issues.apache.org/jira/browse/KAFKA-15044
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
>
> We upgraded our snappy dependency but v1.1.9.1 has compatibility issues with 
> arm. We should upgrade to v1.1.10.0 which resolves this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15028) AddPartitionsToTxnManager metrics

2023-05-25 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15028:
--

 Summary: AddPartitionsToTxnManager metrics
 Key: KAFKA-15028
 URL: https://issues.apache.org/jira/browse/KAFKA-15028
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


KIP-890 added metrics for the AddPartitionsToTxnManager

VerificationTimeMs – number of milliseconds from adding partition info to the 
manager to the time the response is sent. This will include the round trip to 
the transaction coordinator if it is called. This will also account for 
verifications that fail before the coordinator is called.

VerificationFailureRate – rate of verifications that returned in failure either 
from the AddPartitionsToTxn response or through errors in the manager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14920) Address timeouts and out of order sequences

2023-07-24 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14920.

Resolution: Fixed

> Address timeouts and out of order sequences
> ---
>
> Key: KAFKA-14920
> URL: https://issues.apache.org/jira/browse/KAFKA-14920
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.6.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> KAFKA-14844 showed the destructive nature of a timeout on the first produce 
> request for a topic partition (ie one that has no state in psm)
> Since we currently don't validate the first sequence (we will in part 2 of 
> kip-890), any transient error on the first produce can lead to out of order 
> sequences that never recover.
> Originally, KAFKA-14561 relied on the producer's retry mechanism for these 
> transient issues, but until that is fixed, we may need to retry from in the 
> AddPartitionsManager instead. We addressed the concurrent transactions, but 
> there are other errors like coordinator loading that we could run into and 
> see increased out of order issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15028) AddPartitionsToTxnManager metrics

2023-06-28 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15028.

Resolution: Fixed

> AddPartitionsToTxnManager metrics
> -
>
> Key: KAFKA-15028
> URL: https://issues.apache.org/jira/browse/KAFKA-15028
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
> Attachments: latency-cpu.html
>
>
> KIP-890 added metrics for the AddPartitionsToTxnManager
> VerificationTimeMs – number of milliseconds from adding partition info to the 
> manager to the time the response is sent. This will include the round trip to 
> the transaction coordinator if it is called. This will also account for 
> verifications that fail before the coordinator is called.
> VerificationFailureRate – rate of verifications that returned in failure 
> either from the AddPartitionsToTxn response or through errors in the manager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15099) Flaky Test kafka.api.TransactionsTest.testBumpTransactionalEpoch(String).quorum=kraft

2023-06-16 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15099:
--

 Summary: Flaky Test 
kafka.api.TransactionsTest.testBumpTransactionalEpoch(String).quorum=kraft
 Key: KAFKA-15099
 URL: https://issues.apache.org/jira/browse/KAFKA-15099
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan


This one often fails with: 

org.apache.kafka.common.errors.TimeoutException: Timeout expired after 6ms 
while awaiting InitProducerId

Seems like a KRaft-only issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14916) Fix code that assumes transactional ID implies all records are transactional

2023-05-04 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-14916.

Resolution: Fixed

> Fix code that assumes transactional ID implies all records are transactional
> 
>
> Key: KAFKA-14916
> URL: https://issues.apache.org/jira/browse/KAFKA-14916
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.6.0
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> KAFKA-14561 wrote code that assumed that if a transactional ID was included, 
> all record batches were transactional and had the same producer ID.
> This work will improve validation and fix the code that assumes all batches 
> are transactional.
> Further, KAFKA-14561 will not assume all records are transactional.
> Originally this ticket had an action item to ensure all the producer IDs are 
> the same in the batches since we send a single txn ID, but that can be done 
> in a followup KAFKA-14958, as we still need to assess if we can enforce this 
> without breaking workloads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16192) Introduce usage of flexible records to coordinators

2024-01-24 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16192:
--

 Summary: Introduce usage of flexible records to coordinators
 Key: KAFKA-16192
 URL: https://issues.apache.org/jira/browse/KAFKA-16192
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
Assignee: Justine Olshan


[KIP-915| 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Txn+and+Group+Coordinator+Downgrade+Foundation]
 introduced flexible versions to the records used for the group and transaction 
coordinators.
However, the KIP did not update the record version used.

For 
[KIP-890|https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense]
 we intend to use flexible fields in the transaction state records. This 
requires a safe way to upgrade from non-flexible version records to flexible 
version records.

Typically this is done as a message format bump. There may be an option to make 
this change using MV (metadata version) instead, since the readers of the 
records are internal and not external consumers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage

2024-02-12 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16229.

Resolution: Fixed

> Slow expiration of Producer IDs leading to high CPU usage
> -
>
> Key: KAFKA-16229
> URL: https://issues.apache.org/jira/browse/KAFKA-16229
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Expiration of ProducerIds is implemented with a slow removal of map keys:
> ```
>         producers.keySet().removeAll(keys);
> ```
> This unnecessarily goes through all producer IDs in order to remove the 
> expired keys.
> This leads to exponential time in the worst case when most/all keys need to be 
> removed:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt          Score            Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3    44957983.550 ±    9389011.290  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  5683374164.167 ± 1446242131.466  ns/op
> ```
> A simple fix is to use map#remove(key) instead, leading to a more linear 
> growth:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt       Score         Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3   643887.031 ±  600475.302  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  7741689.539 ± 3218317.079  ns/op
> ```
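A standalone illustration of the change described above; the map value type is a placeholder, not Kafka's ProducerStateEntry:

{code:java}
import java.util.List;
import java.util.Map;

// Illustrative only. keySet().removeAll(list) can degrade to a contains() scan of
// the list for every map key, while removing each expired key directly stays
// roughly linear in the number of expired keys.
final class ProducerIdExpiration {
    static void expireSlow(Map<Long, Object> producers, List<Long> expiredKeys) {
        producers.keySet().removeAll(expiredKeys);
    }

    static void expireFast(Map<Long, Object> producers, List<Long> expiredKeys) {
        expiredKeys.forEach(producers::remove);
    }
}
{code}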



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16245) DescribeConsumerGroupTest failing

2024-02-12 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16245:
--

 Summary: DescribeConsumerGroupTest failing
 Key: KAFKA-16245
 URL: https://issues.apache.org/jira/browse/KAFKA-16245
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


The first instances on trunk are in this PR: 
[https://github.com/apache/kafka/pull/15275]
This PR seems to have it failing consistently in the builds, when it wasn't 
failing this consistently before.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15665) Enforce ISR to have all target replicas when complete partition reassignment

2024-02-21 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15665.

Resolution: Fixed

> Enforce ISR to have all target replicas when complete partition reassignment
> 
>
> Key: KAFKA-15665
> URL: https://issues.apache.org/jira/browse/KAFKA-15665
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Calvin Liu
>Assignee: Calvin Liu
>Priority: Major
>
> Currently, partition reassignment can complete even when the new ISR is below 
> min ISR. We should fix this behavior.
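As a rough, hypothetical illustration of the intended guard (not the actual 
controller code), a reassignment would only complete when the ISR contains every 
target replica and still satisfies min.insync.replicas:

{code:java}
import java.util.Set;

public class ReassignmentGuardSketch {
    // Hypothetical check: complete the reassignment only if the ISR covers all
    // target replicas and does not drop below the configured min ISR.
    static boolean canCompleteReassignment(Set<Integer> isr, Set<Integer> targetReplicas, int minIsr) {
        return isr.containsAll(targetReplicas) && isr.size() >= minIsr;
    }

    public static void main(String[] args) {
        System.out.println(canCompleteReassignment(Set.of(1, 2, 3), Set.of(1, 2, 3), 2)); // true
        System.out.println(canCompleteReassignment(Set.of(1), Set.of(1, 2, 3), 2));       // false
    }
}
{code}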



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16012) Incomplete range assignment in consumer

2023-12-22 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16012.

Resolution: Fixed

> Incomplete range assignment in consumer
> ---
>
> Key: KAFKA-16012
> URL: https://issues.apache.org/jira/browse/KAFKA-16012
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Philip Nee
>Priority: Blocker
> Fix For: 3.7.0
>
>
> We were looking into test failures here: 
> https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1702475525--jolshan--kafka-15784--7cad567675/2023-12-13--001./2023-12-13–001./report.html.
>  
> Here is the first failure in the report:
> {code:java}
> 
> test_id:    
> kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
> status:     FAIL
> run time:   3 minutes 4.950 seconds
>     TimeoutError('Consumer consumed only 88223 out of 10 messages in 
> 90s') {code}
>  
> We traced the failure to an apparent bug during the last rebalance before the 
> group became empty. The last remaining instance seems to receive an 
> incomplete assignment which prevents it from completing expected consumption 
> on some partitions. Here is the rebalance from the coordinator's perspective:
> {code:java}
> server.log.2023-12-13-04:[2023-12-13 04:58:56,987] INFO [GroupCoordinator 3]: 
> Stabilized group grouped-transactions-test-consumer-group generation 5 
> (__consumer_offsets-2) with 1 members 
> (kafka.coordinator.group.GroupCoordinator)
> server.log.2023-12-13-04:[2023-12-13 04:58:56,990] INFO [GroupCoordinator 3]: 
> Assignment received from leader 
> consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
>  for group grouped-transactions-test-consumer-group for generation 5. The 
> group has 1 members, 0 of which are static. 
> (kafka.coordinator.group.GroupCoordinator) {code}
> The group is down to one member in generation 5. In the previous generation, 
> the consumer in question reported this assignment:
> {code:java}
> // Gen 4: we've got partitions 0-4
> [2023-12-13 04:58:52,631] DEBUG [Consumer 
> clientId=consumer-grouped-transactions-test-consumer-group-1, 
> groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete 
> with generation 4 and memberId 
> consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
>  (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2023-12-13 04:58:52,631] INFO [Consumer 
> clientId=consumer-grouped-transactions-test-consumer-group-1, 
> groupId=grouped-transactions-test-consumer-group] Notifying assignor about 
> the new Assignment(partitions=[input-topic-0, input-topic-1, input-topic-2, 
> input-topic-3, input-topic-4]) 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
> However, in generation 5, we seem to be assigned only one partition:
> {code:java}
> // Gen 5: Now we have only partition 1? But aren't we the last member in the 
> group?
> [2023-12-13 04:58:56,954] DEBUG [Consumer 
> clientId=consumer-grouped-transactions-test-consumer-group-1, 
> groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete 
> with generation 5 and memberId 
> consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
>  (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2023-12-13 04:58:56,955] INFO [Consumer 
> clientId=consumer-grouped-transactions-test-consumer-group-1, 
> groupId=grouped-transactions-test-consumer-group] Notifying assignor about 
> the new Assignment(partitions=[input-topic-1]) 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
> The assignment type is range from the JoinGroup for generation 5. The decoded 
> metadata sent by the consumer is this:
> {code:java}
> Subscription(topics=[input-topic], ownedPartitions=[], groupInstanceId=null, 
> generationId=4, rackId=null) {code}
> Here is the decoded assignment from the SyncGroup:
> {code:java}
> Assignment(partitions=[input-topic-1]) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15784) Ensure atomicity of in memory update and write when transactionally committing offsets

2023-12-14 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15784.

Resolution: Fixed

> Ensure atomicity of in memory update and write when transactionally 
> committing offsets
> --
>
> Key: KAFKA-15784
> URL: https://issues.apache.org/jira/browse/KAFKA-15784
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.7.0
>    Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Blocker
>
> [https://github.com/apache/kafka/pull/14370] (KAFKA-15449) removed the 
> locking around validating, updating state, and writing to the log 
> transactional offset commits. (The verification causes us to release the lock)
> This was discovered in the discussion of 
> [https://github.com/apache/kafka/pull/14629] (KAFKA-15653).
> Since KAFKA-15653 is needed for 3.5.1, it makes sense to address the locking 
> issue separately with this ticket. 
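A minimal sketch of the atomicity concern, using plain Java locking and a 
hypothetical group object rather than the actual group coordinator code: the 
validation, the in-memory update, and the log write should all happen inside one 
critical section instead of releasing the lock between verification and the write.

{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class AtomicOffsetCommitSketch {
    private final ReentrantLock groupLock = new ReentrantLock();
    private long committedOffset = -1L;

    // Safe shape: validate, update in-memory state, and append to the log
    // without releasing the lock in between, so no other operation can
    // interleave and observe or overwrite intermediate state.
    public void commitOffsetAtomically(long offset) {
        groupLock.lock();
        try {
            if (offset <= committedOffset) {
                throw new IllegalStateException("Stale or duplicate offset: " + offset);
            }
            committedOffset = offset;   // in-memory update
            appendToLog(offset);        // write to the log
        } finally {
            groupLock.unlock();
        }
    }

    private void appendToLog(long offset) {
        // Placeholder for the log append in this sketch.
    }
}
{code}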



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16045) ZkMigrationIntegrationTest.testMigrateTopicDeletion flaky

2023-12-21 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16045:
--

 Summary: ZkMigrationIntegrationTest.testMigrateTopicDeletion flaky
 Key: KAFKA-16045
 URL: https://issues.apache.org/jira/browse/KAFKA-16045
 Project: Kafka
  Issue Type: Test
Reporter: Justine Olshan


I'm seeing ZkMigrationIntegrationTest.testMigrateTopicDeletion fail in many 
builds. I believe it is also causing a thread leak, because on most runs where 
it fails, ReplicaManager tests fail with extra threads as well. 

The test always fails with 
`org.opentest4j.AssertionFailedError: Timed out waiting for topics to be 
deleted`


gradle enterprise link:

[https://ge.apache.org/scans/tests?search.names=Git%20branch[…]lues=trunk=kafka.zk.ZkMigrationIntegrationTest|https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=kafka.zk.ZkMigrationIntegrationTest]

recent pr: 
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15023/18/tests/]
trunk builds: 
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2502/tests],
 
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2501/tests]
 (edited) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16122) TransactionsBounceTest -- server disconnected before response was received

2024-01-12 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16122:
--

 Summary: TransactionsBounceTest -- server disconnected before 
response was received
 Key: KAFKA-16122
 URL: https://issues.apache.org/jira/browse/KAFKA-16122
 Project: Kafka
  Issue Type: Test
Reporter: Justine Olshan


I noticed a ton of tests failing with:

{code:java}
Error  org.apache.kafka.common.KafkaException: Unexpected error in 
TxnOffsetCommitResponse: The server disconnected before a response was 
received.  {code}
{code:java}
Stacktrace  org.apache.kafka.common.KafkaException: Unexpected error in 
TxnOffsetCommitResponse: The server disconnected before a response was 
received.  at 
app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1702)
  at 
app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1236)
  at 
app//org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
  at 
app//org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:608)
  at app//org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:600)  
at 
app//org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:457)
  at 
app//org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:334)
  at 
app//org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:249)  
at java.base@21.0.1/java.lang.Thread.run(Thread.java:1583) {code}


The error indicates a network error, which is retriable, but the TxnOffsetCommit 
handler does not expect this. 

https://issues.apache.org/jira/browse/KAFKA-14417 addressed many of the other 
requests but not this one. 
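A minimal sketch of the kind of handling this ticket argues for, assuming the 
kafka-clients library is on the classpath; this is illustrative only and not the 
actual TransactionManager code. A retriable error such as NETWORK_EXCEPTION 
should trigger a retry instead of being surfaced as an unexpected error:

{code:java}
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.protocol.Errors;

public class TxnOffsetCommitErrorSketch {
    // Hypothetical response handling: retriable errors (e.g. NETWORK_EXCEPTION
    // from a disconnect) lead to a retry rather than an "unexpected error".
    static void handle(Errors error) {
        if (error == Errors.NONE) {
            System.out.println("offset commit succeeded");
        } else if (error.exception() instanceof RetriableException) {
            System.out.println("retriable error, will retry: " + error);
        } else {
            throw new KafkaException("Unexpected error in TxnOffsetCommitResponse: " + error.message());
        }
    }

    public static void main(String[] args) {
        handle(Errors.NETWORK_EXCEPTION); // retriable
        handle(Errors.NONE);              // success
    }
}
{code}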



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15975) Update kafka quickstart guide to no longer list ZK start first

2023-12-05 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15975:
--

 Summary: Update kafka quickstart guide to no longer list ZK start 
first
 Key: KAFKA-15975
 URL: https://issues.apache.org/jira/browse/KAFKA-15975
 Project: Kafka
  Issue Type: Task
  Components: docs
Affects Versions: 4.0.0
Reporter: Justine Olshan


Given we are deprecating ZooKeeper, I think we should update our quickstart 
guide to not list the ZooKeeper instructions first.

With 4.0, we may want to remove it entirely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15957) ConsistencyVectorIntegrationTest.shouldHaveSamePositionBoundActiveAndStandBy broken

2023-12-01 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15957:
--

 Summary: 
ConsistencyVectorIntegrationTest.shouldHaveSamePositionBoundActiveAndStandBy 
broken
 Key: KAFKA-15957
 URL: https://issues.apache.org/jira/browse/KAFKA-15957
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15984) Client disconnections can cause hanging transactions on __consumer_offsets

2023-12-06 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15984:
--

 Summary: Client disconnections can cause hanging transactions on 
__consumer_offsets
 Key: KAFKA-15984
 URL: https://issues.apache.org/jira/browse/KAFKA-15984
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


When investigating frequent hanging transactions on __consumer_offsets 
partitions, we realized that many of them were caused by the same offset being 
committed multiple times, with one of the duplicates having 
`"isDisconnectedClient":true`. 

TxnOffsetCommits do not have sequence numbers and thus are not protected 
against duplicates in the same way idempotent produce requests are. Thus, when 
a client is disconnected (and flushes its requests), we may see the duplicate 
get appended to the log. 

KIP-890 part 1 should protect against this, as the duplicate will not pass 
verification. KIP-890 part 2 strengthens this further, as duplicates (from 
previous transactions) cannot be added to new transactions if the partition 
is re-added, since the epoch will be bumped. 

Another possible solution is to do duplicate checking on the group coordinator 
side when the request comes in. This solution could be used instead of KIP-890 
part 1 to prevent hanging transactions but given that part 1 only has one open 
PR remaining, we may not need to do this. However, this can also prevent 
duplicates from being added to a new transaction – something only part 2 will 
protect against.
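As a rough sketch of the coordinator-side duplicate check mentioned above 
(hypothetical types and keys, not actual Kafka code), the coordinator could 
remember the last transactional offset commit per producer and partition and 
skip appending an identical retry that arrives after a disconnect:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TxnOffsetCommitDedupSketch {
    // Hypothetical keys/values for illustration only.
    record CommitKey(long producerId, String topicPartition) {}
    record CommitValue(short producerEpoch, long offset) {}

    private final Map<CommitKey, CommitValue> lastCommit = new HashMap<>();

    // Returns false when the incoming commit is an exact duplicate of the one
    // already written for this producer and partition.
    boolean shouldAppend(long producerId, short producerEpoch, String topicPartition, long offset) {
        CommitKey key = new CommitKey(producerId, topicPartition);
        CommitValue incoming = new CommitValue(producerEpoch, offset);
        if (Objects.equals(lastCommit.get(key), incoming)) {
            return false;
        }
        lastCommit.put(key, incoming);
        return true;
    }

    public static void main(String[] args) {
        TxnOffsetCommitDedupSketch dedup = new TxnOffsetCommitDedupSketch();
        System.out.println(dedup.shouldAppend(7L, (short) 0, "__consumer_offsets-3", 42L)); // true
        System.out.println(dedup.shouldAppend(7L, (short) 0, "__consumer_offsets-3", 42L)); // false
    }
}
{code}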



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15987) Refactor ReplicaManager code

2023-12-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-15987:
--

 Summary: Refactor ReplicaManager code
 Key: KAFKA-15987
 URL: https://issues.apache.org/jira/browse/KAFKA-15987
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan


I started to do this in KAFKA-15784, but the diff was deemed too large and 
confusing. I just wanted to file a followup ticket to reference this in code 
for the areas that will be refactored.

 

I hope to tackle it immediately after.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16122) TransactionsBounceTest -- server disconnected before response was received

2024-01-26 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16122.

Resolution: Fixed

> TransactionsBounceTest -- server disconnected before response was received
> --
>
> Key: KAFKA-16122
> URL: https://issues.apache.org/jira/browse/KAFKA-16122
> Project: Kafka
>  Issue Type: Test
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
>
> I noticed a ton of tests failing with:
> {code:java}
> Error  org.apache.kafka.common.KafkaException: Unexpected error in 
> TxnOffsetCommitResponse: The server disconnected before a response was 
> received.  {code}
> {code:java}
> Stacktrace  org.apache.kafka.common.KafkaException: Unexpected error in 
> TxnOffsetCommitResponse: The server disconnected before a response was 
> received.  at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1702)
>   at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1236)
>   at 
> app//org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
>   at 
> app//org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:608)
>   at app//org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:600) 
>  at 
> app//org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:457)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:334)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:249)  
> at java.base@21.0.1/java.lang.Thread.run(Thread.java:1583) {code}
> The error indicates a network error, which is retriable, but the 
> TxnOffsetCommit handler does not expect this. 
> https://issues.apache.org/jira/browse/KAFKA-14417 addressed many of the other 
> requests but not this one. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15987) Refactor ReplicaManager code for transaction verification

2024-01-26 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15987.

Resolution: Fixed

> Refactor ReplicaManager code for transaction verification
> -
>
> Key: KAFKA-15987
> URL: https://issues.apache.org/jira/browse/KAFKA-15987
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Justine Olshan
>        Assignee: Justine Olshan
>Priority: Major
>
> I started to do this in KAFKA-15784, but the diff was deemed too large and 
> confusing. I just wanted to file a followup ticket to reference this in code 
> for the areas that will be refactored.
>  
> I hope to tackle it immediately after.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15653) NPE in ChunkedByteStream

2023-11-15 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-15653.

Fix Version/s: 3.7.0
   3.6.1
   Resolution: Fixed

> NPE in ChunkedByteStream
> 
>
> Key: KAFKA-15653
> URL: https://issues.apache.org/jira/browse/KAFKA-15653
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.6.0
> Environment: Docker container on a Linux laptop, using the latest 
> release.
>Reporter: Travis Bischel
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.7.0, 3.6.1
>
> Attachments: repro.sh
>
>
> When looping franz-go integration tests, I received an UNKNOWN_SERVER_ERROR 
> from producing. The broker logs for the failing request:
>  
> {noformat}
> [2023-10-19 22:29:58,160] ERROR [ReplicaManager broker=2] Error processing 
> append operation on partition 
> 2fa8995d8002fbfe68a96d783f26aa2c5efc15368bf44ed8f2ab7e24b41b9879-24 
> (kafka.server.ReplicaManager)
> java.lang.NullPointerException
>   at 
> org.apache.kafka.common.utils.ChunkedBytesStream.(ChunkedBytesStream.java:89)
>   at 
> org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105)
>   at 
> org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273)
>   at 
> org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277)
>   at 
> org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352)
>   at 
> org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358)
>   at 
> org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165)
>   at kafka.log.UnifiedLog.append(UnifiedLog.scala:805)
>   at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719)
>   at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313)
>   at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301)
>   at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210)
>   at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
>   at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
>   at scala.collection.mutable.HashMap.map(HashMap.scala:35)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198)
>   at kafka.server.ReplicaManager.appendEntries$1(ReplicaManager.scala:754)
>   at 
> kafka.server.ReplicaManager.$anonfun$appendRecords$18(ReplicaManager.scala:874)
>   at 
> kafka.server.ReplicaManager.$anonfun$appendRecords$18$adapted(ReplicaManager.scala:874)
>   at 
> kafka.server.KafkaRequestHandler$.$anonfun$wrap$3(KafkaRequestHandler.scala:73)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:130)
>   at java.base/java.lang.Thread.run(Unknown Source)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16570) FenceProducers API returns "unexpected error" when successful

2024-04-16 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16570:
--

 Summary: FenceProducers API returns "unexpected error" when 
successful
 Key: KAFKA-16570
 URL: https://issues.apache.org/jira/browse/KAFKA-16570
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan
Assignee: Justine Olshan


When we want to fence a producer using the admin client, we send an 
InitProducerId request.

There is logic in that API to fence (and abort) any ongoing transactions, and 
that is what the API relies on to fence the producer. However, this handling 
also returns CONCURRENT_TRANSACTIONS. In normal usage, this is good because we 
actually want to get a new producer ID and want to retry until the ID is 
supplied or we time out.  
[https://github.com/apache/kafka/blob/5193eb93237ba9093ae444d73a1eaa2d6abcc9c1/core/src/main/scala/kafka/coordinator/transaction/TransactionCoordinator.scala#L170]
 



In the fence-producer case, however, we don't retry; we have no handling for 
concurrent transactions and simply log a message about an unexpected error.
[https://github.com/confluentinc/ce-kafka/blob/b626db8bd94fe971adef3551518761a7be7de454/clients/src/main/java/org/apache/kafka/clients/admin/internals/FenceProducersHandler.java#L112]
 

This error is not actually unexpected, though, and the operation was 
successful. We should just swallow the error and treat it as a successful run 
of the command. 
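A minimal sketch of the proposed handling, assuming the kafka-clients library is 
on the classpath; this is an illustration of the idea rather than the real 
FenceProducersHandler. CONCURRENT_TRANSACTIONS on this path just means the 
coordinator is aborting the old transaction, so it is treated as success:

{code:java}
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.protocol.Errors;

public class FenceProducerErrorSketch {
    // Hypothetical handling: NONE and CONCURRENT_TRANSACTIONS both count as a
    // successful fencing; anything else is surfaced as an error.
    static boolean fencedSuccessfully(Errors error) {
        if (error == Errors.NONE || error == Errors.CONCURRENT_TRANSACTIONS) {
            return true;
        }
        throw new KafkaException("Error while fencing producer: " + error.message());
    }

    public static void main(String[] args) {
        System.out.println(fencedSuccessfully(Errors.CONCURRENT_TRANSACTIONS)); // true
        System.out.println(fencedSuccessfully(Errors.NONE));                    // true
    }
}
{code}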



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16513) Allow WriteTxnMarkers API with Alter Cluster Permission

2024-05-10 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16513.

Resolution: Fixed

> Allow WriteTxnMarkers API with Alter Cluster Permission
> ---
>
> Key: KAFKA-16513
> URL: https://issues.apache.org/jira/browse/KAFKA-16513
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Nikhil Ramakrishnan
>Assignee: Siddharth Yagnik
>Priority: Minor
>  Labels: KIP-1037
> Fix For: 3.8.0
>
>
> We should allow WriteTxnMarkers API with Alter Cluster Permission because it 
> can invoked externally by a Kafka AdminClient. Such usage is more aligned 
> with the Alter permission on the Cluster resource, which includes other 
> administrative actions invoked from the Kafka AdminClient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16451) testDeltaFollower tests failing in ReplicaManager

2024-03-29 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16451.

Resolution: Duplicate

> testDeltaFollower tests failing in ReplicaManager
> -
>
> Key: KAFKA-16451
> URL: https://issues.apache.org/jira/browse/KAFKA-16451
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Justine Olshan
>Priority: Major
>
> Many ReplicaManagerTests with the prefix testDeltaFollower seem to be 
> failing, along with a few other ReplicaManager tests. See existing failures in 
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2765/tests]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16451) testDeltaFollower tests failing in ReplicaManager

2024-03-29 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16451:
--

 Summary: testDeltaFollower tests failing in ReplicaManager
 Key: KAFKA-16451
 URL: https://issues.apache.org/jira/browse/KAFKA-16451
 Project: Kafka
  Issue Type: Bug
Reporter: Justine Olshan


Many ReplicaManagerTests with the prefix testDeltaFollower seem to be failing, 
along with a few other ReplicaManager tests. See existing failures in 
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2765/tests]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16302) Builds failing due to streams test execution failures

2024-02-22 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16302:
--

 Summary: Builds failing due to streams test execution failures
 Key: KAFKA-16302
 URL: https://issues.apache.org/jira/browse/KAFKA-16302
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


I'm seeing this on master and many PR builds for all versions:

```
[2024-02-22T14:37:07.076Z] * What went wrong:
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1426[2024-02-22T14:37:07.076Z] Execution failed for task ':streams:test'.
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1427[2024-02-22T14:37:07.076Z] > The following test methods could not be retried, which is unexpected. Please file a bug report at https://github.com/gradle/test-retry-gradle-plugin/issues
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1428[2024-02-22T14:37:07.076Z] org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.SessionKeySchema@78d39a69]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1429[2024-02-22T14:37:07.076Z] org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.WindowKeySchema@3c818ac4]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1430[2024-02-22T14:37:07.076Z] org.apache.kafka.streams.state.internals.RocksDBTimestampedSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.WindowKeySchema@251f7d26]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1431[2024-02-22T14:37:07.076Z] org.apache.kafka.streams.state.internals.RocksDBTimestampedSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.SessionKeySchema@52c8295b]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1432[2024-02-22T14:37:07.076Z]
```
 





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16302) Builds failing due to streams test execution failures

2024-02-22 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16302.

Resolution: Fixed

> Builds failing due to streams test execution failures
> -
>
> Key: KAFKA-16302
> URL: https://issues.apache.org/jira/browse/KAFKA-16302
> Project: Kafka
>  Issue Type: Task
>  Components: streams, unit tests
>        Reporter: Justine Olshan
>    Assignee: Justine Olshan
>Priority: Major
>
> I'm seeing this on master and many PR builds for all versions:
>  
> {code:java}
> [2024-02-22T14:37:07.076Z] * What went wrong:
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1426[2024-02-22T14:37:07.076Z]
>  Execution failed for task ':streams:test'.
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1427[2024-02-22T14:37:07.076Z]
>  > The following test methods could not be retried, which is unexpected. 
> Please file a bug report at 
> https://github.com/gradle/test-retry-gradle-plugin/issues
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1428[2024-02-22T14:37:07.076Z]
>  
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.SessionKeySchema@78d39a69]
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1429[2024-02-22T14:37:07.076Z]
>  
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.WindowKeySchema@3c818ac4]
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1430[2024-02-22T14:37:07.076Z]
>  
> org.apache.kafka.streams.state.internals.RocksDBTimestampedSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.WindowKeySchema@251f7d26]
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1431[2024-02-22T14:37:07.076Z]
>  
> org.apache.kafka.streams.state.internals.RocksDBTimestampedSegmentedBytesStoreTest#shouldLogAndMeasureExpiredRecords[org.apache.kafka.streams.state.internals.SessionKeySchema@52c8295b]
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15417/1/pipeline#step-89-log-1432[2024-02-22T14:37:07.076Z]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16308) Formatting and Updating Kafka Features

2024-02-26 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16308:
--

 Summary: Formatting and Updating Kafka Features
 Key: KAFKA-16308
 URL: https://issues.apache.org/jira/browse/KAFKA-16308
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan
Assignee: Justine Olshan


As part of KIP-1022, we need to extend the storage and upgrade tools to support 
features other than metadata version. 

See 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Formatting+and+Updating+Features



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16841) ZKMigrationIntegrationTests broken

2024-05-27 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16841.

Resolution: Fixed

fixed by 
https://github.com/apache/kafka/commit/bac8df56ffdf8a64ecfb78ec0779bcbc8e9f7c10

> ZKMigrationIntegrationTests broken
> --
>
> Key: KAFKA-16841
> URL: https://issues.apache.org/jira/browse/KAFKA-16841
> Project: Kafka
>  Issue Type: Task
>    Reporter: Justine Olshan
>Priority: Blocker
>
> A recent merge to trunk seems to have broken tests: I see 78 failures 
> in the CI. 
> I see lots of timeout errors and `Alter Topic Configs had an error`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16692) InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka 3.5 to 3.6

2024-05-20 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16692.

Fix Version/s: 3.6.3
   Resolution: Fixed

> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled when upgrading from kafka 3.5 to 3.6 
> 
>
> Key: KAFKA-16692
> URL: https://issues.apache.org/jira/browse/KAFKA-16692
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.7.0, 3.6.1, 3.8
>Reporter: Johnson Okorie
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.7.1, 3.6.3, 3.8
>
>
> We have a kafka cluster running on version 3.5.2 that we are upgrading to 
> 3.6.1. This cluster has a lot of clients with exactly-once semantics enabled 
> and hence creating transactions. As we replaced brokers with the new 
> binaries, we observed lots of clients in the cluster experiencing the 
> following error:
> {code:java}
> 2024-05-07T09:08:10.039Z "tid": "" -- [Producer clientId=, 
> transactionalId=] Got error produce response with 
> correlation id 6402937 on topic-partition , retrying 
> (2147483512 attempts left). Error: NETWORK_EXCEPTION. Error Message: The 
> server disconnected before a response was received.{code}
> On inspecting the broker, we saw the following errors on brokers still 
> running Kafka version 3.5.2:
>  
> {code:java}
> message:     
> Closing socket for  because of error
> exception_exception_class:    
> org.apache.kafka.common.errors.InvalidRequestException
> exception_exception_message:    
> Received request api key ADD_PARTITIONS_TO_TXN with version 4 which is not 
> enabled
> exception_stacktrace:    
> org.apache.kafka.common.errors.InvalidRequestException: Received request api 
> key ADD_PARTITIONS_TO_TXN with version 4 which is not enabled
> {code}
> On the new brokers running 3.6.1 we saw the following errors:
>  
> {code:java}
> [AddPartitionsToTxnSenderThread-1055]: AddPartitionsToTxnRequest failed for 
> node 1043 with a network exception.{code}
>  
> I can also see this :
> {code:java}
> [AddPartitionsToTxnManager broker=1055]Cancelled in-flight 
> ADD_PARTITIONS_TO_TXN request with correlation id 21120 due to node 1043 
> being disconnected (elapsed time since creation: 11ms, elapsed time since 
> send: 4ms, request timeout: 3ms){code}
> While investigating this issue and digging through the changes in 3.6, we 
> came across some changes introduced as part of KAFKA-14402 that we thought 
> might lead to this behaviour. 
> First, we could see that _transaction.partition.verification.enable_ is 
> enabled by default and enables a new code path that culminates in us sending 
> version 4 ADD_PARTITIONS_TO_TXN requests to other brokers, generated 
> [here|https://github.com/apache/kafka/blob/29f3260a9c07e654a28620aeb93a778622a5233d/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L269].
> From a 
> [discussion|https://lists.apache.org/thread/4895wrd1z92kjb708zck4s1f62xq6r8x] 
> on the mailing list, [~jolshan] pointed out that this scenario shouldn't be 
> possible as the following code paths should prevent version 4 
> ADD_PARTITIONS_TO_TXN requests being sent to other brokers:
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/clients/src/main/java/org/apache/kafka/clients/NodeApiVersions.java#L130]
>  
> [https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/main/scala/kafka/server/AddPartitionsToTxnManager.scala#L195]
> However, these requests are still sent to other brokers in our environment.
> On further inspection of the code, I am wondering if the following code path 
> could lead to this issue:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L500]
> In this scenario, we don't have any _NodeApiVersions_ available for the 
> specified nodeId and potentially skipping the _latestUsableVersion_ check. I 
> am wondering if it is possible that because _discoverBrokerVersions_ is set 
> to _false_ for the network client of the {_}AddPartitionsToTxnManager{_}, it 
> skips fetching {_}NodeApiVersions{_}? I can see that we create the network 
> client here:
> [https://github.com/apache/kafka/blob/c4deed513057c94eb502e64490d6bdc23551d8b6/core/src/main/scala/kafka/server/KafkaServer.scala#L641]
> The _NetworkUtils.buildNetworkClient_ method 

[jira] [Created] (KAFKA-16866) RemoteLogManagerTest.testCopyQuotaManagerConfig failing

2024-05-30 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16866:
--

 Summary: RemoteLogManagerTest.testCopyQuotaManagerConfig failing
 Key: KAFKA-16866
 URL: https://issues.apache.org/jira/browse/KAFKA-16866
 Project: Kafka
  Issue Type: Test
Affects Versions: 3.8.0
Reporter: Justine Olshan


It seems like this test, introduced in 
[https://github.com/apache/kafka/pull/15625], is failing consistently.

org.opentest4j.AssertionFailedError: 
Expected :61
Actual   :11



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16841) ZKIntegrationTests broken

2024-05-25 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16841:
--

 Summary: ZKIntegrationTests broken
 Key: KAFKA-16841
 URL: https://issues.apache.org/jira/browse/KAFKA-16841
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


A recent merge to trunk seems to have broken tests: I see 78 failures in 
the CI. 

I see lots of timeout errors and `Alter Topic Configs had an error`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16990) Unrecognised flag passed to kafka-storage.sh in system test

2024-06-25 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16990.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Unrecognised flag passed to kafka-storage.sh in system test
> ---
>
> Key: KAFKA-16990
> URL: https://issues.apache.org/jira/browse/KAFKA-16990
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 3.8.0
>Reporter: Gaurav Narula
>    Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.8.0, 3.9.0
>
>
> Running 
> {{TC_PATHS="tests/kafkatest/tests/core/kraft_upgrade_test.py::TestKRaftUpgrade"
>  bash tests/docker/run_tests.sh}} on trunk (c4a3d2475f) fails with the 
> following:
> {code:java}
> [INFO:2024-06-18 09:16:03,139]: Triggering test 2 of 32...
> [INFO:2024-06-18 09:16:03,147]: RunnerClient: Loading test {'directory': 
> '/opt/kafka-dev/tests/kafkatest/tests/core', 'file_name': 
> 'kraft_upgrade_test.py', 'cls_name': 'TestKRaftUpgrade', 'method_name': 
> 'test_isolated_mode_upgrade', 'injected_args': {'from_kafka_version': 
> '3.1.2', 'use_new_coordinator': True, 'metadata_quorum': 'ISOLATED_KRAFT'}}
> [INFO:2024-06-18 09:16:03,151]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  on run 1/1
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Setting up...
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Running...
> [INFO:2024-06-18 09:16:05,999]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Tearing down...
> [INFO:2024-06-18 09:16:12,366]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  FAIL: RemoteCommandError({'ssh_config': {'host': 'ducker10', 'hostname': 
> 'ducker10', 'user': 'ducker', 'port': 22, 'password': '', 'identityfile': 
> '/home/ducker/.ssh/id_rsa', 'connecttimeout': None}, 'hostname': 'ducker10', 
> 'ssh_hostname': 'ducker10', 'user': 'ducker', 'externally_routable_ip': 
> 'ducker10', '_logger':  kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT-2
>  (DEBUG)>, 'os': 'linux', '_ssh_client':  0x85bccc70>, '_sftp_client':  0x85bccdf0>, '_custom_ssh_exception_checks': None}, 
> '/opt/kafka-3.1.2/bin/kafka-storage.sh format --ignore-formatted --config 
> /mnt/kafka/kafka.properties --cluster-id I2eXt9rvSnyhct8BYmW6-w -f 
> group.version=1', 1, b"usage: kafka-storage format [-h] --config CONFIG 
> --cluster-id CLUSTER_ID\n                     
> [--ignore-formatted]\nkafka-storage: error: unrecognized arguments: '-f'\n")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 132, in test_isolated_mode_upgrade
>     self.run_upgrade(from_kafka_version, group_protocol)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 96, in run_upgrade
>     self.kafka.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 669, in 
> start
>     self.isolated_controller_quorum.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 671, in 
> start
>     Service.start(self, **kwargs)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/services/service.py", 
> line 265, in start
>

[jira] [Created] (KAFKA-17050) Revert group.version for 3.8 and 3.9

2024-06-27 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-17050:
--

 Summary: Revert group.version for 3.8 and 3.9
 Key: KAFKA-17050
 URL: https://issues.apache.org/jira/browse/KAFKA-17050
 Project: Kafka
  Issue Type: Task
Affects Versions: 3.8.0, 3.9.0
Reporter: Justine Olshan
Assignee: Justine Olshan


After much discussion on KAFKA-17011, we decided it would be best to simply 
remove the group.version feature for 3.8. 

As for 3.9, [~dajac] said it would be easier for EA users of the group 
coordinator to have a single way to configure it. For 4.0 we can reintroduce 
the feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16990) Unrecognised flag passed to kafka-storage.sh in system test

2024-06-24 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan reopened KAFKA-16990:


> Unrecognised flag passed to kafka-storage.sh in system test
> ---
>
> Key: KAFKA-16990
> URL: https://issues.apache.org/jira/browse/KAFKA-16990
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 3.8.0
>Reporter: Gaurav Narula
>    Assignee: Justine Olshan
>Priority: Blocker
> Fix For: 3.8.0
>
>
> Running 
> {{TC_PATHS="tests/kafkatest/tests/core/kraft_upgrade_test.py::TestKRaftUpgrade"
>  bash tests/docker/run_tests.sh}} on trunk (c4a3d2475f) fails with the 
> following:
> {code:java}
> [INFO:2024-06-18 09:16:03,139]: Triggering test 2 of 32...
> [INFO:2024-06-18 09:16:03,147]: RunnerClient: Loading test {'directory': 
> '/opt/kafka-dev/tests/kafkatest/tests/core', 'file_name': 
> 'kraft_upgrade_test.py', 'cls_name': 'TestKRaftUpgrade', 'method_name': 
> 'test_isolated_mode_upgrade', 'injected_args': {'from_kafka_version': 
> '3.1.2', 'use_new_coordinator': True, 'metadata_quorum': 'ISOLATED_KRAFT'}}
> [INFO:2024-06-18 09:16:03,151]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  on run 1/1
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Setting up...
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Running...
> [INFO:2024-06-18 09:16:05,999]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Tearing down...
> [INFO:2024-06-18 09:16:12,366]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  FAIL: RemoteCommandError({'ssh_config': {'host': 'ducker10', 'hostname': 
> 'ducker10', 'user': 'ducker', 'port': 22, 'password': '', 'identityfile': 
> '/home/ducker/.ssh/id_rsa', 'connecttimeout': None}, 'hostname': 'ducker10', 
> 'ssh_hostname': 'ducker10', 'user': 'ducker', 'externally_routable_ip': 
> 'ducker10', '_logger':  kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT-2
>  (DEBUG)>, 'os': 'linux', '_ssh_client':  0x85bccc70>, '_sftp_client':  0x85bccdf0>, '_custom_ssh_exception_checks': None}, 
> '/opt/kafka-3.1.2/bin/kafka-storage.sh format --ignore-formatted --config 
> /mnt/kafka/kafka.properties --cluster-id I2eXt9rvSnyhct8BYmW6-w -f 
> group.version=1', 1, b"usage: kafka-storage format [-h] --config CONFIG 
> --cluster-id CLUSTER_ID\n                     
> [--ignore-formatted]\nkafka-storage: error: unrecognized arguments: '-f'\n")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 132, in test_isolated_mode_upgrade
>     self.run_upgrade(from_kafka_version, group_protocol)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 96, in run_upgrade
>     self.kafka.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 669, in 
> start
>     self.isolated_controller_quorum.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 671, in 
> start
>     Service.start(self, **kwargs)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/services/service.py", 
> line 265, in start
>     self.start_node(node, **kwargs)
>   File "/op

[jira] [Resolved] (KAFKA-16990) Unrecognised flag passed to kafka-storage.sh in system test

2024-06-24 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16990.

Resolution: Fixed

> Unrecognised flag passed to kafka-storage.sh in system test
> ---
>
> Key: KAFKA-16990
> URL: https://issues.apache.org/jira/browse/KAFKA-16990
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 3.8.0
>Reporter: Gaurav Narula
>    Assignee: Justine Olshan
>Priority: Blocker
> Fix For: 3.8.0
>
>
> Running 
> {{TC_PATHS="tests/kafkatest/tests/core/kraft_upgrade_test.py::TestKRaftUpgrade"
>  bash tests/docker/run_tests.sh}} on trunk (c4a3d2475f) fails with the 
> following:
> {code:java}
> [INFO:2024-06-18 09:16:03,139]: Triggering test 2 of 32...
> [INFO:2024-06-18 09:16:03,147]: RunnerClient: Loading test {'directory': 
> '/opt/kafka-dev/tests/kafkatest/tests/core', 'file_name': 
> 'kraft_upgrade_test.py', 'cls_name': 'TestKRaftUpgrade', 'method_name': 
> 'test_isolated_mode_upgrade', 'injected_args': {'from_kafka_version': 
> '3.1.2', 'use_new_coordinator': True, 'metadata_quorum': 'ISOLATED_KRAFT'}}
> [INFO:2024-06-18 09:16:03,151]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  on run 1/1
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Setting up...
> [INFO:2024-06-18 09:16:03,153]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Running...
> [INFO:2024-06-18 09:16:05,999]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  Tearing down...
> [INFO:2024-06-18 09:16:12,366]: RunnerClient: 
> kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT:
>  FAIL: RemoteCommandError({'ssh_config': {'host': 'ducker10', 'hostname': 
> 'ducker10', 'user': 'ducker', 'port': 22, 'password': '', 'identityfile': 
> '/home/ducker/.ssh/id_rsa', 'connecttimeout': None}, 'hostname': 'ducker10', 
> 'ssh_hostname': 'ducker10', 'user': 'ducker', 'externally_routable_ip': 
> 'ducker10', '_logger':  kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.1.2.use_new_coordinator=True.metadata_quorum=ISOLATED_KRAFT-2
>  (DEBUG)>, 'os': 'linux', '_ssh_client':  0x85bccc70>, '_sftp_client':  0x85bccdf0>, '_custom_ssh_exception_checks': None}, 
> '/opt/kafka-3.1.2/bin/kafka-storage.sh format --ignore-formatted --config 
> /mnt/kafka/kafka.properties --cluster-id I2eXt9rvSnyhct8BYmW6-w -f 
> group.version=1', 1, b"usage: kafka-storage format [-h] --config CONFIG 
> --cluster-id CLUSTER_ID\n                     
> [--ignore-formatted]\nkafka-storage: error: unrecognized arguments: '-f'\n")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 132, in test_isolated_mode_upgrade
>     self.run_upgrade(from_kafka_version, group_protocol)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/kraft_upgrade_test.py", 
> line 96, in run_upgrade
>     self.kafka.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 669, in 
> start
>     self.isolated_controller_quorum.start()
>   File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 671, in 
> start
>     Service.start(self, **kwargs)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/services/service.py", 
> line 265, in start
>     self.start_node(node, **kwargs)
>

[jira] [Resolved] (KAFKA-17011) SupportedFeatures.MinVersion incorrectly blocks v0

2024-07-10 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-17011.

Resolution: Fixed

> SupportedFeatures.MinVersion incorrectly blocks v0
> --
>
> Key: KAFKA-17011
> URL: https://issues.apache.org/jira/browse/KAFKA-17011
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.8.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Critical
> Fix For: 3.9.0
>
>
> SupportedFeatures.MinVersion incorrectly blocks v0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17250) Many system tests failing with org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default replicaDirectoryId at version 13

2024-08-02 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-17250:
--

 Summary: Many system tests failing with 
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write 
a non-default replicaDirectoryId at version 13
 Key: KAFKA-17250
 URL: https://issues.apache.org/jira/browse/KAFKA-17250
 Project: Kafka
  Issue Type: Task
Affects Versions: 3.9.0
Reporter: Justine Olshan


I see a lot of kraft system tests that test different versions failing with 
this error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

