Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.7 #149

2024-04-29 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 459660 lines...]
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testValidateAndParseStringPartitionValue PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testConsumerGroupOffsetsToConnectorOffsets STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testConsumerGroupOffsetsToConnectorOffsets PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testValidateAndParseInvalidOffset STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testValidateAndParseInvalidOffset PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullOffset 
STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullOffset 
PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullPartition 
STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullPartition 
PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullTopic 
STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > testNullTopic PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testValidateAndParseInvalidPartition STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.SinkUtilsTest > 
testValidateAndParseInvalidPartition PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
verifyingTopicCleanupPolicyShouldReturnFalseWhenBrokerVersionIsUnsupported 
STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
verifyingTopicCleanupPolicyShouldReturnFalseWhenBrokerVersionIsUnsupported 
PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
shouldNotCreateTopicWhenItAlreadyExists STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
shouldNotCreateTopicWhenItAlreadyExists PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
endOffsetsShouldFailWithNonRetriableWhenUnknownErrorOccurs STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
endOffsetsShouldFailWithNonRetriableWhenUnknownErrorOccurs PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
verifyingTopicCleanupPolicyShouldFailWhenTopicHasDeleteAndCompactPolicy STARTED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
verifyingTopicCleanupPolicyShouldFailWhenTopicHasDeleteAndCompactPolicy PASSED
[2024-04-30T05:24:41.911Z] 
[2024-04-30T05:24:41.911Z] Gradle Test Run :connect:runtime:test > Gradle Test 
Executor 48 > org.apache.kafka.connect.util.TopicAdminTest > 
describeTopicConfigShouldReturnEmptyMapWhenUnsupportedVersionFailure STARTED

[jira] [Created] (KAFKA-16644) FK join emit duplicate tombstone on left-side delete

2024-04-29 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-16644:
---

 Summary: FK join emit duplicate tombstone on left-side delete
 Key: KAFKA-16644
 URL: https://issues.apache.org/jira/browse/KAFKA-16644
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.7.0
Reporter: Matthias J. Sax


We introduced a regression bug in the 3.7.0 release via KAFKA-14778. When a 
left-hand side record is deleted, the join now emits two tombstone records 
instead of one.

The problem was not detected via unit tests, because the test uses a `Map` 
instead of a `List` when verifying the result topic records 
([https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/KTableKTableForeignKeyJoinIntegrationTest.java#L250-L255]).

We should update all test cases to use `List` when reading from the output 
topic, and of course fix the introduced bug: the 
`SubscriptionSendProcessorSupplier` is sending two subscription records instead 
of just a single one: 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionSendProcessorSupplier.java#L133-L136]
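To illustrate why a `Map` hides this kind of regression, here is a minimal sketch. It does not use the real integration test code: `OutputRecord` and `TombstoneCollectionDemo` are hypothetical names invented for this example, and a `null` value stands in for a tombstone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record read from the output topic; not a Kafka class.
final class OutputRecord {
    final String key;
    final String value; // null models a tombstone

    OutputRecord(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class TombstoneCollectionDemo {
    // Collecting into a Map keeps only the last value per key,
    // so two tombstones for the same key collapse into one entry.
    static Map<String, String> collectAsMap(List<OutputRecord> records) {
        Map<String, String> result = new LinkedHashMap<>();
        for (OutputRecord r : records) {
            result.put(r.key, r.value);
        }
        return result;
    }

    // Collecting into a List keeps every record, so duplicates stay visible.
    static List<OutputRecord> collectAsList(List<OutputRecord> records) {
        return new ArrayList<>(records);
    }

    // Models the output described above: two tombstones for one left-side delete.
    static List<OutputRecord> duplicatedTombstones() {
        List<OutputRecord> records = new ArrayList<>();
        records.add(new OutputRecord("lhs1", null));
        records.add(new OutputRecord("lhs1", null));
        return records;
    }
}
```

With the duplicated output, the `Map` view has a single entry for `lhs1` while the `List` view has two records, so only an assertion on the `List` can catch the extra tombstone.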

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16467) Add README to docs folder

2024-04-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16467.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add README to docs folder
> -
>
> Key: KAFKA-16467
> URL: https://issues.apache.org/jira/browse/KAFKA-16467
> Project: Kafka
>  Issue Type: Improvement
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Minor
> Fix For: 3.8.0
>
>
> We don't have a guide in the project root folder or docs folder that shows how to 
> run the website locally. It would be good to provide a way to preview the documentation with the kafka-site 
> repository.
>  
> Option 1: Add links to the wiki pages 
> [https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Website+Documentation+Changes]
>  and 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67634793]. 
> Option 2: Show how to run the documentation within a container. For example: move 
> `site-docs` from kafka to the kafka-site repository and run `./start-preview.sh`.





[jira] [Resolved] (KAFKA-16627) Remove ClusterConfig parameter in BeforeEach and AfterEach

2024-04-29 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16627.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Remove ClusterConfig parameter in BeforeEach and AfterEach
> --
>
> Key: KAFKA-16627
> URL: https://issues.apache.org/jira/browse/KAFKA-16627
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Minor
> Fix For: 3.8.0
>
>
> In the past we modified configs such as broker server properties by modifying the 
> ClusterConfig reference passed to BeforeEach and AfterEach, based on the 
> requirements of the tests.
> After KAFKA-16560, ClusterConfig became immutable, so modifying the 
> ClusterConfig reference no longer changes the test cluster. 
> Passing ClusterConfig to BeforeEach and AfterEach has therefore become redundant. We 
> should remove this behavior.





[jira] [Resolved] (KAFKA-15897) Flaky Test: testWrongIncarnationId() – kafka.server.ControllerRegistrationManagerTest

2024-04-29 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-15897.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Flaky Test: testWrongIncarnationId() – 
> kafka.server.ControllerRegistrationManagerTest
> -
>
> Key: KAFKA-15897
> URL: https://issues.apache.org/jira/browse/KAFKA-15897
> Project: Kafka
>  Issue Type: Test
>Reporter: Apoorv Mittal
>Assignee: Chia-Ping Tsai
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.8.0
>
>
> Build run: 
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14699/21/tests/
>  
> {code:java}
> org.opentest4j.AssertionFailedError: expected: <(false,1,0)> but was: <(true,0,0)>
> Stacktrace
> org.opentest4j.AssertionFailedError: expected: <(false,1,0)> but was: <(true,0,0)>
>   at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
>   at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
>   at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
>   at app//kafka.server.ControllerRegistrationManagerTest.$anonfun$testWrongIncarnationId$3(ControllerRegistrationManagerTest.scala:228)
>   at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
>   at app//kafka.server.ControllerRegistrationManagerTest.testWrongIncarnationId(ControllerRegistrationManagerTest.scala:226)
>   at java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
>   at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
>   at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>   at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
>   at app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>   at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>   at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>   at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>   at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>   at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>   at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>   at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>   at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>   at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>   at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
>   at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
>   at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
>   at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
>   at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
>   at 
> 

[jira] [Created] (KAFKA-16643) Add ModifierOrder checkstyle rule

2024-04-29 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16643:
---

 Summary: Add ModifierOrder checkstyle rule
 Key: KAFKA-16643
 URL: https://issues.apache.org/jira/browse/KAFKA-16643
 Project: Kafka
  Issue Type: Task
  Components: build
Reporter: Greg Harris


Checkstyle offers the ModifierOrder rule: 
[https://checkstyle.sourceforge.io/checks/modifier/modifierorder.html] that 
Kafka violates in a lot of places. We should decide if this is a checkstyle 
rule we should be following or not, and potentially enable it moving forward.
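For readers unfamiliar with the rule, the sketch below shows the kind of declaration it flags. The class and field names are made up for illustration and are not taken from the Kafka code base. Both declarations compile, because the Java compiler accepts modifiers in any order; the rule only enforces the order recommended by the Java Language Specification (for example `private static final`). As far as I know, the check is enabled by adding a `ModifierOrder` module under `TreeWalker` in the Checkstyle configuration.

```java
// Hypothetical example class; not part of Kafka.
final class ModifierOrderExample {
    // Compiles, but ModifierOrder would report it: the access modifier
    // should come first, followed by "static", then "final".
    final static private int OUT_OF_ORDER = 1;

    // Follows the recommended order: access modifier, static, final.
    private static final int IN_ORDER = 1;

    // Small accessor so the constants are used somewhere.
    static int sum() {
        return OUT_OF_ORDER + IN_ORDER;
    }
}
```

Since the change is purely stylistic, fixing existing violations would not alter behavior, which is one argument for treating it as a mechanical cleanup if the rule is adopted.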





Wiki account sign-up

2024-04-29 Thread Matthias J. Sax

Hi,

as many of you know, there is an issue with creating new wiki accounts 
right now: https://issues.apache.org/jira/browse/INFRA-25451


A fix might take some more time. In the meantime, please reply to the 
INFRA ticket with your email address, and accounts will be created 
manually.



-Matthias


Re: Assistance Needed with Creating Wiki ID for Kafka Contribution

2024-04-29 Thread Matthias J. Sax
It's a known issue and INFRA is working on a solution: 
https://issues.apache.org/jira/browse/INFRA-25451


In the meantime, users can be added manually (cf. the ticket, and reply 
there to get added).


-Matthias

On 3/28/24 5:10 AM, Prashant Jagtap wrote:

Hi,

I hope this email finds you well.

I've been eager to contribute to Apache Kafka and have started following 
the procedure outlined in the Kafka Improvement Proposals (KIP) 
documentation: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-GettingStarted
I've successfully signed up for the Developer mailing list and created a Jira ID.
However, I've encountered a hurdle while trying to create a Wiki ID by 
following this URL: https://cwiki.apache.org/confluence/signup.action
Despite the provided instructions, I'm unable to complete the sign-up process for 
the Wiki ID. I have attached a screenshot for your reference.

Could you please assist me with the steps required to create a Wiki ID?
(I have already sent an email to infrastruct...@apache.org regarding the 
same)


Thank you for your time and support.

Thanks and Regards,
Prashant Jagtap


RE: DISCUSS KIP-984 Add pluggable compression interface to Kafka

2024-04-29 Thread Diop, Assane
Hi Divij, Greg and Luke,
I have updated the KIP for Kafka pluggable compression to address the concerns 
raised about the original design. 
I believe the new design takes many of those concerns into account and resolves 
them. I would like to receive feedback on it as I work on getting this 
KIP accepted. I am not targeting a particular release, but acceptance of the concept 
would help move things in this direction. 

The link to the KIP is here 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka

Assane

-Original Message-
From: Diop, Assane  
Sent: Wednesday, April 24, 2024 4:58 PM
To: dev@kafka.apache.org
Subject: RE:DISCUSS KIP-984 Add pluggable compression interface to Kafka

Hi, 

I would like to bring back attention to 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
I have made significant changes to the design to accommodate the concerns and 
would like some feedback from the community to get the discussion going. 

Assane  

-Original Message-
From: Diop, Assane
Sent: Friday, March 1, 2024 4:45 PM
To: dev@kafka.apache.org
Subject: RE: DISCUSS KIP-984 Add pluggable compression interface to Kafka

Hi Luke, 

The proposal doesn't preclude supporting multiple clients but each client would 
need an implementation of the pluggable architecture.
At the very least we envision other clients such as librdkafka and kafka-python 
could be supported by C implementations. 

We agree with community feedback regarding the need to support these clients, 
and we are looking at alternative approaches for brokers and clients to 
coordinate the plugin. 

One way to do this coordination is each client should have a configuration 
mapping of the plugin name to its implementation.

Assane 






-Original Message-
From: Luke Chen 
Sent: Monday, February 26, 2024 7:50 PM
To: dev@kafka.apache.org
Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka

Hi Assane,

I share Greg's concern that the KIP is not friendly to the Kafka 
ecosystem.
It would couple the Kafka client and broker tightly: once 
you use the pluggable compression interface, the producer must be a Java client.
This seems to go against Kafka's original design.

If the proposal can support all kinds of clients, that would be great.

Thanks.
Luke

On Tue, Feb 27, 2024 at 7:44 AM Diop, Assane  wrote:

> Hi Greg,
>
> Thanks for taking the time to give some feedback. It was very insightful.
>
> I have some answers:
>
> 1. The current proposal is Java centric. We want to figure it out with 
> Java first and then later incorporate other languages. We will get there.
>
> 2. The question of where the plugins would live is an important one. I 
> would like to get the community's input on where a plugin would live.
> Officially supported plugins could be part of Kafka and others 
> could live in a plugin repository. Is there currently a way to store 
> plugins in Kafka and load them into the classpath? If such a space 
> could be allowed, then it would provide a standard way of installing 
> officially supported plugins.
> In OpenSearch, for example, there is a plugin utility that takes the 
> jar and installs it across the cluster; privileges can be granted by an admin.
> Such a utility could be implemented in Kafka.
>
> 3. There are many ways to look at this. We could change the message 
> format that uses the pluggable interface to be, for example, v3 and 
> synchronize against that.
> In order to use the pluggable codec, you would have to be at message 
> version 3, for example.
>
> 4. Passing the class name as metadata is one way to have the producer 
> talk to the broker about which plugin to use. However, there could be 
> other implementations 
> where you could set everything there is to know about the topic using 
> topic-level compression. In this case, for example, a rule could be that in 
> order to use the 
> pluggable interface, you should use topic-level compression.
>
> I would like to have your valuable input on this!
>
> Thanks in advance,
> Assane
>
> -Original Message-
> From: Greg Harris 
> Sent: Wednesday, February 14, 2024 2:36 PM
> To: dev@kafka.apache.org
> Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to 
> Kafka
>
> Hi Assane,
>
> Thanks for the KIP!
> Looking back, it appears that the project has only ever added 
> compression types twice: lz4 in 2014 and zstd in 2018, and perhaps 
> Kafka has fallen behind the state-of-the-art compression algorithms.
> Thanks for working to fix that!
>
> I do have some concerns:
>
> 1. I think this is a very "java centric" proposal, and doesn't take 
> non-java clients into enough consideration. librdkafka [1] is a great 
> example of an implementation of the Kafka protocol which doesn't have 
> the same classloading and plugin infrastructure that Java has, which 
> would make implementing this 

Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-04-29 Thread Sophie Blee-Goldman
Yeah I think that sums it up well. Either you computed a *possible* assignment,
or you returned something that makes it literally impossible for the
StreamsPartitionAssignor to decipher/translate into an actual group
assignment, in which case it should just fail.

That's more or less it for the open questions that have been raised so far,
so I just want to remind folks that there's already a voting thread for
this. I cast my vote a few minutes ago so it should resurface in everyone's
inbox :)

On Thu, Apr 25, 2024 at 11:42 PM Rohan Desai 
wrote:

> 117: as Sophie laid out, there are two cases here right:
> 1. cases that are considered invalid by the existing assignors but are
> still valid assignments in the sense that they can be used to generate a
> valid consumer group assignment (from the perspective of the consumer group
> protocol). An assignment that excludes a task is one such example, and
> Sophie pointed out a good use case for it. I also think it makes sense to
> allow these. It's hard to predict how a user might want to use the custom
> assignor, and it's reasonable to expect them to use it with care and not
> hand-hold them.
> 2. cases that are not valid because it is impossible to compute a valid
> consumer group assignment from them. In this case it seems totally
> reasonable to just throw a fatal exception that gets passed to the uncaught
> exception handler. If this case happens then there is some bug in the
> user's assignor and it's totally reasonable to fail the application in that
> case. We _could_ try to be more graceful and default to one of the existing
> assignors. But it's usually better to fail hard and fast when there is some
> illegal state detected imo.
>
> On Fri, Apr 19, 2024 at 4:18 PM Rohan Desai 
> wrote:
>
> > Bruno, I've incorporated your feedback into the KIP document.
> >
> > On Fri, Apr 19, 2024 at 3:55 PM Rohan Desai 
> > wrote:
> >
> >> Thanks for the feedback Bruno! For the most part I think it makes sense,
> >> but leaving a couple follow-up thoughts/questions:
> >>
> >> re 4: I think Sophie's point was slightly different - that we might want
> >> to wrap the return type for `assign` in a class so that it's easily
> >> extensible. This makes sense to me. Whether we do that or not, we can have
> >> the return type be a Set instead of a Map as well.
> >>
> >> re 6: Yes, it's a callback that's called with the final assignment. I
> >> like your suggested name.
> >>
> >> On Fri, Apr 5, 2024 at 12:17 PM Rohan Desai 
> >> wrote:
> >>
> >>> Thanks for the feedback Sophie!
> >>>
> >>> re1: Totally agree. The fact that it's related to the partition assignor
> >>> is clear from just `task.assignor`. I'll update.
> >>> re3: This is a good point, and something I would find useful personally.
> >>> I think it's worth adding an interface that lets the plugin observe the
> >>> final assignment. I'll add that.
> >>> re4: I like the new `NodeAssignment` type. I'll update the KIP with that.
> >>>
> >>> On Thu, Nov 9, 2023 at 11:18 PM Rohan Desai 
> >>> wrote:
> >>>
>  Thanks for the feedback so far! I think pretty much all of it is
>  reasonable. I'll reply to it inline:
> 
>  > 1. All the API logic is granular at the Task level, except the
>  previousOwnerForPartition func. I’m not clear what’s the motivation
>  behind it, does our controller also want to change how the
>  partitions->tasks mapping is formed?
>  You're right that this is out of place. I've removed this method as
>  it's not needed by the task assignor.
> 
>  > 2. Just on the API layering itself: it feels a bit weird to have the
>  three built-in functions (defaultStandbyTaskAssignment etc) sitting in
>  the ApplicationMetadata class. If we consider them as some default util
>  functions, how about moving those into their own static util
>  methods to separate them from the ApplicationMetadata “fact objects”?
>  Agreed. Updated in the latest revision of the KIP. These have been
>  moved to TaskAssignorUtils.
> 
>  > 3. I personally prefer `NodeAssignment` to be a read-only object
>  containing the decisions made by the assignor, including the
>  requestFollowupRebalance flag. For manipulating the half-baked results
>  inside the assignor itself, maybe we can just be flexible to let users use
>  whatever structs / their own classes even, if they like. WDYT?
>  Agreed. Updated in the latest version of the KIP.
> 
>  > 1. For the API, thoughts on changing the method signature to return a
>  (non-Optional) TaskAssignor? Then we can either have the default
>  implementation return new HighAvailabilityTaskAssignor or just have a
>  default implementation class that people can extend if they don't want to
>  implement every method.
>  Based on some other discussion, I actually decided to get rid of the
>  plugin interface, and instead use config to specify individual plugin
> 

Re: [VOTE] KIP-924: customizable task assignment for Streams

2024-04-29 Thread Sophie Blee-Goldman
+1 (binding)

thanks for driving this KIP!

On Tue, Apr 16, 2024 at 1:46 PM Rohan Desai  wrote:

>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-924%3A+customizable+task+assignment+for+Streams
>
> As this KIP has been open for a while, and gone through a couple rounds of
> review/revision, I'm calling a vote to get it approved.
>


[jira] [Created] (KAFKA-16642) Update KafkaConsumerTest to show parameters in test lists

2024-04-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16642:
-

 Summary: Update KafkaConsumerTest to show parameters in test lists
 Key: KAFKA-16642
 URL: https://issues.apache.org/jira/browse/KAFKA-16642
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


{{KafkaConsumerTest}} was recently updated to make many of its tests 
parameterized to exercise both the {{CLASSIC}} and {{CONSUMER}} group 
protocols. However, in some of the tools in which [lists of tests are 
provided|https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=org.apache.kafka.clients.consumer.KafkaConsumerTest=FLAKY],
 say, for analysis, the group protocol information is not exposed. For example, 
one test ({{testReturnRecordsDuringRebalance}}) is flaky, but it's 
difficult to know at a glance which group protocol is causing the problem 
because the list simply shows:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol)[1]}}
{quote}
Ideally, it would expose more information, such as:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol=CONSUMER)}}
{quote}





Re: [VOTE] KIP-909: DNS Resolution Fallures Should Not Fail the Client

2024-04-29 Thread José Armando García Sancio
Thanks for the KIP Philip.

+1 binding.

On Mon, Apr 29, 2024 at 9:55 AM Philip Nee  wrote:
>
> Thanks all.  Will find time to get the patch out.
>
> On Mon, Apr 29, 2024 at 7:42 AM Federico Valeri 
> wrote:
>
> > +1 (non binding)
> > Thanks
> >
> > On Mon, Apr 29, 2024 at 10:18 AM Rajini Sivaram 
> > wrote:
> > >
> > > Hi Philip,
> > >
> > > +1 (binding). Thanks for the KIP!
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Tue, Apr 25, 2023 at 8:52 PM Philip Nee  wrote:
> > >
> > > > Thanks for the vote.  We've decided to make a minor change to the
> > default
> > > > timeout from 5min to 2min.
> > > >
> > > > On Tue, Apr 25, 2023 at 11:42 AM David Jacot 
> > > > wrote:
> > > >
> > > > > +1 (binding) Thanks for the KIP, Philip!
> > > > >
> > > > > Le mar. 25 avr. 2023 à 20:23, José Armando García Sancio
> > > > >  a écrit :
> > > > >
> > > > > > +1. Thanks for the design. Looking forward to the implementation.
> > > > > >
> > > > > > On Tue, Apr 25, 2023 at 10:49 AM Jason Gustafson
> > > > > >  wrote:
> > > > > > >
> > > > > > > +1 Thanks Philip!
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 13, 2023 at 7:49 AM Kirk True 
> > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > > On Apr 10, 2023, at 1:53 PM, Philip Nee  > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hey everyone!
> > > > > > > > >
> > > > > > > > > I'm starting a vote for KIP-909: DNS Resolution Fallures
> > Should
> > > > Not
> > > > > > Fail
> > > > > > > > > the Client  > the
> > > > > > Clients>
> > > > > > > > >
> > > > > > > > > Please refer to the discussion thread here:
> > > > > > > > >
> > https://lists.apache.org/thread/st84zzwnq5m3pkzd1r7jk9lmqdt9m98s
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > P
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -José
> > > > > >
> > > > >
> > > >
> >



-- 
-José


[jira] [Created] (KAFKA-16641) MM2 offset translation should interpolate between sparse OffsetSyncs

2024-04-29 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16641:
---

 Summary: MM2 offset translation should interpolate between sparse 
OffsetSyncs
 Key: KAFKA-16641
 URL: https://issues.apache.org/jira/browse/KAFKA-16641
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Reporter: Greg Harris


Right now, the OffsetSyncStore keeps a sparse offset store, with exponential 
spacing between syncs. This can leave large gaps in translation, where offsets 
are translated much more conservatively than necessary.

The dominant way to use MirrorMaker2 is in a "single writer" fashion, where the 
target topic is only written to by a single MirrorMaker 2 instance. When a topic 
without gaps is replicated, contiguous blocks of offsets are preserved. For 
example:

Say that MM2 mirrors 100 records, and emits two syncs: 0:100 and 100:200. We 
can detect when the gap between the upstream and downstream offsets is the same 
using subtraction, and then assume that 50:150 is also a valid translation. If 
the source topic has gaps, or goes through a restart, we should expect a 
discontinuity in the offset syncs, like 0:100 and 100:250 or 0:100 and 100:150.

This may allow us to restore much of the offset translation precision that was 
lost for simple contiguous topics, without additional memory usage, but at the 
risk of mis-translating some pathological situations when the source topic has 
gaps. This could be enabled unconditionally, or enabled via a 
configuration.
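
The subtraction check described above can be sketched as follows. This is only an illustration under stated assumptions, not the OffsetSyncStore implementation: `SyncPoint` and `OffsetInterpolation` are hypothetical names, and a sync written as `0:100` is read as upstream offset 0 mapping to downstream offset 100.

```java
import java.util.OptionalLong;

// Hypothetical value type for illustration; not MirrorMaker 2's OffsetSync class.
final class SyncPoint {
    final long upstream;
    final long downstream;

    SyncPoint(long upstream, long downstream) {
        this.upstream = upstream;
        this.downstream = downstream;
    }
}

final class OffsetInterpolation {
    // Translates an upstream offset that lies between two syncs, but only when
    // both syncs have the same upstream-to-downstream difference. A different
    // difference signals a discontinuity (gaps or a restart), so no
    // interpolated translation is returned.
    static OptionalLong interpolate(SyncPoint earlier, SyncPoint later, long upstreamOffset) {
        if (upstreamOffset < earlier.upstream || upstreamOffset > later.upstream) {
            return OptionalLong.empty(); // outside the bracketing syncs
        }
        long earlierDelta = earlier.downstream - earlier.upstream;
        long laterDelta = later.downstream - later.upstream;
        if (earlierDelta != laterDelta) {
            return OptionalLong.empty(); // discontinuity detected
        }
        return OptionalLong.of(upstreamOffset + earlierDelta);
    }
}
```

With the syncs 0:100 and 100:200 from the example, upstream offset 50 translates to 150. With 0:100 and 100:250, or 0:100 and 100:150, the differences disagree and the sketch declines to interpolate, leaving the caller to fall back to the existing conservative translation.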





Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2855

2024-04-29 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-29 Thread Andrew Schofield
Hi Jun,
Thanks for the reply and sorry for the delay in responding.

123. Yes, I didn’t quite get your point earlier. The member
epoch is bumped by the GC when it sends a new assignment.
When the member sends its next heartbeat, it echoes back
the member epoch, which will confirm the receipt of the
assignment. It would send the same member epoch even
after recovery of a network disconnection, so that should
be sufficient to cope with this eventuality.

125. Yes, I have added it to the table which now matches
the text earlier in the KIP. Thanks.

140. Yes, I have added it to the table which now matches
the text earlier in the KIP. I’ve also added more detail for
the case where the entire share group is being deleted.

141. Yes! Sorry for confusing things.

Back to the original question for this point. To delete a share
group, should the GC write a tombstone for each
ShareGroupMemberMetadata record?

Tombstones are necessary to delete ShareGroupMemberMetadata
records. But, deletion of a share group is only possible when
the group is already empty, so the tombstones will have
been written as a result of the members leaving the group.

143. Yes, that’s right.

147. The measurement is certainly from the point of view
of the client, but it’s driven by sending and receiving heartbeats
rather than whether the client triggered the rebalance itself.
The client decides when it enters and leaves reconciliation
of the assignment, and measures this period.
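
To make point 123 concrete, here is a rough sketch of the bookkeeping it
implies on the GC side, assuming one integer epoch per member. The class and
method names are invented for illustration and are not taken from the KIP.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical coordinator-side bookkeeping; names are illustrative only. */
final class AssignmentAckTracker {
    // Latest member epoch handed out with an assignment, per member id.
    private final Map<String, Integer> currentEpoch = new HashMap<>();

    /** A new assignment is sent: bump the member epoch and return it. */
    int sendNewAssignment(String memberId) {
        int next = currentEpoch.getOrDefault(memberId, 0) + 1;
        currentEpoch.put(memberId, next);
        return next;
    }

    /**
     * A heartbeat arrives carrying the epoch the member last saw.
     * Returns true if the assignment still has to be repeated in the response,
     * false if the echoed epoch confirms the member received it.
     */
    boolean mustRepeatAssignment(String memberId, int echoedEpoch) {
        return echoedEpoch != currentEpoch.getOrDefault(memberId, 0);
    }
}
```

A member that reconnects after a network problem echoes the same epoch again,
so the check gives the same answer before and after the disconnection.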


Thanks,
Andrew


> On 26 Apr 2024, at 09:43, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply.
> 
> 123. "Rather than add the group epoch to the ShareGroupHeartbeat, I have
> decided to go for TopicPartitions in ShareGroupHeartbeatRequest which
> mirrors ConsumerGroupHeartbeatRequest."
> ShareGroupHeartbeat.MemberEpoch is the group epoch, right? Is that enough
> for confirming the receipt of the new assignment?
> 
> 125. This also means that "Alter share group offsets" needs to write a
> ShareGroupPartitionMetadata record, if the partition is not already
> initialized.
> 
> 140. In the table for "Delete share group offsets", we need to add a step
> to write a ShareGroupPartitionMetadata record with DeletingTopics.
> 
> 141. Hmm, ShareGroupMemberMetadata is stored in the __consumer_offsets
> topic, which is a compacted topic, right?
> 
> 143. So, the client sends DescribeShareGroupOffsets requests to GC, which
> then forwards it to SC?
> 
> 147. I guess a client only knows the rebalance triggered by itself, but not
> the ones triggered by other members or topic/partition changes?
> 
> Jun
> 
> On Thu, Apr 25, 2024 at 4:19 AM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for the response.
>> 
>> 123. Of course, ShareGroupHeartbeat started off as ConsumerGroupHeartbeat
>> and then unnecessary fields were removed. In the network issue case,
>> there is not currently enough state being exchanged to be sure an
>> assignment was received.
>> 
>> Rather than add the group epoch to the ShareGroupHeartbeat, I have decided
>> to go for TopicPartitions in ShareGroupHeartbeatRequest which mirrors
>> ConsumerGroupHeartbeatRequest. It means the share group member does
>> confirm the assignment it is using, and that can be used by the GC to
>> safely stop repeating the assignment in heartbeat responses.
>> 
>> 125. Ah, yes. This is indeed something possible with a consumer group
>> and share groups should support it too. This does of course imply that
>> ShareGroupPartitionMetadataValue needs an array of partitions, not
>> just the number.
>> 
>> 140. Yes, good spot. There is an inconsistency here in consumer groups
>> where you can use AdminClient.deleteConsumerGroupOffsets at the
>> partition level, but kafka-consumer-groups.sh --delete only operates
>> at the topic level.
>> 
>> Personally, I don’t think it’s sensible to delete offsets at the partition
>> level only. You can reset them, but if you’re actively using a topic with
>> a share group, I don’t see why you’d want to delete offsets rather than
>> reset. If you’ve finished using a topic with a share group and want to
>> clean up, use delete.
>> 
>> So, I’ve changed the AdminClient.deleteConsumerGroupOffsets to be
>> topic-based and the RPCs behind it.
>> 
>> The GC reconciles the cluster state with the ShareGroupPartitionMetadata
>> to spot deletion of topics and the like. However, when the offsets for
>> a topic were deleted manually, the topic very likely still exists, so
>> reconciliation alone is not going to be able to continue an interrupted
>> operation that has started. So, I’ve added DeletingTopics back into
>> ShareGroupPartitionMetadata for this purpose. It’s so failover of a GC
>> can continue where it left off rather than leaving fragments across the
>> SCs.
>> 
>> 141. That is not required. Because this is not a compacted topic, it is
>> not necessary to write tombstones for every key. As long as there is a
>> clear and unambiguous record for the deletion of the group, that is enough.

Re: [VOTE] KIP-909: DNS Resolution Failures Should Not Fail the Client

2024-04-29 Thread Philip Nee
Thanks all.  Will find time to get the patch out.



[jira] [Resolved] (KAFKA-16217) Transactional producer stuck in IllegalStateException during close

2024-04-29 Thread Calvin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Calvin Liu resolved KAFKA-16217.

Fix Version/s: (was: 3.6.3)
   Resolution: Fixed

> Transactional producer stuck in IllegalStateException during close
> --
>
> Key: KAFKA-16217
> URL: https://issues.apache.org/jira/browse/KAFKA-16217
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer
> Affects Versions: 3.7.0, 3.6.1
> Reporter: Calvin Liu
> Assignee: Calvin Liu
> Priority: Major
> Labels: transactions
> Fix For: 3.8.0, 3.7.1
>
>
> The producer is stuck during the close. It keeps retrying to abort the 
> transaction but it never succeeds. 
> {code:java}
> [ERROR] 2024-02-01 17:21:22,804 [kafka-producer-network-thread | producer-transaction-bench-transaction-id-f60SGdyRQGGFjdgg3vUgKg] org.apache.kafka.clients.producer.internals.Sender run - [Producer clientId=producer-transaction-bench-transaction-id-f60SGdyRQGGFjdgg3vUgKg, transactionalId=transaction-bench-transaction-id-f60SGdyRQGGFjdgg3vUgKg] Error in kafka producer I/O thread while aborting transaction:
> java.lang.IllegalStateException: Cannot attempt operation `abortTransaction` because the previous call to `commitTransaction` timed out and must be retried
>     at org.apache.kafka.clients.producer.internals.TransactionManager.handleCachedTransactionRequestResult(TransactionManager.java:1138)
>     at org.apache.kafka.clients.producer.internals.TransactionManager.beginAbort(TransactionManager.java:323)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:274)
>     at java.base/java.lang.Thread.run(Thread.java:1583)
>     at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
> {code}
> With the additional log, I found the root cause. If the producer is in a bad 
> transaction state (in my case, TransactionManager.pendingTransition was set 
> to commitTransaction and never got cleared) and then calls close, which tries 
> to abort the existing transaction, the producer gets stuck aborting the 
> transaction. It is related to the fix 
> [https://github.com/apache/kafka/pull/13591].
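
The stuck state can be illustrated with a toy model. This is only a sketch
under simplifying assumptions: the class below is hypothetical, it keeps just
the name of the timed-out operation, and it is not the real TransactionManager
logic.

```java
/** Toy model of the stuck state; not the real TransactionManager. */
final class PendingTransitionSketch {
    // Operation whose earlier call timed out and was never resolved; null if none.
    private String pendingTransition;

    void markTimedOut(String operation) {
        pendingTransition = operation;
    }

    /** Mimics the check behind the IllegalStateException in the log above. */
    void begin(String operation) {
        if (pendingTransition != null && !pendingTransition.equals(operation)) {
            throw new IllegalStateException("Cannot attempt operation `" + operation
                    + "` because the previous call to `" + pendingTransition
                    + "` timed out and must be retried");
        }
        // In this toy model, retrying the same operation clears the pending state.
        pendingTransition = null;
    }

    /** What close effectively does in the report: keep trying to abort. */
    int failedAbortAttempts(int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                begin("abortTransaction");
            } catch (IllegalStateException e) {
                failures++; // nothing on this path clears pendingTransition
            }
        }
        return failures;
    }
}
```

After markTimedOut("commitTransaction"), every abort attempt fails no matter
how many times it is retried, which matches the behaviour in the report.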
>  





[jira] [Resolved] (KAFKA-16460) New consumer times out consuming records in consumer_test.py system test

2024-04-29 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16460.
---
Resolution: Duplicate

> New consumer times out consuming records in consumer_test.py system test
> 
>
> Key: KAFKA-16460
> URL: https://issues.apache.org/jira/browse/KAFKA-16460
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer, system tests
> Affects Versions: 3.7.0
> Reporter: Kirk True
> Assignee: Kirk True
> Priority: Blocker
> Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{consumer_test.py}} system test fails with the following errors:
> {quote}
> * Timed out waiting for consumption
> {quote}
> Affected tests:
> * {{test_broker_failure}}
> * {{test_consumer_bounce}}
> * {{test_static_consumer_bounce}}





Re: [VOTE] KIP-909: DNS Resolution Failures Should Not Fail the Client

2024-04-29 Thread Federico Valeri
+1 (non binding)
Thanks



Re: [DISCUSS] KIP-909: Allow clients to rebootstrap DNS lookup failure

2024-04-29 Thread Philip Nee
Hi,

This KIP isn't implemented yet but that sounds good.

Thanks!
P


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2854

2024-04-29 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-909: Allow clients to rebootstrap DNS lookup failure

2024-04-29 Thread Federico Valeri
Hi Philip, thanks for this KIP. As others mentioned, I think it is
useful in dynamic environments like Kubernetes.

I was wondering if we should also update the "examples" modules with
this new non-retriable exception. Wdyt?
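
For illustration, here is a rough sketch of the behaviour discussed in the
quoted thread below: DNS resolution is retried until a timeout elapses, then a
fatal exception is thrown. The KIP is not implemented, so everything here is an
assumption. BootstrapResolutionException is a local stand-in named after the
thread, and the Lookup interface and method signature are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the client code proposed by the KIP. */
final class BoundedBootstrapResolver {

    /** Pluggable lookup so the sketch can be exercised without real DNS. */
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Local stand-in for the fatal exception named in the thread. */
    static final class BootstrapResolutionException extends RuntimeException {
        BootstrapResolutionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Retries the lookup until it succeeds or timeoutMs has elapsed. */
    static InetAddress[] resolveWithin(String host, long timeoutMs, long backoffMs, Lookup lookup) {
        final long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                return lookup.resolve(host);
            } catch (UnknownHostException e) {
                if (System.nanoTime() - deadlineNanos >= 0) {
                    // The timeout only bounds DNS resolution; after it, fail fatally.
                    throw new BootstrapResolutionException(
                            "Could not resolve " + host + " within " + timeoutMs + " ms", e);
                }
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new BootstrapResolutionException("Interrupted while resolving " + host, ie);
            }
        }
    }
}
```

With real DNS the lookup argument could be InetAddress::getAllByName. An
example application would treat the exception as fatal and close the client
rather than retry.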

On Fri, Apr 21, 2023 at 11:13 PM Philip Nee  wrote:
>
> Hey Jason,
>
> Thanks again. I've updated the KIP.
>
> P
>
> On Fri, Apr 21, 2023 at 9:56 AM Jason Gustafson 
> wrote:
>
> > Hey Philip, that sounds good to me.
> >
> > -Jason
> >
> >
> >
> > On Thu, Apr 20, 2023 at 1:20 PM Philip Nee  wrote:
> >
> > > Hey Jason,
> > >
> > > Thanks for the response. I agree with your suggestions. Just to clarify,
> > > we want the timeout, bootstrap.resolve.timeout.ms to only bound the DNS
> > > resolution time, and we can throw a fatal, BootstrapResolutionException
> > > (so not connection exception anymore) afterward.
> > >
> > > I think that aligns with the goal of this KIP.
> > >
> > > P
> > >
> > > On Thu, Apr 20, 2023 at 9:23 AM Jason Gustafson wrote:
> > >
> > > > Hey Philip,
> > > >
> > > > Yeah, I see your point. Here is the challenge I'm considering. Today,
> > > > the client does not forward connection-related errors to the
> > > > application. It is debatable whether it should, but that is the
> > > > behavior that applications expect today. The only case of a fatal
> > > > error is DNS resolution and the application must handle it because
> > > > they cannot even construct the client. Once we have this KIP, we will
> > > > introduce a separate fatal failure mechanism through the
> > > > `BootstrapConnectionException`. And we will use it not only for DNS
> > > > failures, but general bootstrap connection failures. The problem is
> > > > that existing applications will not know that this should be treated
> > > > as a fatal error. So they may continue to retry. Does it help in this
> > > > situation to poison the client so that it cannot be used? I'm not
> > > > sure. Perhaps it would reduce the risk a bit if we change
> > > > `bootstrap.connection.timeout.ms` to `bootstrap.resolve.timeout.ms`
> > > > or something like that? Then we retain the current behavior if DNS
> > > > succeeds, but connections fail. I know I'm the one that suggested
> > > > generalizing the configuration, but I feel some hesitation after
> > > > thinking about it more.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Wed, Apr 19, 2023 at 8:10 PM Philip Nee wrote:
> > > >
> > > > > Hey Jason,
> > > > >
> > > > > Thanks for your review. I think if we make it a retriable error,
> > > > > does it make sense to have a configurable timeout still? as we
> > > > > expect the user to continue to retry anyway.
> > > > >
> > > > > I'm considering the case of bad configuration. If the user retries
> > > > > the error, then we rely on the error/warning to alert the user. In
> > > > > this case, maybe we continue using the proposed behavior, i.e. warn
> > > > > on each poll after the timeout period.
> > > > >
> > > > > If you agree that a configuration is needed, maybe we can call this
> > > > > *bootstrap.auto.retry.ms* instead, to indicate a configurable
> > > > > period of automatic retry. What do you think?
> > > > >
> > > > > Cheers,
> > > > > P
> > > > >
> > > > > On Wed, Apr 19, 2023 at 7:17 PM Jason Gustafson wrote:
> > > > >
> > > > > > Hey Phillip,
> > > > > >
> > > > > > The KIP looks good. 5 minutes seems like a reasonable tradeoff. I
> > > > > > do wonder if it is necessary to treat bootstrap timeout as a fatal
> > > > > > error though. It seems possible that the exception might be caught
> > > > > > by handlers in existing applications which may not expect that the
> > > > > > client needs to be restarted. Perhaps it would be safer to make it
> > > > > > retriable? As long as the application continues trying to use the
> > > > > > client, we could continue trying to reach the bootstrap servers
> > > > > > perhaps? That would be closer to behavior today which only treats
> > > > > > DNS resolution failures as fatal. What do you think?
> > > > > >
> > > > > > Best,
> > > > > > Jason
> > > > > >
> > > > > > On Mon, Apr 10, 2023 at 1:53 PM Philip Nee wrote:
> > > > > >
> > > > > > > Thanks, everyone: I'm starting a vote today. Here's the recap
> > > > > > > for some of the questions:
> > > > > > >
> > > > > > > John: I changed the proposal to throw a non-retriable exception
> > > > > > > after the timeout elapses. I feel it might be necessary to poison
> > > > > > > the client after retry expires, as it might indicate a real issue.
> > > > > > > Ismael: The proposal is to add a configuration for the retry and
> > > > > > > it will throw a non-retriable exception after the time

Re: [PR] PayPal powered by Apache Kafka section Update [kafka-site]

2024-04-29 Thread via GitHub


parvase commented on PR #590:
URL: https://github.com/apache/kafka-site/pull/590#issuecomment-2082304164

   Hi,
   Could someone help review and merge this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-16563) migration to KRaft hanging after MigrationClientException

2024-04-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16563.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> migration to KRaft hanging after MigrationClientException
> -
>
> Key: KAFKA-16563
> URL: https://issues.apache.org/jira/browse/KAFKA-16563
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Luke Chen
> Assignee: Luke Chen
> Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> While migrating from ZK to KRaft, we encountered an issue where the 
> migration hangs and the `ZkMigrationState` cannot move to the `MIGRATION` 
> state. After investigation, the root cause is that the pollEvent did not 
> retry on the retriable `MigrationClientException` (i.e. ZK client retriable 
> errors) when it should have. Because of this, the poll event stops polling, 
> so the KRaftMigrationDriver cannot work as expected.
>  
> {code:java}
> 2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered ZooKeeper error during event PollEvent. Will retry. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-5-migration-driver-event-handler]
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /migration
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)
>     at kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)
>     at kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
>     at kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
>     at kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
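
The expected behaviour, retrying the poll when a retriable error is hit, can
be sketched as follows. This is a simplified illustration and not the actual
fix. The real driver schedules events on a KafkaEventQueue rather than looping,
and RetriableMigrationException is a local stand-in for the retriable
`MigrationClientException`.

```java
/** Hypothetical sketch of retry-on-retriable-error; not the KRaftMigrationDriver code. */
final class PollRetrySketch {

    /** Local stand-in for a retriable MigrationClientException. */
    static final class RetriableMigrationException extends RuntimeException {
        RetriableMigrationException(String message) {
            super(message);
        }
    }

    /**
     * Runs the poll step until it succeeds and returns the number of attempts.
     * The bug described above corresponds to giving up after the first
     * retriable failure, so polling silently stops.
     */
    static int pollUntilSuccess(Runnable pollStep, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                pollStep.run();
                return attempt;
            } catch (RetriableMigrationException e) {
                // Retriable: fall through and poll again instead of stopping.
            }
        }
        throw new IllegalStateException("Poll still failing after " + maxAttempts + " attempts");
    }
}
```

Non-retriable exceptions are deliberately not caught here, so they still
propagate to the caller.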





Re: [VOTE] KIP-909: DNS Resolution Failures Should Not Fail the Client

2024-04-29 Thread Rajini Sivaram
Hi Philip,

+1 (binding). Thanks for the KIP!

Regards,

Rajini


On Tue, Apr 25, 2023 at 8:52 PM Philip Nee  wrote:

> Thanks for the vote.  We've decided to make a minor change to the default
> timeout from 5min to 2min.
>
> On Tue, Apr 25, 2023 at 11:42 AM David Jacot 
> wrote:
>
> > +1 (binding) Thanks for the KIP, Philip!
> >
> > Le mar. 25 avr. 2023 à 20:23, José Armando García Sancio
> >  a écrit :
> >
> > > +1. Thanks for the design. Looking forward to the implementation.
> > >
> > > On Tue, Apr 25, 2023 at 10:49 AM Jason Gustafson
> > >  wrote:
> > > >
> > > > +1 Thanks Philip!
> > > >
> > > >
> > > > On Thu, Apr 13, 2023 at 7:49 AM Kirk True  wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > > > On Apr 10, 2023, at 1:53 PM, Philip Nee wrote:
> > > > > > >
> > > > > > > Hey everyone!
> > > > > > >
> > > > > > > I'm starting a vote for KIP-909: DNS Resolution Failures Should
> > > > > > > Not Fail the Client
> > > > > >
> > > > > > Please refer to the discussion thread here:
> > > > > > https://lists.apache.org/thread/st84zzwnq5m3pkzd1r7jk9lmqdt9m98s
> > > > > >
> > > > > > Thanks!
> > > > > > P
> > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > -José
> > >
> >
>


Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-04-29 Thread Christo Lolov
Heya!

Is it difficult to instead add the metric at
kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other
name=*)? Alternatively, if it is difficult to add it there, is it possible
to add 2 metrics, one at the RequestMetrics level (even if it is
total-time-ms - (all other times)) and one at what you are proposing? As an
operator I would find it strange to not see the metric in the
RequestMetrics.
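
To spell out the "total-time-ms - (all other times)" idea, here is a small
sketch of the arithmetic. The class and method names are made up, and clamping
at zero is my own assumption to guard against timing skew between the
individual measurements.

```java
/** Hypothetical helper; illustrates the subtraction only, not a real Kafka metric. */
final class TieredStorageTimeSketch {

    /** Derives the remaining time as total minus the sum of the other per-phase times. */
    static long residualMs(long totalTimeMs, long... otherPhaseTimesMs) {
        long others = 0L;
        for (long phaseMs : otherPhaseTimesMs) {
            others += phaseMs;
        }
        // Clamp at zero so skew between measurements never yields a negative time.
        return Math.max(0L, totalTimeMs - others);
    }
}
```

For example, a request with a total time of 120 ms whose other phases add up
to 50 ms would report 70 ms under this scheme.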

Your thoughts?

Best,
Christo

On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Christo,
>
> Updated the KIP with the remote fetch latency metric. Please take another
> look!
>
> --
> Kamal
>
> On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Federico,
> >
> > Thanks for the suggestion! Updated the config name to
> > "remote.fetch.max.wait.ms".
> >
> > Christo,
> >
> > Good point. We don't have the remote-read latency metrics to measure the
> > performance of the remote read requests. I'll update the KIP to emit this
> > metric.
> >
> > --
> > Kamal
> >
> >
> > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri 
> > wrote:
> >
> >> Hi Kamal, it looks like all TS configurations start with the "remote."
> >> prefix, so I was wondering if we should name it
> >> "remote.fetch.max.wait.ms".
> >>
> >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
> >>  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > If there are no more comments, I'll start a vote thread by tomorrow.
> >> > Please review the KIP.
> >> >
> >> > Thanks,
> >> > Kamal
> >> >
> >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
> >> > kamal.chandraprak...@gmail.com> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Bumping the thread. Please review this KIP. Thanks!
> >> > >
> >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
> >> > > kamal.chandraprak...@gmail.com> wrote:
> >> > >
> >> > >> Hi Jorge,
> >> > >>
> >> > >> Thanks for the review! Added your suggestions to the KIP. PTAL.
> >> > >>
> >> > >> The `fetch.max.wait.ms` config will be also applicable for topics
> >> > >> enabled with remote storage.
> >> > >> Updated the description to:
> >> > >>
> >> > >> ```
> >> > >> The maximum amount of time the server will block before answering
> >> > >> the fetch request when it is reading near to the tail of the
> >> > >> partition (high-watermark) and there isn't sufficient data to
> >> > >> immediately satisfy the requirement given by fetch.min.bytes.
> >> > >> ```
> >> > >>
> >> > >> --
> >> > >> Kamal
> >> > >>
> >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya <
> >> > >> quilcate.jo...@gmail.com> wrote:
> >> > >>
> >> > >>> Hi Kamal,
> >> > >>>
> >> > >>> Thanks for this KIP! It should help to solve one of the main issues
> >> > >>> with tiered storage at the moment that is dealing with individual
> >> > >>> consumer configurations to avoid flooding logs with interrupted
> >> > >>> exceptions.
> >> > >>>
> >> > >>> One of the topics discussed in [1][2] was on the semantics of
> >> > >>> `fetch.max.wait.ms` and how it's affected by remote storage. Should
> >> > >>> we consider within this KIP the update of `fetch.max.wait.ms` docs
> >> > >>> to clarify it only applies to local storage?
> >> > >>>
> >> > >>> Otherwise, LGTM -- looking forward to see this KIP adopted.
> >> > >>>
> >> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-15776
> >> > >>> [2] https://github.com/apache/kafka/pull/14778#issuecomment-1820588080
> >> > >>>
> >> > >>> On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash <
> >> > >>> kamal.chandraprak...@gmail.com> wrote:
> >> > >>>
> >> > >>> > Hi all,
> >> > >>> >
> >> > >>> > I have opened a KIP-1018
> >> > >>> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests>
> >> > >>> > to introduce dynamic max-remote-fetch-timeout broker config to
> >> > >>> > give more control to the operator.
> >> > >>> >
> >> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> >> > >>> >
> >> > >>> > Let me know if you have any feedback or suggestions.
> >> > >>> >
> >> > >>> > --
> >> > >>> > Kamal
> >> > >>> >
> >> > >>>
> >> > >>
> >>
> >
>