Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-27 Thread David Jacot
Hi all,

I am still experimenting with reducing the noise of flaky tests in build
results. I should have results to share early next week.

Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
it seems risky because the build may be green but with failed tests, no?
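
For reference, here is roughly what the two Gradle options discussed in this
thread look like in a Groovy build script. This is only an illustrative sketch
based on the Gradle docs, not our actual build.gradle:

    test {
        // ignoreFailures: test failures no longer fail the Gradle build, so the
        // overall build could look green while tests are actually red.
        ignoreFailures = true

        // mergeReruns: with retries enabled, a test that fails and then passes on
        // a retry is reported as a flaky failure within a single test case rather
        // than as a hard failure.
        reports.junitXml.mergeReruns = true
    }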

I would also like to make it clear that the current rule applies until we
agree on a way forward here. At minimum, I think that a build should be
yellow for all the combinations and the failed tests should have been
triaged to ensure that they are not related to the changes. We should not
merge when a build is red or has not completed.

Best,
David

On Sat, Nov 25, 2023 at 5:25 AM Chris Egerton 
wrote:

> Hi all,
>
> There's a lot to catch up on here but I wanted to clarify something.
> Regarding this comment from Sophie:
>
>
> > Yet multiple people in this thread so
> far have voiced support for "gating merges on the successful completion of
> all parts of the build except tests". Just to be totally clear, I really
> don't think that was ever in question -- though it certainly doesn't hurt
> to remind everyone.
>
> > So, this thread is not about whether or not to merge with failing
> *builds, *but it's
> whether it should be acceptable to merge with failing *tests.*
>
>
> I think there's a misunderstanding here. I was suggesting
> programmatic gating, not manual. If we could automatically prevent these
> types of changes from being merged, instead of relying on committers to check
> and interpret Jenkins results, that'd be a quick win IMO. And, because of the
> already-discussed issues with flaky tests, it seemed unrealistic to block
> merges on failing tests--the gate would only cover the other parts of the
> build.
>
> However, I think the retry logic brought up by David could be sufficient to
> skip that kind of intermediate step and allow us to just start
> programmatically disabling PR merges if the build (including tests) fails.
> But if anyone's interested, we can still prevent failing tests from failing
> the build with the ignoreFailures property [1].
>
> [1] -
>
> https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
>
> Cheers,
>
> Chris
>
> On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:
>
> > I think it breaks the Jenkins output otherwise. Feel free to test it via
> a
> > PR.
> >
> > Ismael
> >
> > On Wed, Nov 22, 2023, 12:42 AM David Jacot 
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > No, I was not aware of KAFKA-12216. My understanding is that we could
> > still
> > > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > > gradle enterprise for this. Or do you think that reporting the flaky
> > tests
> > > in the build results is required?
> > >
> > > David
> > >
> > > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:
> > >
> > > > Hi David,
> > > >
> > > > Did you take a look at
> > https://issues.apache.org/jira/browse/KAFKA-12216
> > > ?
> > > > I
> > > > looked into this option already (yes, there isn't much that we
> haven't
> > > > considered in this space).
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for the good discussion and all the comments. Overall, it
> > seems
> > > > that
> > > > > we all agree on the bad state of our CI. That's a good first step!
> > > > >
> > > > > I have talked to a few folks this week about it and it seems that
> > many
> > > > > folks (including me) are not comfortable with merging PRs at the
> > moment
> > > > > because the results of our builds are so bad. I had 40+ failed
> tests
> > in
> > > > one
> > > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > > productive
> > > > > with this.
> > > > >
> > > > > Personally, I really want to move towards requiring a green build
> to
> > > > merge
> > > > > to trunk because this is a clear and binary signal. I agree that we
> > > need
> > > > to
> > > > > stabilize the builds before we could even require this so here is
> my
> > > > > proposal.
> > > > >
> > > > > 1) We could leverage the `reports.junitXml.mergeReruns` option in
> > > gradle.
> > > > > From the doc [1]:
> > > > >
> > > > > > When mergeReruns is enabled, if a test fails but is then retried and
> > > > > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > > > > <failure>, within one <testcase>. This is effectively the reporting
> > > > > > produced by the surefire plugin of Apache Maven™ when enabling reruns.
> > > > > > If your CI server understands this format, it will indicate that the
> > > > > > test was flaky. If it does not, it will indicate that the test
> > > > > > succeeded as it will ignore the <flakyFailure> information. If the test
> > > > > > does not succeed (i.e. it fails for every retry), it will be indicated
> > > > > > as having failed whether your tool understands this format or not.
> > > > > > When mergeReruns is 

Re: [VOTE] 3.6.1 RC0

2023-11-27 Thread Kamal Chandraprakash
+1 (non-binding)

1. Built the source from 3.6.1-rc0 tag in scala 2.12 and 2.13
2. Ran all the unit and integration tests.
3. Ran quickstart and verified the produce-consume on a 3 node cluster.
4. Verified the tiered storage functionality with local-tiered storage.

On Tue, Nov 28, 2023 at 12:55 AM Federico Valeri 
wrote:

> Hi Mickael,
>
> - Build from source (Java 17, Scala 2.13)
> - Run unit and integration tests
> - Run custom client apps using staging artifacts
>
> +1 (non binding)
>
> Thanks
> Fede
>
>
>
> On Sun, Nov 26, 2023 at 11:34 AM Jakub Scholz  wrote:
> >
> > +1 non-binding. I used the staged Scala 2.13 artifacts and the staged
> Maven
> > repo for my tests. All seems to work fine.
> >
> > Thanks
> > Jakub
> >
> > On Fri, Nov 24, 2023 at 4:37 PM Mickael Maison 
> wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 3.6.1.
> > >
> > > This is a bugfix release with several fixes, including dependency
> > > version bumps for CVEs.
> > >
> > > Release notes for the 3.6.1 release:
> > > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Friday, December 1
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 3.6 branch) is the 3.6.1 tag:
> > > https://github.com/apache/kafka/releases/tag/3.6.1-rc0
> > >
> > > PR for updating docs:
> > > https://github.com/apache/kafka-site/pull/568
> > >
> > > * Documentation:
> > > https://kafka.apache.org/36/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/36/protocol.html
> > >
> > > * Successful Jenkins builds for the 3.6 branch:
> > > Unit/integration tests: We still have a lot of flaky tests in the 3.6
> > > branch. Looking at the last few 3.6 builds in
> > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/ it seems all
> > > tests passed at least once apart from
> > > ClusterConnectionStatesTest.testSingleIP(). There's
> > > https://issues.apache.org/jira/browse/KAFKA-15762 to fix that test.
> > > System tests: Still running I'll post an update once they complete.
> > >
> > > Thanks,
> > > Mickael
> > >
>


[jira] [Created] (KAFKA-15911) KRaft quorum leader should make sure the follower fetch is making progress

2023-11-27 Thread Luke Chen (Jira)
Luke Chen created KAFKA-15911:
-

 Summary: KRaft quorum leader should make sure the follower fetch 
is making progress
 Key: KAFKA-15911
 URL: https://issues.apache.org/jira/browse/KAFKA-15911
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Luke Chen


Just because the leader returned a successful response to FETCH and 
FETCH_SNAPSHOT doesn't mean that the followers were able to handle the response 
correctly.

For example, imagine the case where the log end offset (LEO) is at 1000 and all 
of the followers are continuously fetching at offset 0 without ever increasing 
their fetch offset. This can happen if the followers encounter an error when 
processing the FETCH or FETCH_SNAPSHOT response.

In this scenario the leader will never be able to increase the HWM. I think 
that this scenario is specific to KRaft and doesn't exist in classic Raft, 
because KRaft is pull-based whereas Raft is push-based.


https://github.com/apache/kafka/pull/14428#pullrequestreview-1751408695
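
A rough sketch of the kind of leader-side progress check this could be (all 
names below are hypothetical, not the existing KRaft leader state API):

{code:java}
// Hypothetical sketch: track, per follower, the highest fetch offset seen and
// the last time it advanced, so the leader can tell whether a follower's fetch
// is actually making progress.
import java.util.HashMap;
import java.util.Map;

class FollowerFetchProgress {
    private static class Progress {
        long lastFetchOffset = -1L;
        long lastProgressTimeMs;
    }

    private final Map<Integer, Progress> progressByVoter = new HashMap<>();

    void onFetch(int voterId, long fetchOffset, long nowMs) {
        Progress p = progressByVoter.computeIfAbsent(voterId, id -> new Progress());
        if (fetchOffset > p.lastFetchOffset) {
            p.lastFetchOffset = fetchOffset;
            p.lastProgressTimeMs = nowMs;
        }
    }

    boolean isMakingProgress(int voterId, long nowMs, long maxStalledMs) {
        Progress p = progressByVoter.get(voterId);
        return p != null && nowMs - p.lastProgressTimeMs <= maxStalledMs;
    }
}
{code}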



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2420

2023-11-27 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15910) New group coordinator needs to generate snapshots while loading

2023-11-27 Thread Jeff Kim (Jira)
Jeff Kim created KAFKA-15910:


 Summary: New group coordinator needs to generate snapshots while 
loading
 Key: KAFKA-15910
 URL: https://issues.apache.org/jira/browse/KAFKA-15910
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jeff Kim
Assignee: Jeff Kim


After the new coordinator loads a __consumer_offsets partition, it logs the 
following exception when making a read operation (fetch/list groups, etc):

 
java.lang.RuntimeException: No in-memory snapshot for epoch 740745. Snapshot epochs are:
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:178)
    at org.apache.kafka.timeline.SnapshottableHashTable.snapshottableIterator(SnapshottableHashTable.java:407)
    at org.apache.kafka.timeline.TimelineHashMap$ValueIterator.<init>(TimelineHashMap.java:283)
    at org.apache.kafka.timeline.TimelineHashMap$Values.iterator(TimelineHashMap.java:271)
    ...
 
This happens because we don't have a snapshot at the last updated high 
watermark after loading. We cannot simply generate a snapshot at the high 
watermark once all batches are loaded, because the partition may contain 
records that have not yet been committed. We also don't know how far the high 
watermark will advance, so we need to generate a snapshot for each offset the 
loader observes that is greater than the current high watermark. Then, once we 
add the high watermark listener and the high watermark is updated, we can 
delete all of the earlier snapshots.
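
A minimal sketch of that idea (the SnapshotRegistry method names used below are 
assumptions for illustration, not necessarily the final API):

{code:java}
// Illustrative sketch only: while the loader replays batches, snapshot every
// offset past the last known high watermark; once the high watermark listener
// fires, prune the snapshots that are no longer needed.
import org.apache.kafka.timeline.SnapshotRegistry;

class LoaderSnapshotter {
    private final SnapshotRegistry snapshotRegistry;
    private long lastKnownHighWatermark;

    LoaderSnapshotter(SnapshotRegistry snapshotRegistry, long initialHighWatermark) {
        this.snapshotRegistry = snapshotRegistry;
        this.lastKnownHighWatermark = initialHighWatermark;
    }

    void onBatchLoaded(long batchEndOffset) {
        if (batchEndOffset > lastKnownHighWatermark) {
            // assumed method name: create a snapshot keyed by this offset
            snapshotRegistry.getOrCreateSnapshot(batchEndOffset);
        }
    }

    void onHighWatermarkUpdated(long newHighWatermark) {
        // assumed method name: drop snapshots below the committed high watermark
        snapshotRegistry.deleteSnapshotsUpTo(newHighWatermark);
        lastKnownHighWatermark = newHighWatermark;
    }
}
{code}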



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2419

2023-11-27 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-27 Thread Vahid Hashemian
Hi Qichao,

Thanks for proposing this KIP. It'd be super valuable to have the ability
to have those partition level metrics for Kafka topics.

Sorry I'm late to the discussion. I just wanted to bring up a point
for clarification and one question:

Let's assume that a production cluster cannot afford to enable high
verbosity on a permanent basis (at least not for all topics) due to
performance concerns.

Since this new config can be set dynamically, in case of an issue or
investigation that warrants obtaining partition level metrics, one can
simply enable high verbosity for select topic(s), temporarily collect
metrics at partition level, and then change the config back to the previous
setting. Since the config values are not set incrementally, the operator
would need to run a `describe` to get the existing config first, and then
amend it to enable high verbosity for the topic(s) of interest. Finally,
when the investigation concludes, the config has to be reverted to its
permanent setting.

If the above execution path makes sense, in case the operator forgets to
take an inventory of the existing (permanent) config and simply overwrites
it, then that permanent config will be gone and not retrievable. Is this
correct?

We don't usually need to change broker configs temporarily, but I see this
config as one that would be changed temporarily. So keeping track of what the
value was before the change is rather important.

Aside from this point, my question is: What's the impact of the `medium`
setting for `level`? I couldn't find it described in the KIP.

Thanks!
--Vahid



On Mon, Nov 13, 2023 at 5:34 AM Divij Vaidya 
wrote:

> Thank you for updating the KIP Qichao.
>
> I don't have any more questions or suggestions. Looks good to move forward
> from my perspective.
>
>
>
> --
> Divij Vaidya
>
>
>
> On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu 
> wrote:
>
> > Thank you again for the nice suggestions, Jorge!
> > I will wait for Divij's response and move it to the vote stage once the
> > generic filter part reaches consensus.
> >
> > Qichao Chu
> > Software Engineer | Data - Kafka
> > [image: Uber] 
> >
> >
> > On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi Qichao,
> > >
> > > Thanks for updating the KIP, all updates look good to me.
> > >
> > > Looking forward to see this KIP moving forward!
> > >
> > > Cheers,
> > > Jorge.
> > >
> > >
> > >
> > > On Wed, 8 Nov 2023 at 08:55, Qichao Chu 
> wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Thank you for the feedback. I updated the KIP to make it a little bit
> > > more
> > > > generic: filters will stay in an array instead of different top-level
> > > > objects. In this way, if we need more filter types (e.g. language filters)
> > > > in the future, they can be added without changing the structure. The
> > > > logical relationship between the filters is also added.
> > > >
> > > > Hi Jorge,
> > > >
> > > > Thank you for the review and great comments. Here is the reply for
> each
> > > of
> > > > the suggestions:
> > > >
> > > > 1) The words describing the property are now updated to include more
> > > > details of the keys in the JSON. It also explicitly mentions the JSON
> > > > nature of the config now.
> > > > 2) The JSON entries should be non-conflicting so the order is not
> > relevant.
> > > If
> > > > there's conflict, the conflict resolution rules are stated in the
> KIP.
> > To
> > > > make it more clear, ordering and duplication rules are updated in the
> > > > Restrictions section of the *level* property.
> > > > 3) Yeah we did take a look at the RecordingLevel config and it does
> not
> > > > work for this case. The RecordingLevel config does not offer the
> > > capability
> > > > of filtering and it has a drawback of needing to be added to all the
> > > future
> > > > sensors. To reduce the duplication, I propose we merge the
> > RecordingLevel
> > > > to this more generic config in the future. Please take a look into
> the
> > > > *Using
> > > > the Existing RecordingLevel Config* section under *Rejected
> > Alternatives*
> > > > for more details.
> > > > 4) This suggestion makes a lot of sense. My idea is to create a
> > > > table/form/doc in the documentation for the verbosity levels of all
> > > metric
> > > > series. If it's too verbose to be in the docs, I will update the KIP
> to
> > > > include this info. I will create a JIRA for this effort once the KIP
> is
> > > > approved.
> > > > 5) Sure we can expand to all other series, added to the KIP.
> > > > 6) Added a new section (*Working with the Configuration via CLI*) with
> > the
> > > > user experience details
> > > > 7) Links are updated.
> > > >
> > > > Please take another look and let me know if you have any more
> concerns.
> > > >
> > > > Best,
> > > > Qichao Chu
> > > > Software Engineer | Data - Kafka
> > > > [image: Uber] 
> > > >
> > > >
> > > > On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Hi Qichao,
> > > > >

Re: [DISCUSS] KIP-996: Pre-Vote

2023-11-27 Thread Alyssa Huang
Thanks for the feedback Jason!

1. I might have missed it in the KIP, but could you clarify what happens
> when a pre-vote fails (i.e. a majority of voters reject the potential
> candidacy)? The transition descriptions only mention what happens if the
> prospective leader learns of a higher epoch.


I've updated the transition description for "Prospective" to cover this.
The behavior is also covered under "Proposed Changes" where I mention "When
a server receives VoteResponses, it will follow it up with another
VoteRequest with PreVote set to either true (send another Pre-Vote) or
false (send a standard vote)" - basically the server will send another
Pre-Vote request (after some backoff).

2. Do you think the pretend epoch bump is necessary? Would it be simpler to
> change the prevote acceptance check to assert a greater than or equal
> epoch?
>

My thought process was that sending the "desired" next epoch would mean we
could borrow much of the `handleVoteRequest` logic for accepting/rejecting
Pre-Votes. The meaning of the `CandidateEpoch` field (I need to rename that
field, for now I'll call it "ProposedEpoch") would also remain pretty much
the same for both the Pre-Vote and standard vote case - "The bumped epoch
of the prospective/candidate sending the request". I do see how it could be
confusing that the Pre-Vote request includes a bumped epoch when in
actuality there is no epoch bump. I've changed the VoteRequest json a bit
in the KIP, let me know what you think.

On Mon, Nov 27, 2023 at 1:40 PM Jason Gustafson 
wrote:

> Hey Alyssa,
>
> Thanks for the KIP! I have a couple questions:
>
> 1. I might have missed it in the KIP, but could you clarify what happens
> when a pre-vote fails (i.e. a majority of voters reject the potential
> candidacy)? The transition descriptions only mention what happens if the
> prospective leader learns of a higher epoch.
> 2. Do you think the pretend epoch bump is necessary? Would it be simpler to
> change the prevote acceptance check to assert a greater than or equal
> epoch?
>
> Best,
> Jason
>
>
> On Wed, Nov 22, 2023 at 11:51 AM Alyssa Huang  >
> wrote:
>
> > Hey folks,
> >
> > Starting a discussion thread for Pre-Vote design. Appreciate your
> comments
> > in advance!
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
> >
> > Best,
> > Alyssa
> >
>


[jira] [Created] (KAFKA-15909) Remove support for empty "group.id"

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15909:
-

 Summary: Remove support for empty "group.id"
 Key: KAFKA-15909
 URL: https://issues.apache.org/jira/browse/KAFKA-15909
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True


Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 
2.0.0. 

In 3.7, there are two implementations, each with different behavior:

* The {{LegacyKafkaConsumer}} implementation will continue to work but will log 
a warning about its removal
* The {{AsyncKafkaConsumer}} implementation will throw an error.

In 4.0, the `poll` method that takes a single `long` timeout will be removed 
altogether.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15908) Remove deprecated Consumer API poll(long timeout)

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15908:
-

 Summary: Remove deprecated Consumer API poll(long timeout)
 Key: KAFKA-15908
 URL: https://issues.apache.org/jira/browse/KAFKA-15908
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kirk True
Assignee: Kirk True


Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 
2.0.0. 

In 3.7, there are two implementations, each with different behavior:

* The {{LegacyKafkaConsumer}} implementation will continue to work but will log 
a warning about its removal
* The {{AsyncKafkaConsumer}} implementation will throw an error.

In 4.0, the `poll` method that takes a single `long` timeout will be removed 
altogether.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15907) Remove previously deprecated Consumer features from 4.0

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15907:
-

 Summary: Remove previously deprecated Consumer features from 4.0
 Key: KAFKA-15907
 URL: https://issues.apache.org/jira/browse/KAFKA-15907
 Project: Kafka
  Issue Type: Task
Reporter: Kirk True
Assignee: Kirk True


This Jira serves as the main collection of APIs, logic, etc. that were 
previously marked as "deprecated" by other KIPs. With 4.0, we will be updating 
the code to remove the deprecated features.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15860) ControllerRegistration must be written out to the metadata image

2023-11-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15860.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> ControllerRegistration must be written out to the metadata image
> 
>
> Key: KAFKA-15860
> URL: https://issues.apache.org/jira/browse/KAFKA-15860
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-996: Pre-Vote

2023-11-27 Thread Jason Gustafson
Hey Alyssa,

Thanks for the KIP! I have a couple questions:

1. I might have missed it in the KIP, but could you clarify what happens
when a pre-vote fails (i.e. a majority of voters reject the potential
candidacy)? The transition descriptions only mention what happens if the
prospective leader learns of a higher epoch.
2. Do you think the pretend epoch bump is necessary? Would it be simpler to
change the prevote acceptance check to assert a greater than or equal epoch?

Best,
Jason


On Wed, Nov 22, 2023 at 11:51 AM Alyssa Huang 
wrote:

> Hey folks,
>
> Starting a discussion thread for Pre-Vote design. Appreciate your comments
> in advance!
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
>
> Best,
> Alyssa
>


Re: [VOTE] 3.6.1 RC0

2023-11-27 Thread Federico Valeri
Hi Mickael,

- Build from source (Java 17, Scala 2.13)
- Run unit and integration tests
- Run custom client apps using staging artifacts

+1 (non binding)

Thanks
Fede



On Sun, Nov 26, 2023 at 11:34 AM Jakub Scholz  wrote:
>
> +1 non-binding. I used the staged Scala 2.13 artifacts and the staged Maven
> repo for my tests. All seems to work fine.
>
> Thanks
> Jakub
>
> On Fri, Nov 24, 2023 at 4:37 PM Mickael Maison  wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.6.1.
> >
> > This is a bugfix release with several fixes, including dependency
> > version bumps for CVEs.
> >
> > Release notes for the 3.6.1 release:
> > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Friday, December 1
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~mimaison/kafka-3.6.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 3.6 branch) is the 3.6.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.6.1-rc0
> >
> > PR for updating docs:
> > https://github.com/apache/kafka-site/pull/568
> >
> > * Documentation:
> > https://kafka.apache.org/36/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/36/protocol.html
> >
> > * Successful Jenkins builds for the 3.6 branch:
> > Unit/integration tests: We still have a lot of flaky tests in the 3.6
> > branch. Looking at the last few 3.6 builds in
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/ it seems all
> > tests passed at least once apart from
> > ClusterConnectionStatesTest.testSingleIP(). There's
> > https://issues.apache.org/jira/browse/KAFKA-15762 to fix that test.
> > System tests: Still running I'll post an update once they complete.
> >
> > Thanks,
> > Mickael
> >


[jira] [Created] (KAFKA-15906) Emit offset syncs more often than offset.lag.max for low-throughput/finite partitions

2023-11-27 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15906:
---

 Summary: Emit offset syncs more often than offset.lag.max for 
low-throughput/finite partitions
 Key: KAFKA-15906
 URL: https://issues.apache.org/jira/browse/KAFKA-15906
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Reporter: Greg Harris


Right now, the offset.lag.max configuration limits how often offset syncs are 
emitted by the MirrorSourceTask, along with a fair rate-limiting semaphore. 
After 100 records have been emitted for a partition, _and_ the semaphore is 
available, an offset sync can be emitted.

For low-volume topics, the `offset.lag.max` default of 100 is much more 
restrictive than the rate-limiting semaphore. For example, a topic which 
mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
sync. If the topic is actually finite, the last offset sync will never arrive, 
and the translation will have a persistent lag.

Instead, we can periodically flush the offset syncs for partitions that are 
under the offset.lag.max limit, but have not received an offset sync recently. 
This could be a new configuration, be a hard-coded time, or be based on the 
existing emit.checkpoints.interval.seconds and 
sync.group.offsets.interval.seconds configurations.
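
A rough sketch of the proposed emission condition (names below are hypothetical, 
not MirrorSourceTask's actual fields or methods):

{code:java}
// Hypothetical sketch: emit a sync when the offset.lag.max threshold is
// crossed, or when a partition has pending updates but has not produced a sync
// within some maximum interval, so low-throughput or finite partitions still
// get a final, up-to-date sync.
import java.util.concurrent.Semaphore;

class OffsetSyncPolicy {
    private final long offsetLagMax;          // e.g. 100 (offset.lag.max)
    private final long maxSyncIntervalMs;     // new time-based threshold
    private final Semaphore outstandingOffsetSyncs;

    OffsetSyncPolicy(long offsetLagMax, long maxSyncIntervalMs, Semaphore semaphore) {
        this.offsetLagMax = offsetLagMax;
        this.maxSyncIntervalMs = maxSyncIntervalMs;
        this.outstandingOffsetSyncs = semaphore;
    }

    boolean shouldEmit(long recordsSinceLastSync, long lastSyncTimeMs, long nowMs) {
        boolean lagExceeded = recordsSinceLastSync >= offsetLagMax;
        boolean stale = recordsSinceLastSync > 0
            && nowMs - lastSyncTimeMs >= maxSyncIntervalMs;
        return (lagExceeded || stale) && outstandingOffsetSyncs.tryAcquire();
    }
}
{code}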



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-853: KRaft Voters Change

2023-11-27 Thread José Armando García Sancio
On Mon, Nov 27, 2023 at 2:32 AM Josep Prat  wrote:
> I wanted to revive this thread and see if there is anything preventing it from 
> being voted on. Happy to help unblock anything that might be holding this 
> back.

Hi Josep,

Thanks for reviving the thread. I need to make some changes to the
KIP. My thinking has changed a bit since I wrote this KIP. The core of
the design still holds. I just want to improve the wording and
usability.

I should have an updated KIP this week. I'll restart the discussion
thread at that point.

Thanks,
-- 
-José


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2418

2023-11-27 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2023-11-27 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15905:
---

 Summary: Restarts of MirrorCheckpointTask should not permanently 
interrupt offset translation
 Key: KAFKA-15905
 URL: https://issues.apache.org/jira/browse/KAFKA-15905
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Affects Versions: 3.6.0
Reporter: Greg Harris


Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
of checkpointsPerConsumerGroup, which limits offset translation to records 
mirrored after the latest restart.

For example, if 1000 records are mirrored and the OffsetSyncs are read by 
MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
records are mirrored, translation can happen at the ~1500th record, but no 
longer at the ~500th record.

Context:

Before KAFKA-13659, MM2 made translation decisions based on the 
incompletely-initialized OffsetSyncStore, and the checkpoint could appear to go 
backwards temporarily during restarts. To fix this, we forced the 
OffsetSyncStore to initialize completely before translation could take place, 
ensuring that the latest OffsetSync had been read, and thus providing the most 
accurate translation.

Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
allow for translation of earlier offsets. This came with the caveat that the 
cache's sparseness allowed translations to go backwards permanently. To prevent 
this behavior, a cache of the latest Checkpoints was kept in the 
MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
translation remained restricted to the fully-initialized OffsetSyncStore.

Effectively, the MirrorCheckpointTask only translates based on an OffsetSync 
emitted during its own lifetime, to ensure that no previous 
MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
emitted by previous generations of MirrorCheckpointTask, we can still ensure 
that checkpoints are monotonic, while allowing translation further back in 
history.
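
One possible shape for this, sketched with illustrative names (the real task 
would recover these from the existing checkpoints topic and its Checkpoint 
records):

{code:java}
// Hypothetical sketch: seed the per-group cache from checkpoints recovered off
// the checkpoints topic at startup, keeping only the latest checkpoint per
// group and source partition, so emission stays monotonic across restarts.
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.Checkpoint;

class CheckpointCache {
    private final Map<String, Map<TopicPartition, Checkpoint>> checkpointsPerConsumerGroup =
        new HashMap<>();

    void seed(Collection<Checkpoint> recoveredFromCheckpointsTopic) {
        for (Checkpoint cp : recoveredFromCheckpointsTopic) {
            checkpointsPerConsumerGroup
                .computeIfAbsent(cp.consumerGroupId(), g -> new HashMap<>())
                .merge(cp.topicPartition(), cp, CheckpointCache::laterOf);
        }
    }

    private static Checkpoint laterOf(Checkpoint existing, Checkpoint candidate) {
        // keep whichever checkpoint translates further downstream
        return candidate.downstreamOffset() >= existing.downstreamOffset() ? candidate : existing;
    }
}
{code}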



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15661) KIP-951: Server side and protocol changes

2023-11-27 Thread Crispin Bernier (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Crispin Bernier resolved KAFKA-15661.
-
Resolution: Resolved

> KIP-951: Server side and protocol changes
> -
>
> Key: KAFKA-15661
> URL: https://issues.apache.org/jira/browse/KAFKA-15661
> Project: Kafka
>  Issue Type: Task
>  Components: protocol
>Reporter: Crispin Bernier
>Assignee: Crispin Bernier
>Priority: Major
> Fix For: 3.7.0
>
>
> Server side and protocol changes for implementing KIP-951, passing back the 
> new leader to the client on NOT_LEADER_OR_FOLLOWER errors for fetch and 
> produce requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2417

2023-11-27 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15904) Downgrade tests are failing with directory.id 

2023-11-27 Thread Manikumar (Jira)
Manikumar created KAFKA-15904:
-

 Summary: Downgrade tests are failing with directory.id 
 Key: KAFKA-15904
 URL: https://issues.apache.org/jira/browse/KAFKA-15904
 Project: Kafka
  Issue Type: Bug
Reporter: Manikumar
 Fix For: 3.7.0


{{kafkatest.tests.core.downgrade_test.TestDowngrade}} tests are failing after 
[https://github.com/apache/kafka/pull/14628.] 
We have added {{directory.id}} to metadata.properties. This means 
{{metadata.properties}} will be different for different log directories.
Cluster downgrades will fail with the below error if we have multiple log 
directories. This looks like a blocker, or requires additional downgrade steps 
from AK 3.7. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15903) Add github actions workflow for building and pushing RC Docker Image

2023-11-27 Thread Vedarth Sharma (Jira)
Vedarth Sharma created KAFKA-15903:
--

 Summary: Add github actions workflow for building and pushing RC 
Docker Image
 Key: KAFKA-15903
 URL: https://issues.apache.org/jira/browse/KAFKA-15903
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vedarth Sharma
Assignee: Vedarth Sharma


This GitHub Actions workflow should build and push the multi-arch RC Docker 
image to the Docker registry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-853: KRaft Voters Change

2023-11-27 Thread Josep Prat
Hi community,

I wanted to revive this thread and see if there is anything preventing it from being 
voted on. Happy to help unblock anything that might be holding this back.

Best,
Josep Prat
On 2022/07/27 15:08:24 José Armando García Sancio wrote:
> Hi all,
> 
> Community members Jason Gustafson, Colin P. McCabe and I have been
> having some offline conversations.
> 
> At a high-level KIP-853 solves the problems:
> 1) How can KRaft detect and recover from disk failures on the minority
> of the voters?
> 2) How can KRaft support a changing set of voter nodes?
> 
> I think that problem 2) is a superset of problem 1). The mechanism for
> solving problem 2) can be used to solve problem 1). This is the reason
> that I decided to design them together and proposed this KIP. Problem
> 2) adds the additional requirement that observers (brokers and new
> controllers) must be able to discover the leader. KIP-853 solves this problem by
> returning the endpoint of the leader in all of the KRaft RPCs. There
> are some concerns with this approach.
> 
> To solve problem 1) we don't need to return the leader's endpoint
> since it is expressed in the controller.quorum.voters property. To
> make faster progress on 1) I have decided to create "KIP-856: KRaft
> Disk Failure Recovery" that just addresses this problem. I will be
> starting a discussion thread for KIP-856 soon.
> 
> We can continue the discussion of KIP-853 here. If KIP-856 gets
> approved I will either:
> 3) Modify KIP-853 to just describe the improvement needed on top of KIP-856.
> 4) Create a new KIP and abandon KIP-853. This new KIP will take into
> account all of the discussion from this thread.
> 
> Thanks!
> -- 
> -José
> 


[jira] [Resolved] (KAFKA-14624) State restoration is broken with standby tasks and cache-enabled stores in processor API

2023-11-27 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-14624.

Resolution: Fixed

> State restoration is broken with standby tasks and cache-enabled stores in 
> processor API
> 
>
> Key: KAFKA-14624
> URL: https://issues.apache.org/jira/browse/KAFKA-14624
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Balaji Rao
>Assignee: Lucas Brutschy
>Priority: Major
>
> I found that cache-enabled state stores in PAPI with standby tasks sometimes 
> return stale data when a partition moves from one app instance to another 
> and back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a 
> small project that I used to reproduce the issue.
> I dug around a bit and it seems like it's a bug in standby task state 
> restoration when caching is enabled. If a partition moves from instance 1 to 
> 2 and then back to instance 1,  since the `CachingKeyValueStore` doesn't 
> register a restore callback, it can return potentially stale data for 
> non-dirty keys. 
> I could fix the issue by modifying the `CachingKeyValueStore` to register a 
> restore callback in which the cache restored keys are added to the cache. Is 
> this fix in the right direction?
> {code:java}
> // register the store
> context.register(
>     root,
>     (RecordBatchingStateRestoreCallback) records -> {
>         for (final ConsumerRecord<byte[], byte[]> record : records) {
>             put(Bytes.wrap(record.key()), record.value());
>         }
>     }
> );
> {code}
>  
> I would like to contribute a fix, if I can get some help!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Road to Kafka 4.0

2023-11-27 Thread Colin McCabe
On Fri, Nov 24, 2023, at 03:47, Anton Agestam wrote:
> In your last message you wrote:
>
> > But, on the KRaft side, I still maintain that nothing is missing except
> > JBOD, which we already have a plan for.
>
> But earlier in this thread you mentioned an issue with "torn writes",
> possibly missing tests, as well as the fact that the recommended method of
> replacing controller nodes is undocumented. Would you mind clarifying what
> your stance is on these three issues? Do you think that they are important
> enablers of upgrade paths or not?

Hi Anton,

There shouldn't be anything blocking controller disk replacement now. From 
memory (not looking at the code now), we do log recovery on our single log 
directory every time we start the controller, so it should handle partial 
records there. I do agree that a test would be good, and some documentation. 
I'll probably take a look at that this week if I get some time.

> > Well, the line was drawn in KIP-833. If we redraw it, what is to stop us
> > from redrawing it again and again?
>
> I'm fairly new to the Kafka community so please forgive me if I'm missing
> things that have been said in earlier discussions, but reading up on that
> KIP I see it has language like "Note: this timeline is very rough and
> subject to change." in the section of versions, but it also says "As
> outlined above, we expect to close these gaps soon" with relation to the
> outstanding features. From my perspective this doesn't really look like an
> agreement that dynamic quorum membership changes shall not be a blocker for
> 4.0.

The timeline was rough because we wrote that in 2022, trying to look forward 
multiple releases. The gaps that were discussed have all been closed -- except 
for JBOD, which we are working on this quarter.

The set of features needed for 4.0 is very clearly described in KIP-833. 
There's no uncertainty on that point.

>
> To answer the specific question you pose here, "what is to stop us from
> redrawing it again and again?", wouldn't the suggestion of parallel work
> lanes brought up by Josep address this concern?
>

It's very important not to fragment the community by supporting multiple 
long-running branch lines. At the end of the day, once branch 3's time has 
come, it needs to fade away, just like JDK 6 support or the old Scala producer.

best,
Colin


> BR,
> Anton
>
> Den tors 23 nov. 2023 kl 05:48 skrev Colin McCabe :
>
>> On Tue, Nov 21, 2023, at 19:30, Luke Chen wrote:
>> > Yes, KIP-853 and disk failure support are both very important missing
>> > features. For the disk failure support, I don't think this is a
>> > "good-to-have-feature", it should be a "must-have" IMO. We can't announce
>> > the 4.0 release without a good solution for disk failure in KRaft.
>>
>> Hi Luke,
>>
>> Thanks for the reply.
>>
>> Controller disk failure support is not missing from KRaft. I described how
>> to handle controller disk failures earlier in this thread.
>>
>> I should note here that the broker in ZooKeeper mode also requires manual
>> handling of disk failures. Restarting a broker with the same ID, but an
>> empty disk, breaks the invariants of replication when in ZK mode. Consider:
>>
>> 1. Broker 1 goes down. A ZK state change notification for /brokers fires
>> and goes on the controller queue.
>>
>> 2. Broker 1 comes back up with an empty disk.
>>
>> 3. The controller processes the zk state change notification for /brokers.
>> Since broker 1 is up no action is taken.
>>
>> 4. Now broker 1 is in the ISR for any partitions it was previously, but
>> has no data. If it is or becomes leader for any partitions, irreversible
>> data loss will occur.
>>
>> This problem is more than theoretical. We at Confluent have observed it in
>> production and put in place special workarounds for the ZK clusters we
>> still have.
>>
>> KRaft has never had this problem because brokers are removed from ISRs
>> when a new incarnation of the broker registers.
>>
>> So perhaps ZK mode is not ready for production for Aiven? Since disk
>> failures do in fact require special handling there. (And/or bringing up new
>> nodes with empty disks, which seems to be their main concern.)
>>
>> >
>> > It’s also worth thinking about how Apache Kafka users who depend on JBOD
>> > might look at the risks of not having a 3.8 release. JBOD support on
>> KRaft
>> > is planned to be added in 3.7, and is still in progress so far. So it’s
>> > hard to say it’s a blocker or not. But in practice, even if the feature
>> is
>> > made into 3.7 in time, a lot of new code for this feature is unlikely to
>> be
>> > entirely bug free. We need to maintain the confidence of those users, and
>> > forcing them to migrate through 3.7 where this new code is hardly
>> > battle-tested doesn’t appear to do that.
>> >
>>
>> As Ismael said, if there are JBOD bugs in 3.7, we will do follow-on point
>> releases to address them.
>>
>> > Our goal for 4.0 should be that all the “main” features in KRaft are in
>> > 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2416

2023-11-27 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-15798) Flaky Test NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()

2023-11-27 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-15798.

Resolution: Fixed

Test disabled in https://github.com/apache/kafka/pull/14830 since feature will 
be removed.

> Flaky Test 
> NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
> -
>
> Key: KAFKA-15798
> URL: https://issues.apache.org/jira/browse/KAFKA-15798
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Justine Olshan
>Assignee: Lucas Brutschy
>Priority: Major
>  Labels: flaky-test
>
> I saw a few examples recently. 2 have the same error, but the third is 
> different
> [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14629/22/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology___2/]
> [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_21_and_Scala_2_13___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]
>  
> The failure is like
> {code:java}
> java.lang.AssertionError: Did not receive all 5 records from topic 
> output-stream-1 within 6 ms, currently accumulated data is [] Expected: 
> is a value equal to or greater than <5> but: <0> was less than <5>{code}
> The other failure was
> [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]
> {code:java}
> java.lang.AssertionError: Expected: <[0, 1]> but: was <[0]>{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14014) Flaky test NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets()

2023-11-27 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-14014.

Resolution: Fixed

Test disabled since feature will be removed.

> Flaky test 
> NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets()
> 
>
> Key: KAFKA-14014
> URL: https://issues.apache.org/jira/browse/KAFKA-14014
> Project: Kafka
>  Issue Type: Test
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Matthew de Detrich
>Priority: Critical
>  Labels: flaky-test
>
> {code:java}
> java.lang.AssertionError: 
> Expected: <[KeyValue(B, 1), KeyValue(A, 2), KeyValue(C, 2)]>
>  but: was <[KeyValue(B, 1), KeyValue(A, 2), KeyValue(C, 1)]>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets(NamedTopologyIntegrationTest.java:540)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12310/2/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets/
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12310/2/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_17_and_Scala_2_13___shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13531) Flaky test NamedTopologyIntegrationTest

2023-11-27 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-13531.

Resolution: Fixed

Test disabled since feature will be removed.

> Flaky test NamedTopologyIntegrationTest
> ---
>
> Key: KAFKA-13531
> URL: https://issues.apache.org/jira/browse/KAFKA-13531
> Project: Kafka
>  Issue Type: Test
>  Components: streams, unit tests
>Reporter: Matthias J. Sax
>Assignee: Matthew de Detrich
>Priority: Critical
>  Labels: flaky-test
> Attachments: 
> org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing().test.stdout
>
>
> org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets
> {quote}java.lang.AssertionError: Did not receive all 3 records from topic 
> output-stream-2 within 6 ms, currently accumulated data is [] Expected: 
> is a value equal to or greater than <3> but: <0> was less than <3> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:648)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:368)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:336)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:644)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:617)
>  at 
> org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets(NamedTopologyIntegrationTest.java:439){quote}
> STDERR
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.GroupSubscribedToTopicException: Deleting 
> offsets of a topic is forbidden while the consumer group is actively 
> subscribed to it. at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
>  at 
> org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper.lambda$removeNamedTopology$3(KafkaStreamsNamedTopologyWrapper.java:213)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.lambda$whenComplete$2(KafkaFutureImpl.java:107)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.kafka.common.internals.KafkaCompletableFuture.kafkaComplete(KafkaCompletableFuture.java:39)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.complete(KafkaFutureImpl.java:122)
>  at 
> org.apache.kafka.streams.processor.internals.TopologyMetadata.maybeNotifyTopologyVersionWaiters(TopologyMetadata.java:154)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.checkForTopologyUpdates(StreamThread.java:916)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:598)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:575)
>  Caused by: org.apache.kafka.common.errors.GroupSubscribedToTopicException: 
> Deleting offsets of a topic is forbidden while the consumer group is actively 
> subscribed to it. java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.GroupSubscribedToTopicException: Deleting 
> offsets of a topic is forbidden while the consumer group is actively 
> subscribed to it. at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
>  at 
> org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper.lambda$removeNamedTopology$3(KafkaStreamsNamedTopologyWrapper.java:213)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.lambda$whenComplete$2(KafkaFutureImpl.java:107)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> 

Re: [DISCUSS] KIP-896: Remove old client protocol API versions in Kafka 4.0

2023-11-27 Thread Anton Agestam
Sorry, the underscore was meant to refer to the
https://github.com/apache/kafka/tree/trunk/clients/src/main/resources/common/message
directory in the previous message.
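
To make the question concrete: each schema file in that directory starts with a
header like the one below (numbers made up for illustration), and the question
is whether 4.0 would simply raise the lower bound of "validVersions", e.g. from
"0-15" to "4-15".

    {
      "apiKey": 1,
      "type": "request",
      "name": "FetchRequest",
      "validVersions": "0-15",
      ...
    }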

Den fre 24 nov. 2023 kl 14:30 skrev Anton Agestam :

> Hi Ismael,
>
> This looks like a healthy KIP for Kafka 
>
> As the implementer of a Kafka Protocol library for Python, Aiven-Open/kio
> [1], I'm curious how this change will affect the library.
>
> We generate entities for the full protocol by introspecting the JSON
> schema definitions under _. How will the KIP change those definitions? Will
> the dropped versions be reflected as bumps of the lower limit
> in "validVersions"?
>
> Thanks and BR,
> Anton
>
> [1]: https://github.com/Aiven-Open/kio
>
> On 2023/01/03 16:17:24 Ismael Juma wrote:
> > Hi all,
> >
> > I would like to start a discussion regarding the removal of very old
> client
> > protocol API versions in Apache Kafka 4.0 to improve maintainability &
> > supportability of Kafka. Please take a look at the proposal:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-896%3A+Remove+old+client+protocol+API+versions+in+Kafka+4.0
> >
> > Ismael
> >
>