Re: [VOTE] 3.7.0 RC2

2024-02-05 Thread Stanislav Kozlovski
Thanks Mickael, sounds good.

KAFKA-16195 and KAFKA-16157 were both merged!

I was made aware of one final blocker, this time for streams - KAFKA-16221.
Matthias was prompt with a short hotfix PR:
https://github.com/apache/kafka/pull/15315

After that goes into 3.7, I think I will be free to build the next RC.
Great work!

On Fri, Feb 2, 2024 at 6:43 PM Mickael Maison 
wrote:

> Hi Stanislav,
>
> I merged https://github.com/apache/kafka/pull/15308 in trunk. I let
> you cherry-pick it to 3.7.
>
> I think fixing the absolute show stoppers and calling JBOD support in
> KRaft early access in 3.7.0 is probably the right call. Even without
> the bugs we found, there's still quite a few JBOD follow up work to do
> (KAFKA-16061) + system tests and documentation updates.
>
> Thanks,
> Mickael
>
> On Fri, Feb 2, 2024 at 4:49 PM Stanislav Kozlovski
>  wrote:
> >
> > Thanks for the work everybody. Providing a status update at the end of
> the
> > week:
> >
> > - docs change explaining migration
> >  was merged
> > - the blocker KAFKA-16162 
> was
> > merged
> > - the blocker KAFKA-14616 
> was
> > merged
> > - a small blocker problem with the shadow jar plugin
> > 
> > - the blockers KAFKA-16157 & KAFKA-16195 aren't merged
> > - the good-to-have KAFKA-16082 isn't merged
> >
> > I think we should prioritize merging KAFKA-16195 and *call JBOD EA*.
> I
> > question whether we may find more blocker bugs in the next RC.
> > The release is late by approximately a month so far, so I do want to
> scope
> > down aggressively to meet the time-based goal.
> >
> > Best,
> > Stanislav
> >
> > On Mon, Jan 29, 2024 at 5:46 PM Omnia Ibrahim 
> > wrote:
> >
> > > Hi Stan and Gaurav,
> > > Just to clarify some points mentioned here before
> > >  KAFKA-14616: I raised a year ago so it's not related to JBOD work. It
> is
> > > rather a blocker bug for KRAFT in general. The PR from Colin should fix
> > > > this. Am not sure if it is a blocker for 3.7 per se as it was a major
> bug
> > > since 3.3 and got missed from all other releases.
> > >
> > > Regarding the JBOD's work:
> > > KAFKA-16082:  Is not a blocker for 3.7 instead it's nice fix. The pr
> > > https://github.com/apache/kafka/pull/15136 is quite a small one and
> was
> > > approved by Proven and I but it is waiting for a committer's approval.
> > > KAFKA-16162: This is a blocker for 3.7.  Same it’s a small pr
> > > > https://github.com/apache/kafka/pull/15270 and it is approved by Proven
> and
> > > I and the PR is waiting for committer's approval.
> > > KAFKA-16157: This is a blocker for 3.7. There is one small suggestion
> for
> > > the pr https://github.com/apache/kafka/pull/15263 but I don't think
> any
> > > of the current feedback is blocking the pr from getting approved.
> Assuming
> > > we get a committer's approval on it.
> > > KAFKA-16195:  Same it's a blocker but it has approval from Proven and I
> > > and we are waiting for committer's approval on the pr
> > > https://github.com/apache/kafka/pull/15262.
> > >
> > > If we can’t get a committer approval for KAFKA-16162, KAFKA-16157 and
> > > KAFKA-16195  in time for 3.7 then we can mark JBOD as early release
> > > assuming we merge at least KAFKA-16195.
> > >
> > > Regards,
> > > Omnia
> > >
> > > > On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> > > >
> > > > Apologies, I duplicated KAFKA-16157 twice in my previous message. I
> > > intended to mention KAFKA-16195
> > > > with the PR at https://github.com/apache/kafka/pull/15262 as the
> second
> > > JIRA.
> > > >
> > > > Thanks,
> > > > Gaurav
> > > >
> > > >> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> > > >>
> > > >> Hi Stan,
> > > >>
> > > >> I wanted to share some updates about the bugs you shared earlier.
> > > >>
> > > >> - KAFKA-14616: I've reviewed and tested the PR from Colin and have
> > > observed
> > > >> the fix works as intended.
> > > >> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the
> > > proposed fix. I've
> > > >> therefore raised https://github.com/apache/kafka/pull/15270
> following
> > > a discussion with Luke in JIRA.
> > > >> - KAFKA-16082: I don't think this is marked as a blocker anymore.
> I'm
> > > awaiting
> > > >> feedback/reviews at https://github.com/apache/kafka/pull/15136
> > > >>
> > > >> In addition to the above, there are 2 JIRAs I'd like to bring
> > > everyone's attention to:
> > > >>
> > > >> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a
> > > blocker. I've raised
> > > >> https://github.com/apache/kafka/pull/15263 and am awaiting reviews
> on
> > > it.
> > > >> - KAFKA-16157: I raised this yesterday and have addressed feedback
> from
> > > Luke. This should
> > > >> hopefully get merged soon.
> > > >>
> > > >> Regards,
> > > >> Gaurav
> > > >>
> > > >>
> > > >>> On 24 Jan 2024, at 11:51, ka...@gnarul

Re: [VOTE] 3.7.0 RC2

2024-02-02 Thread Mickael Maison
Hi Stanislav,

I merged https://github.com/apache/kafka/pull/15308 in trunk. I let
you cherry-pick it to 3.7.

I think fixing the absolute show stoppers and calling JBOD support in
KRaft early access in 3.7.0 is probably the right call. Even without
the bugs we found, there's still quite a few JBOD follow up work to do
(KAFKA-16061) + system tests and documentation updates.

Thanks,
Mickael

On Fri, Feb 2, 2024 at 4:49 PM Stanislav Kozlovski
 wrote:
>
> Thanks for the work everybody. Providing a status update at the end of the
> week:
>
> - docs change explaining migration
>  was merged
> - the blocker KAFKA-16162  was
> merged
> - the blocker KAFKA-14616  was
> merged
> - a small blocker problem with the shadow jar plugin
> 
> - the blockers KAFKA-16157 & KAFKA-16195 aren't merged
> - the good-to-have KAFKA-16082 isn't merged
>
> I think we should prioritize merging KAFKA-16195 and *call JBOD EA*. I
> question whether we may find more blocker bugs in the next RC.
> The release is late by approximately a month so far, so I do want to scope
> down aggressively to meet the time-based goal.
>
> Best,
> Stanislav
>
> On Mon, Jan 29, 2024 at 5:46 PM Omnia Ibrahim 
> wrote:
>
> > Hi Stan and Gaurav,
> > Just to clarify some points mentioned here before
> >  KAFKA-14616: I raised a year ago so it's not related to JBOD work. It is
> > rather a blocker bug for KRAFT in general. The PR from Colin should fix
> > > this. Am not sure if it is a blocker for 3.7 per se as it was a major bug
> > since 3.3 and got missed from all other releases.
> >
> > Regarding the JBOD's work:
> > KAFKA-16082:  Is not a blocker for 3.7 instead it's nice fix. The pr
> > https://github.com/apache/kafka/pull/15136 is quite a small one and was
> > approved by Proven and I but it is waiting for a committer's approval.
> > KAFKA-16162: This is a blocker for 3.7.  Same it’s a small pr
> > > https://github.com/apache/kafka/pull/15270 and it is approved by Proven and
> > I and the PR is waiting for committer's approval.
> > KAFKA-16157: This is a blocker for 3.7. There is one small suggestion for
> > the pr https://github.com/apache/kafka/pull/15263 but I don't think any
> > of the current feedback is blocking the pr from getting approved. Assuming
> > we get a committer's approval on it.
> > KAFKA-16195:  Same it's a blocker but it has approval from Proven and I
> > and we are waiting for committer's approval on the pr
> > https://github.com/apache/kafka/pull/15262.
> >
> > If we can’t get a committer approval for KAFKA-16162, KAFKA-16157 and
> > KAFKA-16195  in time for 3.7 then we can mark JBOD as early release
> > assuming we merge at least KAFKA-16195.
> >
> > Regards,
> > Omnia
> >
> > > On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> > >
> > > Apologies, I duplicated KAFKA-16157 twice in my previous message. I
> > intended to mention KAFKA-16195
> > > with the PR at https://github.com/apache/kafka/pull/15262 as the second
> > JIRA.
> > >
> > > Thanks,
> > > Gaurav
> > >
> > >> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> > >>
> > >> Hi Stan,
> > >>
> > >> I wanted to share some updates about the bugs you shared earlier.
> > >>
> > >> - KAFKA-14616: I've reviewed and tested the PR from Colin and have
> > observed
> > >> the fix works as intended.
> > >> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the
> > proposed fix. I've
> > >> therefore raised https://github.com/apache/kafka/pull/15270 following
> > a discussion with Luke in JIRA.
> > >> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm
> > awaiting
> > >> feedback/reviews at https://github.com/apache/kafka/pull/15136
> > >>
> > >> In addition to the above, there are 2 JIRAs I'd like to bring
> > everyone's attention to:
> > >>
> > >> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a
> > blocker. I've raised
> > >> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on
> > it.
> > >> - KAFKA-16157: I raised this yesterday and have addressed feedback from
> > Luke. This should
> > >> hopefully get merged soon.
> > >>
> > >> Regards,
> > >> Gaurav
> > >>
> > >>
> > >>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
> > >>>
> > >>> Hi Stanislav,
> > >>>
> > >>> Thanks for bringing these JIRAs/PRs up.
> > >>>
> > >>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week
> > and I hope to have some feedback
> > >>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and
> > he's away. I'll try to build on his work in the meantime.
> > >>>
> > >>> As for KAFKA-16082, we haven't been able to deduce a data loss
> > scenario. There's a PR open
> > >>> by me for promoting an abandoned future replica with approvals from
> > Omnia and Proven,
> > >>> so I'd appreciate a committer reviewing it.
> > >>>
> > >>

Re: [VOTE] 3.7.0 RC2

2024-02-02 Thread Stanislav Kozlovski
Thanks for the work everybody. Providing a status update at the end of the
week:

- docs change explaining migration
 was merged
- the blocker KAFKA-16162  was
merged
- the blocker KAFKA-14616  was
merged
- a small blocker problem with the shadow jar plugin

- the blockers KAFKA-16157 & KAFKA-16195 aren't merged
- the good-to-have KAFKA-16082 isn't merged

I think we should prioritize merging KAFKA-16195 and *call JBOD EA*. I
question whether we may find more blocker bugs in the next RC.
The release is late by approximately a month so far, so I do want to scope
down aggressively to meet the time-based goal.

Best,
Stanislav

On Mon, Jan 29, 2024 at 5:46 PM Omnia Ibrahim 
wrote:

> Hi Stan and Gaurav,
> Just to clarify some points mentioned here before
>  KAFKA-14616: I raised a year ago so it's not related to JBOD work. It is
> rather a blocker bug for KRAFT in general. The PR from Colin should fix
> this. Am not sure if it is a blocker for 3.7 per se as it was a major bug
> since 3.3 and got missed from all other releases.
>
> Regarding the JBOD's work:
> KAFKA-16082:  Is not a blocker for 3.7 instead it's nice fix. The pr
> https://github.com/apache/kafka/pull/15136 is quite a small one and was
> approved by Proven and I but it is waiting for a committer's approval.
> KAFKA-16162: This is a blocker for 3.7.  Same it’s a small pr
> > https://github.com/apache/kafka/pull/15270 and it is approved by Proven and
> I and the PR is waiting for committer's approval.
> KAFKA-16157: This is a blocker for 3.7. There is one small suggestion for
> the pr https://github.com/apache/kafka/pull/15263 but I don't think any
> of the current feedback is blocking the pr from getting approved. Assuming
> we get a committer's approval on it.
> KAFKA-16195:  Same it's a blocker but it has approval from Proven and I
> and we are waiting for committer's approval on the pr
> https://github.com/apache/kafka/pull/15262.
>
> If we can’t get a committer approval for KAFKA-16162, KAFKA-16157 and
> KAFKA-16195  in time for 3.7 then we can mark JBOD as early release
> assuming we merge at least KAFKA-16195.
>
> Regards,
> Omnia
>
> > On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> >
> > Apologies, I duplicated KAFKA-16157 twice in my previous message. I
> intended to mention KAFKA-16195
> > with the PR at https://github.com/apache/kafka/pull/15262 as the second
> JIRA.
> >
> > Thanks,
> > Gaurav
> >
> >> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> >>
> >> Hi Stan,
> >>
> >> I wanted to share some updates about the bugs you shared earlier.
> >>
> >> - KAFKA-14616: I've reviewed and tested the PR from Colin and have
> observed
> >> the fix works as intended.
> >> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the
> proposed fix. I've
> >> therefore raised https://github.com/apache/kafka/pull/15270 following
> a discussion with Luke in JIRA.
> >> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm
> awaiting
> >> feedback/reviews at https://github.com/apache/kafka/pull/15136
> >>
> >> In addition to the above, there are 2 JIRAs I'd like to bring
> everyone's attention to:
> >>
> >> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a
> blocker. I've raised
> >> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on
> it.
> >> - KAFKA-16157: I raised this yesterday and have addressed feedback from
> Luke. This should
> >> hopefully get merged soon.
> >>
> >> Regards,
> >> Gaurav
> >>
> >>
> >>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
> >>>
> >>> Hi Stanislav,
> >>>
> >>> Thanks for bringing these JIRAs/PRs up.
> >>>
> >>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week
> and I hope to have some feedback
> >>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and
> he's away. I'll try to build on his work in the meantime.
> >>>
> >>> As for KAFKA-16082, we haven't been able to deduce a data loss
> scenario. There's a PR open
> >>> by me for promoting an abandoned future replica with approvals from
> Omnia and Proven,
> >>> so I'd appreciate a committer reviewing it.
> >>>
> >>> Regards,
> >>> Gaurav
> >>>
> >>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
> >>> 
> wrote:
> 
>  Hey all, I figured I'd give an update about what known blockers we
> have
>  right now:
> 
>  - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>  https://github.com/apache/kafka/pull/15193; This need not block RC
>  creation, but we need the docs updated so that people can test
> properly
>  - KAFKA-14616: Topic recreation with offline broker causes permanent
> URPs -
>  https://github.com/apache/kafka/pull/15230 ; I am of the
> understanding that
>  this is blocking JBOD for 3.7
>  - KAFKA-16162: New cre

Re: [VOTE] 3.7.0 RC2

2024-01-29 Thread Omnia Ibrahim
Hi Stan and Gaurav, 
Just to clarify some points mentioned here before 
 KAFKA-14616: I raised a year ago so it's not related to JBOD work. It is 
rather a blocker bug for KRAFT in general. The PR from Colin should fix this. 
Am not sure if it is a blocker for 3.7 per se as it was a major bug since 3.3 
and got missed from all other releases.
 
Regarding the JBOD's work: 
KAFKA-16082:  Is not a blocker for 3.7 instead it's nice fix. The pr 
https://github.com/apache/kafka/pull/15136 is quite a small one and was 
approved by Proven and I but it is waiting for a committer's approval.
KAFKA-16162: This is a blocker for 3.7.  Same it’s a small pr 
https://github.com/apache/kafka/pull/15270 and it is approved by Proven and I and 
the PR is waiting for committer's approval. 
KAFKA-16157: This is a blocker for 3.7. There is one small suggestion for the 
pr https://github.com/apache/kafka/pull/15263 but I don't think any of the 
current feedback is blocking the pr from getting approved. Assuming we get a 
committer's approval on it. 
KAFKA-16195:  Same it's a blocker but it has approval from Proven and I and we 
are waiting for committer's approval on the pr 
https://github.com/apache/kafka/pull/15262. 

If we can’t get a committer approval for KAFKA-16162, KAFKA-16157 and 
KAFKA-16195  in time for 3.7 then we can mark JBOD as early release assuming we 
merge at least KAFKA-16195.

Regards, 
Omnia

> On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> 
> Apologies, I duplicated KAFKA-16157 twice in my previous message. I intended 
> to mention KAFKA-16195
> with the PR at https://github.com/apache/kafka/pull/15262 as the second JIRA.
> 
> Thanks,
> Gaurav
> 
>> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
>> 
>> Hi Stan,
>> 
>> I wanted to share some updates about the bugs you shared earlier.
>> 
>> - KAFKA-14616: I've reviewed and tested the PR from Colin and have observed
>> the fix works as intended.
>> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed 
>> fix. I've
>> therefore raised https://github.com/apache/kafka/pull/15270 following a 
>> discussion with Luke in JIRA.
>> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm 
>> awaiting
>> feedback/reviews at https://github.com/apache/kafka/pull/15136
>> 
>> In addition to the above, there are 2 JIRAs I'd like to bring everyone's 
>> attention to:
>> 
>> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. 
>> I've raised
>> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
>> - KAFKA-16157: I raised this yesterday and have addressed feedback from 
>> Luke. This should
>> hopefully get merged soon.
>> 
>> Regards,
>> Gaurav
>> 
>> 
>>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
>>> 
>>> Hi Stanislav,
>>> 
>>> Thanks for bringing these JIRAs/PRs up.
>>> 
>>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and 
>>> I hope to have some feedback
>>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's 
>>> away. I'll try to build on his work in the meantime.
>>> 
>>> As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
>>> There's a PR open
>>> by me for promoting an abandoned future replica with approvals from Omnia 
>>> and Proven,
>>> so I'd appreciate a committer reviewing it.
>>> 
>>> Regards,
>>> Gaurav
>>> 
>>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>>>  wrote:
 
 Hey all, I figured I'd give an update about what known blockers we have
 right now:
 
 - KAFKA-16101: KRaft migration rollback documentation is incorrect -
 https://github.com/apache/kafka/pull/15193; This need not block RC
 creation, but we need the docs updated so that people can test properly
 - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
 https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
 this is blocking JBOD for 3.7
 - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
 a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
 - although I understand Proven is out of office
 - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
 hearing mixed opinions on whether this is a blocker (
 https://github.com/apache/kafka/pull/15136)
 
 Given that there are 3 JBOD blocker bugs, and I am not confident they will
 all be merged this week - I am on the edge of voting to revert JBOD from
 this release, or mark it early access.
 
 By all accounts, it seems that if we keep with JBOD the release will have
 to spill into February, which is a month extra from the time-based release
 plan we had of start of January.
 
 Can I ask others for an opinion?
 
 Best,
 Stan
 
 On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
 
> Hi all,
> 
> I think I've found another blocker issue: KAFKA-16

Re: [VOTE] 3.7.0 RC2

2024-01-26 Thread kafka
Apologies, I duplicated KAFKA-16157 twice in my previous message. I intended to 
mention KAFKA-16195
with the PR at https://github.com/apache/kafka/pull/15262 as the second JIRA.

Thanks,
Gaurav

> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> 
> Hi Stan,
> 
> I wanted to share some updates about the bugs you shared earlier.
> 
> - KAFKA-14616: I've reviewed and tested the PR from Colin and have observed
> the fix works as intended.
> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed 
> fix. I've
> therefore raised https://github.com/apache/kafka/pull/15270 following a 
> discussion with Luke in JIRA.
> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting
> feedback/reviews at https://github.com/apache/kafka/pull/15136
> 
> In addition to the above, there are 2 JIRAs I'd like to bring everyone's 
> attention to:
> 
> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. 
> I've raised
> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
> - KAFKA-16157: I raised this yesterday and have addressed feedback from Luke. 
> This should
> hopefully get merged soon.
> 
> Regards,
> Gaurav
> 
> 
>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
>> 
>> Hi Stanislav,
>> 
>> Thanks for bringing these JIRAs/PRs up.
>> 
>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
>> hope to have some feedback
>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's 
>> away. I'll try to build on his work in the meantime.
>> 
>> As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
>> There's a PR open
>> by me for promoting an abandoned future replica with approvals from Omnia 
>> and Proven,
>> so I'd appreciate a committer reviewing it.
>> 
>> Regards,
>> Gaurav
>> 
>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>>  wrote:
>>> 
>>> Hey all, I figured I'd give an update about what known blockers we have
>>> right now:
>>> 
>>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>>> https://github.com/apache/kafka/pull/15193; This need not block RC
>>> creation, but we need the docs updated so that people can test properly
>>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
>>> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
>>> this is blocking JBOD for 3.7
>>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>>> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
>>> - although I understand Proven is out of office
>>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>>> hearing mixed opinions on whether this is a blocker (
>>> https://github.com/apache/kafka/pull/15136)
>>> 
>>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>>> all be merged this week - I am on the edge of voting to revert JBOD from
>>> this release, or mark it early access.
>>> 
>>> By all accounts, it seems that if we keep with JBOD the release will have
>>> to spill into February, which is a month extra from the time-based release
>>> plan we had of start of January.
>>> 
>>> Can I ask others for an opinion?
>>> 
>>> Best,
>>> Stan
>>> 
>>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>>> 
 Hi all,
 
 I think I've found another blocker issue: KAFKA-16162
  .
 The impact is after upgrading to 3.7.0, any new created topics/partitions
 will be unavailable.
 I've put my findings in the JIRA.
 
 Thanks.
 Luke
 
 On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
 
> Stan, thanks for driving this all forward! Excellent job.
> 
> About
> 
>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
> 
> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
> now in trunk and 3.7 (and actually also in 3.6...)
> 
> For `StreamsStandbyTask` the failing test exposes a regression bug, so
> it's a blocker. I updated the ticket accordingly. We already have an
> open PR that reverts the code introducing the regression.
> 
> 
> -Matthias
> 
> On 1/17/24 9:44 AM, Proven Provenzano wrote:
>> We have another blocking issue for the RC :
>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
> to
>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
 however
>> can lead to the new topic having partitions that a producer cannot
 write
> to.
>> 
>> --Proven
>> 
>> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
> pprovenz...@confluent.io>
>> wrote:
>> 
>>> 
>>> I have a PR https://github.com/apache/kafka/pull/15197 for
>>> https://issues.ap

Re: [VOTE] 3.7.0 RC2

2024-01-26 Thread kafka
Hi Stan,

I wanted to share some updates about the bugs you shared earlier.

- KAFKA-14616: I've reviewed and tested the PR from Colin and have observed
the fix works as intended.
- KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed fix. 
I've
therefore raised https://github.com/apache/kafka/pull/15270 following a 
discussion with Luke in JIRA.
- KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting
feedback/reviews at https://github.com/apache/kafka/pull/15136

In addition to the above, there are 2 JIRAs I'd like to bring everyone's 
attention to:

- KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. I've 
raised
https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
- KAFKA-16157: I raised this yesterday and have addressed feedback from Luke. 
This should
hopefully get merged soon.

Regards,
Gaurav


> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
> 
> Hi Stanislav,
> 
> Thanks for bringing these JIRAs/PRs up.
> 
> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
> hope to have some feedback
> by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's 
> away. I'll try to build on his work in the meantime.
> 
> As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
> There's a PR open
> by me for promoting an abandoned future replica with approvals from Omnia and 
> Proven,
> so I'd appreciate a committer reviewing it.
> 
> Regards,
> Gaurav
> 
> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>  wrote:
>> 
>> Hey all, I figured I'd give an update about what known blockers we have
>> right now:
>> 
>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>> https://github.com/apache/kafka/pull/15193; This need not block RC
>> creation, but we need the docs updated so that people can test properly
>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
>> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
>> this is blocking JBOD for 3.7
>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
>> - although I understand Proven is out of office
>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>> hearing mixed opinions on whether this is a blocker (
>> https://github.com/apache/kafka/pull/15136)
>> 
>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>> all be merged this week - I am on the edge of voting to revert JBOD from
>> this release, or mark it early access.
>> 
>> By all accounts, it seems that if we keep with JBOD the release will have
>> to spill into February, which is a month extra from the time-based release
>> plan we had of start of January.
>> 
>> Can I ask others for an opinion?
>> 
>> Best,
>> Stan
>> 
>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>> 
>>> Hi all,
>>> 
>>> I think I've found another blocker issue: KAFKA-16162
>>>  .
>>> The impact is after upgrading to 3.7.0, any new created topics/partitions
>>> will be unavailable.
>>> I've put my findings in the JIRA.
>>> 
>>> Thanks.
>>> Luke
>>> 
>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>>> 
 Stan, thanks for driving this all forward! Excellent job.
 
 About
 
> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
 
 For `StreamsUpgradeTest` it was a test setup issue and should be fixed
 now in trunk and 3.7 (and actually also in 3.6...)
 
 For `StreamsStandbyTask` the failing test exposes a regression bug, so
 it's a blocker. I updated the ticket accordingly. We already have an
 open PR that reverts the code introducing the regression.
 
 
 -Matthias
 
 On 1/17/24 9:44 AM, Proven Provenzano wrote:
> We have another blocking issue for the RC :
> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
 to
> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
>>> however
> can lead to the new topic having partitions that a producer cannot
>>> write
 to.
> 
> --Proven
> 
> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
 pprovenz...@confluent.io>
> wrote:
> 
>> 
>> I have a PR https://github.com/apache/kafka/pull/15197 for
>> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
>>> now.
>> --Proven
>> 
>> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
>> 
>>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
>>> blocker bug because it *
>>> *> will generate huge amount of logspam. I guess we didn't find it in
>>> junit
>>> tests *
>>> *> since logspam doesn't fail t

Re: [VOTE] 3.7.0 RC2

2024-01-24 Thread kafka
Hi Stanislav,

Thanks for bringing these JIRAs/PRs up.

I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I 
hope to have some feedback
by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's away. 
I'll try to build on his work in the meantime.

As for KAFKA-16082, we haven't been able to deduce a data loss scenario. 
There's a PR open
by me for promoting an abandoned future replica with approvals from Omnia and 
Proven,
so I'd appreciate a committer reviewing it.

Regards,
Gaurav

On 23 Jan 2024, at 20:17, Stanislav Kozlovski  
wrote:
> 
> Hey all, I figured I'd give an update about what known blockers we have
> right now:
> 
> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
> https://github.com/apache/kafka/pull/15193; This need not block RC
> creation, but we need the docs updated so that people can test properly
> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
> https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
> this is blocking JBOD for 3.7
> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
> - although I understand Proven is out of office
> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
> hearing mixed opinions on whether this is a blocker (
> https://github.com/apache/kafka/pull/15136)
> 
> Given that there are 3 JBOD blocker bugs, and I am not confident they will
> all be merged this week - I am on the edge of voting to revert JBOD from
> this release, or mark it early access.
> 
> By all accounts, it seems that if we keep with JBOD the release will have
> to spill into February, which is a month extra from the time-based release
> plan we had of start of January.
> 
> Can I ask others for an opinion?
> 
> Best,
> Stan
> 
> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
> 
>> Hi all,
>> 
>> I think I've found another blocker issue: KAFKA-16162
>>  .
>> The impact is after upgrading to 3.7.0, any new created topics/partitions
>> will be unavailable.
>> I've put my findings in the JIRA.
>> 
>> Thanks.
>> Luke
>> 
>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>> 
>>> Stan, thanks for driving this all forward! Excellent job.
>>> 
>>> About
>>> 
 StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
 StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>>> 
>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>>> now in trunk and 3.7 (and actually also in 3.6...)
>>> 
>>> For `StreamsStandbyTask` the failing test exposes a regression bug, so
>>> it's a blocker. I updated the ticket accordingly. We already have an
>>> open PR that reverts the code introducing the regression.
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 1/17/24 9:44 AM, Proven Provenzano wrote:
>>>> We have another blocking issue for the RC:
>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar to
>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue however
>>>> can lead to the new topic having partitions that a producer cannot write to.
 
 --Proven
 
 On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
>>> pprovenz...@confluent.io>
 wrote:
 
> 
> I have a PR https://github.com/apache/kafka/pull/15197 for
> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
>> now.
> --Proven
> 
> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
> 
>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
>> blocker bug because it *
>> *> will generate huge amount of logspam. I guess we didn't find it in
>> junit
>> tests *
>> *> since logspam doesn't fail the automated tests. But certainly it's
>>> not
>> suitable *
>> *> for production. Did you file a JIRA yet?*
>> 
>> Hi Colin,
>> 
>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>> 
>> Thanks & Regards
>> Jakub
>> 
>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
>>> wrote:
>> 
>>> Hi Stanislav,
>>> 
>>> Thanks for making the first RC. The fact that it's titled RC2 is
>>> messing
>>> with my mind a bit. I hope this doesn't make people think that we're
>>> farther along than we are, heh.
>>> 
>>> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
 *> Nice catch! It does seem like we should have gated this behind
>> the
 metadata> version as KIP-858 implies. Is the cluster configured
>> with
 multiple log> dirs? What is the impact of the error messages?*
 
 I did not observe any obvious impact. I was able to send and
>> receive
 messages as normally. But to be honest, I have no idea what else
 this might impact, so I did not t

Re: [VOTE] 3.7.0 RC2

2024-01-23 Thread Justine Olshan
Oops, sorry, I got confused between
https://issues.apache.org/jira/browse/KAFKA-16120 (migration issue) and
https://issues.apache.org/jira/browse/KAFKA-14616 (not migration issue).

However, neither seems related to JBOD, based on the JIRA and PRs.

Justine

On Tue, Jan 23, 2024 at 1:51 PM Justine Olshan  wrote:

> Hey Stan,
>
> Just wanted to clarify -- KAFKA-14616 is not particularly related to JBOD
> but to ZK -> KRaft migration.
>
> There were some other related migration bugs like
> https://issues.apache.org/jira/browse/KAFKA-16180.
>
> This may or may not influence decisions, but wanted to paint the full
> picture of the blocker bugs and their causes.
>
> Justine
>
>
> On Tue, Jan 23, 2024 at 12:17 PM Stanislav Kozlovski
>  wrote:
>
>> Hey all, I figured I'd give an update about what known blockers we have
>> right now:
>>
>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>> https://github.com/apache/kafka/pull/15193; This need not block RC
>> creation, but we need the docs updated so that people can test properly
>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs
>> -
>> https://github.com/apache/kafka/pull/15230 ; I am of the understanding
>> that
>> this is blocking JBOD for 3.7
>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>> a strict blocker with an open PR
>> https://github.com/apache/kafka/pull/15232
>> - although I understand Proven is out of office
>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>> hearing mixed opinions on whether this is a blocker (
>> https://github.com/apache/kafka/pull/15136)
>>
>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>> all be merged this week - I am on the edge of voting to revert JBOD from
>> this release, or mark it early access.
>>
>> By all accounts, it seems that if we keep with JBOD the release will have
>> to spill into February, which is a month extra from the time-based release
>> plan we had of start of January.
>>
>> Can I ask others for an opinion?
>>
>> Best,
>> Stan
>>
>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>>
>> > Hi all,
>> >
>> > I think I've found another blocker issue: KAFKA-16162.
>> > The impact is that, after upgrading to 3.7.0, any newly created
>> > topics/partitions will be unavailable.
>> > I've put my findings in the JIRA.
>> >
>> > Thanks.
>> > Luke
>> >
>> > On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax 
>> wrote:
>> >
>> > > Stan, thanks for driving this all forward! Excellent job.
>> > >
>> > > About
>> > >
>> > > > StreamsStandbyTask -
>> https://issues.apache.org/jira/browse/KAFKA-16141
>> > > > StreamsUpgradeTest -
>> https://issues.apache.org/jira/browse/KAFKA-16139
>> > >
>> > > For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>> > > now in trunk and 3.7 (and actually also in 3.6...)
>> > >
>> > > For `StreamsStandbyTask` the failing test exposes a regression bug, so
>> > > it's a blocker. I updated the ticket accordingly. We already have an
>> > > open PR that reverts the code introducing the regression.
>> > >
>> > >
>> > > -Matthias
>> > >
>> > > On 1/17/24 9:44 AM, Proven Provenzano wrote:
>> > > > We have another blocking issue for the RC :
>> > > > https://issues.apache.org/jira/browse/KAFKA-16157. This bug is
>> similar
>> > > to
>> > > > https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
>> > however
>> > > > can lead to the new topic having partitions that a producer cannot
>> > write
>> > > to.
>> > > >
>> > > > --Proven
>> > > >
>> > > > On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
>> > > pprovenz...@confluent.io>
>> > > > wrote:
>> > > >
>> > > >>
>> > > >> I have a PR https://github.com/apache/kafka/pull/15197 for
>> > > >> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
>> > now.
>> > > >> --Proven
>> > > >>
>> > > >> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz 
>> wrote:
>> > > >>
>> > > >>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found
>> is a
>> > > >>> blocker bug because it *
>> > > >>> *> will generate huge amount of logspam. I guess we didn't find
>> it in
>> > > >>> junit
>> > > >>> tests *
>> > > >>> *> since logspam doesn't fail the automated tests. But certainly
>> it's
>> > > not
>> > > >>> suitable *
>> > > >>> *> for production. Did you file a JIRA yet?*
>> > > >>>
>> > > >>> Hi Colin,
>> > > >>>
>> > > >>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>> > > >>>
>> > > >>> Thanks & Regards
>> > > >>> Jakub
>> > > >>>
>> > > >>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
>> > > wrote:
>> > > >>>
>> > >  Hi Stanislav,
>> > > 
>> > >  Thanks for making the first RC. The fact that it's titled RC2 is
>> > > messing
>> > >  with my mind a bit. I hope this doesn't make people think that
>> we're
>> > >  farther along than we are, heh.
>> > > 
>> > >  On Sun, Jan 14, 2024, a

Re: [VOTE] 3.7.0 RC2

2024-01-23 Thread Justine Olshan
Hey Stan,

Just wanted to clarify -- KAFKA-14616 is not particularly related to JBOD
but to ZK -> KRaft migration.

There were some other related migration bugs like
https://issues.apache.org/jira/browse/KAFKA-16180.

This may or may not influence decisions, but wanted to paint the full
picture of the blocker bugs and their causes.

Justine


On Tue, Jan 23, 2024 at 12:17 PM Stanislav Kozlovski
 wrote:

> Hey all, I figured I'd give an update about what known blockers we have
> right now:
>
> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
> https://github.com/apache/kafka/pull/15193; This need not block RC
> creation, but we need the docs updated so that people can test properly
> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
> https://github.com/apache/kafka/pull/15230 ; I am of the understanding
> that
> this is blocking JBOD for 3.7
> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
> a strict blocker with an open PR
> https://github.com/apache/kafka/pull/15232
> - although I understand Proven is out of office
> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
> hearing mixed opinions on whether this is a blocker (
> https://github.com/apache/kafka/pull/15136)
>
> Given that there are 3 JBOD blocker bugs, and I am not confident they will
> all be merged this week - I am on the edge of voting to revert JBOD from
> this release, or mark it early access.
>
> By all accounts, it seems that if we keep with JBOD the release will have
> to spill into February, which is a month extra from the time-based release
> plan we had of start of January.
>
> Can I ask others for an opinion?
>
> Best,
> Stan
>
> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:
>
> > Hi all,
> >
> > I think I've found another blocker issue: KAFKA-16162.
> > The impact is that, after upgrading to 3.7.0, any newly created
> > topics/partitions will be unavailable.
> > I've put my findings in the JIRA.
> >
> > Thanks.
> > Luke
> >
> > On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax 
> wrote:
> >
> > > Stan, thanks for driving this all forward! Excellent job.
> > >
> > > About
> > >
> > > > StreamsStandbyTask -
> https://issues.apache.org/jira/browse/KAFKA-16141
> > > > StreamsUpgradeTest -
> https://issues.apache.org/jira/browse/KAFKA-16139
> > >
> > > For `StreamsUpgradeTest` it was a test setup issue and should be fixed
> > > now in trunk and 3.7 (and actually also in 3.6...)
> > >
> > > For `StreamsStandbyTask` the failing test exposes a regression bug, so
> > > it's a blocker. I updated the ticket accordingly. We already have an
> > > open PR that reverts the code introducing the regression.
> > >
> > >
> > > -Matthias
> > >
> > > On 1/17/24 9:44 AM, Proven Provenzano wrote:
> > > > We have another blocking issue for the RC :
> > > > https://issues.apache.org/jira/browse/KAFKA-16157. This bug is
> similar
> > > to
> > > > https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
> > however
> > > > can lead to the new topic having partitions that a producer cannot
> > write
> > > to.
> > > >
> > > > --Proven
> > > >
> > > > On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
> > > pprovenz...@confluent.io>
> > > > wrote:
> > > >
> > > >>
> > > >> I have a PR https://github.com/apache/kafka/pull/15197 for
> > > >> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
> > now.
> > > >> --Proven
> > > >>
> > > >> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz 
> wrote:
> > > >>
> > > >>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found
> is a
> > > >>> blocker bug because it *
> > > >>> *> will generate huge amount of logspam. I guess we didn't find it
> in
> > > >>> junit
> > > >>> tests *
> > > >>> *> since logspam doesn't fail the automated tests. But certainly
> it's
> > > not
> > > >>> suitable *
> > > >>> *> for production. Did you file a JIRA yet?*
> > > >>>
> > > >>> Hi Colin,
> > > >>>
> > > >>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
> > > >>>
> > > >>> Thanks & Regards
> > > >>> Jakub
> > > >>>
> > > >>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
> > > wrote:
> > > >>>
> > >  Hi Stanislav,
> > > 
> > >  Thanks for making the first RC. The fact that it's titled RC2 is
> > > messing
> > >  with my mind a bit. I hope this doesn't make people think that
> we're
> > >  farther along than we are, heh.
> > > 
> > >  On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > > *> Nice catch! It does seem like we should have gated this behind
> > the
> > > > metadata> version as KIP-858 implies. Is the cluster configured
> > with
> > > > multiple log> dirs? What is the impact of the error messages?*
> > > >
> > > > I did not observe any obvious impact. I was able to send and
> > receive
> > > > messages as normally. But to be honest, I have no idea what else
> > > > this migh

Re: [VOTE] 3.7.0 RC2

2024-01-23 Thread Stanislav Kozlovski
Hey all, I figured I'd give an update about what known blockers we have
right now:

- KAFKA-16101: KRaft migration rollback documentation is incorrect -
https://github.com/apache/kafka/pull/15193; This need not block RC
creation, but we need the docs updated so that people can test properly
- KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
https://github.com/apache/kafka/pull/15230 ; I am of the understanding that
this is blocking JBOD for 3.7
- KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
a strict blocker with an open PR https://github.com/apache/kafka/pull/15232
- although I understand Proven is out of office
- KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
hearing mixed opinions on whether this is a blocker (
https://github.com/apache/kafka/pull/15136)

Given that there are 3 JBOD blocker bugs, and I am not confident they will
all be merged this week, I am on the edge of voting to revert JBOD from
this release or mark it early access.

By all accounts, it seems that if we keep JBOD, the release will have
to spill into February, a month later than the start-of-January date in our
time-based release plan.

Can I ask others for an opinion?

Best,
Stan

On Thu, Jan 18, 2024 at 1:21 PM Luke Chen  wrote:

> Hi all,
>
> I think I've found another blocker issue: KAFKA-16162.
> The impact is that, after upgrading to 3.7.0, any newly created
> topics/partitions will be unavailable.
> I've put my findings in the JIRA.
>
> Thanks.
> Luke
>
> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:
>
> > Stan, thanks for driving this all forward! Excellent job.
> >
> > About
> >
> > > StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
> > > StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
> >
> > For `StreamsUpgradeTest` it was a test setup issue and should be fixed
> > now in trunk and 3.7 (and actually also in 3.6...)
> >
> > For `StreamsStandbyTask` the failing test exposes a regression bug, so
> > it's a blocker. I updated the ticket accordingly. We already have an
> > open PR that reverts the code introducing the regression.
> >
> >
> > -Matthias
> >
> > On 1/17/24 9:44 AM, Proven Provenzano wrote:
> > > We have another blocking issue for the RC :
> > > https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
> > to
> > > https://issues.apache.org/jira/browse/KAFKA-14616. The new issue
> however
> > > can lead to the new topic having partitions that a producer cannot
> write
> > to.
> > >
> > > --Proven
> > >
> > > On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
> > pprovenz...@confluent.io>
> > > wrote:
> > >
> > >>
> > >> I have a PR https://github.com/apache/kafka/pull/15197 for
> > >> https://issues.apache.org/jira/browse/KAFKA-16131 that is building
> now.
> > >> --Proven
> > >>
> > >> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
> > >>
> > >>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
> > >>> blocker bug because it *
> > >>> *> will generate huge amount of logspam. I guess we didn't find it in
> > >>> junit
> > >>> tests *
> > >>> *> since logspam doesn't fail the automated tests. But certainly it's
> > not
> > >>> suitable *
> > >>> *> for production. Did you file a JIRA yet?*
> > >>>
> > >>> Hi Colin,
> > >>>
> > >>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
> > >>>
> > >>> Thanks & Regards
> > >>> Jakub
> > >>>
> > >>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
> > wrote:
> > >>>
> >  Hi Stanislav,
> > 
> >  Thanks for making the first RC. The fact that it's titled RC2 is
> > messing
> >  with my mind a bit. I hope this doesn't make people think that we're
> >  farther along than we are, heh.
> > 
> >  On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > *> Nice catch! It does seem like we should have gated this behind
> the
> > > metadata> version as KIP-858 implies. Is the cluster configured
> with
> > > multiple log> dirs? What is the impact of the error messages?*
> > >
> > > I did not observe any obvious impact. I was able to send and
> receive
> > > messages as normally. But to be honest, I have no idea what else
> > > this might impact, so I did not try anything special.
> > >
> > > I think everyone upgrading an existing KRaft cluster will go
> through
> > >>> this
> > > stage (running Kafka 3.7 with an older metadata version for at
> least
> > a
> > > while). So even if it is just a logged exception without any other
> >  impact I
> > > wonder if it might scare users from upgrading. But I leave it to
> > >>> others
> >  to
> > > decide if this is a blocker or not.
> > >
> > 
> >  Hi Jakub,
> > 
> >  Thanks for trying the RC. I think what you found is a blocker bug
> > >>> because
> >  it will generate huge amount of logspam. I

Re: [VOTE] 3.7.0 RC2

2024-01-18 Thread Luke Chen
Hi all,

I think I've found another blocker issue: KAFKA-16162.
The impact is that, after upgrading to 3.7.0, any newly created
topics/partitions will be unavailable.
I've put my findings in the JIRA.

Thanks.
Luke

On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:

> Stan, thanks for driving this all forward! Excellent job.
>
> About
>
> > StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
> > StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>
> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
> now in trunk and 3.7 (and actually also in 3.6...)
>
> For `StreamsStandbyTask` the failing test exposes a regression bug, so
> it's a blocker. I updated the ticket accordingly. We already have an
> open PR that reverts the code introducing the regression.
>
>
> -Matthias
>
> On 1/17/24 9:44 AM, Proven Provenzano wrote:
> > We have another blocking issue for the RC :
> > https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
> to
> > https://issues.apache.org/jira/browse/KAFKA-14616. The new issue however
> > can lead to the new topic having partitions that a producer cannot write
> to.
> >
> > --Proven
> >
> > On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
> pprovenz...@confluent.io>
> > wrote:
> >
> >>
> >> I have a PR https://github.com/apache/kafka/pull/15197 for
> >> https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
> >> --Proven
> >>
> >> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
> >>
> >>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
> >>> blocker bug because it *
> >>> *> will generate huge amount of logspam. I guess we didn't find it in
> >>> junit
> >>> tests *
> >>> *> since logspam doesn't fail the automated tests. But certainly it's
> not
> >>> suitable *
> >>> *> for production. Did you file a JIRA yet?*
> >>>
> >>> Hi Colin,
> >>>
> >>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
> >>>
> >>> Thanks & Regards
> >>> Jakub
> >>>
> >>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
> wrote:
> >>>
>  Hi Stanislav,
> 
>  Thanks for making the first RC. The fact that it's titled RC2 is
> messing
>  with my mind a bit. I hope this doesn't make people think that we're
>  farther along than we are, heh.
> 
>  On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > *> Nice catch! It does seem like we should have gated this behind the
> > metadata> version as KIP-858 implies. Is the cluster configured with
> > multiple log> dirs? What is the impact of the error messages?*
> >
> > I did not observe any obvious impact. I was able to send and receive
> > messages as normally. But to be honest, I have no idea what else
> > this might impact, so I did not try anything special.
> >
> > I think everyone upgrading an existing KRaft cluster will go through
> >>> this
> > stage (running Kafka 3.7 with an older metadata version for at least
> a
> > while). So even if it is just a logged exception without any other
>  impact I
> > wonder if it might scare users from upgrading. But I leave it to
> >>> others
>  to
> > decide if this is a blocker or not.
> >
> 
>  Hi Jakub,
> 
>  Thanks for trying the RC. I think what you found is a blocker bug
> >>> because
>  it will generate huge amount of logspam. I guess we didn't find it in
> >>> junit
>  tests since logspam doesn't fail the automated tests. But certainly
> it's
>  not suitable for production. Did you file a JIRA yet?
> 
> > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> >  wrote:
> >
> >> Hey Luke,
> >>
> >> This is an interesting problem. Given the fact that the KIP for having a
> >> 3.8 release passed, I think it tips the scale towards not calling this a
> >> blocker and expecting it to be solved in 3.7.1.
> >>
> >> It is unfortunate that it would not seem safe to migrate to KRaft in
>  3.7.0
> >> (given the inability to rollback safely), but if that's true - the
> >>> same
> >> case would apply for 3.6.0. So in any case users would be expected to use a
> >> patch release for this.
> 
>  Hi Luke,
> 
>  Thanks for testing rollback. I think this is a case where the
>  documentation is wrong. The intention was for the steps to basically be:
> 
>  1. roll all the brokers into zk mode, but with migration enabled
>  2. take down the kraft quorum
>  3. rmr /controller, allowing a hybrid broker to take over.
>  4. roll all the brokers into zk mode without migration enabled (if desired)
> 
>  With these steps, there isn't really unavailability since a ZK controller
>  can be elected quickly after the kraft quorum is gone.
> 
> >> Further, since we will have a 3.8 release - 

Re: [VOTE] 3.7.0 RC2

2024-01-17 Thread Matthias J. Sax

Stan, thanks for driving this all forward! Excellent job.

About


StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139


For `StreamsUpgradeTest` it was a test setup issue and should be fixed 
now in trunk and 3.7 (and actually also in 3.6...)


For `StreamsStandbyTask` the failing test exposes a regression bug, so 
it's a blocker. I updated the ticket accordingly. We already have an 
open PR that reverts the code introducing the regression.



-Matthias

On 1/17/24 9:44 AM, Proven Provenzano wrote:

We have another blocking issue for the RC :
https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar to
https://issues.apache.org/jira/browse/KAFKA-14616. The new issue however
can lead to the new topic having partitions that a producer cannot write to.

--Proven

On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano 
wrote:



I have a PR https://github.com/apache/kafka/pull/15197 for
https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
--Proven

On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:


> Hi Jakub,
>
> Thanks for trying the RC. I think what you found is a blocker bug because it
> will generate huge amount of logspam. I guess we didn't find it in junit tests
> since logspam doesn't fail the automated tests. But certainly it's not suitable
> for production. Did you file a JIRA yet?

Hi Colin,

I opened https://issues.apache.org/jira/browse/KAFKA-16131.

Thanks & Regards
Jakub

On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:


Hi Stanislav,

Thanks for making the first RC. The fact that it's titled RC2 is messing
with my mind a bit. I hope this doesn't make people think that we're
farther along than we are, heh.

On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:

> Nice catch! It does seem like we should have gated this behind the metadata
> version as KIP-858 implies. Is the cluster configured with multiple log
> dirs? What is the impact of the error messages?

I did not observe any obvious impact. I was able to send and receive
messages as normal. But to be honest, I have no idea what else
this might impact, so I did not try anything special.

I think everyone upgrading an existing KRaft cluster will go through this
stage (running Kafka 3.7 with an older metadata version for at least a
while). So even if it is just a logged exception without any other impact, I
wonder if it might scare users from upgrading. But I leave it to others to
decide if this is a blocker or not.



Hi Jakub,

Thanks for trying the RC. I think what you found is a blocker bug because
it will generate a huge amount of logspam. I guess we didn't find it in junit
tests since logspam doesn't fail the automated tests. But certainly it's
not suitable for production. Did you file a JIRA yet?


On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
 wrote:


Hey Luke,

This is an interesting problem. Given the fact that the KIP for having a
3.8 release passed, I think it tips the scale towards not calling this a
blocker and expecting it to be solved in 3.7.1.

It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
(given the inability to roll back safely), but if that's true - the same
case would apply for 3.6.0. So in any case users would be expected to use a
patch release for this.


Hi Luke,

Thanks for testing rollback. I think this is a case where the
documentation is wrong. The intention was for the steps to basically be:

1. roll all the brokers into zk mode, but with migration enabled
2. take down the kraft quorum
3. rmr /controller, allowing a hybrid broker to take over.
4. roll all the brokers into zk mode without migration enabled (if desired)

With these steps, there isn't really unavailability since a ZK controller
can be elected quickly after the kraft quorum is gone.


Further, since we will have a 3.8 release - it is
likely we will ultimately recommend users upgrade from that version given
its aim is to have strategic KRaft feature parity with ZK.
That being said, I am not 100% on this. Let me know whether you think this
should block the release, Luke. I am also tagging Colin and David to weigh
in with their opinions, as they worked on the migration logic.


The rollback docs are new in 3.7 so the fact that they're wrong is a clear
blocker, I think. But easy to fix, I believe. I will create a PR.

best,
Colin



Hey Kirk and Chris,

Unless I'm missing something - KAFKA-16029 is simply a bad log due to
improper closing. And the PR description implies this has been present
since 3.5. While annoying, I don't see a strong reason for this to block
the release.

Hey Jakub,

Nice catch! It does seem like we should have gated this behind the metadata
version as KIP-858 implies. Is the cluster configured with multiple log
dirs? What is the impact of the error messages?

Tagging Igor (the author of the KIP) to weigh i

Re: [VOTE] 3.7.0 RC2

2024-01-17 Thread Proven Provenzano
We have another blocking issue for the RC :
https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar to
https://issues.apache.org/jira/browse/KAFKA-14616. The new issue however
can lead to the new topic having partitions that a producer cannot write to.

--Proven

On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano 
wrote:

>
> I have a PR https://github.com/apache/kafka/pull/15197 for
> https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
> --Proven
>
> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
>
>> > Hi Jakub,
>> >
>> > Thanks for trying the RC. I think what you found is a blocker bug because it
>> > will generate huge amount of logspam. I guess we didn't find it in junit tests
>> > since logspam doesn't fail the automated tests. But certainly it's not suitable
>> > for production. Did you file a JIRA yet?
>>
>> Hi Colin,
>>
>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>>
>> Thanks & Regards
>> Jakub
>>
>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:
>>
>> > Hi Stanislav,
>> >
>> > Thanks for making the first RC. The fact that it's titled RC2 is messing
>> > with my mind a bit. I hope this doesn't make people think that we're
>> > farther along than we are, heh.
>> >
>> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
>> > > *> Nice catch! It does seem like we should have gated this behind the
>> > > metadata> version as KIP-858 implies. Is the cluster configured with
>> > > multiple log> dirs? What is the impact of the error messages?*
>> > >
>> > > I did not observe any obvious impact. I was able to send and receive
>> > > messages as normally. But to be honest, I have no idea what else
>> > > this might impact, so I did not try anything special.
>> > >
>> > > I think everyone upgrading an existing KRaft cluster will go through
>> this
>> > > stage (running Kafka 3.7 with an older metadata version for at least a
>> > > while). So even if it is just a logged exception without any other
>> > impact I
>> > > wonder if it might scare users from upgrading. But I leave it to
>> others
>> > to
>> > > decide if this is a blocker or not.
>> > >
>> >
>> > Hi Jakub,
>> >
>> > Thanks for trying the RC. I think what you found is a blocker bug
>> because
>> > it will generate huge amount of logspam. I guess we didn't find it in
>> junit
>> > tests since logspam doesn't fail the automated tests. But certainly it's
>> > not suitable for production. Did you file a JIRA yet?
>> >
>> > > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>> > >  wrote:
>> > >
>> > >> Hey Luke,
>> > >>
>> > >> This is an interesting problem. Given the fact that the KIP for having a
>> > >> 3.8 release passed, I think it tips the scale towards not calling this a
>> > >> blocker and expecting it to be solved in 3.7.1.
>> > >>
>> > >> It is unfortunate that it would not seem safe to migrate to KRaft in
>> > 3.7.0
>> > >> (given the inability to rollback safely), but if that's true - the
>> same
>> > >> case would apply for 3.6.0. So in any case users would be expected to use a
>> > >> patch release for this.
>> >
>> > Hi Luke,
>> >
>> > Thanks for testing rollback. I think this is a case where the
>> > documentation is wrong. The intention was for the steps to basically be:
>> >
>> > 1. roll all the brokers into zk mode, but with migration enabled
>> > 2. take down the kraft quorum
>> > 3. rmr /controller, allowing a hybrid broker to take over.
>> > 4. roll all the brokers into zk mode without migration enabled (if
>> desired)
>> >
>> > With these steps, there isn't really unavailability since a ZK
>> controller
>> > can be elected quickly after the kraft quorum is gone.
>> >
>> > >> Further, since we will have a 3.8 release - it is
>> > >> likely we will ultimately recommend users upgrade from that version
>> > given
>> > >> its aim is to have strategic KRaft feature parity with ZK.
>> > >> That being said, I am not 100% on this. Let me know whether you think
>> > this
>> > >> should block the release, Luke. I am also tagging Colin and David to
>> > weigh
>> > >> in with their opinions, as they worked on the migration logic.
>> >
>> > The rollback docs are new in 3.7 so the fact that they're wrong is a
>> clear
>> > blocker, I think. But easy to fix, I believe. I will create a PR.
>> >
>> > best,
>> > Colin
>> >
>> > >>
>> > >> Hey Kirk and Chris,
>> > >>
>> > >> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
>> > >> improper closing. And the PR description implies this has been
>> present
>> > >> since 3.5. While annoying, I don't see a strong reason for this to
>> block
>> > >> the release.
>> > >>
>> > >> Hey Jakub,
>> > >>
>> > >> Nice catch! It does seem like we should have gated this behind the
>> > metadata
>> > >> version as KIP-858 implies. Is the cluster configured with multiple
>> log
>> > >> dirs? What is the impact of the error messages?
>> > >>
>> > >> Tagging Igor (the author

Re: [VOTE] 3.7.0 RC2

2024-01-16 Thread Proven Provenzano
I have a PR https://github.com/apache/kafka/pull/15197 for
https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
--Proven

On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:

> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug because it
> > will generate huge amount of logspam. I guess we didn't find it in junit tests
> > since logspam doesn't fail the automated tests. But certainly it's not suitable
> > for production. Did you file a JIRA yet?
>
> Hi Colin,
>
> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>
> Thanks & Regards
> Jakub
>
> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:
>
> > Hi Stanislav,
> >
> > Thanks for making the first RC. The fact that it's titled RC2 is messing
> > with my mind a bit. I hope this doesn't make people think that we're
> > farther along than we are, heh.
> >
> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > *> Nice catch! It does seem like we should have gated this behind the
> > > metadata> version as KIP-858 implies. Is the cluster configured with
> > > multiple log> dirs? What is the impact of the error messages?*
> > >
> > > I did not observe any obvious impact. I was able to send and receive
> > > messages as normally. But to be honest, I have no idea what else
> > > this might impact, so I did not try anything special.
> > >
> > > I think everyone upgrading an existing KRaft cluster will go through
> this
> > > stage (running Kafka 3.7 with an older metadata version for at least a
> > > while). So even if it is just a logged exception without any other
> > impact I
> > > wonder if it might scare users from upgrading. But I leave it to others
> > to
> > > decide if this is a blocker or not.
> > >
> >
> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug because
> > it will generate huge amount of logspam. I guess we didn't find it in
> junit
> > tests since logspam doesn't fail the automated tests. But certainly it's
> > not suitable for production. Did you file a JIRA yet?
> >
> > > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> > >  wrote:
> > >
> > >> Hey Luke,
> > >>
> > >> This is an interesting problem. Given the fact that the KIP for
> having a
> > >> 3.8 release passed, I think it weights the scale towards not calling
> > this a
> > >> blocker and expecting it to be solved in 3.7.1.
> > >>
> > >> It is unfortunate that it would not seem safe to migrate to KRaft in
> > 3.7.0
> > >> (given the inability to rollback safely), but if that's true - the
> same
> > >> case would apply for 3.6.0. So in any case users would be expected to
> > use a
> > >> patch release for this.
> >
> > Hi Luke,
> >
> > Thanks for testing rollback. I think this is a case where the
> > documentation is wrong. The intention was for the steps to basically
> be:
> >
> > 1. roll all the brokers into zk mode, but with migration enabled
> > 2. take down the kraft quorum
> > 3. rmr /controller, allowing a hybrid broker to take over.
> > 4. roll all the brokers into zk mode without migration enabled (if
> desired)
> >
> > With these steps, there isn't really unavailability since a ZK controller
> > can be elected quickly after the kraft quorum is gone.
> >
> > >> Further, since we will have a 3.8 release - it is
> > >> likely we will ultimately recommend users upgrade from that version
> > given
> > >> its aim is to have strategic KRaft feature parity with ZK.
> > >> That being said, I am not 100% on this. Let me know whether you think
> > this
> > >> should block the release, Luke. I am also tagging Colin and David to
> > weigh
> > >> in with their opinions, as they worked on the migration logic.
> >
> > The rollback docs are new in 3.7 so the fact that they're wrong is a
> clear
> > blocker, I think. But easy to fix, I believe. I will create a PR.
> >
> > best,
> > Colin
> >
> > >>
> > >> Hey Kirk and Chris,
> > >>
> > >> Unless I'm missing something - KAFKA-16029 is simply a bad log due
> > to
> > >> improper closing. And the PR description implies this has been present
> > >> since 3.5. While annoying, I don't see a strong reason for this to
> block
> > >> the release.
> > >>
> > >> Hey Jakub,
> > >>
> > >> Nice catch! It does seem like we should have gated this behind the
> > metadata
> > >> version as KIP-858 implies. Is the cluster configured with multiple
> log
> > >> dirs? What is the impact of the error messages?
> > >>
> > >> Tagging Igor (the author of the KIP) to weigh in.
> > >>
> > >> Best,
> > >> Stanislav
> > >>
> > >> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I was trying the RC2 and run into the following issue ... when I run
> > >> > 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2
> metadata
> > >> > version, I seem to be getting repeated errors like this in the
> > controller
> > >> > logs:
> > >> >
> > >> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]
>

Re: [VOTE] 3.7.0 RC2

2024-01-16 Thread Stanislav Kozlovski
Hi Kirk,

Given we are going to have to roll a new RC anyway, and the change is so
simple - might as well get it in!

On Mon, Jan 15, 2024 at 8:26 PM Kirk True  wrote:

> Hi Stanislav,
>
> On Sun, Jan 14, 2024, at 1:17 PM, Stanislav Kozlovski wrote:
> > Hey Kirk and Chris,
> >
> > Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> > improper closing. And the PR description implies this has been present
> > since 3.5. While annoying, I don't see a strong reason for this to block
> > the release.
>
> I would imagine that it would result in concerned users reporting the
> issue.
>
> I took another look, and the code that causes the issue was indeed changed
> in 3.7. It is easily reproducible.
>
> The PR is ready for review: https://github.com/apache/kafka/pull/15186
>
> Thanks,
> Kirk



-- 
Best,
Stanislav


Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Kirk True
Hi Stanislav,

On Sun, Jan 14, 2024, at 1:17 PM, Stanislav Kozlovski wrote:
> Hey Kirk and Chris,
> 
> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> improper closing. And the PR description implies this has been present
> since 3.5. While annoying, I don't see a strong reason for this to block
> the release.

I would imagine that it would result in concerned users reporting the issue.

I took another look, and the code that causes the issue was indeed changed in 
3.7. It is easily reproducible.

The PR is ready for review: https://github.com/apache/kafka/pull/15186

Thanks,
Kirk

Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Stanislav Kozlovski
I wanted to circle back and confirm the integration tests + system tests,
plus give an overall update regarding status.

The integration tests have a fair amount of flakes. I ran and inspected 3
consecutive builds (57, 58, 59), then cross-checked each run's failures via
a script of mine to see any consistent failures.
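
The script itself is not attached here, but the cross-check amounts to
intersecting the failed-test lists of the builds. A minimal sketch of that
idea follows; the function name and the test names in the example are
hypothetical and are not taken from the actual build reports:

```python
from collections import Counter


def consistent_failures(runs, min_runs=None):
    """Return the tests that failed in at least `min_runs` of the given runs.

    `runs` is a list of failed-test lists, one per build. By default a test
    must have failed in every run to be reported.
    """
    if min_runs is None:
        min_runs = len(runs)
    # Count each test at most once per run, even if it is listed twice.
    counts = Counter(test for failed in runs for test in set(failed))
    return sorted(test for test, n in counts.items() if n >= min_runs)


if __name__ == "__main__":
    # Hypothetical failure lists for three consecutive builds.
    build_a = ["FooTest.testA", "BarTest.testB"]
    build_b = ["BarTest.testB", "BazTest.testC"]
    build_c = ["BarTest.testB", "FooTest.testA"]
    print(consistent_failures([build_a, build_b, build_c]))
    print(consistent_failures([build_a, build_b, build_c], min_runs=2))
```

Lowering min_runs surfaces tests that fail often but not always, which is
usually what "very flaky" means in practice.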

Three tests proved very flaky. Two are related to KIP-848 running under
KRaft. The third one is a Trogdor test. All 3 tests pass locally, hence I
deem them not blockers for the release. Especially since KIP-848 is in
early access, I am not particularly concerned with a flaky test. I opened
three JIRAs to track them:
- https://issues.apache.org/jira/browse/KAFKA-16134
- https://issues.apache.org/jira/browse/KAFKA-16135
- https://issues.apache.org/jira/browse/KAFKA-16136

As for the system tests, I again ran 2 consecutive builds (1, 2)
and I found 4 tests that exhibit consecutive failures.
- The whole analysis: https://hackmd.io/@hOneAGCrSmKSpL8VF-1HWQ/HyRgRJmta

The failing tests:
StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
QuotaTest - https://issues.apache.org/jira/browse/KAFKA-16138
ZookeeperMigrationTest - https://issues.apache.org/jira/browse/KAFKA-16140

I am reaching out to subject matter experts regarding the failures.

Thanks to everyone who contributed in testing the release. Here is a
general update regarding known blockers that were recently found:

We are treating https://issues.apache.org/jira/browse/KAFKA-16131 and
https://issues.apache.org/jira/browse/KAFKA-16101 as blockers.

https://issues.apache.org/jira/browse/KAFKA-16132 is another potential
issue that will likely be treated as a blocker.

Best,
Stanislav

On Mon, Jan 15, 2024 at 12:04 PM Jakub Scholz  wrote:

> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
> blocker bug because it *
> *> will generate huge amount of logspam. I guess we didn't find it in junit
> tests *
> *> since logspam doesn't fail the automated tests. But certainly it's not
> suitable *
> *> for production. Did you file a JIRA yet?*
>
> Hi Colin,
>
> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>
> Thanks & Regards
> Jakub
>
> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:
>
> > Hi Stanislav,
> >
> > Thanks for making the first RC. The fact that it's titled RC2 is messing
> > with my mind a bit. I hope this doesn't make people think that we're
> > farther along than we are, heh.
> >
> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > *> Nice catch! It does seem like we should have gated this behind the
> > > metadata> version as KIP-858 implies. Is the cluster configured with
> > > multiple log> dirs? What is the impact of the error messages?*
> > >
> > > I did not observe any obvious impact. I was able to send and receive
> > > messages as normally. But to be honest, I have no idea what else
> > > this might impact, so I did not try anything special.
> > >
> > > I think everyone upgrading an existing KRaft cluster will go through
> this
> > > stage (running Kafka 3.7 with an older metadata version for at least a
> > > while). So even if it is just a logged exception without any other
> > impact I
> > > wonder if it might scare users from upgrading. But I leave it to others
> > to
> > > decide if this is a blocker or not.
> > >
> >
> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug because
> > it will generate huge amount of logspam. I guess we didn't find it in
> junit
> > tests since logspam doesn't fail the automated tests. But certainly it's
> > not suitable for production. Did you file a JIRA yet?
> >
> > > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> > >  wrote:
> > >
> > >> Hey Luke,
> > >>
> > >> This is an interesting problem. Given the fact that the KIP for
> having a
> > >> 3.8 release passed, I think it weights the scale towards not calling
> > this a
> > >> blocker and expecting it to be solved in 3.7.1.
> > >>
> > >> It is unfortunate that it would not seem safe to migrate to KRaft in
> > 3.7.0
> > >> (given the inability to rollback safely), but if that's true - the
> same
> > >> case would apply for 3.6.0. So in any case users would be expected to
> > use a
> > >> patch release for this.
> >
> > Hi Luke,
> >
> > Thanks for testing rollback. I think this i

Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Jakub Scholz
*> Hi Jakub,> > Thanks for trying the RC. I think what you found is a
blocker bug because it *
*> will generate huge amount of logspam. I guess we didn't find it in junit
tests *
*> since logspam doesn't fail the automated tests. But certainly it's not
suitable *
*> for production. Did you file a JIRA yet?*

Hi Colin,

I opened https://issues.apache.org/jira/browse/KAFKA-16131.

Thanks & Regards
Jakub

On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:

> Hi Stanislav,
>
> Thanks for making the first RC. The fact that it's titled RC2 is messing
> with my mind a bit. I hope this doesn't make people think that we're
> farther along than we are, heh.
>
> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > *> Nice catch! It does seem like we should have gated this behind the
> > metadata> version as KIP-858 implies. Is the cluster configured with
> > multiple log> dirs? What is the impact of the error messages?*
> >
> > I did not observe any obvious impact. I was able to send and receive
> > messages as normally. But to be honest, I have no idea what else
> > this might impact, so I did not try anything special.
> >
> > I think everyone upgrading an existing KRaft cluster will go through this
> > stage (running Kafka 3.7 with an older metadata version for at least a
> > while). So even if it is just a logged exception without any other
> impact I
> > wonder if it might scare users from upgrading. But I leave it to others
> to
> > decide if this is a blocker or not.
> >
>
> Hi Jakub,
>
> Thanks for trying the RC. I think what you found is a blocker bug because
> it will generate huge amount of logspam. I guess we didn't find it in junit
> tests since logspam doesn't fail the automated tests. But certainly it's
> not suitable for production. Did you file a JIRA yet?
>
> > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> >  wrote:
> >
> >> Hey Luke,
> >>
> >> This is an interesting problem. Given the fact that the KIP for having a
> >> 3.8 release passed, I think it weights the scale towards not calling
> this a
> >> blocker and expecting it to be solved in 3.7.1.
> >>
> >> It is unfortunate that it would not seem safe to migrate to KRaft in
> 3.7.0
> >> (given the inability to rollback safely), but if that's true - the same
> >> case would apply for 3.6.0. So in any case users would be expected to
> use a
> >> patch release for this.
>
> Hi Luke,
>
> Thanks for testing rollback. I think this is a case where the
> documentation is wrong. The intention was for the steps to basically be:
>
> 1. roll all the brokers into zk mode, but with migration enabled
> 2. take down the kraft quorum
> 3. rmr /controller, allowing a hybrid broker to take over.
> 4. roll all the brokers into zk mode without migration enabled (if desired)
>
> With these steps, there isn't really unavailability since a ZK controller
> can be elected quickly after the kraft quorum is gone.
>
> >> Further, since we will have a 3.8 release - it is
> >> likely we will ultimately recommend users upgrade from that version
> given
> >> its aim is to have strategic KRaft feature parity with ZK.
> >> That being said, I am not 100% on this. Let me know whether you think
> this
> >> should block the release, Luke. I am also tagging Colin and David to
> weigh
> >> in with their opinions, as they worked on the migration logic.
>
> The rollback docs are new in 3.7 so the fact that they're wrong is a clear
> blocker, I think. But easy to fix, I believe. I will create a PR.
>
> best,
> Colin
>
> >>
> >> Hey Kirk and Chris,
> >>
> >> Unless I'm missing something - KAFKA-16029 is simply a bad log due
> to
> >> improper closing. And the PR description implies this has been present
> >> since 3.5. While annoying, I don't see a strong reason for this to block
> >> the release.
> >>
> >> Hey Jakub,
> >>
> >> Nice catch! It does seem like we should have gated this behind the
> metadata
> >> version as KIP-858 implies. Is the cluster configured with multiple log
> >> dirs? What is the impact of the error messages?
> >>
> >> Tagging Igor (the author of the KIP) to weigh in.
> >>
> >> Best,
> >> Stanislav
> >>
> >> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
> >>
> >> > Hi,
> >> >
> >> > I was trying the RC2 and run into the following issue ... when I run
> >> > 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2 metadata
> >> > version, I seem to be getting repeated errors like this in the
> controller
> >> > logs:
> >> >
> >> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]
> >> assignReplicasToDirs:
> >> > event failed with UnsupportedVersionException in 15 microseconds.
> >> > (org.apache.kafka.controller.QuorumController)
> >> > [quorum-controller-0-event-handler]
> >> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected
> error
> >> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> >> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
> >> > AssignReplicasToDirsRequestData(b

Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Luke Chen
Hi Paolo, Colin,

Let's discuss the details of this issue in the PR:
https://github.com/apache/kafka/pull/15193 .

On Mon, Jan 15, 2024 at 4:21 PM Paolo Patierno 
wrote:

> Hi Colin,
> I was the one raising the issue about rollback and I also already tried
> what you mentioned but with no success.
> During the first rolling, I left
> the zookeeper.metadata.migration.enable=true but
> removed controller.quorum.voters and controller.listener.names.
> This is what I get from the brokers on restarting:
>
> [2024-01-15 09:19:14,172] ERROR Exiting Kafka due to fatal exception
> (kafka.Kafka$)
> org.apache.kafka.common.config.ConfigException: If using
> zookeeper.metadata.migration.enable, controller.quorum.voters must contain
> a parseable set of voters.
> at kafka.server.KafkaConfig.validateNonEmptyQuorumVotersForMigration$1(KafkaConfig.scala:2286)
> at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2371)
> at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:2233)
> at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1604)
> at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:1527)
> at kafka.Kafka$.buildServer(Kafka.scala:72)
> at kafka.Kafka$.main(Kafka.scala:91)
> at kafka.Kafka.main(Kafka.scala)
>
> Did you try it?
> Am I missing anything in your procedure?
>
> Thanks,
> Paolo
>
> On Mon, 15 Jan 2024 at 09:13, Colin McCabe  wrote:
>
> > Docs fix discussed in the thread is here:
> > https://github.com/apache/kafka/pull/15193
> >
> > best,
> > Colin
> >
> >
> > On Sun, Jan 14, 2024, at 23:56, Colin McCabe wrote:
> > > Hi Stanislav,
> > >
> > > Thanks for making the first RC. The fact that it's titled RC2 is
> > > messing with my mind a bit. I hope this doesn't make people think that
> > > we're farther along than we are, heh.
> > >
> > > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > >> *> Nice catch! It does seem like we should have gated this behind the
> > >> metadata> version as KIP-858 implies. Is the cluster configured with
> > >> multiple log> dirs? What is the impact of the error messages?*
> > >>
> > >> I did not observe any obvious impact. I was able to send and receive
> > >> messages as normally. But to be honest, I have no idea what else
> > >> this might impact, so I did not try anything special.
> > >>
> > >> I think everyone upgrading an existing KRaft cluster will go through
> > this
> > >> stage (running Kafka 3.7 with an older metadata version for at least a
> > >> while). So even if it is just a logged exception without any other
> > impact I
> > >> wonder if it might scare users from upgrading. But I leave it to
> others
> > to
> > >> decide if this is a blocker or not.
> > >>
> > >
> > > Hi Jakub,
> > >
> > > Thanks for trying the RC. I think what you found is a blocker bug
> > > because it will generate huge amount of logspam. I guess we didn't find
> > > it in junit tests since logspam doesn't fail the automated tests. But
> > > certainly it's not suitable for production. Did you file a JIRA yet?
> > >
> > >> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> > >>  wrote:
> > >>
> > >>> Hey Luke,
> > >>>
> > >>> This is an interesting problem. Given the fact that the KIP for
> having
> > a
> > >>> 3.8 release passed, I think it weights the scale towards not calling
> > this a
> > >>> blocker and expecting it to be solved in 3.7.1.
> > >>>
> > >>> It is unfortunate that it would not seem safe to migrate to KRaft in
> > 3.7.0
> > >>> (given the inability to rollback safely), but if that's true - the
> same
> > >>> case would apply for 3.6.0. So in any case users would be expected
> to
> > use a
> > >>> patch release for this.
> > >
> > > Hi Luke,
> > >
> > > Thanks for testing rollback. I think this is a case where the
> > > documentation is wrong. The intention was for the steps to basically
> > > be:
> > >
> > > 1. roll all the brokers into zk mode, but with migration enabled
> > > 2. take down the kraft quorum
> > > 3. rmr /controller, allowing a hybrid broker to take over.
> > > 4. roll all the brokers into zk mode without migration enabled (if
> > desired)
> > >
> > > With these steps, there isn't really unavailability since a ZK
> > > controller can be elected quickly after the kraft quorum is gone.
> > >
> > >>> Further, since we will have a 3.8 release - it is
> > >>> likely we will ultimately recommend users upgrade from that version
> > given
> > >>> its aim is to have strategic KRaft feature parity with ZK.
> > >>> That being said, I am not 100% on this. Let me know whether you think
> > this
> > >>> should block the release, Luke. I am also tagging Colin and David to
> > weigh
> > >>> in with their opinions, as they worked on the migration logic.
> > >
> > > The rollback docs are new in 3.7 so the fact that they're wrong is a
> > > clear blocker, I think. But easy to fix, I believe. I will create a PR.
> > >
> > > best,
> > > Colin
> > >
> > >>>
> > >>> Hey Kirk and Chris,
> > >>>
> > >>> 

Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Paolo Patierno
Hi Colin,
I was the one raising the issue about rollback and I also already tried
what you mentioned but with no success.
During the first rolling, I left
the zookeeper.metadata.migration.enable=true but
removed controller.quorum.voters and controller.listener.names.
This is what I get from the brokers on restarting:

[2024-01-15 09:19:14,172] ERROR Exiting Kafka due to fatal exception
(kafka.Kafka$)
org.apache.kafka.common.config.ConfigException: If using
zookeeper.metadata.migration.enable, controller.quorum.voters must contain
a parseable set of voters.
at kafka.server.KafkaConfig.validateNonEmptyQuorumVotersForMigration$1(KafkaConfig.scala:2286)
at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2371)
at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:2233)
at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1604)
at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:1527)
at kafka.Kafka$.buildServer(Kafka.scala:72)
at kafka.Kafka$.main(Kafka.scala:91)
at kafka.Kafka.main(Kafka.scala)
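
To make the failure concrete, here is a rough Python sketch of the rule the
exception message describes: when zookeeper.metadata.migration.enable is
true, controller.quorum.voters has to parse to a non-empty set. This only
illustrates the check and is not the actual KafkaConfig code; the function
names and the sample host name are made up, and the id@host:port voter
format is the usual one for that property.

```python
def parse_voters(value):
    """Parse 'id@host:port,id@host:port' into {id: (host, port)}.

    Raises ValueError for an entry that does not match id@host:port.
    """
    voters = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        node_id, at, endpoint = entry.partition("@")
        host, colon, port = endpoint.rpartition(":")
        if not (at and colon and host and node_id.isdigit() and port.isdigit()):
            raise ValueError(f"cannot parse voter entry: {entry!r}")
        voters[int(node_id)] = (host, int(port))
    return voters


def validate_migration_config(props):
    """Reject migration mode without a usable controller quorum (illustrative)."""
    flag = props.get("zookeeper.metadata.migration.enable", "false")
    if flag.lower() != "true":
        return
    if not parse_voters(props.get("controller.quorum.voters", "")):
        raise ValueError(
            "If using zookeeper.metadata.migration.enable, "
            "controller.quorum.voters must contain a parseable set of voters."
        )


if __name__ == "__main__":
    # Mirrors the restart above: migration left enabled, voters removed.
    try:
        validate_migration_config({"zookeeper.metadata.migration.enable": "true"})
    except ValueError as err:
        print("rejected:", err)
```

In other words, with the properties I removed, a broker cannot start while
the migration flag is still set.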

Did you try it?
Am I missing anything in your procedure?

Thanks,
Paolo

On Mon, 15 Jan 2024 at 09:13, Colin McCabe  wrote:

> Docs fix discussed in the thread is here:
> https://github.com/apache/kafka/pull/15193
>
> best,
> Colin
>
>
> On Sun, Jan 14, 2024, at 23:56, Colin McCabe wrote:
> > Hi Stanislav,
> >
> > Thanks for making the first RC. The fact that it's titled RC2 is
> > messing with my mind a bit. I hope this doesn't make people think that
> > we're farther along than we are, heh.
> >
> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> >> *> Nice catch! It does seem like we should have gated this behind the
> >> metadata> version as KIP-858 implies. Is the cluster configured with
> >> multiple log> dirs? What is the impact of the error messages?*
> >>
> >> I did not observe any obvious impact. I was able to send and receive
> >> messages as normally. But to be honest, I have no idea what else
> >> this might impact, so I did not try anything special.
> >>
> >> I think everyone upgrading an existing KRaft cluster will go through
> this
> >> stage (running Kafka 3.7 with an older metadata version for at least a
> >> while). So even if it is just a logged exception without any other
> impact I
> >> wonder if it might scare users from upgrading. But I leave it to others
> to
> >> decide if this is a blocker or not.
> >>
> >
> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug
> > because it will generate huge amount of logspam. I guess we didn't find
> > it in junit tests since logspam doesn't fail the automated tests. But
> > certainly it's not suitable for production. Did you file a JIRA yet?
> >
> >> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> >>  wrote:
> >>
> >>> Hey Luke,
> >>>
> >>> This is an interesting problem. Given the fact that the KIP for having
> a
> >>> 3.8 release passed, I think it weights the scale towards not calling
> this a
> >>> blocker and expecting it to be solved in 3.7.1.
> >>>
> >>> It is unfortunate that it would not seem safe to migrate to KRaft in
> 3.7.0
> >>> (given the inability to rollback safely), but if that's true - the same
> >>> case would apply for 3.6.0. So in any case users would be expected to
> use a
> >>> patch release for this.
> >
> > Hi Luke,
> >
> > Thanks for testing rollback. I think this is a case where the
> > documentation is wrong. The intention was for the steps to basically
> > be:
> >
> > 1. roll all the brokers into zk mode, but with migration enabled
> > 2. take down the kraft quorum
> > 3. rmr /controller, allowing a hybrid broker to take over.
> > 4. roll all the brokers into zk mode without migration enabled (if
> desired)
> >
> > With these steps, there isn't really unavailability since a ZK
> > controller can be elected quickly after the kraft quorum is gone.
> >
> >>> Further, since we will have a 3.8 release - it is
> >>> likely we will ultimately recommend users upgrade from that version
> given
> >>> its aim is to have strategic KRaft feature parity with ZK.
> >>> That being said, I am not 100% on this. Let me know whether you think
> this
> >>> should block the release, Luke. I am also tagging Colin and David to
> weigh
> >>> in with their opinions, as they worked on the migration logic.
> >
> > The rollback docs are new in 3.7 so the fact that they're wrong is a
> > clear blocker, I think. But easy to fix, I believe. I will create a PR.
> >
> > best,
> > Colin
> >
> >>>
> >>> Hey Kirk and Chris,
> >>>
> >>> Unless I'm missing something - KAFKA-16029 is simply a bad log due
> to
> >>> improper closing. And the PR description implies this has been present
> >>> since 3.5. While annoying, I don't see a strong reason for this to
> block
> >>> the release.
> >>>
> >>> Hey Jakub,
> >>>
> >>> Nice catch! It does seem like we should have gated this behind the
> metadata
> >>> version as KIP-858 implies. Is the cluster configu

Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Colin McCabe
Docs fix discussed in the thread is here: 
https://github.com/apache/kafka/pull/15193

best,
Colin


On Sun, Jan 14, 2024, at 23:56, Colin McCabe wrote:
> Hi Stanislav,
>
> Thanks for making the first RC. The fact that it's titled RC2 is 
> messing with my mind a bit. I hope this doesn't make people think that 
> we're farther along than we are, heh.
>
> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
>> *> Nice catch! It does seem like we should have gated this behind the
>> metadata> version as KIP-858 implies. Is the cluster configured with
>> multiple log> dirs? What is the impact of the error messages?*
>>
>> I did not observe any obvious impact. I was able to send and receive
>> messages as normally. But to be honest, I have no idea what else
>> this might impact, so I did not try anything special.
>>
>> I think everyone upgrading an existing KRaft cluster will go through this
>> stage (running Kafka 3.7 with an older metadata version for at least a
>> while). So even if it is just a logged exception without any other impact I
>> wonder if it might scare users from upgrading. But I leave it to others to
>> decide if this is a blocker or not.
>>
>
> Hi Jakub,
>
> Thanks for trying the RC. I think what you found is a blocker bug 
> because it will generate huge amount of logspam. I guess we didn't find 
> it in junit tests since logspam doesn't fail the automated tests. But 
> certainly it's not suitable for production. Did you file a JIRA yet?
>
>> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>>  wrote:
>>
>>> Hey Luke,
>>>
>>> This is an interesting problem. Given the fact that the KIP for having a
>>> 3.8 release passed, I think it weights the scale towards not calling this a
>>> blocker and expecting it to be solved in 3.7.1.
>>>
>>> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
>>> (given the inability to rollback safely), but if that's true - the same
>>> case would apply for 3.6.0. So in any case users would be expected to use a
>>> patch release for this.
>
> Hi Luke,
>
> Thanks for testing rollback. I think this is a case where the 
> documentation is wrong. The intention was for the steps to basically
> be:
>
> 1. roll all the brokers into zk mode, but with migration enabled
> 2. take down the kraft quorum
> 3. rmr /controller, allowing a hybrid broker to take over.
> 4. roll all the brokers into zk mode without migration enabled (if desired)
>
> With these steps, there isn't really unavailability since a ZK 
> controller can be elected quickly after the kraft quorum is gone.
>
>>> Further, since we will have a 3.8 release - it is
>>> likely we will ultimately recommend users upgrade from that version given
>>> its aim is to have strategic KRaft feature parity with ZK.
>>> That being said, I am not 100% on this. Let me know whether you think this
>>> should block the release, Luke. I am also tagging Colin and David to weigh
>>> in with their opinions, as they worked on the migration logic.
>
> The rollback docs are new in 3.7 so the fact that they're wrong is a 
> clear blocker, I think. But easy to fix, I believe. I will create a PR.
>
> best,
> Colin
>
>>>
>>> Hey Kirk and Chris,
>>>
>>> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
>>> improper closing. And the PR description implies this has been present
>>> since 3.5. While annoying, I don't see a strong reason for this to block
>>> the release.
>>>
>>> Hey Jakub,
>>>
>>> Nice catch! It does seem like we should have gated this behind the metadata
>>> version as KIP-858 implies. Is the cluster configured with multiple log
>>> dirs? What is the impact of the error messages?
>>>
>>> Tagging Igor (the author of the KIP) to weigh in.
>>>
>>> Best,
>>> Stanislav
>>>
>>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
>>>
>>> > Hi,
>>> >
>>> > I was trying the RC2 and run into the following issue ... when I run
>>> > 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2 metadata
>>> > version, I seem to be getting repeated errors like this in the controller
>>> > logs:
>>> >
>>> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]
>>> assignReplicasToDirs:
>>> > event failed with UnsupportedVersionException in 15 microseconds.
>>> > (org.apache.kafka.controller.QuorumController)
>>> > [quorum-controller-0-event-handler]
>>> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
>>> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>>> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
>>> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
>>> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
>>> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
>>> > partitions=[PartitionData(partitionIndex=2),
>>> > PartitionData(partitionIndex=1)]),
>>> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
>>> > partitions=[PartitionData(partitionIndex=0)])])]) with context
>>> > RequestContext(header=RequestH

Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Colin McCabe
Hi Stanislav,

Thanks for making the first RC. The fact that it's titled RC2 is messing with 
my mind a bit. I hope this doesn't make people think that we're farther along 
than we are, heh.

On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> *> Nice catch! It does seem like we should have gated this behind the
> metadata> version as KIP-858 implies. Is the cluster configured with
> multiple log> dirs? What is the impact of the error messages?*
>
> I did not observe any obvious impact. I was able to send and receive
> messages as normally. But to be honest, I have no idea what else
> this might impact, so I did not try anything special.
>
> I think everyone upgrading an existing KRaft cluster will go through this
> stage (running Kafka 3.7 with an older metadata version for at least a
> while). So even if it is just a logged exception without any other impact I
> wonder if it might scare users from upgrading. But I leave it to others to
> decide if this is a blocker or not.
>

Hi Jakub,

Thanks for trying the RC. I think what you found is a blocker bug because it 
will generate a huge amount of logspam. I guess we didn't find it in junit tests
since logspam doesn't fail the automated tests. But certainly it's not suitable 
for production. Did you file a JIRA yet?
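
To illustrate the point about logspam not failing automated tests, here is a minimal sketch of how a test could count ERROR-level records and fail on them. It uses java.util.logging as a stand-in (Kafka's own logging setup is different), and ErrorCountingHandler and attachTo are hypothetical names made up for this example, not anything in the Kafka test suite:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper: counts records logged at SEVERE (ERROR-equivalent) or above.
final class ErrorCountingHandler extends Handler {
    private final AtomicInteger errors = new AtomicInteger();

    @Override
    public void publish(LogRecord record) {
        // Only count records at or above SEVERE; INFO/WARNING are ignored.
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            errors.incrementAndGet();
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered, so there is nothing to flush.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    int errorCount() {
        return errors.get();
    }

    // Attach a fresh counting handler to the given logger and return it.
    static ErrorCountingHandler attachTo(Logger logger) {
        ErrorCountingHandler handler = new ErrorCountingHandler();
        logger.addHandler(handler);
        return handler;
    }
}
```

A test would attach the handler before exercising the code under test and assert that errorCount() is zero afterwards, so repeated "Unexpected error handling request" messages would show up as a failure rather than only as noise in the logs.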

> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>  wrote:
>
>> Hey Luke,
>>
>> This is an interesting problem. Given the fact that the KIP for having a
>> 3.8 release passed, I think it tips the scale towards not calling this a
>> blocker and expecting it to be solved in 3.7.1.
>>
>> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
>> (given the inability to roll back safely), but if that's true - the same
>> case would apply for 3.6.0. So in any case users would be expected to use a
>> patch release for this.

Hi Luke,

Thanks for testing rollback. I think this is a case where the documentation is
wrong. The intention was for the steps to basically be:

1. roll all the brokers into zk mode, but with migration enabled
2. take down the kraft quorum
3. rmr /controller, allowing a hybrid broker to take over.
4. roll all the brokers into zk mode without migration enabled (if desired)

With these steps, there isn't really unavailability since a ZK controller can 
be elected quickly after the kraft quorum is gone.

>> Further, since we will have a 3.8 release - it is
>> likely we will ultimately recommend users upgrade from that version given
>> its aim is to have strategic KRaft feature parity with ZK.
>> That being said, I am not 100% sure about this. Let me know whether you think this
>> should block the release, Luke. I am also tagging Colin and David to weigh
>> in with their opinions, as they worked on the migration logic.

The rollback docs are new in 3.7 so the fact that they're wrong is a clear 
blocker, I think. But easy to fix, I believe. I will create a PR.

best,
Colin

>>
>> Hey Kirk and Chris,
>>
>> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
>> improper closing. And the PR description implies this has been present
>> since 3.5. While annoying, I don't see a strong reason for this to block
>> the release.
>>
>> Hey Jakub,
>>
>> Nice catch! It does seem like we should have gated this behind the metadata
>> version as KIP-858 implies. Is the cluster configured with multiple log
>> dirs? What is the impact of the error messages?
>>
>> Tagging Igor (the author of the KIP) to weigh in.
>>
>> Best,
>> Stanislav
>>
>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
>>
>> > Hi,
>> >
>> > I was trying RC2 and ran into the following issue ... when I run a
>> > 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
>> > I seem to be getting repeated errors like this in the controller
>> > logs:
>> >
>> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
>> > event failed with UnsupportedVersionException in 15 microseconds.
>> > (org.apache.kafka.controller.QuorumController)
>> > [quorum-controller-0-event-handler]
>> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
>> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
>> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
>> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
>> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
>> > partitions=[PartitionData(partitionIndex=2),
>> > PartitionData(partitionIndex=1)]),
>> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
>> > partitions=[PartitionData(partitionIndex=0)])])]) with context
>> > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
>> > connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
>> > 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
>> > listenerName=ListenerName(CONTRO

Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Jakub Scholz
> Nice catch! It does seem like we should have gated this behind the
> metadata version as KIP-858 implies. Is the cluster configured with
> multiple log dirs? What is the impact of the error messages?

I did not observe any obvious impact. I was able to send and receive
messages as normal. But to be honest, I have no idea what else
this might impact, so I did not try anything special.

I think everyone upgrading an existing KRaft cluster will go through this
stage (running Kafka 3.7 with an older metadata version for at least a
while). So even if it is just a logged exception without any other impact I
wonder if it might scare users from upgrading. But I leave it to others to
decide if this is a blocker or not.

Jakub


On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski wrote:

> Hey Luke,
>
> This is an interesting problem. Given the fact that the KIP for having a
> 3.8 release passed, I think it tips the scale towards not calling this a
> blocker and expecting it to be solved in 3.7.1.
>
> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
> (given the inability to roll back safely), but if that's true - the same
> case would apply for 3.6.0. So in any case users would be expected to use a
> patch release for this. Further, since we will have a 3.8 release - it is
> likely we will ultimately recommend users upgrade from that version given
> its aim is to have strategic KRaft feature parity with ZK.
> That being said, I am not 100% sure about this. Let me know whether you think this
> should block the release, Luke. I am also tagging Colin and David to weigh
> in with their opinions, as they worked on the migration logic.
>
> Hey Kirk and Chris,
>
> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> improper closing. And the PR description implies this has been present
> since 3.5. While annoying, I don't see a strong reason for this to block
> the release.
>
> Hey Jakub,
>
> Nice catch! It does seem like we should have gated this behind the metadata
> version as KIP-858 implies. Is the cluster configured with multiple log
> dirs? What is the impact of the error messages?
>
> Tagging Igor (the author of the KIP) to weigh in.
>
> Best,
> Stanislav
>
> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
>
> > Hi,
> >
> > I was trying RC2 and ran into the following issue ... when I run a
> > 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
> > I seem to be getting repeated errors like this in the controller
> > logs:
> >
> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
> > event failed with UnsupportedVersionException in 15 microseconds.
> > (org.apache.kafka.controller.QuorumController)
> > [quorum-controller-0-event-handler]
> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
> > partitions=[PartitionData(partitionIndex=2),
> > PartitionData(partitionIndex=1)]),
> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
> > partitions=[PartitionData(partitionIndex=0)])])]) with context
> > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
> > connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
> > 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
> > listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
> > clientInformation=ClientInformation(softwareName=apache-kafka-java,
> > softwareVersion=3.7.0), fromPrivilegedListener=false,
> > principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
> > (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> > java.util.concurrent.CompletionException:
> > org.apache.kafka.common.errors.UnsupportedVersionException: Directory
> > assignment is not supported yet.
> >
> >  at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
> >  at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
> >  at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
> >  at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> >  at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
> >  at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
> >  at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleE

Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Stanislav Kozlovski
Hey Luke,

This is an interesting problem. Given the fact that the KIP for having a
3.8 release passed, I think it tips the scale towards not calling this a
blocker and expecting it to be solved in 3.7.1.

It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
(given the inability to roll back safely), but if that's true - the same
case would apply for 3.6.0. So in any case users would be expected to use a
patch release for this. Further, since we will have a 3.8 release - it is
likely we will ultimately recommend users upgrade from that version given
its aim is to have strategic KRaft feature parity with ZK.
That being said, I am not 100% sure about this. Let me know whether you think this
should block the release, Luke. I am also tagging Colin and David to weigh
in with their opinions, as they worked on the migration logic.

Hey Kirk and Chris,

Unless I'm missing something - KAFKA-16029 is simply a bad log due to
improper closing. And the PR description implies this has been present
since 3.5. While annoying, I don't see a strong reason for this to block
the release.

Hey Jakub,

Nice catch! It does seem like we should have gated this behind the metadata
version as KIP-858 implies. Is the cluster configured with multiple log
dirs? What is the impact of the error messages?

Tagging Igor (the author of the KIP) to weigh in.

Best,
Stanislav

On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:

> Hi,
>
> I was trying RC2 and ran into the following issue ... when I run a
> 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
> I seem to be getting repeated errors like this in the controller
> logs:
>
> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
> event failed with UnsupportedVersionException in 15 microseconds.
> (org.apache.kafka.controller.QuorumController)
> [quorum-controller-0-event-handler]
> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
> partitions=[PartitionData(partitionIndex=2),
> PartitionData(partitionIndex=1)]),
> TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
> partitions=[PartitionData(partitionIndex=0)])])]) with context
> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
> 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
> listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
> clientInformation=ClientInformation(softwareName=apache-kafka-java,
> softwareVersion=3.7.0), fromPrivilegedListener=false,
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
> (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> java.util.concurrent.CompletionException:
> org.apache.kafka.common.errors.UnsupportedVersionException: Directory
> assignment is not supported yet.
>
>  at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>  at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>  at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>  at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>  at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>  at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>  at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>  at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>  at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>  at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>  at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>  at java.base/java.lang.Thread.run(Thread.java:840)
>
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException:
> Directory assignment is not supported yet.
>
> Is that expected? I guess with the metadata version set to 3.6-IV2, it
> makes sense that the request is not supported. But then shouldn't the
> brokers avoid sending the request at all? (I have not opened a JIRA for it,
> but I can open one if you agree this is not expected.)
>
> Thanks & Regards
> Jakub
>
> On Sat, Jan 13, 2024 at 8:03 AM Luke Chen  wrote:
>
> > Hi Stanislav,

Re: [VOTE] 3.7.0 RC2

2024-01-13 Thread Jakub Scholz
Hi,

I was trying RC2 and ran into the following issue ... when I run a
3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
I seem to be getting repeated errors like this in the controller
logs:

2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
event failed with UnsupportedVersionException in 15 microseconds.
(org.apache.kafka.controller.QuorumController)
[quorum-controller-0-event-handler]
2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
partitions=[PartitionData(partitionIndex=2),
PartitionData(partitionIndex=1)]),
TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
partitions=[PartitionData(partitionIndex=0)])])]) with context
RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
clientInformation=ClientInformation(softwareName=apache-kafka-java,
softwareVersion=3.7.0), fromPrivilegedListener=false,
principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
(kafka.server.ControllerApis) [quorum-controller-0-event-handler]
java.util.concurrent.CompletionException:
org.apache.kafka.common.errors.UnsupportedVersionException: Directory
assignment is not supported yet.

 at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
 at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
 at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
 at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
 at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
 at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
 at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
 at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
 at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
 at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
 at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
 at java.base/java.lang.Thread.run(Thread.java:840)

Caused by: org.apache.kafka.common.errors.UnsupportedVersionException:
Directory assignment is not supported yet.

Is that expected? I guess with the metadata version set to 3.6-IV2, it
makes sense that the request is not supported. But then shouldn't the
brokers avoid sending the request at all? (I have not opened a JIRA for it,
but I can open one if you agree this is not expected.)
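
As a rough sketch of the kind of gating asked about here, the broker could check the finalized metadata version before sending a directory assignment at all. Everything below is illustrative: MetadataLevel, DirAssignmentGate and maybeSend are hypothetical names, not Kafka's actual classes, and the exact metadata version at which directory assignment becomes supported is an assumption of the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the cluster's finalized metadata version.
enum MetadataLevel {
    V3_6_IV2(false),
    V3_7(true); // assumption: directory assignment is supported from this level on

    private final boolean directoryAssignmentSupported;

    MetadataLevel(boolean directoryAssignmentSupported) {
        this.directoryAssignmentSupported = directoryAssignmentSupported;
    }

    boolean supportsDirectoryAssignment() {
        return directoryAssignmentSupported;
    }
}

// Hypothetical broker-side gate: only records an assignment as "sent"
// when the current metadata level supports it.
final class DirAssignmentGate {
    private final List<String> sent = new ArrayList<>();

    // Returns true if the assignment was sent, false if it was skipped.
    boolean maybeSend(MetadataLevel level, String topicPartition) {
        if (!level.supportsDirectoryAssignment()) {
            // Skip instead of sending a request the controller would reject
            // with UnsupportedVersionException.
            return false;
        }
        sent.add(topicPartition);
        return true;
    }

    List<String> sent() {
        return sent;
    }
}
```

With a check like this on the sending side, a cluster still on the older metadata version would never issue the request, so the controller would have nothing to reject and nothing to log.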

Thanks & Regards
Jakub

On Sat, Jan 13, 2024 at 8:03 AM Luke Chen  wrote:

> Hi Stanislav,
>
> I commented in the "Apache Kafka 3.7.0 Release" thread, but maybe you
> missed it.
> cross-posting here:
>
> There is a bug, KAFKA-16101, reporting that "Kafka
> cluster will be unavailable during KRaft migration rollback".
> The impact of this issue is that if brokers try to roll back to ZK mode
> during the KRaft migration process, there will be a period of time when the
> cluster is unavailable.
> Since ZK-to-KRaft migration is a production-ready feature, I think
> this should be addressed soon.
> Do you think this is a blocker for v3.7.0?
>
> Thanks.
> Luke
>
> On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton 
> wrote:
>
> > Thanks, Kirk!
> >
> > @Stanislav--do you believe that this warrants a new RC?
> >
> > On Fri, Jan 12, 2024, 19:08 Kirk True  wrote:
> >
> > > Hi Chris/Stanislav,
> > >
> > > > I'm working on the 'Unable to find FetchSessionHandler' log problem
> > > > (KAFKA-16029) and have put out a draft PR (
> > > > https://github.com/apache/kafka/pull/15186). I will use the quickstart
> > > > approach as a second means to reproduce/verify while I wait for the PR's
> > > > Jenkins job to finish.
> > >
> > > Thanks,
> > > Kirk
> > >
> > > On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
> > > > Hi Stanislav,
> > > >
> > > >
> > > > Thanks for running this release!
> > > >
> > > > To verify, I:
> > > > - Built from source using Java 11 with both:
> > > > - - the 3.7.0-rc2 tag on GitHub
> > > > - - the kafka-3.7.0-src.tgz artifact from
> > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> > > > - Ch

Re: [VOTE] 3.7.0 RC2

2024-01-12 Thread Luke Chen
Hi Stanislav,

I commented in the "Apache Kafka 3.7.0 Release" thread, but maybe you
missed it.
cross-posting here:

There is a bug, KAFKA-16101, reporting that "Kafka
cluster will be unavailable during KRaft migration rollback".
The impact of this issue is that if brokers try to roll back to ZK mode
during the KRaft migration process, there will be a period of time when the
cluster is unavailable.
Since ZK-to-KRaft migration is a production-ready feature, I think
this should be addressed soon.
Do you think this is a blocker for v3.7.0?

Thanks.
Luke

On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton wrote:

> Thanks, Kirk!
>
> @Stanislav--do you believe that this warrants a new RC?
>
> On Fri, Jan 12, 2024, 19:08 Kirk True  wrote:
>
> > Hi Chris/Stanislav,
> >
> > I'm working on the 'Unable to find FetchSessionHandler' log problem
> > (KAFKA-16029) and have put out a draft PR (
> > https://github.com/apache/kafka/pull/15186). I will use the quickstart
> > approach as a second means to reproduce/verify while I wait for the PR's
> > Jenkins job to finish.
> >
> > Thanks,
> > Kirk
> >
> > On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
> > > Hi Stanislav,
> > >
> > >
> > > Thanks for running this release!
> > >
> > > To verify, I:
> > > - Built from source using Java 11 with both:
> > > - - the 3.7.0-rc2 tag on GitHub
> > > - - the kafka-3.7.0-src.tgz artifact from
> > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> > > - Checked signatures and checksums
> > > - Ran the quickstart using both:
> > > > - - The kafka_2.13-3.7.0.tgz artifact from
> > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
> > > > and Scala 2.13 in KRaft mode
> > > - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
> > > - Ran all unit tests
> > > - Ran all integration tests for Connect and MM2
> > >
> > >
> > > I found two minor areas for concern:
> > >
> > > 1. (Possibly a blocker)
> > > > When running the quickstart, I noticed this ERROR-level log message being
> > > > emitted frequently (but not every time) when I killed my console consumer
> > > > via ctrl-C:
> > >
> > > > [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
> > > groupId=console-consumer-74388] Unable to find FetchSessionHandler for
> > node
> > > 1. Ignoring fetch response
> > > (org.apache.kafka.clients.consumer.internals.AbstractFetch)
> > >
> > > I see that this error message is already reported in
> > > https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
> > > > prioritize fixing it for this release. I know it's probably benign but it's
> > > > really not a good look for us when basic operations log error messages, and
> > > > it may give new users some headaches.
> > >
> > >
> > > 2. (Probably not a blocker)
> > > The following unit tests failed the first time around, and all of them
> > > passed the second time I ran them:
> > >
> > > > - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
> > > > - (clients) SelectorTest.testConnectionsByClientMetric()
> > > > - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
> > > > - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
> > > > thought I fixed this one! 🤬🤬)
> > > - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
> > >
> > >
> > > > Thanks again for your work on this release, and congratulations to Kafka
> > > > Streams for having zero flaky unit tests during my highly-experimental
> > > > single laptop run!
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
> > >  wrote:
> > >
> > > > Hello Kafka users, developers, and client-developers,
> > > >
> > > > This is the first candidate for release of Apache Kafka 3.7.0.
> > > >
> > > > > Note it's named "RC2" because I had a few "failed" RCs that I had
> > > > > cut/uploaded but ultimately had to scrap prior to announcing due to new
> > > > > blockers arriving before I could even announce them.
> > > > >
> > > > > Further - I haven't yet been able to set up the system tests successfully.
> > > > > And the integration/unit tests do have a few failures that I have to spend
> > > > > time triaging. I would appreciate any help in case anyone notices any tests
> > > > > failing that they're subject matter experts in. Expect me to follow up in
> > > > > a day or two with more detailed analysis.
> > > >
> > > > Major changes include:
> > > > > - Early Access to KIP-848 - the next generation of the consumer rebalance
> > > > > protocol
> > > > > - KIP-858: Adding JBOD support to KRaft
> > > > > - KIP-714: Observability into Client metrics via a standardized interface
> > > > Check more information in the WIP blog post:
> > > > https://github.com/apache/kafka-site/pull/578
> > > >
> > > > Release notes for the 3.7.0 release:
> > > >
> > > >
> >
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/

Re: [VOTE] 3.7.0 RC2

2024-01-12 Thread Chris Egerton
Thanks, Kirk!

@Stanislav--do you believe that this warrants a new RC?

On Fri, Jan 12, 2024, 19:08 Kirk True  wrote:

> Hi Chris/Stanislav,
>
> I'm working on the 'Unable to find FetchSessionHandler' log problem
> (KAFKA-16029) and have put out a draft PR (
> https://github.com/apache/kafka/pull/15186). I will use the quickstart
> approach as a second means to reproduce/verify while I wait for the PR's
> Jenkins job to finish.
>
> Thanks,
> Kirk
>
> On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
> > Hi Stanislav,
> >
> >
> > Thanks for running this release!
> >
> > To verify, I:
> > - Built from source using Java 11 with both:
> > - - the 3.7.0-rc2 tag on GitHub
> > - - the kafka-3.7.0-src.tgz artifact from
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> > - Checked signatures and checksums
> > - Ran the quickstart using both:
> > - - The kafka_2.13-3.7.0.tgz artifact from
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
> > and Scala 2.13 in KRaft mode
> > - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
> > - Ran all unit tests
> > - Ran all integration tests for Connect and MM2
> >
> >
> > I found two minor areas for concern:
> >
> > 1. (Possibly a blocker)
> > When running the quickstart, I noticed this ERROR-level log message being
> > emitted frequently (but not every time) when I killed my console consumer
> > via ctrl-C:
> >
> > > [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
> > groupId=console-consumer-74388] Unable to find FetchSessionHandler for
> node
> > 1. Ignoring fetch response
> > (org.apache.kafka.clients.consumer.internals.AbstractFetch)
> >
> > I see that this error message is already reported in
> > https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
> > prioritize fixing it for this release. I know it's probably benign but it's
> > really not a good look for us when basic operations log error messages, and
> > it may give new users some headaches.
> >
> >
> > 2. (Probably not a blocker)
> > The following unit tests failed the first time around, and all of them
> > passed the second time I ran them:
> >
> > - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
> > - (clients) SelectorTest.testConnectionsByClientMetric()
> > - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
> > - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
> > thought I fixed this one! 🤬🤬)
> > - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
> >
> >
> > Thanks again for your work on this release, and congratulations to Kafka
> > Streams for having zero flaky unit tests during my highly-experimental
> > single laptop run!
> >
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
> >  wrote:
> >
> > > Hello Kafka users, developers, and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 3.7.0.
> > >
> > > Note it's named "RC2" because I had a few "failed" RCs that I had
> > > cut/uploaded but ultimately had to scrap prior to announcing due to new
> > > blockers arriving before I could even announce them.
> > >
> > > Further - I haven't yet been able to set up the system tests successfully.
> > > And the integration/unit tests do have a few failures that I have to spend
> > > time triaging. I would appreciate any help in case anyone notices any tests
> > > failing that they're subject matter experts in. Expect me to follow up in
> > > a day or two with more detailed analysis.
> > >
> > > Major changes include:
> > > - Early Access to KIP-848 - the next generation of the consumer rebalance
> > > protocol
> > > - KIP-858: Adding JBOD support to KRaft
> > > - KIP-714: Observability into Client metrics via a standardized interface
> > >
> > > Check more information in the WIP blog post:
> > > https://github.com/apache/kafka-site/pull/578
> > >
> > > Release notes for the 3.7.0 release:
> > >
> > >
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Thursday, January 18, 9am PT ***
> > >
> > > Usually these deadlines tend to be 2-3 days, but due to this being the
> > > first RC and the tests not having run yet, I am giving it a bit more time.
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> > >
> > > * Docker release artifact to be voted upon:
> > > apache/kafka:3.7.0-rc2
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> > > https:

Re: [VOTE] 3.7.0 RC2

2024-01-12 Thread Kirk True
Hi Chris/Stanislav,

I'm working on the 'Unable to find FetchSessionHandler' log problem 
(KAFKA-16029) and have put out a draft PR 
(https://github.com/apache/kafka/pull/15186). I will use the quickstart 
approach as a second means to reproduce/verify while I wait for the PR's 
Jenkins job to finish.   

Thanks,
Kirk

On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
> Hi Stanislav,
> 
> 
> Thanks for running this release!
> 
> To verify, I:
> - Built from source using Java 11 with both:
> - - the 3.7.0-rc2 tag on GitHub
> - - the kafka-3.7.0-src.tgz artifact from
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> - Checked signatures and checksums
> - Ran the quickstart using both:
> - - The kafka_2.13-3.7.0.tgz artifact from
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
> and Scala 2.13 in KRaft mode
> - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
> - Ran all unit tests
> - Ran all integration tests for Connect and MM2
> 
> 
> I found two minor areas for concern:
> 
> 1. (Possibly a blocker)
> When running the quickstart, I noticed this ERROR-level log message being
> emitted frequently (but not every time) when I killed my console consumer
> via ctrl-C:
> 
> > [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
> groupId=console-consumer-74388] Unable to find FetchSessionHandler for node
> 1. Ignoring fetch response
> (org.apache.kafka.clients.consumer.internals.AbstractFetch)
> 
> I see that this error message is already reported in
> https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
> prioritize fixing it for this release. I know it's probably benign but it's
> really not a good look for us when basic operations log error messages, and
> it may give new users some headaches.
> 
> 
> 2. (Probably not a blocker)
> The following unit tests failed the first time around, and all of them
> passed the second time I ran them:
> 
> - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
> - (clients) SelectorTest.testConnectionsByClientMetric()
> - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
> - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
> thought I fixed this one! 🤬🤬)
> - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
> 
> 
> Thanks again for your work on this release, and congratulations to Kafka
> Streams for having zero flaky unit tests during my highly-experimental
> single laptop run!
> 
> 
> Cheers,
> 
> Chris
> 
> On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
>  wrote:
> 
> > Hello Kafka users, developers, and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.7.0.
> >
> > Note it's named "RC2" because I had a few "failed" RCs that I had
> > cut/uploaded but ultimately had to scrap prior to announcing due to new
> > blockers arriving before I could even announce them.
> >
> > Further - I haven't yet been able to set up the system tests successfully.
> > And the integration/unit tests do have a few failures that I have to spend
> > time triaging. I would appreciate any help in case anyone notices any tests
> > failing that they're subject matter experts in. Expect me to follow up in
> > a day or two with more detailed analysis.
> >
> > Major changes include:
> > - Early Access to KIP-848 - the next generation of the consumer rebalance
> > protocol
> > - KIP-858: Adding JBOD support to KRaft
> > - KIP-714: Observability into Client metrics via a standardized interface
> >
> > Check more information in the WIP blog post:
> > https://github.com/apache/kafka-site/pull/578
> >
> > Release notes for the 3.7.0 release:
> >
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, January 18, 9am PT ***
> >
> > Usually these deadlines tend to be 2-3 days, but due to this being the
> > first RC and the tests not having run yet, I am giving it a bit more time.
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> >
> > * Docker release artifact to be voted upon:
> > apache/kafka:3.7.0-rc2
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> > https://github.com/apache/kafka/releases/tag/3.7.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/37/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/37/protocol.html
> >
> > * Successful Jenkins builds for the 3.7 branch:
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/
> > There are failing test

Re: [VOTE] 3.7.0 RC2

2024-01-12 Thread Chris Egerton
Hi Stanislav,


Thanks for running this release!

To verify, I:
- Built from source using Java 11 with both:
- - the 3.7.0-rc2 tag on GitHub
- - the kafka-3.7.0-src.tgz artifact from
https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
- Checked signatures and checksums
- Ran the quickstart using both:
- - The kafka_2.13-3.7.0.tgz artifact from
https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
and Scala 2.13 in KRaft mode
- - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
- Ran all unit tests
- Ran all integration tests for Connect and MM2
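
For anyone repeating the checksum step above, here is a minimal sketch of one way to compare a downloaded artifact against its published SHA-512 digest. The function names (sha512_of, parse_published_digest, checksum_matches) are hypothetical, and the two checksum-file layouts handled here ("name: HEX HEX ..." and "digest  name") are assumptions about what the file may look like, not a statement of the exact format used for this RC. It does not cover the PGP signature check, which still needs gpg and the KEYS file.

```python
import hashlib
import re


def sha512_of(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-512 digest of the file at `path`."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so large release tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_published_digest(text):
    """Extract a hex digest from checksum-file text (layouts are assumptions)."""
    tokens = text.split()
    # Assumed layout 1: "<128 hex chars>  <file name>"
    if tokens and re.fullmatch(r"[0-9a-fA-F]{128}", tokens[0]):
        return tokens[0].lower()
    # Assumed layout 2: "<file name>: AAAA BBBB ..." possibly wrapped over lines
    _, _, rest = text.partition(":")
    return re.sub(r"\s+", "", rest).lower()


def checksum_matches(artifact_path, checksum_text):
    """True if the artifact's SHA-512 equals the digest found in `checksum_text`."""
    return sha512_of(artifact_path) == parse_published_digest(checksum_text)
```

Usage would be along the lines of checksum_matches("kafka_2.13-3.7.0.tgz", open("kafka_2.13-3.7.0.tgz.sha512").read()), assuming both files were downloaded to the current directory under those names.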


I found two minor areas for concern:

1. (Possibly a blocker)
When running the quickstart, I noticed this ERROR-level log message being
emitted frequently (though not every time) when I killed my console consumer
via ctrl-C:

> [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
groupId=console-consumer-74388] Unable to find FetchSessionHandler for node
1. Ignoring fetch response
(org.apache.kafka.clients.consumer.internals.AbstractFetch)

I see that this error message is already reported in
https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
prioritize fixing it for this release. I know it's probably benign but it's
really not a good look for us when basic operations log error messages, and
it may give new users some headaches.


2. (Probably not a blocker)
The following unit tests failed the first time around, and all of them
passed the second time I ran them:

- (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
- (clients) SelectorTest.testConnectionsByClientMetric()
- (clients) Tls13SelectorTest.testConnectionsByClientMetric()
- (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
thought I fixed this one! 🤬🤬)
- (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
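
The "failed first, passed on rerun" criterion used for the list above can be sketched as follows. This is only an illustration: flaky_tests is a hypothetical helper, and the dict-of-booleans input is an assumed representation of test results, not something produced by the Kafka build.

```python
def flaky_tests(first_run, second_run):
    """Return names of tests that failed in `first_run` but passed in `second_run`.

    Both arguments map a test name to True (passed) or False (failed).
    A test missing from the second run is not counted as flaky.
    """
    return sorted(
        name
        for name, passed in first_run.items()
        if not passed and second_run.get(name, False)
    )


# Example with two of the test names mentioned above (results are illustrative).
first = {
    "SelectorTest.testConnectionsByClientMetric": False,
    "Tls13SelectorTest.testConnectionsByClientMetric": False,
    "SomeOtherTest.alwaysPasses": True,
}
second = {
    "SelectorTest.testConnectionsByClientMetric": True,
    "Tls13SelectorTest.testConnectionsByClientMetric": True,
    "SomeOtherTest.alwaysPasses": True,
}
print(flaky_tests(first, second))
```

A test that fails in both runs is left out of the result, since a consistent failure points at a real problem and not a flake.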


Thanks again for your work on this release, and congratulations to Kafka
Streams for having zero flaky unit tests during my highly-experimental
single laptop run!


Cheers,

Chris

On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
 wrote:

> Hello Kafka users, developers, and client-developers,
>
> This is the first candidate for release of Apache Kafka 3.7.0.
>
> Note it's named "RC2" because I had a few "failed" RCs that I had
> cut/uploaded but ultimately had to scrap prior to announcing due to new
> blockers arriving before I could even announce them.
>
> Further - I haven't yet been able to set up the system tests successfully.
> And the integration/unit tests do have a few failures that I have to spend
> time triaging. I would appreciate any help in case anyone notices any tests
> failing that they're subject matter experts in. Expect me to follow up in
> a day or two with more detailed analysis.
>
> Major changes include:
> - Early Access to KIP-848 - the next generation of the consumer rebalance
> protocol
> - KIP-858: Adding JBOD support to KRaft
> - KIP-714: Observability into Client metrics via a standardized interface
>
> Check more information in the WIP blog post:
> https://github.com/apache/kafka-site/pull/578
>
> Release notes for the 3.7.0 release:
>
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by Thursday, January 18, 9am PT ***
>
> Usually these deadlines tend to be 2-3 days, but due to this being the
> first RC and the tests not having run yet, I am giving it a bit more time.
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>
> * Docker release artifact to be voted upon:
> apache/kafka:3.7.0-rc2
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
>
> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> https://github.com/apache/kafka/releases/tag/3.7.0-rc2
>
> * Documentation:
> https://kafka.apache.org/37/documentation.html
>
> * Protocol:
> https://kafka.apache.org/37/protocol.html
>
> * Successful Jenkins builds for the 3.7 branch:
> Unit/integration tests:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/
> There are failing tests here. I have to follow up with triaging some of
> the failures and figuring out if they're actual problems or simply flakes.
>
> System tests: https://jenkins.confluent.io/job/system-test-kafka/job/3.7/
>
> No successful system test runs yet. I am working on getting the job to run.
>
> * Successful Docker Image Github Actions Pipeline for 3.7 branch:
> Attached are the scan_report and report_jvm output files from the Docker
> Build run:
> https://github.com/apache/kafka/actions/runs/7486094960/job/20375761673
>
> And the final docker image build job - Docker Build Test Pipeline:
> https://github.com/apache/kafka/a