Hi Stan and Gaurav, 
Just to clarify some points mentioned earlier in this thread:
KAFKA-14616: I raised this a year ago, so it's not related to the JBOD work. It is
rather a blocker bug for KRaft in general. The PR from Colin should fix it.
I'm not sure it is a blocker for 3.7 per se, as it has been a major bug since 3.3
and was missed in all the other releases.
 
Regarding the JBOD work:
KAFKA-16082: This is not a blocker for 3.7; rather, it's a nice-to-have fix. The PR
https://github.com/apache/kafka/pull/15136 is quite small and has been
approved by Proven and me, but it is waiting for a committer's approval.
KAFKA-16162: This is a blocker for 3.7. Again, it's a small PR,
https://github.com/apache/kafka/pull/15270, approved by Proven and me, and it
is waiting for a committer's approval.
KAFKA-16157: This is a blocker for 3.7. There is one small suggestion on the
PR https://github.com/apache/kafka/pull/15263, but I don't think any of the
current feedback should block the PR from being approved, so it should be
ready once a committer approves it.
KAFKA-16195: Likewise, this is a blocker, but it has approval from Proven and me, and
we are waiting for a committer's approval on the PR
https://github.com/apache/kafka/pull/15262.

If we can't get committer approval for KAFKA-16162, KAFKA-16157 and
KAFKA-16195 in time for 3.7, then we can mark JBOD as early access, assuming we
merge at least KAFKA-16195.

Regards, 
Omnia

> On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> 
> Apologies, I listed KAFKA-16157 twice in my previous message. I intended
> to mention KAFKA-16195, with the PR at https://github.com/apache/kafka/pull/15262,
> as the second JIRA.
> 
> Thanks,
> Gaurav
> 
>> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
>> 
>> Hi Stan,
>> 
>> I wanted to share some updates about the bugs you shared earlier.
>> 
>> - KAFKA-14616: I've reviewed and tested the PR from Colin and have observed that
>> the fix works as intended.
>> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed fix.
>> I've therefore raised https://github.com/apache/kafka/pull/15270 following a
>> discussion with Luke in JIRA.
>> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting
>> feedback/reviews at https://github.com/apache/kafka/pull/15136
>> 
>> In addition to the above, there are 2 JIRAs I'd like to bring to everyone's
>> attention:
>> 
>> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker.
>> I've raised https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
>> - KAFKA-16157: I raised this yesterday and have addressed feedback from Luke.
>> This should hopefully get merged soon.
>> 
>> Regards,
>> Gaurav
>> 
>> 
>>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
>>> 
>>> Hi Stanislav,
>>> 
>>> Thanks for bringing these JIRAs/PRs up.
>>> 
>>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and
>>> hope to have some feedback by Friday. I gather the latter JIRA is marked as
>>> WIP by Proven, and he's away. I'll try to build on his work in the meantime.
>>> 
>>> As for KAFKA-16082, we haven't been able to identify a data loss scenario.
>>> I have a PR open for promoting an abandoned future replica, with approvals
>>> from Omnia and Proven, so I'd appreciate a committer reviewing it.
>>> 
>>> Regards,
>>> Gaurav
>>> 
>>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski 
>>> <stanis...@confluent.io.INVALID> wrote:
>>>> 
>>>> Hey all, I figured I'd give an update about what known blockers we have
>>>> right now:
>>>> 
>>>> - KAFKA-16101: KRaft migration rollback documentation is incorrect -
>>>> https://github.com/apache/kafka/pull/15193; this need not block RC
>>>> creation, but we need the docs updated so that people can test properly
>>>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs -
>>>> https://github.com/apache/kafka/pull/15230; my understanding is that
>>>> this is blocking JBOD for 3.7
>>>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 -
>>>> a strict blocker with an open PR https://github.com/apache/kafka/pull/15232,
>>>> although I understand Proven is out of office
>>>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am
>>>> hearing mixed opinions on whether this is a blocker
>>>> (https://github.com/apache/kafka/pull/15136)
>>>> 
>>>> Given that there are 3 JBOD blocker bugs, and I am not confident they will
>>>> all be merged this week, I am on the verge of voting to revert JBOD from
>>>> this release, or to mark it early access.
>>>> 
>>>> By all accounts, it seems that if we keep JBOD, the release will have to
>>>> spill into February, a month later than the start of January that our
>>>> time-based release plan targeted.
>>>> 
>>>> Can I ask others for an opinion?
>>>> 
>>>> Best,
>>>> Stan
>>>> 
>>>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen <show...@gmail.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I think I've found another blocker issue: KAFKA-16162
>>>>> <https://issues.apache.org/jira/browse/KAFKA-16162>.
>>>>> The impact is that after upgrading to 3.7.0, any newly created topics/partitions
>>>>> will be unavailable.
>>>>> I've put my findings in the JIRA.
>>>>> 
>>>>> Thanks.
>>>>> Luke
>>>>> 
>>>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>>> 
>>>>>> Stan, thanks for driving this all forward! Excellent job.
>>>>>> 
>>>>>> About
>>>>>> 
>>>>>>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
>>>>>>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>>>>>> 
>>>>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
>>>>>> now in trunk and 3.7 (and actually also in 3.6...)
>>>>>> 
>>>>>> For `StreamsStandbyTask` the failing test exposes a regression bug, so
>>>>>> it's a blocker. I updated the ticket accordingly. We already have an
>>>>>> open PR that reverts the code introducing the regression.
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> On 1/17/24 9:44 AM, Proven Provenzano wrote:
>>>>>>> We have another blocking issue for the RC:
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar to
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue, however,
>>>>>>> can lead to the new topic having partitions that a producer cannot write to.
>>>>>>> 
>>>>>>> --Proven
>>>>>>> 
>>>>>>> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <pprovenz...@confluent.io>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> I have a PR https://github.com/apache/kafka/pull/15197 for
>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
>>>>>>>> --Proven
>>>>>>>> 
>>>>>>>> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz <ja...@scholz.cz> wrote:
>>>>>>>> 
>>>>>>>>>> Hi Jakub,
>>>>>>>>>>
>>>>>>>>>> Thanks for trying the RC. I think what you found is a blocker bug because
>>>>>>>>>> it will generate a huge amount of logspam. I guess we didn't find it in
>>>>>>>>>> JUnit tests since logspam doesn't fail the automated tests. But certainly
>>>>>>>>>> it's not suitable for production. Did you file a JIRA yet?
>>>>>>>>> 
>>>>>>>>> Hi Colin,
>>>>>>>>> 
>>>>>>>>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>>>>>>>>> 
>>>>>>>>> Thanks & Regards
>>>>>>>>> Jakub
>>>>>>>>> 
>>>>>>>>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Stanislav,
>>>>>>>>>> 
>>>>>>>>>> Thanks for making the first RC. The fact that it's titled RC2 is messing
>>>>>>>>>> with my mind a bit. I hope this doesn't make people think that we're
>>>>>>>>>> farther along than we are, heh.
>>>>>>>>>> 
>>>>>>>>>> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
>>>>>>>>>>>> Nice catch! It does seem like we should have gated this behind the
>>>>>>>>>>>> metadata version as KIP-858 implies. Is the cluster configured with
>>>>>>>>>>>> multiple log dirs? What is the impact of the error messages?
>>>>>>>>>>> 
>>>>>>>>>>> I did not observe any obvious impact. I was able to send and receive
>>>>>>>>>>> messages as normal. But to be honest, I have no idea what else
>>>>>>>>>>> this might impact, so I did not try anything special.
>>>>>>>>>>> 
>>>>>>>>>>> I think everyone upgrading an existing KRaft cluster will go through
>>>>>>>>>>> this stage (running Kafka 3.7 with an older metadata version for at
>>>>>>>>>>> least a while). So even if it is just a logged exception without any
>>>>>>>>>>> other impact, I wonder if it might scare users away from upgrading. But
>>>>>>>>>>> I leave it to others to decide whether this is a blocker or not.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi Jakub,
>>>>>>>>>> 
>>>>>>>>>> Thanks for trying the RC. I think what you found is a blocker bug because
>>>>>>>>>> it will generate a huge amount of logspam. I guess we didn't find it in
>>>>>>>>>> JUnit tests since logspam doesn't fail the automated tests. But certainly
>>>>>>>>>> it's not suitable for production. Did you file a JIRA yet?
>>>>>>>>>> 
>>>>>>>>>>> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>>>>>>>>>>> <stanis...@confluent.io.invalid> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey Luke,
>>>>>>>>>>>> 
>>>>>>>>>>>> This is an interesting problem. Given that the KIP for having a
>>>>>>>>>>>> 3.8 release passed, I think it tips the scale towards not calling
>>>>>>>>>>>> this a blocker and expecting it to be solved in 3.7.1.
>>>>>>>>>>>> 
>>>>>>>>>>>> It is unfortunate that it would not seem safe to migrate to KRaft in
>>>>>>>>>>>> 3.7.0 (given the inability to roll back safely), but if that's true, the
>>>>>>>>>>>> same would apply to 3.6.0. So in any case users would be expected to
>>>>>>>>>>>> use a patch release for this.
>>>>>>>>>> 
>>>>>>>>>> Hi Luke,
>>>>>>>>>> 
>>>>>>>>>> Thanks for testing rollback. I think this is a case where the
>>>>>>>>>> documentation is wrong. The intention was for the steps to basically be:
>>>>>>>>>> 
>>>>>>>>>> 1. roll all the brokers into ZK mode, but with migration enabled
>>>>>>>>>> 2. take down the KRaft quorum
>>>>>>>>>> 3. rmr /controller, allowing a hybrid broker to take over.
>>>>>>>>>> 4. roll all the brokers into ZK mode without migration enabled (if desired)
>>>>>>>>>> 
>>>>>>>>>> With these steps, there isn't really any unavailability, since a ZK
>>>>>>>>>> controller can be elected quickly after the KRaft quorum is gone.
>>>>>>>>>> 
>>>>>>>>>>>> Further, since we will have a 3.8 release, it is likely we will
>>>>>>>>>>>> ultimately recommend users upgrade from that version, given that
>>>>>>>>>>>> its aim is to have strategic KRaft feature parity with ZK.
>>>>>>>>>>>> That being said, I am not 100% sure about this. Let me know whether you
>>>>>>>>>>>> think this should block the release, Luke. I am also tagging Colin and
>>>>>>>>>>>> David to weigh in with their opinions, as they worked on the migration
>>>>>>>>>>>> logic.
>>>>>>>>>> 
>>>>>>>>>> The rollback docs are new in 3.7, so the fact that they're wrong is a
>>>>>>>>>> clear blocker, I think. But it's easy to fix, I believe. I will create a PR.
>>>>>>>>>> 
>>>>>>>>>> best,
>>>>>>>>>> Colin
>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Hey Kirk and Chris,
>>>>>>>>>>>> 
>>>>>>>>>>>> Unless I'm missing something, KAFKA-16029 is simply a bad log due to
>>>>>>>>>>>> improper closing. And the PR description implies this has been present
>>>>>>>>>>>> since 3.5. While annoying, I don't see a strong reason for this to
>>>>>>>>>>>> block the release.
>>>>>>>>>>>> 
>>>>>>>>>>>> Hey Jakub,
>>>>>>>>>>>> 
>>>>>>>>>>>> Nice catch! It does seem like we should have gated this behind the
>>>>>>>>>>>> metadata version as KIP-858 implies. Is the cluster configured with
>>>>>>>>>>>> multiple log dirs? What is the impact of the error messages?
>>>>>>>>>>>> 
>>>>>>>>>>>> Tagging Igor (the author of the KIP) to weigh in.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stanislav
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz <ja...@scholz.cz> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I was trying RC2 and ran into the following issue: when I run a
>>>>>>>>>>>>> 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
>>>>>>>>>>>>> I seem to be getting repeated errors like this in the controller
>>>>>>>>>>>>> logs:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
>>>>>>>>>>>>> event failed with UnsupportedVersionException in 15 microseconds.
>>>>>>>>>>>>> (org.apache.kafka.controller.QuorumController)
>>>>>>>>>>>>> [quorum-controller-0-event-handler]
>>>>>>>>>>>>> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
>>>>>>>>>>>>> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
>>>>>>>>>>>>> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
>>>>>>>>>>>>> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
>>>>>>>>>>>>> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
>>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=2),
>>>>>>>>>>>>> PartitionData(partitionIndex=1)]),
>>>>>>>>>>>>> TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
>>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=0)])])]) with context
>>>>>>>>>>>>> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
>>>>>>>>>>>>> connectionId='172.16.14.219:9090-172.16.14.217:53590-7',
>>>>>>>>>>>>> clientAddress=/172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
>>>>>>>>>>>>> listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
>>>>>>>>>>>>> clientInformation=ClientInformation(softwareName=apache-kafka-java,
>>>>>>>>>>>>> softwareVersion=3.7.0), fromPrivilegedListener=false,
>>>>>>>>>>>>> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
>>>>>>>>>>>>> (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
>>>>>>>>>>>>> java.util.concurrent.CompletionException:
>>>>>>>>>>>>> org.apache.kafka.common.errors.UnsupportedVersionException: Directory
>>>>>>>>>>>>> assignment is not supported yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>>>>>>>>>>>>> at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>>>>>>>>>>>>> at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>>>>>>>>>>>>> at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>>>>>>>>>>>>> at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>>>>>>>>>>>>> at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>>>>>>>>>>>>> at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>>>>>>>>>>>>> at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>>>>>>>>>>>>> at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>>>>>>>>>>>>> at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>>>>>>>>>>>>> at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>>>>>>>>>>>>> at java.base/java.lang.Thread.run(Thread.java:840)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException:
>>>>>>>>>>>>> Directory assignment is not supported yet.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is that expected? I guess with the metadata version set to 3.6-IV2, it
>>>>>>>>>>>>> makes sense that the request is not supported. But then shouldn't the
>>>>>>>>>>>>> brokers avoid sending the request at all? (I have not opened a JIRA for
>>>>>>>>>>>>> it, but I can open one if you agree this is not expected.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks & Regards
>>>>>>>>>>>>> Jakub
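Jakub's question (shouldn't brokers skip the request under an older metadata version?) and Stan's remark about gating the feature on the metadata version can be illustrated with a small sketch. It is written in Python for brevity and is not Kafka's broker code: the minimum version tuple `MIN_DIR_ASSIGNMENT_MV` and all helper names are hypothetical placeholders, not the real `MetadataVersion` constant or API.

```python
from typing import Iterable, List, Tuple

# Hypothetical placeholder: the real minimum metadata version for directory
# assignment is defined by KIP-858 / Kafka's MetadataVersion, not here.
MIN_DIR_ASSIGNMENT_MV: Tuple[int, int, int] = (3, 7, 2)


def parse_metadata_version(mv: str) -> Tuple[int, int, int]:
    """Parse a metadata version string such as '3.6-IV2' into (3, 6, 2)."""
    release, iv = mv.split("-IV")
    major, minor = release.split(".")
    return int(major), int(minor), int(iv)


def dir_assignment_supported(mv: str) -> bool:
    """True if the finalized metadata version is new enough (tuples compare lexicographically)."""
    return parse_metadata_version(mv) >= MIN_DIR_ASSIGNMENT_MV


def assignments_to_send(mv: str, pending: Iterable[str]) -> List[str]:
    """Broker-side gate: send nothing while the metadata version is too old,
    instead of sending requests the controller is certain to reject."""
    if not dir_assignment_supported(mv):
        return []
    return list(pending)
```

With `3.6-IV2`, as in Jakub's cluster, `assignments_to_send` returns an empty list, so no directory-assignment request would go out. Whether pending assignments should then be held until the metadata version is upgraded, or dropped, is a design decision this sketch deliberately leaves open.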
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:03 AM Luke Chen <show...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Stanislav,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I commented in the "Apache Kafka 3.7.0 Release" thread, but maybe you
>>>>>>>>>>>>>> missed it. Cross-posting here:
>>>>>>>>>>>>>> cross-posting here:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There is a bug, KAFKA-16101
>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-16101>, reporting that
>>>>>>>>>>>>>> "Kafka cluster will be unavailable during KRaft migration rollback".
>>>>>>>>>>>>>> The impact of this issue is that if brokers try to roll back to ZK mode
>>>>>>>>>>>>>> during the KRaft migration process, there will be a period of time when
>>>>>>>>>>>>>> the cluster is unavailable.
>>>>>>>>>>>>>> Since ZK-to-KRaft migration is a production-ready feature, I think
>>>>>>>>>>>>>> this should be addressed soon.
>>>>>>>>>>>>>> Do you think this is a blocker for v3.7.0?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>> Luke
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton <fearthecel...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks, Kirk!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> @Stanislav--do you believe that this warrants a new RC?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Jan 12, 2024, 19:08 Kirk True <k...@kirktrue.pro> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Chris/Stanislav,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm working on the 'Unable to find FetchSessionHandler' log problem
>>>>>>>>>>>>>>>> (KAFKA-16029) and have put out a draft PR
>>>>>>>>>>>>>>>> (https://github.com/apache/kafka/pull/15186). I will use the quickstart
>>>>>>>>>>>>>>>> approach as a second means to reproduce/verify while I wait for the
>>>>>>>>>>>>>>>> PR's Jenkins job to finish.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Kirk
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
>>>>>>>>>>>>>>>>> Hi Stanislav,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for running this release!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> To verify, I:
>>>>>>>>>>>>>>>>> - Built from source using Java 11 with both:
>>>>>>>>>>>>>>>>> - - the 3.7.0-rc2 tag on GitHub
>>>>>>>>>>>>>>>>> - - the kafka-3.7.0-src.tgz artifact from
>>>>>>>>>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>>>>>>>>>>>>>>>>> - Checked signatures and checksums
>>>>>>>>>>>>>>>>> - Ran the quickstart using both:
>>>>>>>>>>>>>>>>> - - The kafka_2.13-3.7.0.tgz artifact from
>>>>>>>>>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
>>>>>>>>>>>>>>>>> and Scala 2.13 in KRaft mode
>>>>>>>>>>>>>>>>> - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
>>>>>>>>>>>>>>>>> - Ran all unit tests
>>>>>>>>>>>>>>>>> - Ran all integration tests for Connect and MM2
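Chris's "Checked signatures and checksums" step can be partly scripted. The Python sketch below covers only the checksum half. It assumes the published `.sha512` file holds either a bare hex digest or a `filename: DIGEST` line whose hex digits may be upper-case and split by whitespace (the layout `gpg --print-md SHA512` produces); the function names are hypothetical. Signatures would still be checked separately, e.g. with `gpg --verify` after importing the KEYS file.

```python
import hashlib
import hmac


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large release tarballs are not read into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_published_digest(text: str) -> str:
    """Drop an optional 'filename:' prefix, remove all whitespace, lower-case the hex."""
    if ":" in text:
        text = text.split(":", 1)[1]
    return "".join(text.split()).lower()


def checksum_matches(artifact_path: str, published: str) -> bool:
    """Compare the computed SHA-512 of the artifact against the published value."""
    expected = normalize_published_digest(published)
    return hmac.compare_digest(sha512_of_file(artifact_path), expected)
```

Normalizing the published text, rather than the computed digest, keeps the comparison independent of how the `.sha512` file happens to be laid out.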
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I found two minor areas of concern:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. (Possibly a blocker)
>>>>>>>>>>>>>>>>> When running the quickstart, I noticed this ERROR-level log message being
>>>>>>>>>>>>>>>>> emitted frequently (but not every time) when I killed my console consumer
>>>>>>>>>>>>>>>>> via ctrl-C:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
>>>>>>>>>>>>>>>>>> groupId=console-consumer-74388] Unable to find FetchSessionHandler for
>>>>>>>>>>>>>>>>>> node 1. Ignoring fetch response
>>>>>>>>>>>>>>>>>> (org.apache.kafka.clients.consumer.internals.AbstractFetch)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see that this error message is already reported in
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
>>>>>>>>>>>>>>>>> prioritize fixing it for this release. I know it's probably benign, but
>>>>>>>>>>>>>>>>> it's really not a good look for us when basic operations log error
>>>>>>>>>>>>>>>>> messages, and it may give new users some headaches.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. (Probably not a blocker)
>>>>>>>>>>>>>>>>> The following unit tests failed the first time around, and all of them
>>>>>>>>>>>>>>>>> passed the second time I ran them:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
>>>>>>>>>>>>>>>>> - (clients) SelectorTest.testConnectionsByClientMetric()
>>>>>>>>>>>>>>>>> - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
>>>>>>>>>>>>>>>>> - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
>>>>>>>>>>>>>>>>> thought I fixed this one! 🤬🤬)
>>>>>>>>>>>>>>>>> - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
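Chris's rerun (fail on the first run, pass on the second) and Stan's note about separating actual problems from flakes follow a simple triage rule. A minimal Python sketch of that rule, assuming each test's results from repeated runs of the same build are collected as a list of booleans (True for a pass); the function and label names are hypothetical.

```python
from typing import Dict, List


def classify_test_runs(runs: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label each test from repeated runs of the same build (True = passed)."""
    labels: Dict[str, str] = {}
    for test, outcomes in runs.items():
        if not outcomes:
            labels[test] = "not run"
        elif all(outcomes):
            labels[test] = "passing"
        elif not any(outcomes):
            # Fails on every run: most likely a real problem worth triaging first.
            labels[test] = "consistently failing"
        else:
            # Mixed results on identical code: the usual signature of a flaky test.
            labels[test] = "flaky"
    return labels
```

Under this rule every test in Chris's list would be labeled `flaky`. A single passing rerun does not prove a test is healthy, so more reruns make the label more trustworthy.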
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks again for your work on this release, and congratulations to Kafka
>>>>>>>>>>>>>>>>> Streams for having zero flaky unit tests during my highly experimental
>>>>>>>>>>>>>>>>> single-laptop run!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
>>>>>>>>>>>>>>>>> <stanis...@confluent.io.invalid> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hello Kafka users, developers, and client-developers,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This is the first candidate for release of Apache Kafka 3.7.0.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Note it's named "RC2" because I had a few "failed" RCs that I had
>>>>>>>>>>>>>>>>>> cut/uploaded but ultimately had to scrap prior to announcing, due to
>>>>>>>>>>>>>>>>>> new blockers arriving before I could even announce them.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Further, I haven't yet been able to set up the system tests
>>>>>>>>>>>>>>>>>> successfully. And the integration/unit tests do have a few failures that
>>>>>>>>>>>>>>>>>> I have to spend time triaging. I would appreciate any help if anyone
>>>>>>>>>>>>>>>>>> notices failing tests in areas where they're subject-matter experts.
>>>>>>>>>>>>>>>>>> Expect me to follow up in a day or two with more detailed analysis.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Major changes include:
>>>>>>>>>>>>>>>>>> - Early Access to KIP-848 - the next generation of the consumer
>>>>>>>>>>>>>>>>>> rebalance protocol
>>>>>>>>>>>>>>>>>> - KIP-858: Adding JBOD support to KRaft
>>>>>>>>>>>>>>>>>> - KIP-714: Observability into Client metrics via a standardized
>>>>>>>>>>>>>>>>>> interface
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Check more information in the WIP blog post:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka-site/pull/578
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Release notes for the 3.7.0 release:
>>>>>>>>>>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> *** Please download, test and vote by Thursday, January 18, 9am PT ***
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Usually these deadlines tend to be 2-3 days, but because this is the
>>>>>>>>>>>>>>>>>> first RC and the tests have not run yet, I am giving it a bit more time.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Kafka's KEYS file containing PGP keys we use to sign the release:
>>>>>>>>>>>>>>>>>> https://kafka.apache.org/KEYS
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Release artifacts to be voted upon (source and binary):
>>>>>>>>>>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Docker release artifact to be voted upon:
>>>>>>>>>>>>>>>>>> apache/kafka:3.7.0-rc2
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Maven artifacts to be voted upon:
>>>>>>>>>>>>>>>>>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Javadoc:
>>>>>>>>>>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/releases/tag/3.7.0-rc2
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Documentation:
>>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/documentation.html
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Protocol:
>>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/protocol.html
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Successful Jenkins builds for the 3.7 branch:
>>>>>>>>>>>>>>>>>> Unit/integration tests:
>>>>>>>>>>>>>>>>>> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/
>>>>>>>>>>>>>>>>>> There are failing tests here. I have to follow up by triaging some of
>>>>>>>>>>>>>>>>>> the failures and figuring out whether they're actual problems or simply
>>>>>>>>>>>>>>>>>> flakes.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> System tests:
>>>>>>>>>>>>>>>>>> https://jenkins.confluent.io/job/system-test-kafka/job/3.7/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No successful system test runs yet. I am working on getting the job to
>>>>>>>>>>>>>>>>>> run.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> * Successful Docker Image GitHub Actions Pipeline for 3.7 branch:
>>>>>>>>>>>>>>>>>> Attached are the scan_report and report_jvm output files from the
>>>>>>>>>>>>>>>>>> Docker Build run:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/actions/runs/7486094960/job/20375761673
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> And the final Docker image build job - Docker Build Test Pipeline:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/actions/runs/7486178277
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The image is apache/kafka:3.7.0-rc2 -
>>>>>>>>>>>>>>>>>> https://hub.docker.com/layers/apache/kafka/3.7.0-rc2/images/sha256-5b4707c08170d39549fbb6e2a3dbb83936a50f987c0c097f23cb26b4c210c226?context=explore
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> /**************************************
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Stanislav Kozlovski
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stanislav
>>>>>>>>>>>> 
>>>> 
>>>> -- 
>>>> Best,
>>>> Stanislav
>>> 
>> 
> 
