Hi Stanislav, I merged https://github.com/apache/kafka/pull/15308 into trunk. I'll let you cherry-pick it to 3.7.
I think fixing the absolute show stoppers and calling JBOD support in KRaft early access in 3.7.0 is probably the right call. Even without the bugs we found, there is still quite a bit of JBOD follow-up work to do (KAFKA-16061) + system tests and documentation updates. Thanks, Mickael On Fri, Feb 2, 2024 at 4:49 PM Stanislav Kozlovski <stanis...@confluent.io.invalid> wrote: > > Thanks for the work everybody. Providing a status update at the end of the > week: > > - docs change explaining migration > <https://github.com/apache/kafka/pull/15193> was merged > - the blocker KAFKA-16162 <https://github.com/apache/kafka/pull/15270> was > merged > - the blocker KAFKA-14616 <https://github.com/apache/kafka/pull/15230> was > merged > - a small blocker problem with the shadow jar plugin > <https://github.com/apache/kafka/pull/15308> > - the blockers KAFKA-16157 & KAFKA-16195 aren't merged > - the good-to-have KAFKA-16082 isn't merged > > I think we should prioritize merging KAFKA-16195 and *call JBOD EA*. I > question whether we may find more blocker bugs in the next RC. > The release is late by approximately a month so far, so I do want to scope > down aggressively to meet the time-based goal. > > Best, > Stanislav > > On Mon, Jan 29, 2024 at 5:46 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> > wrote: > > > Hi Stan and Gaurav, > > Just to clarify some points mentioned here before > > KAFKA-14616: I raised it a year ago so it's not related to JBOD work. It is > > rather a blocker bug for KRAFT in general. The PR from Colin should fix > > this. I am not sure if it is a blocker for 3.7 per se as it was a major bug > > since 3.3 and got missed from all other releases. > > > > Regarding the JBOD's work: > > KAFKA-16082: Is not a blocker for 3.7 instead it's a nice fix. The pr > > https://github.com/apache/kafka/pull/15136 is quite a small one and was > > approved by Proven and I but it is waiting for a committer's approval. > > KAFKA-16162: This is a blocker for 3.7. 
Same it’s a small pr > > https://github.com/apache/kafka/pull/15270 and it is approved Proven and > > I and the PR is waiting for committer's approval. > > KAFKA-16157: This is a blocker for 3.7. There is one small suggestion for > > the pr https://github.com/apache/kafka/pull/15263 but I don't think any > > of the current feedback is blocking the pr from getting approved. Assuming > > we get a committer's approval on it. > > KAFKA-16195: Same it's a blocker but it has approval from Proven and I > > and we are waiting for committer's approval on the pr > > https://github.com/apache/kafka/pull/15262. > > > > If we can’t get a committer approval for KAFKA-16162, KAFKA-16157 and > > KAFKA-16195 in time for 3.7 then we can mark JBOD as early release > > assuming we merge at least KAFKA-16195. > > > > Regards, > > Omnia > > > > > On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote: > > > > > > Apologies, I duplicated KAFKA-16157 twice in my previous message. I > > intended to mention KAFKA-16195 > > > with the PR at https://github.com/apache/kafka/pull/15262 as the second > > JIRA. > > > > > > Thanks, > > > Gaurav > > > > > >> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote: > > >> > > >> Hi Stan, > > >> > > >> I wanted to share some updates about the bugs you shared earlier. > > >> > > >> - KAFKA-14616: I've reviewed and tested the PR from Colin and have > > observed > > >> the fix works as intended. > > >> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the > > proposed fix. I've > > >> therefore raised https://github.com/apache/kafka/pull/15270 following > > a discussion with Luke in JIRA. > > >> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm > > awaiting > > >> feedback/reviews at https://github.com/apache/kafka/pull/15136 > > >> > > >> In addition to the above, there are 2 JIRAs I'd like to bring > > everyone's attention to: > > >> > > >> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a > > blocker. 
I've raised > > >> https://github.com/apache/kafka/pull/15263 and am awaiting reviews on > > it. > > >> - KAFKA-16157: I raised this yesterday and have addressed feedback from > > Luke. This should > > >> hopefully get merged soon. > > >> > > >> Regards, > > >> Gaurav > > >> > > >> > > >>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote: > > >>> > > >>> Hi Stanislav, > > >>> > > >>> Thanks for bringing these JIRAs/PRs up. > > >>> > > >>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week > > and I hope to have some feedback > > >>> by Friday. I gather the latter JIRA is marked as a WIP by Proven and > > he's away. I'll try to build on his work in the meantime. > > >>> > > >>> As for KAFKA-16082, we haven't been able to deduce a data loss > > scenario. There's a PR open > > >>> by me for promoting an abandoned future replica with approvals from > > Omnia and Proven, > > >>> so I'd appreciate a committer reviewing it. > > >>> > > >>> Regards, > > >>> Gaurav > > >>> > > >>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski > > >>> <stanis...@confluent.io.INVALID> > > wrote: > > >>>> > > >>>> Hey all, I figured I'd give an update about what known blockers we > > have > > >>>> right now: > > >>>> > > >>>> - KAFKA-16101: KRaft migration rollback documentation is incorrect - > > >>>> https://github.com/apache/kafka/pull/15193; This need not block RC > > >>>> creation, but we need the docs updated so that people can test > > properly > > >>>> - KAFKA-14616: Topic recreation with offline broker causes permanent > > URPs - > > >>>> https://github.com/apache/kafka/pull/15230 ; I am of the > > understanding that > > >>>> this is blocking JBOD for 3.7 > > >>>> - KAFKA-16162: New created topics are unavailable after upgrading to > > 3.7 - > > >>>> a strict blocker with an open PR > > https://github.com/apache/kafka/pull/15232 > > >>>> - although I understand Proven is out of office > > >>>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - 
> > I am > > >>>> hearing mixed opinions on whether this is a blocker ( > > >>>> https://github.com/apache/kafka/pull/15136) > > >>>> > > >>>> Given that there are 3 JBOD blocker bugs, and I am not confident they > > will > > >>>> all be merged this week - I am on the edge of voting to revert JBOD > > from > > >>>> this release, or mark it early access. > > >>>> > > >>>> By all accounts, it seems that if we keep with JBOD the release will > > have > > >>>> to spill into February, which is a month extra from the time-based > > release > > >>>> plan we had of start of January. > > >>>> > > >>>> Can I ask others for an opinion? > > >>>> > > >>>> Best, > > >>>> Stan > > >>>> > > >>>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen <show...@gmail.com> wrote: > > >>>> > > >>>>> Hi all, > > >>>>> > > >>>>> I think I've found another blocker issue: KAFKA-16162 > > >>>>> <https://issues.apache.org/jira/browse/KAFKA-16162> . > > >>>>> The impact is after upgrading to 3.7.0, any new created > > topics/partitions > > >>>>> will be unavailable. > > >>>>> I've put my findings in the JIRA. > > >>>>> > > >>>>> Thanks. > > >>>>> Luke > > >>>>> > > >>>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax <mj...@apache.org> > > wrote: > > >>>>> > > >>>>>> Stan, thanks for driving this all forward! Excellent job. > > >>>>>> > > >>>>>> About > > >>>>>> > > >>>>>>> StreamsStandbyTask - > > https://issues.apache.org/jira/browse/KAFKA-16141 > > >>>>>>> StreamsUpgradeTest - > > https://issues.apache.org/jira/browse/KAFKA-16139 > > >>>>>> > > >>>>>> For `StreamsUpgradeTest` it was a test setup issue and should be > > fixed > > >>>>>> now in trunk and 3.7 (and actually also in 3.6...) > > >>>>>> > > >>>>>> For `StreamsStandbyTask` the failing test exposes a regression bug, > > so > > >>>>>> it's a blocker. I updated the ticket accordingly. We already have an > > >>>>>> open PR that reverts the code introducing the regression. 
> > >>>>>> > > >>>>>> > > >>>>>> -Matthias > > >>>>>> > > >>>>>> On 1/17/24 9:44 AM, Proven Provenzano wrote: > > >>>>>>> We have another blocking issue for the RC : > > >>>>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is > > similar > > >>>>>> to > > >>>>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue > > >>>>> however > > >>>>>>> can lead to the new topic having partitions that a producer cannot > > >>>>> write > > >>>>>> to. > > >>>>>>> > > >>>>>>> --Proven > > >>>>>>> > > >>>>>>> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano < > > >>>>>> pprovenz...@confluent.io> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> > > >>>>>>>> I have a PR https://github.com/apache/kafka/pull/15197 for > > >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16131 that is > > building > > >>>>> now. > > >>>>>>>> --Proven > > >>>>>>>> > > >>>>>>>> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz <ja...@scholz.cz> > > wrote: > > >>>>>>>> > > >>>>>>>>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found > > is a > > >>>>>>>>> blocker bug because it * > > >>>>>>>>> *> will generate huge amount of logspam. I guess we didn't find > > it in > > >>>>>>>>> junit > > >>>>>>>>> tests * > > >>>>>>>>> *> since logspam doesn't fail the automated tests. But certainly > > it's > > >>>>>> not > > >>>>>>>>> suitable * > > >>>>>>>>> *> for production. Did you file a JIRA yet?* > > >>>>>>>>> > > >>>>>>>>> Hi Colin, > > >>>>>>>>> > > >>>>>>>>> I opened https://issues.apache.org/jira/browse/KAFKA-16131. > > >>>>>>>>> > > >>>>>>>>> Thanks & Regards > > >>>>>>>>> Jakub > > >>>>>>>>> > > >>>>>>>>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe <cmcc...@apache.org > > > > > >>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi Stanislav, > > >>>>>>>>>> > > >>>>>>>>>> Thanks for making the first RC. The fact that it's titled RC2 is > > >>>>>> messing > > >>>>>>>>>> with my mind a bit. 
I hope this doesn't make people think that > > we're > > >>>>>>>>>> farther along than we are, heh. > > >>>>>>>>>> > > >>>>>>>>>> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote: > > >>>>>>>>>>> *> Nice catch! It does seem like we should have gated this > > behind > > >>>>> the > > >>>>>>>>>>> metadata> version as KIP-858 implies. Is the cluster configured > > >>>>> with > > >>>>>>>>>>> multiple log> dirs? What is the impact of the error messages?* > > >>>>>>>>>>> > > >>>>>>>>>>> I did not observe any obvious impact. I was able to send and > > >>>>> receive > > >>>>>>>>>>> messages as normally. But to be honest, I have no idea what > > else > > >>>>>>>>>>> this might impact, so I did not try anything special. > > >>>>>>>>>>> > > >>>>>>>>>>> I think everyone upgrading an existing KRaft cluster will go > > >>>>> through > > >>>>>>>>> this > > >>>>>>>>>>> stage (running Kafka 3.7 with an older metadata version for at > > >>>>> least > > >>>>>> a > > >>>>>>>>>>> while). So even if it is just a logged exception without any > > other > > >>>>>>>>>> impact I > > >>>>>>>>>>> wonder if it might scare users from upgrading. But I leave it > > to > > >>>>>>>>> others > > >>>>>>>>>> to > > >>>>>>>>>>> decide if this is a blocker or not. > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Hi Jakub, > > >>>>>>>>>> > > >>>>>>>>>> Thanks for trying the RC. I think what you found is a blocker > > bug > > >>>>>>>>> because > > >>>>>>>>>> it will generate huge amount of logspam. I guess we didn't find > > it > > >>>>> in > > >>>>>>>>> junit > > >>>>>>>>>> tests since logspam doesn't fail the automated tests. But > > certainly > > >>>>>> it's > > >>>>>>>>>> not suitable for production. Did you file a JIRA yet? > > >>>>>>>>>> > > >>>>>>>>>>> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski > > >>>>>>>>>>> <stanis...@confluent.io.invalid> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Hey Luke, > > >>>>>>>>>>>> > > >>>>>>>>>>>> This is an interesting problem. 
Given the fact that the KIP > > for > > >>>>>>>>> having a > > >>>>>>>>>>>> 3.8 release passed, I think it tips the scale towards not > > >>>>> calling > > >>>>>>>>>> this a > > >>>>>>>>>>>> blocker and expecting it to be solved in 3.7.1. > > >>>>>>>>>>>> > > >>>>>>>>>>>> It is unfortunate that it would not seem safe to migrate to > > KRaft > > >>>>> in > > >>>>>>>>>> 3.7.0 > > >>>>>>>>>>>> (given the inability to rollback safely), but if that's true > > - the > > >>>>>>>>> same > > >>>>>>>>>>>> case would apply for 3.6.0. So in any case users would be > > >>>>> expected > > >>>>>>>>> to > > >>>>>>>>>> use a > > >>>>>>>>>>>> patch release for this. > > >>>>>>>>>> > > >>>>>>>>>> Hi Luke, > > >>>>>>>>>> > > >>>>>>>>>> Thanks for testing rollback. I think this is a case where the > > >>>>>>>>>> documentation is wrong. The intention was for the steps to > > >>>>>> basically > > >>>>>>>>> be: > > >>>>>>>>>> > > >>>>>>>>>> 1. roll all the brokers into zk mode, but with migration enabled > > >>>>>>>>>> 2. take down the kraft quorum > > >>>>>>>>>> 3. rmr /controller, allowing a hybrid broker to take over. > > >>>>>>>>>> 4. roll all the brokers into zk mode without migration enabled > > (if > > >>>>>>>>> desired) > > >>>>>>>>>> > > >>>>>>>>>> With these steps, there isn't really unavailability since a ZK > > >>>>>>>>> controller > > >>>>>>>>>> can be elected quickly after the kraft quorum is gone. > > >>>>>>>>>> > > >>>>>>>>>>>> Further, since we will have a 3.8 release - it is > > >>>>>>>>>>>> likely we will ultimately recommend users upgrade from that > > >>>>> version > > >>>>>>>>>> given > > >>>>>>>>>>>> its aim is to have strategic KRaft feature parity with ZK. > > >>>>>>>>>>>> That being said, I am not 100% on this. Let me know whether > > you > > >>>>>> think > > >>>>>>>>>> this > > >>>>>>>>>>>> should block the release, Luke. 
I am also tagging Colin and > > David > > >>>>> to > > >>>>>>>>>> weigh > > >>>>>>>>>>>> in with their opinions, as they worked on the migration logic. > > >>>>>>>>>> > > >>>>>>>>>> The rollback docs are new in 3.7 so the fact that they're wrong > > is a > > >>>>>>>>> clear > > >>>>>>>>>> blocker, I think. But easy to fix, I believe. I will create a > > PR. > > >>>>>>>>>> > > >>>>>>>>>> best, > > >>>>>>>>>> Colin > > >>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Hey Kirk and Chris, > > >>>>>>>>>>>> > > >>>>>>>>>>>> Unless I'm missing something - KAFKA-16029 is simply a > > bad log > > >>>>>>>>> due > > >>>>>>>>>> to > > >>>>>>>>>>>> improper closing. And the PR description implies this has been > > >>>>>>>>> present > > >>>>>>>>>>>> since 3.5. While annoying, I don't see a strong reason for > > this to > > >>>>>>>>> block > > >>>>>>>>>>>> the release. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Hey Jakub, > > >>>>>>>>>>>> > > >>>>>>>>>>>> Nice catch! It does seem like we should have gated this > > behind the > > >>>>>>>>>> metadata > > >>>>>>>>>>>> version as KIP-858 implies. Is the cluster configured with > > >>>>> multiple > > >>>>>>>>> log > > >>>>>>>>>>>> dirs? What is the impact of the error messages? > > >>>>>>>>>>>> > > >>>>>>>>>>>> Tagging Igor (the author of the KIP) to weigh in. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Best, > > >>>>>>>>>>>> Stanislav > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz <ja...@scholz.cz > > > > > >>>>>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I was trying the RC2 and ran into the following issue ... 
> > when I > > >>>>>>>>> run > > >>>>>>>>>>>>> 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2 > > >>>>>>>>> metadata > > >>>>>>>>>>>>> version, I seem to be getting repeated errors like this in > > the > > >>>>>>>>>> controller > > >>>>>>>>>>>>> logs: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] > > >>>>>>>>>>>> assignReplicasToDirs: > > >>>>>>>>>>>>> event failed with UnsupportedVersionException in 15 > > microseconds. > > >>>>>>>>>>>>> (org.apache.kafka.controller.QuorumController) > > >>>>>>>>>>>>> [quorum-controller-0-event-handler] > > >>>>>>>>>>>>> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] > > >>>>> Unexpected > > >>>>>>>>>> error > > >>>>>>>>>>>>> handling request > > RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, > > >>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, > > headerVersion=2) > > >>>>> -- > > >>>>>>>>>>>>> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, > > >>>>>>>>>>>>> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, > > >>>>>>>>>>>>> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, > > >>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=2), > > >>>>>>>>>>>>> PartitionData(partitionIndex=1)]), > > >>>>>>>>>>>>> TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, > > >>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=0)])])]) with > > context > > >>>>>>>>>>>>> > > >>>>> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, > > >>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, > > headerVersion=2), > > >>>>>>>>>>>>> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', > > >>>>>>>>>> clientAddress=/ > > >>>>>>>>>>>>> 172.16.14.217, > > principal=User:CN=my-cluster-kafka,O=io.strimzi, > > >>>>>>>>>>>>> listenerName=ListenerName(CONTROLPLANE-9090), > > >>>>> securityProtocol=SSL, > > >>>>>>>>>>>>> > > >>>>> clientInformation=ClientInformation(softwareName=apache-kafka-java, > > >>>>>>>>>>>>> softwareVersion=3.7.0), 
fromPrivilegedListener=false, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2 > > >>>>>>>>>>>>> ]) > > >>>>>>>>>>>>> (kafka.server.ControllerApis) > > [quorum-controller-0-event-handler] > > >>>>>>>>>>>>> java.util.concurrent.CompletionException: > > >>>>>>>>>>>>> org.apache.kafka.common.errors.UnsupportedVersionException: > > >>>>>>>>> Directory > > >>>>>>>>>>>>> assignment is not supported yet. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > 
>>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > > >>>>>>>>>>>>> at > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > > >>>>>>>>>>>>> at java.base/java.lang.Thread.run(Thread.java:840) > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Caused by: > > >>>>>>>>> org.apache.kafka.common.errors.UnsupportedVersionException: > > >>>>>>>>>>>>> Directory assignment is not supported yet. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Is that expected? I guess with the metadata version set to > > >>>>>>>>> 3.6-IV2, it > > >>>>>>>>>>>>> makes sense that the request is not supported. But shouldn't > > then > > >>>>>>>>> the > > >>>>>>>>>>>>> request not be sent at all by the brokers? 
(I did not open > > a > > >>>>> JIRA > > >>>>>>>>>> for > > >>>>>>>>>>>> it, > > >>>>>>>>>>>>> but I can open one if you agree this is not expected) > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Thanks & Regards > > >>>>>>>>>>>>> Jakub > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:03 AM Luke Chen <show...@gmail.com > > > > > >>>>>>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> Hi Stanislav, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I commented in the "Apache Kafka 3.7.0 Release" thread, but > > >>>>> maybe > > >>>>>>>>>> you > > >>>>>>>>>>>>>> missed it. > > >>>>>>>>>>>>>> cross-posting here: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> There is a bug KAFKA-16101 > > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-16101> > > reporting > > >>>>>>>>> that > > >>>>>>>>>>>>> "Kafka > > >>>>>>>>>>>>>> cluster will be unavailable during KRaft migration > > rollback". > > >>>>>>>>>>>>>> The impact of this issue is that if brokers try to > > rollback to > > >>>>>>>>> ZK > > >>>>>>>>>> mode > > >>>>>>>>>>>>>> during the KRaft migration process, there will be a period of > > time > > >>>>>>>>> the > > >>>>>>>>>>>>> cluster > > >>>>>>>>>>>>>> is unavailable. > > >>>>>>>>>>>>>> Since the ZK to KRaft migration is a production-ready > > >>>>>>>>> feature, I > > >>>>>>>>>>>>> think > > >>>>>>>>>>>>>> this should be addressed soon. > > >>>>>>>>>>>>>> Do you think this is a blocker for v3.7.0? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Thanks. > > >>>>>>>>>>>>>> Luke > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton < > > >>>>>>>>>> fearthecel...@gmail.com > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Thanks, Kirk! > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> @Stanislav--do you believe that this warrants a new RC? 
> > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Fri, Jan 12, 2024, 19:08 Kirk True <k...@kirktrue.pro> > > >>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Hi Chris/Stanislav, > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I'm working on the 'Unable to find FetchSessionHandler' > > log > > >>>>>>>>>> problem > > >>>>>>>>>>>>>>>> (KAFKA-16029) and have put out a draft PR ( > > >>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/15186). I will use > > the > > >>>>>>>>>>>>> quickstart > > >>>>>>>>>>>>>>>> approach as a second means to reproduce/verify while I > > wait > > >>>>>>>>> for > > >>>>>>>>>> the > > >>>>>>>>>>>>>> PR's > > >>>>>>>>>>>>>>>> Jenkins job to finish. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Thanks, > > >>>>>>>>>>>>>>>> Kirk > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote: > > >>>>>>>>>>>>>>>>> Hi Stanislav, > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Thanks for running this release! > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> To verify, I: > > >>>>>>>>>>>>>>>>> - Built from source using Java 11 with both: > > >>>>>>>>>>>>>>>>> - - the 3.7.0-rc2 tag on GitHub > > >>>>>>>>>>>>>>>>> - - the kafka-3.7.0-src.tgz artifact from > > >>>>>>>>>>>>>>>>> > > >>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ > > >>>>>>>>>>>>>>>>> - Checked signatures and checksums > > >>>>>>>>>>>>>>>>> - Ran the quickstart using both: > > >>>>>>>>>>>>>>>>> - - The kafka_2.13-3.7.0.tgz artifact from > > >>>>>>>>>>>>>>>>> > > >>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ > > >>>>>>>>>>>> with > > >>>>>>>>>>>>>> Java > > >>>>>>>>>>>>>>>> 11 > > >>>>>>>>>>>>>>>>> and Scala 13 in KRaft mode > > >>>>>>>>>>>>>>>>> - - Our shiny new broker Docker image, > > >>>>>>>>> apache/kafka:3.7.0-rc2 > > >>>>>>>>>>>>>>>>> - Ran all unit tests > > >>>>>>>>>>>>>>>>> - Ran all integration tests for Connect and MM2 > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I 
found two minor areas for concern: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> 1. (Possibly a blocker) > > >>>>>>>>>>>>>>>>> When running the quickstart, I noticed this ERROR-level > > log > > >>>>>>>>>>>> message > > >>>>>>>>>>>>>>> being > > >>>>>>>>>>>>>>>>> emitted frequently (but not every time) when I killed my > > >>>>>>>>>> console > > >>>>>>>>>>>>>>> consumer > > >>>>>>>>>>>>>>>>> via ctrl-C: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> [2024-01-12 11:00:31,088] ERROR [Consumer > > >>>>>>>>>>>>>> clientId=console-consumer, > > >>>>>>>>>>>>>>>>> groupId=console-consumer-74388] Unable to find > > >>>>>>>>>>>> FetchSessionHandler > > >>>>>>>>>>>>>> for > > >>>>>>>>>>>>>>>> node > > >>>>>>>>>>>>>>>>> 1. Ignoring fetch response > > >>>>>>>>>>>>>>>>> > > (org.apache.kafka.clients.consumer.internals.AbstractFetch) > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I see that this error message is already reported in > > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16029. I > > >>>>>>>>> think we > > >>>>>>>>>>>>> should > > >>>>>>>>>>>>>>>>> prioritize fixing it for this release. I know it's > > probably > > >>>>>>>>>>>> benign > > >>>>>>>>>>>>>> but > > >>>>>>>>>>>>>>>> it's > > >>>>>>>>>>>>>>>>> really not a good look for us when basic operations log > > >>>>>>>>> error > > >>>>>>>>>>>>>> messages, > > >>>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>> it may give new users some headaches. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> 2. 
(Probably not a blocker) > > >>>>>>>>>>>>>>>>> The following unit tests failed the first time around, > > and > > >>>>>>>>>> all of > > >>>>>>>>>>>>>> them > > >>>>>>>>>>>>>>>>> passed the second time I ran them: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> - (clients) > > >>>>>>>>>>>>>>>> > > >>>>>>>>> ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup() > > >>>>>>>>>>>>>>>>> - (clients) SelectorTest.testConnectionsByClientMetric() > > >>>>>>>>>>>>>>>>> - (clients) > > >>>>>>>>> Tls13SelectorTest.testConnectionsByClientMetric() > > >>>>>>>>>>>>>>>>> - (connect) > > >>>>>>>>>>>>>> TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound > > >>>>>>>>>>>>>>> (I > > >>>>>>>>>>>>>>>>> thought I fixed this one! 🤬🤬) > > >>>>>>>>>>>>>>>>> - (core) > > >>>>>>>>>> ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2] > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Thanks again for your work on this release, and > > >>>>>>>>>> congratulations > > >>>>>>>>>>>> to > > >>>>>>>>>>>>>>> Kafka > > >>>>>>>>>>>>>>>>> Streams for having zero flaky unit tests during my > > >>>>>>>>>>>>>> highly-experimental > > >>>>>>>>>>>>>>>>> single laptop run! > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Chris > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski > > >>>>>>>>>>>>>>>>> <stanis...@confluent.io.invalid> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Hello Kafka users, developers, and client-developers, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> This is the first candidate for release of Apache Kafka > > >>>>>>>>>> 3.7.0. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Note it's named "RC2" because I had a few "failed" RCs > > >>>>>>>>> that > > >>>>>>>>>> I > > >>>>>>>>>>>> had > > >>>>>>>>>>>>>>>>>> cut/uploaded but ultimately had to scrap prior to > > >>>>>>>>> announcing > > >>>>>>>>>>>> due > > >>>>>>>>>>>>> to > > >>>>>>>>>>>>>>> new > > >>>>>>>>>>>>>>>>>> blockers arriving before I could even announce them. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Further - I haven't yet been able to set up the system > > >>>>>>>>> tests > > >>>>>>>>>>>>>>>> successfully. > > >>>>>>>>>>>>>>>>>> And the integration/unit tests do have a few failures > > >>>>>>>>> that I > > >>>>>>>>>>>> have > > >>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>> spend > > >>>>>>>>>>>>>>>>>> time triaging. I would appreciate any help in case > > anyone > > >>>>>>>>>>>> notices > > >>>>>>>>>>>>>> any > > >>>>>>>>>>>>>>>> tests > > >>>>>>>>>>>>>>>>>> failing that they're subject matters experts in. Expect > > >>>>>>>>> me > > >>>>>>>>>> to > > >>>>>>>>>>>>>> follow > > >>>>>>>>>>>>>>>> up in > > >>>>>>>>>>>>>>>>>> a day or two with more detailed analysis. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Major changes include: > > >>>>>>>>>>>>>>>>>> - Early Access to KIP-848 - the next generation of the > > >>>>>>>>>> consumer > > >>>>>>>>>>>>>>>> rebalance > > >>>>>>>>>>>>>>>>>> protocol > > >>>>>>>>>>>>>>>>>> - KIP-858: Adding JBOD support to KRaft > > >>>>>>>>>>>>>>>>>> - KIP-714: Observability into Client metrics via a > > >>>>>>>>>> standardized > > >>>>>>>>>>>>>>>> interface > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Check more information in the WIP blog post: > > >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka-site/pull/578 > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Release notes for the 3.7.0 release: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> *** Please download, test and vote by Thursday, January > > >>>>>>>>> 18, > > >>>>>>>>>> 9am > > >>>>>>>>>>>>> PT > > >>>>>>>>>>>>>>> *** > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Usually these deadlines tend to be 2-3 days, but due to > > >>>>>>>>> this > > >>>>>>>>>>>>> being > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>> first RC and the tests not having ran yet, I am giving > > >>>>>>>>> it a > > >>>>>>>>>> bit > > >>>>>>>>>>>>>> more > > >>>>>>>>>>>>>>>> time. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Kafka's KEYS file containing PGP keys we use to sign the > > >>>>>>>>>>>> release: > > >>>>>>>>>>>>>>>>>> https://kafka.apache.org/KEYS > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Release artifacts to be voted upon (source and > > binary): > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Docker release artifact to be voted upon: > > >>>>>>>>>>>>>>>>>> apache/kafka:3.7.0-rc2 > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Maven artifacts to be voted upon: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>> > > >>>>> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Javadoc: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/ > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 > > tag: > > >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/releases/tag/3.7.0-rc2 > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Documentation: > > >>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/documentation.html > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Protocol: > > >>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/protocol.html > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Successful Jenkins builds for the 3.7 branch: > > >>>>>>>>>>>>>>>>>> Unit/integration tests: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/ > > >>>>>>>>>>>>>>>>>> There are failing tests here. I have to follow up with > > >>>>>>>>>> triaging > > >>>>>>>>>>>>>> some > > >>>>>>>>>>>>>>> of > > >>>>>>>>>>>>>>>>>> the failures and figuring out if they're actual problems > > >>>>>>>>> or > > >>>>>>>>>>>>> simply > > >>>>>>>>>>>>>>>> flakes. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> System tests: > > >>>>>>>>>>>>>>>> > > https://jenkins.confluent.io/job/system-test-kafka/job/3.7/ > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> No successful system test runs yet. I am working on > > >>>>>>>>> getting > > >>>>>>>>>> the > > >>>>>>>>>>>>> job > > >>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>> run. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> * Successful Docker Image Github Actions Pipeline for > > 3.7 > > >>>>>>>>>>>> branch: > > >>>>>>>>>>>>>>>>>> Attached are the scan_report and report_jvm output files > > >>>>>>>>>> from > > >>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> Docker > > >>>>>>>>>>>>>>>>>> Build run: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > https://github.com/apache/kafka/actions/runs/7486094960/job/20375761673 > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> And the final docker image build job - Docker Build Test > > >>>>>>>>>>>>> Pipeline: > > >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/actions/runs/7486178277 > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> The image is apache/kafka:3.7.0-rc2 - > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>> > > >>>>> > > https://hub.docker.com/layers/apache/kafka/3.7.0-rc2/images/sha256-5b4707c08170d39549fbb6e2a3dbb83936a50f987c0c097f23cb26b4c210c226?context=explore > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> /************************************** > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Thanks, > > >>>>>>>>>>>>>>>>>> Stanislav Kozlovski > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> -- > > >>>>>>>>>>>> Best, > > >>>>>>>>>>>> Stanislav > > >>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >>>> -- > 
> >>>> Best, > > >>>> Stanislav > > >>> > > >> > > > > > > > > > -- > Best, > Stanislav