We have another blocking issue for the RC: https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar to https://issues.apache.org/jira/browse/KAFKA-14616. The new issue, however, can lead to the new topic having partitions that a producer cannot write to.
--Proven

On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <pprovenz...@confluent.io> wrote:

> I have a PR https://github.com/apache/kafka/pull/15197 for
> https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
> --Proven
>
> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz <ja...@scholz.cz> wrote:
>
>> > Hi Jakub,
>> >
>> > Thanks for trying the RC. I think what you found is a blocker bug because it
>> > will generate a huge amount of logspam. I guess we didn't find it in junit tests
>> > since logspam doesn't fail the automated tests. But certainly it's not suitable
>> > for production. Did you file a JIRA yet?
>>
>> Hi Colin,
>>
>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>>
>> Thanks & Regards
>> Jakub
>>
>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe <cmcc...@apache.org> wrote:
>>
>> > Hi Stanislav,
>> >
>> > Thanks for making the first RC. The fact that it's titled RC2 is messing
>> > with my mind a bit. I hope this doesn't make people think that we're
>> > farther along than we are, heh.
>> >
>> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
>> > > > Nice catch! It does seem like we should have gated this behind the
>> > > > metadata version as KIP-858 implies. Is the cluster configured with
>> > > > multiple log dirs? What is the impact of the error messages?
>> > >
>> > > I did not observe any obvious impact. I was able to send and receive
>> > > messages as normal. But to be honest, I have no idea what else
>> > > this might impact, so I did not try anything special.
>> > >
>> > > I think everyone upgrading an existing KRaft cluster will go through this
>> > > stage (running Kafka 3.7 with an older metadata version for at least a
>> > > while). So even if it is just a logged exception without any other impact, I
>> > > wonder if it might scare users from upgrading. But I leave it to others to
>> > > decide if this is a blocker or not.
>> > >
>> >
>> > Hi Jakub,
>> >
>> > Thanks for trying the RC. I think what you found is a blocker bug because
>> > it will generate a huge amount of logspam. I guess we didn't find it in junit
>> > tests since logspam doesn't fail the automated tests. But certainly it's
>> > not suitable for production. Did you file a JIRA yet?
>> >
>> > > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>> > > <stanis...@confluent.io.invalid> wrote:
>> > >
>> > >> Hey Luke,
>> > >>
>> > >> This is an interesting problem. Given the fact that the KIP for having a
>> > >> 3.8 release passed, I think it tips the scale towards not calling this a
>> > >> blocker and expecting it to be solved in 3.7.1.
>> > >>
>> > >> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
>> > >> (given the inability to rollback safely), but if that's true - the same
>> > >> case would apply for 3.6.0. So in any case users would be expected to use a
>> > >> patch release for this.
>> >
>> > Hi Luke,
>> >
>> > Thanks for testing rollback. I think this is a case where the
>> > documentation is wrong. The intention was for the steps to basically be:
>> >
>> > 1. roll all the brokers into zk mode, but with migration enabled
>> > 2. take down the kraft quorum
>> > 3. rmr /controller, allowing a hybrid broker to take over.
>> > 4. roll all the brokers into zk mode without migration enabled (if desired)
>> >
>> > With these steps, there isn't really unavailability since a ZK controller
>> > can be elected quickly after the kraft quorum is gone.
>> >
>> > >> Further, since we will have a 3.8 release - it is
>> > >> likely we will ultimately recommend users upgrade from that version given
>> > >> its aim is to have strategic KRaft feature parity with ZK.
>> > >> That being said, I am not 100% on this. Let me know whether you think this
>> > >> should block the release, Luke.
>> > >> I am also tagging Colin and David to weigh
>> > >> in with their opinions, as they worked on the migration logic.
>> >
>> > The rollback docs are new in 3.7 so the fact that they're wrong is a clear
>> > blocker, I think. But easy to fix, I believe. I will create a PR.
>> >
>> > best,
>> > Colin
>> >
>> > >>
>> > >> Hey Kirk and Chris,
>> > >>
>> > >> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
>> > >> improper closing. And the PR description implies this has been present
>> > >> since 3.5. While annoying, I don't see a strong reason for this to block
>> > >> the release.
>> > >>
>> > >> Hey Jakub,
>> > >>
>> > >> Nice catch! It does seem like we should have gated this behind the metadata
>> > >> version as KIP-858 implies. Is the cluster configured with multiple log
>> > >> dirs? What is the impact of the error messages?
>> > >>
>> > >> Tagging Igor (the author of the KIP) to weigh in.
>> > >>
>> > >> Best,
>> > >> Stanislav
>> > >>
>> > >> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz <ja...@scholz.cz> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I was trying the RC2 and ran into the following issue ... when I run a
>> > >> > 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2,
>> > >> > I seem to be getting repeated errors like this in the controller
>> > >> > logs:
>> > >> >
>> > >> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
>> > >> > event failed with UnsupportedVersionException in 15 microseconds.
>> > >> > (org.apache.kafka.controller.QuorumController)
>> > >> > [quorum-controller-0-event-handler]
>> > >> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
>> > >> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > >> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
>> > >> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
>> > >> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
>> > >> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
>> > >> > partitions=[PartitionData(partitionIndex=2),
>> > >> > PartitionData(partitionIndex=1)]),
>> > >> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
>> > >> > partitions=[PartitionData(partitionIndex=0)])])]) with context
>> > >> > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > >> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
>> > >> > connectionId='172.16.14.219:9090-172.16.14.217:53590-7',
>> > >> > clientAddress=/172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
>> > >> > listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
>> > >> > clientInformation=ClientInformation(softwareName=apache-kafka-java,
>> > >> > softwareVersion=3.7.0), fromPrivilegedListener=false,
>> > >> > principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
>> > >> > (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
>> > >> > java.util.concurrent.CompletionException:
>> > >> > org.apache.kafka.common.errors.UnsupportedVersionException: Directory
>> > >> > assignment is not supported yet.
>> > >> >
>> > >> > at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>> > >> > at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>> > >> > at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>> > >> > at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>> > >> > at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>> > >> > at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>> > >> > at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>> > >> > at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>> > >> > at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>> > >> > at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>> > >> > at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>> > >> > at java.base/java.lang.Thread.run(Thread.java:840)
>> > >> >
>> > >> > Caused by: org.apache.kafka.common.errors.UnsupportedVersionException:
>> > >> > Directory assignment is not supported yet.
>> > >> >
>> > >> > Is that expected? I guess with the metadata version set to 3.6-IV2, it
>> > >> > makes sense that the request is not supported.
>> > >> > But then shouldn't the
>> > >> > request not be sent at all by the brokers? (I did not open a JIRA for it,
>> > >> > but I can open one if you agree this is not expected)
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Jakub
>> > >> >
>> > >> > On Sat, Jan 13, 2024 at 8:03 AM Luke Chen <show...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Stanislav,
>> > >> > >
>> > >> > > I commented in the "Apache Kafka 3.7.0 Release" thread, but maybe you
>> > >> > > missed it.
>> > >> > > cross-posting here:
>> > >> > >
>> > >> > > There is a bug KAFKA-16101
>> > >> > > <https://issues.apache.org/jira/browse/KAFKA-16101> reporting that "Kafka
>> > >> > > cluster will be unavailable during KRaft migration rollback".
>> > >> > > The impact of this issue is that if brokers try to roll back to ZK mode
>> > >> > > during the KRaft migration process, there will be a period of time when the
>> > >> > > cluster is unavailable.
>> > >> > > Since the ZK to KRaft migration is a production-ready feature, I think
>> > >> > > this should be addressed soon.
>> > >> > > Do you think this is a blocker for v3.7.0?
>> > >> > >
>> > >> > > Thanks.
>> > >> > > Luke
>> > >> > >
>> > >> > > On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton <fearthecel...@gmail.com>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Thanks, Kirk!
>> > >> > > >
>> > >> > > > @Stanislav--do you believe that this warrants a new RC?
>> > >> > > >
>> > >> > > > On Fri, Jan 12, 2024, 19:08 Kirk True <k...@kirktrue.pro> wrote:
>> > >> > > >
>> > >> > > > > Hi Chris/Stanislav,
>> > >> > > > >
>> > >> > > > > I'm working on the 'Unable to find FetchSessionHandler' log problem
>> > >> > > > > (KAFKA-16029) and have put out a draft PR (
>> > >> > > > > https://github.com/apache/kafka/pull/15186).
>> > >> > > > > I will use the quickstart
>> > >> > > > > approach as a second means to reproduce/verify while I wait for the PR's
>> > >> > > > > Jenkins job to finish.
>> > >> > > > >
>> > >> > > > > Thanks,
>> > >> > > > > Kirk
>> > >> > > > >
>> > >> > > > > On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
>> > >> > > > > > Hi Stanislav,
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Thanks for running this release!
>> > >> > > > > >
>> > >> > > > > > To verify, I:
>> > >> > > > > > - Built from source using Java 11 with both:
>> > >> > > > > > - - the 3.7.0-rc2 tag on GitHub
>> > >> > > > > > - - the kafka-3.7.0-src.tgz artifact from
>> > >> > > > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>> > >> > > > > > - Checked signatures and checksums
>> > >> > > > > > - Ran the quickstart using both:
>> > >> > > > > > - - The kafka_2.13-3.7.0.tgz artifact from
>> > >> > > > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
>> > >> > > > > > and Scala 2.13 in KRaft mode
>> > >> > > > > > - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
>> > >> > > > > > - Ran all unit tests
>> > >> > > > > > - Ran all integration tests for Connect and MM2
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > I found two minor areas for concern:
>> > >> > > > > >
>> > >> > > > > > 1. (Possibly a blocker)
>> > >> > > > > > When running the quickstart, I noticed this ERROR-level log message being
>> > >> > > > > > emitted frequently (but not every time) when I killed my console consumer
>> > >> > > > > > via ctrl-C:
>> > >> > > > > >
>> > >> > > > > > > [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
>> > >> > > > > > > groupId=console-consumer-74388] Unable to find FetchSessionHandler for node
>> > >> > > > > > > 1.
>> > >> > > > > > > Ignoring fetch response
>> > >> > > > > > > (org.apache.kafka.clients.consumer.internals.AbstractFetch)
>> > >> > > > > >
>> > >> > > > > > I see that this error message is already reported in
>> > >> > > > > > https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
>> > >> > > > > > prioritize fixing it for this release. I know it's probably benign but it's
>> > >> > > > > > really not a good look for us when basic operations log error messages, and
>> > >> > > > > > it may give new users some headaches.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > 2. (Probably not a blocker)
>> > >> > > > > > The following unit tests failed the first time around, and all of them
>> > >> > > > > > passed the second time I ran them:
>> > >> > > > > >
>> > >> > > > > > - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
>> > >> > > > > > - (clients) SelectorTest.testConnectionsByClientMetric()
>> > >> > > > > > - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
>> > >> > > > > > - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
>> > >> > > > > > thought I fixed this one! 🤬🤬)
>> > >> > > > > > - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Thanks again for your work on this release, and congratulations to Kafka
>> > >> > > > > > Streams for having zero flaky unit tests during my highly-experimental
>> > >> > > > > > single laptop run!
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Cheers,
>> > >> > > > > >
>> > >> > > > > > Chris
>> > >> > > > > >
>> > >> > > > > > On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
>> > >> > > > > > <stanis...@confluent.io.invalid> wrote:
>> > >> > > > > >
>> > >> > > > > > > Hello Kafka users, developers, and client-developers,
>> > >> > > > > > >
>> > >> > > > > > > This is the first candidate for release of Apache Kafka 3.7.0.
>> > >> > > > > > >
>> > >> > > > > > > Note it's named "RC2" because I had a few "failed" RCs that I had
>> > >> > > > > > > cut/uploaded but ultimately had to scrap prior to announcing due to new
>> > >> > > > > > > blockers arriving before I could even announce them.
>> > >> > > > > > >
>> > >> > > > > > > Further - I haven't yet been able to set up the system tests successfully.
>> > >> > > > > > > And the integration/unit tests do have a few failures that I have to spend
>> > >> > > > > > > time triaging. I would appreciate any help in case anyone notices any tests
>> > >> > > > > > > failing that they're subject matter experts in. Expect me to follow up in
>> > >> > > > > > > a day or two with more detailed analysis.
>> > >> > > > > > >
>> > >> > > > > > > Major changes include:
>> > >> > > > > > > - Early Access to KIP-848 - the next generation of the consumer rebalance
>> > >> > > > > > > protocol
>> > >> > > > > > > - KIP-858: Adding JBOD support to KRaft
>> > >> > > > > > > - KIP-714: Observability into Client metrics via a standardized interface
>> > >> > > > > > >
>> > >> > > > > > > Check more information in the WIP blog post:
>> > >> > > > > > > https://github.com/apache/kafka-site/pull/578
>> > >> > > > > > >
>> > >> > > > > > > Release notes for the 3.7.0 release:
>> > >> > > > > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
>> > >> > > > > > >
>> > >> > > > > > > *** Please download, test and vote by Thursday, January 18, 9am PT ***
>> > >> > > > > > >
>> > >> > > > > > > Usually these deadlines tend to be 2-3 days, but due to this being the
>> > >> > > > > > > first RC and the tests not having run yet, I am giving it a bit more time.
>> > >> > > > > > >
>> > >> > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > >> > > > > > > https://kafka.apache.org/KEYS
>> > >> > > > > > >
>> > >> > > > > > > * Release artifacts to be voted upon (source and binary):
>> > >> > > > > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>> > >> > > > > > >
>> > >> > > > > > > * Docker release artifact to be voted upon:
>> > >> > > > > > > apache/kafka:3.7.0-rc2
>> > >> > > > > > >
>> > >> > > > > > > * Maven artifacts to be voted upon:
>> > >> > > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> > >> > > > > > >
>> > >> > > > > > > * Javadoc:
>> > >> > > > > > > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
>> > >> > > > > > >
>> > >> > > > > > > * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
>> > >> > > > > > > https://github.com/apache/kafka/releases/tag/3.7.0-rc2
>> > >> > > > > > >
>> > >> > > > > > > * Documentation:
>> > >> > > > > > > https://kafka.apache.org/37/documentation.html
>> > >> > > > > > >
>> > >> > > > > > > * Protocol:
>> > >> > > > > > > https://kafka.apache.org/37/protocol.html
>> > >> > > > > > >
>> > >> > > > > > > * Successful Jenkins builds for the 3.7 branch:
>> > >> > > > > > > Unit/integration tests:
>> > >> > > > > > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/
>> > >> > > > > > > There are failing tests here. I have to follow up with triaging some of
>> > >> > > > > > > the failures and figuring out if they're actual problems or simply flakes.
>> > >> > > > > > >
>> > >> > > > > > > System tests:
>> > >> > > > > > > https://jenkins.confluent.io/job/system-test-kafka/job/3.7/
>> > >> > > > > > >
>> > >> > > > > > > No successful system test runs yet. I am working on getting the job to run.
>> > >> > > > > > >
>> > >> > > > > > > * Successful Docker Image Github Actions Pipeline for 3.7 branch:
>> > >> > > > > > > Attached are the scan_report and report_jvm output files from the Docker
>> > >> > > > > > > Build run:
>> > >> > > > > > > https://github.com/apache/kafka/actions/runs/7486094960/job/20375761673
>> > >> > > > > > >
>> > >> > > > > > > And the final docker image build job - Docker Build Test Pipeline:
>> > >> > > > > > > https://github.com/apache/kafka/actions/runs/7486178277
>> > >> > > > > > >
>> > >> > > > > > > The image is apache/kafka:3.7.0-rc2 -
>> > >> > > > > > > https://hub.docker.com/layers/apache/kafka/3.7.0-rc2/images/sha256-5b4707c08170d39549fbb6e2a3dbb83936a50f987c0c097f23cb26b4c210c226?context=explore
>> > >> > > > > > >
>> > >> > > > > > > /**************************************
>> > >> > > > > > >
>> > >> > > > > > > Thanks,
>> > >> > > > > > > Stanislav Kozlovski
>> > >>
>> > >> --
>> > >> Best,
>> > >> Stanislav