Hi Claude,
Thanks for writing this KIP. This issue seems particularly
thorny, and I appreciate everyone's effort to address this.
I want to share my concern with the KIP's proposal to
use memory-mapped files – mmap is Java's Achilles heel, and
Kafka should make less use of it, not more.
The
[
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16645.
-
Resolution: Resolved
> CVEs in 3.7.0 docker im
[
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez reopened KAFKA-16645:
-
Need to re-open to change the resolution, release_notes.py doesn't like the one
I picked
> C
[
https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez reopened KAFKA-16692:
-
Re-opening as 3.6 backport is still missing
> InvalidRequestException: ADD_PARTITIONS_TO_
[
https://issues.apache.org/jira/browse/KAFKA-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16692.
-
Resolution: Fixed
> InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 wh
[
https://issues.apache.org/jira/browse/KAFKA-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16645.
-
Assignee: Igor Soarez
Resolution: Won't Fix
The vulnerability has already been addressed
[
https://issues.apache.org/jira/browse/KAFKA-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16688.
-
Resolution: Fixed
> SystemTimer leaks resources on cl
[
https://issues.apache.org/jira/browse/KAFKA-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16624.
-
Resolution: Fixed
> Don't generate useless PartitionChangeRecord on older
Hi Omnia, Hi Claude,
Thanks for putting this KIP together.
This is an important unresolved issue in Kafka,
which I have witnessed several times in production.
Please see my questions below:
10 Given the goal is to prevent OOMs, do we also need to
limit the number of KafkaPrincipals in use?
11.
Igor Soarez created KAFKA-16636:
---
Summary: Flaky test - testStickyTaskAssignorLargePartitionCount –
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
Key: KAFKA-16636
URL: https
Igor Soarez created KAFKA-16635:
---
Summary: Flaky test
"shouldThrottleOldSegments(String).quorum=kraft" –
kafka.server.ReplicationQuotasTest
Key: KAFKA-16635
URL: https://issues.apache.org/jira/browse/K
Igor Soarez created KAFKA-16634:
---
Summary: Flaky test - testFenceMultipleBrokers() –
org.apache.kafka.controller.QuorumControllerTest
Key: KAFKA-16634
URL: https://issues.apache.org/jira/browse/KAFKA-16634
Igor Soarez created KAFKA-16633:
---
Summary: Flaky test -
testDescribeExistingGroupWithNoMembers(String,
String).quorum=kraft+kip848.groupProtocol=consumer –
org.apache.kafka.tools.consumer.group.DescribeConsumerGroupTest
Key
Igor Soarez created KAFKA-16632:
---
Summary: Flaky test
testDeleteOffsetsOfStableConsumerGroupWithTopicPartition [1]
Type=Raft-Isolated, MetadataVersion=3.8-IV0, Security=PLAINTEXT
Igor Soarez created KAFKA-16631:
---
Summary: Flaky test -
testDeleteOffsetsOfStableConsumerGroupWithTopicOnly [1] Type=Raft-Isolated,
MetadataVersion=3.8-IV0, Security=PLAINTEXT
Igor Soarez created KAFKA-16630:
---
Summary: Flaky test
"testPollReturnsRecords(GroupProtocol).groupProtocol=CLASSIC" –
org.apache.kafka.clients.consumer.KafkaConsumerTest
Key: KAFKA-16630
Thanks everyone, I'm very honoured to join!
--
Igor
Hi everyone,
I'd like to volunteer to be the release manager for a 3.7.1 release.
Please keep in mind, this would be my first release, so I might have some
questions,
and it might also take me a bit longer to work through the release process.
So I'm thinking a good target would be toward the
[
https://issues.apache.org/jira/browse/KAFKA-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16610.
-
Resolution: Resolved
> Replace "Map#entrySet#forEach" by
[
https://issues.apache.org/jira/browse/KAFKA-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez reopened KAFKA-16606:
-
Assignee: Igor Soarez
> JBOD support in KRaft does not seem to be gated by the metad
Hi Omnia,
Thanks for your answers, and I see you've updated the KIP so thanks for the
changes too.
+1 (binding), thanks for the KIP
--
Igor
Igor Soarez created KAFKA-16602:
---
Summary: Flaky test –
org.apache.kafka.controller.QuorumControllerTest.testBootstrapZkMigrationRecord()
Key: KAFKA-16602
URL: https://issues.apache.org/jira/browse/KAFKA-16602
Igor Soarez created KAFKA-16601:
---
Summary: Flaky test –
org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest.testClosingQuorumControllerClosesMetrics()
Key: KAFKA-16601
URL: https://issues.apache.org
Igor Soarez created KAFKA-16597:
---
Summary: Flaky test -
org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificStalePartitionStoresMultiStreamThreads()
Key: KAFKA-16597
URL: https
Igor Soarez created KAFKA-16596:
---
Summary: Flaky test –
org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
Key: KAFKA-16596
URL: https://issues.apache.org/jira/browse/KAFKA
Hi Omnia,
Thanks for this KIP.
11. These seem to me to be small misspellings, please double-check:
s/MM2 main features/MM2's main features
s/syncing consumer group offset/syncing consumer group offsets
s/relays/relies
s/recored's offset/recorded offsets
s/clusters without need for/clusters
[
https://issues.apache.org/jira/browse/KAFKA-15793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez reopened KAFKA-15793:
-
This has come up again:
{code:java}
[2024-04-09T21:06:17.307Z] Gradle Test Run :core:test
Igor Soarez created KAFKA-16504:
---
Summary: Flaky test
org.apache.kafka.controller.QuorumControllerTest.testConfigurationOperations
Key: KAFKA-16504
URL: https://issues.apache.org/jira/browse/KAFKA-16504
[
https://issues.apache.org/jira/browse/KAFKA-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16403.
-
Resolution: Not A Bug
> Flaky t
[
https://issues.apache.org/jira/browse/KAFKA-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez resolved KAFKA-16404.
-
Resolution: Not A Bug
Same as KAFKA-16403, this only failed once. It was likely the result
Igor Soarez created KAFKA-16422:
---
Summary: Flaky test
org.apache.kafka.controller.QuorumControllerMetricsIntegrationTest."testFailingOverIncrementsNewActiveControllerCount(boolean).true"
Key: KAFKA-16422
Igor Soarez created KAFKA-16404:
---
Summary: Flaky test
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testGetStreamsConfig
Key: KAFKA-16404
URL: https://issues.apache.org/jira/browse/KAFKA-16404
Igor Soarez created KAFKA-16403:
---
Summary: Flaky test
org.apache.kafka.streams.examples.wordcount.WordCountDemoTest.testCountListOfWords
Key: KAFKA-16403
URL: https://issues.apache.org/jira/browse/KAFKA-16403
Igor Soarez created KAFKA-16402:
---
Summary: Flaky test
org.apache.kafka.controller.QuorumControllerTest.testSnapshotSaveAndLoad
Key: KAFKA-16402
URL: https://issues.apache.org/jira/browse/KAFKA-16402
Igor Soarez created KAFKA-16365:
---
Summary: AssignmentsManager mismanages completion notifications
Key: KAFKA-16365
URL: https://issues.apache.org/jira/browse/KAFKA-16365
Project: Kafka
Issue
Igor Soarez created KAFKA-16363:
---
Summary: Storage crashes if dir is unavailable
Key: KAFKA-16363
URL: https://issues.apache.org/jira/browse/KAFKA-16363
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-16297:
---
Summary: Race condition while promoting future replica can lead to
partition unavailability.
Key: KAFKA-16297
URL: https://issues.apache.org/jira/browse/KAFKA-16297
d it make sense to authorize these requests as other inter-broker
> protocol calls are usually authorized, that is ClusterAction on Cluster
> resource?
>
> Thanks,
> Viktor
>
> On Tue, Nov 28, 2023 at 4:18 PM Igor Soarez wrote:
>
> > Hi everyone,
> >
> > Th
Igor Soarez created KAFKA-15955:
---
Summary: Migrating ZK brokers send dir assignments
Key: KAFKA-15955
URL: https://issues.apache.org/jira/browse/KAFKA-15955
Project: Kafka
Issue Type: Sub-task
is now more of an open question. It's unclear if this will
actually be necessary.
Please share if you have any thoughts.
Best,
--
Igor
On Tue, Oct 10, 2023, at 5:28 AM, Igor Soarez wrote:
> Hi Colin,
>
> Thanks for the renaming suggestions. UNASSIGNED is better than
> UNKNOWN, MIGRA
Igor Soarez created KAFKA-15893:
---
Summary: Bump MetadataVersion for directory assignments
Key: KAFKA-15893
URL: https://issues.apache.org/jira/browse/KAFKA-15893
Project: Kafka
Issue Type: Sub
Igor Soarez created KAFKA-15886:
---
Summary: Always specify directories for new partition registrations
Key: KAFKA-15886
URL: https://issues.apache.org/jira/browse/KAFKA-15886
Project: Kafka
Igor Soarez created KAFKA-15858:
---
Summary: Broker stays fenced until all assignments are correct
Key: KAFKA-15858
URL: https://issues.apache.org/jira/browse/KAFKA-15858
Project: Kafka
Issue
Hi all,
I think at least one of those is my fault, apologies.
I'll try to make sure all my tests are passing from now on.
It doesn't help that GitHub always shows that the tests have failed,
even when they have not. I suspect this is because Jenkins always
marks the builds as unstable, even when
Igor Soarez created KAFKA-15650:
---
Summary: Data-loss on leader shutdown right after partition
creation?
Key: KAFKA-15650
URL: https://issues.apache.org/jira/browse/KAFKA-15650
Project: Kafka
Igor Soarez created KAFKA-15649:
---
Summary: Handle directory failure timeout
Key: KAFKA-15649
URL: https://issues.apache.org/jira/browse/KAFKA-15649
Project: Kafka
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/KAFKA-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez reopened KAFKA-15355:
-
closed by mistake
> Message schema changes
> --
>
>
Hi Colin,
> I would call #2 LOST. It was assigned in the past, but we don't know where.
> (I see that you called this OFFLINE). This is not really normal...
> it should happen only when we're migrating from ZK mode to KRaft mode,
> or going from an older KRaft release with multiple directories to
Hi David,
Thanks for shedding light on migration goals, makes sense.
Your preference for option a) makes it even more attractive.
We'll keep that as the preferred approach, thanks for the advice.
> One question with this approach is how the KRaft controller learns about
> the multiple log
Hi everyone,
Earlier today Colin, Ron, Proven and I had a chat about this work.
We discussed several aspects which I’d like to share here.
## A new reserved UUID
We'll reserve a third UUID to indicate an unspecified dir,
but one that is known to be selected. As opposed to the
default
Hi everyone,
After a conversation with Colin McCabe and Proven Provenzano yesterday,
we decided that the benefits outweigh the concerns with the overhead
of associating a directory UUID to every replica in the metadata
partition records.
i.e. We prefer to always associate the log dir UUID even
Igor Soarez created KAFKA-15514:
---
Summary: Controller-side replica management changes
Key: KAFKA-15514
URL: https://issues.apache.org/jira/browse/KAFKA-15514
Project: Kafka
Issue Type: Sub
Hi Ron,
I think we can generalize the deconfigured directory scenario
in your last question to address this situation too.
When handling a broker registration request, the controller
can check if OfflineLogDirs=false and any UUIDs are missing
in OnlineLogDirs, compared with the previous
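A minimal sketch of the check described above, assuming the comparison is against the directory UUIDs from the broker's previous registration. The class and method names are hypothetical, and java.util.UUID stands in for Kafka's own Uuid type so that the example is self-contained:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class RegistrationCheck {
    // Returns the directory UUIDs that were present in the previous
    // registration but are absent from the current OnlineLogDirs.
    // Only applies when the broker reports OfflineLogDirs=false.
    public static Set<UUID> missingDirs(Set<UUID> previousDirs,
                                        Set<UUID> onlineLogDirs,
                                        boolean offlineLogDirs) {
        if (offlineLogDirs) {
            // The broker already reports offline dirs; this sketch
            // does not cover that case.
            return new HashSet<>();
        }
        Set<UUID> missing = new HashSet<>(previousDirs);
        missing.removeAll(onlineLogDirs);
        return missing;
    }

    public static void main(String[] args) {
        UUID a = new UUID(0L, 1L);
        UUID b = new UUID(0L, 2L);
        // b was registered before but is no longer listed as online.
        System.out.println(missingDirs(Set.of(a, b), Set.of(a), false));
    }
}
```

What the controller does with the missing UUIDs is left out here, as it depends on the rest of the proposal.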
Hi everyone,
I believe we can close this voting thread now, as there
were three +1 binding votes from Ziming, Mickael and Ron.
With that, this vote passes.
Thanks to everyone who participated in reviewing,
and/or taking the time to vote on this KIP!
Best,
--
Igor
wrote:
> >
> > Ok, great, that makes sense, Igor. Thanks. +1 (binding) on the KIP from
> > me.
> >
> > Ron
> >
> > > On Sep 13, 2023, at 11:58 AM, Igor Soarez
> > > wrote:
> > >
> > > Hi Ron,
> > >
> > &
Hi Ron,
Thanks for drilling down on this. I think the KIP isn't really clear here,
and the metadata caching section you quoted needs clarification.
The "hosting broker's latest registration" refers to the previous,
not the current registration. The registrations are only compared by
the
Hi Ron,
Thank you for having a look at this KIP.
Indeed, the log directory UUID should always be generated
and loaded. I have corrected the wording in the KIP to clarify.
It is a bit of a pain to replace the field, but I agree that is
the best approach for the same reason you pointed out.
I
Hi Ziming,
Thank you for having a look and taking the time to vote.
I have already opened some PRs, see:
https://issues.apache.org/jira/browse/KAFKA-14127
Best,
--
Igor
Igor Soarez created KAFKA-15451:
---
Summary: Include offline dirs in BrokerHeartbeatRequest
Key: KAFKA-15451
URL: https://issues.apache.org/jira/browse/KAFKA-15451
Project: Kafka
Issue Type: Sub
Igor Soarez created KAFKA-15426:
---
Summary: Process and persist directory assignments
Key: KAFKA-15426
URL: https://issues.apache.org/jira/browse/KAFKA-15426
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15368:
---
Summary: Test ZK JBOD to KRaft migration
Key: KAFKA-15368
URL: https://issues.apache.org/jira/browse/KAFKA-15368
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15367:
---
Summary: Test KRaft JBOD enabling migration
Key: KAFKA-15367
URL: https://issues.apache.org/jira/browse/KAFKA-15367
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15366:
---
Summary: Log directory failure integration test
Key: KAFKA-15366
URL: https://issues.apache.org/jira/browse/KAFKA-15366
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15365:
---
Summary: Replica management changes
Key: KAFKA-15365
URL: https://issues.apache.org/jira/browse/KAFKA-15365
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15364:
---
Summary: Handle log directory failure in the Controller
Key: KAFKA-15364
URL: https://issues.apache.org/jira/browse/KAFKA-15364
Project: Kafka
Issue Type: Sub
Igor Soarez created KAFKA-15363:
---
Summary: Broker log directory failure changes
Key: KAFKA-15363
URL: https://issues.apache.org/jira/browse/KAFKA-15363
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15362:
---
Summary: Resolve offline replicas in metadata cache
Key: KAFKA-15362
URL: https://issues.apache.org/jira/browse/KAFKA-15362
Project: Kafka
Issue Type: Sub
Igor Soarez created KAFKA-15361:
---
Summary: Process and persist dir info with broker registration
Key: KAFKA-15361
URL: https://issues.apache.org/jira/browse/KAFKA-15361
Project: Kafka
Issue
Igor Soarez created KAFKA-15360:
---
Summary: Include directory info in BrokerRegistration
Key: KAFKA-15360
URL: https://issues.apache.org/jira/browse/KAFKA-15360
Project: Kafka
Issue Type: Sub
Igor Soarez created KAFKA-15359:
---
Summary: log.dir.failure.timeout.ms configuration
Key: KAFKA-15359
URL: https://issues.apache.org/jira/browse/KAFKA-15359
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15358:
---
Summary: QueuedReplicaToDirAssignments metric
Key: KAFKA-15358
URL: https://issues.apache.org/jira/browse/KAFKA-15358
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15357:
---
Summary: Propagates assignments and logdir failures to controller
Key: KAFKA-15357
URL: https://issues.apache.org/jira/browse/KAFKA-15357
Project: Kafka
Issue
Igor Soarez created KAFKA-15356:
---
Summary: Generate and persist log directory UUIDs
Key: KAFKA-15356
URL: https://issues.apache.org/jira/browse/KAFKA-15356
Project: Kafka
Issue Type: Sub-task
Igor Soarez created KAFKA-15355:
---
Summary: Update metadata records
Key: KAFKA-15355
URL: https://issues.apache.org/jira/browse/KAFKA-15355
Project: Kafka
Issue Type: Sub-task
Hi Mickael,
Thanks for voting, and for pointing out the mistake.
I've corrected it in the KIP now.
The proposed name is "QueuedReplicaToDirAssignments".
Best,
--
Igor
Hi Ismael,
I believe I have addressed all concerns.
Please have a look, and consider a vote on this KIP.
Thank you,
--
Igor
Hi everyone,
Following a face-to-face discussion with Ron and Colin,
I have just made further improvements to this KIP:
1. Every log directory gets a random UUID assigned, even if just one
log dir is configured in the Broker.
2. All online log directories are registered, even if just one if
Hi Colin,
Thanks for your questions.
Please have a look at my answers below.
> In the previous email I asked, "who is responsible for assigning replicas to
> broker directories?" Can you clarify what the answer is to that? If the
> answer is the controller, there is no need for an "unknown"
Hi Colin,
Thanks for your support with getting this over the line and that’s
great re the preliminary pass! Thanks also for sharing your
thoughts. I've had a careful look at each of these and am sharing my
comments below.
I agree, it is important to avoid a perf hit on non-JBOD.
I've opted for
Congratulations Divij!
--
Igor
Hi everyone,
We're getting closer to dropping ZooKeeper support, and JBOD
in KRaft mode is one of the outstanding big missing features.
It's been a while since there was new feedback on KIP-858 [1]
which aims to address this gap, so I'm calling for a vote.
A huge thank you to everyone who has
Thanks for the KIP.
Seems straightforward, LGTM.
Non binding +1.
--
Igor
Hi all,
We just had a video call to discuss this KIP, and I wanted to
update this thread with a note on the meeting.
Attendees:
- Igor
- Christo
- Divij
- Colt
Items discussed:
- Context, motivation and overview of the proposal.
- How log directories are identified by each Broker.
- How old
Hi Christo,
Thank you for the KIP. Kafka is very sensitive to filesystem errors,
and at the first IO error the whole log directory is permanently
considered offline. It seems your proposal aims to increase the
robustness of Kafka, and that's a positive improvement.
I have some questions:
11.
Hi all,
I have created a TLA+ specification for this KIP, available here:
https://github.com/soarez/kafka/blob/kip-858-tla-plus/tla/Kip858.tla
If there are no further comments I'll start
a voting thread next week.
--
Igor
Hi Alexandre,
Thank you for having a look at this KIP, and thank you for pointing this out.
I like the idea of expanding the health status of a log directory beyond
just online/offline status.
This KIP currently proposes a single logdir state transition, from
online to offline, conveyed in a
Hi Divij, Christo,
Thank you for pointing that out.
Let's aim instead for Monday 5th of June, at the same time – 16:30-17:00 UTC.
Please let me know if this doesn't work either.
Best,
--
Igor
Hi everyone,
Someone suggested at the recent Kafka Summit that it may be useful
to have a video call to discuss remaining concerns.
I'm proposing we have a video call Monday 29th May 16:30-17:00 UTC.
If you'd like to join, please reply to the thread or to me directly so
I can send you a link.
Hi Christo,
Thank you for your interest in this KIP.
Indeed, I'd like to open up voting ASAP.
I'm hoping there will still be a bit more feedback,
but if not I'll probably request a vote next week or so.
Do you have any concerns or suggestions regarding this KIP?
I'll have a look at your KIP
My impression is also that a lot of users run older,
end-of-life (EOL) versions of Kafka.
The final 3.x version is particularly concerning, as it will be
the last bridge to migrate away from ZK. If a big portion of users
only upgrade after its EOL period, we might only then discover an
important bug
Thank you for another review Ziming, much appreciated!
1. and 2. You are correct, it would be a big and perhaps strange difference.
Since our last exchange of emails, the proposal has changed and now it
does follow your suggestion to bump metadata.version.
The KIP mentions it under
Hi Jun,
Thank you for sharing your questions, please find my answers below.
41. There can only be user partitions on `metadata.log.dir` if that log
dir is also listed in `log.dirs`.
`LogManager` does not specifically load contents from `metadata.log.dir`.
The broker will communicate UUIDs to
Hi all,
I’ve had to step away from work for personal reasons for a couple of months –
until mid April 2023. I don’t think I’ll be able to continue to address
feedback or update this KIP before then.
--
Igor
Hi David,
Thank you for your suggestions and for having a look at this KIP.
1. Yes, that should be OK. I have updated the section
"Migrating a cluster in ZK mode running with JBOD" to reflect this.
2. I've updated the motivation section to state that.
Best,
--
Igor
Hi Jun,
Thank you for your comments and questions.
30. Thank you for pointing this out. The isNew flag is not available
in KRaft mode. The broker can consider the metadata records:
If, and only if, the logdir assigned is Uuid.ZERO then the replica can
be considered new.
Being able to determine
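A minimal sketch of the rule stated in item 30, assuming Uuid.ZERO is the all-zero UUID. `ReplicaNewness` and `isNew` are hypothetical names, and java.util.UUID is used in place of Kafka's own Uuid class to keep the example self-contained:

```java
import java.util.UUID;

public class ReplicaNewness {
    // Stand-in for the reserved Uuid.ZERO (assumption: all bits zero).
    public static final UUID ZERO = new UUID(0L, 0L);

    // A replica is considered new if, and only if, the log directory
    // assigned to it in the metadata records is ZERO.
    public static boolean isNew(UUID assignedDir) {
        return ZERO.equals(assignedDir);
    }

    public static void main(String[] args) {
        System.out.println(isNew(ZERO));
        System.out.println(isNew(new UUID(1L, 2L)));
    }
}
```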
Hi Tom,
Thank you for having another look.
20. That is a good point.
Thinking about your suggestion:
What would this look like in a non-JBOD KRaft cluster upgrade to JBOD mode?
Upgrading to a version that includes the JBOD support patch would automatically
update meta.properties to include the
Hi Tom,
Thank you for having another look.
20. Upon a downgrade to a Kafka version that runs the current
"version == 1" assertion, then yes — a downgrade would not be possible
without first updating (manually) the meta.properties files back
to the previous version.
We could prevent this issue
Hi Jun,
Thank you for having another look.
11. That is correct. I have updated the KIP in an attempt to make this clearer.
I think the goal should be to try to minimize the chance that a log directory
failure may happen while the metadata is incorrect about the log directory assignment,
but also have a
Hi Jun,
Thank you for reviewing the KIP. Please find my replies to
your comments below.
10. Thanks for pointing out this typo; it has been corrected.
11. I agree that the additional delay in switching to the
future replica is undesirable, however I see a couple of
issues if we forward the
Hi David,
ZooKeeper mode writes meta.properties with version=0. KRaft mode requires
version=1 in meta.properties.
Will a manual step be required to update meta.properties or will brokers
somehow update meta.properties files to version 1?
Thanks,
--
Igor