[jira] [Updated] (KAFKA-13560) Load indexes and data in async manner in the critical path of replica fetcher threads.
[ https://issues.apache.org/jira/browse/KAFKA-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-13560: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Load indexes and data in async manner in the critical path of replica fetcher > threads. > --- > > Key: KAFKA-13560 > URL: https://issues.apache.org/jira/browse/KAFKA-13560 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Priority: Major > Fix For: 3.9.0 > > > https://github.com/apache/kafka/pull/11390#discussion_r762366976 > https://github.com/apache/kafka/pull/11390#discussion_r1033141283 > https://github.com/apache/kafka/pull/15690 removed the below method from the > `TierStateMachine` interface. This can be added back when we implement the > functionality required to address this issue. > {code:java} > public Optional maybeAdvanceState(TopicPartition > topicPartition, PartitionFetchState currentFetchState) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
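The idea in this ticket — moving slow index/data loads off the replica fetcher's critical path — can be sketched with a dedicated executor and CompletableFuture. This is a hypothetical illustration, not Kafka's actual fetcher code; `AsyncIndexLoader` and `fetchFromRemote` are stand-ins for the real RemoteStorageManager plumbing:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the replica fetcher thread submits the (potentially
// slow) remote index/data load to a dedicated pool and continues, instead of
// blocking inline on remote storage I/O.
class AsyncIndexLoader {
    private final ExecutorService remoteLoadPool = Executors.newFixedThreadPool(4);

    // Non-blocking from the fetcher thread's point of view: the fetcher can
    // poll the future and advance the partition's state once it completes.
    CompletableFuture<byte[]> loadIndexAsync(String segmentId) {
        return CompletableFuture.supplyAsync(() -> fetchFromRemote(segmentId), remoteLoadPool);
    }

    // Placeholder for a real remote-storage fetch (returns empty bytes here).
    private byte[] fetchFromRemote(String segmentId) {
        return new byte[0];
    }

    void close() {
        remoteLoadPool.shutdown();
    }
}
```

The fetcher thread stays responsive for other partitions while the load is in flight, which is the motivation behind the discussion threads linked above.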
[jira] [Updated] (KAFKA-15300) Include remotelog size in complete log size and also add local log size and remote log size separately in kafka-log-dirs tool.
[ https://issues.apache.org/jira/browse/KAFKA-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15300: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Include remotelog size in complete log size and also add local log size and > remote log size separately in kafka-log-dirs tool. > --- > > Key: KAFKA-15300 > URL: https://issues.apache.org/jira/browse/KAFKA-15300 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Priority: Major > Fix For: 3.9.0 > > > Include remotelog size in complete log size and also add local log size and > remote log size separately in kafka-log-dirs tool. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15480) Add RemoteStorageInterruptedException
[ https://issues.apache.org/jira/browse/KAFKA-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15480: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Add RemoteStorageInterruptedException > - > > Key: KAFKA-15480 > URL: https://issues.apache.org/jira/browse/KAFKA-15480 > Project: Kafka > Issue Type: Task > Components: core >Affects Versions: 3.6.0 >Reporter: Mital Awachat >Priority: Major > Labels: kip > Fix For: 3.9.0 > > > Introduce `RemoteStorageInterruptedException` to propagate interruptions from > the plugin to Kafka without generating (false) errors. > It allows the plugin to notify Kafka that an in-progress API operation was > interrupted as a result of task cancellation, which can happen under changes > such as leadership migration or topic deletion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15214) Add metrics for OffsetOutOfRangeException when tiered storage is enabled
[ https://issues.apache.org/jira/browse/KAFKA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15214: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Add metrics for OffsetOutOfRangeException when tiered storage is enabled > > > Key: KAFKA-15214 > URL: https://issues.apache.org/jira/browse/KAFKA-15214 > Project: Kafka > Issue Type: Task > Components: metrics >Affects Versions: 3.6.0 >Reporter: Lixin Yao >Priority: Minor > Labels: KIP-405 > Fix For: 3.9.0 > > > In the current metric RemoteReadErrorsPerSec, the exception type > OffsetOutOfRangeException is not included. > In our testing with the tiered storage feature (at Apple), we noticed several > cases where remote downloads were affected and stuck due to repeated > OffsetOutOfRangeExceptions in particular brokers or topic partitions. The > root cause could vary, but currently, without a metric, it is very hard to > catch this issue and debug it in a timely fashion. The exception itself may > not be the root cause, but this metric would be a good signal to alert on > and investigate. > Related discussion: > [https://github.com/apache/kafka/pull/13944#discussion_r1266243006] > I am happy to contribute to this if the request is agreed upon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15529) Flaky test ReassignReplicaShrinkTest.executeTieredStorageTest
[ https://issues.apache.org/jira/browse/KAFKA-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15529: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Flaky test ReassignReplicaShrinkTest.executeTieredStorageTest > - > > Key: KAFKA-15529 > URL: https://issues.apache.org/jira/browse/KAFKA-15529 > Project: Kafka > Issue Type: Test > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Divij Vaidya >Priority: Blocker > Labels: flaky-test > Fix For: 3.9.0 > > > Example of failed CI build - > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14449/3/testReport/junit/org.apache.kafka.tiered.storage.integration/ReassignReplicaShrinkTest/Build___JDK_21_and_Scala_2_13___executeTieredStorageTest_String__quorum_kraft_2/] > > {noformat} > org.opentest4j.AssertionFailedError: Number of fetch requests from broker 0 > to the tier storage does not match the expected value for topic-partition > topicA-1 ==> expected: <3> but was: <4> > at > app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at > app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) > at > app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:559) > at > app//org.apache.kafka.tiered.storage.actions.ConsumeAction.doExecute(ConsumeAction.java:128) > at > app//org.apache.kafka.tiered.storage.TieredStorageTestAction.execute(TieredStorageTestAction.java:25) > at > app//org.apache.kafka.tiered.storage.TieredStorageTestHarness.executeTieredStorageTest(TieredStorageTestHarness.java:112){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15038) Use topic id/name mapping from the Metadata cache in the RemoteLogManager
[ https://issues.apache.org/jira/browse/KAFKA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15038: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Use topic id/name mapping from the Metadata cache in the RemoteLogManager > - > > Key: KAFKA-15038 > URL: https://issues.apache.org/jira/browse/KAFKA-15038 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Alexandre Dupriez >Assignee: Owen C.H. Leung >Priority: Minor > Fix For: 3.9.0 > > > Currently, the {{RemoteLogManager}} maintains its own cache of topic name to > topic id > [[1]|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L138] > using the information provided during leadership changes, and removes the > mapping upon receiving the notification of partition stopped. > It should be possible to re-use the mapping in a broker's metadata cache, > removing the need for the RLM to build and update a local cache that > duplicates the information in the metadata cache. It also preserves a single > source of authority regarding the association between topic names and ids. > [1] > https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L138 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-9578) Kafka Tiered Storage - System Tests
[ https://issues.apache.org/jira/browse/KAFKA-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-9578: -- Fix Version/s: 3.9.0 (was: 3.8.0) > Kafka Tiered Storage - System Tests > > > Key: KAFKA-9578 > URL: https://issues.apache.org/jira/browse/KAFKA-9578 > Project: Kafka > Issue Type: Test >Reporter: Harsha >Priority: Major > Fix For: 3.9.0 > > > Initial test cases set up by [~Ying Zheng] > > [https://docs.google.com/spreadsheets/d/1gS0s1FOmcjpKYXBddejXAoJAjEZ7AdEzMU9wZc-JgY8/edit#gid=0] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15132) Implement disable & re-enablement for Tiered Storage
[ https://issues.apache.org/jira/browse/KAFKA-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15132: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Implement disable & re-enablement for Tiered Storage > > > Key: KAFKA-15132 > URL: https://issues.apache.org/jira/browse/KAFKA-15132 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Labels: kip > Fix For: 3.9.0 > > > KIP-405 [1] introduces the Tiered Storage feature in Apache Kafka. One of the > limitations mentioned in the KIP is the inability to re-enable TS on a topic > after it has been disabled. > {quote}Once tier storage is enabled for a topic, it can not be disabled. We > will add this feature in future versions. One possible workaround is to > create a new topic and copy the data from the desired offset and delete the > old topic. > {quote} > This task will propose a new KIP which extends KIP-405 to describe the > behaviour on disablement and re-enablement of tiered storage for a topic. The > solution will apply to both Zk and KRaft variants. > [1] KIP-405 - > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15376) Explore options of removing data earlier to the current leader's leader epoch lineage for topics enabled with tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15376: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Explore options of removing data earlier to the current leader's leader epoch > lineage for topics enabled with tiered storage. > - > > Key: KAFKA-15376 > URL: https://issues.apache.org/jira/browse/KAFKA-15376 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Priority: Major > Fix For: 3.9.0 > > > Followup on the discussion thread: > [https://github.com/apache/kafka/pull/13561#discussion_r1288778006] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15864) Add more tests asserting the log-start-offset, local-log-start-offset, and HW/LSO/LEO in rolling over segments with tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15864: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Add more tests asserting the log-start-offset, local-log-start-offset, and > HW/LSO/LEO in rolling over segments with tiered storage. > --- > > Key: KAFKA-15864 > URL: https://issues.apache.org/jira/browse/KAFKA-15864 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Satish Duggana >Priority: Major > Labels: tiered-storage > Fix For: 3.9.0 > > > Followup on the > [comment|https://github.com/apache/kafka/pull/14766/files#r1395389551] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15094) Add RemoteIndexCache metrics like misses/evictions/load-failures.
[ https://issues.apache.org/jira/browse/KAFKA-15094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15094: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Add RemoteIndexCache metrics like misses/evictions/load-failures. > - > > Key: KAFKA-15094 > URL: https://issues.apache.org/jira/browse/KAFKA-15094 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Assignee: Abhijeet Kumar >Priority: Major > Fix For: 3.9.0 > > > Add metrics like hits/misses/evictions/load-failures for RemoteIndexCache. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14915) Option to consume multiple partitions that have their data in remote storage for the target offsets.
[ https://issues.apache.org/jira/browse/KAFKA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-14915: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Option to consume multiple partitions that have their data in remote storage > for the target offsets. > > > Key: KAFKA-14915 > URL: https://issues.apache.org/jira/browse/KAFKA-14915 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Priority: Major > Labels: tiered-storage > Fix For: 3.9.0 > > > Context: https://github.com/apache/kafka/pull/13535#discussion_r1171250580 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15195) Regenerate segment-aligned producer snapshots when upgrading to a Kafka version supporting Tiered Storage
[ https://issues.apache.org/jira/browse/KAFKA-15195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15195: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Regenerate segment-aligned producer snapshots when upgrading to a Kafka > version supporting Tiered Storage > - > > Key: KAFKA-15195 > URL: https://issues.apache.org/jira/browse/KAFKA-15195 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.6.0 >Reporter: Christo Lolov >Assignee: Christo Lolov >Priority: Major > Fix For: 3.9.0 > > > As mentioned in KIP-405: Kafka Tiered Storage#Upgrade, a customer wishing to > upgrade from a Kafka version < 2.8.0 to 3.6 and turn on Tiered Storage will > have to wait for retention to clean up segments without an associated > producer snapshot. > However, in our experience, customers of Kafka expect to be able to > immediately enable tiering on a topic once their cluster upgrade is complete. > Once they do this, however, they start seeing NPEs and no data is uploaded to > Tiered Storage > (https://github.com/apache/kafka/blob/9e50f7cdd37f923cfef4711cf11c1c5271a0a6c7/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/LogSegmentData.java#L61). > To achieve this, we propose changing Kafka to retroactively create producer > snapshot files on upload whenever a segment is due to be archived and lacks > one. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15301) [Tiered Storage] Historically compacted topics send request to remote for active segment during consume
[ https://issues.apache.org/jira/browse/KAFKA-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15301: --- Fix Version/s: 3.9.0 (was: 3.8.0) > [Tiered Storage] Historically compacted topics send request to remote for > active segment during consume > --- > > Key: KAFKA-15301 > URL: https://issues.apache.org/jira/browse/KAFKA-15301 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 3.6.0 >Reporter: Mital Awachat >Assignee: Jimmy Wang >Priority: Major > Fix For: 3.9.0 > > > I have a use case where the tiered storage plugin received requests for > active segments. The topics for which it happened were historically compacted > topics for which compaction was disabled and tiering was enabled. > Create topic with compact cleanup policy -> Produce data with few repeat keys > and create multiple segments -> let compaction happen -> change cleanup > policy to delete -> produce some more data for segment rollover -> enable > tiering on topic -> wait for segments to be uploaded to remote storage and > cleaned up from local (active segment would remain) -> consume from beginning > -> Observe logs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15341) Enabling TS for a topic during rolling restart causes problems
[ https://issues.apache.org/jira/browse/KAFKA-15341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15341: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Enabling TS for a topic during rolling restart causes problems > -- > > Key: KAFKA-15341 > URL: https://issues.apache.org/jira/browse/KAFKA-15341 > Project: Kafka > Issue Type: Bug >Reporter: Divij Vaidya >Priority: Major > Labels: KIP-405 > Fix For: 3.9.0 > > > When we are in a rolling restart to enable TS at the system level, some > brokers have TS enabled on them and some don't. If we send an alterConfig > call to enable TS for a topic and it hits a broker which has TS enabled, this > broker forwards it to the controller, and the controller sends the config > update to all brokers. When another broker which doesn't have TS enabled > (because it hasn't undergone the restart yet) gets this config change, it > "should" fail to apply it. But failing at that point is too late: alterConfig > has already succeeded, because controller->broker config propagation is done > asynchronously. > With this JIRA, we want the controller to check whether TS is > enabled on all brokers before applying an alterConfig that turns on TS for a > topic. > Context: https://github.com/apache/kafka/pull/14176#discussion_r1291265129 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13355) Shutdown broker eventually when unrecoverable exceptions like IOException encountered in RLMM.
[ https://issues.apache.org/jira/browse/KAFKA-13355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-13355: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Shutdown broker eventually when unrecoverable exceptions like IOException > encountered in RLMM. > --- > > Key: KAFKA-13355 > URL: https://issues.apache.org/jira/browse/KAFKA-13355 > Project: Kafka > Issue Type: Bug >Reporter: Satish Duggana >Assignee: Abhijeet Kumar >Priority: Major > Labels: tiered-storage > Fix For: 3.9.0 > > > Have mechanism to catch unrecoverable exceptions like IOException from RLMM > and shutdown the broker like it is done in log layer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15969) Align RemoteStorageThreadPool metrics name with KIP-405
[ https://issues.apache.org/jira/browse/KAFKA-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15969: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Align RemoteStorageThreadPool metrics name with KIP-405 > --- > > Key: KAFKA-15969 > URL: https://issues.apache.org/jira/browse/KAFKA-15969 > Project: Kafka > Issue Type: Bug > Components: metrics >Affects Versions: 3.6.0 >Reporter: Lixin Yao >Priority: Minor > Fix For: 3.9.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > In KIP-405, there are 2 metrics defined below: > kafka.log.remote:type=RemoteStorageThreadPool, > name=RemoteLogReaderTaskQueueSize > and > kafka.log.remote:type=RemoteStorageThreadPool, > name=RemoteLogReaderAvgIdlePercent > However, in the Kafka 3.6 release, the actual metrics exposed are: > org.apache.kafka.storage.internals.log:name=RemoteLogReaderAvgIdlePercent,type=RemoteStorageThreadPool > org.apache.kafka.storage.internals.log:name=RemoteLogReaderTaskQueueSize,type=RemoteStorageThreadPool > The problem is that the bean domain name changed from kafka.log.remote to > org.apache.kafka.storage.internals.log, and the type name also changed. > We should either update the metrics path in the KIP, or fix the path in the > code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15331) Handle remote log enabled topic deletion when leader is not available
[ https://issues.apache.org/jira/browse/KAFKA-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15331: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Handle remote log enabled topic deletion when leader is not available > - > > Key: KAFKA-15331 > URL: https://issues.apache.org/jira/browse/KAFKA-15331 > Project: Kafka > Issue Type: Bug >Reporter: Kamal Chandraprakash >Assignee: hudeqi >Priority: Major > Fix For: 3.9.0 > > > When a topic gets deleted, there can be a case where all the replicas are out > of the ISR. This case is not handled; see > [https://github.com/apache/kafka/pull/13947#discussion_r1289331347] for more > details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15388: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Handle topics that were having compaction as retention earlier are changed to > delete only retention policy and onboarded to tiered storage. > > > Key: KAFKA-15388 > URL: https://issues.apache.org/jira/browse/KAFKA-15388 > Project: Kafka > Issue Type: Bug >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Fix For: 3.9.0 > > Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, > tieredtopicloglist.png > > > Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517] > > There are 3 paths I looked at: > * When data is moved to remote storage (1) > * When data is read from remote storage (2) > * When data is deleted from remote storage (3) > (1) Does not have a problem with compacted topics. Compacted segments are > uploaded and their metadata claims they contain offsets from the baseOffset of > the segment until the next segment's baseOffset. There are no gaps in offsets. > (2) Does not have a problem if a customer queries an offset which does not > exist within a segment but there are offsets after the queried offset within > the same segment. *However, it does have a problem when the next available > offset is in a subsequent segment.* > (3) For data deleted via DeleteRecords there is no problem. For data deleted > via retention there is no problem. > > *I believe the proper solution to (2) is to make tiered storage continue > looking for the next greater offset in subsequent segments.* > Steps to reproduce the issue: > {code:java} > // TODO (christo) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
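The fix the reporter proposes for path (2) — keep scanning forward when the queried offset falls into a gap left by compaction — can be sketched roughly as follows. The types here (`SegmentMeta`, `segmentForOffset`) are illustrative stand-ins, not Kafka's actual remote-log-metadata APIs:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the proposed lookup: instead of failing when
// the queried offset is absent from the segment it falls into, return the
// first segment that can serve an offset >= the queried one, even if that
// segment starts after the queried offset (a compaction gap).
class SegmentMeta {
    final long baseOffset;
    final long endOffset;

    SegmentMeta(long baseOffset, long endOffset) {
        this.baseOffset = baseOffset;
        this.endOffset = endOffset;
    }
}

class RemoteOffsetLookup {
    // `segments` is assumed sorted by baseOffset.
    static Optional<SegmentMeta> segmentForOffset(List<SegmentMeta> segments, long target) {
        for (SegmentMeta s : segments) {
            if (s.endOffset >= target) {
                // May begin after `target` when compaction removed records.
                return Optional.of(s);
            }
        }
        return Optional.empty(); // nothing tiered at or beyond `target`
    }
}
```

With segments covering offsets 0-99 and 200-299, a query for offset 150 would resolve to the second segment rather than erroring out, matching the behavior the ticket argues for.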
[jira] [Commented] (KAFKA-14877) refactor InMemoryLeaderEpochCheckpoint
[ https://issues.apache.org/jira/browse/KAFKA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854579#comment-17854579 ] Satish Duggana commented on KAFKA-14877: InMemoryLeaderEpochCheckpoint appears to have been deleted as part of other refactoring. > refactor InMemoryLeaderEpochCheckpoint > -- > > Key: KAFKA-14877 > URL: https://issues.apache.org/jira/browse/KAFKA-14877 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Priority: Minor > Fix For: 3.8.0 > > > follow up with this comment: > https://github.com/apache/kafka/pull/13456#discussion_r1154306477 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14877) refactor InMemoryLeaderEpochCheckpoint
[ https://issues.apache.org/jira/browse/KAFKA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana resolved KAFKA-14877. Resolution: Invalid > refactor InMemoryLeaderEpochCheckpoint > -- > > Key: KAFKA-14877 > URL: https://issues.apache.org/jira/browse/KAFKA-14877 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Priority: Minor > Fix For: 3.8.0 > > > follow up with this comment: > https://github.com/apache/kafka/pull/13456#discussion_r1154306477 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16947) Kafka Tiered Storage V2
[ https://issues.apache.org/jira/browse/KAFKA-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16947: --- Fix Version/s: 3.9.0 (was: 3.8.0) > Kafka Tiered Storage V2 > --- > > Key: KAFKA-16947 > URL: https://issues.apache.org/jira/browse/KAFKA-16947 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.9.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16947) Kafka Tiered Storage V2
[ https://issues.apache.org/jira/browse/KAFKA-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16947: --- Affects Version/s: (was: 3.6.0) > Kafka Tiered Storage V2 > --- > > Key: KAFKA-16947 > URL: https://issues.apache.org/jira/browse/KAFKA-16947 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16947) Kafka Tiered Storage V2
Satish Duggana created KAFKA-16947: -- Summary: Kafka Tiered Storage V2 Key: KAFKA-16947 URL: https://issues.apache.org/jira/browse/KAFKA-16947 Project: Kafka Issue Type: Improvement Affects Versions: 3.6.0 Reporter: Satish Duggana Assignee: Satish Duggana Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16890) Failing to build aux state on broker failover
[ https://issues.apache.org/jira/browse/KAFKA-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana resolved KAFKA-16890. Resolution: Fixed > Failing to build aux state on broker failover > - > > Key: KAFKA-16890 > URL: https://issues.apache.org/jira/browse/KAFKA-16890 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.7.0, 3.7.1 >Reporter: Francois Visconte >Assignee: Kamal Chandraprakash >Priority: Major > Fix For: 3.8.0 > > > We have clusters where we replace machines often falling into a state where > we keep having "Error building remote log auxiliary state for > loadtest_topic-22" and the partition being under-replicated until the leader > is manually restarted. > Looking into a specific case, here is what we observed in > __remote_log_metadata topic: > {code:java} > > partition: 29, offset: 183593, value: > RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=GZeRTXLMSNe2BQjRXkg6hQ}, startOffset=10823, endOffset=11536, > brokerId=10013, maxTimestampMs=1715774588597, eventTimestampMs=1715781657604, > segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, > 133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535}, > segmentSizeInBytes=704895, customMetadata=Optional.empty, > state=COPY_SEGMENT_STARTED} > partition: 29, offset: 183594, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, > state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715781658183, brokerId=10013} > partition: 29, offset: 183669, value: > RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=L1TYzx0lQkagRIF86Kp0QQ}, startOffset=10823, endOffset=11544, > brokerId=10008, maxTimestampMs=1715781445270, 
eventTimestampMs=1715782717593, > segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, > 133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535, 140=11537, > 142=11543}, segmentSizeInBytes=713088, customMetadata=Optional.empty, > state=COPY_SEGMENT_STARTED} > partition: 29, offset: 183670, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, > state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715782718370, brokerId=10008} > partition: 29, offset: 186215, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, > state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874617, brokerId=10008} > partition: 29, offset: 186216, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, > state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874725, brokerId=10008} > partition: 29, offset: 186217, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, > state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874729, brokerId=10008} > partition: 29, offset: 186218, value: > RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22, > id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, > state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874817, brokerId=10008} > {code} > > It seems that at the time the leader is restarted (10013), a second copy of > the same segment is tiered by the new leader (10008). 
Interestingly the > segment doesn't have the same end offset, which is concerning. > Then the follower sees the following error repeatedly until the leader is > restarted: > > {code:java} > [2024-05-17 20:46:42,133] DEBUG [ReplicaFetcher replicaId=10013, > leaderId=10008, fetcherId=0] Handling errors in processFetchRequest for > partitions HashSet(loadtest_topic-22) (kafka.server.ReplicaFetcherThread) > [2024-05-17 20:46:43,174] DEBUG [ReplicaFetcher replicaId=10013, > leaderId=10008, fetcherId=0] Received error OFFSET_MOVED_TO_TIERED_STORAGE, > at fetch offset: 11537, topic-partition: loadtest_topic-22 > (kafka.server.ReplicaFetcherThread) > [2024-05-17 20:46:43,175] ERROR [ReplicaFetcher replicaId=10013, > leaderId=10008, fetcherId=0] Error
[jira] [Updated] (KAFKA-13560) Load indexes and data in async manner in the critical path of replica fetcher threads.
[ https://issues.apache.org/jira/browse/KAFKA-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-13560: --- Description: https://github.com/apache/kafka/pull/11390#discussion_r762366976 https://github.com/apache/kafka/pull/11390#discussion_r1033141283 https://github.com/apache/kafka/pull/15690 removed the below method from the `TierStateMachine` interface. This can be added back when we implement the functionality required to address this issue. {code:java} public Optional maybeAdvanceState(TopicPartition topicPartition, PartitionFetchState currentFetchState) {code} was: https://github.com/apache/kafka/pull/11390#discussion_r762366976 https://github.com/apache/kafka/pull/11390#discussion_r1033141283 > Load indexes and data in async manner in the critical path of replica fetcher > threads. > --- > > Key: KAFKA-13560 > URL: https://issues.apache.org/jira/browse/KAFKA-13560 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Priority: Major > Fix For: 3.8.0 > > > https://github.com/apache/kafka/pull/11390#discussion_r762366976 > https://github.com/apache/kafka/pull/11390#discussion_r1033141283 > https://github.com/apache/kafka/pull/15690 removed the below method from the > `TierStateMachine` interface. This can be added back when we implement the > functionality required to address this issue. > {code:java} > public Optional maybeAdvanceState(TopicPartition > topicPartition, PartitionFetchState currentFetchState) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13560) Load indexes and data in async manner in the critical path of replica fetcher threads.
[ https://issues.apache.org/jira/browse/KAFKA-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850290#comment-17850290 ] Satish Duggana commented on KAFKA-13560: https://github.com/apache/kafka/pull/15690 removed the method below from the `TierStateMachine` interface. This can be added back when we implement the functionality required to address this issue. {code:java} public Optional<PartitionFetchState> maybeAdvanceState(TopicPartition topicPartition, PartitionFetchState currentFetchState) {code} > Load indexes and data in async manner in the critical path of replica fetcher > threads. > --- > > Key: KAFKA-13560 > URL: https://issues.apache.org/jira/browse/KAFKA-13560 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Priority: Major > Fix For: 3.8.0 > > > https://github.com/apache/kafka/pull/11390#discussion_r762366976 > https://github.com/apache/kafka/pull/11390#discussion_r1033141283 -- This message was sent by Atlassian Jira (v8.20.10#820010)
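The async-loading direction this ticket tracks can be sketched roughly as follows. This is a hypothetical illustration only: `PartitionFetchState` below is a stand-in, not the real Kafka type, and the real work lives inside the replica fetcher and `TierStateMachine`. The point is the shape of the removed `maybeAdvanceState` hook: the expensive remote index/aux-state download runs on a separate executor, and the fetcher thread merely polls for its completion instead of blocking on remote storage I/O.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncTierStateDemo {
    // Stand-in for Kafka's PartitionFetchState; not the real type.
    static final class PartitionFetchState {
        final long fetchOffset;
        PartitionFetchState(long fetchOffset) { this.fetchOffset = fetchOffset; }
    }

    static final ExecutorService pool = Executors.newSingleThreadExecutor();
    static volatile CompletableFuture<Long> pendingBuild;

    // Start the expensive remote index/aux-state download off the fetcher
    // thread, so fetching other partitions is not blocked on remote I/O.
    static void startBuild(long targetOffset) {
        pendingBuild = CompletableFuture.supplyAsync(() -> {
            // ... download indexes and leader-epoch state from remote storage ...
            return targetOffset;
        }, pool);
    }

    // Same shape as the removed TierStateMachine#maybeAdvanceState hook:
    // advance the fetch state only once the async work has finished.
    static Optional<PartitionFetchState> maybeAdvanceState(PartitionFetchState current) {
        CompletableFuture<Long> build = pendingBuild;
        if (build != null && build.isDone() && !build.isCompletedExceptionally()) {
            return Optional.of(new PartitionFetchState(build.join()));
        }
        return Optional.empty(); // not ready yet; keep the current state
    }

    public static void main(String[] args) throws Exception {
        startBuild(11537L);
        pendingBuild.get(); // the demo waits; a fetcher thread would just poll
        PartitionFetchState next =
            maybeAdvanceState(new PartitionFetchState(0L)).orElseThrow(AssertionError::new);
        if (next.fetchOffset != 11537L) throw new AssertionError();
        pool.shutdown();
        System.out.println("ok");
    }
}
```

A real implementation would also have to handle build failures and fenced leader epochs; this sketch only shows why a poll-style `maybeAdvanceState` hook is useful once the loading is asynchronous.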
[jira] [Commented] (KAFKA-15265) Remote copy/fetch quotas for tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831383#comment-17831383 ] Satish Duggana commented on KAFKA-15265: KIP-956 is approved. [~abhijeetkumar] is working on pushing the implementation to the trunk. > Remote copy/fetch quotas for tiered storage. > > > Key: KAFKA-15265 > URL: https://issues.apache.org/jira/browse/KAFKA-15265 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Satish Duggana >Assignee: Abhijeet Kumar >Priority: Major > > Related KIP: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16259) Immutable MetadataCache to improve client performance
[ https://issues.apache.org/jira/browse/KAFKA-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana reassigned KAFKA-16259: -- Assignee: Zhifeng Chen > Immutable MetadataCache to improve client performance > - > > Key: KAFKA-16259 > URL: https://issues.apache.org/jira/browse/KAFKA-16259 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 2.8.0 >Reporter: Zhifeng Chen >Assignee: Zhifeng Chen >Priority: Major > Attachments: image-2024-02-14-12-11-07-366.png > > > TL;DR: a Kafka client produce-latency issue was identified, caused by > synchronized lock contention on metadata cache reads/writes in the native > Kafka producer. > Trigger condition: a producer needs to produce to a large number of topics, > such as in kafka-rest-proxy. > > > What is the producer metadata cache? > The Kafka producer maintains an in-memory copy of cluster metadata, so it > avoids fetching metadata for every produced message, which reduces latency. > > What is the synchronized lock contention problem? > The producer metadata cache is a *mutable* object whose reads and writes are > serialized by a synchronized lock, which means that while the metadata cache > is being updated, all read requests are blocked. > Topic metadata expiration frequency increases linearly with the number of > topics. In a cluster with a large number of topic partitions, topic metadata > expiration and refresh trigger frequent metadata updates. When reads are > blocked by an update, producer threads stall, causing high produce latency. 
> > *Proposed solution* > TL;DR Optimize performance of metadata cache read operation of native kafka > producer with copy-on-write strategy > What is copy-on-write strategy > It’s a solution to reduce synchronized lock contention by making the object > immutable, and always create a new instance when updating, but since the > object is immutable, read operation will be free from waiting, thus produce > latency reduced significantly > Besides performance, it can also make the metadata cache immutable from > unexpected modification, reduce occurrence of code bugs due to incorrect > synchronization > > {*}Test result{*}: > Environment: Kafka-rest-proxy > Client version: 2.8.0 > Number of topic partitions: 250k > test result show 90%+ latency reduction on test cluster > !image-2024-02-14-12-11-07-366.png! > P99 produce latency on deployed instances reduced from 200ms -> 5ms (upper > part show latency after the improvement, lower part show before improvement) > *Dump show details of the problem* > Threads acquiring lock > Kafka-rest-proxy-jetty-thread-pool-199waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-200waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-202waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-203waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-204waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-205waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-207waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-212waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-214waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-215waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-217waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-218waiting to acquire [ > 0x7f77d70121a0 ] > 
Kafka-rest-proxy-jetty-thread-pool-219waiting to acquire [ > 0x7f77d70121a0 ] > Kafka-rest-proxy-jetty-thread-pool-222waiting to acquire [ > 0x7f77d70121a0 ] > ... > at org.apache.kafka.clients.Metadata.fetch(Metadata.java:111) > at > org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata(KafkaProducer.java:1019) > at > org.apache.kafka.clients.producer.KafkaProducer.partitionsFor(KafkaProducer.java:1144) > at > io.confluent.kafkarest.producer.internal.MetadataImpl.maybeUpdate(MetadataImpl.java:39) > at > io.confluent.kafkarest.producer.ResilientProducer.send(ResilientProducer.java:117) > Threads hold the lock > kafka-producer-network-thread | kafka-rest-proxyrunning , holding [ > 0x7f77d70121a0 ] > at > java.util.stream.ReferencePipeline$3$1.accept(java.base@11.0.18/ReferencePipeline.java:195) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@11.0.18/ArrayList.java:1655) > at > java.util.stream.AbstractPipeline.copyInto(java.base@11.0.18/AbstractPipeline.java:484) > at >
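The copy-on-write strategy proposed above can be sketched as follows. This is a minimal illustration, not the producer's actual `MetadataCache`; names like `ClusterSnapshot` are invented for the sketch. Readers take a single volatile snapshot reference and never contend with writers, while writers build and atomically publish a fresh immutable map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Immutable view of cluster metadata; every update produces a new instance.
final class ClusterSnapshot {
    private final Map<String, Integer> partitionCounts;
    ClusterSnapshot(Map<String, Integer> partitionCounts) {
        this.partitionCounts = Collections.unmodifiableMap(new HashMap<>(partitionCounts));
    }
    Integer partitionCount(String topic) { return partitionCounts.get(topic); }
    ClusterSnapshot withTopic(String topic, int partitions) {
        Map<String, Integer> copy = new HashMap<>(partitionCounts);
        copy.put(topic, partitions);
        return new ClusterSnapshot(copy);
    }
}

final class CopyOnWriteMetadataCache {
    // Readers only do a volatile read; they never wait on the writer lock.
    private volatile ClusterSnapshot snapshot = new ClusterSnapshot(new HashMap<>());
    ClusterSnapshot read() { return snapshot; }
    // Writers are serialized among themselves, but publish atomically.
    synchronized void update(String topic, int partitions) {
        snapshot = snapshot.withTopic(topic, partitions);
    }
}

public class CowCacheDemo {
    public static void main(String[] args) {
        CopyOnWriteMetadataCache cache = new CopyOnWriteMetadataCache();
        cache.update("loadtest_topic", 24);
        ClusterSnapshot before = cache.read();
        cache.update("other_topic", 8);
        // A reader holding the old snapshot still sees a consistent view.
        if (before.partitionCount("other_topic") != null) throw new AssertionError();
        if (cache.read().partitionCount("other_topic") != 8) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The trade-off is that each update copies the map, which is acceptable when reads vastly outnumber writes, exactly the regime described in the thread dump above, where many jetty threads block on one updating network thread.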
[jira] [Assigned] (KAFKA-16161) Avoid creating remote log metadata snapshot file in partition data directory.
[ https://issues.apache.org/jira/browse/KAFKA-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana reassigned KAFKA-16161: -- Assignee: Kamal Chandraprakash > Avoid creating remote log metadata snapshot file in partition data directory. > - > > Key: KAFKA-16161 > URL: https://issues.apache.org/jira/browse/KAFKA-16161 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Assignee: Kamal Chandraprakash >Priority: Major > Labels: KIP-405 > > Avoid creating remote log metadata snapshot file in a partition data > directory. This can be added when the snapshots implementation related > functionality is enabled end to end. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16161) Avoid creating remote log metadata snapshot file in partition data directory.
Satish Duggana created KAFKA-16161: -- Summary: Avoid creating remote log metadata snapshot file in partition data directory. Key: KAFKA-16161 URL: https://issues.apache.org/jira/browse/KAFKA-16161 Project: Kafka Issue Type: Improvement Reporter: Satish Duggana Avoid creating remote log metadata snapshot file in a partition data directory. This can be added when the snapshots implementation related functionality is enabled end to end. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16161) Avoid creating remote log metadata snapshot file in partition data directory.
[ https://issues.apache.org/jira/browse/KAFKA-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16161: --- Labels: KIP-405 (was: ) > Avoid creating remote log metadata snapshot file in partition data directory. > - > > Key: KAFKA-16161 > URL: https://issues.apache.org/jira/browse/KAFKA-16161 > Project: Kafka > Issue Type: Improvement >Reporter: Satish Duggana >Priority: Major > Labels: KIP-405 > > Avoid creating remote log metadata snapshot file in a partition data > directory. This can be added when the snapshots implementation related > functionality is enabled end to end. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16073) Kafka Tiered Storage: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16073: --- Summary: Kafka Tiered Storage: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion (was: Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion) > Kafka Tiered Storage: Consumer Fetch Error Due to Delayed localLogStartOffset > Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1, 3.8.0 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. 
This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
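The fix direction discussed in the comments below, updating local-log-start-offset before the in-memory segments are removed, can be sketched as follows. `Log`, `resolveFetch`, and `deleteSegmentsUpTo` are simplified stand-ins for `UnifiedLog`/`ReplicaManager` internals, not the real code: with the offset published first, a concurrent fetch at a just-deleted offset is correctly routed to tiered storage instead of hitting the missing-segment error path.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class StartOffsetRaceDemo {
    enum Source { LOCAL, TIERED }

    static class Log {
        final ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        volatile long localLogStartOffset;

        Source resolveFetch(long fetchOffset) {
            // Mirrors the ReplicaManager.handleOffsetOutOfRangeError check:
            // offsets below localLogStartOffset are served from tiered storage.
            if (fetchOffset < localLogStartOffset) return Source.TIERED;
            // Erroneous path hit during the race: the check above passed
            // against a stale start offset, but the segment is already gone.
            if (segments.floorEntry(fetchOffset) == null)
                throw new IllegalStateException("OffsetOutOfRange at " + fetchOffset);
            return Source.LOCAL;
        }

        // Fixed ordering: advance the offset first, then drop the segments.
        // The buggy ordering (clear first, publish second) opens the window
        // described in this ticket.
        void deleteSegmentsUpTo(long newStartOffset) {
            localLogStartOffset = newStartOffset;          // publish first
            segments.headMap(newStartOffset).clear();      // then delete
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        for (long o = 4; o <= 10; o++) log.segments.put(o, "segment-" + o);
        log.localLogStartOffset = 4;
        log.deleteSegmentsUpTo(7);
        // Offset 6 was just deleted locally: correctly routed to tiered storage.
        if (log.resolveFetch(6) != Source.TIERED) throw new AssertionError();
        // Offsets at or above the new start offset are still served locally.
        if (log.resolveFetch(7) != Source.LOCAL) throw new AssertionError();
        System.out.println("ok");
    }
}
```

As the comments note, whether this reordering is sufficient, or whether it introduces side effects of its own, still needs to be thought through end to end.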
[jira] [Commented] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.
[ https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806572#comment-17806572 ] Satish Duggana commented on KAFKA-16088: Updated the description clarifying the specific scenario. > Not reading active segments when RemoteFetch return Empty Records. > > > Key: KAFKA-16088 > URL: https://issues.apache.org/jira/browse/KAFKA-16088 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Reporter: Arpit Goyal >Priority: Critical > Labels: tiered-storage > > This issue is about covering local log segments also while finding the > segment for a specific offset when the topic is compacted earlier but it is > changed to retention and enabled with tiered storage. > Please refer to this comment for details > https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806413#comment-17806413 ] Satish Duggana commented on KAFKA-15388: [PR-15060|https://github.com/apache/kafka/pull/15060] addresses this issue partially and we have a followup [KAFKA-16088|https://issues.apache.org/jira/browse/KAFKA-16088] on the remaining cases. > Handle topics that were having compaction as retention earlier are changed to > delete only retention policy and onboarded to tiered storage. > > > Key: KAFKA-15388 > URL: https://issues.apache.org/jira/browse/KAFKA-15388 > Project: Kafka > Issue Type: Bug >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Fix For: 3.8.0 > > Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, > tieredtopicloglist.png > > > Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517] > > There are 3 paths I looked at: > * When data is moved to remote storage (1) > * When data is read from remote storage (2) > * When data is deleted from remote storage (3) > (1) Does not have a problem with compacted topics. Compacted segments are > uploaded and their metadata claims they contain offset from the baseOffset of > the segment until the next segment's baseOffset. There are no gaps in offsets. > (2) Does not have a problem if a customer is querying offsets which do not > exist within a segment, but there are offset after the queried offset within > the same segment. *However, it does have a problem when the next available > offset is in a subsequent segment.* > (3) For data deleted via DeleteRecords there is no problem. For data deleted > via retention there is no problem. > > *I believe the proper solution to (2) is to make tiered storage continue > looking for the next greater offset in subsequent segments.* > Steps to reproduce the issue: > {code:java} > // TODO (christo) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.
[ https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16088: --- Description: This issue is about covering local log segments also while finding the segment for a specific offset when the topic is compacted earlier but it is changed to retention and enabled with tiered storage. Please refer to this comment for details https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064 was: Please refer this comment for details https://github.com/apache/kafka/pull/15060#issuecomment-1879657273 > Not reading active segments when RemoteFetch return Empty Records. > > > Key: KAFKA-16088 > URL: https://issues.apache.org/jira/browse/KAFKA-16088 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Reporter: Arpit Goyal >Priority: Critical > Labels: tiered-storage > > This issue is about covering local log segments also while finding the > segment for a specific offset when the topic is compacted earlier but it is > changed to retention and enabled with tiered storage. > Please refer to this comment for details > https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16073) Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804124#comment-17804124 ] Satish Duggana commented on KAFKA-16073: Thanks [~hzh0425@apache], we can discuss the details in the PR. > Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed > localLogStartOffset Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1, 3.8.0 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities
[ https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803470#comment-17803470 ] Satish Duggana commented on KAFKA-16060: [~jianbin] There are no short term plans to support JBOD with tiered storage. > Some questions about tiered storage capabilities > > > Key: KAFKA-16060 > URL: https://issues.apache.org/jira/browse/KAFKA-16060 > Project: Kafka > Issue Type: Wish > Components: core >Affects Versions: 3.6.1 >Reporter: Jianbin Chen >Priority: Major > > # If a topic has 3 replicas, when the local expiration time is reached, will > all 3 replicas trigger the log transfer to the remote storage, or will only > the leader in the isr transfer the log to the remote storage (hdfs, s3) > # Topics that do not support compression, do you mean topics that > log.cleanup.policy=compact? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16073) Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16073: --- Fix Version/s: 3.8.0 > Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed > localLogStartOffset Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1, 3.8.0 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-16073) Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802024#comment-17802024 ] Satish Duggana edited comment on KAFKA-16073 at 1/4/24 3:50 AM: That was a good catch [~hzh0425@apache] ! I think it is better to avoid holding a lock for local-log-start-offset updates or fetches, that can introduce other side effects. We discussed one possible solution is to address it by updating local-log-start-offset before the segments are removed from inmemory and scheduled for deletion but we need to think through the end to end scenarios. cc [~Kamal C] was (Author: satish.duggana): We discussed one possible solution is to address it by updating local-log-start-offset before the segments are removed from inmemory and scheduled for deletion but we need to think through the end to end scenarios. cc [~Kamal C] > Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed > localLogStartOffset Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. 
While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16073) Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802024#comment-17802024 ] Satish Duggana commented on KAFKA-16073: We discussed one possible solution is to address it by updating local-log-start-offset before the segments are removed from inmemory and scheduled for deletion but we need to think through the end to end scenarios. cc [~Kamal C] > Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed > localLogStartOffset Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. 
This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16073) Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed localLogStartOffset Update During Segment Deletion
[ https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801663#comment-17801663 ] Satish Duggana commented on KAFKA-16073: Thanks [~hzh0425@apache] for filing the JIRA with a detailed description. I am trying to summarize the scenario that you mentioned earlier in the JIRA description with an example. Let me know if I am missing anything here. Let us assume each segment has one offset in this example: log start offset 0, log end offset 10, local log start offset 4, fetch offset 6, new local log start offset 7. Deletion based on retention configs starts and eventually updates the local log start offset to 7. There is a race condition here: the segments list is updated by removing the segments for offsets 4, 5, and 6 in LocalLog, and only then is the local-log-start-offset updated. But a fetch is being served concurrently, and it may throw OffsetOutOfRangeException if the in-memory segments have already been removed in LocalLog but the local-log-start-offset has not yet been updated to 7 when it executes the [code|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1866], because the condition fetch offset (6) < old local-log-start-offset (4) evaluates to false. > Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed > localLogStartOffset Update During Segment Deletion > > > Key: KAFKA-16073 > URL: https://issues.apache.org/jira/browse/KAFKA-16073 > Project: Kafka > Issue Type: Bug > Components: core, Tiered-Storage >Affects Versions: 3.6.1 >Reporter: hzh0425 >Assignee: hzh0425 >Priority: Major > Labels: KIP-405, kip-405, tiered-storage > Fix For: 3.6.1 > > > The identified bug in Apache Kafka's tiered storage feature involves a > delayed update of {{localLogStartOffset}} in the > {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. > When segments are deleted from the log's memory state, the > {{localLogStartOffset}} isn't promptly updated. 
Concurrently, > {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch > offset is less than the {{{}localLogStartOffset{}}}. If it's greater, Kafka > erroneously sends an {{OffsetOutOfRangeException}} to the consumer. > In a specific concurrent scenario, imagine sequential offsets: {{{}offset1 < > offset2 < offset3{}}}. A client requests data at {{{}offset2{}}}. While a > background deletion process removes segments from memory, it hasn't yet > updated the {{LocalLogStartOffset}} from {{offset1}} to {{{}offset3{}}}. > Consequently, when the fetch offset ({{{}offset2{}}}) is evaluated against > the stale {{offset1}} in {{{}ReplicaManager.handleOffsetOutOfRangeError{}}}, > it incorrectly triggers an {{{}OffsetOutOfRangeException{}}}. This issue > arises from the out-of-sync update of {{{}localLogStartOffset{}}}, leading to > incorrect handling of consumer fetch requests and potential data access > errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities
[ https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801167#comment-17801167 ] Satish Duggana commented on KAFKA-16060: [~jianbin] For Q2: Do you mean compaction topics are supported with tiered storage? No, tiered storage is not supported for compaction enabled topics. > Some questions about tiered storage capabilities > > > Key: KAFKA-16060 > URL: https://issues.apache.org/jira/browse/KAFKA-16060 > Project: Kafka > Issue Type: Wish > Components: core >Affects Versions: 3.6.1 >Reporter: Jianbin Chen >Priority: Major > > # If a topic has 3 replicas, when the local expiration time is reached, will > all 3 replicas trigger the log transfer to the remote storage, or will only > the leader in the isr transfer the log to the remote storage (hdfs, s3) > # Topics that do not support compression, do you mean topics that > log.cleanup.policy=compact? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15241) Compute tiered offset by keeping the respective epochs in scope.
[ https://issues.apache.org/jira/browse/KAFKA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15241: --- Fix Version/s: 3.7.0 > Compute tiered offset by keeping the respective epochs in scope. > > > Key: KAFKA-15241 > URL: https://issues.apache.org/jira/browse/KAFKA-15241 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Assignee: Kamal Chandraprakash >Priority: Major > Fix For: 3.7.0 > > > This is a followup on the discussion > [thread|https://github.com/apache/kafka/pull/14004#discussion_r1268911909] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15158) Add metrics for RemoteDeleteRequestsPerSec, RemoteDeleteErrorsPerSec, BuildRemoteLogAuxStateRequestsPerSec, BuildRemoteLogAuxStateErrorsPerSec
[ https://issues.apache.org/jira/browse/KAFKA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15158: --- Labels: tiered-storage (was: ) > Add metrics for RemoteDeleteRequestsPerSec, RemoteDeleteErrorsPerSec, > BuildRemoteLogAuxStateRequestsPerSec, BuildRemoteLogAuxStateErrorsPerSec > -- > > Key: KAFKA-15158 > URL: https://issues.apache.org/jira/browse/KAFKA-15158 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Gantigmaa Selenge >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > Add the following metrics for better observability into the RemoteLog related > activities inside the broker. > 1. RemoteWriteRequestsPerSec > 2. RemoteDeleteRequestsPerSec > 3. BuildRemoteLogAuxStateRequestsPerSec > > These metrics will be calculated at topic level (we can add them at > brokerTopicStats) > -*RemoteWriteRequestsPerSec* will be marked on every call to > RemoteLogManager#- > -copyLogSegmentsToRemote()- already covered by KAFKA-14953 > > *RemoteDeleteRequestsPerSec* will be marked on every call to > RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced > in [https://github.com/apache/kafka/pull/13561] > *BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to > ReplicaFetcherTierStateMachine#buildRemoteLogAuxState() > > (Note: For all the above, add Error metrics as well such as > RemoteDeleteErrorPerSec) > (Note: This requires a change in KIP-405 and hence, must be approved by KIP > author [~satishd] ) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
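The per-topic request/error marking described above can be sketched with plain counters. Kafka's broker metrics actually use Yammer meters registered via brokerTopicStats, which also derive the per-second rate from a monotonic count; the class and method names below are stand-ins for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class RemoteDeleteMetricsDemo {
    // Topic-level counters backing RemoteDeleteRequestsPerSec /
    // RemoteDeleteErrorsPerSec style metrics (names are stand-ins).
    static final Map<String, LongAdder> deleteRequests = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> deleteErrors = new ConcurrentHashMap<>();

    // Stand-in for RemoteLogManager#cleanupExpiredRemoteLogSegments: mark the
    // request meter on every call, and the error meter on every failure.
    static void cleanupExpiredRemoteLogSegments(String topic, boolean simulateFailure) {
        deleteRequests.computeIfAbsent(topic, t -> new LongAdder()).increment();
        try {
            if (simulateFailure) throw new RuntimeException("remote delete failed");
            // ... delete eligible remote segments here ...
        } catch (RuntimeException e) {
            deleteErrors.computeIfAbsent(topic, t -> new LongAdder()).increment();
        }
    }

    public static void main(String[] args) {
        cleanupExpiredRemoteLogSegments("loadtest_topic", false);
        cleanupExpiredRemoteLogSegments("loadtest_topic", true);
        if (deleteRequests.get("loadtest_topic").sum() != 2) throw new AssertionError();
        if (deleteErrors.get("loadtest_topic").sum() != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Marking requests and errors separately, as the ticket proposes, lets operators compute an error ratio per topic rather than only an absolute failure count.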
[jira] [Updated] (KAFKA-16014) Implement RemoteLogSizeComputationTime, RemoteLogSizeBytes, RemoteLogMetadataCount
[ https://issues.apache.org/jira/browse/KAFKA-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-16014: --- Fix Version/s: 3.7.0 > Implement RemoteLogSizeComputationTime, RemoteLogSizeBytes, > RemoteLogMetadataCount > -- > > Key: KAFKA-16014 > URL: https://issues.apache.org/jira/browse/KAFKA-16014 > Project: Kafka > Issue Type: Sub-task >Reporter: Luke Chen >Priority: Major > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-15147) Measure pending and outstanding Remote Segment operations
[ https://issues.apache.org/jira/browse/KAFKA-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796808#comment-17796808 ] Satish Duggana edited comment on KAFKA-15147 at 12/14/23 3:54 PM: -- [~enether] These are minor improvements, we can target them to 3.7.0. Christo, [~showuon] et al. are working on raising PRs and we plan to review and merge them within a week. was (Author: satish.duggana): [~enether] These are minor improvements, we can target them to 3.7.0. Christo, Luke et al. are working on PRs and we plan to review and merge them. > Measure pending and outstanding Remote Segment operations > - > > Key: KAFKA-15147 > URL: https://issues.apache.org/jira/browse/KAFKA-15147 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Jorge Esteban Quilcate Otoya >Assignee: Christo Lolov >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Upload+and+delete+lag+metrics+in+Tiered+Storage > > KAFKA-15833: RemoteCopyLagBytes > KAFKA-16002: RemoteCopyLagSegments, RemoteDeleteLagBytes, > RemoteDeleteLagSegments > KAFKA-16013: ExpiresPerSec > KAFKA-16014: RemoteLogSizeComputationTime, RemoteLogSizeBytes, > RemoteLogMetadataCount > KAFKA-15158: RemoteDeleteRequestsPerSec, RemoteDeleteErrorsPerSec, > BuildRemoteLogAuxStateRequestsPerSec, BuildRemoteLogAuxStateErrorsPerSec > > Remote Log Segment operations (copy/delete) are executed by the Remote > Storage Manager, and recorded by Remote Log Metadata Manager (e.g. default > TopicBasedRLMM writes to the internal Kafka topic state changes on remote log > segments). > As executions run, fail, and retry; it will be important to know how many > operations are pending and outstanding over time to alert operators. > Pending operations are not enough to alert, as values can oscillate closer to > zero. An additional condition needs to apply (running time > threshold) to > consider an operation outstanding.
> Proposal: > RemoteLogManager could be extended with 2 concurrent maps > (pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to measure > segmentId time when operation started, and based on this expose 2 metrics per > operation: > * pendingSegmentCopies: gauge of pendingSegmentCopies map > * outstandingSegmentCopies: loop over pending ops, and if now - startedTime > > timeout, then outstanding++ (maybe on debug level?) > Is this a valuable metric to add to Tiered Storage? or better to solve on a > custom RLMM implementation? > Also, does it require a KIP? > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15147) Measure pending and outstanding Remote Segment operations
[ https://issues.apache.org/jira/browse/KAFKA-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796808#comment-17796808 ] Satish Duggana commented on KAFKA-15147: [~enether] These are minor improvements, we can target them to 3.7.0. Christo, Luke et al. are working on PRs and we plan to review and merge them. > Measure pending and outstanding Remote Segment operations > - > > Key: KAFKA-15147 > URL: https://issues.apache.org/jira/browse/KAFKA-15147 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Jorge Esteban Quilcate Otoya >Assignee: Christo Lolov >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Upload+and+delete+lag+metrics+in+Tiered+Storage > > KAFKA-15833: RemoteCopyLagBytes > KAFKA-16002: RemoteCopyLagSegments, RemoteDeleteLagBytes, > RemoteDeleteLagSegments > KAFKA-16013: ExpiresPerSec > KAFKA-16014: RemoteLogSizeComputationTime, RemoteLogSizeBytes, > RemoteLogMetadataCount > KAFKA-15158: RemoteDeleteRequestsPerSec, RemoteDeleteErrorsPerSec, > BuildRemoteLogAuxStateRequestsPerSec, BuildRemoteLogAuxStateErrorsPerSec > > Remote Log Segment operations (copy/delete) are executed by the Remote > Storage Manager, and recorded by Remote Log Metadata Manager (e.g. default > TopicBasedRLMM writes to the internal Kafka topic state changes on remote log > segments). > As executions run, fail, and retry; it will be important to know how many > operations are pending and outstanding over time to alert operators. > Pending operations are not enough to alert, as values can oscillate closer to > zero. An additional condition needs to apply (running time > threshold) to > consider an operation outstanding.
> Proposal: > RemoteLogManager could be extended with 2 concurrent maps > (pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to measure > segmentId time when operation started, and based on this expose 2 metrics per > operation: > * pendingSegmentCopies: gauge of pendingSegmentCopies map > * outstandingSegmentCopies: loop over pending ops, and if now - startedTime > > timeout, then outstanding++ (maybe on debug level?) > Is this a valuable metric to add to Tiered Storage? or better to solve on a > custom RLMM implementation? > Also, does it require a KIP? > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
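The two-map proposal above can be sketched in plain Java, assuming the layout described in the ticket (`Map[Uuid, Long]` from segment id to operation start time); the class and method names here are hypothetical, not the actual RemoteLogManager API, and only the copy side is shown (deletes would mirror it):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the ticket's proposal: record when each segment copy started,
// gauge the pending count, and count copies running longer than a timeout
// as "outstanding". Not the real RemoteLogManager implementation.
class SegmentOpTracker {
    // segmentId -> start time (ms) of the in-flight copy
    private final Map<String, Long> pendingSegmentCopies = new ConcurrentHashMap<>();

    void copyStarted(String segmentId, long nowMs) {
        pendingSegmentCopies.put(segmentId, nowMs);
    }

    void copyFinished(String segmentId) {
        pendingSegmentCopies.remove(segmentId);
    }

    // Gauge: number of in-flight copies (pendingSegmentCopies in the ticket).
    int pendingCopies() {
        return pendingSegmentCopies.size();
    }

    // Gauge: copies running longer than timeoutMs count as outstanding
    // (outstandingSegmentCopies in the ticket).
    long outstandingCopies(long nowMs, long timeoutMs) {
        return pendingSegmentCopies.values().stream()
                .filter(startedMs -> nowMs - startedMs > timeoutMs)
                .count();
    }
}
```

This keeps the pending gauge O(1) and makes the outstanding gauge a cheap scan over in-flight operations only, which matches the ticket's note that pending counts alone can oscillate near zero and are not enough to alert on.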
[jira] [Commented] (KAFKA-15931) Cached transaction index gets closed if tiered storage read is interrupted
[ https://issues.apache.org/jira/browse/KAFKA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791514#comment-17791514 ] Satish Duggana commented on KAFKA-15931: Sure [~ivanyu] , added it to https://issues.apache.org/jira/browse/KAFKA-15420 > Cached transaction index gets closed if tiered storage read is interrupted > -- > > Key: KAFKA-15931 > URL: https://issues.apache.org/jira/browse/KAFKA-15931 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Ivan Yurchenko >Priority: Minor > > This reproduces when reading from remote storage with the default > {{fetch.max.wait.ms}} (500) or lower. {{isolation.level=read_committed}} is > needed to trigger this. > It's not easy to reproduce on local-only setups, unfortunately, because reads > are fast and aren't interrupted. > This error is logged > {noformat} > [2023-11-29 14:01:01,166] ERROR Error occurred while reading the remote data > for topic1-0 (kafka.log.remote.RemoteLogReader) > org.apache.kafka.common.KafkaException: Failed read position from the > transaction index > at > org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:235) > at > org.apache.kafka.storage.internals.log.TransactionIndex.collectAbortedTxns(TransactionIndex.java:171) > at > kafka.log.remote.RemoteLogManager.collectAbortedTransactions(RemoteLogManager.java:1359) > at > kafka.log.remote.RemoteLogManager.addAbortedTransactions(RemoteLogManager.java:1341) > at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1310) > at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62) > at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at 
java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.nio.channels.ClosedChannelException > at > java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150) > at java.base/sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:325) > at > org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:233) > ... 10 more > {noformat} > and after that this txn index becomes unusable until the process is restarted. > I suspect it's caused by the reading thread being interrupted due to the > fetch timeout. At least [this > code|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L159-L160] > in {{AbstractInterruptibleChannel}} is called. > Fixing may be easy: reopen the channel in {{TransactionIndex}} if it's closed. > However, off the top of my head I can't say if there are some less obvious > implications of this change. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
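The suggested fix (reopen the channel if an interrupt closed it) can be sketched as a standalone pattern. This is not the actual `TransactionIndex` code, and `simulateInterruptClose` exists only to exercise the recovery path; a real fix inside `TransactionIndex` would also need to restore the channel position it was reading from.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: a reader that reopens its FileChannel if a thread
// interrupt closed it (AbstractInterruptibleChannel closes the channel
// when the reading thread is interrupted).
class ReopeningReader {
    private final Path file;
    private FileChannel channel;

    ReopeningReader(Path file) throws IOException {
        this.file = file;
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Returns the current position, reopening the channel once if it was
    // closed out from under us instead of propagating the exception.
    long position() throws IOException {
        try {
            return channel.position();
        } catch (ClosedChannelException e) {
            channel = FileChannel.open(file, StandardOpenOption.READ);
            return channel.position();
        }
    }

    // Test hook: mimic the interrupt-driven close described in the ticket.
    void simulateInterruptClose() throws IOException {
        channel.close();
    }
}
```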
[jira] [Updated] (KAFKA-15857) Introduce LocalLogStartOffset and TieredOffset in OffsetSpec.
[ https://issues.apache.org/jira/browse/KAFKA-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15857: --- Fix Version/s: 3.7.0 > Introduce LocalLogStartOffset and TieredOffset in OffsetSpec. > - > > Key: KAFKA-15857 > URL: https://issues.apache.org/jira/browse/KAFKA-15857 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Satish Duggana >Assignee: Christo Lolov >Priority: Major > Labels: need-kip, tiered-storage > Fix For: 3.7.0 > > > Introduce EarliestLocalOffset and TieredOffset in OffsetSpec which will help > in finding respective offsets while using AdminClient#listOffsets(). > EarliestLocalOffset - local log start offset of a topic partition. > TieredOffset - Highest offset up to which the segments were copied to remote > storage. > We can discuss further on naming and semantics of these offset specs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
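The proposed TieredOffset semantics ("highest offset up to which the segments were copied to remote storage") can be illustrated with a small standalone computation over copied-segment metadata. `CopiedSegment` and `TieredOffsets` are hypothetical helper types for illustration, not part of the AdminClient/OffsetSpec API that the KIP would define.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical record of a segment already copied to remote storage,
// covering offsets [baseOffset, endOffset].
class CopiedSegment {
    final long baseOffset;
    final long endOffset;

    CopiedSegment(long baseOffset, long endOffset) {
        this.baseOffset = baseOffset;
        this.endOffset = endOffset;
    }
}

class TieredOffsets {
    // TieredOffset: highest end offset among segments copied to remote
    // storage, or empty if nothing has been tiered yet.
    static OptionalLong tieredOffset(List<CopiedSegment> copied) {
        return copied.stream().mapToLong(seg -> seg.endOffset).max();
    }
}
```

EarliestLocalOffset, by contrast, would simply report the partition's local log start offset, so the two specs together bracket the portion of the log still held on local disk.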
[jira] [Updated] (KAFKA-15864) Add more tests asserting the log-start-offset, local-log-start-offset, and HW/LSO/LEO in rolling over segments with tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15864: --- Description: Followup on the [comment|https://github.com/apache/kafka/pull/14766/files#r1395389551] > Add more tests asserting the log-start-offset, local-log-start-offset, and > HW/LSO/LEO in rolling over segments with tiered storage. > --- > > Key: KAFKA-15864 > URL: https://issues.apache.org/jira/browse/KAFKA-15864 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Satish Duggana >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > Followup on the > [comment|https://github.com/apache/kafka/pull/14766/files#r1395389551] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15864) Add more tests asserting the log-start-offset, local-log-start-offset, and HW/LSO/LEO in rolling over segments with tiered storage.
Satish Duggana created KAFKA-15864: -- Summary: Add more tests asserting the log-start-offset, local-log-start-offset, and HW/LSO/LEO in rolling over segments with tiered storage. Key: KAFKA-15864 URL: https://issues.apache.org/jira/browse/KAFKA-15864 Project: Kafka Issue Type: Improvement Components: core Reporter: Satish Duggana Fix For: 3.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15857) Introduce LocalLogStartOffset and TieredOffset in OffsetSpec.
Satish Duggana created KAFKA-15857: -- Summary: Introduce LocalLogStartOffset and TieredOffset in OffsetSpec. Key: KAFKA-15857 URL: https://issues.apache.org/jira/browse/KAFKA-15857 Project: Kafka Issue Type: Improvement Components: core Reporter: Satish Duggana Introduce EarliestLocalOffset and TieredOffset in OffsetSpec which will help in finding respective offsets while using AdminClient#listOffsets(). EarliestLocalOffset - local log start offset of a topic partition. TieredOffset - Highest offset up to which the segments were copied to remote storage. We can discuss further on naming and semantics of these offset specs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15857) Introduce LocalLogStartOffset and TieredOffset in OffsetSpec.
[ https://issues.apache.org/jira/browse/KAFKA-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15857: --- Labels: need-kip tiered-storage (was: tiered-storage) > Introduce LocalLogStartOffset and TieredOffset in OffsetSpec. > - > > Key: KAFKA-15857 > URL: https://issues.apache.org/jira/browse/KAFKA-15857 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Satish Duggana >Priority: Major > Labels: need-kip, tiered-storage > > Introduce EarliestLocalOffset and TieredOffset in OffsetSpec which will help > in finding respective offsets while using AdminClient#listOffsets(). > EarliestLocalOffset - local log start offset of a topic partition. > TieredOffset - Highest offset up to which the segments were copied to remote > storage. > We can discuss further on naming and semantics of these offset specs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787799#comment-17787799 ] Satish Duggana commented on KAFKA-15802: [~mimaison] Sorry for updating the JIRA. This can be resolved as fixed, I closed it now. > Trying to access uncopied segments metadata on listOffsets > -- > > Key: KAFKA-15802 > URL: https://issues.apache.org/jira/browse/KAFKA-15802 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Francois Visconte >Assignee: Jorge Esteban Quilcate Otoya >Priority: Major > Fix For: 3.7.0, 3.6.1 > > > We have a tiered storage cluster running with Aiven s3 plugin. > On our cluster, we have a process doing regular listOffsets requests. > This triggers the following exception: > {code:java} > org.apache.kafka.common.KafkaException: > org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: > Requested remote resource was not found > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355) > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318) > Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache > lambda$handleCompletion$7 > WARNING: Exception thrown during asynchronous load > java.util.concurrent.CompletionException: > io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) > at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80) > at > io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59) > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103) > ... 7 more > Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The > specified key does not exist. 
(Service: S3, Status Code: 404, Request ID: > CFMP27PVC9V2NNEM, Extended Request ID: > F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) > at > software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) > at >
[jira] [Updated] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15802: --- Fix Version/s: 3.7.0 3.6.1 > Trying to access uncopied segments metadata on listOffsets > -- > > Key: KAFKA-15802 > URL: https://issues.apache.org/jira/browse/KAFKA-15802 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Francois Visconte >Assignee: Jorge Esteban Quilcate Otoya >Priority: Major > Fix For: 3.7.0, 3.6.1 > > > We have a tiered storage cluster running with Aiven s3 plugin. > On our cluster, we have a process doing regular listOffsets requests. > This triggers the following exception: > {code:java} > org.apache.kafka.common.KafkaException: > org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: > Requested remote resource was not found > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355) > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318) > Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache > lambda$handleCompletion$7 > WARNING: Exception thrown during asynchronous load > java.util.concurrent.CompletionException: > io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80) > at > io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59) > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103) > ... 7 more > Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The > specified key does not exist. 
(Service: S3, Status Code: 404, Request ID: > CFMP27PVC9V2NNEM, Extended Request ID: > F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) > at > software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) > at >
[jira] [Commented] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784698#comment-17784698 ] Satish Duggana commented on KAFKA-15802: [~jeqo] [~divijvaidya] +1 to go with filtering the targeted segments that indicate the segment is available for now. This is the existing approach in 2.8.x internal tiered storage implementation branches. We can discuss later whether a general filtering API is required for some of the methods or a very specific API is needed. > Trying to access uncopied segments metadata on listOffsets > -- > > Key: KAFKA-15802 > URL: https://issues.apache.org/jira/browse/KAFKA-15802 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Francois Visconte >Priority: Major > > We have a tiered storage cluster running with Aiven s3 plugin. > On our cluster, we have a process doing regular listOffsets requests. > This triggers the following exception: > {code:java} > org.apache.kafka.common.KafkaException: > org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: > Requested remote resource was not found > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355) > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318) > Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache > lambda$handleCompletion$7 > WARNING: Exception thrown during asynchronous load > java.util.concurrent.CompletionException: > io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107) > at > 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80) > at > io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59) > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103) > ... 7 more > Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The > specified key does not exist. 
(Service: S3, Status Code: 404, Request ID: > CFMP27PVC9V2NNEM, Extended Request ID: > F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) > at > software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) > at >
[jira] [Updated] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15802: --- Attachment: (was: screenshot-1.png) > Trying to access uncopied segments metadata on listOffsets > -- > > Key: KAFKA-15802 > URL: https://issues.apache.org/jira/browse/KAFKA-15802 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Francois Visconte >Priority: Major > > We have a tiered storage cluster running with Aiven s3 plugin. > On our cluster, we have a process doing regular listOffsets requests. > This triggers the following exception: > {code:java} > org.apache.kafka.common.KafkaException: > org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: > Requested remote resource was not found > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355) > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318) > Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache > lambda$handleCompletion$7 > WARNING: Exception thrown during asynchronous load > java.util.concurrent.CompletionException: > io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80) > at > io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59) > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103) > ... 7 more > Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The > specified key does not exist. (Service: S3, Status Code: 404, Request ID: > CFMP27PVC9V2NNEM, Extended Request ID: > F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) > at > software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) > at > 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52) > at >
[jira] [Updated] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15802: --- Attachment: screenshot-1.png > Trying to access uncopied segments metadata on listOffsets > -- > > Key: KAFKA-15802 > URL: https://issues.apache.org/jira/browse/KAFKA-15802 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Francois Visconte >Priority: Major > Attachments: screenshot-1.png > > > We have a tiered storage cluster running with Aiven s3 plugin. > On our cluster, we have a process doing regular listOffsets requests. > This triggers the following exception: > {code:java} > org.apache.kafka.common.KafkaException: > org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: > Requested remote resource was not found > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355) > at > org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318) > Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache > lambda$handleCompletion$7 > WARNING: Exception thrown during asynchronous load > java.util.concurrent.CompletionException: > io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key > cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest > does not exists in storage S3Storage{bucketName='bucket', partSize=16777216} > at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80) > at > io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59) > at > com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103) > ... 7 more > Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The > specified key does not exist. (Service: S3, Status Code: 404, Request ID: > CFMP27PVC9V2NNEM, Extended Request ID: > F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) > at > software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) > at > software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) > at > 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) > at > software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52) > at >
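The 404 in the trace above comes from dereferencing remote metadata for a segment whose upload had not completed. One way to avoid that on listOffsets-style paths is to only consider segments whose copy has finished. The sketch below is illustrative only: the enum and record here are simplified stand-ins modeled on Kafka's `RemoteLogSegmentState`, not the actual classes from `org.apache.kafka.server.log.remote.storage`.

```java
import java.util.List;
import java.util.Optional;

public class ListOffsetsGuardSketch {
    // Simplified stand-in for Kafka's remote segment copy states; the real
    // enum is RemoteLogSegmentState in the remote storage module.
    enum CopyState { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED }

    // Hypothetical minimal metadata record for illustration.
    record SegmentMetadata(long baseOffset, CopyState state) {}

    // Only segments whose copy has finished are safe to dereference in remote
    // storage; metadata for in-flight copies may point at objects (e.g. the
    // rsm-manifest key) that do not exist yet.
    static Optional<SegmentMetadata> highestSafeSegment(List<SegmentMetadata> segments) {
        return segments.stream()
                .filter(s -> s.state() == CopyState.COPY_SEGMENT_FINISHED)
                .max((a, b) -> Long.compare(a.baseOffset(), b.baseOffset()));
    }

    public static void main(String[] args) {
        List<SegmentMetadata> segs = List.of(
                new SegmentMetadata(0, CopyState.COPY_SEGMENT_FINISHED),
                new SegmentMetadata(100, CopyState.COPY_SEGMENT_STARTED)); // manifest not uploaded yet
        System.out.println(highestSafeSegment(segs).get().baseOffset()); // prints 0
    }
}
```

With this filter, the in-flight segment at offset 100 is never handed to the remote fetch path, so the `KeyNotFoundException` above would not be triggered by reads racing with uploads.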
[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775964#comment-17775964 ] Satish Duggana edited comment on KAFKA-15609 at 10/17/23 12:55 AM: --- We never saw the reported behavior in any of our clusters/environments. These memory mapped operations are generally OS/platform dependent. I prefer to have a defined contract at the usage level rather than depending on any assumptions. In this case, we can have a contract for RLM to make sure these files are flushed to disk before file paths are given to RSM to read this data and write it into remote storage. was (Author: satish.duggana): These memory mapped operations are generally OS/platform dependent. I prefer to have a defined contract at the usages level rather than depending on any assumptions. In this case, we can have a contract for RLM to make sure these files are flushed to disk before file paths are given to RSM to read this data and write it into remote storage. > Corrupted index uploaded to remote tier > --- > > Key: KAFKA-15609 > URL: https://issues.apache.org/jira/browse/KAFKA-15609 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Divij Vaidya >Priority: Minor > > While testing Tiered Storage, we have observed corrupt indexes being present > in remote tier. One such situation is covered here at > https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another > such possible case of corruption. > Potential cause of index corruption: > We want to ensure that the file we are passing to RSM plugin contains all the > data which is present in MemoryByteBuffer i.e. we should have flushed the > MemoryByteBuffer to the file using force(). In Kafka, when we close a > segment, indexes are flushed asynchronously [1]. Hence, it might be possible > that when we are passing the file to RSM, the file doesn't contain flushed > data. 
Hence, we may end up uploading indexes which haven't been flushed yet. > Ideally, the contract should enforce that we force flush the content of > MemoryByteBuffer before we give the file for RSM. This will ensure that > indexes are not corrupted/incomplete. > [1] > [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
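The flush contract discussed in the comment above can be sketched as follows. This is a minimal illustration, not Kafka's actual RLM code: the `writeAndFlush` helper is hypothetical, and the point is only that `MappedByteBuffer.force()` runs before the file path is exposed to any copier (the RSM in the discussion above).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexFlushSketch {
    // Hypothetical helper: write index bytes through a memory map, then
    // force() them to disk before the path is handed to any uploader.
    static Path writeAndFlush(Path file, byte[] indexBytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, indexBytes.length);
            mmap.put(indexBytes);
            // The contract: flush the mapped buffer so the on-disk file
            // matches the in-memory contents before the upload starts.
            mmap.force();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".index");
        byte[] data = "offset-index-bytes".getBytes(StandardCharsets.UTF_8);
        writeAndFlush(tmp, data);
        // After force(), an independent read of the path sees the full contents.
        byte[] onDisk = Files.readAllBytes(tmp);
        System.out.println("flushed " + onDisk.length + " bytes");
        Files.deleteIfExists(tmp);
    }
}
```

Because Kafka flushes segment indexes asynchronously on roll, a forced flush at the hand-off point (rather than relying on OS-dependent mmap behavior) is what makes the on-disk bytes a reliable source for the upload.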
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775964#comment-17775964 ] Satish Duggana commented on KAFKA-15609: These memory mapped operations are generally OS/platform dependent. I prefer to have a defined contract at the usages level rather than depending on any assumptions. In this case, we can have a contract for RLM to make sure these files are flushed to disk before file paths are given to RSM to read this data and write it into remote storage. > Corrupted index uploaded to remote tier > --- > > Key: KAFKA-15609 > URL: https://issues.apache.org/jira/browse/KAFKA-15609 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Divij Vaidya >Priority: Minor > > While testing Tiered Storage, we have observed corrupt indexes being present > in remote tier. One such situation is covered here at > https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another > such possible case of corruption. > Potential cause of index corruption: > We want to ensure that the file we are passing to RSM plugin contains all the > data which is present in MemoryByteBuffer i.e. we should have flushed the > MemoryByteBuffer to the file using force(). In Kafka, when we close a > segment, indexes are flushed asynchronously [1]. Hence, it might be possible > that when we are passing the file to RSM, the file doesn't contain flushed > data. Hence, we may end up uploading indexes which haven't been flushed yet. > Ideally, the contract should enforce that we force flush the content of > MemoryByteBuffer before we give the file for RSM. This will ensure that > indexes are not corrupted/incomplete. > [1] > [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775961#comment-17775961 ] Satish Duggana commented on KAFKA-15609: [~divijvaidya] How are index files read in your RSM implementation to copy them to remote storage? https://issues.apache.org/jira/browse/KAFKA-15612 filed based on another discussion related to the issue mentioned in this JIRA. > Corrupted index uploaded to remote tier > --- > > Key: KAFKA-15609 > URL: https://issues.apache.org/jira/browse/KAFKA-15609 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Divij Vaidya >Priority: Minor > > While testing Tiered Storage, we have observed corrupt indexes being present > in remote tier. One such situation is covered here at > https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another > such possible case of corruption. > Potential cause of index corruption: > We want to ensure that the file we are passing to RSM plugin contains all the > data which is present in MemoryByteBuffer i.e. we should have flushed the > MemoryByteBuffer to the file using force(). In Kafka, when we close a > segment, indexes are flushed asynchronously [1]. Hence, it might be possible > that when we are passing the file to RSM, the file doesn't contain flushed > data. Hence, we may end up uploading indexes which haven't been flushed yet. > Ideally, the contract should enforce that we force flush the content of > MemoryByteBuffer before we give the file for RSM. This will ensure that > indexes are not corrupted/incomplete. > [1] > [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15612) Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15612: --- Description: Followup on the [PR comment|https://github.com/apache/kafka/pull/14529#discussion_r1360877868] (was: Followup on the [PR comment|https://github.com/apache/kafka/pull/14529#discussion_r1355263700]) > Followup on whether the segment indexes need to be materialized or flushed > before they are passed to RSM for writing them to tiered storage. > - > > Key: KAFKA-15612 > URL: https://issues.apache.org/jira/browse/KAFKA-15612 > Project: Kafka > Issue Type: Task >Reporter: Satish Duggana >Priority: Major > > Followup on the [PR > comment|https://github.com/apache/kafka/pull/14529#discussion_r1360877868] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15612) Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15612: --- Fix Version/s: 3.7.0 > Followup on whether the segment indexes need to be materialized or flushed > before they are passed to RSM for writing them to tiered storage. > - > > Key: KAFKA-15612 > URL: https://issues.apache.org/jira/browse/KAFKA-15612 > Project: Kafka > Issue Type: Task >Reporter: Satish Duggana >Priority: Major > Fix For: 3.7.0 > > > Followup on the [PR > comment|https://github.com/apache/kafka/pull/14529#discussion_r1360877868] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15612) Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15612: --- Summary: Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage. (was: Followup on whether the indexes need to be materialized before they are passed to RSM for writing them to tiered storage. ) > Followup on whether the segment indexes need to be materialized or flushed > before they are passed to RSM for writing them to tiered storage. > - > > Key: KAFKA-15612 > URL: https://issues.apache.org/jira/browse/KAFKA-15612 > Project: Kafka > Issue Type: Task >Reporter: Satish Duggana >Priority: Major > > Followup on the [PR > comment|https://github.com/apache/kafka/pull/14529#discussion_r1355263700] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15612) Followup on whether the indexes need to be materialized before they are passed to RSM for writing them to tiered storage.
Satish Duggana created KAFKA-15612: -- Summary: Followup on whether the indexes need to be materialized before they are passed to RSM for writing them to tiered storage. Key: KAFKA-15612 URL: https://issues.apache.org/jira/browse/KAFKA-15612 Project: Kafka Issue Type: Task Reporter: Satish Duggana Followup on the [PR comment|https://github.com/apache/kafka/pull/14529#discussion_r1355263700] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15593) Add 3.6.0 to broker/client upgrade/compatibility tests
[ https://issues.apache.org/jira/browse/KAFKA-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15593: --- Fix Version/s: 3.7.0 > Add 3.6.0 to broker/client upgrade/compatibility tests > -- > > Key: KAFKA-15593 > URL: https://issues.apache.org/jira/browse/KAFKA-15593 > Project: Kafka > Issue Type: Sub-task >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15593) Add 3.6.0 to broker/client upgrade/compatibility tests
[ https://issues.apache.org/jira/browse/KAFKA-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana reassigned KAFKA-15593: -- Assignee: Satish Duggana > Add 3.6.0 to broker/client upgrade/compatibility tests > -- > > Key: KAFKA-15593 > URL: https://issues.apache.org/jira/browse/KAFKA-15593 > Project: Kafka > Issue Type: Sub-task >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15593) Add 3.6.0 to broker/client upgrade/compatibility tests
Satish Duggana created KAFKA-15593: -- Summary: Add 3.6.0 to broker/client upgrade/compatibility tests Key: KAFKA-15593 URL: https://issues.apache.org/jira/browse/KAFKA-15593 Project: Kafka Issue Type: Sub-task Reporter: Satish Duggana -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15594) Add 3.6.0 to streams upgrade/compatibility tests
Satish Duggana created KAFKA-15594: -- Summary: Add 3.6.0 to streams upgrade/compatibility tests Key: KAFKA-15594 URL: https://issues.apache.org/jira/browse/KAFKA-15594 Project: Kafka Issue Type: Sub-task Reporter: Satish Duggana -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15576) Add 3.6.0 to broker/client and streams upgrade/compatibility tests
[ https://issues.apache.org/jira/browse/KAFKA-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15576: --- Summary: Add 3.6.0 to broker/client and streams upgrade/compatibility tests (was: Add 3.2.0 to broker/client and streams upgrade/compatibility tests) > Add 3.6.0 to broker/client and streams upgrade/compatibility tests > -- > > Key: KAFKA-15576 > URL: https://issues.apache.org/jira/browse/KAFKA-15576 > Project: Kafka > Issue Type: Task >Reporter: Satish Duggana >Priority: Major > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15576) Add 3.2.0 to broker/client and streams upgrade/compatibility tests
Satish Duggana created KAFKA-15576: -- Summary: Add 3.2.0 to broker/client and streams upgrade/compatibility tests Key: KAFKA-15576 URL: https://issues.apache.org/jira/browse/KAFKA-15576 Project: Kafka Issue Type: Task Reporter: Satish Duggana Fix For: 3.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15535) Add documentation of "remote.log.index.file.cache.total.size.bytes" configuration property.
[ https://issues.apache.org/jira/browse/KAFKA-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773898#comment-17773898 ] Satish Duggana commented on KAFKA-15535: Thanks [~hudeqi] for checking that out. > Add documentation of "remote.log.index.file.cache.total.size.bytes" > configuration property. > > > Key: KAFKA-15535 > URL: https://issues.apache.org/jira/browse/KAFKA-15535 > Project: Kafka > Issue Type: Task > Components: documentation >Reporter: Satish Duggana >Assignee: hudeqi >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > Add documentation of "remote.log.index.file.cache.total.size.bytes" > configuration property. > Please double check all the existing public tiered storage configurations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15535) Add documentation of "remote.log.index.file.cache.total.size.bytes" configuration property.
[ https://issues.apache.org/jira/browse/KAFKA-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana resolved KAFKA-15535. Resolution: Fixed > Add documentation of "remote.log.index.file.cache.total.size.bytes" > configuration property. > > > Key: KAFKA-15535 > URL: https://issues.apache.org/jira/browse/KAFKA-15535 > Project: Kafka > Issue Type: Task > Components: documentation >Reporter: Satish Duggana >Assignee: hudeqi >Priority: Major > Labels: tiered-storage > Fix For: 3.7.0 > > > Add documentation of "remote.log.index.file.cache.total.size.bytes" > configuration property. > Please double check all the existing public tiered storage configurations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15535) Add documentation of "remote.log.index.file.cache.total.size.bytes" configuration property.
Satish Duggana created KAFKA-15535: -- Summary: Add documentation of "remote.log.index.file.cache.total.size.bytes" configuration property. Key: KAFKA-15535 URL: https://issues.apache.org/jira/browse/KAFKA-15535 Project: Kafka Issue Type: Task Components: documentation Reporter: Satish Duggana Fix For: 3.7.0 Add documentation of "remote.log.index.file.cache.total.size.bytes" configuration property. Please double check all the existing public tiered storage configurations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15530) Add missing documentation of metrics introduced as part of KAFKA-15196
[ https://issues.apache.org/jira/browse/KAFKA-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15530: --- Affects Version/s: 3.6.0 > Add missing documentation of metrics introduced as part of KAFKA-15196 > -- > > Key: KAFKA-15530 > URL: https://issues.apache.org/jira/browse/KAFKA-15530 > Project: Kafka > Issue Type: Task > Components: documentation >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Priority: Major > > This is a followup to the 3.6.0 RC2 verification email > [thread|https://lists.apache.org/thread/js2nmq3ggn46qg122h4jg5p2fcq5hr2s]. > Add the missing documentation of a few metrics added as part of the > [change|https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15530) Add missing documentation of metrics introduced as part of KAFKA-15196
Satish Duggana created KAFKA-15530: -- Summary: Add missing documentation of metrics introduced as part of KAFKA-15196 Key: KAFKA-15530 URL: https://issues.apache.org/jira/browse/KAFKA-15530 Project: Kafka Issue Type: Task Reporter: Satish Duggana This is a followup to the 3.6.0 RC2 verification email [thread|https://lists.apache.org/thread/js2nmq3ggn46qg122h4jg5p2fcq5hr2s]. Add the missing documentation of a few metrics added as part of the [change|https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15530) Add missing documentation of metrics introduced as part of KAFKA-15196
[ https://issues.apache.org/jira/browse/KAFKA-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15530: --- Component/s: documentation > Add missing documentation of metrics introduced as part of KAFKA-15196 > -- > > Key: KAFKA-15530 > URL: https://issues.apache.org/jira/browse/KAFKA-15530 > Project: Kafka > Issue Type: Task > Components: documentation >Reporter: Satish Duggana >Priority: Major > > This is a followup to the 3.6.0 RC2 verification email > [thread|https://lists.apache.org/thread/js2nmq3ggn46qg122h4jg5p2fcq5hr2s]. > Add the missing documentation of a few metrics added as part of the > [change|https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15001) CVE vulnerabilities in Jetty
[ https://issues.apache.org/jira/browse/KAFKA-15001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15001: --- Fix Version/s: (was: 3.6.0) > CVE vulnerabilities in Jetty > - > > Key: KAFKA-15001 > URL: https://issues.apache.org/jira/browse/KAFKA-15001 > Project: Kafka > Issue Type: Task >Affects Versions: 3.4.0, 3.3.2 >Reporter: Arushi Rai >Priority: Critical > Fix For: 3.5.1, 3.4.2 > > > Kafka is using org.eclipse.jetty_jetty-server and org.eclipse.jetty_jetty-io > version 9.4.48.v20220622 where 3 moderate and medium vulnerabilities have > been reported. > Moderate [CVE-2023-26048|https://nvd.nist.gov/vuln/detail/CVE-2023-26048] in > org.eclipse.jetty_jetty-server > Medium [CVE-2023-26049|https://nvd.nist.gov/vuln/detail/CVE-2023-26049] in > org.eclipse.jetty_jetty-io > Medium [CVE-2023-26048|https://nvd.nist.gov/vuln/detail/CVE-2023-26048] in > org.eclipse.jetty_jetty-io > These are fixed in jetty versions 11.0.14, 10.0.14, 9.4.51 and Kafka should > use the same. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15503) CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, 12.0.1
[ https://issues.apache.org/jira/browse/KAFKA-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768810#comment-17768810 ] Satish Duggana commented on KAFKA-15503: https://github.com/apache/kafka/pull/10526 is cherrypicked to 3.6 branch. > CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, > 12.0.1 > -- > > Key: KAFKA-15503 > URL: https://issues.apache.org/jira/browse/KAFKA-15503 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Rafael Rios Saavedra >Assignee: Divij Vaidya >Priority: Major > Labels: CVE, security > Fix For: 3.6.0 > > > CVE-2023-40167 and CVE-2023-36479 vulnerabilities affects Jetty version > {*}9.4.51{*}. For more information see > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-40167] > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-364749] > Upgrading to Jetty version *9.4.52, 10.0.16, 11.0.16, 12.0.1* should address > this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15503) CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, 12.0.1
[ https://issues.apache.org/jira/browse/KAFKA-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15503: --- Fix Version/s: 3.6.0 (was: 3.0.0) (was: 2.8.0) (was: 2.7.1) (was: 2.6.2) > CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, > 12.0.1 > -- > > Key: KAFKA-15503 > URL: https://issues.apache.org/jira/browse/KAFKA-15503 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Rafael Rios Saavedra >Assignee: Divij Vaidya >Priority: Major > Labels: CVE, security > Fix For: 3.6.0 > > > CVE-2023-40167 and CVE-2023-36479 vulnerabilities affects Jetty version > {*}9.4.51{*}. For more information see > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-40167] > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-364749] > Upgrading to Jetty version *9.4.52, 10.0.16, 11.0.16, 12.0.1* should address > this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15503) CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, 12.0.1
[ https://issues.apache.org/jira/browse/KAFKA-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15503: --- Affects Version/s: (was: 2.7.0) (was: 2.6.1) (was: 3.4.1) (was: 3.5.1) > CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, > 12.0.1 > -- > > Key: KAFKA-15503 > URL: https://issues.apache.org/jira/browse/KAFKA-15503 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Rafael Rios Saavedra >Assignee: Divij Vaidya >Priority: Major > Labels: CVE, security > Fix For: 2.8.0, 2.7.1, 2.6.2, 3.0.0 > > > CVE-2023-40167 and CVE-2023-36479 vulnerabilities affects Jetty version > {*}9.4.51{*}. For more information see > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-40167] > [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-364749] > Upgrading to Jetty version *9.4.52, 10.0.16, 11.0.16, 12.0.1* should address > this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15503) CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, 12.0.1
Satish Duggana created KAFKA-15503: -- Summary: CVE-2023-40167, CVE-2023-36479 - Upgrade jetty to 9.4.52, 10.0.16, 11.0.16, 12.0.1 Key: KAFKA-15503 URL: https://issues.apache.org/jira/browse/KAFKA-15503 Project: Kafka Issue Type: Bug Affects Versions: 2.7.0, 2.6.1, 3.4.1, 3.6.0, 3.5.1 Reporter: Rafael Rios Saavedra Assignee: Divij Vaidya Fix For: 2.8.0, 2.7.1, 2.6.2, 3.0.0 CVE-2023-40167 and CVE-2023-36479 vulnerabilities affects Jetty version {*}9.4.51{*}. For more information see [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-40167] [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-364749] Upgrading to Jetty version *9.4.52, 10.0.16, 11.0.16, 12.0.1* should address this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15498) Upgrade Snappy-Java to 1.1.10.4
[ https://issues.apache.org/jira/browse/KAFKA-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15498: --- Priority: Blocker (was: Major) > Upgrade Snappy-Java to 1.1.10.4 > --- > > Key: KAFKA-15498 > URL: https://issues.apache.org/jira/browse/KAFKA-15498 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.4.1, 3.5.1 >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Blocker > Fix For: 3.6.0 > > > Snappy-java published a new vulnerability > <[https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv]> > that will cause OOM error in the server. > Kafka is also impacted by this vulnerability since it's like CVE-2023-34455 > <[https://nvd.nist.gov/vuln/detail/CVE-2023-34455]>. > We'd better bump the snappy-java version to bypass this vulnerability. > PR <[https://github.com/apache/kafka/pull/14434]> is created to run the CI > build. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15001) CVE vulnerabilities in Jetty
[ https://issues.apache.org/jira/browse/KAFKA-15001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15001: --- Fix Version/s: 3.6.0 > CVE vulnerabilities in Jetty > - > > Key: KAFKA-15001 > URL: https://issues.apache.org/jira/browse/KAFKA-15001 > Project: Kafka > Issue Type: Task >Affects Versions: 3.4.0, 3.3.2 >Reporter: Arushi Rai >Priority: Critical > Fix For: 3.6.0, 3.5.1, 3.4.2 > > > Kafka is using org.eclipse.jetty_jetty-server and org.eclipse.jetty_jetty-io > version 9.4.48.v20220622 where 3 moderate and medium vulnerabilities have > been reported. > Moderate [CVE-2023-26048|https://nvd.nist.gov/vuln/detail/CVE-2023-26048] in > org.eclipse.jetty_jetty-server > Medium [CVE-2023-26049|https://nvd.nist.gov/vuln/detail/CVE-2023-26049] in > org.eclipse.jetty_jetty-io > Medium [CVE-2023-26048|https://nvd.nist.gov/vuln/detail/CVE-2023-26048] in > org.eclipse.jetty_jetty-io > These are fixed in jetty versions 11.0.14, 10.0.14, 9.4.51 and Kafka should > use the same. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
[ https://issues.apache.org/jira/browse/KAFKA-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana reassigned KAFKA-15483: -- Assignee: David Arthur > Update metrics documentation for the new metrics implemented as part of > KIP-938 > --- > > Key: KAFKA-15483 > URL: https://issues.apache.org/jira/browse/KAFKA-15483 > Project: Kafka > Issue Type: Task > Components: docs, documentation >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Assignee: David Arthur >Priority: Major > > Update the kafka-site documentation for 3.6 release with the newly introduced > metrics in 3.6 for KIP-938. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
[ https://issues.apache.org/jira/browse/KAFKA-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767638#comment-17767638 ] Satish Duggana commented on KAFKA-15483: [~mumrah] Assigning it to you as you raised [PR-548|https://github.com/apache/kafka-site/pull/548] to address this issue. > Update metrics documentation for the new metrics implemented as part of > KIP-938 > --- > > Key: KAFKA-15483 > URL: https://issues.apache.org/jira/browse/KAFKA-15483 > Project: Kafka > Issue Type: Task > Components: docs, documentation >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Priority: Major > > Update the kafka-site documentation for 3.6 release with the newly introduced > metrics in 3.6 for KIP-938. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
[ https://issues.apache.org/jira/browse/KAFKA-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15483: --- Affects Version/s: 3.6.0 > Update metrics documentation for the new metrics implemented as part of > KIP-938 > --- > > Key: KAFKA-15483 > URL: https://issues.apache.org/jira/browse/KAFKA-15483 > Project: Kafka > Issue Type: Task > Components: docs, documentation >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Priority: Major > > Update the kafka-site documentation for 3.6 release with the newly introduced > metrics in 3.6 for KIP-938. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
[ https://issues.apache.org/jira/browse/KAFKA-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767347#comment-17767347 ] Satish Duggana commented on KAFKA-15483: [~cmccabe] [~mumrah] Please help in updating the kafka-site documentation for 3.6 release with the newly introduced metrics in 3.6 for KIP-938. > Update metrics documentation for the new metrics implemented as part of > KIP-938 > --- > > Key: KAFKA-15483 > URL: https://issues.apache.org/jira/browse/KAFKA-15483 > Project: Kafka > Issue Type: Task > Components: docs, documentation >Reporter: Satish Duggana >Priority: Major > > Update the kafka-site documentation for 3.6 release with the newly introduced > metrics in 3.6 for KIP-938. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
[ https://issues.apache.org/jira/browse/KAFKA-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15483: --- Description: Update the kafka-site documentation for 3.6 release with the newly introduced metrics in 3.6 for KIP-938. > Update metrics documentation for the new metrics implemented as part of > KIP-938 > --- > > Key: KAFKA-15483 > URL: https://issues.apache.org/jira/browse/KAFKA-15483 > Project: Kafka > Issue Type: Task > Components: docs, documentation >Reporter: Satish Duggana >Priority: Major > > Update the kafka-site documentation for 3.6 release with the newly introduced > metrics in 3.6 for KIP-938. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15483) Update metrics documentation for the new metrics implemented as part of KIP-938
Satish Duggana created KAFKA-15483: -- Summary: Update metrics documentation for the new metrics implemented as part of KIP-938 Key: KAFKA-15483 URL: https://issues.apache.org/jira/browse/KAFKA-15483 Project: Kafka Issue Type: Task Components: docs, documentation Reporter: Satish Duggana -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14661) Upgrade Zookeeper to 3.8.2
[ https://issues.apache.org/jira/browse/KAFKA-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-14661: --- Summary: Upgrade Zookeeper to 3.8.2 (was: Upgrade Zookeeper to 3.8.1 ) > Upgrade Zookeeper to 3.8.2 > --- > > Key: KAFKA-14661 > URL: https://issues.apache.org/jira/browse/KAFKA-14661 > Project: Kafka > Issue Type: Improvement > Components: packaging >Reporter: Divij Vaidya >Assignee: Christo Lolov >Priority: Blocker > Fix For: 3.6.0 > > > The current Zk version (3.6.x) supported by Apache Kafka has been EOL since > December 2022 [1]. > Users of Kafka are facing regulatory hurdles because of using a dependency > which is EOL, hence, I would suggest upgrading this in all upcoming releases > (including patch releases of the 3.3.x and 3.4.x versions). > Some things to consider while upgrading (as pointed out by [~ijuma] at [2]): > # If we upgrade the zk server to 3.8.1, what is the impact on the zk > clients? That is, what's the earliest zk client version that is supported by > the 3.8.x server? > # We need to ensure there are no regressions (particularly on the stability > front) when it comes to this upgrade. It would be good for someone to stress > test the system a bit with the new version and check if all works well. > [1] [https://zookeeper.apache.org/releases.html] > [2] [https://github.com/apache/kafka/pull/12620#issuecomment-1409028650] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (KAFKA-15092) KafkaClusterTestKit in test jar depends on MockFaultHandler
[ https://issues.apache.org/jira/browse/KAFKA-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana closed KAFKA-15092. -- > KafkaClusterTestKit in test jar depends on MockFaultHandler > --- > > Key: KAFKA-15092 > URL: https://issues.apache.org/jira/browse/KAFKA-15092 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Gary Russell >Priority: Major > > {noformat} > java.lang.NoClassDefFoundError: org/apache/kafka/server/fault/MockFaultHandler > at > kafka.testkit.KafkaClusterTestKit$SimpleFaultHandlerFactory.<init>(KafkaClusterTestKit.java:119) > at > kafka.testkit.KafkaClusterTestKit$Builder.<init>(KafkaClusterTestKit.java:143) > {noformat} > MockFaultHandler is missing from the test jar. > This PR https://github.com/apache/kafka/pull/13375/files seems to work around > it by adding the {code}server-common sourcesets.test.output{code} to the > class path. > The class needs to be available for third parties to create an embedded KRaft > broker. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15092) KafkaClusterTestKit in test jar depends on MockFaultHandler
[ https://issues.apache.org/jira/browse/KAFKA-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana resolved KAFKA-15092. Resolution: Invalid > KafkaClusterTestKit in test jar depends on MockFaultHandler > --- > > Key: KAFKA-15092 > URL: https://issues.apache.org/jira/browse/KAFKA-15092 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Gary Russell >Priority: Major > > {noformat} > java.lang.NoClassDefFoundError: org/apache/kafka/server/fault/MockFaultHandler > at > kafka.testkit.KafkaClusterTestKit$SimpleFaultHandlerFactory.<init>(KafkaClusterTestKit.java:119) > at > kafka.testkit.KafkaClusterTestKit$Builder.<init>(KafkaClusterTestKit.java:143) > {noformat} > MockFaultHandler is missing from the test jar. > This PR https://github.com/apache/kafka/pull/13375/files seems to work around > it by adding the {code}server-common sourcesets.test.output{code} to the > class path. > The class needs to be available for third parties to create an embedded KRaft > broker. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (KAFKA-15482) kafka.utils.TestUtils Depends on MockTime Which is Not in Any Jar
[ https://issues.apache.org/jira/browse/KAFKA-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana closed KAFKA-15482. -- > kafka.utils.TestUtils Depends on MockTime Which is Not in Any Jar > - > > Key: KAFKA-15482 > URL: https://issues.apache.org/jira/browse/KAFKA-15482 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Gary Russell >Priority: Major > > Commit > 7eea2a3908fdcee1627c18827e6dcb5ed0089fdd > moved it to server-common, but it is not included in the jar. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15482) kafka.utils.TestUtils Depends on MockTime Which is Not in Any Jar
[ https://issues.apache.org/jira/browse/KAFKA-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana resolved KAFKA-15482. Resolution: Invalid > kafka.utils.TestUtils Depends on MockTime Which is Not in Any Jar > - > > Key: KAFKA-15482 > URL: https://issues.apache.org/jira/browse/KAFKA-15482 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Gary Russell >Priority: Major > > Commit > 7eea2a3908fdcee1627c18827e6dcb5ed0089fdd > moved it to server-common, but it is not included in the jar. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15480) Add RemoteStorageInterruptedException
[ https://issues.apache.org/jira/browse/KAFKA-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15480: --- Fix Version/s: 3.7.0 > Add RemoteStorageInterruptedException > - > > Key: KAFKA-15480 > URL: https://issues.apache.org/jira/browse/KAFKA-15480 > Project: Kafka > Issue Type: Task > Components: core >Affects Versions: 3.6.0 >Reporter: Mital Awachat >Priority: Major > Fix For: 3.7.0 > > > Introduce `RemoteStorageInterruptedException` to propagate interruptions from > the plugin to Kafka without generating false errors. > It allows the plugin to notify Kafka that an API operation in progress was > interrupted as a result of task cancellation, which can happen under changes > such as leadership migration or topic deletion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
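The ticket above only names the proposed exception; a minimal sketch of what it could look like follows. The class name comes from the ticket, but the supertype, constructors, and the plugin-side `copySegment` usage pattern are illustrative assumptions, not the committed Kafka API.

```java
// Hypothetical sketch only: the ticket proposes the name
// RemoteStorageInterruptedException; everything else here is assumed.
public class RemoteStorageInterruptedException extends Exception {

    public RemoteStorageInterruptedException(String message) {
        super(message);
    }

    public RemoteStorageInterruptedException(String message, Throwable cause) {
        super(message, cause);
    }

    // Illustrative plugin-side pattern: if an in-progress upload fails while the
    // worker thread is interrupted (task cancellation, leadership migration,
    // topic deletion), surface the dedicated exception instead of a generic error.
    static void copySegment(Runnable upload) throws RemoteStorageInterruptedException {
        try {
            upload.run();
        } catch (RuntimeException e) {
            if (Thread.currentThread().isInterrupted()) {
                throw new RemoteStorageInterruptedException(
                        "copy interrupted by task cancellation", e);
            }
            throw e; // a real failure, not an interruption
        }
    }
}
```

The point of the distinct type is that Kafka can treat it as a benign cancellation signal rather than counting it as a remote-storage error.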
[jira] [Updated] (KAFKA-15141) High CPU usage with log4j2
[ https://issues.apache.org/jira/browse/KAFKA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-15141: --- Fix Version/s: 3.7.0 > High CPU usage with log4j2 > -- > > Key: KAFKA-15141 > URL: https://issues.apache.org/jira/browse/KAFKA-15141 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.3.2, 3.5.0, 3.4.1 >Reporter: Gaurav Narula >Assignee: Gaurav Narula >Priority: Major > Fix For: 3.6.0, 3.7.0 > > > Kafka brokers make use of the [Logging > trait|https://github.com/apache/kafka/blob/1f4cbc5d53259031123b6e9e6bb9a5bbe1e084e8/core/src/main/scala/kafka/utils/Logging.scala#L41] > which instantiates a Logger object for every instantiation of the class > using the trait by default. > When using log4j2 as the logging implementation, the instantiation of a > Logger object requires a stack traversal > [[1]|https://github.com/apache/logging-log4j2/blob/2.x/log4j-api/src/main/java/org/apache/logging/log4j/spi/AbstractLoggerAdapter.java#L121] > and > [[2]|https://github.com/apache/logging-log4j2/blob/83bba1bc322e80e7e95edbebc2383f2724dbe0de/log4j-slf4j-impl/src/main/java/org/apache/logging/slf4j/Log4jLoggerFactory.java#L54]. > While LOG4J2-2940 ensures the stack is not traversed unless required, the default > {{ContextSelector}}, > [ClassLoaderContextSelector|https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/selector/ClassLoaderContextSelector.html], > causes a stack traversal. > These stack traversals are frequent and quite CPU intensive, and profiling > suggests they consume ~5% CPU time (of the CPU stacks we have profiled on a > sample of clusters). While log4j2 users can potentially avoid this by > changing the context selector in their configuration, it is easy to overlook, > and the default configuration results in high CPU usage inadvertently.
> An easy fix would be to instantiate the loggers statically for some commonly > instantiated classes in Kafka which make use of the Logging trait. -- This message was sent by Atlassian Jira (v8.20.10#820010)
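The cost difference between the two patterns described above can be sketched in a few lines. This is an illustrative stand-in, not Kafka's actual Logging trait: the counter simulates log4j2's per-lookup stack traversal, and the class names are invented for the example.

```java
// Illustrative sketch: contrast resolving a logger in every constructor
// (the Logging-trait default) with resolving it once per class (the proposed fix).
import java.util.concurrent.atomic.AtomicInteger;

public class LoggerInstantiationSketch {
    // Counts "expensive" lookups, standing in for log4j2's stack traversal.
    static final AtomicInteger lookups = new AtomicInteger();

    // Stand-in for an expensive LoggerFactory.getLogger(...) call.
    static String expensiveLookup(String name) {
        lookups.incrementAndGet();
        return name;
    }

    // Pattern 1: a logger field resolved for every new instance.
    static class PerInstanceLogging {
        final String logger = expensiveLookup(PerInstanceLogging.class.getName());
    }

    // Pattern 2: a logger resolved once per class, in a static initializer.
    static class StaticLogging {
        static final String LOGGER = expensiveLookup(StaticLogging.class.getName());
    }

    // Returns {lookups for N per-instance objects, lookups for N static-logger objects}.
    public static int[] run(int instances) {
        lookups.set(0);
        for (int i = 0; i < instances; i++) new PerInstanceLogging();
        int perInstance = lookups.get();
        lookups.set(0);
        for (int i = 0; i < instances; i++) new StaticLogging();
        int perClass = lookups.get();
        return new int[] { perInstance, perClass };
    }

    public static void main(String[] args) {
        int[] counts = run(1000);
        System.out.println(counts[0] + " per-instance lookups vs " + counts[1] + " static");
    }
}
```

With 1000 instantiations the per-instance pattern pays the lookup 1000 times while the static pattern pays it once, which is why making the loggers static on hot classes removes the traversal from the broker's critical paths.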
[jira] [Updated] (KAFKA-8391) Flaky Test RebalanceSourceConnectorsIntegrationTest#testDeleteConnector
[ https://issues.apache.org/jira/browse/KAFKA-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-8391: -- Fix Version/s: 2.4.0 > Flaky Test RebalanceSourceConnectorsIntegrationTest#testDeleteConnector > --- > > Key: KAFKA-8391 > URL: https://issues.apache.org/jira/browse/KAFKA-8391 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Assignee: Sagar Rao >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0 > > Attachments: 100-gradle-builds.tar > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/4747/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testDeleteConnector/] > {quote}java.lang.AssertionError: Condition not met within timeout 3. > Connector tasks did not stop in time. at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:375) at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:352) at > org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testDeleteConnector(RebalanceSourceConnectorsIntegrationTest.java:166){quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
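The "Condition not met within timeout" assertion in the report above comes from a polling helper. A minimal sketch of that style of helper follows; it is an illustrative stand-in with an assumed poll interval, not Kafka's actual org.apache.kafka.test.TestUtils implementation.

```java
// Minimal waitForCondition-style sketch: poll a condition until it holds or a
// deadline passes, then fail with a message like the one in the flaky-test report.
import java.util.function.BooleanSupplier;

public class WaitForConditionSketch {

    public static void waitForCondition(BooleanSupplier condition, long timeoutMs, String detail)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError(
                        "Condition not met within timeout " + timeoutMs + ". " + detail);
            }
            Thread.sleep(10); // assumed poll interval before re-checking
        }
    }
}
```

Flakiness in tests built on this pattern usually means the condition (here, connector tasks stopping) occasionally takes longer than the chosen timeout on a loaded CI machine, not that the condition never becomes true.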
[jira] [Updated] (KAFKA-13875) update docs to include topicId for kafka-topics.sh --describe output
[ https://issues.apache.org/jira/browse/KAFKA-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-13875: --- Fix Version/s: 3.6.0 > update docs to include topicId for kafka-topics.sh --describe output > - > > Key: KAFKA-13875 > URL: https://issues.apache.org/jira/browse/KAFKA-13875 > Project: Kafka > Issue Type: Improvement > Components: admin >Affects Versions: 3.2.0 >Reporter: Luke Chen >Assignee: Richard Joerger >Priority: Major > Labels: newbie > Fix For: 3.6.0 > > > The topic describe output in the quickstart doc here: > [https://kafka.apache.org/quickstart] should be updated now. > {code:java} > bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server > localhost:9092 > Topic:quickstart-events PartitionCount:1ReplicationFactor:1 Configs: > Topic: quickstart-events Partition: 0Leader: 0 Replicas: 0 Isr: > 0{code} > Since the topic Id implementation, the topic id info is included in the > output, and the configs are no longer empty. The doc should be updated to > avoid confusing new users. -- This message was sent by Atlassian Jira (v8.20.10#820010)