[jira] [Resolved] (KAFKA-14174) Operation documentation for KRaft
[ https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14174. Resolution: Fixed > Operation documentation for KRaft > - > > Key: KAFKA-14174 > URL: https://issues.apache.org/jira/browse/KAFKA-14174 > Project: Kafka > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Labels: documentation, kraft > > KRaft documentation for 3.3 > # Disk recovery > # External controller is the recommended configuration. The majority of > integration tests don't run against co-located mode. > # Talk about KRaft operation -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs
[ https://issues.apache.org/jira/browse/KAFKA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14265. Resolution: Fixed > Prefix ACLs may shadow other prefix ACLs > > > Key: KAFKA-14265 > URL: https://issues.apache.org/jira/browse/KAFKA-14265 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.3.1 > > > Prefix ACLs may shadow other prefix ACLs. Consider the case where we have > prefix ACLs for foobar, fooa, and f. If we were matching a resource named > "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop > -- missing the f ACL. > To fix this, we should re-scan for ACLs at the first divergence point (in > this case, f) whenever we hit a mismatch of this kind. -- This message was sent by Atlassian Jira (v8.20.10#820010)
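The re-scan at the divergence point described above can be illustrated with a small, self-contained sketch. This is not Kafka's actual StandardAuthorizer code; it only demonstrates the scan order over a sorted set of prefix ACLs and why jumping to the longest common prefix recovers the shadowed {{f}} match:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixAclScan {
    // Longest common prefix of two strings.
    private static String commonPrefix(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return a.substring(0, i);
    }

    // Walk the sorted prefix set downward from the resource name. On a
    // mismatch, jump to the longest common prefix instead of stopping, so a
    // long entry like "fooa" cannot shadow a shorter match like "f".
    public static List<String> matchingPrefixes(NavigableSet<String> acls, String resource) {
        List<String> matches = new ArrayList<>();
        String cursor = resource;
        while (true) {
            String acl = acls.floor(cursor); // greatest entry <= cursor
            if (acl == null) break;
            if (resource.startsWith(acl)) {
                matches.add(acl);
                if (acl.isEmpty()) break;
                cursor = acl.substring(0, acl.length() - 1); // continue below the match
            } else {
                cursor = commonPrefix(resource, acl); // re-scan at the divergence point
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> acls = new TreeSet<>(List.of("foobar", "fooa", "f"));
        System.out.println(matchingPrefixes(acls, "foobar")); // prints [foobar, f]
    }
}
{code}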
[jira] [Resolved] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay
[ https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14259. Resolution: Fixed > BrokerRegistration#toString throws an exception, terminating metadata replay > > > Key: KAFKA-14259 > URL: https://issues.apache.org/jira/browse/KAFKA-14259 > Project: Kafka > Issue Type: Bug > Reporter: Colin McCabe > Assignee: Colin McCabe > Priority: Blocker > Fix For: 3.3.0 > > > BrokerRegistration#toString throws an exception, terminating metadata replay, > because the sorted() method is used on an entry set rather than a key set. > {noformat} > Caused by: java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap') > at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) > at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) > at kafka.server.BrokerServer.startup(BrokerServer.scala:846) > ... 147 more > Caused by: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap') > at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47) > at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.base/java.util.TimSort.sort(TimSort.java:220) > at java.base/java.util.Arrays.sort(Arrays.java:1307) > at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353) > at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510) > at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) > at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) > at java.base/j
[jira] [Resolved] (KAFKA-14207) Add a 6.10 section for KRaft
[ https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14207. Resolution: Fixed > Add a 6.10 section for KRaft > > > Key: KAFKA-14207 > URL: https://issues.apache.org/jira/browse/KAFKA-14207 > Project: Kafka > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.3.0 > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > Labels: documentation, kraft > > The section should talk about: > # Limitation > # Recommended deployment: external controller > # How to start a KRaft cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14241) Implement the snapshot cleanup policy
Jose Armando Garcia Sancio created KAFKA-14241: -- Summary: Implement the snapshot cleanup policy Key: KAFKA-14241 URL: https://issues.apache.org/jira/browse/KAFKA-14241 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.4.0 It looks like the cleanup policy needs to be set to either delete or compact: {code:java} .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc, KafkaConfig.LogCleanupPolicyProp) {code} Neither is correct for KRaft topics. KIP-630 talks about adding a third policy called snapshot: {code:java} The __cluster_metadata topic will have snapshot as the cleanup.policy. {code} [https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges] -- This message was sent by Atlassian Jira (v8.20.10#820010)
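For illustration, here is a hypothetical version of that definition with the third policy added, written directly against Kafka's public {{ConfigDef}} API. The default value and doc string are placeholders, not the real LogConfig values:
{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;
import org.apache.kafka.common.config.ConfigDef.ValidList;

public class CleanupPolicySketch {
    // "snapshot" is the policy name proposed by KIP-630 for __cluster_metadata.
    public static final ConfigDef CONFIG = new ConfigDef()
        .define("cleanup.policy", Type.LIST, "delete",
                ValidList.in("compact", "delete", "snapshot"),
                Importance.MEDIUM, "The cleanup policy for old log segments");
}
{code}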
[jira] [Created] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot
Jose Armando Garcia Sancio created KAFKA-14238: -- Summary: KRaft replicas can delete segments not included in a snapshot Key: KAFKA-14238 URL: https://issues.apache.org/jira/browse/KAFKA-14238 Project: Kafka Issue Type: Bug Components: core, kraft Reporter: Jose Armando Garcia Sancio Fix For: 3.3.0 We see this in the log {code:java} Deleting segment LogSegment(baseOffset=243864, size=9269150, lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) due to retention time 60480ms breach based on the largest record timestamp in the segment {code} This then causes {{KafkaRaftClient}} to throw an exception when sending batches to the listener: {code:java} java.lang.IllegalStateException: Snapshot expected since next offset of org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 is 0, log start offset is 369668 and high-watermark is 547379 at org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312) at java.base/java.util.Optional.orElseThrow(Optional.java:403) at org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311) at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165) at org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code} The on-disk state for the cluster metadata partition confirms this: {code:java} ls __cluster_metadata-0/ 00369668.index 00369668.log 00369668.timeindex 00503411.index 00503411.log 00503411.snapshot 00503411.timeindex 00548746.snapshot leader-epoch-checkpoint partition.metadata quorum-state{code} Notice that there are no {{checkpoint}} files and the log doesn't have a segment at base offset 0. This is happening because the {{LogConfig}} used for KRaft sets the retention policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete old segments even if there are no snapshots for them. For KRaft, Kafka should only delete segments that breach the log start offset. Log configuration for KRaft: {code:java} val props = new Properties() props.put(LogConfig.MaxMessageBytesProp, config.maxBatchSizeInBytes.toString) props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes)) props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis)) props.put(LogConfig.FileDeleteDelayMsProp, Int.box(Defaults.FileDeleteDelayMs)) LogConfig.validateValues(props) val defaultLogConfig = LogConfig(props){code} Segment deletion code: {code:java} def deleteOldSegments(): Int = { if (config.delete) { deleteLogStartOffsetBreachedSegments() + deleteRetentionSizeBreachedSegments() + deleteRetentionMsBreachedSegments() } else { deleteLogStartOffsetBreachedSegments() } }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
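A sketch of the proposed behavior, in Java rather than the Scala of the quoted broker code, with illustrative interface and method names (this is not the actual UnifiedLog API): for the metadata log, only segments below the log start offset are deleted, and the log start offset only advances once a snapshot covers those records.
{code:java}
public class MetadataLogRetentionSketch {
    interface SegmentDeleter {
        int deleteLogStartOffsetBreachedSegments();
        int deleteRetentionMsBreachedSegments();
        int deleteRetentionSizeBreachedSegments();
    }

    static int deleteOldSegments(SegmentDeleter log, boolean isMetadataLog, boolean deletePolicy) {
        if (isMetadataLog) {
            // Never apply time/size retention to __cluster_metadata; segments
            // are removed only after a snapshot allows the log start offset
            // to advance past them.
            return log.deleteLogStartOffsetBreachedSegments();
        } else if (deletePolicy) {
            return log.deleteLogStartOffsetBreachedSegments()
                + log.deleteRetentionSizeBreachedSegments()
                + log.deleteRetentionMsBreachedSegments();
        } else {
            return log.deleteLogStartOffsetBreachedSegments();
        }
    }
}
{code}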
[jira] [Resolved] (KAFKA-14073) Logging the reason for creating a snapshot
[ https://issues.apache.org/jira/browse/KAFKA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14073. Resolution: Fixed > Logging the reason for creating a snapshot > -- > > Key: KAFKA-14073 > URL: https://issues.apache.org/jira/browse/KAFKA-14073 > Project: Kafka > Issue Type: Improvement > Reporter: dengziming > Priority: Minor > Labels: kraft, newbie > > So far we have two reasons for creating a snapshot: 1. X bytes were applied; > 2. the metadata version changed. We should log the reason when creating a > snapshot, on both the broker side and the controller side. See > https://github.com/apache/kafka/pull/12265#discussion_r915972383 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14222) Exhausted BatchMemoryPool
Jose Armando Garcia Sancio created KAFKA-14222: -- Summary: Exhausted BatchMemoryPool Key: KAFKA-14222 URL: https://issues.apache.org/jira/browse/KAFKA-14222 Project: Kafka Issue Type: Bug Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.3.0 For a large number of topics and partitions the broker can encounter this issue: {code:java} [2022-09-12 14:14:42,114] ERROR [BrokerMetadataSnapshotter id=4] Unexpected error handling CreateSnapshotEvent (kafka.server.metadata.BrokerMetadataSnapshotter) org.apache.kafka.raft.errors.BufferAllocationException: Append failed because we failed to allocate memory to write the batch at org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:161) at org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:112) at org.apache.kafka.snapshot.RecordsSnapshotWriter.append(RecordsSnapshotWriter.java:167) at kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:49) at kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:42) at org.apache.kafka.image.TopicImage.write(TopicImage.java:78) at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:79) at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:129) at kafka.server.metadata.BrokerMetadataSnapshotter$CreateSnapshotEvent.run(BrokerMetadataSnapshotter.scala:116) at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173) at java.base/java.lang.Thread.run(Thread.java:829) {code} This can happen because the snapshot is larger than {{5 * 8MB}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
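A minimal standalone illustration of the failure mode, not Kafka's actual BatchMemoryPool: with a pool bounded at 5 batches of 8 MB, once all batches are handed out to an in-flight snapshot write and none are released, the next allocation fails, which surfaces upstream as the BufferAllocationException above.
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedBatchPool {
    private final int batchSize;
    private final int maxBatches;
    private int outstanding = 0;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    public BoundedBatchPool(int maxBatches, int batchSize) {
        this.maxBatches = maxBatches;
        this.batchSize = batchSize;
    }

    // Returns null when the pool is exhausted.
    public synchronized ByteBuffer tryAllocate() {
        if (!free.isEmpty()) {
            outstanding++;
            return free.poll();
        }
        if (outstanding >= maxBatches) return null;
        outstanding++;
        return ByteBuffer.allocate(batchSize);
    }

    public synchronized void release(ByteBuffer buffer) {
        buffer.clear();
        free.push(buffer);
        outstanding--;
    }

    public static void main(String[] args) {
        BoundedBatchPool pool = new BoundedBatchPool(5, 8 * 1024 * 1024);
        for (int i = 0; i < 6; i++) {
            System.out.println("allocation " + i + " succeeded: " + (pool.tryAllocate() != null));
        } // the sixth allocation fails, mirroring a snapshot larger than 5 * 8MB
    }
}
{code}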
[jira] [Resolved] (KAFKA-14204) QuorumController must correctly handle overly large batches
[ https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14204. Resolution: Fixed > QuorumController must correctly handle overly large batches > --- > > Key: KAFKA-14204 > URL: https://issues.apache.org/jira/browse/KAFKA-14204 > Project: Kafka > Issue Type: Bug > Components: controller, kraft >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14207) Add a 6.10 section for KRaft
Jose Armando Garcia Sancio created KAFKA-14207: -- Summary: Add a 6.10 section for KRaft Key: KAFKA-14207 URL: https://issues.apache.org/jira/browse/KAFKA-14207 Project: Kafka Issue Type: Sub-task Components: documentation Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.3.0 The section should talk about: # Limitation # Recommended deployment: external controller # How to start a KRaft cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14205) Document how to recover from kraft controller disk failure
Jose Armando Garcia Sancio created KAFKA-14205: -- Summary: Document how to recover from kraft controller disk failure Key: KAFKA-14205 URL: https://issues.apache.org/jira/browse/KAFKA-14205 Project: Kafka Issue Type: Sub-task Components: documentation Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14203) KRaft broker should disable snapshot generation after error replaying the metadata log
Jose Armando Garcia Sancio created KAFKA-14203: -- Summary: KRaft broker should disable snapshot generation after error replaying the metadata log Key: KAFKA-14203 URL: https://issues.apache.org/jira/browse/KAFKA-14203 Project: Kafka Issue Type: Bug Components: core Affects Versions: 3.3.0 Reporter: Jose Armando Garcia Sancio Fix For: 3.3.0 The broker skips records for which there was an error when replaying the log. This means that the MetadataImage has diverged from the state persisted in the log. The broker should disable snapshot generation, otherwise the next time a snapshot gets generated it will result in inconsistent data getting persisted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades
[ https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14179. Fix Version/s: (was: 3.3.0) Resolution: Duplicate > Improve docs/upgrade.html to talk about metadata.version upgrades > - > > Key: KAFKA-14179 > URL: https://issues.apache.org/jira/browse/KAFKA-14179 > Project: Kafka > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 > Reporter: Jose Armando Garcia Sancio >Assignee: Colin McCabe >Priority: Blocker > Labels: documentation, kraft > > The rolling upgrade documentation for 3.3.0 only talks about software and IBP > upgrades. It doesn't talk about metadata.version upgrades. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14188) Quickstart for KRaft
Jose Armando Garcia Sancio created KAFKA-14188: -- Summary: Quickstart for KRaft Key: KAFKA-14188 URL: https://issues.apache.org/jira/browse/KAFKA-14188 Project: Kafka Issue Type: Task Components: documentation, kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Either: # Improve the quick start documentation to talk about both KRaft and ZK # Create a KRaft quick start that is very similar to the ZK quick start but uses a different startup process. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer
[ https://issues.apache.org/jira/browse/KAFKA-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14183. Resolution: Fixed > Kraft bootstrap metadata file should use snapshot header/footer > --- > > Key: KAFKA-14183 > URL: https://issues.apache.org/jira/browse/KAFKA-14183 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.3.0 > > > The bootstrap checkpoint file that we use in kraft is intended to follow the > usual snapshot format, but currently it does not include the header/footer > control records. The main purpose of these at the moment is to set a version > for the checkpoint file itself. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14142) Improve information returned about the cluster metadata partition
[ https://issues.apache.org/jira/browse/KAFKA-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-14142. Resolution: Won't Fix We discussed this and we decided that the kafka-metadata-quorum tool already returns enough information to determine this. > Improve information returned about the cluster metadata partition > - > > Key: KAFKA-14142 > URL: https://issues.apache.org/jira/browse/KAFKA-14142 > Project: Kafka > Issue Type: Improvement > Components: kraft > Reporter: Jose Armando Garcia Sancio > Assignee: Jason Gustafson > Priority: Blocker > Fix For: 3.3.0 > > > The Apache Kafka operator needs to know when it is safe to format and start a > KRaft Controller that had a disk failure of the metadata log dir. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades
Jose Armando Garcia Sancio created KAFKA-14179: -- Summary: Improve docs/upgrade.html to talk about metadata.version upgrades Key: KAFKA-14179 URL: https://issues.apache.org/jira/browse/KAFKA-14179 Project: Kafka Issue Type: Improvement Components: documentation Reporter: Jose Armando Garcia Sancio Fix For: 3.3.0 The rolling upgrade documentation for 3.3.0 only talks about software and IBP upgrades. It doesn't talk about metadata.version upgrades. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13911) Rate is calculated as NaN for minimum config values
[ https://issues.apache.org/jira/browse/KAFKA-13911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13911. Reviewer: Ismael Juma Resolution: Fixed Closing as it was merged to trunk and 3.3. > Rate is calculated as NaN for minimum config values > --- > > Key: KAFKA-13911 > URL: https://issues.apache.org/jira/browse/KAFKA-13911 > Project: Kafka > Issue Type: Bug > Reporter: Divij Vaidya > Assignee: Divij Vaidya > Priority: Minor > Fix For: 3.3.0 > > > Implementation of connection creation rate quotas in Kafka is dependent on > two configurations: > # [quota.window.num|https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.num] > # [quota.window.size.seconds|https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.size.seconds] > The minimum possible value of these configurations is 1 as per the > documentation. However, 1 as a minimum value for quota.window.num is invalid > and leads to a failure in the rate calculation, as demonstrated below. > As a proof of the bug, the following unit test fails: > {code:java} > @Test > public void testUseWithMinimumPossibleConfiguration() { > final Rate r = new Rate(); > MetricConfig config = new MetricConfig().samples(1).timeWindow(1, > TimeUnit.SECONDS); > Time elapsed = new MockTime(); > r.record(config, 1.0, elapsed.milliseconds()); > elapsed.sleep(100); > r.record(config, 1.0, elapsed.milliseconds()); > elapsed.sleep(1000); > final Double observedRate = r.measure(config, elapsed.milliseconds()); > assertFalse(Double.isNaN(observedRate)); > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
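As a reminder of the Java arithmetic involved (the exact path through Rate.measure is not shown in the ticket, so treat the connection as an assumption): a positive total divided by a zero-length window yields Infinity, and a NaN result can only come from a 0.0/0.0 division.
{code:java}
public class DoubleDivisionDemo {
    public static void main(String[] args) {
        System.out.println(1.0 / 0.0);               // Infinity
        System.out.println(0.0 / 0.0);               // NaN
        System.out.println(Double.isNaN(0.0 / 0.0)); // true
    }
}
{code}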
[jira] [Created] (KAFKA-14174) Documentation for KRaft
Jose Armando Garcia Sancio created KAFKA-14174: -- Summary: Documentation for KRaft Key: KAFKA-14174 URL: https://issues.apache.org/jira/browse/KAFKA-14174 Project: Kafka Issue Type: Improvement Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.3.0 KRaft documentation for 3.3 # Disk recovery # Talk about KRaft operation -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13959) Controller should unfence Broker with busy metadata log
[ https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13959. Resolution: Fixed > Controller should unfence Broker with busy metadata log > --- > > Key: KAFKA-13959 > URL: https://issues.apache.org/jira/browse/KAFKA-13959 > Project: Kafka > Issue Type: Bug > Components: kraft > Affects Versions: 3.3.0 > Reporter: Jose Armando Garcia Sancio > Assignee: dengziming > Priority: Blocker > Fix For: 3.3.0 > > > https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible > for the controller to not unfence a broker if the committed offset keeps > increasing. > > One solution to this problem is to require the broker to only catch up to the > last committed offset as of when it last sent the heartbeat. For example: > # Broker sends a heartbeat with current offset of {{Y}}. The last commit > offset is {{X}}. The controller remembers this last commit offset, call it > {{X'}} > # Broker sends another heartbeat with current offset of {{Z}}. Unfence > the broker if {{Z >= X}} or {{Z >= X'}}. > Another solution is to unfence the broker when the applied offset of the > broker has reached the offset of its own broker registration record. > This change should also set the default for MetadataMaxIdleIntervalMs back to > 500. -- This message was sent by Atlassian Jira (v8.20.10#820010)
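A sketch of the first solution above, with illustrative names rather than the actual controller code: the controller remembers the committed offset it saw when the previous heartbeat arrived (X'), and unfences once a later heartbeat's current offset reaches that remembered value, even if the committed offset keeps advancing in the meantime.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class UnfenceTracker {
    // Committed offset observed when each broker's previous heartbeat arrived (X').
    private final Map<Integer, Long> committedAtLastHeartbeat = new HashMap<>();

    /** Returns true if the broker may be unfenced after this heartbeat. */
    public boolean onHeartbeat(int brokerId, long brokerCurrentOffset, long lastCommittedOffset) {
        Long priorCommitted = committedAtLastHeartbeat.put(brokerId, lastCommittedOffset);
        // Compare Z against X', not against the ever-advancing current commit offset.
        return priorCommitted != null && brokerCurrentOffset >= priorCommitted;
    }
}
{code}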
[jira] [Created] (KAFKA-14145) Faster propagation of high-watermark in KRaft topic partitions
Jose Armando Garcia Sancio created KAFKA-14145: -- Summary: Faster propagation of high-watermark in KRaft topic partitions Key: KAFKA-14145 URL: https://issues.apache.org/jira/browse/KAFKA-14145 Project: Kafka Issue Type: Task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.4.0 Typically, the HWM is increased after one round of Fetch requests from the majority of the replicas. The HWM is propagated after another round of Fetch requests. If the LEO doesn't change, the propagation of the HWM can be delayed by one Fetch wait timeout (500ms). Looking at the KafkaRaftClient implementation, we would have to track both the fetch offset and the last sent high-watermark for each replica. Another issue here is that we changed the KafkaRaftManager so that it doesn't set the replica id when it is an observer/broker. Since the HWM is not part of the Fetch request, the leader would have to keep track of this in the LeaderState. {code:java} val nodeId = if (config.processRoles.contains(ControllerRole)) { OptionalInt.of(config.nodeId) } else { OptionalInt.empty() } {code} We would need to find a better solution for https://issues.apache.org/jira/browse/KAFKA-13168 or improve the FETCH request so that it includes the HWM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
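An illustrative sketch of the per-replica bookkeeping described above, with hypothetical names rather than the actual LeaderState API: the leader tracks both the fetch offset and the last high-watermark sent to each replica, so a pending Fetch can be completed early when the HWM advances even though the LEO is unchanged.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class HighWatermarkPropagation {
    static final class ReplicaProgress {
        long fetchOffset = -1L;
        long lastSentHighWatermark = -1L;
    }

    private final Map<Integer, ReplicaProgress> replicas = new HashMap<>();

    /** Should a pending Fetch from this replica be completed now? */
    public boolean shouldCompleteFetch(int replicaId, long leaderEndOffset, long highWatermark) {
        ReplicaProgress p = replicas.computeIfAbsent(replicaId, id -> new ReplicaProgress());
        // Complete early when there are new records, or when the replica has
        // not yet seen the latest high-watermark.
        return leaderEndOffset > p.fetchOffset || highWatermark > p.lastSentHighWatermark;
    }

    public void onFetchResponseSent(int replicaId, long fetchOffset, long highWatermark) {
        ReplicaProgress p = replicas.computeIfAbsent(replicaId, id -> new ReplicaProgress());
        p.fetchOffset = fetchOffset;
        p.lastSentHighWatermark = highWatermark;
    }
}
{code}
Note that this bookkeeping needs a stable replica id per fetcher, which is exactly what the observer/broker change quoted above takes away.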
[jira] [Created] (KAFKA-14142) Improve information returned about the cluster metadata partition
Jose Armando Garcia Sancio created KAFKA-14142: -- Summary: Improve information returned about the cluster metadata partition Key: KAFKA-14142 URL: https://issues.apache.org/jira/browse/KAFKA-14142 Project: Kafka Issue Type: Improvement Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jason Gustafson Fix For: 3.3.0 The Apache Kafka operator needs to know when it is safe to format and start a KRaft Controller that had a disk failure of the metadata log dir. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13968) Broker should not generate a snapshot until it has been unfenced
[ https://issues.apache.org/jira/browse/KAFKA-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13968. Resolution: Fixed > Broker should not generate a snapshot until it has been unfenced > > > Key: KAFKA-13968 > URL: https://issues.apache.org/jira/browse/KAFKA-13968 > Project: Kafka > Issue Type: Bug > Components: kraft > Reporter: dengziming > Assignee: dengziming > Priority: Blocker > Fix For: 3.3.0 > > > There is a bug when computing `FeaturesDelta` which causes us to generate a > snapshot on every commit. > > [2022-06-08 13:07:43,010] INFO [BrokerMetadataSnapshotter id=0] Creating a > new snapshot at offset 0... > (kafka.server.metadata.BrokerMetadataSnapshotter:66) > [2022-06-08 13:07:43,222] INFO [BrokerMetadataSnapshotter id=0] Creating a > new snapshot at offset 2... > (kafka.server.metadata.BrokerMetadataSnapshotter:66) > [2022-06-08 13:07:43,727] INFO [BrokerMetadataSnapshotter id=0] Creating a > new snapshot at offset 3... > (kafka.server.metadata.BrokerMetadataSnapshotter:66) > [2022-06-08 13:07:44,228] INFO [BrokerMetadataSnapshotter id=0] Creating a > new snapshot at offset 4... > (kafka.server.metadata.BrokerMetadataSnapshotter:66) > > Before a broker is unfenced, it won't start publishing metadata, so > it's meaningless to generate a snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13955) Fix failing KRaftClusterTest tests
[ https://issues.apache.org/jira/browse/KAFKA-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13955. Resolution: Fixed > Fix failing KRaftClusterTest tests > -- > > Key: KAFKA-13955 > URL: https://issues.apache.org/jira/browse/KAFKA-13955 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Assignee: dengziming >Priority: Major > > Tests are failing with timeout exception > java.util.concurrent.TimeoutException: > testCreateClusterAndPerformReassignment() timed out after 120 seconds > > Failing tests: > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testIncrementalAlterConfigs() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testSetLog4jConfigurations() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testLegacyAlterConfigs() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testCreateClusterAndPerformReassignment() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testUnregisterBroker() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testCreateClusterAndCreateAndManyTopics() > Build / JDK 8 and Scala 2.12 / > kafka.server.KRaftClusterTest.testCreateClusterAndCreateListDeleteTopic() -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13959) Controller should unfence Broker with busy metadata log
Jose Armando Garcia Sancio created KAFKA-13959: -- Summary: Controller should unfence Broker with busy metadata log Key: KAFKA-13959 URL: https://issues.apache.org/jira/browse/KAFKA-13959 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.3.0 Reporter: Jose Armando Garcia Sancio https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible for the controller to not unfence a broker if the committed offset keeps increasing. One solution to this problem is to require the broker to only catch up to the last committed offset as of when it last sent the heartbeat. For example: # Broker sends a heartbeat with current offset of {{Y}}. The last commit offset is {{X}}. The controller remembers this last commit offset, call it {{X'}} # Broker sends another heartbeat with current offset of {{Z}}. Unfence the broker if {{Z >= X}} or {{Z >= X'}}. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13883) KIP-835: Monitor Quorum
[ https://issues.apache.org/jira/browse/KAFKA-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13883. Resolution: Fixed > KIP-835: Monitor Quorum > --- > > Key: KAFKA-13883 > URL: https://issues.apache.org/jira/browse/KAFKA-13883 > Project: Kafka > Issue Type: Improvement > Components: kraft > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > > Tracking issue for the implementation of KIP-835. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13918) Schedule or cancel NoOpRecord write on metadata version change
[ https://issues.apache.org/jira/browse/KAFKA-13918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13918. Resolution: Duplicate > Schedule or cancel NoOpRecord write on metadata version change > -- > > Key: KAFKA-13918 > URL: https://issues.apache.org/jira/browse/KAFKA-13918 > Project: Kafka > Issue Type: Sub-task > Components: controller > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio > Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13918) Schedule or cancel NoOpRecord write on metadata version change
Jose Armando Garcia Sancio created KAFKA-13918: -- Summary: Schedule or cancel NoOpRecord write on metadata version change Key: KAFKA-13918 URL: https://issues.apache.org/jira/browse/KAFKA-13918 Project: Kafka Issue Type: Sub-task Components: controller Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13904) Move BrokerMetadataListener metrics to broker-metadata-metrics
Jose Armando Garcia Sancio created KAFKA-13904: -- Summary: Move BrokerMetadataListener metrics to broker-metadata-metrics Key: KAFKA-13904 URL: https://issues.apache.org/jira/browse/KAFKA-13904 Project: Kafka Issue Type: Bug Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio The metrics in BrokerMetadataListener should be moved to the broker-metadata-metrics group. This is okay because those metrics were never documented in a KIP and instead are now documented in KIP-835. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Reopened] (KAFKA-13502) Support configuring BROKER_LOGGER on controller-only KRaft nodes
[ https://issues.apache.org/jira/browse/KAFKA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio reopened KAFKA-13502: I accidentally resolved this issue. > Support configuring BROKER_LOGGER on controller-only KRaft nodes > > > Key: KAFKA-13502 > URL: https://issues.apache.org/jira/browse/KAFKA-13502 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Priority: Major > Labels: kip-500 > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13884) KRaft Observers are not required to flush on every append
Jose Armando Garcia Sancio created KAFKA-13884: -- Summary: KRaft Observers are not required to flush on every append Key: KAFKA-13884 URL: https://issues.apache.org/jira/browse/KAFKA-13884 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio The current implementation of the KRaft Client flushes to disk when observers append to the log. This is not required since observers don't participate in leader election and the advancement of the high-watermark. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13883) KIP-835: Monitor Quorum
Jose Armando Garcia Sancio created KAFKA-13883: -- Summary: KIP-835: Monitor Quorum Key: KAFKA-13883 URL: https://issues.apache.org/jira/browse/KAFKA-13883 Project: Kafka Issue Type: Improvement Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Tracking issue for the implementation of KIP-835. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13806) Check CRC when reading snapshots
Jose Armando Garcia Sancio created KAFKA-13806: -- Summary: Check CRC when reading snapshots Key: KAFKA-13806 URL: https://issues.apache.org/jira/browse/KAFKA-13806 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13798) KafkaController should send LeaderAndIsr request when LeaderRecoveryState is altered
Jose Armando Garcia Sancio created KAFKA-13798: -- Summary: KafkaController should send LeaderAndIsr request when LeaderRecoveryState is altered Key: KAFKA-13798 URL: https://issues.apache.org/jira/browse/KAFKA-13798 Project: Kafka Issue Type: Task Components: controller Affects Versions: 3.2.0 Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio The current implementation of KIP-704 and the ZK Controller only sends a LeaderAndIsr request to the followers if the AlterPartition completes a reassignment. That means that if there are no reassignments pending then the ZK Controller never sends a LeaderAndIsr request to the followers. The controller needs to send a LeaderAndIsr request when the partition has recovered because of the "fetch from follower" feature. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13784) DescribeQuorum should return the current leader if the handling node is not the current leader
Jose Armando Garcia Sancio created KAFKA-13784: -- Summary: DescribeQuorum should return the current leader if the handling node is not the current leader Key: KAFKA-13784 URL: https://issues.apache.org/jira/browse/KAFKA-13784 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.2.0 Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio For clients calling DescribeQuorum it is not possible to discover the current leader. If the request is sent to a node that is not the leader, it simply replies with INVALID_REQUEST. KIP-595 mentions that it should instead reply with the current leader. > If the response indicates that the intended node is not the current leader, > then check the response to see if the {{LeaderId}} has been set. If so, then > attempt to retry the request with the new leader. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13682) Implement auto preferred leader election in KRaft Controller
[ https://issues.apache.org/jira/browse/KAFKA-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13682. Resolution: Fixed > Implement auto preferred leader election in KRaft Controller > > > Key: KAFKA-13682 > URL: https://issues.apache.org/jira/browse/KAFKA-13682 > Project: Kafka > Issue Type: Task > Components: kraft > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > Labels: kip-500 > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13587) Implement unclean leader election in KIP-704
[ https://issues.apache.org/jira/browse/KAFKA-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13587. Resolution: Fixed > Implement unclean leader election in KIP-704 > > > Key: KAFKA-13587 > URL: https://issues.apache.org/jira/browse/KAFKA-13587 > Project: Kafka > Issue Type: Improvement > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13754) Follower should reject Fetch request while the leader is recovering
Jose Armando Garcia Sancio created KAFKA-13754: -- Summary: Follower should reject Fetch request while the leader is recovering Key: KAFKA-13754 URL: https://issues.apache.org/jira/browse/KAFKA-13754 Project: Kafka Issue Type: Task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio In the PR for KIP-704 we removed leader recovery state validation from the FETCH. This is okay because the leader immediately recovers the partition. We should enable this validation before implementing log recovery from unclean leader election. The old implementation and test is in this commit: https://github.com/apache/kafka/pull/11733/commits/c7e54b8f6cef087deac119d61a46d3586ead72b9 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13696) Topic partition leader should always send AlterPartition when transitioning from RECOVERING to RECOVERED
Jose Armando Garcia Sancio created KAFKA-13696: -- Summary: Topic partition leader should always send AlterPartition when transitioning from RECOVERING to RECOVERED Key: KAFKA-13696 URL: https://issues.apache.org/jira/browse/KAFKA-13696 Project: Kafka Issue Type: Task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13682) Implement auto preferred leader election in KRaft Controller
Jose Armando Garcia Sancio created KAFKA-13682: -- Summary: Implement auto preferred leader election in KRaft Controller Key: KAFKA-13682 URL: https://issues.apache.org/jira/browse/KAFKA-13682 Project: Kafka Issue Type: Task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13621) Resign leader on partition
Jose Armando Garcia Sancio created KAFKA-13621: -- Summary: Resign leader on partition Key: KAFKA-13621 URL: https://issues.apache.org/jira/browse/KAFKA-13621 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio h1. Motivation If the current leader A at epoch X gets partitioned from the rest of the quorum, quorum voter A will stay leader at epoch X. This happens because voter A will never receive a request from the rest of the voters increasing the epoch. The requests that typically increase the epoch of past leaders are BeginQuorumEpoch and Vote. In addition, if voter A (the leader at epoch X) doesn't get partitioned from the rest of the brokers (observers in the KRaft protocol), the brokers will never learn about the new quorum leader. This happens because 1. observers learn about the leader from the Fetch response and 2. observers send a Fetch request to a random voter if the Fetch request times out. Neither of these two scenarios will cause the broker to send a request to a different voter, because the leader at epoch X will never send a different leader in the response and the broker will never send a Fetch request to a different voter because the Fetch request will never time out. h1. Proposed Changes In this scenario A, the leader at epoch X, will stop receiving Fetch requests from the majority of the voters. Voter A should resign as leader if the Fetch requests from the majority of the voters are old enough. A reasonable value for "old enough" is the Fetch timeout value. -- This message was sent by Atlassian Jira (v8.20.1#820001)
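A minimal sketch of the proposed check, with illustrative names rather than the actual KRaft code: the leader records the last time each voter fetched and resigns once fewer than a majority (counting itself) have fetched within the Fetch timeout.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class LeaderResignationCheck {
    private final Map<Integer, Long> lastFetchTimeMs = new HashMap<>();
    private final int voterCount;
    private final long fetchTimeoutMs;

    public LeaderResignationCheck(int voterCount, long fetchTimeoutMs) {
        this.voterCount = voterCount;
        this.fetchTimeoutMs = fetchTimeoutMs;
    }

    public void onFetchFromVoter(int voterId, long nowMs) {
        lastFetchTimeMs.put(voterId, nowMs);
    }

    /** True if the leader should resign because it may be partitioned away. */
    public boolean shouldResign(long nowMs) {
        long recentVoters = lastFetchTimeMs.values().stream()
            .filter(t -> nowMs - t <= fetchTimeoutMs)
            .count();
        int majority = voterCount / 2 + 1;
        return recentVoters + 1 < majority; // +1 counts the leader itself
    }
}
{code}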
[jira] [Resolved] (KAFKA-13502) Support configuring BROKER_LOGGER on controller-only KRaft nodes
[ https://issues.apache.org/jira/browse/KAFKA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13502. Resolution: Fixed This issue was fixed by KAFKA-13552. > Support configuring BROKER_LOGGER on controller-only KRaft nodes > > > Key: KAFKA-13502 > URL: https://issues.apache.org/jira/browse/KAFKA-13502 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Priority: Major > Labels: kip-500 > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13552) Unable to dynamically change broker log levels on KRaft
[ https://issues.apache.org/jira/browse/KAFKA-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13552. Resolution: Fixed > Unable to dynamically change broker log levels on KRaft > --- > > Key: KAFKA-13552 > URL: https://issues.apache.org/jira/browse/KAFKA-13552 > Project: Kafka > Issue Type: Bug > Components: kraft > Affects Versions: 3.1.0, 3.0.0 > Reporter: Ron Dagostino > Assignee: Colin McCabe > Priority: Major > > It is currently not possible to dynamically change the log level in KRaft. > For example: > kafka-configs.sh --bootstrap-server --alter --add-config > "kafka.server.ReplicaManager=DEBUG" --entity-type broker-loggers > --entity-name 0 > Results in: > org.apache.kafka.common.errors.InvalidRequestException: Unexpected resource > type BROKER_LOGGER. > The code to process this request is in ZkAdminManager.alterLogLevelConfigs(). > This needs to be moved out of there, and the functionality has to be > processed locally on the broker instead of being forwarded to the KRaft > controller. > It is also an open question as to how we can dynamically alter log levels for > a remote KRaft controller. Connecting directly to it is one possible > solution, but that may not be desirable since generally connecting directly > to the controller is not necessary. The ticket for this particular aspect of > the issue is https://issues.apache.org/jira/browse/KAFKA-13502 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13587) Implement unclean leader election in KIP-704
Jose Armando Garcia Sancio created KAFKA-13587: -- Summary: Implement unclean leader election in KIP-704 Key: KAFKA-13587 URL: https://issues.apache.org/jira/browse/KAFKA-13587 Project: Kafka Issue Type: Improvement Reporter: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13489) Support different compression type for snapshots
Jose Armando Garcia Sancio created KAFKA-13489: -- Summary: Support different compression type for snapshots Key: KAFKA-13489 URL: https://issues.apache.org/jira/browse/KAFKA-13489 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-12932) Interfaces for SnapshotReader and SnapshotWriter
[ https://issues.apache.org/jira/browse/KAFKA-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12932. Resolution: Fixed > Interfaces for SnapshotReader and SnapshotWriter > > > Key: KAFKA-12932 > URL: https://issues.apache.org/jira/browse/KAFKA-12932 > Project: Kafka > Issue Type: Sub-task > Reporter: Jose Armando Garcia Sancio >Assignee: loboxu >Priority: Major > > Change the snapshot API so that SnapshotWriter and SnapshotReader are > interfaces. Change the existing types SnapshotWriter and SnapshotReader to > use a different name and to implement the interfaces introduced by this issue. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-13357) Controller snapshot contains producer ids records but broker does not
[ https://issues.apache.org/jira/browse/KAFKA-13357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13357. Resolution: Fixed > Controller snapshot contains producer ids records but broker does not > - > > Key: KAFKA-13357 > URL: https://issues.apache.org/jira/browse/KAFKA-13357 > Project: Kafka > Issue Type: Sub-task > Components: kraft > Affects Versions: 3.0.0 > Reporter: Jose Armando Garcia Sancio > Assignee: Colin McCabe > Priority: Blocker > > MetadataDelta ignores PRODUCER_IDS_RECORDS. A broker doesn't need this state > for its operation. The broker needs to handle these records if we want to hold > the invariant that controller snapshots are equivalent to broker snapshots. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KAFKA-12973) Update KIP and dev mailing list
[ https://issues.apache.org/jira/browse/KAFKA-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12973. Resolution: Fixed > Update KIP and dev mailing list > --- > > Key: KAFKA-12973 > URL: https://issues.apache.org/jira/browse/KAFKA-12973 > Project: Kafka > Issue Type: Sub-task > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > > Update KIP-630 and the Kafka mailing list based on the small implementation > deviations from what is documented in the KIP. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KAFKA-13357) Controller snapshot contains producer ids records but broker does not
Jose Armando Garcia Sancio created KAFKA-13357: -- Summary: Controller snapshot contains producer ids records but broker does not Key: KAFKA-13357 URL: https://issues.apache.org/jira/browse/KAFKA-13357 Project: Kafka Issue Type: Sub-task Components: kraft Affects Versions: 3.0.0 Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio MetadataDelta ignores PRODUCER_IDS_RECORDS. A broker doesn't need this state for its operation. The broker needs to handle these records if we want to hold the invariant that controller snapshots are equivalent to broker snapshots. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13321) Notify listener of leader change on registration
Jose Armando Garcia Sancio created KAFKA-13321: -- Summary: Notify listener of leader change on registration Key: KAFKA-13321 URL: https://issues.apache.org/jira/browse/KAFKA-13321 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio When a Listener is registered with the RaftClient, the RaftClient doesn't notify the listener of the current leader when it is a follower. The current implementation of RaftClient notifies this listener of the leader change if it is the current leader and it has caught up to the leader epoch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13208) Use TopicIdPartition instead of TopicPartition when computing the topic delta
Jose Armando Garcia Sancio created KAFKA-13208: -- Summary: Use TopicIdPartition instead of TopicPartition when computing the topic delta Key: KAFKA-13208 URL: https://issues.apache.org/jira/browse/KAFKA-13208 Project: Kafka Issue Type: Improvement Components: kraft, replication Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio {{TopicPartition}} is used as the key when computing the local changes in {{TopicsDelta}}. The topic id is included in the Map value returned by {{localChanges}}. I think that the handling of this code and the corresponding code in {{ReplicaManager}} could be simplified if {{localChanges}} instead returned something like {code:java} { deletes: Set[TopicIdPartition], leaders: Map[TopicIdPartition, PartitionRegistration], followers: Map[TopicIdPartition, PartitionRegistration] }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
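As a sketch, the proposed shape maps naturally onto a Java record; {{TopicIdPartition}} and {{PartitionRegistration}} below are empty stand-ins for the real Kafka classes:
{code:java}
import java.util.Map;
import java.util.Set;

final class TopicIdPartition { /* topic id + partition index, elided */ }

final class PartitionRegistration { /* leader, isr, replicas, elided */ }

record LocalReplicaChanges(
    Set<TopicIdPartition> deletes,
    Map<TopicIdPartition, PartitionRegistration> leaders,
    Map<TopicIdPartition, PartitionRegistration> followers
) { }
{code}
With this shape the {{ReplicaManager}} no longer needs to pair each {{TopicPartition}} key with a separately-carried topic id.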
[jira] [Created] (KAFKA-13198) TopicsDelta doesn't update deleted topic when processing PartitionChangeRecord
Jose Armando Garcia Sancio created KAFKA-13198: -- Summary: TopicsDelta doesn't update deleted topic when processing PartitionChangeRecord Key: KAFKA-13198 URL: https://issues.apache.org/jira/browse/KAFKA-13198 Project: Kafka Issue Type: Bug Components: kraft, replication Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 In KRaft, when a replica gets reassigned away from a topic partition we are not notifying the {{ReplicaManager}} to stop the replica. One solution is to track those topic partition ids when processing {{PartitionChangeRecord}} and to return them as {{deleted}} when the replica manager calls {{calculateDeltaChanges}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13193) Replica manager doesn't update partition state when transitioning from leader to follower with unknown leader
Jose Armando Garcia Sancio created KAFKA-13193: -- Summary: Replica manager doesn't update partition state when transitioning from leader to follower with unknown leader Key: KAFKA-13193 URL: https://issues.apache.org/jira/browse/KAFKA-13193 Project: Kafka Issue Type: Bug Components: kraft, replication Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio This issue applies to both the ZK and KRaft implementations of the replica manager. In the rare case when a replica transitions from leader to follower with no leader, the partition state is not updated. This is because when handling makeFollowers the ReplicaManager only updates the partition state if the leader is alive. The solution is to always transition to follower but not start the fetcher thread if the leader is unknown or not alive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13182) Input to AbstractFetcherManager::addFetcherForPartition could be simplified
Jose Armando Garcia Sancio created KAFKA-13182: -- Summary: Input to AbstractFetcherManager::addFetcherForPartition could be simplified Key: KAFKA-13182 URL: https://issues.apache.org/jira/browse/KAFKA-13182 Project: Kafka Issue Type: Improvement Components: replication Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio The input to the addFetcherForPartition method in AbstractFetcherManager includes more information than it needs. The fetcher manager only needs the leader id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13181) ReplicaManager should start fetchers on UnfencedBrokerRecords
Jose Armando Garcia Sancio created KAFKA-13181: -- Summary: ReplicaManager should start fetchers on UnfencedBrokerRecords Key: KAFKA-13181 URL: https://issues.apache.org/jira/browse/KAFKA-13181 Project: Kafka Issue Type: Sub-task Components: kraft, replication Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio The KRaft ReplicaManager starts fetching from the leader if it is a follower and there is an endpoint for the leader. We need to improve the ReplicaManager to also start fetching when the leader registers and gets unfenced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13168) KRaft observers should not have a replica id
Jose Armando Garcia Sancio created KAFKA-13168: -- Summary: KRaft observers should not have a replica id Key: KAFKA-13168 URL: https://issues.apache.org/jira/browse/KAFKA-13168 Project: Kafka Issue Type: Bug Components: kraft Reporter: Jose Armando Garcia Sancio Fix For: 3.0.0 To avoid a misconfigured broker affecting the quorum of the cluster metadata partition, when a Kafka node is configured as broker-only the replica id for the KRaft client should be set to {{Optional::empty()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13165) Validate node id, process role and quorum voters
Jose Armando Garcia Sancio created KAFKA-13165: -- Summary: Validate node id, process role and quorum voters Key: KAFKA-13165 URL: https://issues.apache.org/jira/browse/KAFKA-13165 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Under certain configurations it is possible for the Kafka server to boot up as a broker only but be the cluster metadata quorum leader. We should validate the configuration to avoid this case, as in the sketch below. # If the {{process.roles}} contains {{controller}} then the {{node.id}} needs to be in the {{controller.quorum.voters}} # If the {{process.roles}} doesn't contain {{controller}} then the {{node.id}} cannot be in the {{controller.quorum.voters}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
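A sketch of the two rules as a standalone check; the way the roles and voter ids are passed in here is illustrative, not KafkaConfig's actual API:
{code:java}
import java.util.Set;

public class QuorumConfigValidator {
    public static void validate(Set<String> processRoles, int nodeId, Set<Integer> quorumVoterIds) {
        boolean isController = processRoles.contains("controller");
        boolean isVoter = quorumVoterIds.contains(nodeId);
        if (isController && !isVoter) {
            throw new IllegalArgumentException("process.roles contains 'controller' but node.id " +
                nodeId + " is not in controller.quorum.voters " + quorumVoterIds);
        }
        if (!isController && isVoter) {
            throw new IllegalArgumentException("process.roles does not contain 'controller' but node.id " +
                nodeId + " is in controller.quorum.voters " + quorumVoterIds);
        }
    }
}
{code}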
[jira] [Resolved] (KAFKA-12646) Implement snapshot generation on brokers
[ https://issues.apache.org/jira/browse/KAFKA-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12646. Resolution: Fixed > Implement snapshot generation on brokers > > > Key: KAFKA-12646 > URL: https://issues.apache.org/jira/browse/KAFKA-12646 > Project: Kafka > Issue Type: Sub-task > Components: controller > Reporter: Jose Armando Garcia Sancio >Assignee: Colin McCabe >Priority: Major > Labels: kip-500 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12647) Implement loading snapshot in the broker
[ https://issues.apache.org/jira/browse/KAFKA-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12647. Resolution: Fixed > Implement loading snapshot in the broker > > > Key: KAFKA-12647 > URL: https://issues.apache.org/jira/browse/KAFKA-12647 > Project: Kafka > Issue Type: Sub-task > Reporter: Jose Armando Garcia Sancio >Assignee: Colin McCabe >Priority: Major > Labels: kip-500 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12997) Expose log record append time to the controller/broker
[ https://issues.apache.org/jira/browse/KAFKA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12997. Resolution: Fixed > Expose log record append time to the controller/broker > -- > > Key: KAFKA-12997 > URL: https://issues.apache.org/jira/browse/KAFKA-12997 > Project: Kafka > Issue Type: Sub-task > Reporter: Niket Goel > Assignee: Jose Armando Garcia Sancio > Priority: Minor > Labels: kip-500 > > The snapshot records are generated by each individual quorum participant > which also stamps the append time in the records. These append times are > generated from a different clock (except in the case of the quorum leader) as > compared to the metadata log records (where timestamps are stamped by the > leader). > To enable having a single clock to compare timestamps, > https://issues.apache.org/jira/browse/KAFKA-12952 adds a timestamp field to > the snapshot header which should contain the append time of the highest > record contained in the snapshot (which will be in leader time). > This JIRA tracks exposing and wiring the batch timestamp such that it can be > provided to the SnapshotWriter at the time of snapshot creation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13157) Kafka-dump-log needs to support snapshot records
Jose Armando Garcia Sancio created KAFKA-13157: -- Summary: Kafka-dump-log needs to support snapshot records Key: KAFKA-13157 URL: https://issues.apache.org/jira/browse/KAFKA-13157 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Extend the kafka-dump-log tool to allow the user to view and print KRaft snapshot files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-13112) Controller's committed offset get out of sync with raft client listener context
[ https://issues.apache.org/jira/browse/KAFKA-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13112. Resolution: Fixed Yes. > Controller's committed offset get out of sync with raft client listener > context > --- > > Key: KAFKA-13112 > URL: https://issues.apache.org/jira/browse/KAFKA-13112 > Project: Kafka > Issue Type: Bug > Components: controller, kraft > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio > Priority: Blocker > Labels: kip-500 > Fix For: 3.0.0 > > > The active controller creates an in-memory snapshot for every offset returned > by RaftClient::scheduleAppend and RaftClient::scheduleAtomicAppend. For > RaftClient::scheduleAppend, the RaftClient is free to split those records > into multiple batches. Because of this, when scheduleAppend is used there is no > guarantee that the active leader will always have an in-memory snapshot for > every "last committed offset". > To get around this problem, when the active controller renounces leadership, > if there is no snapshot at the last committed offset it will instead: > # Reset the snapshot registry > # Unregister the listener from the RaftClient > # Register a new listener with the RaftClient -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13148) Kraft Controller doesn't handle scheduleAppend returning Long.MAX_VALUE
Jose Armando Garcia Sancio created KAFKA-13148: -- Summary: Kraft Controller doesn't handle scheduleAppend returning Long.MAX_VALUE Key: KAFKA-13148 URL: https://issues.apache.org/jira/browse/KAFKA-13148 Project: Kafka Issue Type: Bug Components: controller, kraft Reporter: Jose Armando Garcia Sancio In some cases the RaftClient will return Long.MAX_VALUE: {code:java} /** * Append a list of records to the log. The write will be scheduled for some time * in the future. There is no guarantee that appended records will be written to * the log and eventually committed. However, it is guaranteed that if any of the * records become committed, then all of them will be. * * If the provided current leader epoch does not match the current epoch, which * is possible when the state machine has yet to observe the epoch change, then * this method will return {@link Long#MAX_VALUE} to indicate an offset which is * not possible to become committed. The state machine is expected to discard all * uncommitted entries after observing an epoch change. * * @param epoch the current leader epoch * @param records the list of records to append * @return the expected offset of the last record; {@link Long#MAX_VALUE} if the records could not * be committed; null if no memory could be allocated for the batch at this time * @throws org.apache.kafka.common.errors.RecordBatchTooLargeException if the size of the records is greater than the maximum * batch size; if this exception is thrown none of the elements in records were * committed */ Long scheduleAtomicAppend(int epoch, List<ApiMessageAndVersion> records); {code} The controller doesn't handle this case: {code:java} // If the operation returned a batch of records, those records need to be // written before we can return our result to the user. Here, we hand off // the batch of records to the raft client. They will be written out // asynchronously. final long offset; if (result.isAtomic()) { offset = raftClient.scheduleAtomicAppend(controllerEpoch, result.records()); } else { offset = raftClient.scheduleAppend(controllerEpoch, result.records()); } op.processBatchEndOffset(offset); writeOffset = offset; resultAndOffset = ControllerResultAndOffset.of(offset, result); for (ApiMessageAndVersion message : result.records()) { replay(message.message(), Optional.empty(), offset); } snapshotRegistry.getOrCreateSnapshot(offset); log.debug("Read-write operation {} will be completed when the log " + "reaches offset {}.", this, resultAndOffset.offset()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
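One possible shape of the missing handling, mirroring the quoted controller fragment; this is illustrative, not necessarily the fix that shipped, and a null return (also possible per the javadoc) would need similar treatment:
{code:java}
final long offset;
if (result.isAtomic()) {
    offset = raftClient.scheduleAtomicAppend(controllerEpoch, result.records());
} else {
    offset = raftClient.scheduleAppend(controllerEpoch, result.records());
}
if (offset == Long.MAX_VALUE) {
    // The raft client observed a newer epoch; this offset can never become
    // committed, so fail the operation instead of recording the offset.
    throw new NotControllerException("No longer the active controller");
}
op.processBatchEndOffset(offset);
writeOffset = offset;
{code}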
[jira] [Created] (KAFKA-13114) Unregister listener during renounce when the in-memory snapshot is missing
Jose Armando Garcia Sancio created KAFKA-13114: -- Summary: Unregister listener during renounce when the in-memory snapshot is missing Key: KAFKA-13114 URL: https://issues.apache.org/jira/browse/KAFKA-13114 Project: Kafka Issue Type: Sub-task Components: controller Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 Need to improve the renounce logic to do the following when the in-memory snapshot for the last committed offset is missing: # Reset the snapshot registry # Unregister the listener from the RaftClient # Register a new listener with the RaftClient -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13113) Add unregister support to the RaftClient.
Jose Armando Garcia Sancio created KAFKA-13113: -- Summary: Add unregister support to the RaftClient. Key: KAFKA-13113 URL: https://issues.apache.org/jira/browse/KAFKA-13113 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 Implement the following API:
{code:java}
interface RaftClient {
    ListenerContext register(Listener listener);
    void unregister(ListenerContext context);
}

interface ListenerContext { }

interface Listener {
    void handleCommit(ListenerContext context, BatchReader reader);
    void handleSnapshot(ListenerContext context, SnapshotReader reader);
    void handleLeaderChange(ListenerContext context, LeaderAndEpoch leaderAndEpoch);
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13112) Controller's committed offset gets out of sync with raft client listener context
Jose Armando Garcia Sancio created KAFKA-13112: -- Summary: Controller's committed offset gets out of sync with raft client listener context Key: KAFKA-13112 URL: https://issues.apache.org/jira/browse/KAFKA-13112 Project: Kafka Issue Type: Bug Components: controller, kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13104) Controller should notify the RaftClient when it resigns
Jose Armando Garcia Sancio created KAFKA-13104: -- Summary: Controller should notify the RaftClient when it resigns Key: KAFKA-13104 URL: https://issues.apache.org/jira/browse/KAFKA-13104 Project: Kafka Issue Type: Bug Components: controller, kraft Reporter: Jose Armando Garcia Sancio Fix For: 3.0.0
{code:java}
private Throwable handleEventException(String name,
                                       Optional<Long> startProcessingTimeNs,
                                       Throwable exception) {
    ...
    renounce();
    return new UnknownServerException(exception);
}
{code}
When the active controller encounters an event exception it attempts to renounce leadership. Unfortunately, this doesn't tell the {{RaftClient}} that it should attempt to give up leadership. This results in an inconsistent state, with the {{RaftClient}} as leader but the controller inactive. We should change this implementation so that the active controller asks the {{RaftClient}} to resign. The active controller then waits until {{handleLeaderChange}} before calling {{renounce()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
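A hedged sketch of the proposed flow, assuming the raft client exposes a resign API (the name {{curClaimEpoch}} and the exact handler shapes are illustrative, not the actual implementation):
{code:java}
// Illustrative sketch: the controller asks the raft client to resign and
// defers its own in-memory rollback until handleLeaderChange fires.
private Throwable handleEventException(String name,
                                       Optional<Long> startProcessingTimeNs,
                                       Throwable exception) {
    // ... logging elided ...
    raftClient.resign(curClaimEpoch);  // assumed resign(epoch) API on the raft client
    return new UnknownServerException(exception);
}

public void handleLeaderChange(LeaderAndEpoch newLeader) {
    if (!newLeader.isLeader(nodeId)) {
        renounce();  // now consistent: the raft client already gave up leadership
    }
}
{code}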
[jira] [Created] (KAFKA-13100) Controller cannot revert to an in-memory snapshot
Jose Armando Garcia Sancio created KAFKA-13100: -- Summary: Controller cannot revert to an in-memory snapshot Key: KAFKA-13100 URL: https://issues.apache.org/jira/browse/KAFKA-13100 Project: Kafka Issue Type: Bug Components: controller, kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0
{code:java}
[2021-07-16 16:34:55,578] DEBUG [Controller 3002] Executing handleRenounce[3]. (org.apache.kafka.controller.QuorumController)
[2021-07-16 16:34:55,578] WARN [Controller 3002] Renouncing the leadership at oldEpoch 3 due to a metadata log event. Reverting to last committed offset 214. (org.apache.kafka.controller.QuorumController)
[2021-07-16 16:34:55,579] WARN [Controller 3002] org.apache.kafka.controller.QuorumController@646b1289: failed with unknown server exception RuntimeException at epoch -1 in 1510 us. Reverting to last committed offset 214. (org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: No snapshot for epoch 214. Snapshot epochs are: -1, 1, 3, 5, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 94, 96, 97, 107, 108, 112, 125, 126, 128, 135, 171, 208, 213
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.revertToSnapshot(SnapshotRegistry.java:203)
    at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:784)
    at org.apache.kafka.controller.QuorumController.access$2500(QuorumController.java:121)
    at org.apache.kafka.controller.QuorumController$QuorumMetaLogListener.lambda$handleLeaderChange$3(QuorumController.java:769)
    at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:311)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
    at java.lang.Thread.run(Thread.java:748)
[2021-07-16 16:34:55,580] ERROR [Controller 3002] Unexpected exception in handleException (org.apache.kafka.queue.KafkaEventQueue)
java.lang.RuntimeException: No snapshot for epoch 214. Snapshot epochs are: -1, 1, 3, 5, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 94, 96, 97, 107, 108, 112, 125, 126, 128, 135, 171, 208, 213
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.revertToSnapshot(SnapshotRegistry.java:203)
    at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:784)
    at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:287)
    at org.apache.kafka.controller.QuorumController.access$500(QuorumController.java:121)
    at org.apache.kafka.controller.QuorumController$ControlEvent.handleException(QuorumController.java:317)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:126)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
    at java.lang.Thread.run(Thread.java:748)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-13098) No such file exception when recovering snapshots in metadata log dir
[ https://issues.apache.org/jira/browse/KAFKA-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13098. Resolution: Fixed > No such file exception when recovering snapshots in metadata log dir > > > Key: KAFKA-13098 > URL: https://issues.apache.org/jira/browse/KAFKA-13098 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.0.0 > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Labels: kip-500 > Fix For: 3.0.0 > >
> {code:java}
> RaftClusterTest > testCreateClusterAndCreateListDeleteTopic() FAILED
>     java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/kafka-286994548094074875/broker_0_data0/@metadata-0/partition.metadata.tmp
>         at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
>         at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
>         at java.util.Iterator.forEachRemaining(Iterator.java:115)
>         at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>         at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>         at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>         at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>         at kafka.raft.KafkaMetadataLog$.recoverSnapshots(KafkaMetadataLog.scala:616)
>         at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:583)
>         at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:257)
>         at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:132)
>         at kafka.testkit.KafkaClusterTestKit$Builder.build(KafkaClusterTestKit.java:227)
>         at kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic(RaftClusterTest.scala:87)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13098) No such file exception when recovering snapshots in metadata log dir
Jose Armando Garcia Sancio created KAFKA-13098: -- Summary: No such file exception when recovering snapshots in metadata log dir Key: KAFKA-13098 URL: https://issues.apache.org/jira/browse/KAFKA-13098 Project: Kafka Issue Type: Bug Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0
{code:java}
RaftClusterTest > testCreateClusterAndCreateListDeleteTopic() FAILED
    java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/kafka-286994548094074875/broker_0_data0/@metadata-0/partition.metadata.tmp
        at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
        at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
        at java.util.Iterator.forEachRemaining(Iterator.java:115)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at kafka.raft.KafkaMetadataLog$.recoverSnapshots(KafkaMetadataLog.scala:616)
        at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:583)
        at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:257)
        at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:132)
        at kafka.testkit.KafkaClusterTestKit$Builder.build(KafkaClusterTestKit.java:227)
        at kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic(RaftClusterTest.scala:87)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-13078) Closing FileRawSnapshotWriter too early
[ https://issues.apache.org/jira/browse/KAFKA-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13078. Resolution: Fixed > Closing FileRawSnapshotWriter too early > --- > > Key: KAFKA-13078 > URL: https://issues.apache.org/jira/browse/KAFKA-13078 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.0.0 > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Labels: kip-500 > Fix For: 3.0.0 > > > We are getting the following error
> {code:java}
> [2021-07-13 17:23:42,174] ERROR [kafka-raft-io-thread]: Error due to (kafka.raft.KafkaRaftManager$RaftIoThread)
> java.io.UncheckedIOException: Error calculating snapshot size. temp path = /mnt/kafka/kafka-metadata-logs/@metadata-0/0062-02-3249768281228588378.checkpoint.part, snapshotId = OffsetAndEpoch(offset=62, epoch=2).
>     at org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:63)
>     at org.apache.kafka.raft.KafkaRaftClient.maybeSendFetchOrFetchSnapshot(KafkaRaftClient.java:2044)
>     at org.apache.kafka.raft.KafkaRaftClient.pollFollowerAsObserver(KafkaRaftClient.java:2032)
>     at org.apache.kafka.raft.KafkaRaftClient.pollFollower(KafkaRaftClient.java:1995)
>     at org.apache.kafka.raft.KafkaRaftClient.pollCurrentState(KafkaRaftClient.java:2104)
>     at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2217)
>     at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> Caused by: java.nio.channels.ClosedChannelException
>     at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>     at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300)
>     at org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:60)
>     ... 7 more
> {code}
> This is because the {{FollowerState}} is closing the snapshot writer passed > through the argument instead of the one being replaced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-13080) Fetch snapshot requests are not directed to kraft in controller
[ https://issues.apache.org/jira/browse/KAFKA-13080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-13080. Resolution: Fixed > Fetch snapshot requests are not directed to kraft in controller > -- > > Key: KAFKA-13080 > URL: https://issues.apache.org/jira/browse/KAFKA-13080 > Project: Kafka > Issue Type: Bug > Components: controller, kraft > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Labels: kip-500 > Fix For: 3.0.0 > > > Kraft followers and observers are seeing the following error
> {code:java}
> [2021-07-13 18:15:47,289] ERROR [RaftManager nodeId=2] Unexpected error UNKNOWN_SERVER_ERROR in FETCH_SNAPSHOT response: InboundResponse(correlationId=29862, data=FetchSnapshotResponseData(throttleTimeMs=0, errorCode=-1, topics=[]), sourceId=3001) (org.apache.kafka.raft.KafkaRaftClient)
> {code}
> This is because ControllerApis is not directing FetchSnapshot requests to the > raft manager. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13090) Improve cluster snapshot integration test
Jose Armando Garcia Sancio created KAFKA-13090: -- Summary: Improve cluster snapshot integration test Key: KAFKA-13090 URL: https://issues.apache.org/jira/browse/KAFKA-13090 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 Extend the test in RaftClusterSnapshotTest to verify that both the controllers and brokers are generating snapshots. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13089) Revisit the usage of BufferSuppliers in Kraft
Jose Armando Garcia Sancio created KAFKA-13089: -- Summary: Revisit the usage of BufferSuppliers in Kraft Key: KAFKA-13089 URL: https://issues.apache.org/jira/browse/KAFKA-13089 Project: Kafka Issue Type: Sub-task Components: kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio The latest KafkaRaftClient creates a new BufferSupplier every time one is needed. A buffer supplier is needed when reading from the log and when reading from a snapshot. It would be good to investigate whether there is a performance and memory usage advantage to sharing the buffer supplier between those use cases and across every read of the log or snapshot. If the BufferSupplier is shared, it is very likely that the implementation will have to be thread-safe, because we need to support multiple Listeners and each Listener would be using a different thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
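As a starting point for that investigation, a minimal thread-safe sharing wrapper could look like the sketch below (assuming Kafka's {{org.apache.kafka.common.utils.BufferSupplier}}; whether coarse synchronization beats per-read allocation is exactly what this issue asks to measure):
{code:java}
import java.nio.ByteBuffer;
import org.apache.kafka.common.utils.BufferSupplier;

// Hedged sketch: serialize access to a single non-thread-safe delegate so one
// supplier can be shared by the log-read and snapshot-read paths.
public final class SynchronizedBufferSupplier extends BufferSupplier {
    private final BufferSupplier delegate = BufferSupplier.create();

    @Override
    public synchronized ByteBuffer get(int capacity) {
        return delegate.get(capacity); // reuse a cached buffer when one is large enough
    }

    @Override
    public synchronized void release(ByteBuffer buffer) {
        delegate.release(buffer); // return the buffer to the shared cache
    }

    @Override
    public synchronized void close() {
        delegate.close();
    }
}
{code}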
[jira] [Created] (KAFKA-13080) Fetch snapshot requests are not directed to kraft in controller
Jose Armando Garcia Sancio created KAFKA-13080: -- Summary: Fetch snapshot requests are not directed to kraft in controller Key: KAFKA-13080 URL: https://issues.apache.org/jira/browse/KAFKA-13080 Project: Kafka Issue Type: Bug Components: controller, kraft Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 Kraft followers and observers are seeing the following error
{code:java}
[2021-07-13 18:15:47,289] ERROR [RaftManager nodeId=2] Unexpected error UNKNOWN_SERVER_ERROR in FETCH_SNAPSHOT response: InboundResponse(correlationId=29862, data=FetchSnapshotResponseData(throttleTimeMs=0, errorCode=-1, topics=[]), sourceId=3001) (org.apache.kafka.raft.KafkaRaftClient)
{code}
This is because ControllerApis is not directing FetchSnapshot requests to the raft manager. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13078) Closing FileRawSnapshotWriter too early
Jose Armando Garcia Sancio created KAFKA-13078: -- Summary: Closing FileRawSnapshotWriter too early Key: KAFKA-13078 URL: https://issues.apache.org/jira/browse/KAFKA-13078 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.0.0 Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 We are getting the following error
{code:java}
[2021-07-13 17:23:42,174] ERROR [kafka-raft-io-thread]: Error due to (kafka.raft.KafkaRaftManager$RaftIoThread)
java.io.UncheckedIOException: Error calculating snapshot size. temp path = /mnt/kafka/kafka-metadata-logs/@metadata-0/0062-02-3249768281228588378.checkpoint.part, snapshotId = OffsetAndEpoch(offset=62, epoch=2).
    at org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:63)
    at org.apache.kafka.raft.KafkaRaftClient.maybeSendFetchOrFetchSnapshot(KafkaRaftClient.java:2044)
    at org.apache.kafka.raft.KafkaRaftClient.pollFollowerAsObserver(KafkaRaftClient.java:2032)
    at org.apache.kafka.raft.KafkaRaftClient.pollFollower(KafkaRaftClient.java:1995)
    at org.apache.kafka.raft.KafkaRaftClient.pollCurrentState(KafkaRaftClient.java:2104)
    at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2217)
    at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
Caused by: java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
    at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300)
    at org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:60)
    ... 7 more
{code}
This is because the {{FollowerState}} is closing the snapshot writer passed through the argument instead of the one being replaced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13074) Implement maybeClean for MockLog
Jose Armando Garcia Sancio created KAFKA-13074: -- Summary: Implement maybeClean for MockLog Key: KAFKA-13074 URL: https://issues.apache.org/jira/browse/KAFKA-13074 Project: Kafka Issue Type: Bug Reporter: Jose Armando Garcia Sancio The current implementation of MockLog doesn't implement maybeClean. MockLog is expected to have the same semantics as KafkaMetadataLog. This is assumed to be true by a few of the test suites, like the raft simulation and the kafka raft client test context. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13073) Simulation test fails due to inconsistency in MockLog's implementation
Jose Armando Garcia Sancio created KAFKA-13073: -- Summary: Simulation test fails due to inconsistency in MockLog's implementation Key: KAFKA-13073 URL: https://issues.apache.org/jira/browse/KAFKA-13073 Project: Kafka Issue Type: Bug Components: controller, replication Affects Versions: 3.0.0 Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio Fix For: 3.0.0 We are getting the following error on trunk
{code:java}
RaftEventSimulationTest > canRecoverAfterAllNodesKilled STANDARD_OUT
    timestamp = 2021-07-12T16:26:55.663, RaftEventSimulationTest:canRecoverAfterAllNodesKilled =
        java.lang.RuntimeException: Uncaught exception during poll of node 1

                              |---jqwik---
tries = 25                    | # of calls to property
checks = 25                   | # of not rejected calls
generation = RANDOMIZED       | parameters are randomly generated
after-failure = PREVIOUS_SEED | use the previous seed
when-fixed-seed = ALLOW       | fixing the random seed is allowed
edge-cases#mode = MIXIN       | edge cases are mixed in
edge-cases#total = 108        | # of all combined edge cases
edge-cases#tried = 4          | # of edge cases tried in current run
seed = 8079861963960994566    | random seed to reproduce generated values

Sample
------
  arg0: 4002
  arg1: 2
  arg2: 4
{code}
I think there are a couple of issues here: # The {{ListenerContext}} for {{KafkaRaftClient}} uses the value returned by {{ReplicatedLog::startOffset()}} to determine the log start and when to load a snapshot, while the {{MockLog}} implementation uses {{logStartOffset}}, which could be a different value. # {{MockLog}} doesn't implement {{ReplicatedLog::maybeClean}}, so the log start offset is always 0. # The snapshot id validation for {{MockLog}} and {{KafkaMetadataLog}}'s {{createNewSnapshot}} throws an exception when the snapshot id is less than the log start offset. Solutions: To fix the error quoted above we only need to fix bullet point 3, but I think we should fix all of the issues enumerated in this Jira. For 1. we should change the {{MockLog}} implementation so that it uses {{startOffset}} both externally and internally. For 2. I will file another issue to track this implementation. For 3. I think this validation is too strict. I think it is safe to simply ignore any attempt by the state machine to create a snapshot with an id less than the log start offset. We should return an {{Optional.empty()}} when the snapshot id is less than the log start offset; this tells the user that it doesn't need to generate a snapshot for that offset. A sketch of this follows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
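For point 3, the relaxed validation could look roughly like this (a sketch against the {{createNewSnapshot}} shape described above; the {{snapshotId.offset}} accessor and the {{createWriter}} helper are illustrative):
{code:java}
// Sketch of the relaxed validation: a snapshot id below the log start offset
// is not an error, it is simply unnecessary work for the state machine.
public Optional<RawSnapshotWriter> createNewSnapshot(OffsetAndEpoch snapshotId) {
    if (snapshotId.offset < startOffset()) {
        // The log already starts after this offset; nothing to snapshot.
        return Optional.empty();
    }
    // ... remaining validation and snapshot creation as before ...
    return Optional.of(createWriter(snapshotId));  // illustrative helper
}
{code}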
[jira] [Resolved] (KAFKA-12974) Change the default for snapshot generation configuration
[ https://issues.apache.org/jira/browse/KAFKA-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12974. Resolution: Fixed Already fixed. Default set to 20MB. > Change the default for snapshot generation configuration > > > Key: KAFKA-12974 > URL: https://issues.apache.org/jira/browse/KAFKA-12974 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 3.0.0 > Reporter: Jose Armando Garcia Sancio >Priority: Blocker > > In PR https://github.com/apache/kafka/pull/10812 the default for the > {{metadata.log.snapshot.min.new_record.bytes}} is set to {{Int.MaxValue}}. > This was done to disable snapshot generation by default since snapshot > loading is not implemented on the broker. > This value should be changed to something much smaller. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12863) Configure controller snapshot generation
[ https://issues.apache.org/jira/browse/KAFKA-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12863. Fix Version/s: 3.0.0 Resolution: Fixed > Configure controller snapshot generation > > > Key: KAFKA-12863 > URL: https://issues.apache.org/jira/browse/KAFKA-12863 > Project: Kafka > Issue Type: Sub-task > Components: controller > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > Labels: kip-500 > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12952) Metadata Snapshot File Delimiters
[ https://issues.apache.org/jira/browse/KAFKA-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12952. Fix Version/s: 3.0.0 Resolution: Fixed > Metadata Snapshot File Delimiters > - > > Key: KAFKA-12952 > URL: https://issues.apache.org/jira/browse/KAFKA-12952 > Project: Kafka > Issue Type: Sub-task > Components: controller, kraft >Reporter: Niket Goel >Assignee: Niket Goel >Priority: Minor > Labels: kip-500 > Fix For: 3.0.0 > > > Create new Control Records that will serve as the header and footer for a > Metadata Snapshot File. These records will be contained at the beginning and > end of each Snapshot File, and can be checked to verify completeness of a > snapshot file. > The following fields are proposed for the Header: > # *Version :* Schema version for the snapshot header > # *Last Contained Log Time* : The append time of the highest record > contained in this snapshot > # *End Offset* : End offset of the snapshot from the snapshot ID > # *Epoch :* Epoch of the snapshot from the Snapshot ID > # *Creator ID* : (Optional) ID of the broker/Controller that created the > snapshot > # *Cluster ID :* (Optional) ID of the cluster that created the snapshot > # *Create Time :* Timestamp of the snapshot creation (might not be needed as > each record batch has a timestamp already) > The following fields are proposed for the footer: > # *Version* : Schema version of the snapshot footer (same as header) > # *Record Type* : A type field indicating this is the end record for the > snapshot file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13020) SnapshotReader should decode and report the append time in the header
Jose Armando Garcia Sancio created KAFKA-13020: -- Summary: SnapshotReader should decode and report the append time in the header Key: KAFKA-13020 URL: https://issues.apache.org/jira/browse/KAFKA-13020 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-13006) Remove the method RaftClient.leaderAndEpoch
Jose Armando Garcia Sancio created KAFKA-13006: -- Summary: Remove the method RaftClient.leaderAndEpoch Key: KAFKA-13006 URL: https://issues.apache.org/jira/browse/KAFKA-13006 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio There are semantic differences between {{RaftClient.leaderAndEpoch}} and {{RaftClient.Listener.handleLeaderChange}}, especially when the raft client transitions from follower to leader. To simplify the API, I think that we should remove the method {{RaftClient.leaderAndEpoch}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12992) Make kraft configuration properties public
Jose Armando Garcia Sancio created KAFKA-12992: -- Summary: Make kraft configuration properties public Key: KAFKA-12992 URL: https://issues.apache.org/jira/browse/KAFKA-12992 Project: Kafka Issue Type: Sub-task Components: core Reporter: Jose Armando Garcia Sancio Fix For: 3.0.0 All of the Kraft configurations should be made public:
{code:java}
/*
 * KRaft mode configs. Note that these configs are defined as internal. We will make them public in the 3.0.0 release.
 */
.defineInternal(ProcessRolesProp, LIST, Collections.emptyList(), ValidList.in("broker", "controller"), HIGH, ProcessRolesDoc)
.defineInternal(NodeIdProp, INT, Defaults.EmptyNodeId, null, HIGH, NodeIdDoc)
.defineInternal(InitialBrokerRegistrationTimeoutMsProp, INT, Defaults.InitialBrokerRegistrationTimeoutMs, null, MEDIUM, InitialBrokerRegistrationTimeoutMsDoc)
.defineInternal(BrokerHeartbeatIntervalMsProp, INT, Defaults.BrokerHeartbeatIntervalMs, null, MEDIUM, BrokerHeartbeatIntervalMsDoc)
.defineInternal(BrokerSessionTimeoutMsProp, INT, Defaults.BrokerSessionTimeoutMs, null, MEDIUM, BrokerSessionTimeoutMsDoc)
.defineInternal(MetadataLogDirProp, STRING, null, null, HIGH, MetadataLogDirDoc)
.defineInternal(ControllerListenerNamesProp, STRING, null, null, HIGH, ControllerListenerNamesDoc)
.defineInternal(SaslMechanismControllerProtocolProp, STRING, SaslConfigs.DEFAULT_SASL_MECHANISM, null, HIGH, SaslMechanismControllerProtocolDoc)
{code}
https://github.com/apache/kafka/blob/2beaf9a720330615bc5474ec079f8b4b105eff91/core/src/main/scala/kafka/server/KafkaConfig.scala#L1043-L1053 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12982) Notify listeners of raft client shutdowns
Jose Armando Garcia Sancio created KAFKA-12982: -- Summary: Notify listeners of raft client shutdowns Key: KAFKA-12982 URL: https://issues.apache.org/jira/browse/KAFKA-12982 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio `RaftClient.Listener.beginShutdown` should be called when the `RaftClient` is shutting down. I think there should be two ways to terminate the `RaftClient`: `shutdown` and `close`. It looks like the current code for `close` only closes the metrics registry. It doesn't notify the listeners that the raft client was closed, and it doesn't stop future `poll` calls from updating the raft client. There is also an assumption that `shutdown` can only be called once. I think to satisfy this we should remove this method from `RaftClient` and keep it as an implementation method in `KafkaRaftClient`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
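A hedged sketch of what a listener-aware `close` could look like (field names such as `closed` and `listenerContexts` are illustrative, not the actual KafkaRaftClient members):
{code:java}
// Illustrative sketch: make close() idempotent, notify every listener, and
// only then release resources, so a closed client can no longer make progress.
@Override
public void close() {
    if (!closed.compareAndSet(false, true)) {
        return;  // already closed; close() is safe to call more than once
    }
    for (ListenerContext context : listenerContexts) {
        context.listener().beginShutdown();  // tell each state machine we are done
    }
    kafkaRaftMetrics.close();
}

// poll() would then guard against further progress:
public void poll() {
    if (closed.get()) {
        return;  // a closed client never updates its state again
    }
    // ... normal poll logic ...
}
{code}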
[jira] [Created] (KAFKA-12974) Change the default for snapshot generation configuration
Jose Armando Garcia Sancio created KAFKA-12974: -- Summary: Change the default for snapshot generation configuration Key: KAFKA-12974 URL: https://issues.apache.org/jira/browse/KAFKA-12974 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio In PR https://github.com/apache/kafka/pull/10812 the default for the {{metadata.log.snapshot.min.new_record.bytes}} is set to {{Int.MaxValue}}. This was done to disable snapshot generation by default since snapshot loading is not implemented on the broker. This value should be changed to something much smaller. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12973) Update KIP and dev mailing list
Jose Armando Garcia Sancio created KAFKA-12973: -- Summary: Update KIP and dev mailing list Key: KAFKA-12973 URL: https://issues.apache.org/jira/browse/KAFKA-12973 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Update KIP-630 and the Kafka mailing list based on the small implementation deviations from what is documented in the KIP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12968) Add integration tests for "test-kraft-server-start"
Jose Armando Garcia Sancio created KAFKA-12968: -- Summary: Add integration tests for "test-kraft-server-start" Key: KAFKA-12968 URL: https://issues.apache.org/jira/browse/KAFKA-12968 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12958) Add simulation invariant for leadership and snapshot
Jose Armando Garcia Sancio created KAFKA-12958: -- Summary: Add simulation invariant for leadership and snapshot Key: KAFKA-12958 URL: https://issues.apache.org/jira/browse/KAFKA-12958 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio During the simulation we should add an invariant that notified leaders are never asked to load snapshots. The state machine always sees the following sequence of callback calls. Leaders see: ..., handleLeaderChange (the state machine is notified of leadership); handleSnapshot is never called. Non-leaders see: ..., handleLeaderChange (the state machine is notified that it is not leader); handleSnapshot is called 0 or more times. -- This message was sent by Atlassian Jira (v8.3.4#803005)
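One way the simulation could encode that invariant is a wrapper around the listener under test (a hedged sketch; the actual RaftEventSimulationTest invariants are structured differently, and the listener method shapes assumed here follow the 3.x-era API):
{code:java}
// Illustrative invariant: once handleLeaderChange reports this node as
// leader, handleSnapshot must not fire until leadership is lost again.
final class SnapshotInvariantListener<T> implements RaftClient.Listener<T> {
    private final RaftClient.Listener<T> delegate;
    private final int nodeId;
    private boolean isLeader = false;

    SnapshotInvariantListener(RaftClient.Listener<T> delegate, int nodeId) {
        this.delegate = delegate;
        this.nodeId = nodeId;
    }

    @Override
    public void handleLeaderChange(LeaderAndEpoch leaderAndEpoch) {
        isLeader = leaderAndEpoch.isLeader(nodeId);
        delegate.handleLeaderChange(leaderAndEpoch);
    }

    @Override
    public void handleSnapshot(SnapshotReader<T> reader) {
        if (isLeader) {
            throw new AssertionError("Invariant violated: a notified leader was asked to load a snapshot");
        }
        delegate.handleSnapshot(reader);
    }

    @Override
    public void handleCommit(BatchReader<T> reader) {
        delegate.handleCommit(reader);
    }
}
{code}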
[jira] [Created] (KAFKA-12932) Interfaces for SnapshotReader and SnapshotWriter
Jose Armando Garcia Sancio created KAFKA-12932: -- Summary: Interfaces for SnapshotReader and SnapshotWriter Key: KAFKA-12932 URL: https://issues.apache.org/jira/browse/KAFKA-12932 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Change the snapshot API so that SnapshotWriter and SnapshotReader are interfaces. Change the existing types SnapshotWriter and SnapshotReader to use a different name and to implement the interfaces introduced by this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12908) Load snapshot heuristic
Jose Armando Garcia Sancio created KAFKA-12908: -- Summary: Load snapshot heuristic Key: KAFKA-12908 URL: https://issues.apache.org/jira/browse/KAFKA-12908 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio The {{KafkaRaftClient}} implementation forces the {{RaftClient.Listener}} to load a snapshot only when the listener's next offset is less than the start offset. This is technically correct, but in some cases it may be more efficient to load a snapshot even when the next offset exists in the log. This is clearly true when the latest snapshot has fewer entries than the number of records between the next offset and the latest snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
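Roughly, the proposed heuristic could be sketched as follows (all names are illustrative; in particular {{snapshotEntryCount}} is a hypothetical helper, since snapshots don't currently expose an entry count):
{code:java}
// Sketch of the heuristic: load the latest snapshot not only when required
// (next offset below the log start), but also when the snapshot is a shorter
// path to the same state than replaying the log suffix.
private boolean shouldLoadSnapshot(long nextOffset) {
    OffsetAndEpoch latest = latestSnapshotId();          // illustrative accessor
    if (nextOffset < log.startOffset()) {
        return true;  // required: the prefix of the log was truncated away
    }
    long recordsToReplay = latest.offset - nextOffset;
    // Optional optimization: fewer entries to apply via the snapshot.
    return snapshotEntryCount(latest) < recordsToReplay; // hypothetical helper
}
{code}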
[jira] [Created] (KAFKA-12873) Log truncation due to divergence should also remove snapshots
Jose Armando Garcia Sancio created KAFKA-12873: -- Summary: Log truncation due to divergence should also remove snapshots Key: KAFKA-12873 URL: https://issues.apache.org/jira/browse/KAFKA-12873 Project: Kafka Issue Type: Sub-task Components: log Reporter: Jose Armando Garcia Sancio It should not be possible for log truncation to truncate past the high-watermark, and we know that snapshots are less than the high-watermark. Having said that, I think we should add code that removes any snapshot that is greater than the log end offset after a log truncation. Currently the code that does log truncation is in `KafkaMetadataLog::truncateTo`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
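The defensive check could be sketched as below (written as Java pseudocode although the real `KafkaMetadataLog::truncateTo` is Scala; `snapshotIds()` and `deleteSnapshot()` are illustrative helpers):
{code:java}
// Sketch: after truncating the log, drop any snapshot whose end offset now
// lies beyond the log end offset, so no snapshot can outrun the log.
public void truncateTo(long offset) {
    log.truncateTo(offset);
    for (OffsetAndEpoch snapshotId : snapshotIds()) {   // illustrative accessor
        if (snapshotId.offset > endOffset().offset) {
            deleteSnapshot(snapshotId);                 // illustrative helper
        }
    }
}
{code}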
[jira] [Created] (KAFKA-12863) Configure controller snapshot generation
Jose Armando Garcia Sancio created KAFKA-12863: -- Summary: Configure controller snapshot generation Key: KAFKA-12863 URL: https://issues.apache.org/jira/browse/KAFKA-12863 Project: Kafka Issue Type: Sub-task Components: controller Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12837) Process entire batch in broker metadata listener
Jose Armando Garcia Sancio created KAFKA-12837: -- Summary: Process entire batch in broker metadata listener Key: KAFKA-12837 URL: https://issues.apache.org/jira/browse/KAFKA-12837 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio The current BrokerMetadataListener processes one batch at a time even though it is possible for the BatchReader to contain more than one batch. This is functionally correct, but it would require less coordination between the RaftIOThread and the broker metadata listener thread if the broker were changed to process all of the batches included in the BatchReader sent through handleCommit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
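A sketch of the proposed change, assuming the `BatchReader` iterator shape and an illustrative `replay` helper (this is not the actual BrokerMetadataListener code):
{code:java}
// Illustrative sketch: drain every batch from the BatchReader inside a single
// handleCommit call, instead of handing one batch at a time between threads.
@Override
public void handleCommit(BatchReader<ApiMessageAndVersion> reader) {
    try {
        while (reader.hasNext()) {
            Batch<ApiMessageAndVersion> batch = reader.next();
            for (ApiMessageAndVersion message : batch.records()) {
                replay(message.message(), batch.lastOffset()); // hypothetical helper
            }
        }
    } finally {
        reader.close(); // release the reader's buffers back to the pool
    }
}
{code}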
[jira] [Resolved] (KAFKA-12342) Get rid of raft/meta log shim layer
[ https://issues.apache.org/jira/browse/KAFKA-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12342. Resolution: Fixed > Get rid of raft/meta log shim layer > --- > > Key: KAFKA-12342 > URL: https://issues.apache.org/jira/browse/KAFKA-12342 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Labels: kip-500 > > We currently use a shim to bridge the interface differences between > `RaftClient` and `MetaLogManager`. We need to converge the two interfaces and > get rid of the shim. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12543) Re-design the ownership model for snapshots
[ https://issues.apache.org/jira/browse/KAFKA-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Armando Garcia Sancio resolved KAFKA-12543. Resolution: Fixed > Re-design the ownership model for snapshots > --- > > Key: KAFKA-12543 > URL: https://issues.apache.org/jira/browse/KAFKA-12543 > Project: Kafka > Issue Type: Sub-task > Reporter: Jose Armando Garcia Sancio > Assignee: Jose Armando Garcia Sancio >Priority: Major > > With the current implementation, {{RawSnapshotReader}}s are created and closed > by the {{KafkaRaftClient}} as needed to satisfy {{FetchSnapshot}} requests. > This means that {{FileRawSnapshotReader}}s are closed before the > network client has had a chance to send the bytes over the network. > One way to fix this is to make the {{KafkaMetadataLog}} the owner of the > {{FileRawSnapshotReader}}. Once a {{FileRawSnapshotReader}} is created it > will stay open until the snapshot is deleted by > {{ReplicatedLog::deleteBeforeSnapshot}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12787) Configure and integrate controller snapshot with the RaftClient
Jose Armando Garcia Sancio created KAFKA-12787: -- Summary: Configure and integrate controller snapshot with the RaftClient Key: KAFKA-12787 URL: https://issues.apache.org/jira/browse/KAFKA-12787 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Assignee: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12773) Use UncheckedIOException when wrapping IOException
Jose Armando Garcia Sancio created KAFKA-12773: -- Summary: Use UncheckedIOException when wrapping IOException Key: KAFKA-12773 URL: https://issues.apache.org/jira/browse/KAFKA-12773 Project: Kafka Issue Type: Sub-task Reporter: Jose Armando Garcia Sancio Use UncheckedIOException when wrapping IOException instead of RuntimeException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12668) MockScheduler is not safe to use in concurrent code.
Jose Armando Garcia Sancio created KAFKA-12668: -- Summary: MockScheduler is not safe to use in concurrent code. Key: KAFKA-12668 URL: https://issues.apache.org/jira/browse/KAFKA-12668 Project: Kafka Issue Type: Improvement Components: unit tests Reporter: Jose Armando Garcia Sancio The current implementation of MockScheduler executes tasks on the same stack when schedule is called. This violates Log's assumption, since Log calls schedule while holding a lock, and can cause deadlocks in tests. One solution is to change MockScheduler's schedule method so that tick is not called; tick should instead be called by a stack (thread) that doesn't hold any locks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
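A self-contained sketch of that fix (all names are illustrative, not the actual MockScheduler API): schedule() only enqueues, tick() is driven by a test thread that holds no locks, and tasks run outside the scheduler's own lock.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MockScheduler {
    static final class MockTask {
        final long deadlineMs;
        final Runnable body;
        MockTask(long deadlineMs, Runnable body) {
            this.deadlineMs = deadlineMs;
            this.body = body;
        }
    }

    private final PriorityQueue<MockTask> tasks =
        new PriorityQueue<>(Comparator.comparingLong(t -> t.deadlineMs));

    public synchronized void schedule(MockTask task) {
        tasks.add(task); // never calls tick(): no task runs on the caller's stack
    }

    /** Driven by the test, from a stack that holds no other locks. */
    public void tick(long nowMs) {
        List<MockTask> due = new ArrayList<>();
        synchronized (this) {
            while (!tasks.isEmpty() && tasks.peek().deadlineMs <= nowMs) {
                due.add(tasks.poll());
            }
        }
        // Execute outside the scheduler lock so task bodies may take other locks.
        due.forEach(t -> t.body.run());
    }
}
{code}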
[jira] [Created] (KAFKA-12646) Implement loading snapshot in the controller
Jose Armando Garcia Sancio created KAFKA-12646: -- Summary: Implement loading snapshot in the controller Key: KAFKA-12646 URL: https://issues.apache.org/jira/browse/KAFKA-12646 Project: Kafka Issue Type: Sub-task Components: controller Reporter: Jose Armando Garcia Sancio -- This message was sent by Atlassian Jira (v8.3.4#803005)