[jira] [Created] (KAFKA-14494) Kafka Java client can't send data when behind SOCKS proxy - while native client can
Oleg Zhovtanyuk created KAFKA-14494: --- Summary: Kafka Java client can't send data when behind SOCKS proxy - while native client can Key: KAFKA-14494 URL: https://issues.apache.org/jira/browse/KAFKA-14494 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 3.3.1 Reporter: Oleg Zhovtanyuk When Kafka Java client sits behind the SOCK5 proxy, it can connect to the cluster, get the list of brokers, but enters the infinite loop trying to detect the least loaded broker. To the contrary, NodeJS client (a wrapper for librdkafka) perform the same steps, but proceeds further to the binary data exchange. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] ijuma commented on pull request #12890: KAFKA-14414: Remove unnecessary usage of ObjectSerializationCache
ijuma commented on PR #12890: URL: https://github.com/apache/kafka/pull/12890#issuecomment-1352675158 More details on identity hash code here: https://shipilev.net/jvm/anatomy-quarks/26-identity-hash-code/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12890: KAFKA-14414: Remove unnecessary usage of ObjectSerializationCache
ijuma commented on PR #12890: URL: https://github.com/apache/kafka/pull/12890#issuecomment-1352674511 Out of curiosity, what profiler was used to compute this? Also, did we see an actual improvement (eg was cpu usage or latency lower after the change)? I ask because profilers are known to have safepoint bias and can incorrectly attribute the cost when it comes to certain method types. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-4852) ByteBufferSerializer not compatible with offsets
[ https://issues.apache.org/jira/browse/KAFKA-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647871#comment-17647871 ] Werner Daehn commented on KAFKA-4852: - Agree, it is resolved. > ByteBufferSerializer not compatible with offsets > > > Key: KAFKA-4852 > URL: https://issues.apache.org/jira/browse/KAFKA-4852 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.1 > Environment: all >Reporter: Werner Daehn >Assignee: LinShunkang >Priority: Minor > Fix For: 3.4.0 > > > Quick intro: A ByteBuffer.rewind() resets the position to zero. What if the > ByteBuffer was created with an offset? new ByteBuffer(data, 3, 10)? The > ByteBufferSerializer will send from pos=0 and not from pos=3 onwards. > Solution: No rewind() but flip() for reading a ByteBuffer. That's what the > flip is meant for. > Story: > Imagine the incoming data comes from a byte[], e.g. a network stream > containing topicname, partition, key, value, ... and you want to create a new > ProducerRecord for that. As the constructor of ProducerRecord requires > (topic, partition, key, value) you have to copy from above byte[] the key and > value. That means there is a memcopy taking place. Since the payload can be > potentially large, that introduces a lot of overhead. Twice the memory. > A nice solution to this problem is to simply wrap the network byte[] into new > ByteBuffers: > ByteBuffer key = ByteBuffer.wrap(data, keystart, keylength); > ByteBuffer value = ByteBuffer.wrap(data, valuestart, valuelength); > and then use the ByteBufferSerializer instead of the ByteArraySerializer. > But that does not work as the ByteBufferSerializer does a rewind(), hence > both, key and value, will start at position=0 of the data[]. > public class ByteBufferSerializer implements Serializer { > public byte[] serialize(String topic, ByteBuffer data) { > data.rewind(); -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] satishd commented on a diff in pull request #11390: [KAFKA-13369] Follower fetch protocol changes for tiered storage.
satishd commented on code in PR #11390: URL: https://github.com/apache/kafka/pull/11390#discussion_r1049205016 ## core/src/main/scala/kafka/server/ReplicaFetcherThread.scala: ## @@ -192,4 +204,140 @@ class ReplicaFetcherThread(name: String, partition.truncateFullyAndStartAt(offset, isFuture = false) } + def buildProducerSnapshotFile(snapshotFile: File, remoteLogSegmentMetadata: RemoteLogSegmentMetadata, rlm: RemoteLogManager): Unit = { +val tmpSnapshotFile = new File(snapshotFile.getAbsolutePath + ".tmp") +// Copy it to snapshot file in atomic manner. +Files.copy(rlm.storageManager().fetchIndex(remoteLogSegmentMetadata, RemoteStorageManager.IndexType.PRODUCER_SNAPSHOT), + tmpSnapshotFile.toPath, StandardCopyOption.REPLACE_EXISTING) +Utils.atomicMoveWithFallback(tmpSnapshotFile.toPath, snapshotFile.toPath, false) + } + + /** + * It tries to build the required state for this partition from leader and remote storage so that it can start + * fetching records from the leader. + */ + override protected def buildRemoteLogAuxState(partition: TopicPartition, +currentLeaderEpoch: Int, +leaderLocalLogStartOffset: Long, + epochForLeaderLocalLogStartOffset: Int, +leaderLogStartOffset: Long): Long = { + +def fetchEarlierEpochEndOffset(epoch: Int): EpochEndOffset = { + val previousEpoch = epoch - 1 + // Find the end-offset for the epoch earlier to the given epoch from the leader + val partitionsWithEpochs = Map(partition -> new EpochData().setPartition(partition.partition()) +.setCurrentLeaderEpoch(currentLeaderEpoch) +.setLeaderEpoch(previousEpoch)) + val maybeEpochEndOffset = leader.fetchEpochEndOffsets(partitionsWithEpochs).get(partition) + if (maybeEpochEndOffset.isEmpty) { +throw new KafkaException("No response received for partition: " + partition); + } + + val epochEndOffset = maybeEpochEndOffset.get + if (epochEndOffset.errorCode() != Errors.NONE.code()) { +throw Errors.forCode(epochEndOffset.errorCode()).exception() + } + + epochEndOffset +} + +val log = replicaMgr.localLogOrException(partition) +val nextOffset = { + if (log.remoteStorageSystemEnable && log.config.remoteLogConfig.remoteStorageEnable) { Review Comment: As discussed offline, the current follower fetch retries when it receives an error while building aurxiliary state. It will eventually gets the auxiliary data from remote storage for the available leader-log-start-offset. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-14465) java.lang.NumberFormatException: For input string: "index"
[ https://issues.apache.org/jira/browse/KAFKA-14465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianbin.chen updated KAFKA-14465: - Attachment: image-2022-12-15-12-26-24-718.png > java.lang.NumberFormatException: For input string: "index" > --- > > Key: KAFKA-14465 > URL: https://issues.apache.org/jira/browse/KAFKA-14465 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: jianbin.chen >Priority: Major > Attachments: image-2022-12-15-12-26-24-718.png > > > {code:java} > [2022-12-13 07:12:20,369] WARN [Log partition=fp_sg_flow_copy-1, > dir=/home/admin/output/kafka-logs] Found a corrupted index file corresponding > to log file /home/admin/output/kafk > a-logs/fp_sg_flow_copy-1/0165.log due to Corrupt index found, > index file > (/home/admin/output/kafka-logs/fp_sg_flow_copy-1/0165.index) > has non-zero > size but the last offset is 165 which is no greater than the base offset > 165.}, recovering segment and rebuilding index files... (kafka.log.Log) > [2022-12-13 07:12:20,369] ERROR There was an error in one of the threads > during logs loading: java.lang.NumberFormatException: For input string: > "index" (kafka.log.LogManager) > [2022-12-13 07:12:20,374] INFO [ProducerStateManager > partition=fp_sg_flow_copy-1] Writing producer snapshot at offset 165 > (kafka.log.ProducerStateManager) > [2022-12-13 07:12:20,378] INFO [Log partition=fp_sg_flow_copy-1, > dir=/home/admin/output/kafka-logs] Loading producer state from offset 165 > with message format version 2 (kafka.lo > g.Log) > [2022-12-13 07:12:20,381] INFO [Log partition=fp_sg_flow_copy-1, > dir=/home/admin/output/kafka-logs] Completed load of log with 1 segments, log > start offset 165 and log end offset > 165 in 13 ms (kafka.log.Log) > [2022-12-13 07:12:20,389] ERROR [KafkaServer id=1] Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > java.lang.NumberFormatException: For input string: "index" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:589) > at java.lang.Long.parseLong(Long.java:631) > at scala.collection.immutable.StringLike.toLong(StringLike.scala:306) > at scala.collection.immutable.StringLike.toLong$(StringLike.scala:306) > at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) > at kafka.log.Log$.offsetFromFile(Log.scala:1846) > at kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:331) > at > scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788) > at kafka.log.Log.loadSegmentFiles(Log.scala:320) > at kafka.log.Log.loadSegments(Log.scala:403) > at kafka.log.Log.(Log.scala:216) > at kafka.log.Log$.apply(Log.scala:1748) > at kafka.log.LogManager.loadLog(LogManager.scala:265) > at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:335) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > [2022-12-13 07:12:20,401] INFO [KafkaServer id=1] shutting down > (kafka.server.KafkaServer) > {code} > When I restart the broker, it becomes like this, I deleted the > 000165.index file, after starting it, there are still other > files with the same error, please tell me how to fix it and what it is causing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14465) java.lang.NumberFormatException: For input string: "index"
[ https://issues.apache.org/jira/browse/KAFKA-14465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianbin.chen updated KAFKA-14465: - Description: {code:java} [2022-12-13 07:12:20,369] WARN [Log partition=fp_sg_flow_copy-1, dir=/home/admin/output/kafka-logs] Found a corrupted index file corresponding to log file /home/admin/output/kafk a-logs/fp_sg_flow_copy-1/0165.log due to Corrupt index found, index file (/home/admin/output/kafka-logs/fp_sg_flow_copy-1/0165.index) has non-zero size but the last offset is 165 which is no greater than the base offset 165.}, recovering segment and rebuilding index files... (kafka.log.Log) [2022-12-13 07:12:20,369] ERROR There was an error in one of the threads during logs loading: java.lang.NumberFormatException: For input string: "index" (kafka.log.LogManager) [2022-12-13 07:12:20,374] INFO [ProducerStateManager partition=fp_sg_flow_copy-1] Writing producer snapshot at offset 165 (kafka.log.ProducerStateManager) [2022-12-13 07:12:20,378] INFO [Log partition=fp_sg_flow_copy-1, dir=/home/admin/output/kafka-logs] Loading producer state from offset 165 with message format version 2 (kafka.lo g.Log) [2022-12-13 07:12:20,381] INFO [Log partition=fp_sg_flow_copy-1, dir=/home/admin/output/kafka-logs] Completed load of log with 1 segments, log start offset 165 and log end offset 165 in 13 ms (kafka.log.Log) [2022-12-13 07:12:20,389] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.lang.NumberFormatException: For input string: "index" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike.toLong(StringLike.scala:306) at scala.collection.immutable.StringLike.toLong$(StringLike.scala:306) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at kafka.log.Log$.offsetFromFile(Log.scala:1846) at kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:331) at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788) at kafka.log.Log.loadSegmentFiles(Log.scala:320) at kafka.log.Log.loadSegments(Log.scala:403) at kafka.log.Log.(Log.scala:216) at kafka.log.Log$.apply(Log.scala:1748) at kafka.log.LogManager.loadLog(LogManager.scala:265) at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:335) at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) [2022-12-13 07:12:20,401] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer) {code} When I restart the broker, it becomes like this, I deleted the 000165.index file, after starting it, there are still other files with the same error, please tell me how to fix it and what it is causing I tried to open the file 0165.index but it failed !image-2022-12-15-12-26-24-718.png! was: {code:java} [2022-12-13 07:12:20,369] WARN [Log partition=fp_sg_flow_copy-1, dir=/home/admin/output/kafka-logs] Found a corrupted index file corresponding to log file /home/admin/output/kafk a-logs/fp_sg_flow_copy-1/0165.log due to Corrupt index found, index file (/home/admin/output/kafka-logs/fp_sg_flow_copy-1/0165.index) has non-zero size but the last offset is 165 which is no greater than the base offset 165.}, recovering segment and rebuilding index files... (kafka.log.Log) [2022-12-13 07:12:20,369] ERROR There was an error in one of the threads during logs loading: java.lang.NumberFormatException: For input string: "index" (kafka.log.LogManager) [2022-12-13 07:12:20,374] INFO [ProducerStateManager partition=fp_sg_flow_copy-1] Writing producer snapshot at offset 165 (kafka.log.ProducerStateManager) [2022-12-13 07:12:20,378] INFO [Log partition=fp_sg_flow_copy-1, dir=/home/admin/output/kafka-logs] Loading producer state from offset 165 with message format version 2 (kafka.lo g.Log) [2022-12-13 07:12:20,381] INFO [Log partition=fp_sg_flow_c
[jira] [Updated] (KAFKA-14077) KRaft should support recovery from failed disk
[ https://issues.apache.org/jira/browse/KAFKA-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14077: --- Fix Version/s: 3.5.0 (was: 3.4.0) > KRaft should support recovery from failed disk > -- > > Key: KAFKA-14077 > URL: https://issues.apache.org/jira/browse/KAFKA-14077 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Fix For: 3.5.0 > > > If one of the nodes in the metadata quorum has a disk failure, there is no > way currently to safely bring the node back into the quorum. When we lose > disk state, we are at risk of losing committed data even if the failure only > affects a minority of the cluster. > Here is an example. Suppose that a metadata quorum has 3 members: v1, v2, and > v3. Initially, v1 is the leader and writes a record at offset 1. After v2 > acknowledges replication of the record, it becomes committed. Suppose that v1 > fails before v3 has a chance to replicate this record. As long as v1 remains > down, the raft protocol guarantees that only v2 can become leader, so the > record cannot be lost. The raft protocol expects that when v1 returns, it > will still have that record, but what if there is a disk failure, the state > cannot be recovered and v1 participates in leader election? Then we would > have committed data on a minority of the voters. The main problem here > concerns how we recover from this impaired state without risking the loss of > this data. > Consider a naive solution which brings v1 back with an empty disk. Since the > node has lost is prior knowledge of the state of the quorum, it will vote for > any candidate that comes along. If v3 becomes a candidate, then it will vote > for itself and it just needs the vote from v1 to become leader. If that > happens, then the committed data on v2 will become lost. > This is just one scenario. In general, the invariants that the raft protocol > is designed to preserve go out the window when disk state is lost. For > example, it is also possible to contrive a scenario where the loss of disk > state leads to multiple leaders. There is a good reason why raft requires > that any vote cast by a voter is written to disk since otherwise the voter > may vote for different candidates in the same epoch. > Many systems solve this problem with a unique identifier which is generated > automatically and stored on disk. This identifier is then committed to the > raft log. If a disk changes, we would see a new identifier and we can prevent > the node from breaking raft invariants. Then recovery from a failed disk > requires a quorum reconfiguration. We need something like this in KRaft to > make disk recovery possible. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13602) Allow to broadcast a result record
[ https://issues.apache.org/jira/browse/KAFKA-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13602: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Allow to broadcast a result record > -- > > Key: KAFKA-13602 > URL: https://issues.apache.org/jira/browse/KAFKA-13602 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Assignee: Sagar Rao >Priority: Major > Labels: needs-kip, newbie++ > Fix For: 3.5.0 > > > From time to time, users ask how they can send a record to more than one > partition in a sink topic. Currently, this is only possible by replicate the > message N times before the sink and use a custom partitioner to write the N > messages into the N different partitions. > It might be worth to make this easier and add a new feature for it. There are > multiple options: > * extend `to()` / `addSink()` with a "broadcast" option/config > * add `toAllPartitions()` / `addBroadcastSink()` methods > * allow StreamPartitioner to return `-1` for "all partitions" > * extend `StreamPartitioner` to allow returning more than one partition (ie > a list/collection of integers instead of a single int) > The first three options imply that a "full broadcast" is supported only, so > it's less flexible. On the other hand, it's easier to use (especially the > first two options are easy as they do not require to implement a custom > partitioner). > The last option would be most flexible and also allow for a "partial > broadcast" (aka multi-cast) pattern. It might also be possible to combine two > options, or maye even a totally different idea. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13602) Allow to broadcast a result record
[ https://issues.apache.org/jira/browse/KAFKA-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13602: --- Labels: kip newbie++ (was: needs-kip newbie++) > Allow to broadcast a result record > -- > > Key: KAFKA-13602 > URL: https://issues.apache.org/jira/browse/KAFKA-13602 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Assignee: Sagar Rao >Priority: Major > Labels: kip, newbie++ > Fix For: 3.5.0 > > > From time to time, users ask how they can send a record to more than one > partition in a sink topic. Currently, this is only possible by replicate the > message N times before the sink and use a custom partitioner to write the N > messages into the N different partitions. > It might be worth to make this easier and add a new feature for it. There are > multiple options: > * extend `to()` / `addSink()` with a "broadcast" option/config > * add `toAllPartitions()` / `addBroadcastSink()` methods > * allow StreamPartitioner to return `-1` for "all partitions" > * extend `StreamPartitioner` to allow returning more than one partition (ie > a list/collection of integers instead of a single int) > The first three options imply that a "full broadcast" is supported only, so > it's less flexible. On the other hand, it's easier to use (especially the > first two options are easy as they do not require to implement a custom > partitioner). > The last option would be most flexible and also allow for a "partial > broadcast" (aka multi-cast) pattern. It might also be possible to combine two > options, or maye even a totally different idea. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-13602) Allow to broadcast a result record
[ https://issues.apache.org/jira/browse/KAFKA-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman reopened KAFKA-13602: Reverting in 3.4 due to logging-related perf/security issue, and some open semantic questions. Retargeting for 3.5 > Allow to broadcast a result record > -- > > Key: KAFKA-13602 > URL: https://issues.apache.org/jira/browse/KAFKA-13602 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Assignee: Sagar Rao >Priority: Major > Labels: needs-kip, newbie++ > Fix For: 3.4.0 > > > From time to time, users ask how they can send a record to more than one > partition in a sink topic. Currently, this is only possible by replicate the > message N times before the sink and use a custom partitioner to write the N > messages into the N different partitions. > It might be worth to make this easier and add a new feature for it. There are > multiple options: > * extend `to()` / `addSink()` with a "broadcast" option/config > * add `toAllPartitions()` / `addBroadcastSink()` methods > * allow StreamPartitioner to return `-1` for "all partitions" > * extend `StreamPartitioner` to allow returning more than one partition (ie > a list/collection of integers instead of a single int) > The first three options imply that a "full broadcast" is supported only, so > it's less flexible. On the other hand, it's easier to use (especially the > first two options are easy as they do not require to implement a custom > partitioner). > The last option would be most flexible and also allow for a "partial > broadcast" (aka multi-cast) pattern. It might also be possible to combine two > options, or maye even a totally different idea. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-12319) Flaky test ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()
[ https://issues.apache.org/jira/browse/KAFKA-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647830#comment-17647830 ] A. Sophie Blee-Goldman commented on KAFKA-12319: bumping this to 3.5.0 as we are past code freeze for 3.4 > Flaky test > ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() > - > > Key: KAFKA-12319 > URL: https://issues.apache.org/jira/browse/KAFKA-12319 > Project: Kafka > Issue Type: Test >Reporter: Justine Olshan >Assignee: Divij Vaidya >Priority: Major > Labels: flaky-test > Fix For: 3.5.0 > > > I've seen this test fail a few times locally. But recently I saw it fail on a > PR build on Jenkins. > > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10041/7/testReport/junit/kafka.network/ConnectionQuotasTest/Build___JDK_11___testListenerConnectionRateLimitWhenActualRateAboveLimit__/] > h3. Error Message > java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: > Expected rate (30 +- 7), but got 37.436825357209706 (600 connections / 16.027 > sec) ==> expected: <30.0> but was: <37.436825357209706> > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-8280) Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable
[ https://issues.apache.org/jira/browse/KAFKA-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-8280: -- Fix Version/s: 3.5.0 (was: 3.4.0) > Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable > --- > > Key: KAFKA-8280 > URL: https://issues.apache.org/jira/browse/KAFKA-8280 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: John Roesler >Priority: Blocker > Labels: flakey > Fix For: 3.5.0 > > > I saw this fail again on > https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3979/testReport/junit/kafka.server/DynamicBrokerReconfigurationTest/testUncleanLeaderElectionEnable/ > {noformat} > Error Message > java.lang.AssertionError: Unclean leader not elected > Stacktrace > java.lang.AssertionError: Unclean leader not elected > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:510) > {noformat} > {noformat} > Standard Output > Completed Updating config for entity: brokers '0'. > Completed Updating config for entity: brokers '1'. > Completed Updating config for entity: brokers '2'. > [2019-04-23 01:17:11,690] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition testtopic-6 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,690] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition testtopic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,722] ERROR [ReplicaFetcher replicaId=2, leaderId=1, > fetcherId=0] Error for partition testtopic-7 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,722] ERROR [ReplicaFetcher replicaId=2, leaderId=1, > fetcherId=0] Error for partition testtopic-1 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,760] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=1] Error for partition testtopic-6 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,761] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=1] Error for partition testtopic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,765] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition testtopic-3 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:11,765] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition testtopic-9 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:13,779] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition __consumer_offsets-8 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:13,780] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition __consumer_offsets-38 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:13,780] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition __consumer_offsets-2 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-04-23 01:17:13,780] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=1] Error for partition __consumer_offsets-14 at offset 0 > (kafka.server.ReplicaFetcherThread:76) >
[jira] [Updated] (KAFKA-6527) Transient failure in DynamicBrokerReconfigurationTest.testDefaultTopicConfig
[ https://issues.apache.org/jira/browse/KAFKA-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-6527: -- Fix Version/s: 3.5.0 (was: 3.4.0) > Transient failure in DynamicBrokerReconfigurationTest.testDefaultTopicConfig > > > Key: KAFKA-6527 > URL: https://issues.apache.org/jira/browse/KAFKA-6527 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Labels: flakey > Fix For: 3.5.0 > > > {code:java} > java.lang.AssertionError: Log segment size increase not applied > at kafka.utils.TestUtils$.fail(TestUtils.scala:355) > at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:865) > at > kafka.server.DynamicBrokerReconfigurationTest.testDefaultTopicConfig(DynamicBrokerReconfigurationTest.scala:348) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-7957) Flaky Test DynamicBrokerReconfigurationTest#testMetricsReporterUpdate
[ https://issues.apache.org/jira/browse/KAFKA-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-7957: -- Fix Version/s: 3.5.0 (was: 3.4.0) > Flaky Test DynamicBrokerReconfigurationTest#testMetricsReporterUpdate > - > > Key: KAFKA-7957 > URL: https://issues.apache.org/jira/browse/KAFKA-7957 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.2.0 >Reporter: Matthias J. Sax >Priority: Blocker > Labels: flaky-test > Fix For: 3.5.0 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/18/] > {quote}java.lang.AssertionError: Messages not sent at > kafka.utils.TestUtils$.fail(TestUtils.scala:356) at > kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:766) at > kafka.server.DynamicBrokerReconfigurationTest.startProduceConsume(DynamicBrokerReconfigurationTest.scala:1270) > at > kafka.server.DynamicBrokerReconfigurationTest.testMetricsReporterUpdate(DynamicBrokerReconfigurationTest.scala:650){quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-12319) Flaky test ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()
[ https://issues.apache.org/jira/browse/KAFKA-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-12319: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Flaky test > ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() > - > > Key: KAFKA-12319 > URL: https://issues.apache.org/jira/browse/KAFKA-12319 > Project: Kafka > Issue Type: Test >Reporter: Justine Olshan >Assignee: Divij Vaidya >Priority: Major > Labels: flaky-test > Fix For: 3.5.0 > > > I've seen this test fail a few times locally. But recently I saw it fail on a > PR build on Jenkins. > > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10041/7/testReport/junit/kafka.network/ConnectionQuotasTest/Build___JDK_11___testListenerConnectionRateLimitWhenActualRateAboveLimit__/] > h3. Error Message > java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: > Expected rate (30 +- 7), but got 37.436825357209706 (600 connections / 16.027 > sec) ==> expected: <30.0> but was: <37.436825357209706> > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13421) Fix ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647828#comment-17647828 ] A. Sophie Blee-Goldman commented on KAFKA-13421: bumping this to 3.5.0 as we are past code freeze for 3.4 > Fix > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > - > > Key: KAFKA-13421 > URL: https://issues.apache.org/jira/browse/KAFKA-13421 > Project: Kafka > Issue Type: Bug >Reporter: Colin McCabe >Assignee: Philip Nee >Priority: Blocker > Fix For: 3.5.0 > > > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > is failing with this error: > {code} > ConsumerBounceTest > testSubscribeWhenTopicUnavailable() PASSED > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() > failed, log available in > /home/cmccabe/src/kafka9/core/build/reports/testOutput/kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup().test.stdout > > > ConsumerBounceTest > > testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() > FAILED > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode > = NodeExists > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:126) > > at > kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904) > > at > kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842) > at > kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1809) > at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96) > at kafka.server.KafkaServer.startup(KafkaServer.scala:320) > at > kafka.integration.KafkaServerTestHarness.$anonfun$restartDeadBrokers$2(KafkaServerTestHarness.scala:2 > 12) > at > scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18) > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) > at scala.collection.AbstractIterable.foreach(Iterable.scala:919) > at scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:889) > at > kafka.integration.KafkaServerTestHarness.restartDeadBrokers(KafkaServerTestHarness.scala:203) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsB > igGroup$1(ConsumerBounceTest.scala:327) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(C > onsumerBounceTest.scala:319) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13421) Fix ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13421: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Fix > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > - > > Key: KAFKA-13421 > URL: https://issues.apache.org/jira/browse/KAFKA-13421 > Project: Kafka > Issue Type: Bug >Reporter: Colin McCabe >Assignee: Philip Nee >Priority: Blocker > Fix For: 3.5.0 > > > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > is failing with this error: > {code} > ConsumerBounceTest > testSubscribeWhenTopicUnavailable() PASSED > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() > failed, log available in > /home/cmccabe/src/kafka9/core/build/reports/testOutput/kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup().test.stdout > > > ConsumerBounceTest > > testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup() > FAILED > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode > = NodeExists > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:126) > > at > kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904) > > at > kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842) > at > kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1809) > at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96) > at kafka.server.KafkaServer.startup(KafkaServer.scala:320) > at > kafka.integration.KafkaServerTestHarness.$anonfun$restartDeadBrokers$2(KafkaServerTestHarness.scala:2 > 12) > at > scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18) > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) > at scala.collection.AbstractIterable.foreach(Iterable.scala:919) > at scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:889) > at > kafka.integration.KafkaServerTestHarness.restartDeadBrokers(KafkaServerTestHarness.scala:203) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsB > igGroup$1(ConsumerBounceTest.scala:327) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(C > onsumerBounceTest.scala:319) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13736) Flaky kafka.network.SocketServerTest.closingChannelWithBufferedReceives
[ https://issues.apache.org/jira/browse/KAFKA-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman resolved KAFKA-13736. Resolution: Fixed > Flaky kafka.network.SocketServerTest.closingChannelWithBufferedReceives > --- > > Key: KAFKA-13736 > URL: https://issues.apache.org/jira/browse/KAFKA-13736 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Blocker > Labels: flakey, flaky-test > > Examples: > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11895/1/tests > {code} > java.lang.AssertionError: receiveRequest timed out > at > kafka.network.SocketServerTest.receiveRequest(SocketServerTest.scala:140) > at > kafka.network.SocketServerTest.$anonfun$verifyRemoteCloseWithBufferedReceives$6(SocketServerTest.scala:1521) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > kafka.network.SocketServerTest.$anonfun$verifyRemoteCloseWithBufferedReceives$1(SocketServerTest.scala:1520) > at > kafka.network.SocketServerTest.verifyRemoteCloseWithBufferedReceives(SocketServerTest.scala:1483) > at > kafka.network.SocketServerTest.closingChannelWithBufferedReceives(SocketServerTest.scala:1431) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13736) Flaky kafka.network.SocketServerTest.closingChannelWithBufferedReceives
[ https://issues.apache.org/jira/browse/KAFKA-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13736: --- Fix Version/s: (was: 3.4.0) > Flaky kafka.network.SocketServerTest.closingChannelWithBufferedReceives > --- > > Key: KAFKA-13736 > URL: https://issues.apache.org/jira/browse/KAFKA-13736 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Blocker > Labels: flakey, flaky-test > > Examples: > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11895/1/tests > {code} > java.lang.AssertionError: receiveRequest timed out > at > kafka.network.SocketServerTest.receiveRequest(SocketServerTest.scala:140) > at > kafka.network.SocketServerTest.$anonfun$verifyRemoteCloseWithBufferedReceives$6(SocketServerTest.scala:1521) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > kafka.network.SocketServerTest.$anonfun$verifyRemoteCloseWithBufferedReceives$1(SocketServerTest.scala:1520) > at > kafka.network.SocketServerTest.verifyRemoteCloseWithBufferedReceives(SocketServerTest.scala:1483) > at > kafka.network.SocketServerTest.closingChannelWithBufferedReceives(SocketServerTest.scala:1431) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13950) Resource leak at multiple places in the code
[ https://issues.apache.org/jira/browse/KAFKA-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647827#comment-17647827 ] A. Sophie Blee-Goldman commented on KAFKA-13950: Bumping this to 3.5.0 as we are past code freeze for 3.4 > Resource leak at multiple places in the code > > > Key: KAFKA-13950 > URL: https://issues.apache.org/jira/browse/KAFKA-13950 > Project: Kafka > Issue Type: Bug > Components: clients, kraft, streams, unit tests >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Fix For: 3.4.0 > > > I ran Amazon CodeGuru reviewer on Apache Kafka's code base and the code tool > detected various places where Closable resources are not being closed > properly leading to leaks. > This task will fix the resource leak detected at multiple places. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13950) Resource leak at multiple places in the code
[ https://issues.apache.org/jira/browse/KAFKA-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13950: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Resource leak at multiple places in the code > > > Key: KAFKA-13950 > URL: https://issues.apache.org/jira/browse/KAFKA-13950 > Project: Kafka > Issue Type: Bug > Components: clients, kraft, streams, unit tests >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Fix For: 3.5.0 > > > I ran Amazon CodeGuru reviewer on Apache Kafka's code base and the code tool > detected various places where Closable resources are not being closed > properly leading to leaks. > This task will fix the resource leak detected at multiple places. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14186) Add unit tests for BatchFileWriter
[ https://issues.apache.org/jira/browse/KAFKA-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647826#comment-17647826 ] A. Sophie Blee-Goldman commented on KAFKA-14186: [~mumrah] bumping this to 3.5.0 as we are past code freeze for 3.4 > Add unit tests for BatchFileWriter > -- > > Key: KAFKA-14186 > URL: https://issues.apache.org/jira/browse/KAFKA-14186 > Project: Kafka > Issue Type: Test >Reporter: David Arthur >Priority: Minor > Fix For: 3.4.0 > > > We have integration tests that cover this class, but no direct unit tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14241) Implement the snapshot cleanup policy
[ https://issues.apache.org/jira/browse/KAFKA-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14241: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Implement the snapshot cleanup policy > - > > Key: KAFKA-14241 > URL: https://issues.apache.org/jira/browse/KAFKA-14241 > Project: Kafka > Issue Type: Sub-task > Components: kraft >Reporter: Jose Armando Garcia Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.5.0 > > > It looks like delete policy needs to be set to either delete or compact: > {code:java} > .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, > ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc, > KafkaConfig.LogCleanupPolicyProp) > {code} > Neither is correct for KRaft topics. KIP-630 talks about adding a third > policy called snapshot: > {code:java} > The __cluster_metadata topic will have snapshot as the cleanup.policy. {code} > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14186) Add unit tests for BatchFileWriter
[ https://issues.apache.org/jira/browse/KAFKA-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14186: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Add unit tests for BatchFileWriter > -- > > Key: KAFKA-14186 > URL: https://issues.apache.org/jira/browse/KAFKA-14186 > Project: Kafka > Issue Type: Test >Reporter: David Arthur >Priority: Minor > Fix For: 3.5.0 > > > We have integration tests that cover this class, but no direct unit tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13588) We should consolidate `changelogFor` methods to simplify the generation of internal topic names
[ https://issues.apache.org/jira/browse/KAFKA-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13588: --- Fix Version/s: 4.0.0 (was: 3.4.0) > We should consolidate `changelogFor` methods to simplify the generation of > internal topic names > --- > > Key: KAFKA-13588 > URL: https://issues.apache.org/jira/browse/KAFKA-13588 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Walker Carlson >Assignee: Guozhang Wang >Priority: Minor > Labels: newbie > Fix For: 4.0.0 > > > [https://github.com/apache/kafka/pull/11611#discussion_r772625486] > we should use `ProcessorContextUtils#changelogFor` after we remove > `init(final ProcessorContext context, final StateStore root)` in > `CahceingWindowStore#initInternal` — this will happen in around Dec.2022, > which is around 3.4.0 > > Or any other place that we generate an internal topic name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-7109) KafkaConsumer should close its incremental fetch sessions on close
[ https://issues.apache.org/jira/browse/KAFKA-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647825#comment-17647825 ] A. Sophie Blee-Goldman commented on KAFKA-7109: --- bumping this to 3.5.0 as we are past code freeze for 3.4 > KafkaConsumer should close its incremental fetch sessions on close > -- > > Key: KAFKA-7109 > URL: https://issues.apache.org/jira/browse/KAFKA-7109 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Assignee: Divij Vaidya >Priority: Minor > Labels: new-consumer-threading-should-fix > Fix For: 3.5.0 > > > KafkaConsumer should close its incremental fetch sessions on close. > Currently, the sessions are not closed, but simply time out once the consumer > is gone. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-7109) KafkaConsumer should close its incremental fetch sessions on close
[ https://issues.apache.org/jira/browse/KAFKA-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-7109: -- Fix Version/s: 3.5.0 (was: 3.4.0) > KafkaConsumer should close its incremental fetch sessions on close > -- > > Key: KAFKA-7109 > URL: https://issues.apache.org/jira/browse/KAFKA-7109 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Assignee: Divij Vaidya >Priority: Minor > Labels: new-consumer-threading-should-fix > Fix For: 3.5.0 > > > KafkaConsumer should close its incremental fetch sessions on close. > Currently, the sessions are not closed, but simply time out once the consumer > is gone. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-4852) ByteBufferSerializer not compatible with offsets
[ https://issues.apache.org/jira/browse/KAFKA-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman resolved KAFKA-4852. --- Resolution: Fixed > ByteBufferSerializer not compatible with offsets > > > Key: KAFKA-4852 > URL: https://issues.apache.org/jira/browse/KAFKA-4852 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.1 > Environment: all >Reporter: Werner Daehn >Assignee: LinShunkang >Priority: Minor > Fix For: 3.4.0 > > > Quick intro: A ByteBuffer.rewind() resets the position to zero. What if the > ByteBuffer was created with an offset? new ByteBuffer(data, 3, 10)? The > ByteBufferSerializer will send from pos=0 and not from pos=3 onwards. > Solution: No rewind() but flip() for reading a ByteBuffer. That's what the > flip is meant for. > Story: > Imagine the incoming data comes from a byte[], e.g. a network stream > containing topicname, partition, key, value, ... and you want to create a new > ProducerRecord for that. As the constructor of ProducerRecord requires > (topic, partition, key, value) you have to copy from above byte[] the key and > value. That means there is a memcopy taking place. Since the payload can be > potentially large, that introduces a lot of overhead. Twice the memory. > A nice solution to this problem is to simply wrap the network byte[] into new > ByteBuffers: > ByteBuffer key = ByteBuffer.wrap(data, keystart, keylength); > ByteBuffer value = ByteBuffer.wrap(data, valuestart, valuelength); > and then use the ByteBufferSerializer instead of the ByteArraySerializer. > But that does not work as the ByteBufferSerializer does a rewind(), hence > both, key and value, will start at position=0 of the data[]. > public class ByteBufferSerializer implements Serializer { > public byte[] serialize(String topic, ByteBuffer data) { > data.rewind(); -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-4852) ByteBufferSerializer not compatible with offsets
[ https://issues.apache.org/jira/browse/KAFKA-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647824#comment-17647824 ] A. Sophie Blee-Goldman commented on KAFKA-4852: --- [~LSK] [~wdaehn] is there a reason this ticket was not resolved, or was that just an oversight? I'm going to close it since the PR seems to have been merged, if there was some newly found issue please reopen this or file a new issue if applicable > ByteBufferSerializer not compatible with offsets > > > Key: KAFKA-4852 > URL: https://issues.apache.org/jira/browse/KAFKA-4852 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.1 > Environment: all >Reporter: Werner Daehn >Assignee: LinShunkang >Priority: Minor > Fix For: 3.4.0 > > > Quick intro: A ByteBuffer.rewind() resets the position to zero. What if the > ByteBuffer was created with an offset? new ByteBuffer(data, 3, 10)? The > ByteBufferSerializer will send from pos=0 and not from pos=3 onwards. > Solution: No rewind() but flip() for reading a ByteBuffer. That's what the > flip is meant for. > Story: > Imagine the incoming data comes from a byte[], e.g. a network stream > containing topicname, partition, key, value, ... and you want to create a new > ProducerRecord for that. As the constructor of ProducerRecord requires > (topic, partition, key, value) you have to copy from above byte[] the key and > value. That means there is a memcopy taking place. Since the payload can be > potentially large, that introduces a lot of overhead. Twice the memory. > A nice solution to this problem is to simply wrap the network byte[] into new > ByteBuffers: > ByteBuffer key = ByteBuffer.wrap(data, keystart, keylength); > ByteBuffer value = ByteBuffer.wrap(data, valuestart, valuelength); > and then use the ByteBufferSerializer instead of the ByteArraySerializer. > But that does not work as the ByteBufferSerializer does a rewind(), hence > both, key and value, will start at position=0 of the data[]. > public class ByteBufferSerializer implements Serializer { > public byte[] serialize(String topic, ByteBuffer data) { > data.rewind(); -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14418) Add safety checks for modifying partitions of __consumer_offsets
[ https://issues.apache.org/jira/browse/KAFKA-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647822#comment-17647822 ] A. Sophie Blee-Goldman commented on KAFKA-14418: [~divijvaidya] I just went ahead and removed the target version for this one since we are past code freeze for 3.4 and it didn't seem clear whether to target 3.5 based on your PR comment > Add safety checks for modifying partitions of __consumer_offsets > > > Key: KAFKA-14418 > URL: https://issues.apache.org/jira/browse/KAFKA-14418 > Project: Kafka > Issue Type: Improvement >Reporter: Divij Vaidya >Priority: Major > > Today a user can change the number of partitions of > {{{}{}}}{{{}_consumer_{}}}{{{}_{}}}{{{}offsets{}}} topic by changing the > configuration value for {{offsets.topic.num.partitions}} or manually by using > CreatePartition API. For example, the following command changes the number of > partitions to 60 for __consumer_offsets topic. > {code:java} > ./bin/kafka-topics.sh --topic __consumer_offsets --alter --partitions 60 > --bootstrap-server=localhost:9092{code} > Changing offsets of this reserved partition leads to problems with consumer > groups unless you restart all brokers. Thus, there is a high probability > that is this operations is not done right, users may shoot themselves in the > foot. Example scenario: > [https://stackoverflow.com/questions/73944561/kafka-consumer-group-coordinator-inconsistent] > > To remedy this, I propose the following changes: > 1. `kafka-topic.sh` should explicitly block adding new partitions to internal > topics. > 2. Add an operational guide to the docs for safely modifying partitions of > __consumer_offsets at [https://kafka.apache.org/documentation.html#basic_ops] > 3. when \{{offsets.topic.num.partitions }}is modified, it should update the > value of groupPartitionCount at > [https://github.com/apache/kafka/blob/3.3.1/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L192] > without a broker startup -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams
[ https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14318: --- Fix Version/s: 3.5.0 (was: 3.4.0) > KIP-878: Autoscaling for Statically Partitioned Streams > --- > > Key: KAFKA-14318 > URL: https://issues.apache.org/jira/browse/KAFKA-14318 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: A. Sophie Blee-Goldman >Assignee: A. Sophie Blee-Goldman >Priority: Major > Labels: needs-kip > Fix For: 3.5.0 > > > [KIP-878: Autoscaling for Statically Partitioned > Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14418) Add safety checks for modifying partitions of __consumer_offsets
[ https://issues.apache.org/jira/browse/KAFKA-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14418: --- Fix Version/s: (was: 3.4.0) > Add safety checks for modifying partitions of __consumer_offsets > > > Key: KAFKA-14418 > URL: https://issues.apache.org/jira/browse/KAFKA-14418 > Project: Kafka > Issue Type: Improvement >Reporter: Divij Vaidya >Priority: Major > > Today a user can change the number of partitions of > {{{}{}}}{{{}_consumer_{}}}{{{}_{}}}{{{}offsets{}}} topic by changing the > configuration value for {{offsets.topic.num.partitions}} or manually by using > CreatePartition API. For example, the following command changes the number of > partitions to 60 for __consumer_offsets topic. > {code:java} > ./bin/kafka-topics.sh --topic __consumer_offsets --alter --partitions 60 > --bootstrap-server=localhost:9092{code} > Changing offsets of this reserved partition leads to problems with consumer > groups unless you restart all brokers. Thus, there is a high probability > that is this operations is not done right, users may shoot themselves in the > foot. Example scenario: > [https://stackoverflow.com/questions/73944561/kafka-consumer-group-coordinator-inconsistent] > > To remedy this, I propose the following changes: > 1. `kafka-topic.sh` should explicitly block adding new partitions to internal > topics. > 2. Add an operational guide to the docs for safely modifying partitions of > __consumer_offsets at [https://kafka.apache.org/documentation.html#basic_ops] > 3. when \{{offsets.topic.num.partitions }}is modified, it should update the > value of groupPartitionCount at > [https://github.com/apache/kafka/blob/3.3.1/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L192] > without a broker startup -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams
[ https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14318: --- Labels: kip (was: needs-kip) > KIP-878: Autoscaling for Statically Partitioned Streams > --- > > Key: KAFKA-14318 > URL: https://issues.apache.org/jira/browse/KAFKA-14318 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: A. Sophie Blee-Goldman >Assignee: A. Sophie Blee-Goldman >Priority: Major > Labels: kip > Fix For: 3.5.0 > > > [KIP-878: Autoscaling for Statically Partitioned > Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied
[ https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14302: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Infinite probing rebalance if a changelog topic got emptied > --- > > Key: KAFKA-14302 > URL: https://issues.apache.org/jira/browse/KAFKA-14302 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.3.1 >Reporter: Damien Gasparina >Priority: Major > Fix For: 3.5.0 > > Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2 > > > If a store, with a changelog topic, has been fully emptied, it could generate > infinite probing rebalance. > > The scenario is the following: > * A Kafka Streams application, deployed on many instances, have a store with > a changelog > * Many entries are pushed into the changelog, thus the Log end Offset is > high, let's say 20,000 > * Then, the store got emptied, either due to data retention (windowing) or > tombstone > * Then an instance of the application is restarted, and its local disk is > deleted (e.g. Kubernetes without Persistent Volume) > * After restart, the application restores the store from the changelog, but > does not write a checkpoint file as there are no data > * As there are no checkpoint entries, this instance specify a taskOffsetSums > with offset set to 0 in the subscriptionUserData > * The group leader, during the assignment, then compute a lag of 20,000 (end > offsets - task offset), which is greater than the default acceptable lag, > thus decide to schedule a probing rebalance > * In ther next probing rebalance, nothing changed, so... new probing > rebalance > > I was able to reproduce locally with a simple topology: > > {code:java} > var table = streamsBuilder.stream("table"); > streamsBuilder > .stream("stream") > .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), > JoinWindows.of(Duration.ofSeconds(5))) > .to("output");{code} > > > > Due to this issue, application having an empty changelog are experiencing > frequent rebalance: > !image-2022-10-14-12-04-01-190.png! > > With assignments similar to: > {code:java} > [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO > org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - > stream-thread > [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] > Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, > 0_3, 0_2, 0_1, 0_0] to clients as: > d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, > 0_1, 0_2, 0_3, 0_4, 0_5])] > 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, > 0_5]) standbyTasks: ([])].{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied
[ https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647820#comment-17647820 ] A. Sophie Blee-Goldman commented on KAFKA-14302: I did look into this briefly but it was not the straightforward bug I initially assumed from the title. Bumping it to 3.5 since we are past code freeze for 3.4 > Infinite probing rebalance if a changelog topic got emptied > --- > > Key: KAFKA-14302 > URL: https://issues.apache.org/jira/browse/KAFKA-14302 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.3.1 >Reporter: Damien Gasparina >Priority: Major > Fix For: 3.5.0 > > Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2 > > > If a store, with a changelog topic, has been fully emptied, it could generate > infinite probing rebalance. > > The scenario is the following: > * A Kafka Streams application, deployed on many instances, have a store with > a changelog > * Many entries are pushed into the changelog, thus the Log end Offset is > high, let's say 20,000 > * Then, the store got emptied, either due to data retention (windowing) or > tombstone > * Then an instance of the application is restarted, and its local disk is > deleted (e.g. Kubernetes without Persistent Volume) > * After restart, the application restores the store from the changelog, but > does not write a checkpoint file as there are no data > * As there are no checkpoint entries, this instance specify a taskOffsetSums > with offset set to 0 in the subscriptionUserData > * The group leader, during the assignment, then compute a lag of 20,000 (end > offsets - task offset), which is greater than the default acceptable lag, > thus decide to schedule a probing rebalance > * In ther next probing rebalance, nothing changed, so... new probing > rebalance > > I was able to reproduce locally with a simple topology: > > {code:java} > var table = streamsBuilder.stream("table"); > streamsBuilder > .stream("stream") > .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), > JoinWindows.of(Duration.ofSeconds(5))) > .to("output");{code} > > > > Due to this issue, application having an empty changelog are experiencing > frequent rebalance: > !image-2022-10-14-12-04-01-190.png! > > With assignments similar to: > {code:java} > [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO > org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - > stream-thread > [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] > Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, > 0_3, 0_2, 0_1, 0_0] to clients as: > d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, > 0_1, 0_2, 0_3, 0_4, 0_5])] > 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, > 0_5]) standbyTasks: ([])].{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14241) Implement the snapshot cleanup policy
[ https://issues.apache.org/jira/browse/KAFKA-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647819#comment-17647819 ] A. Sophie Blee-Goldman commented on KAFKA-14241: [~jagsancio] bumping this to 3.5.0 as we are past code freeze for 3.4 > Implement the snapshot cleanup policy > - > > Key: KAFKA-14241 > URL: https://issues.apache.org/jira/browse/KAFKA-14241 > Project: Kafka > Issue Type: Sub-task > Components: kraft >Reporter: Jose Armando Garcia Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.4.0 > > > It looks like delete policy needs to be set to either delete or compact: > {code:java} > .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, > ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc, > KafkaConfig.LogCleanupPolicyProp) > {code} > Neither is correct for KRaft topics. KIP-630 talks about adding a third > policy called snapshot: > {code:java} > The __cluster_metadata topic will have snapshot as the cleanup.policy. {code} > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14139) Replaced disk can lead to loss of committed data even with non-empty ISR
[ https://issues.apache.org/jira/browse/KAFKA-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14139: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Replaced disk can lead to loss of committed data even with non-empty ISR > > > Key: KAFKA-14139 > URL: https://issues.apache.org/jira/browse/KAFKA-14139 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > Fix For: 3.5.0 > > > We have been thinking about disk failure cases recently. Suppose that a disk > has failed and the user needs to restart the disk from an empty state. The > concern is whether this can lead to the unnecessary loss of committed data. > For normal topic partitions, removal from the ISR during controlled shutdown > buys us some protection. After the replica is restarted, it must prove its > state to the leader before it can be added back to the ISR. And it cannot > become a leader until it does so. > An obvious exception to this is when the replica is the last member in the > ISR. In this case, the disk failure itself has compromised the committed > data, so some amount of loss must be expected. > We have been considering other scenarios in which the loss of one disk can > lead to data loss even when there are replicas remaining which have all of > the committed entries. One such scenario is this: > Suppose we have a partition with two replicas: A and B. Initially A is the > leader and it is the only member of the ISR. > # Broker B catches up to A, so A attempts to send an AlterPartition request > to the controller to add B into the ISR. > # Before the AlterPartition request is received, replica B has a hard > failure. > # The current controller successfully fences broker B. It takes no action on > this partition since B is already out of the ISR. > # Before the controller receives the AlterPartition request to add B, it > also fails. > # While the new controller is initializing, suppose that replica B finishes > startup, but the disk has been replaced (all of the previous state has been > lost). > # The new controller sees the registration from broker B first. > # Finally, the AlterPartition from A arrives which adds B back into the ISR > even though it has an empty log. > (Credit for coming up with this scenario goes to [~junrao] .) > I tested this in KRaft and confirmed that this sequence is possible (even if > perhaps unlikely). There are a few ways we could have potentially detected > the issue. First, perhaps the leader should have bumped the leader epoch on > all partitions when B was fenced. Then the inflight AlterPartition would be > doomed no matter when it arrived. > Alternatively, we could have relied on the broker epoch to distinguish the > dead broker's state from that of the restarted broker. This could be done by > including the broker epoch in both the `Fetch` request and in > `AlterPartition`. > Finally, perhaps even normal kafka replication should be using a unique > identifier for each disk so that we can reliably detect when it has changed. > For example, something like what was proposed for the metadata quorum here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes.] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14094) KIP-853: KRaft Voters Change
[ https://issues.apache.org/jira/browse/KAFKA-14094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14094: --- Fix Version/s: 3.5.0 (was: 3.4.0) > KIP-853: KRaft Voters Change > > > Key: KAFKA-14094 > URL: https://issues.apache.org/jira/browse/KAFKA-14094 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: José Armando García Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14139) Replaced disk can lead to loss of committed data even with non-empty ISR
[ https://issues.apache.org/jira/browse/KAFKA-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647817#comment-17647817 ] A. Sophie Blee-Goldman commented on KAFKA-14139: [~hachikuji] bumping this to 3.5.0 as we are past code freeze for 3.4 > Replaced disk can lead to loss of committed data even with non-empty ISR > > > Key: KAFKA-14139 > URL: https://issues.apache.org/jira/browse/KAFKA-14139 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > Fix For: 3.4.0 > > > We have been thinking about disk failure cases recently. Suppose that a disk > has failed and the user needs to restart the disk from an empty state. The > concern is whether this can lead to the unnecessary loss of committed data. > For normal topic partitions, removal from the ISR during controlled shutdown > buys us some protection. After the replica is restarted, it must prove its > state to the leader before it can be added back to the ISR. And it cannot > become a leader until it does so. > An obvious exception to this is when the replica is the last member in the > ISR. In this case, the disk failure itself has compromised the committed > data, so some amount of loss must be expected. > We have been considering other scenarios in which the loss of one disk can > lead to data loss even when there are replicas remaining which have all of > the committed entries. One such scenario is this: > Suppose we have a partition with two replicas: A and B. Initially A is the > leader and it is the only member of the ISR. > # Broker B catches up to A, so A attempts to send an AlterPartition request > to the controller to add B into the ISR. > # Before the AlterPartition request is received, replica B has a hard > failure. > # The current controller successfully fences broker B. It takes no action on > this partition since B is already out of the ISR. > # Before the controller receives the AlterPartition request to add B, it > also fails. > # While the new controller is initializing, suppose that replica B finishes > startup, but the disk has been replaced (all of the previous state has been > lost). > # The new controller sees the registration from broker B first. > # Finally, the AlterPartition from A arrives which adds B back into the ISR > even though it has an empty log. > (Credit for coming up with this scenario goes to [~junrao] .) > I tested this in KRaft and confirmed that this sequence is possible (even if > perhaps unlikely). There are a few ways we could have potentially detected > the issue. First, perhaps the leader should have bumped the leader epoch on > all partitions when B was fenced. Then the inflight AlterPartition would be > doomed no matter when it arrived. > Alternatively, we could have relied on the broker epoch to distinguish the > dead broker's state from that of the restarted broker. This could be done by > including the broker epoch in both the `Fetch` request and in > `AlterPartition`. > Finally, perhaps even normal kafka replication should be using a unique > identifier for each disk so that we can reliably detect when it has changed. > For example, something like what was proposed for the metadata quorum here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes.] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14094) KIP-853: KRaft Voters Change
[ https://issues.apache.org/jira/browse/KAFKA-14094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647816#comment-17647816 ] A. Sophie Blee-Goldman commented on KAFKA-14094: [~jagsancio] bumping this to 3.5.0 as we are past code freeze for 3.4 > KIP-853: KRaft Voters Change > > > Key: KAFKA-14094 > URL: https://issues.apache.org/jira/browse/KAFKA-14094 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: José Armando García Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14291) KRaft: ApiVersionsResponse doesn't have finalizedFeatures and finalizedFeatureEpoch in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman resolved KAFKA-14291. Resolution: Duplicate Marking this as a duplicate of KAFKA-13990 as described above > KRaft: ApiVersionsResponse doesn't have finalizedFeatures and > finalizedFeatureEpoch in KRaft mode > - > > Key: KAFKA-14291 > URL: https://issues.apache.org/jira/browse/KAFKA-14291 > Project: Kafka > Issue Type: Bug > Components: kraft >Reporter: Akhilesh Chaganti >Priority: Critical > > https://github.com/apache/kafka/blob/d834947ae7abc8a9421d741e742200bb36f51fb3/core/src/main/scala/kafka/server/ApiVersionManager.scala#L53 > ``` > class SimpleApiVersionManager( > val listenerType: ListenerType, > val enabledApis: collection.Set[ApiKeys], > brokerFeatures: Features[SupportedVersionRange] > ) extends ApiVersionManager { > def this(listenerType: ListenerType) = { > this(listenerType, ApiKeys.apisForListener(listenerType).asScala, > BrokerFeatures.defaultSupportedFeatures()) > } > private val apiVersions = > ApiVersionsResponse.collectApis(enabledApis.asJava) > override def apiVersionResponse(requestThrottleMs: Int): > ApiVersionsResponse = { > ApiVersionsResponse.createApiVersionsResponse(requestThrottleMs, > apiVersions, brokerFeatures) > } > } > ``` > ApiVersionManager for KRaft doesn't add the finalizedFeatures and > finalizedFeatureEpoch to the ApiVersionsResponse. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14145) Faster propagation of high-watermark in KRaft topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647815#comment-17647815 ] A. Sophie Blee-Goldman commented on KAFKA-14145: [~jagsancio] moving this to 3.5.0 since we are past code freeze for 3.4 > Faster propagation of high-watermark in KRaft topic partitions > -- > > Key: KAFKA-14145 > URL: https://issues.apache.org/jira/browse/KAFKA-14145 > Project: Kafka > Issue Type: Task > Components: kraft >Reporter: Jose Armando Garcia Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Critical > Fix For: 3.5.0 > > > Typically, the HWM is increase after one round of Fetch requests from the > majority of the replicas. The HWM is propagated after another round of Fetch > requests. If the LEO doesn't change the propagation of the HWM can be delay > by one Fetch wait timeout (500ms). > Looking at the KafkaRaftClient implementation we would have to have an index > for both the fetch offset and the last sent high-watermark for that replica. > Another issue here is that we changed the KafkaRaftManager so that it doesn't > set the replica id when it is an observer/broker. Since the HWM is not part > of the Fetch request the leader would have to keep track of this in the > LeaderState. > {code:java} > val nodeId = if (config.processRoles.contains(ControllerRole)) { > OptionalInt.of(config.nodeId) > } else { > OptionalInt.empty() > } {code} > We would need to find a better solution for > https://issues.apache.org/jira/browse/KAFKA-13168 or improve the FETCH > request so that it includes the HWM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14145) Faster propagation of high-watermark in KRaft topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14145: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Faster propagation of high-watermark in KRaft topic partitions > -- > > Key: KAFKA-14145 > URL: https://issues.apache.org/jira/browse/KAFKA-14145 > Project: Kafka > Issue Type: Task > Components: kraft >Reporter: Jose Armando Garcia Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Critical > Fix For: 3.5.0 > > > Typically, the HWM is increase after one round of Fetch requests from the > majority of the replicas. The HWM is propagated after another round of Fetch > requests. If the LEO doesn't change the propagation of the HWM can be delay > by one Fetch wait timeout (500ms). > Looking at the KafkaRaftClient implementation we would have to have an index > for both the fetch offset and the last sent high-watermark for that replica. > Another issue here is that we changed the KafkaRaftManager so that it doesn't > set the replica id when it is an observer/broker. Since the HWM is not part > of the Fetch request the leader would have to keep track of this in the > LeaderState. > {code:java} > val nodeId = if (config.processRoles.contains(ControllerRole)) { > OptionalInt.of(config.nodeId) > } else { > OptionalInt.empty() > } {code} > We would need to find a better solution for > https://issues.apache.org/jira/browse/KAFKA-13168 or improve the FETCH > request so that it includes the HWM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-7632) Support Compression Level
[ https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-7632: -- Fix Version/s: (was: 3.4.0) > Support Compression Level > - > > Key: KAFKA-7632 > URL: https://issues.apache.org/jira/browse/KAFKA-7632 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 2.1.0 > Environment: all >Reporter: Dave Waters >Assignee: Dongjin Lee >Priority: Major > Labels: needs-kip > > The compression level for ZSTD is currently set to use the default level (3), > which is a conservative setting that in some use cases eliminates the value > that ZSTD provides with improved compression. Each use case will vary, so > exposing the level as a producer, broker, and topic configuration setting > will allow the user to adjust the level. > Since it applies to the other compression codecs, we should add the same > functionalities to them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14291) KRaft: ApiVersionsResponse doesn't have finalizedFeatures and finalizedFeatureEpoch in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14291: --- Fix Version/s: (was: 3.4.0) > KRaft: ApiVersionsResponse doesn't have finalizedFeatures and > finalizedFeatureEpoch in KRaft mode > - > > Key: KAFKA-14291 > URL: https://issues.apache.org/jira/browse/KAFKA-14291 > Project: Kafka > Issue Type: Bug > Components: kraft >Reporter: Akhilesh Chaganti >Priority: Critical > > https://github.com/apache/kafka/blob/d834947ae7abc8a9421d741e742200bb36f51fb3/core/src/main/scala/kafka/server/ApiVersionManager.scala#L53 > ``` > class SimpleApiVersionManager( > val listenerType: ListenerType, > val enabledApis: collection.Set[ApiKeys], > brokerFeatures: Features[SupportedVersionRange] > ) extends ApiVersionManager { > def this(listenerType: ListenerType) = { > this(listenerType, ApiKeys.apisForListener(listenerType).asScala, > BrokerFeatures.defaultSupportedFeatures()) > } > private val apiVersions = > ApiVersionsResponse.collectApis(enabledApis.asJava) > override def apiVersionResponse(requestThrottleMs: Int): > ApiVersionsResponse = { > ApiVersionsResponse.createApiVersionsResponse(requestThrottleMs, > apiVersions, brokerFeatures) > } > } > ``` > ApiVersionManager for KRaft doesn't add the finalizedFeatures and > finalizedFeatureEpoch to the ApiVersionsResponse. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-7632) Support Compression Level
[ https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647813#comment-17647813 ] A. Sophie Blee-Goldman commented on KAFKA-7632: --- [~dongjin] Removing the target version as this seems to have been bumped through a few versions > Support Compression Level > - > > Key: KAFKA-7632 > URL: https://issues.apache.org/jira/browse/KAFKA-7632 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 2.1.0 > Environment: all >Reporter: Dave Waters >Assignee: Dongjin Lee >Priority: Major > Labels: needs-kip > Fix For: 3.4.0 > > > The compression level for ZSTD is currently set to use the default level (3), > which is a conservative setting that in some use cases eliminates the value > that ZSTD provides with improved compression. Each use case will vary, so > exposing the level as a producer, broker, and topic configuration setting > will allow the user to adjust the level. > Since it applies to the other compression codecs, we should add the same > functionalities to them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-12399) Deprecate Log4J Appender
[ https://issues.apache.org/jira/browse/KAFKA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647811#comment-17647811 ] A. Sophie Blee-Goldman commented on KAFKA-12399: [~dongjin] moving this to 3.5.0 since we are past code freeze for 3.4 > Deprecate Log4J Appender > > > Key: KAFKA-12399 > URL: https://issues.apache.org/jira/browse/KAFKA-12399 > Project: Kafka > Issue Type: Improvement > Components: logging >Reporter: Dongjin Lee >Assignee: Dongjin Lee >Priority: Major > Labels: needs-kip > Fix For: 3.5.0 > > > As a following job of KAFKA-9366, we have to entirely remove the log4j 1.2.7 > dependency from the classpath by removing dependencies on log4j-appender. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-12399) Deprecate Log4J Appender
[ https://issues.apache.org/jira/browse/KAFKA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-12399: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Deprecate Log4J Appender > > > Key: KAFKA-12399 > URL: https://issues.apache.org/jira/browse/KAFKA-12399 > Project: Kafka > Issue Type: Improvement > Components: logging >Reporter: Dongjin Lee >Assignee: Dongjin Lee >Priority: Major > Labels: needs-kip > Fix For: 3.5.0 > > > As a following job of KAFKA-9366, we have to entirely remove the log4j 1.2.7 > dependency from the classpath by removing dependencies on log4j-appender. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-12679) Rebalancing a restoring or running task may cause directory livelocking with newly created task
[ https://issues.apache.org/jira/browse/KAFKA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-12679: --- Fix Version/s: (was: 3.4.0) > Rebalancing a restoring or running task may cause directory livelocking with > newly created task > --- > > Key: KAFKA-12679 > URL: https://issues.apache.org/jira/browse/KAFKA-12679 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.6.1 > Environment: Broker and client version 2.6.1 > Multi-node broker cluster > Multi-node, auto scaling streams app instances >Reporter: Peter Nahas >Priority: Major > Attachments: Backoff-between-directory-lock-attempts.patch > > > If a task that uses a state store is in the restoring state or in a running > state and the task gets rebalanced to a separate thread on the same instance, > the newly created task will attempt to lock the state store director while > the first thread is continuing to use it. This is totally normal and expected > behavior when the first thread is not yet aware of the rebalance. However, > that newly created task is effectively running a while loop with no backoff > waiting to lock the directory: > # TaskManager tells the task to restore in `tryToCompleteRestoration` > # The task attempts to lock the directory > # The lock attempt fails and throws a > `org.apache.kafka.streams.errors.LockException` > # TaskManager catches the exception, stops further processing on the task > and reports that not all tasks have restored > # The StreamThread `runLoop` continues to run. > I've seen some documentation indicate that there is supposed to be a backoff > when this condition occurs, but there does not appear to be any in the code. > The result is that if this goes on for long enough, the lock-loop may > dominate CPU usage in the process and starve out the old stream thread task > processing. > > When in this state, the DEBUG level logging for TaskManager will produce a > steady stream of messages like the following: > {noformat} > 2021-03-30 20:59:51,098 DEBUG --- [StreamThread-10] o.a.k.s.p.i.TaskManager > : stream-thread [StreamThread-10] Could not initialize 0_34 due > to the following exception; will retry > org.apache.kafka.streams.errors.LockException: stream-thread > [StreamThread-10] standby-task [0_34] Failed to lock the state directory for > task 0_34 > {noformat} > > > I've attached a git formatted patch to resolve the issue. Simply detect the > scenario and sleep for the backoff time in the appropriate StreamThread. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13299) Accept listeners that have the same port but use IPv4 vs IPv6
[ https://issues.apache.org/jira/browse/KAFKA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647810#comment-17647810 ] A. Sophie Blee-Goldman commented on KAFKA-13299: [~mdedetrich-aiven] moving this to 3.5.0 since we are past code freeze for 3.4 > Accept listeners that have the same port but use IPv4 vs IPv6 > - > > Key: KAFKA-13299 > URL: https://issues.apache.org/jira/browse/KAFKA-13299 > Project: Kafka > Issue Type: Improvement >Reporter: Matthew de Detrich >Assignee: Matthew de Detrich >Priority: Major > Fix For: 3.5.0 > > > Currently we are going through a process where we want to migrate Kafka > brokers from IPv4 to IPv6. The simplest way for us to do this would be to > allow Kafka to have 2 listeners of the same port however one listener has an > IPv4 address allocated and another listener has an IPv6 address allocated. > Currently this is not possible in Kafka because it validates that all of the > listeners have a unique port. With some rudimentary testing if this > validation is removed (so we are able to have 2 listeners of the same port > but with different IP versions) there doesn't seem to be any immediate > problems, the kafka clusters works without any problems. > Is there some fundamental reason behind this limitation of having unique > ports? Consequently would there be any problems in loosening this limitation > (i.e. duplicate ports are allowed if the IP versions are different) or just > altogether removing the restriction -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] akhileshchg opened a new pull request, #12998: KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller.
akhileshchg opened a new pull request, #12998: URL: https://github.com/apache/kafka/pull/12998 KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. This patch introduces a preliminary state machine that can be used by the KRaft controller to drive online migration from Zk to KRaft. MigrationState -- Defines the states we can have during migration from Zk to KRaft. KRaftMigrationDriver -- Defines the state transitions and events to handle actions like controller change, metadata change, and broker change and has interfaces through which it claims Zk controllership, performs zk writes, and sends RPCs to ZkBrokers. MigrationClient -- Interface that defines the functions used to claim and relinquish Zk controllership, read to and write from Zk. BrokersRpcClient -- Interface that defines the functions used to send RPCs to Zk brokers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-13299) Accept listeners that have the same port but use IPv4 vs IPv6
[ https://issues.apache.org/jira/browse/KAFKA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13299: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Accept listeners that have the same port but use IPv4 vs IPv6 > - > > Key: KAFKA-13299 > URL: https://issues.apache.org/jira/browse/KAFKA-13299 > Project: Kafka > Issue Type: Improvement >Reporter: Matthew de Detrich >Assignee: Matthew de Detrich >Priority: Major > Fix For: 3.5.0 > > > Currently we are going through a process where we want to migrate Kafka > brokers from IPv4 to IPv6. The simplest way for us to do this would be to > allow Kafka to have 2 listeners of the same port however one listener has an > IPv4 address allocated and another listener has an IPv6 address allocated. > Currently this is not possible in Kafka because it validates that all of the > listeners have a unique port. With some rudimentary testing if this > validation is removed (so we are able to have 2 listeners of the same port > but with different IP versions) there doesn't seem to be any immediate > problems, the kafka clusters works without any problems. > Is there some fundamental reason behind this limitation of having unique > ports? Consequently would there be any problems in loosening this limitation > (i.e. duplicate ports are allowed if the IP versions are different) or just > altogether removing the restriction -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-13891) sync group failed with rebalanceInProgress error cause rebalance many rounds in coopeartive
[ https://issues.apache.org/jira/browse/KAFKA-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647809#comment-17647809 ] A. Sophie Blee-Goldman commented on KAFKA-13891: Bumping this to 3.5.0 since we're past 3.4.0 code freeze, but I'll try to find either time to work on it myself, or someone else who can pick it up so that it doesn't get lost > sync group failed with rebalanceInProgress error cause rebalance many rounds > in coopeartive > --- > > Key: KAFKA-13891 > URL: https://issues.apache.org/jira/browse/KAFKA-13891 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.0.0 >Reporter: Shawn Wang >Priority: Major > Fix For: 3.5.0 > > > This issue was first found in > [KAFKA-13419|https://issues.apache.org/jira/browse/KAFKA-13419] > But the previous PR forgot to reset generation when sync group failed with > rebalanceInProgress error. So the previous bug still exists and it may cause > consumer to rebalance many rounds before final stable. > Here's the example ({*}bold is added{*}): > # consumer A joined and synced group successfully with generation 1 *( with > ownedPartition P1/P2 )* > # New rebalance started with generation 2, consumer A joined successfully, > but somehow, consumer A doesn't send out sync group immediately > # other consumer completed sync group successfully in generation 2, except > consumer A. > # After consumer A send out sync group, the new rebalance start, with > generation 3. So consumer A got REBALANCE_IN_PROGRESS error with sync group > response > # When receiving REBALANCE_IN_PROGRESS, we re-join the group, with > generation 3, with the assignment (ownedPartition) in generation 1. > # So, now, we have out-of-date ownedPartition sent, with unexpected results > happened > # *After the generation-3 rebalance, consumer A got P3/P4 partition. the > ownedPartition is ignored because of old generation.* > # *consumer A revoke P1/P2 and re-join to start a new round of rebalance* > # *if some other consumer C failed to syncGroup before consumer A's > joinGroup. the same issue will happens again and result in many rounds of > rebalance before stable* > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-13891) sync group failed with rebalanceInProgress error cause rebalance many rounds in coopeartive
[ https://issues.apache.org/jira/browse/KAFKA-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-13891: --- Fix Version/s: 3.5.0 (was: 3.4.0) > sync group failed with rebalanceInProgress error cause rebalance many rounds > in coopeartive > --- > > Key: KAFKA-13891 > URL: https://issues.apache.org/jira/browse/KAFKA-13891 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.0.0 >Reporter: Shawn Wang >Priority: Major > Fix For: 3.5.0 > > > This issue was first found in > [KAFKA-13419|https://issues.apache.org/jira/browse/KAFKA-13419] > But the previous PR forgot to reset generation when sync group failed with > rebalanceInProgress error. So the previous bug still exists and it may cause > consumer to rebalance many rounds before final stable. > Here's the example ({*}bold is added{*}): > # consumer A joined and synced group successfully with generation 1 *( with > ownedPartition P1/P2 )* > # New rebalance started with generation 2, consumer A joined successfully, > but somehow, consumer A doesn't send out sync group immediately > # other consumer completed sync group successfully in generation 2, except > consumer A. > # After consumer A send out sync group, the new rebalance start, with > generation 3. So consumer A got REBALANCE_IN_PROGRESS error with sync group > response > # When receiving REBALANCE_IN_PROGRESS, we re-join the group, with > generation 3, with the assignment (ownedPartition) in generation 1. > # So, now, we have out-of-date ownedPartition sent, with unexpected results > happened > # *After the generation-3 rebalance, consumer A got P3/P4 partition. the > ownedPartition is ignored because of old generation.* > # *consumer A revoke P1/P2 and re-join to start a new round of rebalance* > # *if some other consumer C failed to syncGroup before consumer A's > joinGroup. the same issue will happens again and result in many rounds of > rebalance before stable* > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14038) Optimize calculation of size for log in remote tier
[ https://issues.apache.org/jira/browse/KAFKA-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-14038: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Optimize calculation of size for log in remote tier > --- > > Key: KAFKA-14038 > URL: https://issues.apache.org/jira/browse/KAFKA-14038 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Labels: needs-kip > Fix For: 3.5.0 > > > {color:#24292f}As per the Tiered Storage feature introduced in > [KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage], > users can configure the retention of remote tier based on time, by size, or > both. The work of computing the log segments to be deleted based on the > retention config is [owned by > RemoteLogManager|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-1.RemoteLogManager(RLM)ThreadPool] > (RLM).{color} > {color:#24292f}To compute remote segments eligible for deletion based on > retention by size config, {color}RLM needs to compute the > {{total_remote_log_size}} i.e. the total size of logs available in the remote > tier for that topic-partition. RLM could use the > {{RemoteLogMetadataManager.listRemoteLogSegments()}} to fetch metadata for > all the remote segments and then aggregate the segment sizes by using > {{{}RemoteLogSegmentMetadata.segmentSizeInBytes(){}}}to find the total log > size stored in the remote tier. > The above method involves iterating through all metadata of all the segments > i.e. O({color:#24292f}num_remote_segments{color}) on each execution of RLM > thread. {color:#24292f}Since the main feature of tiered storage is storing a > large amount of data, we expect num_remote_segments to be large and a > frequent linear scan could be expensive (depending on the underlying storage > used by RemoteLogMetadataManager). > Segment offloads and segment deletions are run together in the same task and > a fixed size thread pool is shared among all topic-partitions. A slow logic > for calculation of total_log_size could result in the loss of availability as > demonstrated in the following scenario:{color} > # {color:#24292f}Calculation of total_size is slow and the threads in the > thread pool are busy with segment deletions{color} > # Segment offloads are delayed (since they run together with deletions) > # Local disk fills up, since local deletion requires the segment to be > offloaded > # If local disk is completely full, Kafka fails > Details are in KIP - > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14038) Optimize calculation of size for log in remote tier
[ https://issues.apache.org/jira/browse/KAFKA-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647807#comment-17647807 ] A. Sophie Blee-Goldman commented on KAFKA-14038: [~divijvaidya] bumped to 3.5 as we are past code freeze for 3.4.0 > Optimize calculation of size for log in remote tier > --- > > Key: KAFKA-14038 > URL: https://issues.apache.org/jira/browse/KAFKA-14038 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Labels: needs-kip > Fix For: 3.5.0 > > > {color:#24292f}As per the Tiered Storage feature introduced in > [KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage], > users can configure the retention of remote tier based on time, by size, or > both. The work of computing the log segments to be deleted based on the > retention config is [owned by > RemoteLogManager|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-1.RemoteLogManager(RLM)ThreadPool] > (RLM).{color} > {color:#24292f}To compute remote segments eligible for deletion based on > retention by size config, {color}RLM needs to compute the > {{total_remote_log_size}} i.e. the total size of logs available in the remote > tier for that topic-partition. RLM could use the > {{RemoteLogMetadataManager.listRemoteLogSegments()}} to fetch metadata for > all the remote segments and then aggregate the segment sizes by using > {{{}RemoteLogSegmentMetadata.segmentSizeInBytes(){}}}to find the total log > size stored in the remote tier. > The above method involves iterating through all metadata of all the segments > i.e. O({color:#24292f}num_remote_segments{color}) on each execution of RLM > thread. {color:#24292f}Since the main feature of tiered storage is storing a > large amount of data, we expect num_remote_segments to be large and a > frequent linear scan could be expensive (depending on the underlying storage > used by RemoteLogMetadataManager). > Segment offloads and segment deletions are run together in the same task and > a fixed size thread pool is shared among all topic-partitions. A slow logic > for calculation of total_log_size could result in the loss of availability as > demonstrated in the following scenario:{color} > # {color:#24292f}Calculation of total_size is slow and the threads in the > thread pool are busy with segment deletions{color} > # Segment offloads are delayed (since they run together with deletions) > # Local disk fills up, since local deletion requires the segment to be > offloaded > # If local disk is completely full, Kafka fails > Details are in KIP - > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14493) Zk to KRaft migration state machine in KRaft controller
Akhilesh Chaganti created KAFKA-14493: - Summary: Zk to KRaft migration state machine in KRaft controller Key: KAFKA-14493 URL: https://issues.apache.org/jira/browse/KAFKA-14493 Project: Kafka Issue Type: Sub-task Reporter: Akhilesh Chaganti Assignee: Akhilesh Chaganti -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14077) KRaft should support recovery from failed disk
[ https://issues.apache.org/jira/browse/KAFKA-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647805#comment-17647805 ] A. Sophie Blee-Goldman commented on KAFKA-14077: [~hachikuji] [~jagsancio] what's the status here, can we bump this to 3.5.0 or is someone working on this as an active blocker for 3.4? > KRaft should support recovery from failed disk > -- > > Key: KAFKA-14077 > URL: https://issues.apache.org/jira/browse/KAFKA-14077 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Fix For: 3.4.0 > > > If one of the nodes in the metadata quorum has a disk failure, there is no > way currently to safely bring the node back into the quorum. When we lose > disk state, we are at risk of losing committed data even if the failure only > affects a minority of the cluster. > Here is an example. Suppose that a metadata quorum has 3 members: v1, v2, and > v3. Initially, v1 is the leader and writes a record at offset 1. After v2 > acknowledges replication of the record, it becomes committed. Suppose that v1 > fails before v3 has a chance to replicate this record. As long as v1 remains > down, the raft protocol guarantees that only v2 can become leader, so the > record cannot be lost. The raft protocol expects that when v1 returns, it > will still have that record, but what if there is a disk failure, the state > cannot be recovered and v1 participates in leader election? Then we would > have committed data on a minority of the voters. The main problem here > concerns how we recover from this impaired state without risking the loss of > this data. > Consider a naive solution which brings v1 back with an empty disk. Since the > node has lost is prior knowledge of the state of the quorum, it will vote for > any candidate that comes along. If v3 becomes a candidate, then it will vote > for itself and it just needs the vote from v1 to become leader. If that > happens, then the committed data on v2 will become lost. > This is just one scenario. In general, the invariants that the raft protocol > is designed to preserve go out the window when disk state is lost. For > example, it is also possible to contrive a scenario where the loss of disk > state leads to multiple leaders. There is a good reason why raft requires > that any vote cast by a voter is written to disk since otherwise the voter > may vote for different candidates in the same epoch. > Many systems solve this problem with a unique identifier which is generated > automatically and stored on disk. This identifier is then committed to the > raft log. If a disk changes, we would see a new identifier and we can prevent > the node from breaking raft invariants. Then recovery from a failed disk > requires a quorum reconfiguration. We need something like this in KRaft to > make disk recovery possible. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14279) Add 3.3.1 to broker/client and stream upgrade/compatibility tests
[ https://issues.apache.org/jira/browse/KAFKA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647804#comment-17647804 ] A. Sophie Blee-Goldman commented on KAFKA-14279: [~jsancio] why is this marked as a blocker for 3.4? Which version is this ticket supposed to cover w.r.t being added to the system tests, 3.3.1 or 3.2.0 or 3.4.0? There seem to be multiple versions at play here > Add 3.3.1 to broker/client and stream upgrade/compatibility tests > - > > Key: KAFKA-14279 > URL: https://issues.apache.org/jira/browse/KAFKA-14279 > Project: Kafka > Issue Type: Task > Components: clients, core, streams, system tests >Reporter: José Armando García Sancio >Assignee: José Armando García Sancio >Priority: Blocker > Fix For: 3.4.0 > > > Per the penultimate bullet on the [release > checklist|https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses], > Kafka v3.2.0 is released. We should add this version to the system tests. > Example PRs: > * Broker and clients: [https://github.com/apache/kafka/pull/6794] > * Streams: [https://github.com/apache/kafka/pull/6597/files] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-12473) Make the "cooperative-sticky, range" as the default assignor
[ https://issues.apache.org/jira/browse/KAFKA-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A. Sophie Blee-Goldman updated KAFKA-12473: --- Fix Version/s: 3.5.0 (was: 3.4.0) > Make the "cooperative-sticky, range" as the default assignor > > > Key: KAFKA-12473 > URL: https://issues.apache.org/jira/browse/KAFKA-12473 > Project: Kafka > Issue Type: Improvement > Components: consumer >Reporter: A. Sophie Blee-Goldman >Assignee: Luke Chen >Priority: Critical > Labels: kip > Fix For: 3.5.0 > > > Now that 3.0 is coming up, we can change the default > ConsumerPartitionAssignor to something better than the RangeAssignor. The > original plan was to switch over to the StickyAssignor, but now that we have > incremental cooperative rebalancing we should consider using the new > CooperativeStickyAssignor instead: this will enable the consumer group to > follow the COOPERATIVE protocol, improving the rebalancing experience OOTB. > KIP: > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177048248 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-12886) Enable request forwarding by default
[ https://issues.apache.org/jira/browse/KAFKA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647800#comment-17647800 ] A. Sophie Blee-Goldman commented on KAFKA-12886: [~hachikuji] [~jagsancio] what's the status of this? Is it still an open blocker for 3.4? > Enable request forwarding by default > > > Key: KAFKA-12886 > URL: https://issues.apache.org/jira/browse/KAFKA-12886 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 3.4.0 > > > KIP-590 documents that request forwarding will be enabled in 3.0 by default: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller. > This makes it a requirement for users with custom principal implementations > to provide a `KafkaPrincipalSerde` implementation. We waited until 3.0 > because we saw this as a compatibility break. > The KIP documents that use of forwarding will be controlled by the IBP. So > once the IBP has been configured to 3.0 or above, then the brokers will begin > forwarding. > (Note that forwarding has always been a requirement for kraft.) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] dengziming commented on pull request #12993: KAFKA-14471: Move IndexEntry and related to storage module
dengziming commented on PR #12993: URL: https://github.com/apache/kafka/pull/12993#issuecomment-1352473472 > @dengziming do you happen to have cycles to review this PR? Sure, I have added it to my review backlog. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] PhantomMaa opened a new pull request, #12997: KAFKA-14492: Extract a method to create LogManager, in order to be overrided by subclass of KafkaServer
PhantomMaa opened a new pull request, #12997: URL: https://github.com/apache/kafka/pull/12997 In our scene, we want to implement a subclass of LogManager, add do some interception like encrypting/auditing. Even more, we want enhance the write log performance in the future. Split alone a create method can provide an extension point to allow user define the action from outside kafka core. Base kafka, entend it, rather than modification. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-14492) Extract a method to create LogManager, in order to be overrided by subclass of KafkaServer
Phantom Ma created KAFKA-14492: -- Summary: Extract a method to create LogManager, in order to be overrided by subclass of KafkaServer Key: KAFKA-14492 URL: https://issues.apache.org/jira/browse/KAFKA-14492 Project: Kafka Issue Type: Improvement Components: core Reporter: Phantom Ma In our scene, we want to implement a subclass of LogManager, add do some interception like encrypting/auditing. Even more, we want enhance the write log performance in the future. Split alone a create method can provide an extension point to allow user define the action from outside kafka core. Base kafka, entend it, rather than modification. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] satishd commented on a diff in pull request #12989: KAFKA-14469: Fix remote tiered related topic configuration
satishd commented on code in PR #12989: URL: https://github.com/apache/kafka/pull/12989#discussion_r1049118373 ## clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java: ## @@ -75,16 +75,16 @@ public class TopicConfig { "\"delete\" retention policy. This represents an SLA on how soon consumers must read " + "their data. If set to -1, no time limit is applied."; -public static final String REMOTE_LOG_STORAGE_ENABLE_CONFIG = "remote.storage.enable"; +public static final String REMOTE_LOG_STORAGE_ENABLE_CONFIG = "remote.log.storage.enable"; Review Comment: Topic configurations do not include "log". That is why we decided not to have for the newly introduced configs. Same for other configs mentioned in this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma opened a new pull request, #12996: KAFKA-14472: Move TransactionIndex and related to storage module
ijuma opened a new pull request, #12996: URL: https://github.com/apache/kafka/pull/12996 Follows #12993 and hence draft until that is merged. For broader context on this change, please check: * KAFKA-14470: Move log layer to storage module ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12989: KAFKA-14469: Fix remote tiered related topic configuration
ijuma commented on PR #12989: URL: https://github.com/apache/kafka/pull/12989#issuecomment-1352436760 Thoughts @satishd? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12995: KAFKA-14396: De-flake memory tests with new TestUtils.restrictMemory
ijuma commented on PR #12995: URL: https://github.com/apache/kafka/pull/12995#issuecomment-1352436236 The challenge is that the JVM continues to be reused for several tests afterwards. Once you get an OOM, it can leave lasting effects. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] gharris1727 commented on pull request #12995: KAFKA-14396: De-flake memory tests with new TestUtils.restrictMemory
gharris1727 commented on PR #12995: URL: https://github.com/apache/kafka/pull/12995#issuecomment-1352397995 Also something I noticed through removing the close() call: none and gzip each pass the test, despite never having close() called. This appears to be because they allocate no or very little memory, and thus don't fail in the restricted memory environment. However, all of the other compression methods seem to naively allocate a full-size buffer, which did cause their tests to fail. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (KAFKA-14395) Add config to configure client supplier for KafkaStreams
[ https://issues.apache.org/jira/browse/KAFKA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Li resolved KAFKA-14395. Resolution: Done > Add config to configure client supplier for KafkaStreams > > > Key: KAFKA-14395 > URL: https://issues.apache.org/jira/browse/KAFKA-14395 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Hao Li >Assignee: Hao Li >Priority: Major > Labels: kip > > For KIP: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-884%3A+Add+config+to+configure+KafkaClientSupplier+in+Kafka+Streams -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] gharris1727 opened a new pull request, #12995: KAFKA-14396: De-flake memory tests with new TestUtils.restrictMemory
gharris1727 opened a new pull request, #12995: URL: https://github.com/apache/kafka/pull/12995 Rather than relying on measuring Runtime::freeMemory() and calling System.gc(), tests which want to assert that there are to excessive allocations should execute in a restricted-memory environment. In order to do this, add a new TestUtil method which intentionally consumes memory in a way that allows tests to assert that the code under test does not over-allocate or leak memory. This is a temporary effect on a single JVM, and is reverted as soon as the test code completes. Apply this fix to MemoryRecordsBuilderTest::testBuffersDereferencedOnClose(), which now fails if the eponymous close() call is omitted. I also plan to apply this fix to ThreadCacheTest#cacheOverheadsSmallValues and ThreadCacheTest#cacheOverheadsLargeValues and will keep this PR in draft until then. Signed-off-by: Greg Harris ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] jsancio opened a new pull request, #12994: KAFKA-14457; Controller metrics should only expose committed data
jsancio opened a new pull request, #12994: URL: https://github.com/apache/kafka/pull/12994 TODO: Write description ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] (KAFKA-14405) Log a warning when users attempt to set a config controlled by Streams
[ https://issues.apache.org/jira/browse/KAFKA-14405 ] RockieRockie deleted comment on KAFKA-14405: -- was (Author: JIRAUSER289157): Hello :) Do you basically mean that the StreamConfig should always log a warning in the constructor that its not mutable through the parameter passed in constructor? ex. Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG, "original"); //builders etc. KafkaStreams streams = new KafkaStreams(builder.build(), props); //During execution of this statement log a warning. props.put(StreamsConfig.APPLICATION_ID_CONFIG, "replacement"); //Here the change won't be reflected in the streams.applicationConfigs > Log a warning when users attempt to set a config controlled by Streams > -- > > Key: KAFKA-14405 > URL: https://issues.apache.org/jira/browse/KAFKA-14405 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: A. Sophie Blee-Goldman >Assignee: Ashmeet Lamba >Priority: Major > Labels: newbie > > Related to https://issues.apache.org/jira/browse/KAFKA-14404 > It's too easy for users to try overriding one of the client configs that > Streams hardcodes, and since we just silently ignore it there's no good way > for them to tell their config is not being used. Sometimes this may be > harmless but in cases like the Producer's partitioner, there could be > important application logic that's never being invoked. > When processing user configs in StreamsConfig, we should check for all these > configs and log a warning when any of them have been set -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14491) Introduce Versioned Key-Value Stores to Kafka Streams
Victoria Xia created KAFKA-14491: Summary: Introduce Versioned Key-Value Stores to Kafka Streams Key: KAFKA-14491 URL: https://issues.apache.org/jira/browse/KAFKA-14491 Project: Kafka Issue Type: Improvement Reporter: Victoria Xia Assignee: Victoria Xia The key-value state stores used by Kafka Streams today maintain only the latest value associated with each key. In order to support applications which require access to older record versions, Kafka Streams should have versioned state stores. Versioned state stores are similar to key-value stores except they can store multiple record versions for a single key. An example use case for versioned key-value stores is in providing proper temporal join semantics for stream-tables joins with regards to out-of-order data. See KIP for more: https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14490) Consider using UncheckdIOException instead of IOException in the log layer
[ https://issues.apache.org/jira/browse/KAFKA-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647727#comment-17647727 ] Ismael Juma commented on KAFKA-14490: - Yes, I had seen the raft ticket saying the same before. :) > Consider using UncheckdIOException instead of IOException in the log layer > -- > > Key: KAFKA-14490 > URL: https://issues.apache.org/jira/browse/KAFKA-14490 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Priority: Major > > IOException is a checked exception, which makes it difficult to use with > lambdas. We should consider using UncheckdIOException instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14490) Consider using UncheckdIOException instead of IOException in the log layer
[ https://issues.apache.org/jira/browse/KAFKA-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647717#comment-17647717 ] José Armando García Sancio commented on KAFKA-14490: I think this is a good idea. [~ijuma] we made that decision in the raft module. In the raft module we convert all IOException to UncheckedIOException as early in the stack as possible. > Consider using UncheckdIOException instead of IOException in the log layer > -- > > Key: KAFKA-14490 > URL: https://issues.apache.org/jira/browse/KAFKA-14490 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Priority: Major > > IOException is a checked exception, which makes it difficult to use with > lambdas. We should consider using UncheckdIOException instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14471) Move IndexEntry and related to storage module
[ https://issues.apache.org/jira/browse/KAFKA-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-14471: Fix Version/s: 3.5.0 > Move IndexEntry and related to storage module > - > > Key: KAFKA-14471 > URL: https://issues.apache.org/jira/browse/KAFKA-14471 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [kafka] ijuma commented on pull request #12993: KAFKA-14471: Move IndexEntry and related to storage module
ijuma commented on PR #12993: URL: https://github.com/apache/kafka/pull/12993#issuecomment-1352261934 @dengziming do you happen to have cycles to review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] sndp2693 commented on pull request #9243: Add ConsumerGroupCommand to delete static members
sndp2693 commented on PR #9243: URL: https://github.com/apache/kafka/pull/9243#issuecomment-1352259137 @dajac Yes I had started that as well. I am not sure if anyone looked at it or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma opened a new pull request, #12993: KAFKA-14471: Move IndexEntry and related to storage module
ijuma opened a new pull request, #12993: URL: https://github.com/apache/kafka/pull/12993 For broader context on this change, please check: * KAFKA-14470: Move log layer to storage module ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-14490) Consider using UncheckdIOException instead of IOException in the log layer
Ismael Juma created KAFKA-14490: --- Summary: Consider using UncheckdIOException instead of IOException in the log layer Key: KAFKA-14490 URL: https://issues.apache.org/jira/browse/KAFKA-14490 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma IOException is a checked exception, which makes it difficult to use with lambdas. We should consider using UncheckdIOException instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14489) Adjust visibility of classes moved to storage module
Ismael Juma created KAFKA-14489: --- Summary: Adjust visibility of classes moved to storage module Key: KAFKA-14489 URL: https://issues.apache.org/jira/browse/KAFKA-14489 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma Once the log layer has been completely migrated to the storage module, we should adjust the visibility of classes that are only used within the log layer to be package private. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14484) Move UnifiedLog to storage module
Ismael Juma created KAFKA-14484: --- Summary: Move UnifiedLog to storage module Key: KAFKA-14484 URL: https://issues.apache.org/jira/browse/KAFKA-14484 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14488) Move log layer tests to storage module
Ismael Juma created KAFKA-14488: --- Summary: Move log layer tests to storage module Key: KAFKA-14488 URL: https://issues.apache.org/jira/browse/KAFKA-14488 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma This should be split into multiple tasks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14487) Move LogManager to storage module
Ismael Juma created KAFKA-14487: --- Summary: Move LogManager to storage module Key: KAFKA-14487 URL: https://issues.apache.org/jira/browse/KAFKA-14487 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14485) Move LogCleaner to storage module
Ismael Juma created KAFKA-14485: --- Summary: Move LogCleaner to storage module Key: KAFKA-14485 URL: https://issues.apache.org/jira/browse/KAFKA-14485 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14486) Move LogCleanerManager to storage module
Ismael Juma created KAFKA-14486: --- Summary: Move LogCleanerManager to storage module Key: KAFKA-14486 URL: https://issues.apache.org/jira/browse/KAFKA-14486 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14482) Move LogLoader to storage module
Ismael Juma created KAFKA-14482: --- Summary: Move LogLoader to storage module Key: KAFKA-14482 URL: https://issues.apache.org/jira/browse/KAFKA-14482 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14483) Move LocalLog to storage module
Ismael Juma created KAFKA-14483: --- Summary: Move LocalLog to storage module Key: KAFKA-14483 URL: https://issues.apache.org/jira/browse/KAFKA-14483 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14481) Move LogSegment/LogSegments to storage module
[ https://issues.apache.org/jira/browse/KAFKA-14481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-14481: Summary: Move LogSegment/LogSegments to storage module (was: Move LogSegment/LogSegments to storage) > Move LogSegment/LogSegments to storage module > - > > Key: KAFKA-14481 > URL: https://issues.apache.org/jira/browse/KAFKA-14481 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14481) Move LogSegment/LogSegments to storage
Ismael Juma created KAFKA-14481: --- Summary: Move LogSegment/LogSegments to storage Key: KAFKA-14481 URL: https://issues.apache.org/jira/browse/KAFKA-14481 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14480) Move ProducerStateManager to storage module
Ismael Juma created KAFKA-14480: --- Summary: Move ProducerStateManager to storage module Key: KAFKA-14480 URL: https://issues.apache.org/jira/browse/KAFKA-14480 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14479) Move CleanerConfig to storage module
[ https://issues.apache.org/jira/browse/KAFKA-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-14479: Summary: Move CleanerConfig to storage module (was: Move CleanerConfig to storage) > Move CleanerConfig to storage module > > > Key: KAFKA-14479 > URL: https://issues.apache.org/jira/browse/KAFKA-14479 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14478) Move LogConfig to storage module
Ismael Juma created KAFKA-14478: --- Summary: Move LogConfig to storage module Key: KAFKA-14478 URL: https://issues.apache.org/jira/browse/KAFKA-14478 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma Assignee: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14479) Move CleanerConfig to storage
Ismael Juma created KAFKA-14479: --- Summary: Move CleanerConfig to storage Key: KAFKA-14479 URL: https://issues.apache.org/jira/browse/KAFKA-14479 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma Assignee: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14477) Move LogValidator to storage module
Ismael Juma created KAFKA-14477: --- Summary: Move LogValidator to storage module Key: KAFKA-14477 URL: https://issues.apache.org/jira/browse/KAFKA-14477 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-14477) Move LogValidator to storage module
[ https://issues.apache.org/jira/browse/KAFKA-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-14477: --- Assignee: Ismael Juma > Move LogValidator to storage module > --- > > Key: KAFKA-14477 > URL: https://issues.apache.org/jira/browse/KAFKA-14477 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14476) Move OffsetMap to storage module
Ismael Juma created KAFKA-14476: --- Summary: Move OffsetMap to storage module Key: KAFKA-14476 URL: https://issues.apache.org/jira/browse/KAFKA-14476 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma Assignee: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14475) Move TimestampIndex/LazyIndex to storage module
Ismael Juma created KAFKA-14475: --- Summary: Move TimestampIndex/LazyIndex to storage module Key: KAFKA-14475 URL: https://issues.apache.org/jira/browse/KAFKA-14475 Project: Kafka Issue Type: Sub-task Reporter: Ismael Juma -- This message was sent by Atlassian Jira (v8.20.10#820010)