[jira] [Created] (KAFKA-15375) When running in KRaft mode, LogManager may create CleanShutdown file by mistake
Vincent Jiang created KAFKA-15375:
-------------------------------------

             Summary: When running in KRaft mode, LogManager may create CleanShutdown file by mistake
                 Key: KAFKA-15375
                 URL: https://issues.apache.org/jira/browse/KAFKA-15375
             Project: Kafka
          Issue Type: Bug
            Reporter: Vincent Jiang


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (KAFKA-14497) LastStableOffset is advanced prematurely when a log is reopened.
Vincent Jiang created KAFKA-14497:
-------------------------------------

             Summary: LastStableOffset is advanced prematurely when a log is reopened.
                 Key: KAFKA-14497
                 URL: https://issues.apache.org/jira/browse/KAFKA-14497
             Project: Kafka
          Issue Type: Bug
            Reporter: Vincent Jiang

In the test case below, the last stable offset of the log is advanced prematurely after a reopen:

1. Producer #1 appends transactional records to the leader. offsets = [0, 1, 2, 3]
2. Producer #2 appends transactional records to the leader. offsets = [4, 5, 6, 7]
3. All records are replicated to the followers and the high watermark is advanced to 8. At this point, lastStableOffset = 0 (the first offset of an open transaction).
4. Producer #1 aborts its transaction by writing an abort marker at offset 8. ProducerStateManager.unreplicatedTxns contains the aborted transaction (firstOffset=0, lastOffset=8).
5. The log is closed and reopened.
6. After the reopen, log.lastStableOffset is initialized to 4. This is because ProducerStateManager.unreplicatedTxns is empty after reopening the log.

We should rebuild ProducerStateManager.unreplicatedTxns when reloading a log, so that lastStableOffset remains unchanged before and after a reopen.
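The effect of losing unreplicatedTxns can be shown with a toy model of the last-stable-offset rule (illustrative only, not Kafka's actual code): the LSO is the smallest first-offset among transactions that are still open or aborted but not yet known replicated, falling back to the high watermark when there are none.

```python
# Toy model of the last-stable-offset rule (illustrative, not Kafka's code):
# the LSO is the smallest first-offset among transactions that are still open
# or aborted-but-not-yet-known-replicated; with none, it is the high watermark.

def last_stable_offset(high_watermark, open_txns, unreplicated_txns):
    candidates = list(open_txns) + list(unreplicated_txns)
    return min(candidates) if candidates else high_watermark

# Before the reopen: producer #2's txn (first offset 4) is open and producer
# #1's aborted txn (first offset 0) is still in unreplicatedTxns -> LSO = 0.
before = last_stable_offset(9, open_txns=[4], unreplicated_txns=[0])

# After the reopen, unreplicatedTxns is empty, so the LSO jumps to 4 early.
after = last_stable_offset(9, open_txns=[4], unreplicated_txns=[])
```

Rebuilding unreplicatedTxns on reload would make the second call see the aborted transaction again and return 0, matching the pre-reopen value.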
[jira] [Created] (KAFKA-14347) deleted records may be kept unexpectedly when leader changes while adding a new replica
Vincent Jiang created KAFKA-14347:
-------------------------------------

             Summary: deleted records may be kept unexpectedly when leader changes while adding a new replica
                 Key: KAFKA-14347
                 URL: https://issues.apache.org/jira/browse/KAFKA-14347
             Project: Kafka
          Issue Type: Improvement
            Reporter: Vincent Jiang

Consider that in a compacted topic, a regular record k1=v1 is deleted by a later tombstone record k1=null, and imagine that log compaction has somehow made different progress on the three replicas r1, r2, and r3:

- on replica r1, log compaction has not cleaned k1=v1 or k1=null yet.
- on replica r2, log compaction has cleaned and removed both k1=v1 and k1=null.

In this case, the following sequence can cause record k1=v1 to be kept unexpectedly:

1. Replica r3 is re-assigned to a different node and starts to replicate data from the leader.
2. At the beginning, r1 is the leader, so r3 replicates record k1=v1 from r1.
3. Before k1=null is replicated from r1, the leader changes to r2.
4. r3 replicates data from r2. Because the k1=null record has been cleaned on r2, it is not replicated. As a result, r3 has record k1=v1 but not k1=null.
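The divergence above can be reproduced with a small simulation (a hypothetical Python model; real replication is offset-based fetching, which this glosses over):

```python
# Hypothetical simulation of the sequence above. A replica's log is a list of
# (key, value) records; value None is a tombstone. Real replication fetches by
# offset from the current leader; this model only tracks which records arrive.

def compact(log, drop_tombstones=False):
    """Keep only the latest record per key; optionally drop tombstones too."""
    latest = {}
    for key, value in log:
        latest[key] = value
    if drop_tombstones:
        latest = {k: v for k, v in latest.items() if v is not None}
    return list(latest.items())

original = [("k1", "v1"), ("k1", None)]
r1 = list(original)                            # compaction has not run on r1
r2 = compact(original, drop_tombstones=True)   # r2 removed k1=v1 AND the tombstone

r3 = [("k1", "v1")]                    # r3 fetched k1=v1 from the old leader r1 ...
r3 += [r for r in r2 if r not in r3]   # ... then catches up from the new leader r2

# r2 no longer contains k1=null, so r3 keeps k1=v1 and never sees the delete.
```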
[jira] [Created] (KAFKA-14151) Add additional validation to protect on-disk log segment data from being corrupted
Vincent Jiang created KAFKA-14151:
-------------------------------------

             Summary: Add additional validation to protect on-disk log segment data from being corrupted
                 Key: KAFKA-14151
                 URL: https://issues.apache.org/jira/browse/KAFKA-14151
             Project: Kafka
          Issue Type: Improvement
          Components: log
            Reporter: Vincent Jiang

We received escalations reporting bad records being written to on-disk log segment data due to environmental issues (a bug in an old JVM version's JIT). We should consider adding additional validation to protect the on-disk data from being corrupted by inadvertent bugs or environmental issues.
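One form such validation could take is re-verifying a batch checksum immediately before the disk write. The sketch below is a hypothetical illustration (append_with_validation is an invented helper, not a Kafka API), not the change the ticket proposes:

```python
import zlib

# Hypothetical helper (not a Kafka API): recompute the batch CRC right before
# the write and refuse the append on mismatch, catching in-memory corruption
# (e.g. from a miscompiling JIT) before it reaches the on-disk segment.

def append_with_validation(segment, batch_bytes, expected_crc):
    if zlib.crc32(batch_bytes) & 0xFFFFFFFF != expected_crc:
        raise IOError("record batch corrupted in memory; refusing to append")
    segment.append(batch_bytes)

segment = []
payload = b"record-batch"
append_with_validation(segment, payload, zlib.crc32(payload) & 0xFFFFFFFF)
```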
[jira] [Created] (KAFKA-14005) LogCleaner doesn't clean log if there is no dirty range
Vincent Jiang created KAFKA-14005:
-------------------------------------

             Summary: LogCleaner doesn't clean log if there is no dirty range
                 Key: KAFKA-14005
                 URL: https://issues.apache.org/jira/browse/KAFKA-14005
             Project: Kafka
          Issue Type: Bug
            Reporter: Vincent Jiang

When there is no dirty range to clean (firstDirtyOffset == firstUncleanableOffset), buildOffsetMap for the dirty range returns an empty offset map with map.latestOffset = -1. The target cleaning offset range then becomes [startOffset, map.latestOffset + 1) = [startOffset, 0), so no segments are cleaned. The correct cleaning offset range should be [startOffset, firstDirtyOffset), so that the log can be cleaned again to remove abort/commit markers and tombstones.

The LogCleanerTest.FakeOffsetMap.clear() method has a bug: it doesn't reset lastOffset. This bug causes test cases like testAbortMarkerRemoval() to pass as false positives.
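The arithmetic can be shown with a toy model of the range computation (illustrative Python, not Kafka's code): the buggy path reproduces the empty [startOffset, 0) range, while the fixed path falls back to firstDirtyOffset when the dirty range is empty.

```python
# Toy model of the cleaning-range computation described above (not Kafka's
# code). Offsets use half-open ranges [start, end).

def buggy_clean_range(start_offset, first_dirty, first_uncleanable):
    # buildOffsetMap over an empty dirty range leaves map.latestOffset = -1,
    # so the upper bound becomes latestOffset + 1 = 0 and nothing is cleaned.
    map_latest_offset = -1 if first_dirty == first_uncleanable else first_uncleanable - 1
    return (start_offset, map_latest_offset + 1)

def fixed_clean_range(start_offset, first_dirty, first_uncleanable):
    # When the dirty range is empty, fall back to firstDirtyOffset so the
    # already-cleaned head can still be re-cleaned to drop markers/tombstones.
    map_latest_offset = -1 if first_dirty == first_uncleanable else first_uncleanable - 1
    end = first_dirty if map_latest_offset == -1 else map_latest_offset + 1
    return (start_offset, end)
```

With startOffset=5 and firstDirtyOffset == firstUncleanableOffset == 20, the buggy range is (5, 0), i.e. empty, while the fixed range is (5, 20); when a real dirty range exists the two agree.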
[jira] [Created] (KAFKA-13717) KafkaConsumer.close throws authorization exception even when commit offsets is empty
Vincent Jiang created KAFKA-13717:
-------------------------------------

             Summary: KafkaConsumer.close throws authorization exception even when commit offsets is empty
                 Key: KAFKA-13717
                 URL: https://issues.apache.org/jira/browse/KAFKA-13717
             Project: Kafka
          Issue Type: Bug
          Components: unit tests
            Reporter: Vincent Jiang

When offsets is empty and the coordinator is unknown, KafkaConsumer.close didn't throw an exception before commit https://github.com/apache/kafka/commit/4b468a9d81f7380f7197a2a6b859c1b4dca84bd9. After this commit, KafkaConsumer.close may throw an authorization exception. The root cause is that the commit changed the logic to call lookupCoordinator even if offsets is empty.

Even if a consumer doesn't have access to a group or a topic, it might be better not to throw an authorization exception in this case, because the close() call doesn't actually access any resource.
[jira] [Created] (KAFKA-13706) org.apache.kafka.test.MockSelector doesn't remove closed connections from its 'ready' field
Vincent Jiang created KAFKA-13706:
-------------------------------------

             Summary: org.apache.kafka.test.MockSelector doesn't remove closed connections from its 'ready' field
                 Key: KAFKA-13706
                 URL: https://issues.apache.org/jira/browse/KAFKA-13706
             Project: Kafka
          Issue Type: Bug
          Components: unit tests
            Reporter: Vincent Jiang

The MockSelector.close(String id) method doesn't remove the closed connection from the "ready" field.
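A minimal Python analogue of the described behavior and fix (the real MockSelector is Java; the names here are illustrative):

```python
# Minimal Python analogue of the described mock (the real class is Java and
# lives in org.apache.kafka.test); names here are illustrative.

class MockSelector:
    def __init__(self):
        self.connected = set()
        self.ready = set()

    def connect(self, node_id):
        self.connected.add(node_id)
        self.ready.add(node_id)

    def close(self, node_id):
        self.connected.discard(node_id)
        self.ready.discard(node_id)   # the missing step: also drop from 'ready'

sel = MockSelector()
sel.connect("node-1")
sel.close("node-1")
```

Without the `ready.discard(...)` line, tests polling the mock would still see the closed connection as ready, which is the bug being reported.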
[jira] [Created] (KAFKA-13461) KafkaController stops functioning as active controller after ZooKeeperClient auth failure
Vincent Jiang created KAFKA-13461:
-------------------------------------

             Summary: KafkaController stops functioning as active controller after ZooKeeperClient auth failure
                 Key: KAFKA-13461
                 URL: https://issues.apache.org/jira/browse/KAFKA-13461
             Project: Kafka
          Issue Type: Bug
          Components: zkclient
            Reporter: Vincent Jiang

When java.security.auth.login.config is present but there is no "Client" section, ZooKeeperSaslClient creation fails and raises a LoginException, resulting in a warning log:

{code:java}
WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '***'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.{code}

When this happens after initial startup, ClientCnxn enqueues an AuthFailed event which triggers the following sequence:

1. zkclient reinitialization is triggered.
2. The controller resigns.
3. Before the controller's ZK session expires, the controller successfully connects to ZK and maintains the current session.
4. In KafkaController.elect(), the controller sets activeControllerId to itself and short-circuits the rest of the election. Since the controller resigned earlier and also skips the call to onControllerFailover(), the controller is not actually functioning as the active controller (e.g. the necessary ZK watchers haven't been registered).
[jira] [Created] (KAFKA-13305) NullPointerException in LogCleanerManager "uncleanable-bytes" gauge
Vincent Jiang created KAFKA-13305:
-------------------------------------

             Summary: NullPointerException in LogCleanerManager "uncleanable-bytes" gauge
                 Key: KAFKA-13305
                 URL: https://issues.apache.org/jira/browse/KAFKA-13305
             Project: Kafka
          Issue Type: Bug
          Components: log cleaner
            Reporter: Vincent Jiang

We've seen the following exception in a production environment:

{quote}
java.lang.NullPointerException: Cannot invoke "kafka.log.UnifiedLog.logStartOffset()" because "log" is null
at kafka.log.LogCleanerManager$.cleanableOffsets(LogCleanerManager.scala:599)
{quote}

It looks like uncleanablePartitions never has partitions removed from it to reflect partition deletion/reassignment. We should fix the NullPointerException and remove deleted partitions from uncleanablePartitions.
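One possible shape for the fix, sketched in Python (the real implementation is Scala; this is an assumption about the approach, not the actual patch): compute the gauge while skipping, and pruning, partitions whose log has already been deleted.

```python
# Illustrative Python sketch (the real code is Scala): compute the
# uncleanable-bytes gauge while guarding against logs that no longer exist.

def uncleanable_bytes(uncleanable_partitions, logs):
    total = 0
    for tp in list(uncleanable_partitions):
        log = logs.get(tp)
        if log is None:                         # partition deleted/reassigned
            uncleanable_partitions.discard(tp)  # prune the stale entry
            continue
        total += log["dirty_bytes"]
    return total

partitions = {"topic-0", "topic-1"}
logs = {"topic-0": {"dirty_bytes": 100}}        # topic-1's log was deleted
```

The guard avoids the NullPointerException, and the pruning keeps uncleanablePartitions from accumulating entries for partitions that are gone.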