[jira] [Updated] (KAFKA-17941) TransactionStateManager.loadTransactionMetadata method may get stuck in an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-17941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang updated KAFKA-17941:
----------------------------------
Description:
When loading transaction metadata from a transaction log partition, if the partition contains a segment ending with an empty batch, the "currOffset" update logic at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L482] will be skipped. Since "currOffset" is not properly advanced to the offset after the last batch, the TransactionStateManager.loadTransactionMetadata method will be stuck in the "while" loop at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L438].

After the change in https://issues.apache.org/jira/browse/KAFKA-17076, there is a higher chance for the compaction process to generate segments ending with an empty batch. As a result, this issue is more likely to be hit now than before the KAFKA-17076 change.

was:
When loading transaction metadata from a transaction log partition, if the partition contains a segment ending with an empty batch, the "currOffset" update logic at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L482] will be skipped. Since "currOffset" is not properly advanced to the offset after the last batch, the TransactionStateManager.loadTransactionMetadata method will be stuck in the "while" loop at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L438].

> TransactionStateManager.loadTransactionMetadata method may get stuck in an infinite loop
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17941
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17941
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vincent Jiang
>            Assignee: Vincent Jiang
>            Priority: Major
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17941) TransactionStateManager.loadTransactionMetadata method may get stuck in an infinite loop
Vincent Jiang created KAFKA-17941:
----------------------------------

Summary: TransactionStateManager.loadTransactionMetadata method may get stuck in an infinite loop
Key: KAFKA-17941
URL: https://issues.apache.org/jira/browse/KAFKA-17941
Project: Kafka
Issue Type: Bug
Reporter: Vincent Jiang

When loading transaction metadata from a transaction log partition, if the partition contains a segment ending with an empty batch, the "currOffset" update logic at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L482] will be skipped. Since "currOffset" is not properly advanced to the offset after the last batch, the TransactionStateManager.loadTransactionMetadata method will be stuck in the "while" loop at [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L438].

--
This message was sent by Atlassian Jira (v8.20.10#820010)
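To illustrate the failure mode described above, here is a minimal, self-contained sketch of a segment-read loop. It uses hypothetical names (Batch, readBatchesFrom, loadAll) and is not the actual TransactionStateManager code; the point is only that the cursor must advance past every batch, including an empty trailing batch, or the outer loop re-reads the same position forever.
{code:scala}
// Hypothetical sketch of the read loop described in KAFKA-17941; not Kafka source.
final case class Batch(baseOffset: Long, lastOffset: Long, records: Seq[String]) {
  def nextOffset: Long = lastOffset + 1
  def isEmpty: Boolean = records.isEmpty
}

def loadAll(readBatchesFrom: Long => Seq[Batch], logEndOffset: Long): Seq[String] = {
  val loaded = Seq.newBuilder[String]
  var currOffset = 0L
  while (currOffset < logEndOffset) {
    val batches = readBatchesFrom(currOffset)
    if (batches.isEmpty) return loaded.result() // nothing more to read
    batches.foreach { batch =>
      // Buggy variant: advance currOffset only while appending loaded records, e.g.
      //   if (!batch.isEmpty) { loaded ++= batch.records; currOffset = batch.nextOffset }
      // If the last batch of a segment is empty (its records were compacted away),
      // currOffset never moves past it and the outer while loop spins forever.
      loaded ++= batch.records
      currOffset = batch.nextOffset // advance even when the batch is empty
    }
  }
  loaded.result()
}
{code}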
[jira] [Assigned] (KAFKA-15375) When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
[ https://issues.apache.org/jira/browse/KAFKA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang reassigned KAFKA-15375:
-------------------------------------

    Assignee: Vincent Jiang

> When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-15375
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15375
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vincent Jiang
>            Assignee: Vincent Jiang
>            Priority: Major
>
> Consider the following sequence when running Kafka in KRaft mode:
> # A partition log "log1" is created under "logDir1", and some records are appended to it.
> # The broker crashes. No clean shutdown file is created in "logDir1".
> # The broker is restarted. BrokerServer.startup is called.
> # On a different thread, LogManager.startup is called by BrokerMetadataPublisher.
> # Before LogManager.startup finishes recovering logs under "logDir1", a fatal exception is thrown in BrokerServer.startup.
> # In the exception handler, BrokerServer.startup calls LogManager.shutdown. As a result, a clean shutdown file is created under "logDir1".
> # The broker is restarted again. Due to the clean shutdown file created in step 6, recovery is skipped for logs under "logDir1", which is not right because "logDir1" was not fully recovered in step 5.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15375) When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
[ https://issues.apache.org/jira/browse/KAFKA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang updated KAFKA-15375:
----------------------------------
Description:
Consider the following sequence when running Kafka in KRaft mode:
# A partition log "log1" is created under "logDir1", and some records are appended to it.
# The broker crashes. No clean shutdown file is created in "logDir1".
# The broker is restarted. BrokerServer.startup is called.
# On a different thread, LogManager.startup is called by BrokerMetadataPublisher.
# Before LogManager.startup finishes recovering logs under "logDir1", a fatal exception is thrown in BrokerServer.startup.
# In the exception handler, BrokerServer.startup calls LogManager.shutdown. As a result, a clean shutdown file is created under "logDir1".
# The broker is restarted again. Due to the clean shutdown file created in step 6, recovery is skipped for logs under "logDir1", which is not right because "logDir1" was not fully recovered in step 5.

was:
Consider the following sequence when running Kafka in KRaft mode:
# A partition log "log1" is created under "logDir1", and some records are appended to it.
# The broker crashes. No clean shutdown file is created in "logDir1".
# The broker is restarted. BrokerServer.startup is called.
# On a different thread, LogManager.startup is called by BrokerMetadataPublisher.
# Before LogManager.startup finishes recovering logs under "logDir1", a fatal exception is thrown in BrokerServer.startup.
# In the exception handler, BrokerServer.startup calls LogManager.shutdown. As a result, a clean shutdown file is created under "logDir1".
# The broker is restarted again. Due to the clean shutdown file created in step 6, recovery is skipped for logs under "logDir1", which is not right because "logDir1" needs recovery.

> When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-15375
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15375
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vincent Jiang
>            Priority: Major
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15375) When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
[ https://issues.apache.org/jira/browse/KAFKA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang updated KAFKA-15375:
----------------------------------
Description:
Consider the following sequence when running Kafka in KRaft mode:
# A partition log "log1" is created under "logDir1", and some records are appended to it.
# The broker crashes. No clean shutdown file is created in "logDir1".
# The broker is restarted. BrokerServer.startup is called.
# On a different thread, LogManager.startup is called by BrokerMetadataPublisher.
# Before LogManager.startup finishes recovering logs under "logDir1", a fatal exception is thrown in BrokerServer.startup.
# In the exception handler, BrokerServer.startup calls LogManager.shutdown. As a result, a clean shutdown file is created under "logDir1".
# The broker is restarted again. Due to the clean shutdown file created in step 6, recovery is skipped for logs under "logDir1", which is not right because "logDir1" needs recovery.

was:
Consider the following sequence when running Kafka in KRaft mode:
# A partition log "log1" is created under "logDir1", and some records are appended to it.
# The broker crashes. No clean shutdown file is created in "logDir1".
# The broker is restarted. BrokerServer.startup is called.
# On a different thread, LogManager.startup is called by BrokerMetadataPublisher.
# Before LogManager.startup finishes recovering logs under "logDir1", a fatal exception is thrown in BrokerServer.startup.
# In the exception handler, BrokerServer.startup calls LogManager.shutdown. As a result, a clean shutdown file is created under "logDir1".
# The broker is restarted again. Due to the clean shutdown file created in step 6, recovery is skipped for logs under "logDir1", which is not right because "logDir1" needs recovery.

> When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-15375
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15375
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vincent Jiang
>            Priority: Major
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15375) When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
Vincent Jiang created KAFKA-15375:
----------------------------------

Summary: When running in KRaft mode, LogManager may create a CleanShutdown file by mistake
Key: KAFKA-15375
URL: https://issues.apache.org/jira/browse/KAFKA-15375
Project: Kafka
Issue Type: Bug
Reporter: Vincent Jiang

--
This message was sent by Atlassian Jira (v8.20.10#820010)
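A sketch of the kind of guard that would avoid the sequence above: write the clean-shutdown marker only if startup (including log recovery) actually completed. This is a hypothetical illustration with placeholder names, not the actual LogManager code, and the marker file name is an assumption.
{code:scala}
import java.nio.file.{Files, Path}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch, not the actual LogManager: write the clean-shutdown
// marker only when startup/recovery for this log dir has completed.
class LogDirShutdownMarker(logDir: Path) {
  private val startupComplete = new AtomicBoolean(false)

  def markStartupComplete(): Unit = startupComplete.set(true)

  def shutdown(): Unit = {
    // In the sequence above, shutdown() is reached from BrokerServer.startup's
    // error handler before recovery finished; with this guard the marker is not
    // written, so the next restart still recovers the logs under this dir.
    if (startupComplete.get()) {
      Files.createFile(logDir.resolve(".kafka_cleanshutdown")) // assumed marker file name
    }
  }
}
{code}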
[jira] [Created] (KAFKA-14497) LastStableOffset is advanced prematurely when a log is reopened.
Vincent Jiang created KAFKA-14497:
----------------------------------

Summary: LastStableOffset is advanced prematurely when a log is reopened.
Key: KAFKA-14497
URL: https://issues.apache.org/jira/browse/KAFKA-14497
Project: Kafka
Issue Type: Bug
Reporter: Vincent Jiang

In the test case below, the last stable offset of the log is advanced prematurely after a reopen:
# Producer #1 appends transactional records to the leader. offsets = [0, 1, 2, 3]
# Producer #2 appends transactional records to the leader. offsets = [4, 5, 6, 7]
# All records are replicated to followers and the high watermark advances to 8.
# At this point, lastStableOffset = 0 (the first offset of an open transaction).
# Producer #1 aborts its transaction by writing an abort marker at offset 8. ProducerStateManager.unreplicatedTxns contains the aborted transaction (firstOffset=0, lastOffset=8).
# Then the log is closed and reopened.
# After the reopen, log.lastStableOffset is initialized to 4. This is because ProducerStateManager.unreplicatedTxns is empty after reopening the log.

We should rebuild ProducerStateManager.unreplicatedTxns when reloading a log, so that lastStableOffset remains unchanged across a reopen.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
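The offsets in the scenario can be reproduced with a small sketch of the last-stable-offset arithmetic, assuming a simplified model in which the LSO is the first offset of the earliest open or not-yet-replicated transaction, otherwise the high watermark. The names below (TxnRange, lastStableOffset) are made up for illustration, not Kafka code.
{code:scala}
// Hypothetical model of the offsets in the scenario above; not Kafka code.
final case class TxnRange(firstOffset: Long, lastOffset: Long)

def lastStableOffset(highWatermark: Long,
                     openTxns: List[TxnRange],
                     unreplicatedTxns: List[TxnRange]): Long = {
  // LSO = first offset of the earliest transaction that is still open or whose
  // marker has not yet been replicated; otherwise the high watermark.
  val pending = (openTxns ++ unreplicatedTxns).map(_.firstOffset)
  if (pending.isEmpty) highWatermark else math.min(pending.min, highWatermark)
}

// Before the reopen: producer #1's aborted txn [0, 8] is still in unreplicatedTxns,
// so the LSO stays at 0 even though producer #2's open txn starts at 4.
assert(lastStableOffset(8L, List(TxnRange(4, 7)), List(TxnRange(0, 8))) == 0L)

// After the reopen, unreplicatedTxns is empty and the LSO jumps to 4 prematurely.
assert(lastStableOffset(8L, List(TxnRange(4, 7)), Nil) == 4L)
{code}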
[jira] [Commented] (KAFKA-14347) deleted records may be kept unexpectedly when leader changes while adding a new replica
[ https://issues.apache.org/jira/browse/KAFKA-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627365#comment-17627365 ]

Vincent Jiang commented on KAFKA-14347:
---------------------------------------

Related issue: https://issues.apache.org/jira/browse/KAFKA-4546

> deleted records may be kept unexpectedly when leader changes while adding a new replica
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14347
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14347
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Vincent Jiang
>            Priority: Major
>
> Consider that in a compacted topic, a regular record _k1=v1_ is deleted by a later tombstone record _k1=null_. And imagine that log compaction is making different progress on the three replicas, _r1_, _r2_ and _r3_:
> - on replica _r1_, log compaction has not cleaned _k1=v1_ or _k1=null_ yet.
> - on replica _r2_, log compaction cleaned and removed both _k1=v1_ and _k1=null_.
> In this case, the following sequence can cause record _k1=v1_ to be kept unexpectedly:
> 1. Replica _r3_ is re-assigned to a different node and starts to replicate data from the leader.
> 2. At the beginning, _r1_ is the leader, so _r3_ replicates record _k1=v1_ from _r1_.
> 3. Before _k1=null_ is replicated from _r1_, the leader changes to _r2_.
> 4. _r3_ replicates data from _r2_. Because the _k1=null_ record has been cleaned in _r2_, it will not be replicated.
> As a result, _r3_ has record _k1=v1_ but not _k1=null_.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14347) deleted records may be kept unexpectedly when leader changes while adding a new replica
Vincent Jiang created KAFKA-14347:
----------------------------------

Summary: deleted records may be kept unexpectedly when leader changes while adding a new replica
Key: KAFKA-14347
URL: https://issues.apache.org/jira/browse/KAFKA-14347
Project: Kafka
Issue Type: Improvement
Reporter: Vincent Jiang

Consider that in a compacted topic, a regular record _k1=v1_ is deleted by a later tombstone record _k1=null_. And imagine that log compaction is making different progress on the three replicas, _r1_, _r2_ and _r3_:
- on replica _r1_, log compaction has not cleaned _k1=v1_ or _k1=null_ yet.
- on replica _r2_, log compaction cleaned and removed both _k1=v1_ and _k1=null_.

In this case, the following sequence can cause record _k1=v1_ to be kept unexpectedly:
1. Replica _r3_ is re-assigned to a different node and starts to replicate data from the leader.
2. At the beginning, _r1_ is the leader, so _r3_ replicates record _k1=v1_ from _r1_.
3. Before _k1=null_ is replicated from _r1_, the leader changes to _r2_.
4. _r3_ replicates data from _r2_. Because the _k1=null_ record has been cleaned in _r2_, it will not be replicated.

As a result, _r3_ has record _k1=v1_ but not _k1=null_.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
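The divergence can be seen in a toy simulation of the scenario, where each replica's log is just a vector of records and the new follower copies records from whichever replica is currently the leader. Everything here (Rec, fetch, the offsets 10 and 20) is made up for illustration; this is not Kafka's replication code.
{code:scala}
// Toy model of the scenario above; not Kafka code.
final case class Rec(offset: Long, key: String, value: Option[String]) // value = None is a tombstone

val r1 = Vector(Rec(10L, "k1", Some("v1")), Rec(20L, "k1", None)) // compaction has not run on r1 yet
val r2 = Vector.empty[Rec]                                        // compaction removed both records on r2

def fetch(leader: Vector[Rec], fromOffset: Long): Vector[Rec] =
  leader.filter(_.offset >= fromOffset)

// Step 2: r1 is the leader; r3 replicates k1=v1 but not yet the tombstone.
var r3 = fetch(r1, 0L).filter(_.offset < 20L)
// Steps 3-4: leadership moves to r2 before the tombstone is replicated; the
// tombstone was already compacted away on r2, so it is never fetched.
r3 = r3 ++ fetch(r2, r3.lastOption.map(_.offset + 1).getOrElse(0L))

// r3 ends up holding k1=v1 with no k1=null, resurrecting the deleted record.
assert(r3 == Vector(Rec(10L, "k1", Some("v1"))))
{code}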
[jira] [Commented] (KAFKA-14096) Race Condition in Log Rolling Leading to Disk Failure
[ https://issues.apache.org/jira/browse/KAFKA-14096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579155#comment-17579155 ]

Vincent Jiang commented on KAFKA-14096:
---------------------------------------

I have two questions about this issue:

1. At 2022-07-18 18:24:47,782, logging shows segment 141935201 was scheduled to be deleted, which means the segment should have been removed from Log.segments. Then at 2022-07-18 18:24:48,024, how did the Log.flush method still see the segment?

2. By default, there is a 60-second delay after a segment is scheduled for deletion, so segment deletion should not have happened yet at 2022-07-18 18:24:48,024. If so, why did Log.flush see a ClosedChannelException? I assume renaming a file should not cause a ClosedChannelException.

[~eazama], to better understand the issue, could you share the full broker log?

> Race Condition in Log Rolling Leading to Disk Failure
> ------------------------------------------------------
>
>                 Key: KAFKA-14096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14096
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Eric Azama
>            Priority: Major
>
> We've recently encountered what appears to be a race condition that can lead to a disk being marked offline. One of our brokers recently crashed because its log directory failed. We found the following in the server.log file
> {code:java}
> [2022-07-18 18:24:42,940] INFO [Log partition=TOPIC-REDACTED-15, dir=/data1/kafka-logs] Rolled new log segment at offset 141946850 in 37 ms. (kafka.log.Log)
> [...]
> [2022-07-18 18:24:47,782] INFO [Log partition=TOPIC-REDACTED-15, dir=/data1/kafka-logs] Scheduling segments for deletion List(LogSegment(baseOffset=141935201, size=1073560219, lastModifiedTime=1658168598869, largestTime=1658168495678)) (kafka.log.Log)
> [2022-07-18 18:24:48,024] ERROR Error while flushing log for TOPIC-REDACTED-15 in dir /data1/kafka-logs with offset 141935201 (kafka.server.LogDirFailureChannel)
> java.nio.channels.ClosedChannelException
> at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
> at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:452)
> at org.apache.kafka.common.record.FileRecords.flush(FileRecords.java:176)
> at kafka.log.LogSegment.$anonfun$flush$1(LogSegment.scala:472)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> at kafka.log.LogSegment.flush(LogSegment.scala:471)
> at kafka.log.Log.$anonfun$flush$4(Log.scala:1956)
> at kafka.log.Log.$anonfun$flush$4$adapted(Log.scala:1955)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at kafka.log.Log.$anonfun$flush$2(Log.scala:1955)
> at kafka.log.Log.flush(Log.scala:2322)
> at kafka.log.Log.$anonfun$roll$9(Log.scala:1925)
> at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> [2022-07-18 18:24:48,036] ERROR Uncaught exception in scheduled task 'flush-log' (kafka.utils.KafkaScheduler)
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing log for TOPIC-REDACTED-15 in dir /data1/kafka-logs with offset 141935201{code}
> and the following in the log-cleaner.log file
> {code:java}
> [2022-07-18 18:24:47,062] INFO Cleaner 0: Cleaning LogSegment(baseOffset=141935201, size=1073560219, lastModifiedTime=1658168598869, largestTime=1658168495678) in log TOPIC-REDACTED-15 into 141935201 with deletion horizon 1658082163480, retaining deletes. (kafka.log.LogCleaner) {code}
> The timing of the log-cleaner log shows that the log flush failed because the log segment had been cleaned and the underlying file was already renamed or deleted.
> The stacktrace indicates that the log flush that triggered the exception was part of the process of rolling a new log segment. (at kafka.log.Log.$anonfun$roll$9([Log.scala:1925|https://github.com/apache/kafka/blob/2.5.1/core/src/main/scala/kafka/log/Log.scala#L1925]))
[jira] [Updated] (KAFKA-14151) Add additional validation to protect on-disk log segment data from being corrupted
[ https://issues.apache.org/jira/browse/KAFKA-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang updated KAFKA-14151:
----------------------------------
    Priority: Major  (was: Minor)

> Add additional validation to protect on-disk log segment data from being corrupted
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-14151
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14151
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>            Reporter: Vincent Jiang
>            Priority: Major
>
> We received escalations reporting bad records being written to on-disk log segment data due to environmental issues (a bug in an old JVM JIT version). We should consider adding additional validation to protect the on-disk data from being corrupted by inadvertent bugs or environmental issues.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14151) Add additional validation to protect on-disk log segment data from being corrupted
Vincent Jiang created KAFKA-14151:
----------------------------------

Summary: Add additional validation to protect on-disk log segment data from being corrupted
Key: KAFKA-14151
URL: https://issues.apache.org/jira/browse/KAFKA-14151
Project: Kafka
Issue Type: Improvement
Components: log
Reporter: Vincent Jiang

We received escalations reporting bad records being written to on-disk log segment data due to environmental issues (a bug in an old JVM JIT version). We should consider adding additional validation to protect the on-disk data from being corrupted by inadvertent bugs or environmental issues.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14151) Add additional validation to protect on-disk log segment data from being corrupted
[ https://issues.apache.org/jira/browse/KAFKA-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Jiang updated KAFKA-14151:
----------------------------------
    Priority: Minor  (was: Major)

> Add additional validation to protect on-disk log segment data from being corrupted
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-14151
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14151
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>            Reporter: Vincent Jiang
>            Priority: Minor
>
> We received escalations reporting bad records being written to on-disk log segment data due to environmental issues (a bug in an old JVM JIT version). We should consider adding additional validation to protect the on-disk data from being corrupted by inadvertent bugs or environmental issues.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14005) LogCleaner doesn't clean log if there is no dirty range
Vincent Jiang created KAFKA-14005:
----------------------------------

Summary: LogCleaner doesn't clean log if there is no dirty range
Key: KAFKA-14005
URL: https://issues.apache.org/jira/browse/KAFKA-14005
Project: Kafka
Issue Type: Bug
Reporter: Vincent Jiang

When there is no dirty range to clean (firstDirtyOffset == firstUncleanableOffset), buildOffsetMap for the dirty range returns an empty offset map, with map.latestOffset = -1. The target cleaning offset range then becomes [startOffset, map.latestOffset + 1) = [startOffset, 0), so no segments are cleaned. The correct cleaning offset range should be [startOffset, firstDirtyOffset), so that the log can be cleaned again to remove abort/commit markers or tombstones.

The LogCleanerTest.FakeOffsetMap.clear() method has a bug: it doesn't reset lastOffset. This bug causes test cases like testAbortMarkerRemoval() to pass false-positively.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
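The range arithmetic in the description can be checked with a small sketch. The names here (OffsetMapResult, buggyCleanUpTo, fixedCleanUpTo) are placeholders rather than the actual LogCleaner code, and the "fixed" variant is only an illustration of falling back to firstDirtyOffset when the offset map is empty.
{code:scala}
// Hypothetical sketch of the offset-range computation described above; not the
// actual LogCleaner code.
final case class OffsetMapResult(latestOffset: Long) // -1 when no dirty records were mapped

def buggyCleanUpTo(map: OffsetMapResult): Long =
  map.latestOffset + 1 // empty map => 0, so the range [startOffset, 0) cleans nothing

def fixedCleanUpTo(firstDirtyOffset: Long, map: OffsetMapResult): Long =
  if (map.latestOffset < 0) firstDirtyOffset else map.latestOffset + 1

// With startOffset = 0 and firstDirtyOffset == firstUncleanableOffset == 100:
assert(buggyCleanUpTo(OffsetMapResult(-1L)) == 0L)         // [0, 0): no segments cleaned
assert(fixedCleanUpTo(100L, OffsetMapResult(-1L)) == 100L) // [0, 100): markers/tombstones can still be removed
{code}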
[jira] [Created] (KAFKA-13717) KafkaConsumer.close throws authorization exception even when commit offsets are empty
Vincent Jiang created KAFKA-13717:
----------------------------------

Summary: KafkaConsumer.close throws authorization exception even when commit offsets are empty
Key: KAFKA-13717
URL: https://issues.apache.org/jira/browse/KAFKA-13717
Project: Kafka
Issue Type: Bug
Components: unit tests
Reporter: Vincent Jiang

Before commit https://github.com/apache/kafka/commit/4b468a9d81f7380f7197a2a6b859c1b4dca84bd9, KafkaConsumer.close did not throw an exception when the offsets to commit were empty and the coordinator was unknown. After this commit, KafkaConsumer.close may throw an authorization exception. The root cause is that the commit changed the logic to call lookupCoordinator even if offsets are empty.

Even if a consumer doesn't have access to a group or a topic, it would be better not to throw an authorization exception in this case, because the close() call doesn't actually access any resource.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
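A sketch of the ordering the report argues for: check for an empty offset set before looking up the group coordinator, so that a consumer without group or topic ACLs can still close cleanly. The names below are placeholders, not the actual KafkaConsumer/ConsumerCoordinator API.
{code:scala}
// Hypothetical sketch; not the actual ConsumerCoordinator logic.
final case class TopicPartition(topic: String, partition: Int)

def maybeCommitOnClose(offsets: Map[TopicPartition, Long],
                       lookupCoordinator: () => Either[Throwable, String],
                       commit: (String, Map[TopicPartition, Long]) => Unit): Unit = {
  // Guard first: with nothing to commit there is no reason to find the group
  // coordinator, so no group authorization check is triggered on close().
  if (offsets.isEmpty) return
  lookupCoordinator() match {
    case Right(coordinator) => commit(coordinator, offsets)
    case Left(error)        => throw error // e.g. a group authorization failure
  }
}
{code}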
[jira] [Created] (KAFKA-13706) org.apache.kafka.test.MockSelector doesn't remove closed connections from its 'ready' field
Vincent Jiang created KAFKA-13706:
----------------------------------

Summary: org.apache.kafka.test.MockSelector doesn't remove closed connections from its 'ready' field
Key: KAFKA-13706
URL: https://issues.apache.org/jira/browse/KAFKA-13706
Project: Kafka
Issue Type: Bug
Components: unit tests
Reporter: Vincent Jiang

The MockSelector.close(String id) method doesn't remove the closed connection from the "ready" field.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
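A simplified stand-in showing the bookkeeping fix the report implies: close(id) should drop the connection from the ready set as well as from the connected set. The fields here are placeholders; the real MockSelector has a different structure.
{code:scala}
import scala.collection.mutable

// Simplified stand-in for the test selector's bookkeeping; not the actual
// org.apache.kafka.test.MockSelector implementation.
class FakeSelector {
  private val connected = mutable.Set.empty[String]
  private val ready = mutable.Set.empty[String] // connections reported as ready for I/O

  def connect(id: String): Unit = { connected += id; ready += id }

  def close(id: String): Unit = {
    connected -= id
    ready -= id // the missing step: a closed connection must not stay "ready"
  }

  def isReady(id: String): Boolean = ready.contains(id)
}
{code}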
[jira] [Created] (KAFKA-13461) KafkaController stops functioning as active controller after ZooKeeperClient auth failure
Vincent Jiang created KAFKA-13461:
----------------------------------

Summary: KafkaController stops functioning as active controller after ZooKeeperClient auth failure
Key: KAFKA-13461
URL: https://issues.apache.org/jira/browse/KAFKA-13461
Project: Kafka
Issue Type: Bug
Components: zkclient
Reporter: Vincent Jiang

When java.security.auth.login.config is present but there is no "Client" section, ZooKeeperSaslClient creation fails and raises a LoginException, resulting in the following warning log:
{code:java}
WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '***'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.{code}
When this happens after initial startup, ClientCnxn enqueues an AuthFailed event, which triggers the following sequence:
# zkclient reinitialization is triggered.
# The controller resigns.
# Before the controller's ZK session expires, the controller successfully connects to ZK and maintains the current session.
# In KafkaController.elect(), the controller sets activeControllerId to itself and short-circuits the rest of elect(). Since the controller resigned earlier and also skips the call to onControllerFailover(), the controller is not actually functioning as the active controller (e.g. the necessary ZK watchers haven't been registered).

--
This message was sent by Atlassian Jira (v8.20.1#820001)
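The short-circuit in step 4 can be sketched as follows, with hypothetical names and structure (this is not the actual KafkaController code): if the node reads a registered active controller id, including its own, elect() returns early, so a controller that resigned without losing its ZK session never re-runs onControllerFailover().
{code:scala}
// Hypothetical, simplified sketch of the short-circuit described above; not
// the actual KafkaController implementation.
class ControllerSketch(readActiveControllerId: () => Int) {
  private var activeControllerId: Int = -1
  private var failoverDone: Boolean = false // stands in for watcher registration etc.

  def onControllerFailover(): Unit = { failoverDone = true }

  def resign(): Unit = { failoverDone = false }

  def elect(): Unit = {
    activeControllerId = readActiveControllerId()
    if (activeControllerId != -1) {
      // Short-circuit: some controller is already registered. In the scenario
      // above it is this broker itself (its ZK session never expired), but since
      // it resigned earlier, onControllerFailover() is never run again and the
      // broker is left as a non-functional "active" controller.
      return
    }
    // ...otherwise register as controller and take over.
    onControllerFailover()
  }
}
{code}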
[jira] [Created] (KAFKA-13305) NullPointerException in LogCleanerManager "uncleanable-bytes" gauge
Vincent Jiang created KAFKA-13305:
----------------------------------

Summary: NullPointerException in LogCleanerManager "uncleanable-bytes" gauge
Key: KAFKA-13305
URL: https://issues.apache.org/jira/browse/KAFKA-13305
Project: Kafka
Issue Type: Bug
Components: log cleaner
Reporter: Vincent Jiang

We've seen the following exception in a production environment:
{quote}
java.lang.NullPointerException: Cannot invoke "kafka.log.UnifiedLog.logStartOffset()" because "log" is null
at kafka.log.LogCleanerManager$.cleanableOffsets(LogCleanerManager.scala:599)
{quote}
It looks like uncleanablePartitions never has partitions removed from it to reflect partition deletion or reassignment. We should fix the NullPointerException and remove deleted partitions from uncleanablePartitions.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
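A sketch of a defensive version of the gauge computation, assuming made-up names (LogStub, uncleanableBytes) and a simplified size calculation: skip partitions whose log has been deleted or reassigned instead of dereferencing a missing log.
{code:scala}
// Hypothetical sketch; not the actual LogCleanerManager code.
final case class TopicPartition(topic: String, partition: Int)
final case class LogStub(logStartOffset: Long, sizeInBytes: Long)

def uncleanableBytes(uncleanablePartitions: Set[TopicPartition],
                     currentLogs: Map[TopicPartition, LogStub]): Long =
  uncleanablePartitions.iterator.flatMap { tp =>
    currentLogs.get(tp) match {
      case Some(log) => Some(log.sizeInBytes) // simplified: the real gauge sums the uncleanable range
      case None      => None // partition deleted or reassigned: skip instead of hitting a null log
    }
  }.sum
{code}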