[jira] [Commented] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377956#comment-17377956 ] Wenbing Shen commented on KAFKA-12493: -- [~kkonstantine] This issue does not block 3.0.0; please go ahead. Thanks.

> The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
> -
>
> Key: KAFKA-12493
> URL: https://issues.apache.org/jira/browse/KAFKA-12493
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Major
> Fix For: 3.0.0
>
> This question can be linked to this email: [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>
> This problem is 100% reproducible.
> Problem description:
> In a customer's production environment, code written by another department regenerated the replica assignment of existing partitions and wrote it to ZooKeeper. When handling the resulting partition-modification event, the controller only processed the newly added partitions: it ran the new partitions and replicas through the partition and replica state machines and issued LeaderAndIsr and other control requests.
> However, the controller did not verify whether the assignment of the existing partition replicas recorded in the controllerContext still matched the assignment stored on the znode in ZooKeeper. This appears harmless at first, but when the brokers later have to be restarted for some reason, such as a configuration update or an upgrade, the affected topics break: the controller cannot complete the election of a new leader, and the original leader no longer matches the replica assignment currently stored in ZooKeeper. The real-time business in the customer's on-site environment was interrupted and some data was lost.
> This problem can be stably reproduced as follows: adding partitions to, or modifying the replicas of, an existing topic with the code below reassigns the replicas of the existing partitions and writes the result to ZooKeeper. Because the controller does not process this event correctly, restarting the brokers that host the topic leaves the topic unable to be produced to or consumed from.
>
> {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT, CONNECTION_TIMEOUT,
>             JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0 && kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the original broker metadata
>             Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
>                     RackAwareMode.Enforced$.MODULE$, Option.apply(null));
>             // Generate a new partition replica assignment plan
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // number of partitions
>                             kafkaTopicVO.getReplicationNum(), // replicas per partition
>                             -1,
>                             -1);
>             // Write the modified partition replica assignment to ZooKeeper
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
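The reassignment code above bypasses the controller's bookkeeping. As a rough illustration of the consistency check the report argues the controller should perform, the sketch below compares a cached assignment (standing in for the controllerContext) against the assignment read back from ZooKeeper and reports every existing partition whose replica list has drifted. All class and method names here are hypothetical; this is not actual controller code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: detect which existing partitions have a different
// replica list in ZooKeeper than the one cached by the controller.
public class AssignmentDrift {

    /**
     * Returns the ids of partitions known to both views whose replica lists
     * differ. Brand-new partitions (present only in ZooKeeper) are ignored,
     * since the existing new-partition code path already handles them.
     */
    static Set<Integer> changedPartitions(Map<Integer, List<Integer>> controllerContext,
                                          Map<Integer, List<Integer>> zkAssignment) {
        Set<Integer> changed = new HashSet<>();
        for (Map.Entry<Integer, List<Integer>> e : controllerContext.entrySet()) {
            List<Integer> inZk = zkAssignment.get(e.getKey());
            if (inZk != null && !inZk.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

On seeing a non-empty result, a controller could refresh its cached assignment (and re-issue LeaderAndIsr requests) instead of assuming only new partitions appeared.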
[jira] [Created] (KAFKA-12891) Add --files and --file-separator options to the ConsoleProducer
Wenbing Shen created KAFKA-12891: Summary: Add --files and --file-separator options to the ConsoleProducer Key: KAFKA-12891 URL: https://issues.apache.org/jira/browse/KAFKA-12891 Project: Kafka Issue Type: New Feature Components: tools Reporter: Wenbing Shen Assignee: Wenbing Shen Introduce a *--files* option to the producer command line tool to support reading data from multiple files; the file names are separated by *--file-separator*, which defaults to a *comma*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
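The option names (--files, --file-separator) come from the ticket, but ConsoleProducer itself is unchanged here; the sketch below is only an assumed illustration of how the file-list argument could be split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch (not ConsoleProducer code): split the value of a
// hypothetical --files argument into individual file names using the
// separator given by --file-separator (default ",").
public class FileListOption {
    static final String DEFAULT_SEPARATOR = ",";

    static List<String> parseFiles(String filesArg, String separator) {
        List<String> files = new ArrayList<>();
        // Pattern.quote() so separators like "." or "|" are taken literally.
        for (String f : filesArg.split(Pattern.quote(separator))) {
            if (!f.trim().isEmpty()) {
                files.add(f.trim());
            }
        }
        return files;
    }
}
```

The producer would then read each resolved file in order and send its lines as records, exactly as it does today for a single piped file.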
[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352242#comment-17352242 ] Wenbing Shen commented on KAFKA-10886: -- I now understand why you could not use this patch: the patch I submitted is based on our company's internal Kafka, which differs somewhat from the community's 2.0.0 code. I am sorry; this is the first patch I have uploaded and I was not careful enough at the time. When I have time, I will port the Windows fix onto the community code and upload an updated patch. If you need to fix the problem urgently, you can download the patch, adapt it, and apply it to your Kafka code manually.

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.0.0
> Environment: Windows Server
> Reporter: Wenbing Shen
> Priority: Critical
> Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
> I tried using the [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix the Kafka problem in the Windows environment, but it didn't seem to work. The symptoms include: restarting the Kafka service causes data to be deleted by mistake, and deleting a topic or migrating a partition causes a disk to go offline or the broker to crash.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
> java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>     at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>     at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>     at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>     at java.nio.file.Files.move(Unknown Source)
>     at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
>     at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689)
>     at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>     at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>     at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
>     at kafka.log.Log.renameDir(Log.scala:687)
>     at kafka.log.LogManager.asyncDelete(LogManager.scala:833)
>     at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267)
>     at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256)
>     at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264)
>     at kafka.cluster.Partition.delete(Partition.scala:262)
>     at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341)
>     at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>     at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369)
>     at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73)
>     at java.lang.Thread.run(Unknown Source)
>     Suppressed: java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>         at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>         at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>         at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>         at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>         at java.nio.file.Files.move(Unknown Source)
>         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
>         ... 23
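The trace fails inside Utils.atomicMoveWithFallback while renaming the partition directory for deletion; on Windows the rename is denied because the segment's files are still open or memory-mapped. As a minimal sketch of the fallback pattern that method uses (an assumption based on its name and the trace, not the actual Kafka source), the rename can be attempted atomically first and then retried non-atomically:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of an "atomic move with fallback": try an atomic rename first, and
// if the filesystem does not support it, fall back to a plain replacing move.
// Note this does not by itself fix the Windows AccessDeniedException above:
// there the source is still mapped/open, so the handles must be closed (or
// the maps unmapped) before any rename can succeed.
public class AtomicMove {
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Non-atomic fallback: a crash mid-move can leave a partial state.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```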
[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12734: - Affects Version/s: (was: 2.8.0) 2.3.0

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png
>
> This issue is similar to KAFKA-9156.
> We introduced LazyIndex, which lets Kafka skip checking the index files of all log segments at startup; this greatly improved our startup speed.
> Unfortunately, it also skips the index check for the active segment, even though the active segment still receives write requests from clients and from the replica synchronization threads.
> There is a case where the index check is skipped for every segment, no unflushed log segment needs recovery, and the index file of the active segment is corrupted: the next append to the active segment then fails.
> Below is the problem I encountered in production: while Kafka was loading log segments, the program log showed that the memory-map position of the time index and offset index files was near the end of the file, even though the files did not actually contain that many index entries. I suspect this can happen during the startup process: if the Kafka process is stopped before startup has completed and is then started again, the limit of the index file's memory map may remain at the preallocated maximum file size instead of being trimmed to the size actually used, so the map position is set to limit when Kafka starts.
> Appending data to the active segment then causes a NIO BufferOverflowException.
> I agree with skipping the index check for all inactive segments, because they will no longer receive write requests, but for the active segment we still need to check the index files.
> Another case: even after a clean shutdown, some factor can leave the active segment's index file with its memory-map position set to limit, which also results in a BufferOverflowException on write.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
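The failure mode described above can be demonstrated with plain NIO, independent of Kafka's index code: when a memory map's position is advanced all the way to its limit (as with a preallocated index file mapped at full size), the next append throws BufferOverflowException. The file and sizes below are illustrative.

```java
import java.io.RandomAccessFile;
import java.nio.BufferOverflowException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Demonstration of the NIO behaviour behind the reported crash: a mapped
// buffer whose position equals its limit has no room for further writes.
public class MmapOverflow {
    /** Maps the file at mapSize bytes, moves position to limit, and tries one append. */
    static boolean appendOverflows(Path file, int mapSize) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            MappedByteBuffer mmap =
                    raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, mapSize);
            mmap.position(mmap.limit()); // position == limit: no space left
            try {
                mmap.putInt(42);         // one "index entry" append
                return false;
            } catch (BufferOverflowException e) {
                return true;             // the crash described in the report
            }
        }
    }
}
```

This is why trimming the map (or re-running the sanity check) for the active segment matters: only a map whose position reflects the bytes actually used can accept further entries.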
[jira] [Commented] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340247#comment-17340247 ] Wenbing Shen commented on KAFKA-12734: -- [~ecastilla] LazyIndex was introduced in version 2.3.0, and this problem has existed ever since. The concurrency problem was fixed in [KAFKA-9156|https://issues.apache.org/jira/browse/KAFKA-9156] (2.3.2, 2.4.0), and the problem described here was fixed in [KAFKA-10471|https://issues.apache.org/jira/browse/KAFKA-10471] (2.8.0). If you want to fix the problem described here, I think you need to backport the fix from 2.8.0.

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12734. -- Resolution: Duplicate Duplicate of KAFKA-10471; this problem was fixed in version 2.8.

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337435#comment-17337435 ] Wenbing Shen commented on KAFKA-12734: -- This is a problem we often encountered before applying LazyIndex; the sanity check would rebuild the index for us. After applying LazyIndex, the checks for all index files are skipped, so a corrupted index file in the active segment causes a BufferOverflowException when data is appended to the log. !image-2021-04-30-22-49-24-202.png!

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12734: - Attachment: image-2021-04-30-22-49-24-202.png

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12734: - Description: This issue is similar to KAFKA-9156. We introduced LazyIndex, which lets Kafka skip checking the index files of all log segments at startup; this greatly improved our startup speed. Unfortunately, it also skips the index check for the active segment, even though the active segment still receives write requests from clients and from the replica synchronization threads. There is a case where the index check is skipped for every segment, no unflushed log segment needs recovery, and the index file of the active segment is corrupted: the next append to the active segment then fails. Below is the problem I encountered in production: while Kafka was loading log segments, the program log showed that the memory-map position of the time index and offset index files was near the end of the file, even though the files did not actually contain that many index entries. I suspect this can happen during the startup process: if the Kafka process is stopped before startup has completed and is then started again, the limit of the index file's memory map may remain at the preallocated maximum file size instead of being trimmed to the size actually used, so the map position is set to limit when Kafka starts. Appending data to the active segment then causes a NIO BufferOverflowException. I agree with skipping the index check for all inactive segments, because they will no longer receive write requests, but for the active segment we still need to check the index files. Another case: even after a clean shutdown, some factor can leave the active segment's index file with its memory-map position set to limit, which also results in a BufferOverflowException on write.

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Wenbing Shen
> Assignee: Wenbing Shen
> Priority: Blocker
> Attachments: LoadIndex.png, niobufferoverflow.png
[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12734: - Description: This question is similar to KAFKA-9156 We introduced Lazy Index, which helps us skip checking the index files of all log segments when starting kafka, which has greatly improved the speed of our kafka startup. Unfortunately, it skips the index file detection of the active segment. The active segment will receive write requests from the client or the replica synchronization thread. There is a situation when we skip the index detection of all segments, and we do not need to recover the unflushed log segment, and the index file of the last active segment is damaged at this time. When appending data to the active segment, at this time The program reported an error. Below are the problems I encountered in the production environment: When Kafka starts to load the log segment, I see in the program log that the memory mapping position of the index file with timestamp and offset is at the larger position of the current index file, but in fact, the index file is not written With so many index items, I guess this kind of problem will occur during the kafka startup process. When kafka has not been started yet, stop the kafka process at this time, and then start the kafka process again, whether it will cause the limit address of the index file memory map to be a file The maximum value is not cut to the actual size used, which will cause the memory map position to be set to limit when Kafka is started. At this time, adding data to the active segment will cause niobufferoverflow. I agree to skip the index detection of all inactive segments, because in fact they will no longer receive write requests, but for active segments, we need to perform index file detection. 
was: This question is similar to KAFKA-9156 We introduced Lazy Index, which helps us skip checking the index files of all log segments when starting kafka, which has greatly improved the speed of our kafka startup. Unfortunately, it skips the index file detection of the active segment. The active segment will receive write requests from the client or the replica synchronization thread. There is a situation when we skip the index detection of all segments, and we do not need to recover the unflushed log segment, and the index file of the last active segment is damaged at this time. When appending data to the active segment, at this time The program reported an error. Below are the problems I encountered in the production environment: When Kafka starts to load the log segment, I see in the program log that the memory mapping position of the index file with timestamp and offset is at the larger position of the current index file, but in fact, the index file is not written With so many index items, I guess this kind of problem will occur during the kafka startup process. When kafka has not been started yet, stop the kafka process at this time, and then start the kafka process again, whether it will cause the limit address of the index file memory map to be a file The maximum value is not cut to the actual size used, which will cause the memory map position to be set to limit when Kafka is started. At this time, adding data to the active segment will cause niobufferoverflow. I agree to skip the index detection of all inactive segments, because in fact they will no longer receive write requests, but for active segments, we need to perform index file detection. 
> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip > activeSegment sanityCheck > > > Key: KAFKA-12734 > URL: https://issues.apache.org/jira/browse/KAFKA-12734 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: LoadIndex.png, niobufferoverflow.png > > > This issue is similar to KAFKA-9156. > We introduced lazy indexes, which let Kafka skip sanity-checking the index files of all > log segments at startup and greatly improved our startup speed. > Unfortunately, the check is also skipped for the active segment, even though the > active segment still receives write requests from clients and from replica > fetcher threads. > There is a case where we skip the index checks for all segments, no unflushed > log segment needs recovery, and the index file of the last (active) segment is > corrupted; appending data to the active segment then fails with an error. > Here is the problem I encountered in the
[jira] [Assigned] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-12734: Assignee: Wenbing Shen > LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip > activeSegment sanityCheck > > > Key: KAFKA-12734 > URL: https://issues.apache.org/jira/browse/KAFKA-12734 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: LoadIndex.png, niobufferoverflow.png > > > This issue is similar to KAFKA-9156. > We introduced lazy indexes, which let Kafka skip sanity-checking the index files of all > log segments at startup and greatly improved our startup speed. > Unfortunately, the check is also skipped for the active segment, even though the > active segment still receives write requests from clients and from replica > fetcher threads. > There is a case where we skip the index checks for all segments, no unflushed > log segment needs recovery, and the index file of the last (active) segment is > corrupted; appending data to the active segment then fails with an error. > Here is the problem I encountered in the production environment: > While Kafka was loading log segments, the program log showed that the > memory-map position of the time index and offset index files was near the end of > the file, even though far fewer index entries had actually been written. I suspect > this can happen during the Kafka startup process: > if the Kafka process is stopped before startup completes and then started again, > the limit of the index file's memory map stays at the file's pre-allocated maximum > size instead of being trimmed to the size actually used, so on the next startup the > map's position is set to that limit. > Appending data to the active segment then causes a BufferOverflowException. > I agree with skipping the index checks for all inactive segments, since they no > longer receive write requests, but for the active segment we still need > to check the index files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
Wenbing Shen created KAFKA-12734: Summary: LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck Key: KAFKA-12734 URL: https://issues.apache.org/jira/browse/KAFKA-12734 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.8.0, 2.7.0, 2.6.0, 2.5.0, 2.4.0 Reporter: Wenbing Shen Attachments: LoadIndex.png, niobufferoverflow.png This issue is similar to KAFKA-9156. We introduced lazy indexes, which let Kafka skip sanity-checking the index files of all log segments at startup and greatly improved our startup speed. Unfortunately, the check is also skipped for the active segment, even though the active segment still receives write requests from clients and from replica fetcher threads. There is a case where we skip the index checks for all segments, no unflushed log segment needs recovery, and the index file of the last (active) segment is corrupted; appending data to the active segment then fails with an error. Here is the problem I encountered in the production environment: while Kafka was loading log segments, the program log showed that the memory-map position of the time index and offset index files was near the end of the file, even though far fewer index entries had actually been written. I suspect this can happen during the Kafka startup process: if the Kafka process is stopped before startup completes and then started again, the limit of the index file's memory map stays at the file's pre-allocated maximum size instead of being trimmed to the size actually used, so on the next startup the map's position is set to that limit. Appending data to the active segment then causes a BufferOverflowException. I agree with skipping the index checks for all inactive segments, since they no longer receive write requests, but for the active segment we still need to check the index files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
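The failure mode described above can be sketched in a few lines. This is a minimal illustration, not Kafka's actual index code: the class name and entry sizes are hypothetical, and a plain ByteBuffer stands in for the memory-mapped index file. The only assumption is the state the report describes, a map whose position has been advanced to its (untrimmed) limit, so the next append has no remaining room and throws java.nio.BufferOverflowException.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class ActiveSegmentIndexDemo {
    public static void main(String[] args) {
        // Stand-in for a memory-mapped index file: the map covers the file's
        // pre-allocated maximum size, and its position was set to the limit
        // because the file was never trimmed to the size actually used.
        ByteBuffer index = ByteBuffer.allocate(16);
        index.position(index.limit()); // corrupted state: position == limit

        try {
            // Appending the next (relative offset, physical position) entry
            // fails: there is no remaining room in the map.
            index.putInt(0).putInt(4096);
            System.out.println("append ok");
        } catch (BufferOverflowException e) {
            System.out.println("BufferOverflowException on append to active segment index");
        }
    }
}
```

This is why skipping the sanity check is safe only for inactive segments: they are never appended to again, so a stale position is never exercised.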
[jira] [Commented] (KAFKA-10773) When I execute the below command, Kafka cannot start in local.
[ https://issues.apache.org/jira/browse/KAFKA-10773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326600#comment-17326600 ] Wenbing Shen commented on KAFKA-10773: -- Similar to KAFKA-12680. This is in fact a WSL bug; it is tracked here: [https://github.com/microsoft/WSL/issues/2281] > When I execute the below command, Kafka cannot start in local. > -- > > Key: KAFKA-10773 > URL: https://issues.apache.org/jira/browse/KAFKA-10773 > Project: Kafka > Issue Type: Bug >Reporter: NAYUSIK >Priority: Critical > Attachments: image-2020-11-27-19-09-49-389.png > > > When I execute the below command, Kafka cannot start in local. > *confluent local services start* > *Please check the below error log and let me know how I can fix it.* > [2020-11-27 18:52:21,019] ERROR Error while deleting the clean shutdown file > in dir /tmp/confluent.056805/kafka/data (kafka.server.LogDirFailureChannel) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:190) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242) > at kafka.log.LogSegment.recover(LogSegment.scala:380) > at kafka.log.Log.recoverSegment(Log.scala:692) > at kafka.log.Log.recoverLog(Log.scala:830) > at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:767) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2626) > at kafka.log.Log.loadSegments(Log.scala:767) > at kafka.log.Log.(Log.scala:313) > at kafka.log.MergedLog$.apply(MergedLog.scala:796) > at kafka.log.LogManager.loadLog(LogManager.scala:294) > at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:373) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > [2020-11-27 18:52:21,019] ERROR Error while deleting the clean shutdown file > in dir /tmp/confluent.056805/kafka/data (kafka.server.LogDirFailureChannel) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:190) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242) > at kafka.log.LogSegment.recover(LogSegment.scala:380) > at kafka.log.Log.recoverSegment(Log.scala:692) > at kafka.log.Log.recoverLog(Log.scala:830) > at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:767) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2626) > at kafka.log.Log.loadSegments(Log.scala:767) > at kafka.log.Log.(Log.scala:313) > at kafka.log.MergedLog$.apply(MergedLog.scala:796) > at kafka.log.LogManager.loadLog(LogManager.scala:294) > at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:373) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > > [2020-11-27 18:52:22,261] ERROR Shutdown broker because all log dirs in > /tmp/confluent.056805/kafka/data have failed (kafka.log.LogManager) -- This message was sent by Atlassian Jira (v8.3.4#803005)
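The call that fails in the stack traces above is RandomAccessFile.setLength, which AbstractIndex.resize / trimToValidSize uses to shrink a pre-allocated index file down to the entries actually written. The sketch below is illustrative, not Kafka's actual code: the helper name and the 8-byte entry size are assumptions. On the affected WSL filesystem layer, this same setLength call can fail with IOException: Invalid argument, which the broker then treats as a log-dir failure.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class IndexTrimDemo {
    // Hypothetical entry size (Kafka's offset index uses 8-byte entries).
    static final int ENTRY_SIZE = 8;

    // Truncate a pre-allocated index file to the entries actually written,
    // the way trimToValidSize resizes via RandomAccessFile.setLength.
    static void trimToValidSize(File indexFile, int entries) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(indexFile, "rw")) {
            raf.setLength((long) entries * ENTRY_SIZE);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".index");
        f.deleteOnExit();
        // Pre-allocate the maximum index size, as Kafka does at segment roll.
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(10L * 1024 * 1024);
        }
        trimToValidSize(f, 3); // keep only 3 written entries
        System.out.println(f.length()); // prints 24
    }
}
```

On a healthy filesystem this shrinks the file to 24 bytes; on the buggy WSL layer the ftruncate underlying setLength returned EINVAL, producing exactly the IOException seen in these reports.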
[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12680. -- Resolution: Not A Problem > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug >Reporter: Wenbing Shen >Priority: Major > > > I tested kraft mode for the first time today. I deployed a single-node kraft > mode broker according to the documentation: > [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] > > First step: ./bin/kafka-storage.sh random-uuid > Second step: use the uuid generated above to run the following command: > ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties > > Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties > > Then I created two topics with two partitions and a single replica: > ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 > --replication-factor 1 --bootstrap-server localhost:9092 > Production and consumption worked fine, but after I stopped the broker with > kafka-server-stop.sh and started it again, the broker failed with an error. 
> I am not sure if it is a known bug or a problem with my usage > > [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception > (kafka.Kafka$) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) > at kafka.log.LogSegment.recover(LogSegment.scala:385) > at kafka.log.Log.recoverSegment(Log.scala:741) > at kafka.log.Log.recoverLog(Log.scala:894) > at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) > at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) > at kafka.log.Log.loadSegments(Log.scala:816) > at kafka.log.Log.(Log.scala:326) > at kafka.log.Log$.apply(Log.scala:2593) > at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) > at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) > at kafka.raft.KafkaRaftManager.(RaftManager.scala:127) > at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74) > at kafka.Kafka$.buildServer(Kafka.scala:79) > at kafka.Kafka$.main(Kafka.scala:87) > at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reopened KAFKA-12680: -- > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug >Reporter: Wenbing Shen >Priority: Major > > > I tested kraft mode for the first time today. I deployed a single-node kraft > mode broker according to the documentation: > [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] > > First step: ./bin/kafka-storage.sh random-uuid > Second step: use the uuid generated above to run the following command: > ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties > > Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties > > Then I created two topics with two partitions and a single replica: > ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 > --replication-factor 1 --bootstrap-server localhost:9092 > Production and consumption worked fine, but after I stopped the broker with > kafka-server-stop.sh and started it again, the broker failed with an error. 
> I am not sure if it is a known bug or a problem with my usage > > [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception > (kafka.Kafka$) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) > at kafka.log.LogSegment.recover(LogSegment.scala:385) > at kafka.log.Log.recoverSegment(Log.scala:741) > at kafka.log.Log.recoverLog(Log.scala:894) > at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) > at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) > at kafka.log.Log.loadSegments(Log.scala:816) > at kafka.log.Log.(Log.scala:326) > at kafka.log.Log$.apply(Log.scala:2593) > at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) > at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) > at kafka.raft.KafkaRaftManager.(RaftManager.scala:127) > at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74) > at kafka.Kafka$.buildServer(Kafka.scala:79) > at kafka.Kafka$.main(Kafka.scala:87) > at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326589#comment-17326589 ] Wenbing Shen commented on KAFKA-12680: -- Hi [~ijuma] Sorry, this problem was caused by my environment. It is in fact a WSL bug, tracked here: [https://github.com/microsoft/WSL/issues/2281] I could not reproduce the problem today on my CentOS physical machine, so I should have classified this as not a problem; I will change it now. Sorry for taking up your valuable time; this was my first time using WSL and I did not expect it to have this problem. Please spend your time on the important issues. I wish the kraft mode great success in the end, and I will do my best to help accomplish that. :) > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug >Reporter: Wenbing Shen >Priority: Major > > > I tested kraft mode for the first time today. I deployed a single-node kraft > mode broker according to the documentation: > [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] > > First step: ./bin/kafka-storage.sh random-uuid > Second step: use the uuid generated above to run the following command: > ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties > > Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties > > Then I created two topics with two partitions and a single replica: > ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 > --replication-factor 1 --bootstrap-server localhost:9092 > Production and consumption worked fine, but after I stopped the broker with > kafka-server-stop.sh and started it again, the broker failed with an error. 
> I am not sure if it is a known bug or a problem with my usage > > [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception > (kafka.Kafka$) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) > at kafka.log.LogSegment.recover(LogSegment.scala:385) > at kafka.log.Log.recoverSegment(Log.scala:741) > at kafka.log.Log.recoverLog(Log.scala:894) > at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) > at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) > at kafka.log.Log.loadSegments(Log.scala:816) > at kafka.log.Log.(Log.scala:326) > at kafka.log.Log$.apply(Log.scala:2593) > at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) > at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) > at kafka.raft.KafkaRaftManager.(RaftManager.scala:127) > at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74) > at kafka.Kafka$.buildServer(Kafka.scala:79) > at kafka.Kafka$.main(Kafka.scala:87) > at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12680. -- Resolution: Cannot Reproduce > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug >Reporter: Wenbing Shen >Priority: Major > > > I tested kraft mode for the first time today. I deployed a single-node kraft > mode broker according to the documentation: > [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] > > First step: ./bin/kafka-storage.sh random-uuid > Second step: use the uuid generated above to run the following command: > ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties > > Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties > > Then I created two topics with two partitions and a single replica: > ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 > --replication-factor 1 --bootstrap-server localhost:9092 > Production and consumption worked fine, but after I stopped the broker with > kafka-server-stop.sh and started it again, the broker failed with an error. 
> I am not sure if it is a known bug or a problem with my usage > > [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception > (kafka.Kafka$) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) > at kafka.log.LogSegment.recover(LogSegment.scala:385) > at kafka.log.Log.recoverSegment(Log.scala:741) > at kafka.log.Log.recoverLog(Log.scala:894) > at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) > at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) > at kafka.log.Log.loadSegments(Log.scala:816) > at kafka.log.Log.(Log.scala:326) > at kafka.log.Log$.apply(Log.scala:2593) > at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) > at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) > at kafka.raft.KafkaRaftManager.(RaftManager.scala:127) > at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74) > at kafka.Kafka$.buildServer(Kafka.scala:79) > at kafka.Kafka$.main(Kafka.scala:87) > at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread
[ https://issues.apache.org/jira/browse/KAFKA-12702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12702: - Attachment: image-2021-04-21-17-12-28-471.png > Unhandled exception caught in InterBrokerSendThread > --- > > Key: KAFKA-12702 > URL: https://issues.apache.org/jira/browse/KAFKA-12702 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Wenbing Shen >Priority: Blocker > Attachments: afterFixing.png, beforeFixing.png, > image-2021-04-21-17-12-28-471.png > > > In kraft mode, if listeners and advertised.listeners are not configured with > host addresses, the host parameter value of Listener in > BrokerRegistrationRequestData will be null. When the broker is started, a > null pointer exception will be thrown, causing startup failure. > A feasible solution is to replace the empty host of endPoint in > advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in > Broker Server when building networkListeners. 
> The following is the debug log: > before fixing: > [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread > org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending > BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGIS > TRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node > 2: BrokerRegistrationRequestData(brokerId=2, > clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, > listeners=[Listener(n > ame='PLAINTEXT', {color:#FF}host=null,{color} port=9092, > securityProtocol=0)], features=[], rack=null) > [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread > kafka.server.BrokerToControllerRequestThread 76) > [broker-2-to-controller-send-thread]: unhandled exception caught in > InterBrokerSendThread > java.lang.NullPointerException > at > org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515) > at > org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216) > at > org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218) > at > org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187) > at > org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101) > at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525) > at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501) > at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461) > at > kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104) > at > kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) > at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown > Source) > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) > at 
scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) > at scala.collection.AbstractIterable.foreach(Iterable.scala:919) > at > kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) > at > kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) > at > kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) > [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread > kafka.server.BrokerToControllerRequestThread 66) > [broker-2-to-controller-send-thread]: Stopped > after fixing: > [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 > name=heartbeat org.apache.kafka.clients.NetworkClient 512) > [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending > BROKER_REGISTRATI > ON request with header RequestHeader(apiKey=BROKER_REGISTRATION, > apiVersion=0, clientId=2, correlationId=0) and timeout 3 to node 2: > BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', > inc > arnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', > host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread
[ https://issues.apache.org/jira/browse/KAFKA-12702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326387#comment-17326387 ] Wenbing Shen commented on KAFKA-12702: -- We need to fix this problem: according to the comments in the configuration file (config/kraft/server.properties), if listeners and advertised.listeners are not configured with an address, the broker is supposed to fall back to java.net.InetAddress.getCanonicalHostName() automatically, but in practice this causes the service to fail to start. !image-2021-04-21-17-12-28-471.png! > Unhandled exception caught in InterBrokerSendThread > --- > > Key: KAFKA-12702 > URL: https://issues.apache.org/jira/browse/KAFKA-12702 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Wenbing Shen >Priority: Blocker > Attachments: afterFixing.png, beforeFixing.png, > image-2021-04-21-17-12-28-471.png > > > In kraft mode, if listeners and advertised.listeners are not configured with > host addresses, the host parameter value of Listener in > BrokerRegistrationRequestData will be null. When the broker is started, a > null pointer exception will be thrown, causing startup failure. > A feasible solution is to replace the empty host of endPoint in > advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in > Broker Server when building networkListeners. 
> The following is the debug log: > before fixing: > [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread > org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending > BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGIS > TRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node > 2: BrokerRegistrationRequestData(brokerId=2, > clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, > listeners=[Listener(n > ame='PLAINTEXT', {color:#FF}host=null,{color} port=9092, > securityProtocol=0)], features=[], rack=null) > [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread > kafka.server.BrokerToControllerRequestThread 76) > [broker-2-to-controller-send-thread]: unhandled exception caught in > InterBrokerSendThread > java.lang.NullPointerException > at > org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515) > at > org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216) > at > org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218) > at > org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187) > at > org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101) > at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525) > at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501) > at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461) > at > kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104) > at > kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) > at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown > Source) > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) > at 
scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) > at scala.collection.AbstractIterable.foreach(Iterable.scala:919) > at > kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) > at > kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) > at > kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) > [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread > kafka.server.BrokerToControllerRequestThread 66) > [broker-2-to-controller-send-thread]: Stopped > after fixing: > [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 > name=heartbeat org.apache.kafka.clients.NetworkClient 512) > [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending > BROKER_REGISTRATI > ON request with header RequestHeader(apiKey=BROKER_REGISTRATION, > apiVersion=0, clientId=2, correlationId=0) and timeout 3 to node 2: > BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', > inc > arnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', > host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread
Wenbing Shen created KAFKA-12702: Summary: Unhandled exception caught in InterBrokerSendThread Key: KAFKA-12702 URL: https://issues.apache.org/jira/browse/KAFKA-12702 Project: Kafka Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wenbing Shen Attachments: afterFixing.png, beforeFixing.png In kraft mode, if listeners and advertised.listeners are not configured with host addresses, the host parameter value of Listener in BrokerRegistrationRequestData will be null. When the broker is started, a null pointer exception will be thrown, causing startup failure. A feasible solution is to replace the empty host of endPoint in advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in Broker Server when building networkListeners. The following is the debug log: before fixing: [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGIS TRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node 2: BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, listeners=[Listener(n ame='PLAINTEXT', {color:#FF}host=null,{color} port=9092, securityProtocol=0)], features=[], rack=null) [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread kafka.server.BrokerToControllerRequestThread 76) [broker-2-to-controller-send-thread]: unhandled exception caught in InterBrokerSendThread java.lang.NullPointerException at org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515) at org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216) at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218) at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187) at 
org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101) at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525) at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501) at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461) at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104) at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown Source) at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) at scala.collection.AbstractIterable.foreach(Iterable.scala:919) at kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread kafka.server.BrokerToControllerRequestThread 66) [broker-2-to-controller-send-thread]: Stopped after fixing: [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 name=heartbeat org.apache.kafka.clients.NetworkClient 512) [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, correlationId=0) and timeout 3 to node 2: BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null) -- This message was sent by Atlassian Jira (v8.3.4#803005)
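The fallback proposed above can be sketched in isolation. This is a minimal illustration, not the actual BrokerServer code; the class and method names here are hypothetical:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ListenerHostFallback {
    // Hypothetical helper mirroring the proposed fix: when a listener's host
    // is null or empty, substitute the canonical local hostname so the
    // BROKER_REGISTRATION request never carries host=null.
    public static String effectiveHost(String configuredHost) {
        if (configuredHost != null && !configuredHost.isEmpty()) {
            return configuredHost;
        }
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // conservative fallback when resolution fails
        }
    }
}
```

With this in place, a null or empty configured host resolves to a concrete hostname (as in the "after fixing" log, where host='hdpxxx.cn' appears instead of host=null).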
[jira] [Assigned] (KAFKA-12684) The valid partition list is incorrectly replaced by the successfully elected partition list
[ https://issues.apache.org/jira/browse/KAFKA-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-12684: Assignee: Wenbing Shen > The valid partition list is incorrectly replaced by the successfully elected > partition list > --- > > Key: KAFKA-12684 > URL: https://issues.apache.org/jira/browse/KAFKA-12684 > Project: Kafka > Issue Type: Bug > Components: tools >Affects Versions: 2.6.0, 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > Fix For: 3.0.0 > > Attachments: election-preferred-leader.png, non-preferred-leader.png > > > When using the kafka-leader-election tool for preferred replica election, if > the election list contains partitions whose leader is already the preferred > replica, the list of those already-preferred partitions is incorrectly > replaced by the list of successfully elected partitions. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12684) The valid partition list is incorrectly replaced by the successfully elected partition list
Wenbing Shen created KAFKA-12684: Summary: The valid partition list is incorrectly replaced by the successfully elected partition list Key: KAFKA-12684 URL: https://issues.apache.org/jira/browse/KAFKA-12684 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.7.0, 2.6.0 Reporter: Wenbing Shen Fix For: 3.0.0 Attachments: election-preferred-leader.png, non-preferred-leader.png When using the kafka-leader-election tool for preferred replica election, if the election list contains partitions whose leader is already the preferred replica, the list of those already-preferred partitions is incorrectly replaced by the list of successfully elected partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
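The reported bug reduces to simple set logic. The sketch below (hypothetical names; not the tool's actual code) shows the correct behavior: partitions that were already led by their preferred replica must be merged with the newly elected ones, not overwritten by them:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ElectionResultMerge {
    // Correct merge: keep partitions that already had their preferred leader
    // and add the partitions whose election just succeeded. The bug described
    // above effectively replaced the first set with the second.
    public static Set<String> validPartitions(Set<String> alreadyPreferred,
                                              Set<String> newlyElected) {
        Set<String> valid = new LinkedHashSet<>(alreadyPreferred);
        valid.addAll(newlyElected);
        return valid;
    }
}
```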
[jira] [Commented] (KAFKA-12629) Flaky Test RaftClusterTest
[ https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324517#comment-17324517 ] Wenbing Shen commented on KAFKA-12629: -- Found another error log today. One of the errors occurred in testCreateClusterAndCreateAndManyTopicsWithManyPartitions; it reports: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request. The strange thing is that this test case creates the topics test-topic-1, test-topic-2, and test-topic-3, but the log in the standard output indicates that the replica fetcher thread of broker 0 needs to fetch test-topic-0 partition data from broker 1, yet the hostname of broker 1 cannot be resolved. The detailed log can be viewed from this link: https://github.com/apache/kafka/pull/10551/checks?check_run_id=2373652453 > Flaky Test RaftClusterTest > -- > > Key: KAFKA-12629 > URL: https://issues.apache.org/jira/browse/KAFKA-12629 > Project: Kafka > Issue Type: Test > Components: core, unit tests >Reporter: Matthias J. Sax >Assignee: Luke Chen >Priority: Critical > Labels: flaky-test > > {quote} {{java.util.concurrent.ExecutionException: > java.lang.ClassNotFoundException: > org.apache.kafka.controller.NoOpSnapshotWriterBuilder > at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364) > at > kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12680) Failed to restart the broker in kraft mode
Wenbing Shen created KAFKA-12680: Summary: Failed to restart the broker in kraft mode Key: KAFKA-12680 URL: https://issues.apache.org/jira/browse/KAFKA-12680 Project: Kafka Issue Type: Bug Reporter: Wenbing Shen I tested kraft mode for the first time today. I deployed a single-node kraft mode broker according to the documentation: [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] First step: ./bin/kafka-storage.sh random-uuid Second step: use the uuid generated above in the format command: ./bin/kafka-storage.sh format -t <uuid> -c ./config/kraft/server.properties Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties Then I created two topics with two partitions and a single replica: ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 --replication-factor 1 --bootstrap-server localhost:9092 I verified that production and consumption worked, but after I ran kafka-server-stop.sh and then the start command again, the broker failed to start with an error. 
I am not sure if it is a known bug or a problem with my usage [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$) java.io.IOException: Invalid argument at java.io.RandomAccessFile.setLength(Native Method) at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) at kafka.log.LogSegment.recover(LogSegment.scala:385) at kafka.log.Log.recoverSegment(Log.scala:741) at kafka.log.Log.recoverLog(Log.scala:894) at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) at kafka.log.Log.loadSegments(Log.scala:816) at kafka.log.Log.(Log.scala:326) at kafka.log.Log$.apply(Log.scala:2593) at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) at kafka.raft.KafkaRaftManager.(RaftManager.scala:127) at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74) at kafka.Kafka$.buildServer(Kafka.scala:79) at kafka.Kafka$.main(Kafka.scala:87) at kafka.Kafka.main(Kafka.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322702#comment-17322702 ] Wenbing Shen commented on KAFKA-10886: -- [~asrivastava] I have the repaired binary, and it has been running well in our company’s Windows product, but since it belongs to our company’s product I don’t have the right to publish it. However, I have published the repair method on this Jira. Can you tell me how you applied this patch? > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to work. > The remaining problems include data being deleted by mistake when the Kafka > service is restarted, and a disk going offline or the broker crashing when a > topic is deleted or a partition migration runs. 
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir >
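The rename pattern that fails in the trace above can be sketched in isolation. This is a simplified illustration of the atomic-move-with-fallback idiom (not Kafka's actual Utils.atomicMoveWithFallback source): try an atomic move first, then fall back to a plain replacing move. On Windows, even the fallback can throw AccessDeniedException while a memory-mapped file inside the directory is still open, which is what the AccessDeniedException above shows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {
    // Attempt an atomic rename; if the filesystem refuses, retry as a
    // plain replacing move. Sketch of the idiom used when Kafka renames a
    // partition directory for async deletion.
    public static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```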
[jira] [Resolved] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
[ https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12445. -- Resolution: Won't Do > Improve the display of ConsumerPerformance indicators > - > > Key: KAFKA-12445 > URL: https://issues.apache.org/jira/browse/KAFKA-12445 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Priority: Minor > Attachments: image-2021-03-10-13-30-27-734.png > > > The current output is shown below; the user experience is poor, and the > meaning of each indicator is not displayed intuitively. > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf10 --messages 4 --from-latest > start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, > nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, > 779.7162, 3096, 48206, 8.1034, 829.7930 > > show-detailed-stats: > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf --messages 1 --show-detailed-stats > time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, > rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, > 1615346338626, -1615346333626, 0., 0. 
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, > 5000, 758.4341, 795275.8000 > 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, > 5000, 795.6612, 834311.2000 > 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, > 5140, 188.8091, 197980.7393 > 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, > 1.9438, 2038.2166 > > One possible improvement is to display each indicator name and its measured > value as a table, so that the meaning of each measurement is clearly > conveyed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
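The suggested tabular display can be sketched as a small formatter (illustrative only, not the tool's code; the metric names are taken from the header line above):

```java
public class PerfReportTable {
    // Render indicator name/value pairs as an aligned two-column table,
    // the readability improvement proposed in the issue.
    public static String format(String[] names, String[] values) {
        int width = 0;
        for (String n : names) {
            width = Math.max(width, n.length());
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            sb.append(String.format("%-" + width + "s : %s%n", names[i], values[i]));
        }
        return sb.toString();
    }
}
```

For example, format(new String[]{"MB.sec", "nMsg.sec"}, new String[]{"7.6144", "779.7162"}) yields one padded "name : value" row per indicator instead of the current positional CSV line.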
[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322672#comment-17322672 ] Wenbing Shen commented on KAFKA-10886: -- [~asrivastava] Sorry, the patch has not been merged into the development branch. You can apply this patch manually to your Kafka source code and recompile the binaries. > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to work. > The remaining problems include data being deleted by mistake when the Kafka > service is restarted, and a disk going offline or the broker crashing when a > topic is deleted or a partition migration runs. 
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9156) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent state
[ https://issues.apache.org/jira/browse/KAFKA-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312207#comment-17312207 ] Wenbing Shen edited comment on KAFKA-9156 at 3/31/21, 9:04 AM: --- Hi, [~amironov] [~ijuma], I backported kafka-7283 and kafka-9156 to our version of kafka-2.0.0 to get the benefits of LazyIndex when starting the broker. When I applied this feature to a small Kafka cluster there was no problem, and when I first applied it to a high-traffic cluster the brokers with little traffic showed no exceptions; but after the brokers with heavy traffic started, a large number of replica fetcher threads threw java.nio.BufferOverflowException. This is the same problem encountered by [~iBlackeyes]; the current patch still does not fix it. The stack trace is as follows: [2021-03-31 15:23:54,935] ERROR (ReplicaFetcherThread-1-1001 kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1006, leaderId=1001, fetcherId=1] Error due to org.apache.kafka.common.KafkaException: Error processing data for partition sinan_assets_tagged_default-9 offset 38543576 at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:214) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:175) at scala.Option.foreach(Option.scala:257) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:175) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:172) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:172) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:170) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) Caused by: java.nio.BufferOverflowException at java.nio.Buffer.nextPutIndex(Buffer.java:527) at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:793) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:131) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:111) at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:578) at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582) at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582) at scala.Option.foreach(Option.scala:257) at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1582) at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1568) at kafka.log.Log.maybeHandleIOException(Log.scala:1943) at kafka.log.Log.roll(Log.scala:1568) at kafka.log.Log.kafka$log$Log$$maybeRoll(Log.scala:1553) at kafka.log.Log$$anonfun$append$2.apply(Log.scala:956) at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850) at kafka.log.Log.maybeHandleIOException(Log.scala:1943) at kafka.log.Log.append(Log.scala:850) at kafka.log.Log.appendAsFollower(Log.scala:831) at kafka.cluster.Partition$$anonfun$doAppendRecordsToFollowerOrFutureReplica$1.apply(Partition.scala:589) at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:261) at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:576) at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:596) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:129) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:43) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:189) ... 13 more [2021-03-31 15:23:54,936] INFO (ReplicaFetcherThread-1-1001 kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1006, leaderId=1001, fetcherId=1] Stopped
[jira] [Commented] (KAFKA-9156) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent state
[ https://issues.apache.org/jira/browse/KAFKA-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312207#comment-17312207 ] Wenbing Shen commented on KAFKA-9156: - Hi, [~amironov] [~ijuma], I backported kafka-7283 and kafka-9156 to our version of kafka-2.0.0 to get the benefits of LazyIndex when starting the broker. When I applied this feature to a small Kafka cluster there was no problem, and when I first applied it to a high-traffic cluster the brokers with little traffic showed no exceptions; but after the brokers with heavy traffic started, a large number of replica fetcher threads threw java.nio.BufferOverflowException. This is the same problem encountered by [~iBlackeyes]; the current patch still does not fix it. The stack trace is as follows: [2021-03-31 15:23:54,935] ERROR (ReplicaFetcherThread-1-1001 kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1006, leaderId=1001, fetcherId=1] Error due to org.apache.kafka.common.KafkaException: Error processing data for partition sinan_assets_tagged_default-9 offset 38543576 at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:214) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:175) at scala.Option.foreach(Option.scala:257) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:175) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:172) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:172) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:170) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) Caused by: java.nio.BufferOverflowException at java.nio.Buffer.nextPutIndex(Buffer.java:527) at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:793) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:131) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:111) at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:578) at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582) at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582) at scala.Option.foreach(Option.scala:257) at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1582) at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1568) at kafka.log.Log.maybeHandleIOException(Log.scala:1943) at kafka.log.Log.roll(Log.scala:1568) at kafka.log.Log.kafka$log$Log$$maybeRoll(Log.scala:1553) at kafka.log.Log$$anonfun$append$2.apply(Log.scala:956) at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850) at kafka.log.Log.maybeHandleIOException(Log.scala:1943) at kafka.log.Log.append(Log.scala:850) at kafka.log.Log.appendAsFollower(Log.scala:831) at kafka.cluster.Partition$$anonfun$doAppendRecordsToFollowerOrFutureReplica$1.apply(Partition.scala:589) at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:261) at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:576) at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:596) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:129) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:43) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:189) ... 13 more [2021-03-31 15:23:54,936] INFO (ReplicaFetcherThread-1-1001 kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1006, leaderId=1001, fetcherId=1] Stopped > LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent > state > --- > >
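The root failure mode in the trace (TimeIndex.maybeAppend throwing BufferOverflowException) is easy to reproduce in isolation: writing a long into a direct ByteBuffer with fewer than 8 bytes remaining throws exactly this exception. A standalone demo of the mechanism, not Kafka code:

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class OverflowDemo {
    // putLong needs 8 bytes of remaining capacity; with less, the JDK throws
    // BufferOverflowException — the same exception the fetcher threads hit
    // when the index's mapped buffer position is inconsistent with its size.
    public static boolean overflows(int capacity) {
        ByteBuffer buf = ByteBuffer.allocateDirect(capacity);
        try {
            buf.putLong(1L);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }
}
```

In the lazy-index race, the mapped buffer's limit/position can end up smaller than an append expects, producing this exact exception path under concurrent roll and append.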
[jira] [Assigned] (KAFKA-12556) Add --under-preferred-replica-partitions option to describe topics command
[ https://issues.apache.org/jira/browse/KAFKA-12556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-12556: Assignee: Wenbing Shen > Add --under-preferred-replica-partitions option to describe topics command > -- > > Key: KAFKA-12556 > URL: https://issues.apache.org/jira/browse/KAFKA-12556 > Project: Kafka > Issue Type: Improvement > Components: tools >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > > Whether the preferred replica is the partition leader directly affects the > external output traffic of the broker. When the preferred replica of every > partition is the leader, the broker's external output traffic is balanced; > when a large number of partition leaders are not the preferred replica, this > balance is destroyed. > Currently, the controller periodically checks the imbalance ratio of the > partition preferred replicas (if enabled) to trigger a preferred replica > election, or the election can be triggered manually through the > kafka-leader-election tool. However, to find out which partitions currently > have a non-preferred leader, we need to search the controller log or work it > out ourselves from the topic details list. > We can add an --under-preferred-replica-partitions option to TopicCommand's > describe topics to query the list of partitions in the current cluster whose > leader is not the preferred replica. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12556) Add --under-preferred-replica-partitions option to describe topics command
Wenbing Shen created KAFKA-12556: Summary: Add --under-preferred-replica-partitions option to describe topics command Key: KAFKA-12556 URL: https://issues.apache.org/jira/browse/KAFKA-12556 Project: Kafka Issue Type: Improvement Components: tools Reporter: Wenbing Shen Whether the preferred replica is the partition leader directly affects the external output traffic of the broker. When the preferred replica of every partition is the leader, the broker's external output traffic is balanced; when a large number of partition leaders are not the preferred replica, this balance is destroyed. Currently, the controller periodically checks the imbalance ratio of the partition preferred replicas (if enabled) to trigger a preferred replica election, or the election can be triggered manually through the kafka-leader-election tool. However, to find out which partitions currently have a non-preferred leader, we need to search the controller log or work it out ourselves from the topic details list. We can add an --under-preferred-replica-partitions option to TopicCommand's describe topics to query the list of partitions in the current cluster whose leader is not the preferred replica. -- This message was sent by Atlassian Jira (v8.3.4#803005)
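The proposed filter can be sketched as follows (hypothetical names; not TopicCommand's actual code). A partition's preferred replica is the first broker in its replica list, so the option would flag partitions whose current leader differs from it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class UnderPreferredFilter {
    // Sketch of what --under-preferred-replica-partitions would report:
    // partitions whose current leader is not the first (preferred) replica.
    // replicas maps partition name -> replica broker ids in assignment order;
    // leaders maps partition name -> current leader broker id.
    public static List<String> underPreferred(Map<String, int[]> replicas,
                                              Map<String, Integer> leaders) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> e : replicas.entrySet()) {
            int preferred = e.getValue()[0];
            if (leaders.get(e.getKey()) != preferred) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```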
[jira] [Commented] (KAFKA-12507) java.lang.OutOfMemoryError: Direct buffer memory
[ https://issues.apache.org/jira/browse/KAFKA-12507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304593#comment-17304593 ] Wenbing Shen commented on KAFKA-12507: -- What values do you use for these parameters: -XX:MaxDirectMemorySize, num.network.threads, socket.request.max.bytes? We have encountered the same problem in one of our customers' environments. We had set -XX:MaxDirectMemorySize=2G and num.network.threads=120; later, I worked around the problem by measuring the real-time traffic and reducing num.network.threads. > java.lang.OutOfMemoryError: Direct buffer memory > > > Key: KAFKA-12507 > URL: https://issues.apache.org/jira/browse/KAFKA-12507 > Project: Kafka > Issue Type: Bug > Components: core > Environment: kafka version: 2.0.1 > java version: > java version "1.8.0_211" > Java(TM) SE Runtime Environment (build 1.8.0_211-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode) > the command we use to start kafka broker: > java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 > -XX:InitiatingHeapOccupancyPercent=35 -Djava.awt.headless=true > -XX:+ExplicitGCInvokesConcurrent >Reporter: diehu >Priority: Major > > Hi, we have three brokers in our kafka cluster, and we use scripts to send > data to kafka at a rate of about 36,000 events per second. After about one > month, we got the OOM error: > {code:java} > [2021-01-09 17:12:24,750] ERROR Processor got uncaught exception. 
> (kafka.network.Processor) > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241) > at sun.nio.ch.IOUtil.read(IOUtil.java:195) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > at > org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:335) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:296) > at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:562) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:498) > at org.apache.kafka.common.network.Selector.poll(Selector.java:427) > at kafka.network.Processor.poll(SocketServer.scala:679) > at kafka.network.Processor.run(SocketServer.scala:584) > at java.lang.Thread.run(Thread.java:748){code} > the kafka broker does not shut down, but keeps hitting this error. At the > same time, data cannot be produced to the kafka cluster and consumers cannot > consume data from it. > We used the recommended java parameter -XX:+ExplicitGCInvokesConcurrent, but > it did not seem to help. > Only restarting the kafka cluster fixes this problem. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
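A rough way to see why num.network.threads=120 can blow through a 2G direct memory limit: each network thread may hold a temporary direct buffer up to the maximum request size while reading from a socket, so a worst-case bound is roughly num.network.threads × socket.request.max.bytes. This is a back-of-the-envelope model, not an exact accounting of the JDK's temporary buffer cache:

```java
public class DirectMemoryEstimate {
    // Rough upper bound: each network thread can pin one temporary direct
    // buffer up to the max request size while reading from a socket.
    static long worstCaseDirectBytes(int networkThreads, long maxRequestBytes) {
        return networkThreads * maxRequestBytes;
    }

    public static void main(String[] args) {
        long maxRequest = 100L * 1024 * 1024;        // socket.request.max.bytes default (~100 MB)
        long demand = worstCaseDirectBytes(120, maxRequest); // 120 threads, as in the comment above
        long limit = 2L * 1024 * 1024 * 1024;        // -XX:MaxDirectMemorySize=2G
        System.out.printf("worst-case demand = %d MB, limit = %d MB%n",
                demand >> 20, limit >> 20);          // demand (12000 MB) far exceeds the limit
    }
}
```

Under this model, either the direct memory limit has to cover the product or the thread count has to come down, which matches the workaround described above.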
[jira] [Assigned] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-12493: Assignee: Wenbing Shen > The controller should handle the consistency between the controllerContext > and the partition replicas assignment on zookeeper > - > > Key: KAFKA-12493 > URL: https://issues.apache.org/jira/browse/KAFKA-12493 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Major > Fix For: 3.0.0 > > > This issue relates to this email thread: > [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E] > > This problem is 100% reproducible. > Problem description: > At a customer's production site, code owned by another department > redistributed the replicas of existing partitions and wrote the new > assignment into zookeeper. When processing the partition modification event, > the controller only considered the newly added partitions: it registered the > new partitions and replicas in the partition state machine and replica state > machine and issued LeaderAndIsr and other control requests for them. > But the controller did not check whether the existing partition replica > assignment in the controllerContext still matched the assignment on the znode > in zookeeper. This seems harmless at first, but when the brokers are later > restarted for some reason, such as a configuration update or an upgrade, the > affected topics become abnormal in production: the controller cannot elect a > new leader, and the old leader does not correctly recognize the replica > assignment currently stored in zookeeper. 
The real-time business at our customer's site was interrupted and some data > was lost. > This problem can be reliably reproduced as follows: > Adding partitions to, or changing the replica count of, an existing topic > through the code below causes the replicas of the existing partitions to be > reallocated and written to zookeeper. Because the controller does not process > this event correctly, restarting the brokers that host the topic leaves the > topic unable to be produced to or consumed from. > > {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
>             CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0 &&
>                 kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the original broker metadata
>             Seq<BrokerMetadata> brokerMetadata =
>                     AdminUtils.getBrokerMetadatas(zkUtils,
>                             RackAwareMode.Enforced$.MODULE$,
>                             Option.apply(null));
>             // Generate a new partition replica allocation plan
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // number of partitions
>                             kafkaTopicVO.getReplicationNum(), // replicas per partition
>                             -1,
>                             -1);
>             // Write the modified partition replica assignment to zookeeper
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
> > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
Wenbing Shen created KAFKA-12493: Summary: The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper Key: KAFKA-12493 URL: https://issues.apache.org/jira/browse/KAFKA-12493 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 2.7.0, 2.6.0, 2.5.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 2.0.0 Reporter: Wenbing Shen Fix For: 3.0.0 This issue relates to this email thread: [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E] This problem is 100% reproducible. Problem description: At a customer's production site, code owned by another department redistributed the replicas of existing partitions and wrote the new assignment into zookeeper. When processing the partition modification event, the controller only considered the newly added partitions: it registered the new partitions and replicas in the partition state machine and replica state machine and issued LeaderAndIsr and other control requests for them. But the controller did not check whether the existing partition replica assignment in the controllerContext still matched the assignment on the znode in zookeeper. This seems harmless at first, but when the brokers are later restarted for some reason, such as a configuration update or an upgrade, the affected topics become abnormal in production: the controller cannot elect a new leader, and the old leader does not correctly recognize the replica assignment currently stored in zookeeper. The real-time business at our customer's site was interrupted and some data was lost. 
This problem can be reliably reproduced as follows: adding partitions to, or changing the replica count of, an existing topic through the code below causes the replicas of the existing partitions to be reallocated and written to zookeeper. Because the controller does not process this event correctly, restarting the brokers that host the topic leaves the topic unable to be produced to or consumed from. {code:java}
public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
    ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
            CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
    try {
        if (kafkaTopicVO.getPartitionNum() >= 0 &&
                kafkaTopicVO.getReplicationNum() >= 0) {
            // Get the original broker metadata
            Seq<BrokerMetadata> brokerMetadata =
                    AdminUtils.getBrokerMetadatas(zkUtils,
                            RackAwareMode.Enforced$.MODULE$,
                            Option.apply(null));
            // Generate a new partition replica allocation plan
            scala.collection.Map<Object, Seq<Object>> replicaAssign =
                    AdminUtils.assignReplicasToBrokers(brokerMetadata,
                            kafkaTopicVO.getPartitionNum(),   // number of partitions
                            kafkaTopicVO.getReplicationNum(), // replicas per partition
                            -1,
                            -1);
            // Write the modified partition replica assignment to zookeeper
            AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
                    kafkaTopicVO.getTopicNameList().get(0),
                    replicaAssign,
                    null,
                    true);
        }
    } catch (Exception e) {
        System.out.println("Adjust partition abnormal");
        System.exit(0);
    } finally {
        zkUtils.close();
    }
}
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
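The missing check the controller could perform can be sketched as a plain map comparison between the cached assignment in the controllerContext and the assignment read back from zookeeper. This is a hypothetical illustration of the idea, not Kafka's actual controller code:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class AssignmentConsistencyCheck {
    // Returns the existing partitions whose replica assignment in the
    // controller's cache differs from the assignment currently in zookeeper.
    // Newly added partitions (absent from the cache) are not reported here;
    // they are already handled by the existing partition-modification path.
    static Map<Integer, List<Integer>> changedAssignments(
            Map<Integer, List<Integer>> controllerContext,
            Map<Integer, List<Integer>> zkAssignment) {
        return zkAssignment.entrySet().stream()
                .filter(e -> controllerContext.containsKey(e.getKey()))
                .filter(e -> !Objects.equals(controllerContext.get(e.getKey()), e.getValue()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> cached = Map.of(0, List.of(1, 2), 1, List.of(2, 3));
        Map<Integer, List<Integer>> zk = Map.of(
                0, List.of(1, 2),    // unchanged
                1, List.of(3, 2),    // existing partition, reassigned externally
                2, List.of(1, 3));   // newly added partition
        System.out.println(changedAssignments(cached, zk)); // {1=[3, 2]}
    }
}
```

For partitions in the returned map the controller would need to reconcile its cache (and the state machines) with zookeeper before issuing LeaderAndIsr requests, instead of assuming only new partitions changed.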
[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
[ https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12454: - Fix Version/s: 3.0.0 > Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in > current kafka cluster > --- > > Key: KAFKA-12454 > URL: https://issues.apache.org/jira/browse/KAFKA-12454 > Project: Kafka > Issue Type: Improvement >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > Fix For: 3.0.0 > > > When non-existent brokerId values are given, the kafka-log-dirs tool fails > with a timeout error: > Exception in thread "main" java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node > assignment. Call: describeLogDirs > at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) > at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) > at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting > for a node assignment. Call: describeLogDirs > > When a brokerId entered by the user does not exist, an error message > indicating that the broker is not present should be printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
[ https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12454: - Affects Version/s: (was: 2.8.0) > Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in > current kafka cluster > --- > > Key: KAFKA-12454 > URL: https://issues.apache.org/jira/browse/KAFKA-12454 > Project: Kafka > Issue Type: Improvement >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > > When non-existent brokerId values are given, the kafka-log-dirs tool fails > with a timeout error: > Exception in thread "main" java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node > assignment. Call: describeLogDirs > at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) > at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) > at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting > for a node assignment. Call: describeLogDirs > > When a brokerId entered by the user does not exist, an error message > indicating that the broker is not present should be printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
[ https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12454: - Affects Version/s: 2.8.0 > Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in > current kafka cluster > --- > > Key: KAFKA-12454 > URL: https://issues.apache.org/jira/browse/KAFKA-12454 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > > When non-existent brokerId values are given, the kafka-log-dirs tool fails > with a timeout error: > Exception in thread "main" java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node > assignment. Call: describeLogDirs > at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) > at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) > at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting > for a node assignment. Call: describeLogDirs > > When a brokerId entered by the user does not exist, an error message > indicating that the broker is not present should be printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
[ https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-12454: Assignee: Wenbing Shen > Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in > current kafka cluster > --- > > Key: KAFKA-12454 > URL: https://issues.apache.org/jira/browse/KAFKA-12454 > Project: Kafka > Issue Type: Improvement >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > > When non-existent brokerId values are given, the kafka-log-dirs tool fails > with a timeout error: > Exception in thread "main" java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node > assignment. Call: describeLogDirs > at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) > at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) > at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) > Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting > for a node assignment. Call: describeLogDirs > > When a brokerId entered by the user does not exist, an error message > indicating that the broker is not present should be printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
Wenbing Shen created KAFKA-12454: Summary: Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster Key: KAFKA-12454 URL: https://issues.apache.org/jira/browse/KAFKA-12454 Project: Kafka Issue Type: Improvement Reporter: Wenbing Shen When non-existent brokerId values are given, the kafka-log-dirs tool fails with a timeout error: Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeLogDirs at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeLogDirs When a brokerId entered by the user does not exist, an error message indicating that the broker is not present should be printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
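One way to produce the suggested error is to compare the requested broker ids against the ids actually present in the cluster before calling describeLogDirs, and fail fast on the missing ones. A hypothetical sketch of that validation step (plain Java; the real change would live in LogDirsCommand, and the cluster ids would come from Admin#describeCluster):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BrokerIdValidation {
    // Return the requested broker ids that are not present in the cluster,
    // so the tool can print an ERROR instead of timing out.
    static Set<Integer> missingBrokerIds(List<Integer> requested, Set<Integer> clusterIds) {
        Set<Integer> missing = new TreeSet<>(requested);
        missing.removeAll(clusterIds);
        return missing;
    }

    public static void main(String[] args) {
        Set<Integer> cluster = Set.of(1001, 1002, 1003);   // e.g. from Admin#describeCluster
        Set<Integer> missing = missingBrokerIds(List.of(1001, 9999), cluster);
        if (!missing.isEmpty()) {
            System.err.println("ERROR: broker ids " + missing
                    + " do not exist in the current kafka cluster");
        }
    }
}
```

Validating up front turns the opaque "Timed out waiting for a node assignment" into an immediate, actionable message.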
[jira] [Commented] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
[ https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298532#comment-17298532 ] Wenbing Shen commented on KAFKA-12445: -- In my own environment, I do it like this; I don't know whether the community will accept this approach. !image-2021-03-10-13-30-27-734.png! > Improve the display of ConsumerPerformance indicators > - > > Key: KAFKA-12445 > URL: https://issues.apache.org/jira/browse/KAFKA-12445 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Priority: Minor > Attachments: image-2021-03-10-13-30-27-734.png > > > The current test output is shown below; the user experience is poor, and the > meaning of each indicator is not displayed intuitively. > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf10 --messages 4 --from-latest > start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, > nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, > 779.7162, 3096, 48206, 8.1034, 829.7930 > > show-detailed-stats: > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf --messages 1 --show-detailed-stats > time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, > rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, > 1615346338626, -1615346333626, 0., 0. 
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, > 5000, 758.4341, 795275.8000 > 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, > 5000, 795.6612, 834311.2000 > 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, > 5140, 188.8091, 197980.7393 > 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, > 1.9438, 2038.2166 > > One improvement is to display the indicator names and their measured values > in the form of a table, so that the meaning of each measurement can be > clearly expressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
[ https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-12445: - Attachment: image-2021-03-10-13-30-27-734.png > Improve the display of ConsumerPerformance indicators > - > > Key: KAFKA-12445 > URL: https://issues.apache.org/jira/browse/KAFKA-12445 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Priority: Minor > Attachments: image-2021-03-10-13-30-27-734.png > > > The current test output is shown below; the user experience is poor, and the > meaning of each indicator is not displayed intuitively. > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf10 --messages 4 --from-latest > start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, > nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, > 779.7162, 3096, 48206, 8.1034, 829.7930 > > show-detailed-stats: > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic > test-perf --messages 1 --show-detailed-stats > time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, > rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, > 1615346338626, -1615346333626, 0., 0. 
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, > 5000, 758.4341, 795275.8000 > 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, > 5000, 795.6612, 834311.2000 > 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, > 5140, 188.8091, 197980.7393 > 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, > 1.9438, 2038.2166 > > One improvement is to display the indicator names and their measured values > in the form of a table, so that the meaning of each measurement can be > clearly expressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
Wenbing Shen created KAFKA-12445: Summary: Improve the display of ConsumerPerformance indicators Key: KAFKA-12445 URL: https://issues.apache.org/jira/browse/KAFKA-12445 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.7.0 Reporter: Wenbing Shen The current test output is shown below; the user experience is poor, and the meaning of each indicator is not displayed intuitively. bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf10 --messages 4 --from-latest start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 779.7162, 3096, 48206, 8.1034, 829.7930 show-detailed-stats: bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf --messages 1 --show-detailed-stats time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 1615346338626, -1615346333626, 0., 0. 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 5000, 758.4341, 795275.8000 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 5000, 795.6612, 834311.2000 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 5140, 188.8091, 197980.7393 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 1.9438, 2038.2166 One improvement is to display the indicator names and their measured values in the form of a table, so that the meaning of each measurement can be clearly expressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
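The table layout suggested above can be sketched as a standalone helper: compute the widest metric name, then print aligned name/value rows. The class and data below are hypothetical; the real change would live in ConsumerPerformance's output path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerfStatsTable {
    // Render metric name/value pairs as an aligned two-column table so each
    // measurement is labeled next to its value.
    static String render(Map<String, String> metrics) {
        int width = metrics.keySet().stream().mapToInt(String::length).max().orElse(0);
        StringBuilder sb = new StringBuilder();
        metrics.forEach((name, value) ->
                sb.append(String.format("%-" + width + "s : %s%n", name, value)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the metrics in their reporting order
        Map<String, String> metrics = new LinkedHashMap<>();
        metrics.put("data.consumed.in.MB", "390.6348");
        metrics.put("MB.sec", "7.6144");
        metrics.put("nMsg.sec", "779.7162");
        System.out.print(render(metrics));
    }
}
```

Compared with the current single comma-separated row, each value sits next to its name, so the reader never has to count columns.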
[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10886: - Attachment: (was: image-2021-03-01-14-29-55-418.png) > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to work. > The remaining issues include restarting the Kafka service causing data to be > deleted by mistake, and deleting a topic or migrating a partition causing a > disk to go offline or the broker to crash. > [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292652#comment-17292652 ] Wenbing Shen commented on KAFKA-10886: -- [~mehajabeeny] I'm not sure whether applying the 2.7.0 patch to the kafka-2.0.0 version would cause problems, but I fixed the kafka-2.0.0 version, and it is currently working well in our company's products at customer sites. KAFKA-9458 does not fully solve the problem on Windows. The remaining issues include restarting the Kafka service causing data to be deleted by mistake, and deleting a topic or migrating a partition causing a disk to go offline or the broker to crash. Can you apply this patch to the code manually, and have you solved the problem yet? > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: image-2021-03-01-14-29-55-418.png, > windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to work. > The remaining issues include restarting the Kafka service causing data to be > deleted by mistake, and deleting a topic or migrating a partition causing a > disk to go offline or the broker to crash. 
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at >
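The AccessDeniedException above is thrown from the rename inside org.apache.kafka.common.utils.Utils.atomicMoveWithFallback. A simplified, illustrative sketch of that move-with-fallback pattern (not Kafka's exact implementation; the class name here is hypothetical) shows the two attempts seen as the main and suppressed exceptions in the log. On Windows, if any open handle (for example a memory-mapped index file) still references the source directory, both moves can fail with AccessDeniedException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveSketch {
    // Try an atomic move first; if the filesystem refuses, fall back to a
    // plain move and keep the first failure as a suppressed exception, which
    // is why the stack trace above ends with "Suppressed: ...".
    public static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException inner) {
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }
}
```

On Linux the fallback rarely triggers; on Windows the rename of a log directory that is still mapped fails in both branches, which is what feeds kafka.server.LogDirFailureChannel here.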
[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10886: - Attachment: image-2021-03-01-14-29-55-418.png > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: image-2021-03-01-14-29-55-418.png, > windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to > work.These include restarting the Kafka service causing data to be deleted by > mistake, deleting a topic or a partition migration causing a disk to go > offline or the broker crashed. > [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12157) test Upgrade 2.7.0 from 2.0.0 occur a question
Wenbing Shen created KAFKA-12157: Summary: test Upgrade 2.7.0 from 2.0.0 occur a question Key: KAFKA-12157 URL: https://issues.apache.org/jira/browse/KAFKA-12157 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.7.0 Reporter: Wenbing Shen Attachments: 1001server.log, 1001serverlog.png, 1003server.log, 1003serverlog.png, 1003statechange.log In a test environment, I was performing a rolling upgrade from version 2.0.0 to version 2.7.0 and encountered the following problem. When the rolling upgrade reached the second round, I stopped the first broker (1001) of that round and the following error occurred: while a broker was handling client produce requests, the start offset of the partition leader's current leader epoch suddenly became 0, and subsequent write requests for the same partition produced an error log. All partition leaders with a replica on 1001 were transferred to the 1003 node, and any of those partitions on 1003 that received produce requests generated this error. When I restarted 1001, broker 1001 reported the following error: [2021-01-06 16:46:55,955] ERROR (ReplicaFetcherThread-8-1003 kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=8] Unexpected error occurred while processing data for partition test-perf1-9 at offset 9666953 I used the following command to generate produce requests: nohup /home/kafka/software/kafka/bin/kafka-producer-perf-test.sh --num-records 1 --record-size 1000 --throughput 3 --producer-props bootstrap.servers=hdp1:9092,hdp2:9092,hdp3:9092 acks=1 --topic test-perf1 > 1pro.log 2>&1 & I tried to reproduce the problem, but after three attempts it did not reappear. I am curious how this problem occurred, and why broker 1003 reset the startOffset of leaderEpoch 4 to 0 when the offset was assigned by the broker in the Log.append function.
broker 1003: server.log [2021-01-06 16:37:59,492] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-9] New epoch entry EpochEntry(epoch=4, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=4, startOffset=9667122), EpochEntry(epoch=3, startOffset=9195729), EpochEntry(epoch=2, startOffset=8348201)). Cache now contains 0 entries. [2021-01-06 16:37:59,493] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-8] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8342787)). Cache now contains 0 entries. [2021-01-06 16:37:59,495] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-2] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8336727)). Cache now contains 0 entries.
[2021-01-06 16:37:59,498] ERROR (data-plane-kafka-request-handler-142 kafka.server.ReplicaManager 76) [ReplicaManager broker=1003] Error processing append operation on partition test-perf1-9 java.lang.IllegalArgumentException: Received invalid partition leader epoch entry EpochEntry(epoch=4, startOffset=-3) at kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:67) at kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:59) at kafka.log.Log.maybeAssignEpochStartOffset(Log.scala:1268) at kafka.log.Log.$anonfun$append$6(Log.scala:1181) at kafka.log.Log$$Lambda$935/184936331.accept(Unknown Source) at java.lang.Iterable.forEach(Iterable.java:75) at kafka.log.Log.$anonfun$append$2(Log.scala:1179) at kafka.log.Log.append(Log.scala:2387) at kafka.log.Log.appendAsLeader(Log.scala:1050) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1079) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1067) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:953) at kafka.server.ReplicaManager$$Lambda$1025/1369541490.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:941) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:621) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:625) broker 1001: server.log [2021-01-06 16:46:55,955]
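The IllegalArgumentException in the 1003 log fires when a new leader epoch entry fails the cache's sanity checks. A rough illustrative sketch of the invariant involved (hypothetical names and simplified behavior, not Kafka's actual LeaderEpochFileCache code): a negative start offset is rejected outright, while a conflicting entry for an already-cached epoch truncates the cache, which matches the "caused truncation of conflicting entries" WARN lines above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EpochCacheSketch {
    public record EpochEntry(int epoch, long startOffset) {}

    private final Deque<EpochEntry> entries = new ArrayDeque<>();

    public void assign(int epoch, long startOffset) {
        // Negative values are invalid; this is the check that fired above
        // with EpochEntry(epoch=4, startOffset=-3).
        if (epoch < 0 || startOffset < 0) {
            throw new IllegalArgumentException(
                "Received invalid partition leader epoch entry EpochEntry(epoch="
                    + epoch + ", startOffset=" + startOffset + ")");
        }
        EpochEntry latest = entries.peekLast();
        if (latest != null && (epoch < latest.epoch() || startOffset < latest.startOffset())) {
            // A conflicting entry (e.g. epoch=4, startOffset=0 arriving after
            // epoch=4, startOffset=9667122) truncates the cached entries,
            // mirroring the "Cache now contains 0 entries" WARN in the log.
            entries.clear();
        }
        entries.addLast(new EpochEntry(epoch, startOffset));
    }

    public int size() { return entries.size(); }
}
```

This makes the reported sequence plausible: once the cache was truncated by an entry with startOffset=0, a later append that computed a smaller-than-zero start offset hit the validation and failed the produce request.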
[jira] [Assigned] (KAFKA-10904) There is a misleading log when the replica fetcher thread handles offsets that are out of range
[ https://issues.apache.org/jira/browse/KAFKA-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-10904: Assignee: Wenbing Shen > There is a misleading log when the replica fetcher thread handles offsets > that are out of range > --- > > Key: KAFKA-10904 > URL: https://issues.apache.org/jira/browse/KAFKA-10904 > Project: Kafka > Issue Type: Improvement > Components: replication >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Minor > Attachments: ReplicaFetcherThread-has-a-misleading-log.png > > > There is ambiguity in the replica fetcher thread's log. When the fetcher > thread handles an offset that is out of range, it needs to try to truncate the > log. When the follower replica's end offset is greater than the leader > replica's log start offset and smaller than the leader replica's end offset, > the follower replica keeps its own fetch offset. However, such cases are > processed together with the case where the follower replica's end offset is > smaller than the leader replica's log start offset, which makes the log > ambiguous: it reports that the follower replica's fetch offset was reset to > the leader replica's log start offset. In fact, the follower still keeps its > own fetch offset, so this WARN log is misleading to the user. > > [2020-11-12 05:30:54,319] WARN (ReplicaFetcherThread-1-1003 > kafka.server.ReplicaFetcherThread 70) [ReplicaFetcher replicaId=1010, > leaderId=1003, fetcherId=1] Reset fetch offset for partition > eb_raw_msdns-17 from 1933959108 to current leader's start offset 1883963889 > [2020-11-12 05:30:54,320] INFO (ReplicaFetcherThread-1-1003 > kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1010, > leaderId=1003, fetcherId=1] Current offset 1933959108 for partition > eb_raw_msdns-17 is out of range, which typically implies a leader change. Reset fetch offset to 1933959108 > > I think it is more accurate to print the WARN log only when the follower > replica really needs to truncate its fetch offset to the leader replica's log > start offset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10904) There is a misleading log when the replica fetcher thread handles offsets that are out of range
Wenbing Shen created KAFKA-10904: Summary: There is a misleading log when the replica fetcher thread handles offsets that are out of range Key: KAFKA-10904 URL: https://issues.apache.org/jira/browse/KAFKA-10904 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 2.7.0 Reporter: Wenbing Shen Attachments: ReplicaFetcherThread-has-a-misleading-log.png There is ambiguity in the replica fetcher thread's log. When the fetcher thread handles an offset that is out of range, it needs to try to truncate the log. When the follower replica's end offset is greater than the leader replica's log start offset and smaller than the leader replica's end offset, the follower replica keeps its own fetch offset. However, such cases are processed together with the case where the follower replica's end offset is smaller than the leader replica's log start offset, which makes the log ambiguous: it reports that the follower replica's fetch offset was reset to the leader replica's log start offset. In fact, the follower still keeps its own fetch offset, so this WARN log is misleading to the user. [2020-11-12 05:30:54,319] WARN (ReplicaFetcherThread-1-1003 kafka.server.ReplicaFetcherThread 70) [ReplicaFetcher replicaId=1010, leaderId=1003, fetcherId=1] Reset fetch offset for partition eb_raw_msdns-17 from 1933959108 to current leader's start offset 1883963889 [2020-11-12 05:30:54,320] INFO (ReplicaFetcherThread-1-1003 kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1010, leaderId=1003, fetcherId=1] Current offset 1933959108 for partition eb_raw_msdns-17 is out of range, which typically implies a leader change. Reset fetch offset to 1933959108 I think it is more accurate to print the WARN log only when the follower replica really needs to truncate its fetch offset to the leader replica's log start offset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
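The branching the report argues for can be sketched in a few lines (illustrative only; not Kafka's actual ReplicaFetcherThread code, and the method name is hypothetical): only the branch where the follower's offset is genuinely below the leader's log start offset is a true reset that deserves the WARN.

```java
public class OutOfRangeSketch {
    // Returns the fetch offset the follower should continue from when its
    // current offset is out of range on the leader. Only the first branch is
    // a real reset worth a WARN; otherwise the follower keeps its own offset
    // (truncating locally if it overshot the leader's end offset).
    public static long resolveFetchOffset(long followerOffset,
                                          long leaderLogStartOffset,
                                          long leaderLogEndOffset) {
        if (followerOffset < leaderLogStartOffset) {
            // WARN: "Reset fetch offset ... to current leader's start offset"
            return leaderLogStartOffset;
        }
        // Follower is within the leader's range or ahead of it: keep its own
        // offset, which is why the log above "resets" 1933959108 to 1933959108.
        return Math.min(followerOffset, leaderLogEndOffset);
    }
}
```

With this split, the misleading case in the report (the follower offset falls inside the leader's range) never reaches the WARN branch at all.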
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256867#comment-17256867 ] Wenbing Shen commented on KAFKA-10889: -- [~becket_qin] Hello Qin, I found an internal topic that has not had any segments deleted since September 30 this year, and it has many empty log segments. Because the log segments cannot be deleted, one partition replica of this topic now occupies 753 GB. Do you know what could cause this? !20201231-1.png! > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > 20201231-1.png, 20201231-2.png, 20201231-3.png, 20201231-4.png, > 20201231-5.png, image-2020-12-28-17-17-15-947.png > > > * I have a topic that should be retained for the default of 7 days, but logs > exist from October 26th through December 25th (today). The log cleaner doesn't > seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10889: - Attachment: 20201231-5.png 20201231-4.png 20201231-3.png 20201231-2.png 20201231-1.png > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > 20201231-1.png, 20201231-2.png, 20201231-3.png, 20201231-4.png, > 20201231-5.png, image-2020-12-28-17-17-15-947.png > > > * I have a topic that is reserved for the default of 7 days, but the log > exists from October 26th to December 25th today.The log cleaner doesn't seem > to be working on it.This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
[ https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10891: - Component/s: (was: core) > The control plane needs to force the validation of requests from the > controller > --- > > Key: KAFKA-10891 > URL: https://issues.apache.org/jira/browse/KAFKA-10891 > Project: Kafka > Issue Type: Improvement > Components: controller, network >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Major > Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png > > > Currently, data requests and control requests go through different planes in > isolation, but both endpoints are registered on the broker's znode in > ZooKeeper. As a result, a client can obtain the control-plane endpoint and may > use it for produce, consume, and other data requests, which violates the > intended separation of data and control requests. The server needs to validate > requests on the control plane and reject data requests sent through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
[ https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10891: - Component/s: network > The control plane needs to force the validation of requests from the > controller > --- > > Key: KAFKA-10891 > URL: https://issues.apache.org/jira/browse/KAFKA-10891 > Project: Kafka > Issue Type: Improvement > Components: controller, core, network >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Major > Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png > > > Currently, data requests and control requests go through different planes in > isolation, but both endpoints are registered on the broker's znode in > ZooKeeper. As a result, a client can obtain the control-plane endpoint and may > use it for produce, consume, and other data requests, which violates the > intended separation of data and control requests. The server needs to validate > requests on the control plane and reject data requests sent through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
[ https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256052#comment-17256052 ] Wenbing Shen commented on KAFKA-10891: -- [~junrao] Can you take a look at this PR? Thank you very much! > The control plane needs to force the validation of requests from the > controller > --- > > Key: KAFKA-10891 > URL: https://issues.apache.org/jira/browse/KAFKA-10891 > Project: Kafka > Issue Type: Improvement > Components: controller, core >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Major > Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png > > > Currently, data requests and control requests go through different planes in > isolation, but both endpoints are registered on the broker's znode in > ZooKeeper. As a result, a client can obtain the control-plane endpoint and may > use it for produce, consume, and other data requests, which violates the > intended separation of data and control requests. The server needs to validate > requests on the control plane and reject data requests sent through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255980#comment-17255980 ] Wenbing Shen commented on KAFKA-10889: -- [~becket_qin] Thank you for your guidance. > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that is reserved for the default of 7 days, but the log > exists from October 26th to December 25th today.The log cleaner doesn't seem > to be working on it.This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
[ https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255911#comment-17255911 ] Wenbing Shen commented on KAFKA-10891: -- [~showuon] I added validation of control requests in the PR, and when the control plane is enabled, the broker sends its controlled shutdown request through the control plane. > The control plane needs to force the validation of requests from the > controller > --- > > Key: KAFKA-10891 > URL: https://issues.apache.org/jira/browse/KAFKA-10891 > Project: Kafka > Issue Type: Improvement > Components: controller, core >Affects Versions: 2.7.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Major > Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png > > > Currently, data requests and control requests go through different planes in > isolation, but both endpoints are registered on the broker's znode in > ZooKeeper. As a result, a client can obtain the control-plane endpoint and may > use it for produce, consume, and other data requests, which violates the > intended separation of data and control requests. The server needs to validate > requests on the control plane and reject data requests sent through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
Wenbing Shen created KAFKA-10891: Summary: The control plane needs to force the validation of requests from the controller Key: KAFKA-10891 URL: https://issues.apache.org/jira/browse/KAFKA-10891 Project: Kafka Issue Type: Improvement Components: controller, core Affects Versions: 2.7.0 Reporter: Wenbing Shen Assignee: Wenbing Shen Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png Currently, data requests and control requests go through different planes in isolation, but both endpoints are registered on the broker's znode in ZooKeeper. As a result, a client can obtain the control-plane endpoint and may use it for produce, consume, and other data requests, which violates the intended separation of data and control requests. The server needs to validate requests on the control plane and reject data requests sent through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
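The proposed validation can be sketched as a simple allow-list keyed on the request type. This is illustrative only: the real API keys live in org.apache.kafka.common.protocol.ApiKeys, and an actual fix would be wired into the control plane's request channel rather than a standalone class like this hypothetical one.

```java
import java.util.Set;

public class ControlPlaneValidation {
    // Controller-to-broker request types that legitimately arrive on the
    // control plane listener (names mirror Kafka's ApiKeys constants).
    private static final Set<String> CONTROL_APIS = Set.of(
        "LEADER_AND_ISR", "STOP_REPLICA", "UPDATE_METADATA", "CONTROLLED_SHUTDOWN");

    // Data requests such as PRODUCE or FETCH should be rejected here, so a
    // client that discovered the control endpoint cannot use it for data.
    public static boolean allowedOnControlPlane(String apiKeyName) {
        return CONTROL_APIS.contains(apiKeyName);
    }
}
```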
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255618#comment-17255618 ] Wenbing Shen commented on KAFKA-10889: -- [~becket_qin] Why not separate the message creation time from the log append time: return the creation time to the client, and use the log append time between brokers? That would be friendlier to log cleanup. > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that should be retained for the default of 7 days, but logs > exist from October 26th through December 25th (today). The log cleaner doesn't > seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491 ] Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:43 AM: - Hello, when the user is unfamiliar with the configuration, is there a way to solve the problem by design? [~becket_qin] [~junrao] was (Author: wenbing.shen): [~becket_qin] [~junrao] > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that should be retained for the default of 7 days, but logs > exist from October 26th through December 25th (today). The log cleaner doesn't > seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491 ] Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:43 AM: - [~becket_qin] [~junrao] Hello, when the user is unfamiliar with the configuration, is there a way to solve the problem by design? was (Author: wenbing.shen): Hello,When the user is unfamiliar with the configuration, is there a way to solve the problem by design? [~becket_qin] [~junrao] > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that should be retained for the default of 7 days, but logs > exist from October 26th through December 25th (today). The log cleaner doesn't > seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491 ] Wenbing Shen commented on KAFKA-10889: -- [~becket_qin] [~junrao] > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that is reserved for the default of 7 days, but the log > exists from October 26th to December 25th today.The log cleaner doesn't seem > to be working on it.This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255480#comment-17255480 ] Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:24 AM: - The problem appears to be related to this KIP: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message] and to these two configurations: message.timestamp.type=CreateTime message.timestamp.difference.max.ms=Long.MAX_VALUE The client produced messages with a timestamp of 2020-12-26 14:49:32; as a result, for the whole period before 2021-01-02 14:49:32, none of this partition's log segments will be deleted. !image-2020-12-28-17-17-15-947.png! was (Author: wenbing.shen): [https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message] message.timestamp.type max.message.time.difference.ms > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, > image-2020-12-28-17-17-15-947.png > > > * I have a topic that should be retained for the default of 7 days, but logs > exist from October 26th through December 25th (today). The log cleaner doesn't > seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
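The mechanism behind this can be sketched in a few lines (a simplification of Kafka's time-based retention, with a hypothetical class name): a segment becomes eligible for deletion only once now minus its largest record timestamp exceeds retention.ms, and with message.timestamp.type=CreateTime plus an effectively unlimited message.timestamp.difference.max.ms, that largest timestamp is whatever the client claimed, even a time in the future.

```java
public class RetentionSketch {
    // Simplified time-based retention check: a segment can be deleted only
    // when its largest record timestamp has aged past the retention window.
    public static boolean isDeletable(long nowMs, long segmentLargestTimestampMs,
                                      long retentionMs) {
        return nowMs - segmentLargestTimestampMs > retentionMs;
    }
}
```

So a single record stamped 2020-12-26 14:49:32 pins every segment of the partition until 2021-01-02 14:49:32 under a 7-day retention, which is exactly the behavior described above.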
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255480#comment-17255480 ] Wenbing Shen commented on KAFKA-10889: -- [https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message] message.timestamp.type max.message.time.difference.ms > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png > > > * I have a topic that is reserved for the default of 7 days, but the log > exists from October 26th to December 25th today.The log cleaner doesn't seem > to be working on it.This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254945#comment-17254945 ] Wenbing Shen commented on KAFKA-10889: -- * I got this a little wrong here: the log cleaner only works on log segments of the compact type. Non-compacted segments are removed by the log manager's scheduled task. However, there is still a problem on my side: log segments of the non-compact type have not been cleaned up. > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png > > > * I have a topic that is retained for the default of 7 days, but the log > exists from October 26th to December 25th today. The log cleaner doesn't seem > to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
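The split described in that correction — the log cleaner thread handles only compacted logs, while the log manager's periodic task applies time/size retention to delete-policy logs — can be sketched as a hypothetical dispatch. Names such as logsHandledByRetentionTask are illustrative, not Kafka's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupDispatchSketch {
    // Illustrative stand-in for a partition's log and its cleanup.policy.
    static class Log {
        final String name;
        final String cleanupPolicy;
        Log(String name, String cleanupPolicy) { this.name = name; this.cleanupPolicy = cleanupPolicy; }
    }

    // The log cleaner thread only touches cleanup.policy=compact logs;
    // time/size-based retention for cleanup.policy=delete logs runs in the
    // log manager's scheduled cleanup task instead.
    static List<String> logsHandledByRetentionTask(List<Log> logs) {
        List<String> out = new ArrayList<>();
        for (Log log : logs) {
            if ("delete".equals(log.cleanupPolicy)) out.add(log.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Log> logs = List.of(new Log("test1-3", "delete"), new Log("__consumer_offsets-0", "compact"));
        System.out.println(logsHandledByRetentionTask(logs)); // prints [test1-3]
    }
}
```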
[jira] [Assigned] (KAFKA-10889) The log cleaner is not working for topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-10889: Assignee: Wenbing Shen > The log cleaner is not working for topic partitions > --- > > Key: KAFKA-10889 > URL: https://issues.apache.org/jira/browse/KAFKA-10889 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Blocker > Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png > > > * I have a topic that is retained for the default of 7 days, but the log > exists from October 26th to December 25th today. The log cleaner doesn't seem > to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10889) The log cleaner is not working for topic partitions
Wenbing Shen created KAFKA-10889: Summary: The log cleaner is not working for topic partitions Key: KAFKA-10889 URL: https://issues.apache.org/jira/browse/KAFKA-10889 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.0.0 Reporter: Wenbing Shen Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png * I have a topic that is retained for the default of 7 days, but the log exists from October 26th to December 25th today. The log cleaner doesn't seem to be working on it. This seems to be an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-10886: Assignee: (was: Wenbing Shen) > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to > work.These include restarting the Kafka service causing data to be deleted by > mistake, deleting a topic or a partition migration causing a disk to go > offline or the broker crashed. > [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
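The AccessDeniedException and the "Suppressed:" entry in the trace above both originate in Utils.atomicMoveWithFallback. A simplified reconstruction of that pattern (an assumption about its shape, not the exact Kafka source) explains why the trace carries two exceptions: the atomic rename is tried first, the plain move is the fallback, and the first failure is attached as suppressed. On Windows, both moves can fail while the partition's index files are still memory-mapped, which is what takes the log directory offline.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {
    // Simplified reconstruction of the atomicMoveWithFallback pattern seen in
    // the trace: try an atomic rename first; if that fails, fall back to a
    // plain move, attaching the first failure as a suppressed exception --
    // which is why the trace shows a second AccessDeniedException under
    // "Suppressed:".
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException inner) {
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // On Windows this rename can throw AccessDeniedException while the
        // directory is memory-mapped; on other platforms it simply succeeds.
        Path dir = Files.createTempDirectory("atomic-move-sketch");
        Path src = Files.createFile(dir.resolve("test-1-1"));
        Path dst = dir.resolve("test-1-1.deleted");
        atomicMoveWithFallback(src, dst);
        System.out.println(Files.exists(dst)); // true
    }
}
```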
[jira] [Assigned] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reassigned KAFKA-10886: Assignee: Wenbing Shen > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Assignee: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to > work.These include restarting the Kafka service causing data to be deleted by > mistake, deleting a topic or a partition migration causing a disk to go > offline or the broker crashed. > [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9458) Kafka crashed in windows environment
[ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530 ] Wenbing Shen edited comment on KAFKA-9458 at 12/23/20, 9:53 AM: The current patch is deficient: when a topic is deleted or a partition migration is carried out, the service will still be suspended or the disk will go offline. I have provided the following patch file, which passed my self-testing; my Kafka version is 2.0.0. [^kafka_windows_crash_by_delete_topic_and_Partition_migration] At https://issues.apache.org/jira/browse/KAFKA-10886 I provide a complete and effective patch. was (Author: wenbing.shen): The current patch is deficient. When topic is deleted or partition migration is carried out, the service will still be suspended or the disk will be offline. I have provided the following patch file, which is effective for self-test,My Kafka version is 2.0.0 . [^kafka_windows_crash_by_delete_topic_and_Partition_migration] > Kafka crashed in windows environment > > > Key: KAFKA-9458 > URL: https://issues.apache.org/jira/browse/KAFKA-9458 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0 > Environment: Windows Server 2019 >Reporter: hirik >Priority: Critical > Labels: windows > Attachments: Windows_crash_fix.patch, > kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip > > > Hi, > while I was trying to validate the Kafka retention policy, the Kafka server crashed > with the below exception trace. > [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, > dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] > Rolled new log segment at offset 1 in 52 ms. 
(kafka.log.Log) > [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in > dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka > (kafka.server.LogDirFailureChannel) > java.nio.file.FileSystemException: > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex > -> > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted: > The process cannot access the file because it is being used by another > process. > at > java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) > at > java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) > at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395) > at > java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) > at java.base/java.nio.file.Files.move(Files.java:1425) > at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795) > at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209) > at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.Log.deleteSegmentFiles(Log.scala:2206) > at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191) > at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700) > at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17) > at kafka.log.Log.maybeHandleIOException(Log.scala:2316) > at kafka.log.Log.deleteSegments(Log.scala:1691) > at kafka.log.Log.deleteOldSegments(Log.scala:1686) > at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763) > at kafka.log.Log.deleteOldSegments(Log.scala:1753) > at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982) > at 
kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.LogManager.cleanupLogs(LogManager.scala:979) > at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403) > at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:830) > Suppressed: java.nio.file.FileSystemException: >
[jira] [Comment Edited] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253993#comment-17253993 ] Wenbing Shen edited comment on KAFKA-10886 at 12/23/20, 9:49 AM: - After referring to KAFKA-9458, I provided the above patch, which completely solves the Kafka compatibility problem on Windows; this is a qualified patch. was (Author: wenbing.shen): I refer to [kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] and completely fix the kafka compatibility problem in Windows environment. This is a qualified patch. > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > Attachments: windows_kafka_full_crash.patch > > > I tried using the > [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix > the Kafka problem in the Windows environment, but it didn't seem to > work. The problems include restarting the Kafka service causing data to be deleted by > mistake, and deleting a topic or migrating a partition causing a disk to go > offline or the broker to crash. 
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002]
[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10886: - External issue URL: (was: https://issues.apache.org/jira/browse/KAFKA-9458) > Kafka crashed in windows environment2 > - > > Key: KAFKA-10886 > URL: https://issues.apache.org/jira/browse/KAFKA-10886 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.0.0 > Environment: Windows Server >Reporter: Wenbing Shen >Priority: Critical > Labels: windows > > I tried using the Kafka-9458 patch to fix the Kafka problem in the Windows > environment, but it didn't seem to work.These include restarting the Kafka > service causing data to be deleted by mistake, deleting a topic or a > partition migration causing a disk to go offline or the broker crashed. > [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 > 17:26:11,124] ERROR (kafka-request-handler-11 > kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 > in log dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: > > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at > 
kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at > kafka.log.Log.maybeHandleIOException(Log.scala:1837) at > kafka.log.Log.renameDir(Log.scala:687) at > kafka.log.LogManager.asyncDelete(LogManager.scala:833) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at > kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at > kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at > kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at > kafka.cluster.Partition.delete(Partition.scala:262) at > kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) > at > kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at > kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at > kafka.server.KafkaApis.handle(KafkaApis.scala:113) at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at > java.lang.Thread.run(Unknown Source) Suppressed: > java.nio.file.AccessDeniedException: > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 > -> > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at > sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at > sun.nio.fs.WindowsFileCopy.move(Unknown Source) at > 
sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at > java.nio.file.Files.move(Unknown Source) at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) > ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler > kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving > replicas in dir > E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2
[ https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10886: - Description: I tried using the [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix the Kafka problem in the Windows environment, but it didn't seem to work.These include restarting the Kafka service causing data to be deleted by mistake, deleting a topic or a partition migration causing a disk to go offline or the broker crashed. [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsFileCopy.move(Unknown Source) at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at java.nio.file.Files.move(Unknown Source) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at kafka.log.Log.maybeHandleIOException(Log.scala:1837) at kafka.log.Log.renameDir(Log.scala:687) at kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at kafka.cluster.Partition.delete(Partition.scala:262) at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at kafka.server.KafkaApis.handle(KafkaApis.scala:113) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at java.lang.Thread.run(Unknown Source) Suppressed: java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsFileCopy.move(Unknown Source) at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at java.nio.file.Files.move(Unknown Source) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) ... 
23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving replicas in dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log was: I tried using the Kafka-9458 patch to fix the Kafka problem in the Windows environment, but it didn't seem to work.These include restarting the Kafka service causing data to be deleted by mistake, deleting a topic or a partition migration causing a disk to go offline or the broker crashed. [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 ->
[jira] [Created] (KAFKA-10886) Kafka crashed in windows environment2
Wenbing Shen created KAFKA-10886: Summary: Kafka crashed in windows environment2 Key: KAFKA-10886 URL: https://issues.apache.org/jira/browse/KAFKA-10886 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.0.0 Environment: Windows Server Reporter: Wenbing Shen
I tried using the KAFKA-9458 patch to fix the Kafka problems in the Windows environment, but it did not seem to work. The problems include restarting the Kafka service causing data to be deleted by mistake, and deleting a topic or performing a partition migration causing a disk to go offline or the broker to crash.
[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
 at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
 at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
 at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
 at java.nio.file.Files.move(Unknown Source)
 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
 at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689)
 at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
 at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
 at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
 at kafka.log.Log.renameDir(Log.scala:687)
 at kafka.log.LogManager.asyncDelete(LogManager.scala:833)
 at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267)
 at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256)
 at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264)
 at kafka.cluster.Partition.delete(Partition.scala:262)
 at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341)
 at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
 at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
 at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369)
 at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222)
 at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73)
 at java.lang.Thread.run(Unknown Source)
 Suppressed: java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
 at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
 at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
 at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
 at java.nio.file.Files.move(Unknown Source)
 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
 ... 23 more
[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving replicas in dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
-- This message was sent by Atlassian Jira (v8.3.4#803005)
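The rename that fails in the trace above goes through org.apache.kafka.common.utils.Utils.atomicMoveWithFallback, which attempts an atomic rename first and falls back to a plain move, attaching the first failure as a suppressed exception (visible as the `Suppressed:` entry in the trace). A simplified sketch of that pattern, not the exact Kafka code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {
    // Try an atomic rename first; fall back to a plain replacing move if the
    // filesystem rejects the atomic one. On Windows, both attempts can fail
    // with AccessDeniedException while the source file is memory-mapped or
    // held open by another handle, which is the crash reported above.
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException inner) {
                // This produces the "Suppressed:" entry seen in the trace.
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-move-demo");
        Path src = Files.createFile(dir.resolve("segment.log"));
        Path dst = dir.resolve("segment.log.7c4f-delete");
        atomicMoveWithFallback(src, dst);
        System.out.println(Files.exists(dst) && !Files.exists(src)); // prints "true"
    }
}
```

On POSIX filesystems the atomic move normally succeeds on the first attempt; the fallback path mainly matters for cross-filesystem moves and for Windows semantics.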
[jira] [Updated] (KAFKA-10882) When sending a response to the client, a null pointer exception has occurred in the error code set
[ https://issues.apache.org/jira/browse/KAFKA-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10882: - Attachment: 0880c08b0110fb95d40a.png > When sending a response to the client, a null pointer exception has occurred > in the error code set > - > > Key: KAFKA-10882 > URL: https://issues.apache.org/jira/browse/KAFKA-10882 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.11.0.4 >Reporter: Wenbing Shen >Priority: Major > Attachments: 0880c08b0110fb95d40a.png, 0880c08b0110fbd3bb0d.png, > 0880c08b0110fbd3fb0e.png > > > After the IO thread receives a produce request, a null pointer exception > occurs while the error code set is parsed as the error response is returned > to the client. A replica fetcher thread also failed to parse the protocol on > another broker node; I don't know whether this is the same problem. I hope I > can get help. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10882) When sending a response to the client, a null pointer exception has occurred in the error code set
Wenbing Shen created KAFKA-10882: Summary: When sending a response to the client, a null pointer exception has occurred in the error code set Key: KAFKA-10882 URL: https://issues.apache.org/jira/browse/KAFKA-10882 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.11.0.4 Reporter: Wenbing Shen Attachments: 0880c08b0110fbd3bb0d.png, 0880c08b0110fbd3fb0e.png After the IO thread receives a produce request, a null pointer exception occurs while the error code set is parsed as the error response is returned to the client. A replica fetcher thread also failed to parse the protocol on another broker node; I don't know whether this is the same problem. I hope I can get help. -- This message was sent by Atlassian Jira (v8.3.4#803005)
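For context on where such a NullPointerException can come from when building an error response: brokers map exceptions to numeric wire error codes, and a gap in that mapping (an exception class with no registered code) surfaces exactly when the response is constructed. A hypothetical illustration of the failure mode and the usual fix, not Kafka's actual Errors class; the class names and code values here are invented:

```java
import java.util.Map;

public class ErrorCodeSketch {
    // Hypothetical exception-to-wire-code table. An exception type missing
    // from this map is the kind of gap that becomes a NullPointerException
    // when a null Short is unboxed while the response is being serialized.
    static final Map<Class<? extends Exception>, Short> CODES = Map.of(
            IllegalArgumentException.class, (short) 42,
            IllegalStateException.class, (short) 57);

    // Defensive lookup: fall back to a generic "unknown error" code (-1 here,
    // an invented value) instead of unboxing null.
    static short codeFor(Exception e) {
        return CODES.getOrDefault(e.getClass(), (short) -1);
    }

    public static void main(String[] args) {
        System.out.println(codeFor(new IllegalArgumentException("bad")));
        System.out.println(codeFor(new RuntimeException("unmapped")));
    }
}
```

Whether the reported 0.11.0.4 crash is exactly this pattern cannot be told from the description alone; the sketch only shows why parsing an "error code set" is a natural place for an NPE to appear.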
[jira] [Comment Edited] (KAFKA-9458) Kafka crashed in windows environment
[ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530 ] Wenbing Shen edited comment on KAFKA-9458 at 12/18/20, 5:57 AM:
The current patch is deficient. When a topic is deleted or a partition migration is carried out, the service will still hang or the disk will go offline. I have provided the following patch file, which passed my self-test. My Kafka version is 2.0.0. [^kafka_windows_crash_by_delete_topic_and_Partition_migration]
was (Author: wenbing.shen): The current patch is deficient. When a topic is deleted or a partition migration is carried out, the service will still hang or the disk will go offline. I have provided the following patch file, which passed my self-test: [^kafka_windows_crash_by_delete_topic_and_Partition_migration]
> Kafka crashed in windows environment
>
> Key: KAFKA-9458
> URL: https://issues.apache.org/jira/browse/KAFKA-9458
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 2.4.0
> Environment: Windows Server 2019
> Reporter: hirik
> Priority: Critical
> Labels: windows
> Attachments: Windows_crash_fix.patch, kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip
>
> Hi,
> while I was trying to validate the Kafka retention policy, the Kafka server crashed with the exception trace below.
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex -> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted: The process cannot access the file because it is being used by another process.
> at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
> at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
> at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
> at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
> at java.base/java.nio.file.Files.move(Files.java:1425)
> at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
> at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
> at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
> at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
> at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
> at scala.collection.immutable.List.foreach(List.scala:305)
> at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
> at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
> at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
> at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
> at kafka.log.Log.deleteSegments(Log.scala:1691)
> at kafka.log.Log.deleteOldSegments(Log.scala:1686)
> at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
> at kafka.log.Log.deleteOldSegments(Log.scala:1753)
> at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
> at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
> at scala.collection.immutable.List.foreach(List.scala:305)
> at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
> at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
> at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:830)
> Suppressed: java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex ->
>
[jira] [Updated] (KAFKA-9458) Kafka crashed in windows environment
[ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-9458: Attachment: kafka_windows_crash_by_delete_topic_and_Partition_migration > Kafka crashed in windows environment > > > Key: KAFKA-9458 > URL: https://issues.apache.org/jira/browse/KAFKA-9458 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0 > Environment: Windows Server 2019 >Reporter: hirik >Priority: Critical > Labels: windows > Attachments: Windows_crash_fix.patch, kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip >
[jira] [Commented] (KAFKA-9458) Kafka crashed in windows environment
[ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530 ] Wenbing Shen commented on KAFKA-9458: - The current patch is deficient. When a topic is deleted or a partition migration is carried out, the service will still hang or the disk will go offline. I have provided the following patch file, which passed my self-test: [^kafka_windows_crash_by_delete_topic_and_Partition_migration] > Kafka crashed in windows environment > > > Key: KAFKA-9458 > URL: https://issues.apache.org/jira/browse/KAFKA-9458 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0 > Environment: Windows Server 2019 >Reporter: hirik >Priority: Critical > Labels: windows > Attachments: Windows_crash_fix.patch, kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip >
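The Windows crashes in this thread all happen inside the same pattern: kafka.log.LogSegment.changeFileSuffixes renames segment files to a delete-marker suffix so that the physical removal can happen later on a background thread. A minimal, hypothetical sketch of that two-phase delete (names and delay are invented); the reported failures occur in phase 1, where the rename collides with handles still open on the file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TwoPhaseDelete {
    private static final String DELETED_SUFFIX = ".deleted";
    private static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Phase 1: rename the file out of the way. This is the step that fails on
    // Windows when another handle (e.g. a memory-mapped index or a fetcher
    // thread) still holds the file open.
    static Path markDeleted(Path file) throws IOException {
        Path renamed = file.resolveSibling(file.getFileName() + DELETED_SUFFIX);
        return Files.move(file, renamed, StandardCopyOption.ATOMIC_MOVE);
    }

    // Phase 2: physically remove the renamed file later, off the request path.
    static ScheduledFuture<?> scheduleDelete(Path renamed, long delayMs) {
        return scheduler.schedule(() -> {
            try {
                Files.deleteIfExists(renamed);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("two-phase-delete");
        Path segment = Files.createFile(dir.resolve("00000000000000000000.timeindex"));
        Path renamed = markDeleted(segment);
        scheduleDelete(renamed, 10).get(); // wait for the background delete
        System.out.println(!Files.exists(segment) && !Files.exists(renamed)); // prints "true"
        scheduler.shutdown();
    }
}
```

The two-phase scheme keeps deletion off the request-handler path, but it only works if the rename itself succeeds, which is exactly what Windows file locking breaks.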
[jira] [Commented] (KAFKA-10672) Restarting Kafka always takes a lot of time
[ https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243730#comment-17243730 ] Wenbing Shen commented on KAFKA-10672: --
* We increased the number of batches read per IO operation.
* After many tests, startup speed improved by 50% on average.
* See the attached files for the detailed code.
> Restarting Kafka always takes a lot of time
> ---
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.0
> Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
> Reporter: Wenbing Shen
> Priority: Major
> Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java, AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java, DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java, LazyDownConversionRecords.java, Log.scala, LogInputStream.java, LogManager.scala, LogSegment.scala, MemoryRecords.java, RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java, server.log
>
> When the producer-state snapshot file does not exist, or the latest snapshot file predates the current active segment, restoring producer state has to traverse the log segment and read every batch in it. On broker nodes with many partitions, and therefore many log segments, this causes a huge number of IO operations, because only one batch is loaded per read. A single segment easily contains tens of thousands of batches, and I found that the code performs at least two IO operations per batch. With the default batch size of 16 KB and a 1 GB segment, 65,536 batches are generated, so at least 65,536 * 2 = 131,072 IO operations are issued, which makes the Kafka startup process very slow.
> We configured 15 log recovery threads in the production environment, and it still took more than 2 hours to load a partition. Could the community put forward some proposals to improve this situation? For detailed logs, see the section on the test-perf-18 partition in the attached logs.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
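The arithmetic in the report can be checked directly, and it shows why reading several batches per IO operation, as the attached patch does, helps so much. A small back-of-the-envelope sketch (the 64-batches-per-read figure below is hypothetical, chosen only for illustration):

```java
public class RecoveryIoEstimate {
    // IO operations to scan a segment when every record batch costs
    // `iosPerBatch` separate reads (the report observed at least two).
    static long perBatchIos(long segmentBytes, long batchBytes, long iosPerBatch) {
        return (segmentBytes / batchBytes) * iosPerBatch;
    }

    // With batched reads, one IO fetches `batchesPerRead` batches at once.
    static long batchedIos(long segmentBytes, long batchBytes, long batchesPerRead) {
        long batches = segmentBytes / batchBytes;
        return (batches + batchesPerRead - 1) / batchesPerRead; // ceiling division
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long batch = 16 * 1024; // default 16 KB batch size
        // 65,536 batches * 2 IOs = 131,072, matching the report.
        System.out.println(perBatchIos(oneGiB, batch, 2));
        // Hypothetical 64 batches per read: 65,536 / 64 = 1,024 reads.
        System.out.println(batchedIos(oneGiB, batch, 64));
    }
}
```

Even a modest read-ahead factor collapses the IO count by two orders of magnitude, which is consistent with the reported 50% average startup improvement once other per-partition costs are included.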
[jira] [Updated] (KAFKA-10672) Restarting Kafka always takes a lot of time
[ https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen updated KAFKA-10672: - Attachment: Records.java RecordBatchIteratorOfRestart.java RecordBatchIterator.java MemoryRecords.java LogSegment.scala LogManager.scala LogInputStream.java Log.scala LazyDownConversionRecords.java FileRecords.java FileLogInputStream.java DefaultRecordBatch.java ByteBufferLogInputStream.java AbstractLegacyRecordBatch.java AbstractIteratorOfRestart.java AbstractIterator.java > Restarting Kafka always takes a lot of time > --- > > Key: KAFKA-10672 > URL: https://issues.apache.org/jira/browse/KAFKA-10672 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.0.0 > Environment: A cluster of 21 Kafka nodes; > Each node has 12 disks; > Each node has about 1500 partitions; > There are approximately 700 leader partitions per node; > Slow-loading partitions have about 1000 log segments; >Reporter: Wenbing Shen >Priority: Major > Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java, AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java, DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java, LazyDownConversionRecords.java, Log.scala, LogInputStream.java, LogManager.scala, LogSegment.scala, MemoryRecords.java, RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java, server.log > -- This message was sent by Atlassian Jira (v8.3.4#803005)