[jira] [Commented] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper

2021-07-09 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377956#comment-17377956
 ] 

Wenbing Shen commented on KAFKA-12493:
--

[~kkonstantine] This issue does not block 3.0.0; please go ahead. Thanks.

> The controller should handle the consistency between the controllerContext 
> and the partition replicas assignment on zookeeper
> -
>
> Key: KAFKA-12493
> URL: https://issues.apache.org/jira/browse/KAFKA-12493
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Fix For: 3.0.0
>
>
> This question can be linked to this email: 
> [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>  
> This is a 100% reproducible problem.
> Problem description:
> In a customer's production environment, code owned by another department 
> reassigned the replicas of existing partitions and wrote the new assignment 
> directly to ZooKeeper. When processing the resulting partition-modification 
> event, the controller only handled the newly added partitions: it registered 
> the new partitions and replicas in the partition and replica state machines 
> and issued LeaderAndIsr and other control requests.
> But the controller did not verify whether the existing partition replica 
> assignment in the controllerContext still matched the assignment on the 
> znode in ZooKeeper. This appears harmless until a broker has to be restarted, 
> for example for a configuration update or an upgrade. At that point the 
> affected topics break in real-time production: the controller cannot elect a 
> new leader, and the original leader does not recognize the replica assignment 
> currently in ZooKeeper. In our customer's environment this interrupted 
> real-time traffic and some data was lost.
> This problem can be reliably reproduced as follows:
> Adding partitions or changing the replicas of an existing topic with the 
> code below reassigns the original partition replicas and writes the result 
> to ZooKeeper. The controller does not handle this event correctly; after 
> restarting the brokers hosting the topic, the topic can no longer be 
> produced to or consumed from.
>  
> {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
>             CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0
>                 && kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the current broker metadata
>             Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
>                     RackAwareMode.Enforced$.MODULE$,
>                     Option.apply(null));
>             // Generate a new partition replica assignment
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // number of partitions
>                             kafkaTopicVO.getReplicationNum(), // replicas per partition
>                             -1,
>                             -1);
>             // Overwrite the partition replica assignment in ZooKeeper
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
>  
>  
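For illustration, the consistency check the description says is missing can be sketched as a plain comparison of the cached assignment against the assignment read back from the znode. This is a hypothetical helper (the names and types are mine, not the actual controller code):

```java
import java.util.*;

// Hypothetical sketch: on a partition-modification event, flag existing
// partitions whose replica list on the znode no longer matches the
// controller's cached assignment, instead of only handling new partitions.
public class AssignmentCheck {
    // Returns the partition ids whose replica assignment in ZooKeeper
    // differs from the assignment cached in the controller context.
    static Set<Integer> changedPartitions(Map<Integer, List<Integer>> cached,
                                          Map<Integer, List<Integer>> fromZk) {
        Set<Integer> changed = new HashSet<>();
        for (Map.Entry<Integer, List<Integer>> e : fromZk.entrySet()) {
            List<Integer> old = cached.get(e.getKey());
            // Partitions absent from the cache are genuinely new; only flag
            // existing partitions whose replica list was rewritten.
            if (old != null && !old.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> cached = new HashMap<>();
        cached.put(0, Arrays.asList(1, 2, 3));
        cached.put(1, Arrays.asList(2, 3, 1));
        Map<Integer, List<Integer>> fromZk = new HashMap<>();
        fromZk.put(0, Arrays.asList(1, 2, 3)); // unchanged
        fromZk.put(1, Arrays.asList(4, 5, 6)); // silently reassigned
        fromZk.put(2, Arrays.asList(1, 2, 3)); // newly added partition
        System.out.println(changedPartitions(cached, fromZk)); // prints [1]
    }
}
```

A controller that detects such a mismatch could then either reload the assignment from ZooKeeper or reject the change, rather than issuing LeaderAndIsr requests based on a stale cache.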



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12891) Add --files and --file-separator options to the ConsoleProducer

2021-06-04 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12891:


 Summary: Add --files and --file-separator options to the 
ConsoleProducer
 Key: KAFKA-12891
 URL: https://issues.apache.org/jira/browse/KAFKA-12891
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Reporter: Wenbing Shen
Assignee: Wenbing Shen


Introduce a *--files* option to the ConsoleProducer command line tool to 
support reading data from *multiple files*.
The file paths passed to *--files* are separated by *--file-separator*; the 
default separator is a *comma*.
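A minimal sketch of the proposed behaviour (the option names come from this ticket; the helper below is an assumption about the implementation, not the actual patch):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: split the --files value on the separator, then stream each
// file's lines as producer records, in the order the files were given.
public class MultiFileReader {
    static List<String> readRecords(String files, String separator) throws IOException {
        List<String> records = new ArrayList<>();
        for (String file : files.split(separator)) {
            records.addAll(Files.readAllLines(Paths.get(file.trim())));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("msgs-a", ".txt");
        Path b = Files.createTempFile("msgs-b", ".txt");
        Files.write(a, Arrays.asList("k1:v1", "k2:v2"));
        Files.write(b, Arrays.asList("k3:v3"));
        // Equivalent of passing --files "a,b" with the default comma separator
        System.out.println(readRecords(a + "," + b, ",")); // prints [k1:v1, k2:v2, k3:v3]
    }
}
```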





[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2

2021-05-26 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352242#comment-17352242
 ] 

Wenbing Shen commented on KAFKA-10886:
--

I now understand why you can't use this patch: the patch I submitted was made 
against our company's internal Kafka, which differs somewhat from the 
community's 2.0.0 code. I'm sorry; this is the first time I have uploaded a 
patch and I wasn't thorough enough. When I have time, I will fix the Windows 
incompatibility against the community code and upload a new patch. If you need 
the fix urgently, you can download the patch, adapt it, and apply it to your 
Kafka code manually.

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [KAFKA-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problems in the Windows environment, but it didn't seem to work. 
> The problems include restarting the Kafka service causing data to be deleted 
> by mistake, and deleting a topic or migrating a partition causing a disk to 
> go offline or the broker to crash.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
> java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
> at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
> at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
> at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
> at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
> at java.nio.file.Files.move(Unknown Source)
> at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
> at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689)
> at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
> at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
> at kafka.log.Log.renameDir(Log.scala:687)
> at kafka.log.LogManager.asyncDelete(LogManager.scala:833)
> at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267)
> at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264)
> at kafka.cluster.Partition.delete(Partition.scala:262)
> at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341)
> at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
> at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369)
> at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73)
> at java.lang.Thread.run(Unknown Source)
> Suppressed: java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
> at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
> at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
> at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
> at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
> at java.nio.file.Files.move(Unknown Source)
> at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
> ... 23
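For context, the rename in the stack trace comes from asyncDelete, which marks a partition directory for background deletion by renaming it to a "<uuid>-delete" suffix via an atomic move. A rough sketch of that pattern (a hypothetical helper, not Kafka's actual code); on Windows this rename throws AccessDeniedException when a file inside is still memory-mapped, since Windows does not allow renaming mapped files and Java releases a mapping only when its MappedByteBuffer is garbage collected:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.UUID;

// Sketch of the asyncDelete rename seen in the stack trace above:
// test-1-1 -> test-1-1.<uuid>-delete, attempted atomically first.
public class AsyncDeleteSketch {
    static Path markForDeletion(Path partitionDir) throws IOException {
        String suffix = UUID.randomUUID().toString().replace("-", "") + "-delete";
        Path target = partitionDir.resolveSibling(partitionDir.getFileName() + "." + suffix);
        // Mirrors the shape of Utils.atomicMoveWithFallback: try an atomic
        // move first, fall back to a plain move if unsupported.
        try {
            Files.move(partitionDir, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(partitionDir, target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Files.createTempDirectory("kafka-data");
        Path partition = Files.createDirectory(logDir.resolve("test-1-1"));
        Path marked = markForDeletion(partition);
        System.out.println(marked.getFileName().toString().endsWith("-delete")); // prints true
    }
}
```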

[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-05-06 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12734:
-
Affects Version/s: (was: 2.8.0)
   2.3.0

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip 
> activeSegment  sanityCheck
> 
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, 
> niobufferoverflow.png
>
>
> This issue is similar to KAFKA-9156.
> We introduced LazyIndex, which lets us skip checking the index files of all 
> log segments when starting Kafka; this greatly improved our startup speed.
> Unfortunately, it also skips the index file check for the active segment, 
> which still receives write requests from clients and from the replica 
> fetcher threads.
> There is a situation where we skip the index checks of all segments and no 
> unflushed log segment needs recovery, yet the index file of the last 
> (active) segment is damaged. Appending data to the active segment then makes 
> the broker report an error.
> Below is the problem I encountered in a production environment:
> When Kafka loaded the log segments, the program log showed that the 
> memory-map position of the time index and offset index files was far into 
> the file, even though the index files did not actually contain that many 
> entries. I suspect this happens when the Kafka process is stopped while 
> startup is still in progress and then started again: the limit of the index 
> file's memory map is the file's preallocated maximum size, not truncated to 
> the size actually used, so on the next startup the memory-map position is 
> set to that limit.
>  At that point, appending data to the active segment causes a NIO 
> BufferOverflowException.
> I agree with skipping the index check for all inactive segments, because 
> they no longer receive write requests, but for the active segment we need 
> to check the index files.
>  Another situation: even after a clean shutdown, some factor can leave the 
> active segment's index file with its memory-map position set to the limit, 
> resulting in a BufferOverflowException on write.
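The failure mode described above can be reproduced in isolation with a plain memory-mapped file (a minimal sketch; the file name and the 1024-byte size are made up, not Kafka's actual index layout):

```java
import java.io.IOException;
import java.nio.BufferOverflowException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Sketch: map a preallocated "index" file and leave the buffer's position
// at its limit, the corrupted state described above. The very next relative
// write (an index-entry append) then throws BufferOverflowException.
public class IndexOverflowDemo {
    public static void main(String[] args) throws IOException {
        Path index = Files.createTempFile("segment", ".index");
        try (FileChannel ch = FileChannel.open(index,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            mmap.position(mmap.limit()); // position == limit: no room to append
            try {
                mmap.putLong(42L);       // "append an index entry"
            } catch (BufferOverflowException e) {
                System.out.println("BufferOverflowException on append"); // prints this
            }
        }
    }
}
```

A sanity check on startup that truncates the map (or resets the position to the last real entry) is exactly what the skipped check used to provide for the active segment.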





[jira] [Commented] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-05-06 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340247#comment-17340247
 ] 

Wenbing Shen commented on KAFKA-12734:
--

[~ecastilla] LazyIndex was introduced in version 2.3.0, and this problem has 
existed ever since. The related concurrency problem was fixed in 
[KAFKA-9156|https://issues.apache.org/jira/browse/KAFKA-9156] (2.3.2, 2.4.0), 
and the problem described here was fixed in 
[KAFKA-10471|https://issues.apache.org/jira/browse/KAFKA-10471] (2.8.0). If 
you want to fix the problem described in this issue, you need to backport the 
fix from 2.8.0.






[jira] [Resolved] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen resolved KAFKA-12734.
--
Resolution: Duplicate

Duplicate of KAFKA-10471. This problem has been fixed in version 2.8.0.






[jira] [Commented] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-30 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337435#comment-17337435
 ] 

Wenbing Shen commented on KAFKA-12734:
--

This is a problem we often encountered before introducing LazyIndex: 
sanityCheck would help us rebuild a damaged index. After introducing 
LazyIndex, the check is skipped for all index files, so an abnormal index 
file in the active segment goes undetected. Appending data to the log then 
causes a NIO BufferOverflowException.

!image-2021-04-30-22-49-24-202.png!






[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12734:
-
Attachment: image-2021-04-30-22-49-24-202.png






[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12734:
-
Description: 

[jira] [Updated] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12734:
-
Description: 
This question is similar to KAFKA-9156

We introduced Lazy Index, which helps us skip checking the index files of all 
log segments when starting kafka, which has greatly improved the speed of our 
kafka startup.

Unfortunately, it skips the index file detection of the active segment. The 
active segment will receive write requests from the client or the replica 
synchronization thread.

There is a situation when we skip the index detection of all segments, and we 
do not need to recover the unflushed log segment, and the index file of the 
last active segment is damaged at this time. When appending data to the active 
segment, at this time The program reported an error.

Below are the problems I encountered in the production environment:

When Kafka starts to load the log segment, I see in the program log that the 
memory mapping position of the index file with timestamp and offset is at the 
larger position of the current index file, but in fact, the index file is not 
written With so many index items, I guess this kind of problem will occur 
during the kafka startup process. When kafka has not been started yet, stop the 
kafka process at this time, and then start the kafka process again, whether it 
will cause the limit address of the index file memory map to be a file The 
maximum value is not cut to the actual size used, which will cause the memory 
map position to be set to limit when Kafka is started.
 At this time, adding data to the active segment will cause niobufferoverflow.

I agree to skip the index detection of all inactive segments, because in fact 
they will no longer receive write requests, but for active segments, we need to 
perform index file detection.

 

 

  was:
This question is similar to KAFKA-9156.

We introduced lazy index loading, which lets us skip sanity-checking the index 
files of all log segments when starting Kafka and greatly improved our startup 
speed.

Unfortunately, it also skips the index sanity check of the active segment, even 
though the active segment still receives write requests from clients and from 
the replica synchronization threads.

There is a situation where we skip the index check of all segments, no 
unflushed log segments need to be recovered, and the index files of the last 
(active) segment are corrupted. Appending data to the active segment then 
fails with an error.

Below is the problem I encountered in the production environment:

While Kafka was loading log segments, the program log showed that the 
memory-map position of both the time index and the offset index was near the 
end of the file, even though the index files did not actually contain that 
many entries. My guess is that this happens during the Kafka startup process: 
if the Kafka process is stopped before startup completes and is then started 
again, the limit of the index file's memory map is left at the pre-allocated 
maximum file size rather than trimmed to the size actually used, so on the 
next startup the memory-map position is set to that limit. At this time, 
appending data to the active segment causes a java.nio.BufferOverflowException.

I agree with skipping the index check for all inactive segments, since they 
no longer receive write requests, but for the active segment we still need to 
perform the index file check.


> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip 
> activeSegment  sanityCheck
> 
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: LoadIndex.png, niobufferoverflow.png
>
>
> This question is similar to KAFKA-9156.
> We introduced lazy index loading, which lets us skip sanity-checking the 
> index files of all log segments when starting Kafka and greatly improved our 
> startup speed.
> Unfortunately, it also skips the index sanity check of the active segment, 
> even though the active segment still receives write requests from clients 
> and from the replica synchronization threads.
> There is a situation where we skip the index check of all segments, no 
> unflushed log segments need to be recovered, and the index files of the last 
> (active) segment are corrupted. Appending data to the active segment then 
> fails with an error.
> Below are the problems I encountered in the 

[jira] [Assigned] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-29 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-12734:


Assignee: Wenbing Shen

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip 
> activeSegment  sanityCheck
> 
>
> Key: KAFKA-12734
> URL: https://issues.apache.org/jira/browse/KAFKA-12734
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: LoadIndex.png, niobufferoverflow.png
>
>
> This question is similar to KAFKA-9156.
> We introduced lazy index loading, which lets us skip sanity-checking the 
> index files of all log segments when starting Kafka and greatly improved our 
> startup speed.
> Unfortunately, it also skips the index sanity check of the active segment, 
> even though the active segment still receives write requests from clients 
> and from the replica synchronization threads.
> There is a situation where we skip the index check of all segments, no 
> unflushed log segments need to be recovered, and the index files of the last 
> (active) segment are corrupted. Appending data to the active segment then 
> fails with an error.
> Below is the problem I encountered in the production environment:
> While Kafka was loading log segments, the program log showed that the 
> memory-map position of both the time index and the offset index was near 
> the end of the file, even though the index files did not actually contain 
> that many entries. My guess is that this happens during the Kafka startup 
> process: if the Kafka process is stopped before startup completes and is 
> then started again, the limit of the index file's memory map is left at the 
> pre-allocated maximum file size rather than trimmed to the size actually 
> used, so on the next startup the memory-map position is set to that limit.
> At this time, appending data to the active segment causes a 
> java.nio.BufferOverflowException.
> I agree with skipping the index check for all inactive segments, since they 
> no longer receive write requests, but for the active segment we still need 
> to perform the index file check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck

2021-04-29 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12734:


 Summary: LazyTimeIndex & LazyOffsetIndex may cause 
niobufferoverflow when skip activeSegment  sanityCheck
 Key: KAFKA-12734
 URL: https://issues.apache.org/jira/browse/KAFKA-12734
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.8.0, 2.7.0, 2.6.0, 2.5.0, 2.4.0
Reporter: Wenbing Shen
 Attachments: LoadIndex.png, niobufferoverflow.png

This question is similar to KAFKA-9156.

We introduced lazy index loading, which lets us skip sanity-checking the index 
files of all log segments when starting Kafka and greatly improved our startup 
speed.

Unfortunately, it also skips the index sanity check of the active segment, even 
though the active segment still receives write requests from clients and from 
the replica synchronization threads.

There is a situation where we skip the index check of all segments, no 
unflushed log segments need to be recovered, and the index files of the last 
(active) segment are corrupted. Appending data to the active segment then 
fails with an error.

Below is the problem I encountered in the production environment:

While Kafka was loading log segments, the program log showed that the 
memory-map position of both the time index and the offset index was near the 
end of the file, even though the index files did not actually contain that 
many entries. My guess is that this happens during the Kafka startup process: 
if the Kafka process is stopped before startup completes and is then started 
again, the limit of the index file's memory map is left at the pre-allocated 
maximum file size rather than trimmed to the size actually used, so on the 
next startup the memory-map position is set to that limit. At this time, 
appending data to the active segment causes a java.nio.BufferOverflowException.

I agree with skipping the index check for all inactive segments, since they 
no longer receive write requests, but for the active segment we still need to 
perform the index file check.





[jira] [Commented] (KAFKA-10773) When I execute the below command, Kafka cannot start in local.

2021-04-21 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326600#comment-17326600
 ] 

Wenbing Shen commented on KAFKA-10773:
--

Similar to KAFKA-12680. In fact, this is a WSL bug; the link for it is here: 
[https://github.com/microsoft/WSL/issues/2281]
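The call that fails is RandomAccessFile.setLength, which the stack trace 
quoted below shows being invoked from kafka.log.AbstractIndex.resize when 
trimming the index to its valid size. A minimal sketch of that operation 
(the file name and sizes are made up): on the affected WSL versions the 
second setLength throws java.io.IOException: Invalid argument, while on a 
normal Linux filesystem it simply truncates the file:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class IndexResizeDemo {
    public static void main(String[] args) throws IOException {
        File index = File.createTempFile("demo-offset", ".index");
        index.deleteOnExit();

        // Pre-allocate the index file, as Kafka does before memory-mapping it.
        try (RandomAccessFile raf = new RandomAccessFile(index, "rw")) {
            raf.setLength(10 * 1024 * 1024);
        }

        // Trim the file down to the bytes actually used, as Kafka's
        // trimToValidSize does during recovery. This is the setLength call
        // that fails with "Invalid argument" on the buggy WSL builds.
        try (RandomAccessFile raf = new RandomAccessFile(index, "rw")) {
            raf.setLength(4096);
        }
        System.out.println("trimmed to " + index.length() + " bytes");
    }
}
```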

> When I execute the below command, Kafka cannot start in local.
> --
>
> Key: KAFKA-10773
> URL: https://issues.apache.org/jira/browse/KAFKA-10773
> Project: Kafka
>  Issue Type: Bug
>Reporter: NAYUSIK
>Priority: Critical
> Attachments: image-2020-11-27-19-09-49-389.png
>
>
> When I execute the below command, Kafka cannot start in local.
> *confluent local services start*
> *Please check the below error log and let me know how I modify it.*
> [2020-11-27 18:52:21,019] ERROR Error while deleting the clean shutdown file 
> in dir /tmp/confluent.056805/kafka/data (kafka.server.LogDirFailureChannel)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:190)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>  at kafka.log.LogSegment.recover(LogSegment.scala:380)
>  at kafka.log.Log.recoverSegment(Log.scala:692)
>  at kafka.log.Log.recoverLog(Log.scala:830)
>  at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:767)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2626)
>  at kafka.log.Log.loadSegments(Log.scala:767)
>  at kafka.log.Log.(Log.scala:313)
>  at kafka.log.MergedLog$.apply(MergedLog.scala:796)
>  at kafka.log.LogManager.loadLog(LogManager.scala:294)
>  at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:373)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> [2020-11-27 18:52:21,019] ERROR Error while deleting the clean shutdown file 
> in dir /tmp/confluent.056805/kafka/data (kafka.server.LogDirFailureChannel)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:190)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>  at kafka.log.LogSegment.recover(LogSegment.scala:380)
>  at kafka.log.Log.recoverSegment(Log.scala:692)
>  at kafka.log.Log.recoverLog(Log.scala:830)
>  at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:767)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2626)
>  at kafka.log.Log.loadSegments(Log.scala:767)
>  at kafka.log.Log.(Log.scala:313)
>  at kafka.log.MergedLog$.apply(MergedLog.scala:796)
>  at kafka.log.LogManager.loadLog(LogManager.scala:294)
>  at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:373)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> [2020-11-27 18:52:22,261] ERROR Shutdown broker because all log dirs in 
> /tmp/confluent.056805/kafka/data have failed (kafka.log.LogManager)





[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode

2021-04-21 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen resolved KAFKA-12680.
--
Resolution: Not A Problem

> Failed to restart the broker in kraft mode
> --
>
> Key: KAFKA-12680
> URL: https://issues.apache.org/jira/browse/KAFKA-12680
> Project: Kafka
>  Issue Type: Bug
>Reporter: Wenbing Shen
>Priority: Major
>
> I tested KRaft mode for the first time today. I deployed a single-node 
> KRaft-mode broker according to the documentation:
> [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md]
>  
> First step: ./bin/kafka-storage.sh random-uuid
> Second step: use the UUID generated above to execute the following command:
> ./bin/kafka-storage.sh format -t  -c ./config/kraft/server.properties
>  
> Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties
>  
> Then I created two topics with two partitions and a single replica:
> ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 
> --replication-factor 1 --bootstrap-server localhost:9092
> Production and consumption worked fine, but after I ran kafka-server-stop.sh 
> and then ran the start command again, the broker failed to start with an 
> error.
> I am not sure whether this is a known bug or a problem with my usage.
>  
> [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
>  at kafka.log.LogSegment.recover(LogSegment.scala:385)
>  at kafka.log.Log.recoverSegment(Log.scala:741)
>  at kafka.log.Log.recoverLog(Log.scala:894)
>  at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816)
>  at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456)
>  at kafka.log.Log.loadSegments(Log.scala:816)
>  at kafka.log.Log.(Log.scala:326)
>  at kafka.log.Log$.apply(Log.scala:2593)
>  at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358)
>  at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253)
>  at kafka.raft.KafkaRaftManager.(RaftManager.scala:127)
>  at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74)
>  at kafka.Kafka$.buildServer(Kafka.scala:79)
>  at kafka.Kafka$.main(Kafka.scala:87)
>  at kafka.Kafka.main(Kafka.scala)





[jira] [Reopened] (KAFKA-12680) Failed to restart the broker in kraft mode

2021-04-21 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reopened KAFKA-12680:
--

> Failed to restart the broker in kraft mode
> --
>
> Key: KAFKA-12680
> URL: https://issues.apache.org/jira/browse/KAFKA-12680
> Project: Kafka
>  Issue Type: Bug
>Reporter: Wenbing Shen
>Priority: Major
>
> I tested KRaft mode for the first time today. I deployed a single-node 
> KRaft-mode broker according to the documentation:
> [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md]
>  
> First step: ./bin/kafka-storage.sh random-uuid
> Second step: use the UUID generated above to execute the following command:
> ./bin/kafka-storage.sh format -t  -c ./config/kraft/server.properties
>  
> Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties
>  
> Then I created two topics with two partitions and a single replica:
> ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 
> --replication-factor 1 --bootstrap-server localhost:9092
> Production and consumption worked fine, but after I ran kafka-server-stop.sh 
> and then ran the start command again, the broker failed to start with an 
> error.
> I am not sure whether this is a known bug or a problem with my usage.
>  
> [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
>  at kafka.log.LogSegment.recover(LogSegment.scala:385)
>  at kafka.log.Log.recoverSegment(Log.scala:741)
>  at kafka.log.Log.recoverLog(Log.scala:894)
>  at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816)
>  at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456)
>  at kafka.log.Log.loadSegments(Log.scala:816)
>  at kafka.log.Log.(Log.scala:326)
>  at kafka.log.Log$.apply(Log.scala:2593)
>  at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358)
>  at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253)
>  at kafka.raft.KafkaRaftManager.(RaftManager.scala:127)
>  at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74)
>  at kafka.Kafka$.buildServer(Kafka.scala:79)
>  at kafka.Kafka$.main(Kafka.scala:87)
>  at kafka.Kafka.main(Kafka.scala)





[jira] [Commented] (KAFKA-12680) Failed to restart the broker in kraft mode

2021-04-21 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326589#comment-17326589
 ] 

Wenbing Shen commented on KAFKA-12680:
--

Hi [~ijuma] Sorry, this problem was caused on my side. It is actually a WSL 
bug; the link for it is here: [https://github.com/microsoft/WSL/issues/2281]
I did not hit the problem in today's test on my CentOS physical machine.

In fact, I should classify this issue as Not A Problem, and I will change it 
now. Sorry for taking up your precious time. This was my first time using WSL 
and I did not expect it to have this problem. Please put your time on more 
important issues.

I wish the KRaft mode great success in the end, and I will do my best to help 
accomplish this together. :)

> Failed to restart the broker in kraft mode
> --
>
> Key: KAFKA-12680
> URL: https://issues.apache.org/jira/browse/KAFKA-12680
> Project: Kafka
>  Issue Type: Bug
>Reporter: Wenbing Shen
>Priority: Major
>
> I tested KRaft mode for the first time today. I deployed a single-node 
> KRaft-mode broker according to the documentation:
> [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md]
>  
> First step: ./bin/kafka-storage.sh random-uuid
> Second step: use the UUID generated above to execute the following command:
> ./bin/kafka-storage.sh format -t  -c ./config/kraft/server.properties
>  
> Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties
>  
> Then I created two topics with two partitions and a single replica:
> ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 
> --replication-factor 1 --bootstrap-server localhost:9092
> Production and consumption worked fine, but after I ran kafka-server-stop.sh 
> and then ran the start command again, the broker failed to start with an 
> error.
> I am not sure whether this is a known bug or a problem with my usage.
>  
> [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
>  at kafka.log.LogSegment.recover(LogSegment.scala:385)
>  at kafka.log.Log.recoverSegment(Log.scala:741)
>  at kafka.log.Log.recoverLog(Log.scala:894)
>  at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816)
>  at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456)
>  at kafka.log.Log.loadSegments(Log.scala:816)
>  at kafka.log.Log.(Log.scala:326)
>  at kafka.log.Log$.apply(Log.scala:2593)
>  at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358)
>  at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253)
>  at kafka.raft.KafkaRaftManager.(RaftManager.scala:127)
>  at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74)
>  at kafka.Kafka$.buildServer(Kafka.scala:79)
>  at kafka.Kafka$.main(Kafka.scala:87)
>  at kafka.Kafka.main(Kafka.scala)





[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode

2021-04-21 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen resolved KAFKA-12680.
--
Resolution: Cannot Reproduce

> Failed to restart the broker in kraft mode
> --
>
> Key: KAFKA-12680
> URL: https://issues.apache.org/jira/browse/KAFKA-12680
> Project: Kafka
>  Issue Type: Bug
>Reporter: Wenbing Shen
>Priority: Major
>
> I tested KRaft mode for the first time today. I deployed a single-node 
> KRaft-mode broker according to the documentation:
> [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md]
>  
> First step: ./bin/kafka-storage.sh random-uuid
> Second step: use the UUID generated above to execute the following command:
> ./bin/kafka-storage.sh format -t  -c ./config/kraft/server.properties
>  
> Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties
>  
> Then I created two topics with two partitions and a single replica:
> ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 
> --replication-factor 1 --bootstrap-server localhost:9092
> Production and consumption worked fine, but after I ran kafka-server-stop.sh 
> and then ran the start command again, the broker failed to start with an 
> error.
> I am not sure whether this is a known bug or a problem with my usage.
>  
> [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.io.IOException: Invalid argument
>  at java.io.RandomAccessFile.setLength(Native Method)
>  at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189)
>  at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
>  at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
>  at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
>  at kafka.log.LogSegment.recover(LogSegment.scala:385)
>  at kafka.log.Log.recoverSegment(Log.scala:741)
>  at kafka.log.Log.recoverLog(Log.scala:894)
>  at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816)
>  at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>  at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456)
>  at kafka.log.Log.loadSegments(Log.scala:816)
>  at kafka.log.Log.(Log.scala:326)
>  at kafka.log.Log$.apply(Log.scala:2593)
>  at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358)
>  at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253)
>  at kafka.raft.KafkaRaftManager.(RaftManager.scala:127)
>  at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:74)
>  at kafka.Kafka$.buildServer(Kafka.scala:79)
>  at kafka.Kafka$.main(Kafka.scala:87)
>  at kafka.Kafka.main(Kafka.scala)





[jira] [Updated] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread

2021-04-21 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12702:
-
Attachment: image-2021-04-21-17-12-28-471.png

> Unhandled exception caught in InterBrokerSendThread
> ---
>
> Key: KAFKA-12702
> URL: https://issues.apache.org/jira/browse/KAFKA-12702
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wenbing Shen
>Priority: Blocker
> Attachments: afterFixing.png, beforeFixing.png, 
> image-2021-04-21-17-12-28-471.png
>
>
> In kraft mode, if listeners and advertised.listeners are not configured with 
> host addresses, the host parameter value of Listener in 
> BrokerRegistrationRequestData will be null. When the broker is started, a 
> null pointer exception will be thrown, causing startup failure.
> A feasible solution is to replace the empty host of the endPoint in 
> advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in 
> BrokerServer when building networkListeners.
> The following is the debug log:
> before fixing:
> [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread 
> org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending 
> BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGIS
> TRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node 
> 2: BrokerRegistrationRequestData(brokerId=2, 
> clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, 
> listeners=[Listener(n
> ame='PLAINTEXT', {color:#FF}host=null,{color} port=9092, 
> securityProtocol=0)], features=[], rack=null)
> [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread 
> kafka.server.BrokerToControllerRequestThread 76) 
> [broker-2-to-controller-send-thread]: unhandled exception caught in 
> InterBrokerSendThread
> java.lang.NullPointerException
>  at 
> org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515)
>  at 
> org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216)
>  at 
> org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
>  at 
> org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
>  at 
> org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
>  at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525)
>  at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501)
>  at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461)
>  at 
> kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104)
>  at 
> kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)
>  at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown 
> Source)
>  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
>  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:919)
>  at 
> kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)
>  at 
> kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)
>  at 
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread 
> kafka.server.BrokerToControllerRequestThread 66) 
> [broker-2-to-controller-send-thread]: Stopped
> after fixing:
> [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 
> name=heartbeat org.apache.kafka.clients.NetworkClient 512) 
> [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending 
> BROKER_REGISTRATI
> ON request with header RequestHeader(apiKey=BROKER_REGISTRATION, 
> apiVersion=0, clientId=2, correlationId=0) and timeout 3 to node 2: 
> BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', 
> inc
> arnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', 
> host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null)
>  
>  





[jira] [Commented] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread

2021-04-21 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326387#comment-17326387
 ] 

Wenbing Shen commented on KAFKA-12702:
--

We need to fix this problem: according to the comments in the configuration 
file (config/kraft/server.properties), if listeners and advertised.listeners 
are not configured with an address, the broker is supposed to fall back to 
java.net.InetAddress.getCanonicalHostName() automatically, but in practice 
this configuration causes the broker to fail to start. 
!image-2021-04-21-17-12-28-471.png!
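The proposed fix amounts to a null/empty-host fallback when building the 
listeners to register. A minimal sketch under that assumption (the method name 
effectiveHost is hypothetical, not Kafka's API):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ListenerHostFallback {
    // Sketch of the proposed fix: when a listener's host is null or empty,
    // fall back to the canonical host name instead of sending host=null in
    // BrokerRegistrationRequestData, which triggers the NPE shown above.
    static String effectiveHost(String configuredHost) throws UnknownHostException {
        if (configuredHost == null || configuredHost.isEmpty()) {
            return InetAddress.getLocalHost().getCanonicalHostName();
        }
        return configuredHost;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(effectiveHost(null));                  // local canonical host name
        System.out.println(effectiveHost("broker1.example.com")); // configured value, unchanged
    }
}
```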

> Unhandled exception caught in InterBrokerSendThread
> ---
>
> Key: KAFKA-12702
> URL: https://issues.apache.org/jira/browse/KAFKA-12702
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wenbing Shen
>Priority: Blocker
> Attachments: afterFixing.png, beforeFixing.png, 
> image-2021-04-21-17-12-28-471.png
>
>
> In kraft mode, if listeners and advertised.listeners are not configured with 
> host addresses, the host parameter value of Listener in 
> BrokerRegistrationRequestData will be null. When the broker is started, a 
> null pointer exception will be thrown, causing startup failure.
> A feasible solution is to replace the empty host of the endPoint in 
> advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in 
> BrokerServer when building networkListeners.
> The following is the debug log:
> before fixing:
> [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread 
> org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending 
> BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGIS
> TRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node 
> 2: BrokerRegistrationRequestData(brokerId=2, 
> clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, 
> listeners=[Listener(n
> ame='PLAINTEXT', {color:#FF}host=null,{color} port=9092, 
> securityProtocol=0)], features=[], rack=null)
> [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread 
> kafka.server.BrokerToControllerRequestThread 76) 
> [broker-2-to-controller-send-thread]: unhandled exception caught in 
> InterBrokerSendThread
> java.lang.NullPointerException
>  at 
> org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515)
>  at 
> org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216)
>  at 
> org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
>  at 
> org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
>  at 
> org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
>  at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525)
>  at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501)
>  at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461)
>  at 
> kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104)
>  at 
> kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)
>  at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown 
> Source)
>  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
>  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:919)
>  at 
> kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)
>  at 
> kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)
>  at 
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread 
> kafka.server.BrokerToControllerRequestThread 66) 
> [broker-2-to-controller-send-thread]: Stopped
> after fixing:
> [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 
> name=heartbeat org.apache.kafka.clients.NetworkClient 512) 
> [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending 
> BROKER_REGISTRATION request with header 
> RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, 
> correlationId=0) and timeout 3 to node 2: 
> BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', 
> incarnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', 
> host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread

2021-04-21 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12702:


 Summary: Unhandled exception caught in InterBrokerSendThread
 Key: KAFKA-12702
 URL: https://issues.apache.org/jira/browse/KAFKA-12702
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wenbing Shen
 Attachments: afterFixing.png, beforeFixing.png

In KRaft mode, if listeners and advertised.listeners are configured without 
host addresses, the host field of each Listener in 
BrokerRegistrationRequestData will be null. When the broker starts, a 
NullPointerException is thrown, causing startup to fail.

A feasible solution is to replace the empty host of each endPoint in 
advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in 
BrokerServer when building networkListeners.
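A minimal sketch of that idea (a hypothetical helper, not the actual Kafka 
code; the class and method names are illustrative only):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ListenerHostFallback {
    // Hypothetical helper illustrating the proposed fix: when a listener's
    // configured host is null or empty, substitute the machine's canonical
    // host name so the BROKER_REGISTRATION request never carries host=null.
    static String effectiveHost(String configuredHost) {
        if (configuredHost != null && !configuredHost.isEmpty()) {
            return configuredHost;
        }
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // last-resort fallback for this sketch
        }
    }

    public static void main(String[] args) {
        // An explicitly configured host is kept unchanged.
        System.out.println(effectiveHost("hdpxxx.cn"));
        // A null host falls back to a non-empty resolved name.
        System.out.println(effectiveHost(null));
    }
}
```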

The following is the debug log:

before fixing:

[2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread 
org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending 
BROKER_REGISTRATION request with header 
RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, 
correlationId=6) and timeout 3 to node 2: 
BrokerRegistrationRequestData(brokerId=2, 
clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, 
listeners=[Listener(name='PLAINTEXT', host=null, port=9092, 
securityProtocol=0)], features=[], rack=null)
[2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread 
kafka.server.BrokerToControllerRequestThread 76) 
[broker-2-to-controller-send-thread]: unhandled exception caught in 
InterBrokerSendThread
java.lang.NullPointerException
 at 
org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515)
 at 
org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216)
 at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
 at 
org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
 at 
org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
 at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525)
 at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501)
 at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461)
 at 
kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104)
 at 
kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)
 at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown 
Source)
 at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
 at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:919)
 at 
kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)
 at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)
 at 
kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread 
kafka.server.BrokerToControllerRequestThread 66) 
[broker-2-to-controller-send-thread]: Stopped



after fixing:

[2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 
name=heartbeat org.apache.kafka.clients.NetworkClient 512) 
[BrokerToControllerChannelManager broker=2 name=heartbeat] Sending 
BROKER_REGISTRATION request with header 
RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, 
correlationId=0) and timeout 3 to node 2: 
BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', 
incarnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', 
host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null)

 

 





[jira] [Assigned] (KAFKA-12684) The valid partition list is incorrectly replaced by the successfully elected partition list

2021-04-18 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-12684:


Assignee: Wenbing Shen

> The valid partition list is incorrectly replaced by the successfully elected 
> partition list
> ---
>
> Key: KAFKA-12684
> URL: https://issues.apache.org/jira/browse/KAFKA-12684
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: election-preferred-leader.png, non-preferred-leader.png
>
>
> When using the kafka-election-tool for preferred replica election, if the 
> elected list contains partitions whose preferred replica is already the 
> leader, the list of partitions already led by their preferred replica will 
> be incorrectly replaced by the list of successfully elected partitions.
>  





[jira] [Created] (KAFKA-12684) The valid partition list is incorrectly replaced by the successfully elected partition list

2021-04-18 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12684:


 Summary: The valid partition list is incorrectly replaced by the 
successfully elected partition list
 Key: KAFKA-12684
 URL: https://issues.apache.org/jira/browse/KAFKA-12684
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.7.0, 2.6.0
Reporter: Wenbing Shen
 Fix For: 3.0.0
 Attachments: election-preferred-leader.png, non-preferred-leader.png

When using the kafka-election-tool for preferred replica election, if the 
elected list contains partitions whose preferred replica is already the 
leader, the list of partitions already led by their preferred replica will be 
incorrectly replaced by the list of successfully elected partitions.

 





[jira] [Commented] (KAFKA-12629) Flaky Test RaftClusterTest

2021-04-18 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324517#comment-17324517
 ] 

Wenbing Shen commented on KAFKA-12629:
--

Found another error log today:

One of the errors occurred in 
testCreateClusterAndCreateAndManyTopicsWithManyPartitions; it reports:

java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request.

The strange thing is that this test case creates the topics test-topic-1, 
test-topic-2, and test-topic-3, but the log in the standard output indicates 
that the replica fetcher thread of broker 0 needs to fetch test-topic-0 
partition data from broker 1, but the hostname of broker 1 cannot be resolved.

The detailed log can be viewed at this link: 

https://github.com/apache/kafka/pull/10551/checks?check_run_id=2373652453

> Flaky Test RaftClusterTest
> --
>
> Key: KAFKA-12629
> URL: https://issues.apache.org/jira/browse/KAFKA-12629
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Critical
>  Labels: flaky-test
>
> {quote} {{java.util.concurrent.ExecutionException: 
> java.lang.ClassNotFoundException: 
> org.apache.kafka.controller.NoOpSnapshotWriterBuilder
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364)
>   at 
> kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote}





[jira] [Created] (KAFKA-12680) Failed to restart the broker in kraft mode

2021-04-17 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12680:


 Summary: Failed to restart the broker in kraft mode
 Key: KAFKA-12680
 URL: https://issues.apache.org/jira/browse/KAFKA-12680
 Project: Kafka
  Issue Type: Bug
Reporter: Wenbing Shen


I tested KRaft mode for the first time today. I deployed a single-node KRaft 
mode broker according to the documentation:

[https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md]

 

First step: ./bin/kafka-storage.sh random-uuid


Second step: Use the uuid generated above to execute the following commands:

./bin/kafka-storage.sh format -t  -c ./config/kraft/server.properties

 

Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties

 

Then I created two topics with two partitions and a single replica.

./bin/kafka-topics.sh --create --topic test-01 --partitions 2 
--replication-factor 1 --bootstrap-server localhost:9092

I verified that there was no problem with production and consumption, but 
after I called kafka-server-stop.sh and then ran the start command again, the 
broker started to report an error.

I am not sure if it is a known bug or a problem with my usage.

 

[2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception 
(kafka.Kafka$)
java.io.IOException: Invalid argument
 at java.io.RandomAccessFile.setLength(Native Method)
 at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189)
 at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
 at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
 at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
 at kafka.log.LogSegment.recover(LogSegment.scala:385)
 at kafka.log.Log.recoverSegment(Log.scala:741)
 at kafka.log.Log.recoverLog(Log.scala:894)
 at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816)
 at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source)
 at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
 at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456)
 at kafka.log.Log.loadSegments(Log.scala:816)
 at kafka.log.Log.&lt;init&gt;(Log.scala:326)
 at kafka.log.Log$.apply(Log.scala:2593)
 at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358)
 at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253)
 at kafka.raft.KafkaRaftManager.&lt;init&gt;(RaftManager.scala:127)
 at kafka.server.KafkaRaftServer.&lt;init&gt;(KafkaRaftServer.scala:74)
 at kafka.Kafka$.buildServer(Kafka.scala:79)
 at kafka.Kafka$.main(Kafka.scala:87)
 at kafka.Kafka.main(Kafka.scala)





[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2

2021-04-16 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322702#comment-17322702
 ] 

Wenbing Shen commented on KAFKA-10886:
--

[~asrivastava] I have the repaired binary file, and it has been running well 
in our company's Windows product, but it belongs to our company's product and 
I don't have the right to publish the binary file. However, I have published 
the repair method on this jira. Can you tell me how you applied this patch?

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problem in the Windows environment, but it didn't seem to work. 
> The problems include restarting the Kafka service causing data to be 
> deleted by mistake, and deleting a topic or migrating a partition causing a 
> disk to go offline or the broker to crash.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>  at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>  at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>  at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>  at java.nio.file.Files.move(Unknown Source)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
>  at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689)
>  at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>  at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
>  at kafka.log.Log.renameDir(Log.scala:687)
>  at kafka.log.LogManager.asyncDelete(LogManager.scala:833)
>  at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267)
>  at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256)
>  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264)
>  at kafka.cluster.Partition.delete(Partition.scala:262)
>  at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341)
>  at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369)
>  at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222)
>  at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
>  at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73)
>  at java.lang.Thread.run(Unknown Source)
>  Suppressed: java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>  at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>  at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>  at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>  at java.nio.file.Files.move(Unknown Source)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
>  ... 23 more
> [2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> 

[jira] [Resolved] (KAFKA-12445) Improve the display of ConsumerPerformance indicators

2021-04-16 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen resolved KAFKA-12445.
--
Resolution: Won't Do

> Improve the display of ConsumerPerformance indicators
> -
>
> Key: KAFKA-12445
> URL: https://issues.apache.org/jira/browse/KAFKA-12445
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Priority: Minor
> Attachments: image-2021-03-10-13-30-27-734.png
>
>
> The current test indicators are shown below, the user experience is poor, and 
> there is no intuitive display of the meaning of each indicator.
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf10 --messages 4 --from-latest
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 
> 779.7162, 3096, 48206, 8.1034, 829.7930
>  
> show-detailed-stats:
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf --messages 1 --show-detailed-stats
> time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, 
> rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 
> 1615346338626, -1615346333626, 0., 0.
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 
> 5000, 758.4341, 795275.8000
> 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 
> 5000, 795.6612, 834311.2000
> 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 
> 5140, 188.8091, 197980.7393
> 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 
> 1.9438, 2038.2166
>  
> One of the optimization methods is to display the indicator variable name and 
> indicator test value in the form of a table, so that the meaning of each 
> measurement value can be clearly expressed.
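As an illustration of the table idea proposed above (a hypothetical sketch, 
not the actual kafka-consumer-perf-test code; class and method names are 
invented), each CSV header can be paired with its measured value so every 
metric is labeled:

```java
public class PerfStatsTable {
    // Pair each comma-separated header with the value in the same column
    // and render one "name  value" row per metric.
    static String toTable(String headerLine, String valueLine) {
        String[] names = headerLine.split(",\\s*");
        String[] values = valueLine.split(",\\s*");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length && i < values.length; i++) {
            sb.append(String.format("%-24s %s%n", names[i], values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A few columns taken from the sample run quoted above.
        String header = "start.time, end.time, data.consumed.in.MB, MB.sec";
        String values =
            "2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144";
        System.out.print(toTable(header, values));
    }
}
```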





[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2

2021-04-16 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322672#comment-17322672
 ] 

Wenbing Shen commented on KAFKA-10886:
--

[~asrivastava] Sorry, the patch has not been merged into the development 
branch; you can manually apply this patch to your Kafka project code and 
recompile the binary file.

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problem in the Windows environment, but it didn't seem to work. 
> The problems include restarting the Kafka service causing data to be 
> deleted by mistake, and deleting a topic or migrating a partition causing a 
> disk to go offline or the broker to crash.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>  at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>  at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>  at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>  at java.nio.file.Files.move(Unknown Source)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
>  at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689)
>  at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>  at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
>  at kafka.log.Log.renameDir(Log.scala:687)
>  at kafka.log.LogManager.asyncDelete(LogManager.scala:833)
>  at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267)
>  at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256)
>  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264)
>  at kafka.cluster.Partition.delete(Partition.scala:262)
>  at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341)
>  at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369)
>  at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222)
>  at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
>  at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73)
>  at java.lang.Thread.run(Unknown Source)
>  Suppressed: java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>  at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>  at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
>  at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
>  at java.nio.file.Files.move(Unknown Source)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
>  ... 23 more
> [2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log





[jira] [Comment Edited] (KAFKA-9156) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent state

2021-03-31 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312207#comment-17312207
 ] 

Wenbing Shen edited comment on KAFKA-9156 at 3/31/21, 9:04 AM:
---

Hi, [~amironov] [~ijuma], I applied kafka-7283 and kafka-9156 to our version 
of kafka-2.0.0 to get the benefits of LazyIndex when starting the broker. 
When I applied this feature to a small Kafka cluster, there was no problem, 
but when I first applied it to a cluster with high traffic, the brokers with 
low traffic showed no exceptions, while after the brokers with heavy traffic 
started, a large number of replica fetcher threads threw 
java.nio.BufferOverflowException.

As with the problem encountered by [~iBlackeyes], the current patch still 
does not fix this problem.

Its stack information is as follows:

 

[2021-03-31 15:23:54,935] ERROR (ReplicaFetcherThread-1-1001 
kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1006, 
leaderId=1001, fetcherId=1] Error due to
 org.apache.kafka.common.KafkaException: Error processing data for partition 
sinan_assets_tagged_default-9 offset 38543576
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:214)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:175)
 at scala.Option.foreach(Option.scala:257)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:175)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:172)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:172)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:170)
 at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
 Caused by: java.nio.BufferOverflowException
 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
 at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:793)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:131)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:111)
 at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:578)
 at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582)
 at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582)
 at scala.Option.foreach(Option.scala:257)
 at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1582)
 at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1568)
 at kafka.log.Log.maybeHandleIOException(Log.scala:1943)
 at kafka.log.Log.roll(Log.scala:1568)
 at kafka.log.Log.kafka$log$Log$$maybeRoll(Log.scala:1553)
 at kafka.log.Log$$anonfun$append$2.apply(Log.scala:956)
 at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850)
 at kafka.log.Log.maybeHandleIOException(Log.scala:1943)
 at kafka.log.Log.append(Log.scala:850)
 at kafka.log.Log.appendAsFollower(Log.scala:831)
 at 
kafka.cluster.Partition$$anonfun$doAppendRecordsToFollowerOrFutureReplica$1.apply(Partition.scala:589)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:261)
 at 
kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:576)
 at 
kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:596)
 at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:129)
 at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:43)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:189)
 ... 13 more
 [2021-03-31 15:23:54,936] INFO (ReplicaFetcherThread-1-1001 
kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1006, 
leaderId=1001, fetcherId=1] Stopped
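For context on the exception type at the root of the trace (a self-contained 
sketch of java.nio semantics only, not Kafka code): a relative put that runs 
past a ByteBuffer's limit throws BufferOverflowException, which appears to be 
the failure mode here when TimeIndex.maybeAppend writes into a mapped index 
buffer with less room than the writer expects.

```java
import java.nio.ByteBuffer;
import java.nio.BufferOverflowException;

public class OverflowDemo {
    public static void main(String[] args) {
        // A direct buffer with room for exactly one long (8 bytes).
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        buf.putLong(1L); // fills the buffer; position == limit
        try {
            buf.putLong(2L); // no space left for a second entry
        } catch (BufferOverflowException e) {
            System.out.println("BufferOverflowException as expected");
        }
    }
}
```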

 


was (Author: wenbing.shen):
Hi, [~amironov]  [~ijuma] , I applied kafka-7283 and kafka-9156 in our version 
of kafka-2.0.0 to apply the benefits of LazyIndex when 

[jira] [Commented] (KAFKA-9156) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent state

2021-03-31 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312207#comment-17312207
 ] 

Wenbing Shen commented on KAFKA-9156:
-

Hi, [~amironov] [~ijuma], I applied kafka-7283 and kafka-9156 to our version 
of kafka-2.0.0 to get the benefits of LazyIndex when starting the broker. 
When I applied this feature to a small Kafka cluster, there was no problem, 
but when I first applied it to a cluster with high traffic, the brokers with 
low traffic showed no exceptions, while after the brokers with heavy traffic 
started, a large number of replica fetcher threads threw 
java.nio.BufferOverflowException.

As with the problem encountered by [~iBlackeyes], the current patch still 
does not fix this problem.

Its stack information is as follows:

 

 
[2021-03-31 15:23:54,935] ERROR (ReplicaFetcherThread-1-1001 
kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1006, 
leaderId=1001, fetcherId=1] Error due to
org.apache.kafka.common.KafkaException: Error processing data for partition 
sinan_assets_tagged_default-9 offset 38543576
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:214)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:175)
 at scala.Option.foreach(Option.scala:257)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:175)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:172)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:172)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:172)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:170)
 at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Caused by: java.nio.BufferOverflowException
 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
 at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:793)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:131)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111)
 at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:111)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:111)
 at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:578)
 at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582)
 at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1582)
 at scala.Option.foreach(Option.scala:257)
 at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1582)
 at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1568)
 at kafka.log.Log.maybeHandleIOException(Log.scala:1943)
 at kafka.log.Log.roll(Log.scala:1568)
 at kafka.log.Log.kafka$log$Log$$maybeRoll(Log.scala:1553)
 at kafka.log.Log$$anonfun$append$2.apply(Log.scala:956)
 at kafka.log.Log$$anonfun$append$2.apply(Log.scala:850)
 at kafka.log.Log.maybeHandleIOException(Log.scala:1943)
 at kafka.log.Log.append(Log.scala:850)
 at kafka.log.Log.appendAsFollower(Log.scala:831)
 at 
kafka.cluster.Partition$$anonfun$doAppendRecordsToFollowerOrFutureReplica$1.apply(Partition.scala:589)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:255)
 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:261)
 at 
kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:576)
 at 
kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:596)
 at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:129)
 at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:43)
 at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:189)
 ... 13 more
[2021-03-31 15:23:54,936] INFO (ReplicaFetcherThread-1-1001 
kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1006, 
leaderId=1001, fetcherId=1] Stopped

 

> LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent 
> state
> ---
>
>   

[jira] [Assigned] (KAFKA-12556) Add --under-preferred-replica-partitions option to describe topics command

2021-03-25 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-12556:


Assignee: Wenbing Shen

> Add --under-preferred-replica-partitions option to describe topics command
> --
>
> Key: KAFKA-12556
> URL: https://issues.apache.org/jira/browse/KAFKA-12556
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
>
> Whether the preferred replica is the partition leader directly affects the 
> broker's outbound traffic. When the preferred replica of every partition is 
> the leader, the brokers' outbound traffic is balanced. When a large number of 
> partition leaders are not preferred replicas, this balance is destroyed.
> Currently, the controller periodically checks the imbalance ratio of the 
> partition preferred replicas (if enabled) to trigger a preferred replica 
> election, or the election can be triggered manually through the 
> kafka-leader-election tool. However, if we want to know which partitions have 
> a non-preferred leader, we have to search the controller log or work it out 
> ourselves from the topic details listing.
> We can add an --under-preferred-replica-partitions option to TopicCommand's 
> describe topics action to query the partitions in the current cluster whose 
> leader is not the preferred replica.
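The check behind the proposed option can be sketched as follows: a partition's leader is "preferred" exactly when it equals the first replica in the partition's assignment list. This is an illustrative sketch with hypothetical class and method names, not the actual TopicCommand patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the proposed --under-preferred-replica-partitions check.
// A partition leader is "preferred" when it is the first replica in the
// partition's replica assignment list. Names here are hypothetical.
public class PreferredReplicaCheck {
    /** Returns the partitions whose current leader is not the preferred (first) replica. */
    public static List<Integer> underPreferredReplica(
            Map<Integer, List<Integer>> assignments,   // partition -> replica list
            Map<Integer, Integer> leaders) {           // partition -> current leader
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : assignments.entrySet()) {
            int preferred = e.getValue().get(0);       // first replica is the preferred one
            Integer leader = leaders.get(e.getKey());
            if (leader == null || leader != preferred) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The real implementation would read the assignment and leader from the topic description returned by the Admin API rather than from plain maps.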



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12556) Add --under-preferred-replica-partitions option to describe topics command

2021-03-25 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12556:


 Summary: Add --under-preferred-replica-partitions option to 
describe topics command
 Key: KAFKA-12556
 URL: https://issues.apache.org/jira/browse/KAFKA-12556
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Wenbing Shen


Whether the preferred replica is the partition leader directly affects the 
broker's outbound traffic. When the preferred replica of every partition is the 
leader, the brokers' outbound traffic is balanced. When a large number of 
partition leaders are not preferred replicas, this balance is destroyed.

Currently, the controller periodically checks the imbalance ratio of the 
partition preferred replicas (if enabled) to trigger a preferred replica 
election, or the election can be triggered manually through the 
kafka-leader-election tool. However, if we want to know which partitions have a 
non-preferred leader, we have to search the controller log or work it out 
ourselves from the topic details listing.

We can add an --under-preferred-replica-partitions option to TopicCommand's 
describe topics action to query the partitions in the current cluster whose 
leader is not the preferred replica.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12507) java.lang.OutOfMemoryError: Direct buffer memory

2021-03-18 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304593#comment-17304593
 ] 

Wenbing Shen commented on KAFKA-12507:
--

What values did you set for these parameters: 
-XX:MaxDirectMemorySize, num.network.threads, socket.request.max.bytes?

We have encountered the same problem in a customer environment, with 
-XX:MaxDirectMemorySize=2G and num.network.threads=120. I later worked around 
the problem by measuring the real-time traffic and reducing 
num.network.threads.
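As a rough sizing sketch (my assumption about the worst case, not an official Kafka formula): each network thread can pin a temporary direct buffer of up to socket.request.max.bytes, so -XX:MaxDirectMemorySize should comfortably exceed num.network.threads times socket.request.max.bytes:

```java
// Rough worst-case direct memory sizing sketch for a broker (an assumption,
// not an official Kafka formula): each network thread may hold a temporary
// direct buffer of up to socket.request.max.bytes while reading a request.
public class DirectMemorySizing {
    /** Worst-case direct memory demand in bytes. */
    public static long worstCaseBytes(int numNetworkThreads, long socketRequestMaxBytes) {
        return (long) numNetworkThreads * socketRequestMaxBytes;
    }
}
```

With 120 network threads and the default 100 MB socket.request.max.bytes, this worst case (~11.7 GB) far exceeds -XX:MaxDirectMemorySize=2G, which is consistent with both the OOM and why reducing num.network.threads avoided it.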

> java.lang.OutOfMemoryError: Direct buffer memory
> 
>
> Key: KAFKA-12507
> URL: https://issues.apache.org/jira/browse/KAFKA-12507
> Project: Kafka
>  Issue Type: Bug
>  Components: core
> Environment: kafka version: 2.0.1
> java version: 
> java version "1.8.0_211"
> Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
> the command we use to start kafka broker:
> java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 
> -XX:InitiatingHeapOccupancyPercent=35 -Djava.awt.headless=true 
> -XX:+ExplicitGCInvokesConcurrent 
>Reporter: diehu
>Priority: Major
>
> Hi, we have three brokers in our kafka cluster, and we use scripts to send 
> data to kafka at a rate of about 36,000 events per second. After about one 
> month, we got the OOM error: 
> {code:java}
> [2021-01-09 17:12:24,750] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:335)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:296)
> at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:562)
> at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:498)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:427)
> at kafka.network.Processor.poll(SocketServer.scala:679)
> at kafka.network.Processor.run(SocketServer.scala:584)
> at java.lang.Thread.run(Thread.java:748){code}
> The kafka server does not shut down, but keeps hitting this error. At the 
> same time, data cannot be produced to the kafka cluster and consumers cannot 
> consume data from it.
> We used the recommended java parameter -XX:+ExplicitGCInvokesConcurrent, but 
> it did not seem to help. 
>  Only a restart of the kafka cluster fixes the problem.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper

2021-03-17 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-12493:


Assignee: Wenbing Shen

> The controller should handle the consistency between the controllerContext 
> and the partition replicas assignment on zookeeper
> -
>
> Key: KAFKA-12493
> URL: https://issues.apache.org/jira/browse/KAFKA-12493
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Fix For: 3.0.0
>
>
> This question can be linked to this email: 
> [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>  
> This problem is 100% reproducible.
> Problem description:
> In a customer's production environment, code owned by another department 
> recomputed the replica assignment of an existing topic's partitions and wrote 
> it to zookeeper. When handling the resulting partition modification event, 
> the controller only considered the newly added partitions: it registered only 
> the new partitions and replicas in the partition and replica state machines 
> and issued LeaderAndIsr and other control requests for them.
> The controller did not verify whether the existing partition replica 
> assignment in the controllerContext still matched the assignment on the znode 
> in zookeeper. This appears harmless at first, but when the brokers are later 
> restarted, for example for configuration updates or upgrades, the affected 
> topics break in real-time production: the controller cannot complete the 
> election of a new leader, and the original leader no longer recognizes the 
> replicas currently assigned in zookeeper. In our customer's environment this 
> interrupted real-time business and some data was lost.
> This problem can be reproduced reliably as follows:
> Adding partitions to, or changing the replica count of, an existing topic 
> with the code below reassigns the replicas of the existing partitions and 
> writes the new assignment to zookeeper. Because the controller does not 
> process this event correctly, restarting the brokers that host the topic 
> leaves the topic unable to be produced to or consumed from.
>  
>  
> {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
>             CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0
>                 && kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the current broker metadata
>             Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
>                     RackAwareMode.Enforced$.MODULE$,
>                     Option.apply(null));
>             // Generate a new partition replica allocation plan
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // number of partitions
>                             kafkaTopicVO.getReplicationNum(), // replicas per partition
>                             -1,
>                             -1);
>             // Write the modified assignment plan to zookeeper
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper

2021-03-17 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12493:


 Summary: The controller should handle the consistency between the 
controllerContext and the partition replicas assignment on zookeeper
 Key: KAFKA-12493
 URL: https://issues.apache.org/jira/browse/KAFKA-12493
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 2.7.0, 2.6.0, 2.5.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: Wenbing Shen
 Fix For: 3.0.0


This question can be linked to this email: 
[https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]

 

This problem is 100% reproducible.

Problem description:

In a customer's production environment, code owned by another department 
recomputed the replica assignment of an existing topic's partitions and wrote 
it to zookeeper. When handling the resulting partition modification event, the 
controller only considered the newly added partitions: it registered only the 
new partitions and replicas in the partition and replica state machines and 
issued LeaderAndIsr and other control requests for them.

The controller did not verify whether the existing partition replica assignment 
in the controllerContext still matched the assignment on the znode in 
zookeeper. This appears harmless at first, but when the brokers are later 
restarted, for example for configuration updates or upgrades, the affected 
topics break in real-time production: the controller cannot complete the 
election of a new leader, and the original leader no longer recognizes the 
replicas currently assigned in zookeeper. In our customer's environment this 
interrupted real-time business and some data was lost.

This problem can be reproduced reliably as follows:

Adding partitions to, or changing the replica count of, an existing topic with 
the code below reassigns the replicas of the existing partitions and writes the 
new assignment to zookeeper. Because the controller does not process this event 
correctly, restarting the brokers that host the topic leaves the topic unable 
to be produced to or consumed from.

 
{code:java}
public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
    ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
            CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
    try {
        if (kafkaTopicVO.getPartitionNum() >= 0
                && kafkaTopicVO.getReplicationNum() >= 0) {
            // Get the current broker metadata
            Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
                    RackAwareMode.Enforced$.MODULE$,
                    Option.apply(null));
            // Generate a new partition replica allocation plan
            scala.collection.Map<Object, Seq<Object>> replicaAssign =
                    AdminUtils.assignReplicasToBrokers(brokerMetadata,
                            kafkaTopicVO.getPartitionNum(),   // number of partitions
                            kafkaTopicVO.getReplicationNum(), // replicas per partition
                            -1,
                            -1);
            // Write the modified assignment plan to zookeeper
            AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
                    kafkaTopicVO.getTopicNameList().get(0),
                    replicaAssign,
                    null,
                    true);
        }
    } catch (Exception e) {
        System.out.println("Adjust partition abnormal");
        System.exit(0);
    } finally {
        zkUtils.close();
    }
}
{code}
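The fix the issue asks for amounts to a consistency check: when handling a partition modification event, the controller should compare the full assignment read from the znode with the assignment cached in the controllerContext, rather than looking only at newly added partitions. A minimal sketch with hypothetical types (the real controller operates on TopicPartition objects and its own context classes):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the consistency check the issue proposes: detect existing
// partitions whose replica assignment on the znode differs from the
// controller's cached view. Types here are simplified for illustration.
public class AssignmentDiff {
    /** Returns partition -> new replica list for every existing partition whose assignment changed. */
    public static Map<Integer, List<Integer>> changedAssignments(
            Map<Integer, List<Integer>> cached,    // controllerContext view
            Map<Integer, List<Integer>> fromZk) {  // assignment read from zookeeper
        Map<Integer, List<Integer>> changed = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : fromZk.entrySet()) {
            List<Integer> old = cached.get(e.getKey());
            // Newly added partitions (old == null) are already handled today;
            // the missing piece is the old != null && assignment-changed case.
            if (old != null && !old.equals(e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }
}
```

On a non-empty diff the controller could either reject the change or update its context and reissue LeaderAndIsr for the affected partitions; which of the two the patch chooses is up to the implementation.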
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster

2021-03-17 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12454:
-
Fix Version/s: 3.0.0

> Add ERROR logging on kafka-log-dirs when given brokerIds do not  exist in 
> current kafka cluster
> ---
>
> Key: KAFKA-12454
> URL: https://issues.apache.org/jira/browse/KAFKA-12454
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
> Fix For: 3.0.0
>
>
> When non-existent brokerIds are given, the kafka-log-dirs tool fails with a 
> timeout error:
> Exception in thread "main" java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment. Call: describeLogDirs
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50)
>  at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36)
>  at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment. Call: describeLogDirs
>  
> When the brokerId entered by the user does not exist, an error message 
> indicating that the node is not present should be printed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster

2021-03-13 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12454:
-
Affects Version/s: (was: 2.8.0)

> Add ERROR logging on kafka-log-dirs when given brokerIds do not  exist in 
> current kafka cluster
> ---
>
> Key: KAFKA-12454
> URL: https://issues.apache.org/jira/browse/KAFKA-12454
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
>
> When non-existent brokerIds are given, the kafka-log-dirs tool fails with a 
> timeout error:
> Exception in thread "main" java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment. Call: describeLogDirs
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50)
>  at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36)
>  at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment. Call: describeLogDirs
>  
> When the brokerId entered by the user does not exist, an error message 
> indicating that the node is not present should be printed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster

2021-03-12 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12454:
-
Affects Version/s: 2.8.0

> Add ERROR logging on kafka-log-dirs when given brokerIds do not  exist in 
> current kafka cluster
> ---
>
> Key: KAFKA-12454
> URL: https://issues.apache.org/jira/browse/KAFKA-12454
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
>
> When non-existent brokerIds are given, the kafka-log-dirs tool fails with a 
> timeout error:
> Exception in thread "main" java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment. Call: describeLogDirs
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50)
>  at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36)
>  at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment. Call: describeLogDirs
>  
> When the brokerId entered by the user does not exist, an error message 
> indicating that the node is not present should be printed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster

2021-03-11 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-12454:


Assignee: Wenbing Shen

> Add ERROR logging on kafka-log-dirs when given brokerIds do not  exist in 
> current kafka cluster
> ---
>
> Key: KAFKA-12454
> URL: https://issues.apache.org/jira/browse/KAFKA-12454
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
>
> When non-existent brokerIds are given, the kafka-log-dirs tool fails with a 
> timeout error:
> Exception in thread "main" java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment. Call: describeLogDirs
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50)
>  at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36)
>  at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment. Call: describeLogDirs
>  
> When the brokerId entered by the user does not exist, an error message 
> indicating that the node is not present should be printed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster

2021-03-11 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12454:


 Summary: Add ERROR logging on kafka-log-dirs when given brokerIds 
do not  exist in current kafka cluster
 Key: KAFKA-12454
 URL: https://issues.apache.org/jira/browse/KAFKA-12454
 Project: Kafka
  Issue Type: Improvement
Reporter: Wenbing Shen


When non-existent brokerIds are given, the kafka-log-dirs tool fails with a 
timeout error:

Exception in thread "main" java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
assignment. Call: describeLogDirs
 at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
 at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
 at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
 at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
 at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50)
 at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36)
 at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
for a node assignment. Call: describeLogDirs

 

When the brokerId entered by the user does not exist, an error message 
indicating that the node is not present should be printed.
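The requested behavior amounts to validating the user-supplied broker IDs against the brokers actually present in the cluster before calling describeLogDirs. A sketch with a hypothetical helper (not the actual LogDirsCommand patch):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the validation kafka-log-dirs could perform before describeLogDirs:
// collect the requested broker IDs that are not present in the cluster, so an
// ERROR message can be printed instead of waiting for a timeout.
public class BrokerIdValidation {
    /** Returns the requested IDs that do not exist in the cluster. */
    public static List<Integer> unknownBrokerIds(List<Integer> requested, Set<Integer> clusterIds) {
        return requested.stream()
                .filter(id -> !clusterIds.contains(id))
                .collect(Collectors.toList());
    }
}
```

In the real tool the cluster IDs would come from Admin#describeCluster, and a non-empty result would be reported as an ERROR before any describeLogDirs call is issued.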



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12445) Improve the display of ConsumerPerformance indicators

2021-03-09 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298532#comment-17298532
 ] 

Wenbing Shen commented on KAFKA-12445:
--

In my own environment I do the following; I don't know whether the community 
will accept this approach.

!image-2021-03-10-13-30-27-734.png!

> Improve the display of ConsumerPerformance indicators
> -
>
> Key: KAFKA-12445
> URL: https://issues.apache.org/jira/browse/KAFKA-12445
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Priority: Minor
> Attachments: image-2021-03-10-13-30-27-734.png
>
>
> The current test output is shown below; the user experience is poor, and the 
> meaning of each indicator is not displayed intuitively.
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf10 --messages 4 --from-latest
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 
> 779.7162, 3096, 48206, 8.1034, 829.7930
>  
> show-detailed-stats:
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf --messages 1 --show-detailed-stats
> time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, 
> rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 
> 1615346338626, -1615346333626, 0., 0.
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 
> 5000, 758.4341, 795275.8000
> 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 
> 5000, 795.6612, 834311.2000
> 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 
> 5140, 188.8091, 197980.7393
> 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 
> 1.9438, 2038.2166
>  
> One way to improve this is to display each indicator's name and measured 
> value as a table, so that the meaning of each measurement is clearly 
> expressed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12445) Improve the display of ConsumerPerformance indicators

2021-03-09 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-12445:
-
Attachment: image-2021-03-10-13-30-27-734.png

> Improve the display of ConsumerPerformance indicators
> -
>
> Key: KAFKA-12445
> URL: https://issues.apache.org/jira/browse/KAFKA-12445
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Priority: Minor
> Attachments: image-2021-03-10-13-30-27-734.png
>
>
> The current test output is shown below; the user experience is poor, and the 
> meaning of each indicator is not displayed intuitively.
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf10 --messages 4 --from-latest
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 
> 779.7162, 3096, 48206, 8.1034, 829.7930
>  
> show-detailed-stats:
> bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
> test-perf --messages 1 --show-detailed-stats
> time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, 
> rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 
> 1615346338626, -1615346333626, 0., 0.
> 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 
> 5000, 758.4341, 795275.8000
> 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 
> 5000, 795.6612, 834311.2000
> 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 
> 5140, 188.8091, 197980.7393
> 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 
> 1.9438, 2038.2166
>  
> One way to improve this is to display each indicator's name and measured 
> value as a table, so that the meaning of each measurement is clearly 
> expressed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12445) Improve the display of ConsumerPerformance indicators

2021-03-09 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12445:


 Summary: Improve the display of ConsumerPerformance indicators
 Key: KAFKA-12445
 URL: https://issues.apache.org/jira/browse/KAFKA-12445
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.7.0
Reporter: Wenbing Shen


The current test output is shown below; the user experience is poor, and the 
meaning of each indicator is not displayed intuitively.

bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic 
test-perf10 --messages 4 --from-latest

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 
779.7162, 3096, 48206, 8.1034, 829.7930

 

show-detailed-stats:

bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf 
--messages 1 --show-detailed-stats

time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, 
rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 
1615346338626, -1615346333626, 0., 0.
2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 5000, 
758.4341, 795275.8000
2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 5000, 
795.6612, 834311.2000
2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 5140, 
188.8091, 197980.7393
2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 
1.9438, 2038.2166

 

One way to improve this is to display each indicator's name and measured value 
as a table, so that the meaning of each measurement is clearly expressed.
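The tabular layout suggested here can be sketched with plain string formatting (the layout below is illustrative only; the attached screenshot shows the reporter's actual version):

```java
// Illustrative sketch of printing consumer-perf metrics as a name/value table
// instead of a single CSV row. The column layout is an assumption, not the
// reporter's actual implementation.
public class MetricsTable {
    /** Renders parallel name/value arrays as left-aligned "name : value" rows. */
    public static String render(String[] names, String[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            sb.append(String.format("%-25s : %s%n", names[i], values[i]));
        }
        return sb.toString();
    }
}
```

For example, render(new String[]{"MB.sec", "nMsg.sec"}, new String[]{"7.6144", "779.7162"}) prints one aligned row per indicator instead of one hard-to-read CSV line.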



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2

2021-03-01 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10886:
-
Attachment: (was: image-2021-03-01-14-29-55-418.png)

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problems in the Windows environment, but it didn't seem to work. 
> The problems include a restart of the Kafka service deleting data by 
> mistake, and a topic deletion or partition migration taking a disk offline 
> or crashing the broker.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
>  17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
>  
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
> kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
> kafka.log.Log.renameDir(Log.scala:687) at 
> kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
> kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
> kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
> kafka.cluster.Partition.delete(Partition.scala:262) at 
> kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
> kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
> java.lang.Thread.run(Unknown Source) Suppressed: 
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) 
> ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log





[jira] [Commented] (KAFKA-10886) Kafka crashed in windows environment2

2021-02-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292652#comment-17292652
 ] 

Wenbing Shen commented on KAFKA-10886:
--

[~mehajabeeny] I'm not sure whether applying the 2.7.0 fix on top of the 
kafka-2.0.0 codebase would cause problems, but I fixed it on kafka-2.0.0, and 
that build is currently working well in our company's product at a customer 
site. KAFKA-9458 does not fully solve the problem on Windows: a restart of the 
Kafka service can still delete data by mistake, and deleting a topic or 
migrating a partition can still take a disk offline or crash the broker. Could 
you apply this patch to the code manually, and have you solved the problem in 
the meantime?
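For background on the failing call: Kafka's `Utils.atomicMoveWithFallback` tries an atomic move first and falls back to a non-atomic one when the filesystem cannot do it atomically. A minimal Python sketch of that general pattern, under the assumption that a copy-and-delete fallback is acceptable (this is not Kafka's actual implementation, which renames directories):

```python
import os
import shutil
import tempfile

def atomic_move_with_fallback(src: str, dst: str) -> None:
    """Try an atomic rename first; if the OS refuses (e.g. Windows holding an
    open handle on the file), fall back to a non-atomic copy + delete.
    Sketch of the pattern only -- not Kafka's Utils.atomicMoveWithFallback."""
    try:
        os.replace(src, dst)      # atomic on POSIX; may raise on Windows
    except OSError:
        shutil.copy2(src, dst)    # non-atomic fallback
        os.remove(src)

# usage sketch: rename a log file to its "-delete" name
d = tempfile.mkdtemp()
src = os.path.join(d, "test-1-1.log")
dst = os.path.join(d, "test-1-1.log.deleted")
with open(src, "w") as f:
    f.write("payload")
atomic_move_with_fallback(src, dst)
```

The Windows failure mode in the stack traces is exactly the first branch raising `AccessDeniedException` while another component still holds a handle on the directory being renamed.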

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: image-2021-03-01-14-29-55-418.png, 
> windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problems in the Windows environment, but it didn't seem to work. 
> The problems include a restart of the Kafka service deleting data by 
> mistake, and a topic deletion or partition migration taking a disk offline 
> or crashing the broker.

[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2

2021-02-28 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10886:
-
Attachment: image-2021-03-01-14-29-55-418.png

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: image-2021-03-01-14-29-55-418.png, 
> windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problems in the Windows environment, but it didn't seem to work. 
> The problems include a restart of the Kafka service deleting data by 
> mistake, and a topic deletion or partition migration taking a disk offline 
> or crashing the broker.





[jira] [Created] (KAFKA-12157) test Upgrade 2.7.0 from 2.0.0 occur a question

2021-01-07 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-12157:


 Summary: test Upgrade 2.7.0 from 2.0.0 occur a question
 Key: KAFKA-12157
 URL: https://issues.apache.org/jira/browse/KAFKA-12157
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.7.0
Reporter: Wenbing Shen
 Attachments: 1001server.log, 1001serverlog.png, 1003server.log, 
1003serverlog.png, 1003statechange.log

In a test environment, I was doing a rolling upgrade from version 2.0.0 to 
version 2.7.0 and hit the following problem. When the rolling upgrade reached 
its second round and I stopped the first broker of that round (1001), the 
error below occurred. While a broker was processing a client produce request, 
the start offset of the partition leader's current leader epoch suddenly 
became 0, and subsequent write requests for the same partition then produced 
the error log. All partition leaders that had a replica on 1001 were 
transferred to node 1003, and every such partition on 1003 hit this error 
whenever it received produce requests. When I restarted 1001, broker 1001 
reported the following error:

[2021-01-06 16:46:55,955] ERROR (ReplicaFetcherThread-8-1003 
kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1001, 
leaderId=1003, fetcherId=8] Unexpected error occurred while processing data for 
partition test-perf1-9 at offset 9666953

I use the following command to make a production request:

nohup /home/kafka/software/kafka/bin/kafka-producer-perf-test.sh --num-records 
1 --record-size 1000 --throughput 3 --producer-props 
bootstrap.servers=hdp1:9092,hdp2:9092,hdp3:9092 acks=1 --topic test-perf1 > 
1pro.log 2>&1 &

 

I tried to reproduce the problem, but after three attempts it did not 
reappear. I am curious how this happened, and why broker 1003 reset the 
startOffset of leaderEpoch 4 to 0 while offsets were being assigned by the 
broker in the Log.append function.
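For context, the WARN lines and the later `IllegalArgumentException` in the logs that follow both come from the leader-epoch cache's validation and truncation rules. A deliberately simplified, hypothetical Python sketch of those rules (the `LeaderEpochCache` class here only approximates Kafka's Scala `LeaderEpochFileCache`):

```python
# Hypothetical sketch of leader-epoch cache assignment, approximating
# kafka.server.epoch.LeaderEpochFileCache: a new entry must have a
# non-negative epoch and start offset, and assigning it truncates any
# existing entries whose start offset conflicts with it.
class EpochEntry:
    def __init__(self, epoch: int, start_offset: int):
        self.epoch, self.start_offset = epoch, start_offset

class LeaderEpochCache:
    def __init__(self):
        self.entries = []

    def assign(self, entry: EpochEntry) -> None:
        if entry.epoch < 0 or entry.start_offset < 0:
            # mirrors "Received invalid partition leader epoch entry
            # EpochEntry(epoch=4, startOffset=-3)" in the ERROR below
            raise ValueError(
                f"Received invalid partition leader epoch entry "
                f"EpochEntry(epoch={entry.epoch}, startOffset={entry.start_offset})")
        # Drop conflicting entries whose start offset is >= the new one;
        # this mirrors the "caused truncation of conflicting entries" WARN.
        self.entries = [e for e in self.entries
                        if e.start_offset < entry.start_offset]
        self.entries.append(entry)

cache = LeaderEpochCache()
cache.assign(EpochEntry(2, 8348201))
cache.assign(EpochEntry(3, 9195729))
cache.assign(EpochEntry(4, 0))  # startOffset=0 truncates every earlier entry
```

This is why an epoch entry arriving with startOffset=0 wipes the whole cache, and why a subsequent negative startOffset fails the produce request outright.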

 

broker 1003: server.log

[2021-01-06 16:37:59,492] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-9] New epoch entry EpochEntry(epoch=4, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=4, startOffset=9667122), EpochEntry(epoch=3, startOffset=9195729), EpochEntry(epoch=2, startOffset=8348201)). Cache now contains 0 entries.
[2021-01-06 16:37:59,493] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-8] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8342787)). Cache now contains 0 entries.
[2021-01-06 16:37:59,495] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-2] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8336727)). Cache now contains 0 entries.
[2021-01-06 16:37:59,498] ERROR (data-plane-kafka-request-handler-142 kafka.server.ReplicaManager 76) [ReplicaManager broker=1003] Error processing append operation on partition test-perf1-9
java.lang.IllegalArgumentException: Received invalid partition leader epoch entry EpochEntry(epoch=4, startOffset=-3)
 at 
kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:67)
 at 
kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:59)
 at kafka.log.Log.maybeAssignEpochStartOffset(Log.scala:1268)
 at kafka.log.Log.$anonfun$append$6(Log.scala:1181)
 at kafka.log.Log$$Lambda$935/184936331.accept(Unknown Source)
 at java.lang.Iterable.forEach(Iterable.java:75)
 at kafka.log.Log.$anonfun$append$2(Log.scala:1179)
 at kafka.log.Log.append(Log.scala:2387)
 at kafka.log.Log.appendAsLeader(Log.scala:1050)
 at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1079)
 at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1067)
 at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:953)
 at kafka.server.ReplicaManager$$Lambda$1025/1369541490.apply(Unknown Source)
 at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
 at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
 at scala.collection.mutable.HashMap.map(HashMap.scala:35)
 at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:941)
 at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:621)
 at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:625)

 

broker 1001:server.log

[2021-01-06 16:46:55,955] 

[jira] [Assigned] (KAFKA-10904) There is a misleading log when the replica fetcher thread handles offsets that are out of range

2021-01-05 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-10904:


Assignee: Wenbing Shen

> There is a misleading log when the replica fetcher thread handles offsets 
> that are out of range
> ---
>
> Key: KAFKA-10904
> URL: https://issues.apache.org/jira/browse/KAFKA-10904
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Minor
> Attachments: ReplicaFetcherThread-has-a-misleading-log.png
>
>
> The replica fetcher thread's log is ambiguous. When the fetcher thread 
> handles an out-of-range offset, it needs to try to truncate the log. When 
> the follower replica's end offset is greater than the leader's log start 
> offset but smaller than the leader's end offset, the follower keeps its own 
> fetch offset. However, this case is handled together with the case where 
> the follower's end offset is smaller than the leader's log start offset, so 
> the log ambiguously reports that the follower's fetch offset was reset to 
> the leader's log start offset. In fact the follower keeps its own fetch 
> offset, so this WARN log misleads the user.
>  
> [2020-11-12 05:30:54,319] WARN (ReplicaFetcherThread-1-1003 
> kafka.server.ReplicaFetcherThread 70) [ReplicaFetcher replicaId=1010, 
> leaderId=1003, fetcherId=1] Reset fetch offset for partition 
> eb_raw_msdns-17 from 1933959108 to current leader's start offset 1883963889
> [2020-11-12 05:30:54,320] INFO (ReplicaFetcherThread-1-1003 
> kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1010, 
> leaderId=1003, fetcherId=1] Current offset 1933959108 for partition 
> eb_raw_msdns-17 is out of range, which typically implies a leader change. 
> Reset fetch offset to 1933959108
>  
> I think it would be more accurate to print the WARN log only when the 
> follower replica really needs to truncate its fetch offset to the leader 
> replica's log start offset.





[jira] [Created] (KAFKA-10904) There is a misleading log when the replica fetcher thread handles offsets that are out of range

2021-01-05 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-10904:


 Summary: There is a misleading log when the replica fetcher thread 
handles offsets that are out of range
 Key: KAFKA-10904
 URL: https://issues.apache.org/jira/browse/KAFKA-10904
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 2.7.0
Reporter: Wenbing Shen
 Attachments: ReplicaFetcherThread-has-a-misleading-log.png

The replica fetcher thread's log is ambiguous. When the fetcher thread 
handles an out-of-range offset, it needs to try to truncate the log. When the 
follower replica's end offset is greater than the leader's log start offset 
but smaller than the leader's end offset, the follower keeps its own fetch 
offset. However, this case is handled together with the case where the 
follower's end offset is smaller than the leader's log start offset, so the 
log ambiguously reports that the follower's fetch offset was reset to the 
leader's log start offset. In fact the follower keeps its own fetch offset, 
so this WARN log misleads the user.

 

[2020-11-12 05:30:54,319] WARN (ReplicaFetcherThread-1-1003 
kafka.server.ReplicaFetcherThread 70) [ReplicaFetcher replicaId=1010, 
leaderId=1003, fetcherId=1] Reset fetch offset for partition eb_raw_msdns-17 
from 1933959108 to current leader's start offset 1883963889
[2020-11-12 05:30:54,320] INFO (ReplicaFetcherThread-1-1003 
kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1010, 
leaderId=1003, fetcherId=1] Current offset 1933959108 for partition 
eb_raw_msdns-17 is out of range, which typically implies a leader change. 
Reset fetch offset to 1933959108

 

I think it would be more accurate to print the WARN log only when the 
follower replica really needs to truncate its fetch offset to the leader 
replica's log start offset.
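The three cases described above can be summarized in a small decision sketch. This is hypothetical and simplified (not the actual `ReplicaFetcherThread` code); the point is that only the second branch should emit the reset WARN:

```python
def handle_out_of_range(fetch_offset: int, leader_start: int,
                        leader_end: int) -> tuple[int, bool]:
    """Simplified sketch of the fetcher's out-of-range handling.
    Returns (new_fetch_offset, should_warn_about_reset)."""
    if fetch_offset > leader_end:
        # follower is ahead of the leader: truncate to the leader's end offset
        return leader_end, False
    if fetch_offset < leader_start:
        # follower genuinely lost data: reset to leader's start offset and warn
        return leader_start, True
    # offset lies inside the leader's log: keep the follower's own offset,
    # so printing "Reset fetch offset ... to leader's start offset" here
    # would be the misleading case described above
    return fetch_offset, False
```

Using the numbers from the log excerpt: fetch offset 1933959108 with leader start 1883963889 falls in the third branch, so the follower keeps 1933959108 and no reset WARN is warranted.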





[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-30 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256867#comment-17256867
 ] 

Wenbing Shen commented on KAFKA-10889:
--

[~becket_qin] Hello Qin, I found an internal topic whose data has not been 
deleted since September 30 this year, and it has many empty log segments. 
Because the log segments cannot be deleted, one partition replica of this 
topic has grown to 753 GB. Do you know what could cause this? !20201231-1.png!

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> 20201231-1.png, 20201231-2.png, 20201231-3.png, 20201231-4.png, 
> 20201231-5.png, image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic with the default retention of 7 days, but logs exist from 
> October 26th through December 25th (today). The log cleaner doesn't seem to 
> be working on it. This looks like an underlying problem in Kafka.





[jira] [Updated] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10889:
-
Attachment: 20201231-5.png
20201231-4.png
20201231-3.png
20201231-2.png
20201231-1.png

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> 20201231-1.png, 20201231-2.png, 20201231-3.png, 20201231-4.png, 
> 20201231-5.png, image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic with the default retention of 7 days, but logs exist from 
> October 26th through December 25th (today). The log cleaner doesn't seem to 
> be working on it. This looks like an underlying problem in Kafka.





[jira] [Updated] (KAFKA-10891) The control plane needs to force the validation of requests from the controller

2020-12-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10891:
-
Component/s: (was: core)

> The control plane needs to force the validation of requests from the 
> controller
> ---
>
> Key: KAFKA-10891
> URL: https://issues.apache.org/jira/browse/KAFKA-10891
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, network
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png
>
>
> Currently, data and control requests travel over separate planes for 
> isolation, but both endpoints are registered on the broker's ZooKeeper 
> node. A client can therefore obtain the control endpoint and use it for 
> produce, consume, and other data requests, which violates the intended 
> separation of data and control traffic. The server should validate requests 
> on the control plane and reject data requests that arrive through it.





[jira] [Updated] (KAFKA-10891) The control plane needs to force the validation of requests from the controller

2020-12-30 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10891:
-
Component/s: network

> The control plane needs to force the validation of requests from the 
> controller
> ---
>
> Key: KAFKA-10891
> URL: https://issues.apache.org/jira/browse/KAFKA-10891
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core, network
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png
>
>
> Currently, data and control requests travel over separate planes for 
> isolation, but both endpoints are registered on the broker's ZooKeeper 
> node. A client can therefore obtain the control endpoint and use it for 
> produce, consume, and other data requests, which violates the intended 
> separation of data and control traffic. The server should validate requests 
> on the control plane and reject data requests that arrive through it.





[jira] [Commented] (KAFKA-10891) The control plane needs to force the validation of requests from the controller

2020-12-29 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256052#comment-17256052
 ] 

Wenbing Shen commented on KAFKA-10891:
--

[~junrao] Can you take a look at this PR? Thank you very much!

> The control plane needs to force the validation of requests from the 
> controller
> ---
>
> Key: KAFKA-10891
> URL: https://issues.apache.org/jira/browse/KAFKA-10891
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png
>
>
> Currently, data and control requests travel over separate planes for 
> isolation, but both endpoints are registered on the broker's ZooKeeper 
> node. A client can therefore obtain the control endpoint and use it for 
> produce, consume, and other data requests, which violates the intended 
> separation of data and control traffic. The server should validate requests 
> on the control plane and reject data requests that arrive through it.





[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-29 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255980#comment-17255980
 ] 

Wenbing Shen commented on KAFKA-10889:
--

[~becket_qin] Thank you for your guidance.

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic with the default retention of 7 days, but logs exist from 
> October 26th through December 25th (today). The log cleaner doesn't seem to 
> be working on it. This looks like an underlying problem in Kafka.





[jira] [Commented] (KAFKA-10891) The control plane needs to force the validation of requests from the controller

2020-12-29 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255911#comment-17255911
 ] 

Wenbing Shen commented on KAFKA-10891:
--

[~showuon] I added validation of control requests in the PR; in addition, 
when the control plane is enabled, the broker now sends the controlled 
shutdown request through the control plane.

> The control plane needs to force the validation of requests from the 
> controller
> ---
>
> Key: KAFKA-10891
> URL: https://issues.apache.org/jira/browse/KAFKA-10891
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core
>Affects Versions: 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png
>
>
> Currently, data and control requests travel over separate planes for 
> isolation, but both endpoints are registered on the broker's ZooKeeper 
> node. A client can therefore obtain the control endpoint and use it for 
> produce, consume, and other data requests, which violates the intended 
> separation of data and control traffic. The server should validate requests 
> on the control plane and reject data requests that arrive through it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10891) The control plane needs to force the validation of requests from the controller

2020-12-29 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-10891:


 Summary: The control plane needs to force the validation of 
requests from the controller
 Key: KAFKA-10891
 URL: https://issues.apache.org/jira/browse/KAFKA-10891
 Project: Kafka
  Issue Type: Improvement
  Components: controller, core
Affects Versions: 2.7.0
Reporter: Wenbing Shen
Assignee: Wenbing Shen
 Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png

Current, data and control request through different plane in isolation, these 
endpoints are registered to the zookeeper node plane, this will cause the 
client to obtain the control endpoint, the client may use control endpoint for 
production, consumption and other data request, this violates the separation of 
data requests and the design of the control request, the server needs to be in 
the control plane inspection control request, refused to request data through 
the control plane.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255618#comment-17255618
 ] 

Wenbing Shen commented on KAFKA-10889:
--

[~becket_qin]  Why not isolate the message creation time from the log append 
time,return the creation time for the client,and use the log append time 
between brokers,which is more friendly to log cleanup.

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491
 ] 

Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:43 AM:
-

Hello,When the user is unfamiliar with the configuration, is there a way to 
solve the problem by design?

[~becket_qin]

[~junrao]


was (Author: wenbing.shen):
[~becket_qin]

[~junrao]

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491
 ] 

Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:43 AM:
-

[~becket_qin]

[~junrao]

Hello,When the user is unfamiliar with the configuration, is there a way to 
solve the problem by design?


was (Author: wenbing.shen):
Hello,When the user is unfamiliar with the configuration, is there a way to 
solve the problem by design?

[~becket_qin]

[~junrao]

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255491#comment-17255491
 ] 

Wenbing Shen commented on KAFKA-10889:
--

[~becket_qin]

[~junrao]

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255480#comment-17255480
 ] 

Wenbing Shen edited comment on KAFKA-10889 at 12/28/20, 9:24 AM:
-

The problem appears to be related to this kip:

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message]

Related to these two configurations:

message.timestamp.type=CreateTime

message.timestamp.difference.max.ms=Long.MAXVALUE

The client generated messages which timestamp is 2020-12-26 14:49:32,this 
results in a period before 2021-01-02 14:49:32,all log segments of this 
partition will not be deleted.

 

!image-2020-12-28-17-17-15-947.png!


was (Author: wenbing.shen):
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message]

message.timestamp.type
max.message.time.difference.ms

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png, 
> image-2020-12-28-17-17-15-947.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-28 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255480#comment-17255480
 ] 

Wenbing Shen commented on KAFKA-10889:
--

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message]

message.timestamp.type
max.message.time.difference.ms

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-25 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254945#comment-17254945
 ] 

Wenbing Shen commented on KAFKA-10889:
--

* I'm getting it a little wrong here,log cleaner is only working on the log 
segment of the compact type.Uncompact segments are removed by the log manager's 
timed task.However, there is still a problem on my side that the log segment of 
non compact type has not been cleaned up.

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-25 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-10889:


Assignee: Wenbing Shen

> The log cleaner is not working for topic partitions
> ---
>
> Key: KAFKA-10889
> URL: https://issues.apache.org/jira/browse/KAFKA-10889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.0.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Blocker
> Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png
>
>
> * I have a topic that is reserved for the default of 7 days, but the log 
> exists from October 26th to December 25th today.The log cleaner doesn't seem 
> to be working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10889) The log cleaner is not working for topic partitions

2020-12-25 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-10889:


 Summary: The log cleaner is not working for topic partitions
 Key: KAFKA-10889
 URL: https://issues.apache.org/jira/browse/KAFKA-10889
 Project: Kafka
  Issue Type: Bug
  Components: log cleaner
Affects Versions: 2.0.0
Reporter: Wenbing Shen
 Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png

* I have a topic that is reserved for the default of 7 days, but the log exists 
from October 26th to December 25th today.The log cleaner doesn't seem to be 
working on it.This seems to be an underlying problem in Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-10886:


Assignee: (was: Wenbing Shen)

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problem in the Windows environment, but it didn't seem to 
> work.These include restarting the Kafka service causing data to be deleted by 
> mistake, deleting a topic or a partition migration causing a disk to go 
> offline or the broker crashed.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
>  17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
>  
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
> kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
> kafka.log.Log.renameDir(Log.scala:687) at 
> kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
> kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
> kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
> kafka.cluster.Partition.delete(Partition.scala:262) at 
> kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
> kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
> java.lang.Thread.run(Unknown Source) Suppressed: 
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) 
> ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen reassigned KAFKA-10886:


Assignee: Wenbing Shen

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problem in the Windows environment, but it didn't seem to 
> work.These include restarting the Kafka service causing data to be deleted by 
> mistake, deleting a topic or a partition migration causing a disk to go 
> offline or the broker crashed.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
>  17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
>  
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
> kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
> kafka.log.Log.renameDir(Log.scala:687) at 
> kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
> kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
> kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
> kafka.cluster.Partition.delete(Partition.scala:262) at 
> kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
> kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
> java.lang.Thread.run(Unknown Source) Suppressed: 
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) 
> ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9458) Kafka crashed in windows environment

2020-12-23 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530
 ] 

Wenbing Shen edited comment on KAFKA-9458 at 12/23/20, 9:53 AM:


The current patch is deficient. When topic is deleted or partition migration is 
carried out, the service will still be suspended or the disk will be offline. I 
have provided the following patch file, which is effective for self-test,My 
Kafka version is 2.0.0 .

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]

 

https://issues.apache.org/jira/browse/KAFKA-10886  I provide a complete and 
effective patch.


was (Author: wenbing.shen):
The current patch is deficient. When topic is deleted or partition migration is 
carried out, the service will still be suspended or the disk will be offline. I 
have provided the following patch file, which is effective for self-test,My 
Kafka version is 2.0.0 .

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]

> Kafka crashed in windows environment
> 
>
> Key: KAFKA-9458
> URL: https://issues.apache.org/jira/browse/KAFKA-9458
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0
> Environment: Windows Server 2019
>Reporter: hirik
>Priority: Critical
>  Labels: windows
> Attachments: Windows_crash_fix.patch, 
> kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip
>
>
> Hi,
> while I was trying to validate Kafka retention policy, Kafka Server crashed 
> with below exception trace. 
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, 
> dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] 
> Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in 
> dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka 
> (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at 
> java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: 
> 

[jira] [Comment Edited] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253993#comment-17253993
 ] 

Wenbing Shen edited comment on KAFKA-10886 at 12/23/20, 9:49 AM:
-

After reference kafka-9458, I provided the above patch, which completely solves 
the kafka compatibility problem with Windows, and this is a qualified patch


was (Author: wenbing.shen):
I refer to [kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] and 
completely fix the kafka compatibility problem in Windows environment. This is 
a qualified patch.

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
> Attachments: windows_kafka_full_crash.patch
>
>
> I tried using the 
> [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] patch to fix 
> the Kafka problem in the Windows environment, but it didn't seem to 
> work.These include restarting the Kafka service causing data to be deleted by 
> mistake, deleting a topic or a partition migration causing a disk to go 
> offline or the broker crashed.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
>  17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
>  
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
> kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
> kafka.log.Log.renameDir(Log.scala:687) at 
> kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
> kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
> kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
> kafka.cluster.Partition.delete(Partition.scala:262) at 
> kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
> kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
> java.lang.Thread.run(Unknown Source) Suppressed: 
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) 
> ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] 

[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10886:
-
External issue URL:   (was: 
https://issues.apache.org/jira/browse/KAFKA-9458)

> Kafka crashed in windows environment2
> -
>
> Key: KAFKA-10886
> URL: https://issues.apache.org/jira/browse/KAFKA-10886
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.0.0
> Environment: Windows Server
>Reporter: Wenbing Shen
>Priority: Critical
>  Labels: windows
>
> I tried using the Kafka-9458 patch to fix the Kafka problem in the Windows 
> environment, but it didn't seem to work.These include restarting the Kafka 
> service causing data to be deleted by mistake, deleting a topic or a 
> partition migration causing a disk to go offline or the broker crashed.
> [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
>  17:26:11,124] ERROR (kafka-request-handler-11 
> kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 
> in log dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
>  
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
> kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
> kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
> kafka.log.Log.renameDir(Log.scala:687) at 
> kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
> kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
> kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
> kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
> kafka.cluster.Partition.delete(Partition.scala:262) at 
> kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
>  at 
> kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
> kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
> java.lang.Thread.run(Unknown Source) Suppressed: 
> java.nio.file.AccessDeniedException: 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
>  -> 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
>  at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
> sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
> sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
> sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
> java.nio.file.Files.move(Unknown Source) at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) 
> ... 23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
> kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
> replicas in dir 
> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log
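The failing rename above goes through Kafka's Utils.atomicMoveWithFallback. A minimal sketch of that fallback pattern is below (illustrative code, not Kafka's actual implementation): try an atomic rename first, then fall back to a plain move. On Windows, both attempts can throw AccessDeniedException while the log directory's files are still memory-mapped or held open, which is what the stack trace shows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {
    // Sketch of the atomic-move-with-fallback pattern: if the atomic
    // rename fails, retry as a plain replace and suppress the first error.
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicFailure) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(atomicFailure);
                throw fallbackFailure;  // both attempts failed, as in the trace above
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Rename a partition dir to its "-delete" name, as asyncDelete does.
        Path dir = Files.createTempDirectory("kafka-move-sketch");
        Path src = Files.createDirectory(dir.resolve("test-1-1"));
        Path dst = dir.resolve("test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete");
        atomicMoveWithFallback(src, dst);
        System.out.println(Files.exists(dst) && !Files.exists(src));
    }
}
```

On POSIX filesystems the atomic branch normally succeeds; the suppressed exception in the log above indicates the fallback branch also failed on Windows.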



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10886:
-
Description: 
I tried using the [Kafka-9458|https://issues.apache.org/jira/browse/KAFKA-9458] 
patch to fix the Kafka problems in the Windows environment, but it did not seem 
to work. The problems include a broker restart mistakenly deleting data, and a 
topic deletion or partition migration taking a log disk offline or crashing the 
broker.

[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
 -> 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
java.nio.file.Files.move(Unknown Source) at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
kafka.log.Log.renameDir(Log.scala:687) at 
kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
kafka.cluster.Partition.delete(Partition.scala:262) at 
kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
 at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
java.lang.Thread.run(Unknown Source) Suppressed: 
java.nio.file.AccessDeniedException: 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
 -> 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
java.nio.file.Files.move(Unknown Source) at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) ... 
23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
replicas in dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log

  was:
I tried using the Kafka-9458 patch to fix the Kafka problem in the Windows 
environment, but it didn't seem to work.These include restarting the Kafka 
service causing data to be deleted by mistake, deleting a topic or a partition 
migration causing a disk to go offline or the broker crashed.

[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
 -> 

[jira] [Created] (KAFKA-10886) Kafka crashed in windows environment2

2020-12-23 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-10886:


 Summary: Kafka crashed in windows environment2
 Key: KAFKA-10886
 URL: https://issues.apache.org/jira/browse/KAFKA-10886
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.0.0
 Environment: Windows Server
Reporter: Wenbing Shen


I tried using the Kafka-9458 patch to fix the Kafka problems in the Windows 
environment, but it did not seem to work. The problems include a broker restart 
mistakenly deleting data, and a topic deletion or partition migration taking a 
log disk offline or crashing the broker.

[2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log[2020-12-23
 17:26:11,124] ERROR (kafka-request-handler-11 
kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in 
log dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\logjava.nio.file.AccessDeniedException:
 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
 -> 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
java.nio.file.Files.move(Unknown Source) at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at 
kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at 
kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at 
kafka.log.Log.maybeHandleIOException(Log.scala:1837) at 
kafka.log.Log.renameDir(Log.scala:687) at 
kafka.log.LogManager.asyncDelete(LogManager.scala:833) at 
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at 
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at 
kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at 
kafka.cluster.Partition.delete(Partition.scala:262) at 
kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371)
 at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at 
kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at 
kafka.server.KafkaApis.handle(KafkaApis.scala:113) at 
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at 
java.lang.Thread.run(Unknown Source) Suppressed: 
java.nio.file.AccessDeniedException: 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1
 -> 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete
 at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at 
sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at 
sun.nio.fs.WindowsFileCopy.move(Unknown Source) at 
sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at 
java.nio.file.Files.move(Unknown Source) at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) ... 
23 more[2020-12-23 17:26:11,127] INFO (LogDirFailureHandler 
kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving 
replicas in dir 
E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log





[jira] [Updated] (KAFKA-10882) When sending a response to the client,a null pointer exception has occurred in the error code set

2020-12-22 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10882:
-
Attachment: 0880c08b0110fb95d40a.png

> When sending a response to the client,a null pointer exception has occurred 
> in the error code set
> -
>
> Key: KAFKA-10882
> URL: https://issues.apache.org/jira/browse/KAFKA-10882
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.4
>Reporter: Wenbing Shen
>Priority: Major
> Attachments: 0880c08b0110fb95d40a.png, 0880c08b0110fbd3bb0d.png, 
> 0880c08b0110fbd3fb0e.png
>
>
> After the IO thread receives a produce request, a null pointer exception 
> occurs while the error code set is parsed when the response error is 
> returned to the client. A replica fetcher thread on another broker also 
> failed to parse the protocol; I don't know whether this is the same problem. 
> I hope I can get some help.





[jira] [Created] (KAFKA-10882) When sending a response to the client,a null pointer exception has occurred in the error code set

2020-12-22 Thread Wenbing Shen (Jira)
Wenbing Shen created KAFKA-10882:


 Summary: When sending a response to the client,a null pointer 
exception has occurred in the error code set
 Key: KAFKA-10882
 URL: https://issues.apache.org/jira/browse/KAFKA-10882
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.11.0.4
Reporter: Wenbing Shen
 Attachments: 0880c08b0110fbd3bb0d.png, 0880c08b0110fbd3fb0e.png

After the IO thread receives a produce request, a null pointer exception occurs 
while the error code set is parsed when the response error is returned to the 
client. A replica fetcher thread on another broker also failed to parse the 
protocol; I don't know whether this is the same problem. I hope I can get some 
help.
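One plausible way such an NPE arises (purely illustrative; this is not Kafka's actual code, and ErrorCodeLookup is a hypothetical name) is an error-code lookup table returning null for a code the receiving side does not recognise, which is then dereferenced without a guard:

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorCodeLookup {
    // Hypothetical table mapping wire error codes to names; a real broker
    // would know many more codes than its peers might send it.
    static final Map<Short, String> ERRORS = new HashMap<>();
    static {
        ERRORS.put((short) 0, "NONE");
        ERRORS.put((short) 1, "OFFSET_OUT_OF_RANGE");
    }

    static String nameFor(short code) {
        String name = ERRORS.get(code);  // null for an unknown code
        // Without this guard, name.length() etc. would throw the NPE
        // described in the report above.
        return name == null ? "UNKNOWN(" + code + ")" : name;
    }

    public static void main(String[] args) {
        System.out.println(nameFor((short) 1));
        System.out.println(nameFor((short) 99));
    }
}
```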





[jira] [Comment Edited] (KAFKA-9458) Kafka crashed in windows environment

2020-12-17 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530
 ] 

Wenbing Shen edited comment on KAFKA-9458 at 12/18/20, 5:57 AM:


The current patch is insufficient. When a topic is deleted or a partition 
migration is carried out, the service can still hang or the disk can still go 
offline. I have attached a patch file below that was effective in my own 
testing; my Kafka version is 2.0.0.

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]


was (Author: wenbing.shen):
The current patch is deficient. When topic is deleted or partition migration is 
carried out, the service will still be suspended or the disk will be offline. I 
have provided the following patch file, which is effective for self-test

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]

> Kafka crashed in windows environment
> 
>
> Key: KAFKA-9458
> URL: https://issues.apache.org/jira/browse/KAFKA-9458
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0
> Environment: Windows Server 2019
>Reporter: hirik
>Priority: Critical
>  Labels: windows
> Attachments: Windows_crash_fix.patch, 
> kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip
>
>
> Hi,
> while I was trying to validate the Kafka retention policy, the Kafka server 
> crashed with the exception trace below. 
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, 
> dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] 
> Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in 
> dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka 
> (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at 
> java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> 

[jira] [Updated] (KAFKA-9458) Kafka crashed in windows environment

2020-12-17 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-9458:

Attachment: kafka_windows_crash_by_delete_topic_and_Partition_migration

> Kafka crashed in windows environment
> 
>
> Key: KAFKA-9458
> URL: https://issues.apache.org/jira/browse/KAFKA-9458
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0
> Environment: Windows Server 2019
>Reporter: hirik
>Priority: Critical
>  Labels: windows
> Attachments: Windows_crash_fix.patch, 
> kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip
>
>
> Hi,
> while I was trying to validate the Kafka retention policy, the Kafka server 
> crashed with the exception trace below. 
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, 
> dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] 
> Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in 
> dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka 
> (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at 
> java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
>  at 
> java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:792)
>  ... 27 more
> 

[jira] [Commented] (KAFKA-9458) Kafka crashed in windows environment

2020-12-17 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251530#comment-17251530
 ] 

Wenbing Shen commented on KAFKA-9458:
-

The current patch is insufficient. When a topic is deleted or a partition 
migration is carried out, the service can still hang or the disk can still go 
offline. I have attached a patch file below that was effective in my own 
testing.

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]

> Kafka crashed in windows environment
> 
>
> Key: KAFKA-9458
> URL: https://issues.apache.org/jira/browse/KAFKA-9458
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.4.0
> Environment: Windows Server 2019
>Reporter: hirik
>Priority: Critical
>  Labels: windows
> Attachments: Windows_crash_fix.patch, 
> kafka_windows_crash_by_delete_topic_and_Partition_migration, logs.zip
>
>
> Hi,
> while I was trying to validate the Kafka retention policy, the Kafka server 
> crashed with the exception trace below. 
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, 
> dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] 
> Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in 
> dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka 
> (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at 
> java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex
>  -> 
> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\.timeindex.deleted:
>  The process cannot access the file because it is being used by another 
> process.
> at 
> java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at 
> java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at 

[jira] [Commented] (KAFKA-10672) Restarting Kafka always takes a lot of time

2020-12-03 Thread Wenbing Shen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243730#comment-17243730
 ] 

Wenbing Shen commented on KAFKA-10672:
--

* We increased the number of batches read in a single IO operation

 * After many tests, startup speed improved by about 50% on average

 * See the attached files for the detailed code

> Restarting Kafka always takes a lot of time
> ---
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
> Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
>Reporter: Wenbing Shen
>Priority: Major
> Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java, 
> AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java, 
> DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java, 
> LazyDownConversionRecords.java, Log.scala, LogInputStream.java, 
> LogManager.scala, LogSegment.scala, MemoryRecords.java, 
> RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java, 
> server.log
>
>
> When the producer snapshot file does not exist, or the latest snapshot file 
> predates the current active segment, restoring producer state traverses the 
> log segment batch by batch. On broker nodes with many partitions, and hence 
> many log segments, this generates a very large number of IO operations, 
> because only one batch is loaded per IO and a single segment can contain 
> tens of thousands of batches. In the code I found at least two IO operations 
> per batch: with the default batch size of 16 KB, a 1 GB log segment holds 
> 65,536 batches and therefore generates at least 65,536 * 2 = 131,072 IO 
> operations, which makes the Kafka startup process very slow. We configured 
> 15 log recovery threads in the production environment, and it still took 
> more than 2 hours to load a partition. Could the community offer suggestions 
> or improvements for this situation? For detailed logs, see the section on 
> the test-perf-18 partitions in the attached logs.
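The IO arithmetic in the description, and the effect of the batched reads proposed in the comment above, can be sketched as follows (illustrative constants and names; RecoveryIoEstimate and READ_CHUNK are hypothetical, not Kafka code):

```java
public class RecoveryIoEstimate {
    static final int BATCH_SIZE = 16 * 1024;    // default batch, ~16 KB
    static final int READ_CHUNK = 1024 * 1024;  // hypothetical 1 MB read: 64 batches per IO

    public static void main(String[] args) {
        long segmentBytes = 1L << 30;                 // one 1 GB log segment
        long batches = segmentBytes / BATCH_SIZE;     // 65,536 batches per segment
        long naiveIos = batches * 2;                  // at least 2 IOs per batch
        long batchedIos = segmentBytes / READ_CHUNK;  // one large read per chunk
        System.out.println(batches + " " + naiveIos + " " + batchedIos);
    }
}
```

Reading 1 MB per IO instead of one batch per IO cuts the per-segment read count from over a hundred thousand to about a thousand, which is consistent with a large startup speedup when many segments must be scanned.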





[jira] [Updated] (KAFKA-10672) Restarting Kafka always takes a lot of time

2020-12-03 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated KAFKA-10672:
-
Attachment: Records.java
RecordBatchIteratorOfRestart.java
RecordBatchIterator.java
MemoryRecords.java
LogSegment.scala
LogManager.scala
LogInputStream.java
Log.scala
LazyDownConversionRecords.java
FileRecords.java
FileLogInputStream.java
DefaultRecordBatch.java
ByteBufferLogInputStream.java
AbstractLegacyRecordBatch.java
AbstractIteratorOfRestart.java
AbstractIterator.java

> Restarting Kafka always takes a lot of time
> ---
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0


