[ 
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243730#comment-17243730
 ] 

Wenbing Shen commented on KAFKA-10672:
--------------------------------------

 * We increased the number of batches read in a single I/O operation.

 * Across many test runs, startup speed improved by 50% on average.

 * See the attached files for the detailed code.
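
To illustrate the idea of reading several batches per I/O request, here is a minimal sketch. It is not the attached patch and does not use Kafka's on-disk format: the 4-byte length framing, the class name ChunkedBatchScan, and all method names are assumptions made for this example only. It counts logical read requests for a per-batch scan (one read for the header, one for the payload) and for a chunked scan that parses many batches out of one larger buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedBatchScan {
    // Hypothetical framing for this sketch only: 4-byte length, then payload.
    static final int HEADER_BYTES = 4;

    // Fill dst from the channel starting at pos; stops early only at end of file.
    static void readFully(FileChannel ch, ByteBuffer dst, long pos) throws IOException {
        while (dst.hasRemaining()) {
            int n = ch.read(dst, pos);
            if (n < 0) break;
            pos += n;
        }
    }

    // Baseline: one read request for the header and one for the payload.
    // Returns {batches seen, read requests issued}.
    static long[] scanPerBatch(FileChannel ch) throws IOException {
        long pos = 0, batches = 0, reads = 0, end = ch.size();
        ByteBuffer header = ByteBuffer.allocate(HEADER_BYTES);
        while (pos + HEADER_BYTES <= end) {
            header.clear();
            readFully(ch, header, pos);
            reads++;
            int len = header.getInt(0);
            if (pos + HEADER_BYTES + len > end) break; // truncated tail
            readFully(ch, ByteBuffer.allocate(len), pos + HEADER_BYTES);
            reads++;
            pos += HEADER_BYTES + len;
            batches++;
        }
        return new long[] {batches, reads};
    }

    // Chunked: one read request per chunk, then parse every complete batch in it.
    // Returns {batches seen, read requests issued}.
    static long[] scanChunked(FileChannel ch, int chunkBytes) throws IOException {
        long pos = 0, batches = 0, reads = 0, end = ch.size();
        ByteBuffer buf = ByteBuffer.allocate(chunkBytes);
        while (pos + HEADER_BYTES <= end) {
            buf.clear();
            buf.limit((int) Math.min(chunkBytes, end - pos));
            readFully(ch, buf, pos);
            reads++;
            buf.flip();
            int consumed = 0;
            while (buf.remaining() >= HEADER_BYTES) {
                int len = buf.getInt(buf.position());
                if (buf.remaining() < HEADER_BYTES + len) break; // batch straddles the chunk end
                buf.position(buf.position() + HEADER_BYTES + len);
                consumed += HEADER_BYTES + len;
                batches++;
            }
            // A batch larger than the chunk (or a truncated one) ends this sketch;
            // real code would grow the buffer instead.
            if (consumed == 0) break;
            pos += consumed; // the next chunk starts at the first unparsed batch
        }
        return new long[] {batches, reads};
    }

    // Write `count` batches with zero-filled payloads of `payloadBytes` each.
    static void writeSample(Path file, int count, int payloadBytes) throws IOException {
        ByteBuffer out = ByteBuffer.allocate(count * (HEADER_BYTES + payloadBytes));
        for (int i = 0; i < count; i++) {
            out.putInt(payloadBytes);
            out.position(out.position() + payloadBytes);
        }
        Files.write(file, out.array());
    }

    // Returns {per-batch batches, per-batch reads, chunked batches, chunked reads}.
    static long[] compare(int count, int payloadBytes, int chunkBytes) {
        try {
            Path file = Files.createTempFile("segment-sketch", ".log");
            try {
                writeSample(file, count, payloadBytes);
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    long[] a = scanPerBatch(ch);
                    long[] b = scanChunked(ch, chunkBytes);
                    return new long[] {a[0], a[1], b[0], b[1]};
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = compare(1000, 16, 4096);
        System.out.printf("per-batch: %d batches, %d reads%n", r[0], r[1]);
        System.out.printf("chunked:   %d batches, %d reads%n", r[2], r[3]);
    }
}
```

With 1000 sample batches of 20 bytes each and a 4096-byte chunk, the per-batch scan issues 2000 read requests and the chunked scan issues 5, while both see all 1000 batches. The chunk size is a tuning choice: larger chunks mean fewer reads but more memory per recovery thread.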

> Restarting Kafka always takes a lot of time
> -------------------------------------------
>
>                 Key: KAFKA-10672
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10672
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.0.0
>         Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
>            Reporter: Wenbing Shen
>            Priority: Major
>         Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java, 
> AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java, 
> DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java, 
> LazyDownConversionRecords.java, Log.scala, LogInputStream.java, 
> LogManager.scala, LogSegment.scala, MemoryRecords.java, 
> RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java, 
> server.log
>
>
> When the producer snapshot file does not exist, or the latest snapshot file 
> predates the current active segment, restoring producer state requires 
> traversing the log segments and reading every batch in them. On a broker 
> with many partitions and many log segments, this causes a very large number 
> of I/O operations, because each I/O loads only one batch and a single log 
> usually contains tens of thousands of batches. In the code, I found that 
> each batch requires at least two I/O operations. With the default batch size 
> of 16 KB and a log segment of 1 GB, a segment holds 65,536 batches, which 
> means at least 65,536 * 2 = 131,072 I/O operations per segment, so a lot of 
> time is spent in the Kafka startup process. We configured 15 log recovery 
> threads in the production environment, and it still took more than 2 hours 
> to load a partition. Could the community suggest a way to handle or improve 
> this situation? For detailed logs, see the entries for the test-perf-18 
> partitions in the attached server.log.
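
The I/O estimate in the description can be reproduced with a few lines of arithmetic. This is a minimal sketch: the class and method names are made up for illustration, and it assumes every batch is exactly 16 KiB and every segment exactly 1 GiB, which real logs will not match precisely. The 1000-segment figure comes from the Environment field above.

```java
public class RecoveryIoEstimate {
    // Batches in one segment, assuming uniformly sized batches.
    static long batchesPerSegment(long segmentBytes, long batchBytes) {
        return segmentBytes / batchBytes;
    }

    // Read requests needed when each batch costs a fixed number of reads.
    static long readsPerSegment(long batches, int readsPerBatch) {
        return batches * readsPerBatch;
    }

    public static void main(String[] args) {
        long segmentBytes = 1L << 30;   // 1 GiB log segment
        long batchBytes = 16L << 10;    // 16 KiB batch
        int readsPerBatch = 2;          // "at least two I/O operations" per batch
        long segments = 1000;           // segments in a slow-loading partition

        long batches = batchesPerSegment(segmentBytes, batchBytes);
        long reads = readsPerSegment(batches, readsPerBatch);

        System.out.println("batches per segment: " + batches);            // 65536
        System.out.println("reads per segment:   " + reads);              // 131072
        System.out.println("reads per partition: " + reads * segments);   // 131072000
    }
}
```

Under these assumptions, a partition with about 1000 segments needs on the order of 131 million read requests if every segment has to be scanned.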



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
