[ https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243730#comment-17243730 ]
Wenbing Shen commented on KAFKA-10672:
--------------------------------------

* We increased the number of batches read in a single IO operation.
* After many tests, the startup speed increased by 50% on average.
* See the attached files for the detailed code.

> Restarting Kafka always takes a lot of time
> -------------------------------------------
>
>                 Key: KAFKA-10672
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10672
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.0.0
>         Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
>            Reporter: Wenbing Shen
>            Priority: Major
>         Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java,
> AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java,
> DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java,
> LazyDownConversionRecords.java, Log.scala, LogInputStream.java,
> LogManager.scala, LogSegment.scala, MemoryRecords.java,
> RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java,
> server.log
>
>
> When the snapshot file does not exist, or the latest snapshot file predates
> the current active segment, restoring the producer state traverses the log
> segments and reads every batch in the log. When an individual broker hosts
> many partitions, most of which have many log segments, this causes a very
> large number of IO operations, because each IO loads only one batch and a
> log commonly contains tens of thousands of batches. I found that the code
> performs at least two IO operations per batch. With the default batch size
> of 16 KB and a 1 GB log segment, a segment holds 65,536 batches, which
> means at least 65,536 * 2 = 131,072 IO operations per segment, so the Kafka
> startup process takes a long time.
>
> We configured 15 log recovery threads in the production environment, and it
> still took more than 2 hours to load one partition. Could the community
> propose some improvements for this situation? For detailed logs, see the
> section on the test-perf-18 partition in the attached log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
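The read pattern described in the issue (roughly two reads per batch) and the comment's change (more batches per IO read) can be illustrated with a small Java sketch. It is not the attached patch and uses none of Kafka's classes: the class `BatchReadSketch`, its methods, and the framing (an 8-byte base offset and a 4-byte payload length before each payload) are hypothetical simplifications. `scanPerBatch` issues one read for each batch header and one for each body, while `scanChunked` fills a large buffer with one read and parses every complete batch inside it, re-reading a trailing partial batch on the next pass.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BatchReadSketch {
    // Hypothetical, simplified framing (not Kafka's exact on-disk format):
    // 8-byte base offset + 4-byte payload length, followed by the payload.
    static final int HEADER_SIZE = 12;

    // The issue's arithmetic: (segment bytes / batch bytes) batches, two reads per batch.
    static long estimatedPerBatchReads(long segmentBytes, long batchBytes) { return segmentBytes / batchBytes * 2; }

    // Fills `buf` starting at `position`; callers count this as one logical read.
    static void readFully(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position);
            if (n < 0) throw new EOFException("unexpected end of segment");
            position += n;
        }
    }

    // Writes `count` batches of exactly `batchBytes` bytes each (header included, payload zeroed).
    static void writeSegment(Path path, int count, int batchBytes) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(batchBytes);
            for (int i = 0; i < count; i++) {
                buf.putLong(0, i).putInt(8, batchBytes - HEADER_SIZE).clear(); // absolute puts, then rewind
                while (buf.hasRemaining()) ch.write(buf);
            }
        }
    }

    // Baseline: one read per batch header plus one per batch body.
    static long[] scanPerBatch(Path path) throws IOException {
        long batches = 0, reads = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long pos = 0, size = ch.size();
            ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
            while (pos + HEADER_SIZE <= size) {
                header.clear();
                readFully(ch, header, pos);
                reads++;
                int len = header.getInt(8); // payload length
                readFully(ch, ByteBuffer.allocate(len), pos + HEADER_SIZE);
                reads++;
                pos += HEADER_SIZE + len;
                batches++;
            }
        }
        return new long[] {batches, reads};
    }

    // Batched: one read fills a large buffer, then every complete batch in it is parsed.
    static long[] scanChunked(Path path, int chunkSize) throws IOException {
        long batches = 0, reads = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long pos = 0, size = ch.size();
            ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
            while (pos + HEADER_SIZE <= size) {
                chunk.clear().limit((int) Math.min(chunkSize, size - pos));
                readFully(ch, chunk, pos);
                reads++;
                chunk.flip();
                while (chunk.remaining() >= HEADER_SIZE) {
                    int len = chunk.getInt(chunk.position() + 8);
                    if (chunk.remaining() < HEADER_SIZE + len) break; // partial batch: re-read it next time
                    chunk.position(chunk.position() + HEADER_SIZE + len);
                    batches++;
                }
                // Nothing parsed: the batch is larger than the chunk, or the file is truncated.
                if (chunk.position() == 0) throw new IllegalStateException("batch larger than chunk, or truncated");
                pos += chunk.position();
            }
        }
        return new long[] {batches, reads};
    }

    // Returns {perBatchBatches, perBatchReads, chunkedBatches, chunkedReads} for a temporary segment file.
    static long[] compare(int batchCount, int batchBytes, int chunkSize) {
        Path seg = null;
        try {
            seg = Files.createTempFile("segment", ".log");
            writeSegment(seg, batchCount, batchBytes);
            long[] a = scanPerBatch(seg), b = scanChunked(seg, chunkSize);
            return new long[] {a[0], a[1], b[0], b[1]};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (seg != null) seg.toFile().delete();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 GiB segment, 16 KiB batches: " + estimatedPerBatchReads(1L << 30, 16 * 1024) + " estimated reads");
        long[] r = compare(1000, 16 * 1024, 4 * 1024 * 1024);
        System.out.printf("per-batch: %d batches, %d reads; chunked: %d batches, %d reads%n", r[0], r[1], r[2], r[3]);
    }
}
```

Running `main` should print 131072 estimated reads for a 1 GiB segment of 16 KiB batches, then 2000 reads for the per-batch scan and 4 reads for the chunked scan of a 1000-batch file with a 4 MiB buffer. The trade-off is memory: in a real broker each recovery thread would hold a chunk-sized buffer, so with many partitions recovering in parallel the chunk size would need to stay bounded.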