weiguang liu created KAFKA-15120:
------------------------------------

             Summary: Optimizing Recovery Time for Non-Transactional and Idempotent Partitions in Kafka
                 Key: KAFKA-15120
                 URL: https://issues.apache.org/jira/browse/KAFKA-15120
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: weiguang liu

Kafka's recovery logic rebuilds the index and producerStats from the log segment after the recovery point. On a broker with a large number of partitions, recovery can take a very long time. For example, when a broker has 1,000 partitions and the average log segment size is 1GB, the broker may need to read as much as 500GB of log data during recovery, which can be unacceptably slow.

Most of the partitions might use neither transactions nor idempotency. For those partitions, can we consider a recovery method that starts from the recovery point, instead of from the beginning of the entire log segment?

My understanding is that for partitions that use neither transactions nor idempotency, the index is append-only and can be completely recovered starting from the recovery point, rather than from the start offset of the log segment.

I am not sure what the potential risks of this approach might be, or why the community has not considered it.
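
A rough sketch of the arithmetic behind the 500GB figure may help when weighing the proposal. It assumes, which the report does not state, that each partition has one segment needing recovery and that this segment is on average half full. The function names and the 90% flushed share in the second estimate are hypothetical and chosen only for illustration. They are not measurements or Kafka APIs.

```python
def full_segment_scan_gb(partitions, segment_gb, fill_fraction):
    """Data read when recovery scans each unflushed segment from its start.

    fill_fraction is the assumed average share of a segment that holds data.
    """
    return partitions * segment_gb * fill_fraction


def from_recovery_point_gb(partitions, segment_gb, fill_fraction, flushed_fraction):
    """Data read if recovery could begin at the recovery point instead.

    flushed_fraction is the assumed share of the segment's data that lies
    before the recovery point and would therefore be skipped.
    """
    total = full_segment_scan_gb(partitions, segment_gb, fill_fraction)
    return total * (1.0 - flushed_fraction)


if __name__ == "__main__":
    # Figures from the report: 1,000 partitions, 1GB average segment size.
    # The 0.5 fill fraction is an assumption that reproduces the 500GB estimate.
    full = full_segment_scan_gb(1000, 1.0, 0.5)
    # Hypothetical case: 90% of each segment is already before the recovery point.
    partial = from_recovery_point_gb(1000, 1.0, 0.5, 0.9)
    print(f"scan from segment start:  {full:.0f} GB")
    print(f"scan from recovery point: {partial:.0f} GB")
```

The saving scales directly with how much of each segment sits before the recovery point, so the benefit in practice would depend on how often the log is flushed on a given broker.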