Dong Lin created KAFKA-6604:
-------------------------------

             Summary: ReplicaManager should not remove partitions on the log directory from high watermark checkpoint file
                 Key: KAFKA-6604
                 URL: https://issues.apache.org/jira/browse/KAFKA-6604
             Project: Kafka
          Issue Type: Bug
            Reporter: Dong Lin
            Assignee: Dong Lin

Currently a broker may truncate a partition to its log start offset in the following scenario:

- Broker A is restarted after shutdown.
- The controller learns that broker A has started.
- Some event (e.g. topic deletion) triggers the controller to send a LeaderAndIsrRequest for partition P1.
- Broker A receives the LeaderAndIsrRequest for partition P1. After the broker receives its first LeaderAndIsrRequest, it overwrites the HW checkpoint file with the leader and follower partitions it knows about at that point. Since that is only P1, the checkpoint file will contain only the HW for partition P1.
- The controller sends broker A a LeaderAndIsrRequest for all its leader and follower partitions.
- The broker creates a ReplicaFetcherThread for its follower partitions and truncates each log to its HW, which will be zero for all partitions except P1.

When this happens, potentially all logs on the broker will be truncated to the log start offset, and the cluster will then run with reduced availability for a long time.

The right solution is to keep a partition in the high watermark checkpoint file if the partition exists in LogManager.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
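
The scenario and the proposed fix can be illustrated with a small model. This is only a sketch under stated assumptions, not Kafka's actual code: the checkpoint file is modelled as a plain dict from partition name to HW, and the function names (checkpoint_overwrite, checkpoint_keep_existing, truncation_offset) as well as the sample offsets are hypothetical. It shows how overwriting the checkpoint with only the currently known partitions loses the HW of every other partition, and how keeping entries for partitions that still exist in the log directory avoids that.

```python
from typing import Dict, Set


def checkpoint_overwrite(known: Dict[str, int]) -> Dict[str, int]:
    """Model of the problematic behaviour: the checkpoint is rewritten
    with only the partitions the broker currently knows about."""
    return dict(known)


def checkpoint_keep_existing(
    previous: Dict[str, int],
    known: Dict[str, int],
    partitions_on_disk: Set[str],
) -> Dict[str, int]:
    """Model of the proposed fix: keep previously checkpointed HWs for
    partitions that still exist in the log directory, then apply the
    HWs of the currently known partitions on top."""
    merged = {p: hw for p, hw in previous.items() if p in partitions_on_disk}
    merged.update(known)
    return merged


def truncation_offset(checkpoint: Dict[str, int], partition: str) -> int:
    """A follower truncates to its checkpointed HW; a partition with no
    entry falls back to zero, as described in the scenario."""
    return checkpoint.get(partition, 0)


# Hypothetical state before the restart: three partitions with valid HWs.
previous = {"P1": 100, "P2": 250, "P3": 75}
on_disk = {"P1", "P2", "P3"}

# The first LeaderAndIsrRequest after the restart covers only P1.
first_request = {"P1": 100}

overwritten = checkpoint_overwrite(first_request)
kept = checkpoint_keep_existing(previous, first_request, on_disk)
```

In this model, truncation_offset(overwritten, "P2") falls back to zero, while truncation_offset(kept, "P2") still returns the HW recorded before the restart. A partition that no longer exists on disk is dropped from the checkpoint in both variants.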