Aishwarya Ganesan created KAFKA-4009:
----------------------------------------
Summary: Data corruption or EIO leads to data loss
Key: KAFKA-4009
URL: https://issues.apache.org/jira/browse/KAFKA-4009
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 0.9.0.0
Reporter: Aishwarya Ganesan
I have a 3-node Kafka cluster (N1, N2 and N3) with
log.flush.interval.messages=1, min.insync.replicas=3 and
unclean.leader.election.enable=false, and a single ZooKeeper node. My workload
inserts a few messages, and on completion of the workload the
recovery-point-offset-checkpoint file reflects the latest offset of the
committed messages.
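For reference, the broker settings above correspond roughly to the following
server.properties fragment (a sketch of my setup; broker.id and the ZooKeeper
address are illustrative):

    # server.properties (sketch; broker.id and zookeeper.connect are illustrative)
    broker.id=1
    zookeeper.connect=zk1:2181

    # flush every message to disk before acknowledging it
    log.flush.interval.messages=1

    # with acks=all, a write succeeds only when all 3 replicas are in sync
    min.insync.replicas=3

    # never elect a leader from outside the ISR
    unclean.leader.election.enable=false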
I have a small testing tool that drives distributed applications into corner
cases by simulating error conditions such as EIO, ENOSPC and EDQUOT that can be
encountered on modern file systems such as ext4. The tool also simulates silent
on-disk data corruption.
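As an illustration of the kind of fault being injected (this is not the tool
itself), silent corruption can be approximated by flipping a byte inside an
existing log segment while the broker is down; the segment path and position
below are hypothetical:

    import java.io.RandomAccessFile;

    // Sketch: flip one byte of a log segment to simulate silent on-disk
    // corruption. The segment path and position are hypothetical.
    public class CorruptSegment {
        public static void main(String[] args) throws Exception {
            String segment = "/tmp/kafka-logs/my-topic1-0/00000000000000000000.log";
            long position = 40L; // somewhere inside a record's payload

            try (RandomAccessFile f = new RandomAccessFile(segment, "rw")) {
                f.seek(position);
                int original = f.read();        // read the byte at that position
                f.seek(position);
                f.write(original ^ 0xFF);       // write back its bitwise complement
            }
        }
    }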
When I introduce silent data corruption on a node in the ISR (say N1), Kafka
detects the corruption through its checksum and discards the log entries from
that point onwards. Even though N1 has lost log entries while its
recovery-point-offset-checkpoint file still indicates the latest offsets, N1 is
allowed to become the leader because it is in the ISR. The other nodes N2 and
N3 then crash with the following log message:
FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed
for topic my-topic1, Current leader 1's latest offset 0 is less than replica
3's latest offset 1 (kafka.server.ReplicaFetcherThread)
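For context, the checksum detection mentioned above works on a per-record
CRC32: conceptually, the broker recomputes the CRC over a record's bytes and
compares it with the stored value. A simplified sketch of that comparison
(illustrative names, not Kafka's actual internals):

    import java.util.zip.CRC32;

    // Simplified sketch of checksum-based corruption detection: a record whose
    // recomputed CRC32 does not match the stored one is treated as corrupt.
    public class CrcCheck {
        static boolean isValid(long storedCrc, byte[] recordBytes) {
            CRC32 crc = new CRC32();
            crc.update(recordBytes);             // recompute over the record bytes
            return crc.getValue() == storedCrc;  // mismatch => corruption detected
        }
    }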
The end result is that silent data corruption leads to data loss, because
querying the cluster returns only the messages before the corrupted entry. Note
that the cluster at this point consists only of N1. This situation could have
been avoided if N1, the node that had to discard the log entry, had not been
allowed to become the leader. This scenario would not happen with
majority-based leader election, since the other nodes (N2 or N3) would have
denied their vote to N1 because N1's log is incomplete compared to theirs.
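To make the contrast concrete, in a majority-based protocol a voter grants its
vote only if the candidate's log is at least as complete as its own; a minimal
sketch of that rule (illustrative, not Kafka code), using the offsets from the
log message above:

    // Sketch of the vote-granting rule in a majority-based election: a voter
    // denies a candidate whose log ends before its own. With the offsets from
    // the log message above, N2/N3 (offset 1) would deny N1 (offset 0).
    public class VoteCheck {
        static boolean grantVote(long candidateLatestOffset, long voterLatestOffset) {
            return candidateLatestOffset >= voterLatestOffset;
        }

        public static void main(String[] args) {
            System.out.println(grantVote(0L, 1L)); // false: vote denied to N1
        }
    }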
If the same corruption occurs on one of the followers instead, that follower
discards the corrupted entry, re-fetches the data from the leader, and there is
no data loss.
Encountering an EIO thrown by the file system for a particular block has the
same consequence: querying the cluster shows data loss and the remaining two
nodes crash. An EIO on read can be thrown for a variety of reasons, including a
latent sector error affecting one or more sectors on the disk.