Imran Patel created KAFKA-3039:
----------------------------------
Summary: Temporary loss of leader resulted in log being completely
truncated
Key: KAFKA-3039
URL: https://issues.apache.org/jira/browse/KAFKA-3039
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.9.0.0
Environment: Debian 3.2.54-2 x86_64 GNU/Linux
Reporter: Imran Patel
Priority: Critical
We recently had an event where the temporary loss of a partition's leader
(during a manual restart) resulted in the leader coming back with no high
watermark state and truncating its log to zero. The logs (attached below)
indicate that it still had the data but not the commit state. How is this possible?
Leader (broker 3)
[2015-12-18 21:19:44,666] INFO Completed load of log messages-14 with log end
offset 14175963374 (kafka.log.Log)
[2015-12-18 21:19:45,170] INFO Partition [messages,14] on broker 3: No
checkpointed highwatermark is found for partition [messages,14]
(kafka.cluster.Partition)
[2015-12-18 21:19:45,238] INFO Truncating log messages-14 to offset 0.
(kafka.log.Log)
[2015-12-18 21:20:34,066] INFO Partition [messages,14] on broker 3: Expanding
ISR for partition [messages,14] from 3 to 3,10 (kafka.cluster.Partition)
Replica (broker 10)
[2015-12-18 21:19:19,525] INFO Partition [messages,14] on broker 10: Shrinking
ISR for partition [messages,14] from 3,10,4 to 10,4 (kafka.cluster.Partition)
[2015-12-18 21:20:34,049] ERROR [ReplicaFetcherThread-0-3], Current offset
14175984203 for partition [messages,14] out of range; reset offset to 35977
(kafka.server.ReplicaFetcherThread)
[2015-12-18 21:20:34,033] WARN [ReplicaFetcherThread-0-3], Replica 10 for
partition [messages,14] reset its fetch offset from 14175984203 to current
leader 3's latest offset 35977 (kafka.server.ReplicaFetcherThread)
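The ERROR/WARN pair above shows the follower-side half of the data loss: broker 10's fetch offset (14175984203) was far beyond the restarted leader's latest offset (35977), so the follower reset to the leader's offset, discarding its own copy. A minimal sketch of that reset decision (an illustration of the behavior seen in the logs, not Kafka's actual code; the function name is hypothetical):

```python
# Simplified sketch of the follower-side OffsetOutOfRange handling seen in
# broker 10's logs. When the follower's fetch offset is beyond the leader's
# latest offset, the follower resets to the leader's offset and truncates
# its own log, discarding everything past that point.

def handle_out_of_range(follower_offset, leader_latest_offset):
    """Return the new fetch offset after an out-of-range fetch."""
    if follower_offset > leader_latest_offset:
        # The leader now has less data than the follower (here: because it
        # truncated to 0 on restart) -> follow the leader anyway.
        return leader_latest_offset
    # Otherwise the follower fell behind the leader's earliest retained
    # offset and would reset forward instead (not shown).
    return follower_offset

# Broker 10's situation: offset 14175984203 reset to leader 3's 35977.
print(handle_out_of_range(14175984203, 35977))  # 35977
```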
Some relevant config parameters:
offsets.topic.replication.factor = 3
offsets.commit.required.acks = -1
replica.high.watermark.checkpoint.interval.ms = 5000
unclean.leader.election.enable = false
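The leader-side logs suggest the truncation follows from the missing checkpoint: with no entry in the high-watermark checkpoint file for [messages,14], the watermark effectively defaults to 0, and the restarted broker truncates its log to that point even though the segments are still on disk. A minimal sketch of that failure mode, assuming a checkpoint mapping of (topic, partition) -> committed offset (an illustration, not Kafka's actual recovery code):

```python
# Simplified sketch of choosing a truncation point from the high-watermark
# checkpoint on restart. If no watermark was checkpointed for a partition
# (e.g. the broker went down before the 5000 ms checkpoint interval ever
# flushed one), the watermark defaults to 0 and the whole log is discarded.

def recovery_truncation_offset(checkpoints, topic_partition):
    """Return the offset the log is truncated to on restart."""
    return checkpoints.get(topic_partition, 0)

# Broker 3's situation: log end offset 14175963374 on disk, but
# "No checkpointed highwatermark is found for partition [messages,14]".
checkpoints = {}
print(recovery_truncation_offset(checkpoints, ("messages", 14)))  # 0
```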
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)