[
https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087178#comment-13087178
]
Vishal Kathuria commented on ZOOKEEPER-1156:
--------------------------------------------
Here is the scenario:
Let's say the current leader A is at zxid 80.
A participant B with zxid 81 joins and receives a TRUNC,80 message from the
leader.
B then calculates the length of the log up to and including zxid 80. The
actual length is, say, 450, but because of the bug, the value it calculates
is 420. B truncates the log to size 420.
When loadDatabase is called again, the log is replayed only through zxid 79,
because log record 80 is no longer complete.
Node B is now missing the change that had zxid 80. The leader will not send
change 80 to B either.
In my manual repro, the change with zxid 80 was a create. I could see the
created node when I connected to A but not when connected to B.
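
To make the arithmetic above concrete, here is a minimal sketch of the
truncate-then-replay step. It is not ZooKeeper code: the record layout
(8-byte zxid, 4-byte payload length, payload), the class name TruncReplayDemo
and both helper methods are assumptions made up for illustration. Records are
sized at 50 bytes so that record 80 ends at offset 450, matching the numbers
in the scenario.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class TruncReplayDemo {
    // Hypothetical layout, for illustration only:
    // 8-byte zxid, 4-byte payload length, then the payload.
    static byte[] buildLog(long firstZxid, long lastZxid, int payloadLen) {
        int recordLen = 8 + 4 + payloadLen;
        int records = (int) (lastZxid - firstZxid + 1);
        ByteBuffer buf = ByteBuffer.allocate(records * recordLen);
        for (long z = firstZxid; z <= lastZxid; z++) {
            buf.putLong(z).putInt(payloadLen).put(new byte[payloadLen]);
        }
        return buf.array();
    }

    // Returns the zxid of the last complete record, or -1 if there is none.
    static long replay(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        long last = -1;
        while (buf.remaining() >= 12) {
            long zxid = buf.getLong();
            int len = buf.getInt();
            if (buf.remaining() < len) {
                break; // partial record: replay stops here
            }
            buf.position(buf.position() + len);
            last = zxid;
        }
        return last;
    }

    public static void main(String[] args) {
        // Ten 50-byte records, zxid 72..81; record 80 ends at offset 450.
        byte[] log = buildLog(72, 81, 38);
        System.out.println(replay(Arrays.copyOf(log, 450))); // prints 80
        System.out.println(replay(Arrays.copyOf(log, 420))); // prints 79
    }
}
```

Truncating to the correct size keeps record 80 intact; truncating 30 bytes
short cuts into record 80, so replay stops at 79 even though the leader
believes B is at 80.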
> Log truncation truncating log too much - can cause data loss
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-1156
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.3.3
> Reporter: Vishal Kathuria
> Priority: Blocker
> Fix For: 3.3.4
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Log truncation relies on calculating the position of a particular zxid to
> determine the new size of the log file. There is a bug in the
> PositionInputStream implementation: it does not count bytes in the log
> whose value is 0. This can lead to underestimating the actual log size. Log
> records that should be kept can get truncated, leading to data loss on
> the participant that executes the trunc.
> Clients can see different values depending on whether they connect to the
> node on which trunc was executed.
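
The miscount described in the issue can be sketched as follows. This is not
the actual PositionInputStream source: CountingStream, its skipZeroBytes flag
and countBytes are hypothetical names, and the sketch assumes the miscount
comes from testing the result of read() with "> 0" instead of "> -1".
InputStream.read() returns -1 at end of stream and 0..255 otherwise, so a
"> 0" test leaves every zero-valued byte uncounted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class PositionCountDemo {
    // Hypothetical position-tracking stream, for illustration only.
    static class CountingStream extends FilterInputStream {
        private final boolean skipZeroBytes; // true reproduces the miscount
        long position = 0;

        CountingStream(InputStream in, boolean skipZeroBytes) {
            super(in);
            this.skipZeroBytes = skipZeroBytes;
        }

        @Override
        public int read() throws IOException {
            int rc = super.read();
            // Assumed buggy test: rc > 0. Correct test: rc > -1.
            boolean counted = skipZeroBytes ? rc > 0 : rc > -1;
            if (counted) {
                position++;
            }
            return rc;
        }
    }

    // Reads the whole array byte by byte and returns the tracked position.
    static long countBytes(byte[] data, boolean skipZeroBytes) throws IOException {
        try (CountingStream s =
                 new CountingStream(new ByteArrayInputStream(data), skipZeroBytes)) {
            while (s.read() != -1) {
                // drain the stream
            }
            return s.position;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] log = {0, 0, 0, 80, 1, 0, 7, 0};
        System.out.println("buggy=" + countBytes(log, true));  // prints buggy=3
        System.out.println("fixed=" + countBytes(log, false)); // prints fixed=8
    }
}
```

Because binary log records typically contain many zero bytes (for example in
the high-order bytes of integers), the undercount grows with the log, which is
consistent with the truncation point landing before the end of the target
record.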