[
https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408457#comment-13408457
]
Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------
The ZooKeeper log records are not sector-aligned on disk. When a server
crashes, there can be disk I/O in flight, each request covering an integral
number of sectors. Some of those writes will complete and some will not, and
the completion order is not guaranteed to match the order in the file. There
is a reasonable chance that the last log record will be partial. There may
also be some unwritten sectors in the file, followed by blocks that do contain
records.
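To make the failure mode concrete, here is a minimal Java sketch. It is not
ZooKeeper code: the class name, the 512-byte sector size, and the record
lengths are all assumptions chosen for illustration. It lays records end to
end in a file and reports which ones are intact when only some sectors reached
the disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TornWriteDemo {
    static final int SECTOR = 512; // assumed sector size, for illustration only

    /**
     * Records are laid out back to back starting at offset 0. A record is
     * intact only if every sector it touches was written before the crash.
     * Record lengths are assumed to be positive.
     */
    static List<Boolean> survivors(int[] recordLengths, Set<Integer> completedSectors) {
        List<Boolean> intact = new ArrayList<>();
        long start = 0;
        for (int len : recordLengths) {
            long end = start + len;                 // exclusive end offset
            int first = (int) (start / SECTOR);     // first sector touched
            int last = (int) ((end - 1) / SECTOR);  // last sector touched
            boolean ok = true;
            for (int s = first; s <= last; s++) {
                ok &= completedSectors.contains(s);
            }
            intact.add(ok);
            start = end;
        }
        return intact;
    }

    public static void main(String[] args) {
        // Sectors 0, 2 and 3 were written; sector 1 was still in flight.
        int[] lengths = {300, 300, 300, 300, 400};
        System.out.println(survivors(lengths, Set.of(0, 2, 3)));
        // prints [true, false, false, false, true]
    }
}
```

In this example the first record is intact, the next three are damaged because
each of them touches the missing sector 1, and the fifth is intact again: a
hole in the file followed by blocks with valid records.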
The code needs to recognize that partial records at the end of a log are a
possibility and treat them as if they had never been written. One hazard in
doing so is that corruption in the middle of a log might be mistaken for an
EOF. One sanity check would be to include in every log record the highest
record id known to be persistent. After finding the apparent end of the log,
the code could scan farther for a valid record and declare the log corrupt if
that record implies the log was previously committed beyond the apparent EOF.
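The check described above could be sketched roughly as follows. This is only
an illustration under stated assumptions: the record layout (length, id,
persistedId, payload, CRC32), the class LogTailCheck, and the Verdict names
are hypothetical and are not the on-disk format used by FileTxnLog. Record ids
are assumed to be positive and increasing.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

final class LogTailCheck {
    enum Verdict { CLEAN_EOF, TORN_TAIL, CORRUPT }

    // Hypothetical layout: int payloadLength, long id, long persistedId,
    // payload bytes, then a CRC32 of all preceding bytes stored as a long.
    static final int HEADER = 4 + 8 + 8;
    static final int TRAILER = 8;

    static byte[] encode(long id, long persistedId, byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(HEADER + payload.length + TRAILER);
        b.putInt(payload.length).putLong(id).putLong(persistedId).put(payload);
        CRC32 crc = new CRC32();
        crc.update(b.array(), 0, HEADER + payload.length);
        b.putLong(crc.getValue());
        return b.array();
    }

    /** Returns {id, persistedId, totalLength}, or null if no valid record starts at off. */
    static long[] parseAt(byte[] log, int off) {
        if (off < 0 || log.length - off < HEADER + TRAILER) return null;
        ByteBuffer b = ByteBuffer.wrap(log, off, log.length - off);
        int len = b.getInt();
        if (len < 0 || len > log.length - off - HEADER - TRAILER) return null;
        long id = b.getLong();
        long persisted = b.getLong();
        if (id <= 0) return null; // also rejects zero-filled (unwritten) regions
        CRC32 crc = new CRC32();
        crc.update(log, off, HEADER + len);
        long stored = ByteBuffer.wrap(log, off + HEADER + len, TRAILER).getLong();
        if (stored != crc.getValue()) return null;
        return new long[] { id, persisted, HEADER + len + TRAILER };
    }

    static Verdict check(byte[] log) {
        int off = 0;
        long lastId = -1;
        long[] r;
        while ((r = parseAt(log, off)) != null) { // read records until the apparent EOF
            lastId = r[0];
            off += (int) r[2];
        }
        if (off == log.length) return Verdict.CLEAN_EOF;
        // Scan farther, byte by byte, for any valid record past the apparent EOF.
        for (int p = off + 1; p < log.length; p++) {
            long[] later = parseAt(log, p);
            // A later record claiming persistence beyond lastId means records
            // that were known to be durable are missing: the log is corrupt.
            if (later != null && later[1] > lastId) return Verdict.CORRUPT;
        }
        return Verdict.TORN_TAIL; // safe to ignore everything from off onward
    }

    static byte[] concat(byte[]... parts) {
        int n = 0;
        for (byte[] p : parts) n += p.length;
        ByteBuffer b = ByteBuffer.allocate(n);
        for (byte[] p : parts) b.put(p);
        return b.array();
    }

    public static void main(String[] args) {
        byte[] p = {42};
        byte[] r1 = encode(1, 0, p), r2 = encode(2, 1, p), r3 = encode(3, 2, p);
        byte[] half = Arrays.copyOf(r3, r3.length / 2);
        byte[] bad = r2.clone();
        bad[HEADER] ^= 0x7f; // damage the payload of record 2
        System.out.println(check(concat(r1, r2, r3)));   // CLEAN_EOF
        System.out.println(check(concat(r1, r2, half))); // TORN_TAIL
        System.out.println(check(concat(r1, bad, r3)));  // CORRUPT
    }
}
```

The byte-by-byte scan is deliberately naive; a real implementation would
likely bound the scan or step by sector. The sketch also relies on the
persistedId field only ever naming records that were truly durable when it
was written.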
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-1453
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3
> Reporter: Patrick Hunt
> Priority: Critical
> Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz,
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz,
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that
> during server recovery
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next()
> does not indicate whether the available logs are valid. In some cases (say,
> a truncated record and a single txnlog in the datadir) we will not be able
> to tell a corrupt file from one where we simply reached the end of the file.