[ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408948#comment-13408948 ]
Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------
This is not just about the disk write cache. It is necessary to disable the
disk write cache to ensure a commit is truly persistent, but that is not the
only source of this problem.
The problem is that the OS maintains a queue of disk writes. Some of them
have been submitted to the disk and some are still in the queue. These
writes are not necessarily in file block order. When there is a reboot, the
write for the last data block may still be in the queue while the write to the
block before it has gone to the disk.
If the reboot does a hardware reset or is caused by a power failure, a similar
thing can happen to the writes that have already been given to the disk. Disks
like to have about 4 to 8 requests queued so that they can reorder them to
reduce seek and rotational latency. If there is a reset from the disk
controller or a loss of power, the disk cannot complete all the requests it was
given, so some may complete and some may not. Since the disk reorders requests
based on the location of the sectors, the current position of the disk arm, and
the current rotational position of the spindle, it is not possible to predict
the order in which writes will complete.
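
To illustrate why this matters to a log reader, here is a minimal Java sketch.
It is not ZooKeeper's actual txnlog format or the FileTxnIterator code. The
class name TornLogScan, the [length][CRC32][payload] record layout, and the
three status values are all assumptions made up for this example. It shows how
a per-record checksum lets a scan tell a log that simply ends apart from one
where a block in the middle never reached the disk.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical record layout: [int length][long CRC32 of payload][payload].
public class TornLogScan {
    enum Status { CLEAN_EOF, TRUNCATED, CORRUPT }

    static final int HEADER_BYTES = 4 + 8; // int length + long checksum

    static final class Result {
        final Status status;
        final int validRecords; // records verified before the scan stopped
        Result(Status status, int validRecords) {
            this.status = status;
            this.validRecords = validRecords;
        }
    }

    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Wraps a payload in the hypothetical header described above.
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putInt(payload.length).putLong(checksum(payload)).put(payload);
        return buf.array();
    }

    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts) buf.put(p);
        return buf.array();
    }

    // Reads records in order and reports why the scan stopped.
    static Result scan(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        int valid = 0;
        while (buf.hasRemaining()) {
            if (buf.remaining() < HEADER_BYTES) {
                return new Result(Status.TRUNCATED, valid); // partial header
            }
            int len = buf.getInt();
            long stored = buf.getLong();
            if (len <= 0) {
                return new Result(Status.CORRUPT, valid); // e.g. zeroed header
            }
            if (len > buf.remaining()) {
                return new Result(Status.TRUNCATED, valid); // payload cut off
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            if (checksum(payload) != stored) {
                return new Result(Status.CORRUPT, valid); // payload not as written
            }
            valid++;
        }
        return new Result(Status.CLEAN_EOF, valid);
    }

    public static void main(String[] args) {
        byte[] r1 = frame("txn1".getBytes(StandardCharsets.UTF_8));
        byte[] r2 = frame("txn2".getBytes(StandardCharsets.UTF_8));
        byte[] r3 = frame("txn3".getBytes(StandardCharsets.UTF_8));
        byte[] intact = concat(r1, r2, r3);

        // Simulate record 2's payload never reaching the disk although
        // record 3, which was written later, did.
        byte[] hole = intact.clone();
        Arrays.fill(hole, r1.length + HEADER_BYTES, r1.length + r2.length, (byte) 0);

        // Simulate the tail of the last record being lost.
        byte[] cut = Arrays.copyOf(intact, intact.length - 2);

        for (byte[] log : new byte[][] { intact, hole, cut }) {
            Result r = scan(log);
            System.out.println(r.status + " after " + r.validRecords + " records");
        }
    }
}
```

In this sketch a zeroed header is reported as CORRUPT. A format that
preallocates zero-filled space at the end of the file would need something more
to tell a hole in the middle from the unused tail of the log.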
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-1453
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3
> Reporter: Patrick Hunt
> Priority: Critical
> Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz,
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz,
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that
> during server recovery
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next()
> does not indicate if the available logs are valid or not. In some cases (say
> a truncated record and a single txnlog in the datadir) we will not detect
> that the file is corrupt, vs reaching the end of the file.