[
https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412379#comment-13412379
]
Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------
Yes, a full redesign of the logging system would be too much for this problem.
Corruptions usually come from software bugs that misdirect data when writing, or
from storage that silently fails to write (note that writing to the wrong place
can look like a lost write at the correct place). Another common source is
administrator mistakes, such as copying data to the wrong file or simultaneously
assigning the same file to two different uses. Bit flips within a sector do not
happen with disks. I think the CRC is a reasonable check value for the kinds of
corruption we are likely to encounter.
Sorry, I omitted a critical point about preformatting logs. The same log file is
used over and over again, so there is no allocation when writing. Before the
first use, the log is initialized to contain valid empty blocks for log sequence
zero. Since every reuse is at a higher sequence number, the logical EOF is the
first block with a lower sequence number than the sequence number recorded in
the log header. This is how Oracle writes online logs. [Oracle Online Redo
Log|http://docs.oracle.com/cd/E11882_01/server.112/e25789/physical.htm#i1006163]
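As a rough illustration of that scheme (the names and layout here are hypothetical, not Oracle's or ZooKeeper's actual code), the end-of-log scan reduces to finding the first block whose recorded sequence number is below the header's:

```java
// Hypothetical sketch of sequence-number-based EOF detection in a
// preformatted, reused log file. Every block carries the sequence number it
// was written under; each reuse of the file writes a strictly higher
// sequence, so the logical end of the log is the first block whose sequence
// is lower than the sequence recorded in the log header.
final class PreformattedLog {
    // blockSeqs[i] holds the sequence number stored in block i.
    // Returns the index of the first stale block, i.e. the logical EOF;
    // returns blockSeqs.length if every block belongs to the current use.
    static int findEof(long headerSeq, long[] blockSeqs) {
        for (int i = 0; i < blockSeqs.length; i++) {
            if (blockSeqs[i] < headerSeq) {
                return i; // first block left over from an earlier use
            }
        }
        return blockSeqs.length; // log is full for this sequence
    }
}
```

Because the file is preformatted with valid blocks for sequence zero, even a never-written tail parses cleanly: it simply shows a lower sequence, so no separate "uninitialized" state is needed.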
ZooKeeper is different: it uses a new file for every log. It incrementally
preallocates with zeroes to batch the allocations, and the zeroes are not
forced to disk. The real data writes usually overwrite the zeroes in the
filesystem buffer cache, so the zeroes are not likely to be on disk if there is
a partial write due to a crash. I suppose there are times when the fsync
unnecessarily forces the zeroes to disk, and I guess the consequences of a
crash during fsync are file system dependent. Maybe finding a 0 where the 0x42
marker should be at the end of a record indicates a partial record when the
zeroes were flushed earlier, and EOF in the middle of a record means a partial
write as well.
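The two checks suggested above could be sketched roughly as follows. This is an illustrative outline, not ZooKeeper's actual FileTxnIterator; the record layout assumed here (checksum, length, payload, trailing 0x42 marker byte) and the result names are assumptions for the sketch:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;

// Illustrative sketch of distinguishing a clean end-of-log from a partial or
// corrupt record. Assumed record layout: 8-byte checksum, 4-byte length,
// payload bytes, then a trailing 0x42 ('B') marker byte.
final class TxnRecordCheck {
    enum Result { OK, CLEAN_EOF, PARTIAL_RECORD, CORRUPT }

    static Result readRecord(DataInputStream in) throws IOException {
        long checksum;
        int len;
        try {
            checksum = in.readLong();
            len = in.readInt();
        } catch (EOFException e) {
            return Result.CLEAN_EOF; // stream ends exactly at a record boundary
        }
        if (len == 0) {
            return Result.CLEAN_EOF; // preallocated zeroes: treat as end of log
        }
        if (len < 0) {
            return Result.CORRUPT;   // garbage length field
        }
        byte[] payload = new byte[len];
        try {
            in.readFully(payload);
            int marker = in.readByte();
            if (marker == 0) {
                return Result.PARTIAL_RECORD; // zeroes flushed before the data
            }
            if (marker != 0x42) {
                return Result.CORRUPT;        // marker overwritten by garbage
            }
        } catch (EOFException e) {
            return Result.PARTIAL_RECORD;     // EOF in the middle of a record
        }
        Adler32 crc = new Adler32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != checksum) {
            return Result.CORRUPT; // payload does not match its checksum
        }
        return Result.OK;
    }
}
```

The point of the sketch is that PARTIAL_RECORD (a crash mid-write) and CORRUPT (a bad checksum or marker on a fully present record) are different outcomes, which is exactly the distinction the iterator currently fails to surface.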
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-1453
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3
> Reporter: Patrick Hunt
> Priority: Critical
> Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz,
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz,
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that
> during server recovery
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next()
> does not indicate whether the available logs are valid or not. In some cases
> (say a truncated record and a single txnlog in the datadir) we will not detect
> that the file is corrupt, as opposed to simply having reached the end of the
> file.