[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412379#comment-13412379
 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

Yes, a full redesign of the logging system would be too much for this problem. 
Corruptions usually come from software bugs that misdirect data when writing, or 
from the storage accidentally failing to write (note that writing to the wrong 
place can look like a lost write at the correct place). Another common source 
is administrator mistakes, such as copying data to the wrong file or simultaneously 
assigning the same file to two different uses. Bit flips within a sector do not 
happen with disks. I think the CRC is a reasonable check value for the kinds of 
corruption we are likely to encounter.
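For illustration, that kind of check value amounts to comparing the checksum 
stored with a record against one recomputed over the record bytes. This is a 
simplified sketch, not the actual FileTxnLog code (ZooKeeper's txnlog uses an 
Adler32 checksum); the class and method names are made up:

{code:java}
import java.util.zip.Adler32;
import java.util.zip.Checksum;

class RecordChecksumCheck {
    // Returns true when the checksum stored alongside a record matches one
    // recomputed over the serialized record bytes; a mismatch indicates the
    // kind of misdirected or lost write described above.
    static boolean matches(long storedCrc, byte[] recordBytes) {
        Checksum crc = new Adler32();
        crc.update(recordBytes, 0, recordBytes.length);
        return crc.getValue() == storedCrc;
    }
}
{code}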

Sorry, I omitted a critical point about preformatting logs. The same log file 
is used over and over again so that there is no allocation when writing. Before 
the first use, the log is initialized to contain valid empty blocks for log 
sequence zero. Since every reuse is at a higher sequence number, the EOF is the 
first block with a lower sequence number than the sequence number recorded in 
the log header. This is how Oracle writes online logs. [Oracle Online Redo 
Log|http://docs.oracle.com/cd/E11882_01/server.112/e25789/physical.htm#i1006163]
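
A rough sketch of that EOF rule, assuming each block is stamped with its 
sequence number in its first eight bytes (the block size, header layout, and 
helper names are assumptions for illustration, not Oracle's actual format):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

class PreformattedLogScanner {
    static final int BLOCK_SIZE = 512;

    // Returns the offset of logical EOF: the first block whose sequence number
    // is lower than the sequence recorded in the log header. Because the file
    // is preformatted and only ever rewritten at higher sequence numbers, such
    // a block can only be stale data from an earlier use of the file.
    static long findEof(RandomAccessFile log) throws IOException {
        log.seek(0);
        long headerSeq = log.readLong();          // sequence recorded in the header
        long blocks = log.length() / BLOCK_SIZE;
        for (long b = 1; b < blocks; b++) {
            log.seek(b * BLOCK_SIZE);
            long blockSeq = log.readLong();       // sequence stamped on each block
            if (blockSeq < headerSeq) {
                return b * BLOCK_SIZE;            // first stale block marks EOF
            }
        }
        return blocks * BLOCK_SIZE;               // the log is completely full
    }
}
{code}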

ZooKeeper is different: it uses a new file for every log. It incrementally 
preallocates with zeros to batch the allocations, and the zeros are not forced 
to disk. The real data writes usually overwrite the zeros in the filesystem 
buffer cache, so the zeros are not likely to be on disk if there is a 
partial write due to a crash. I suppose there are times when the fsync 
unnecessarily forces the zeros to disk, and I guess the consequences of a crash 
during fsync are file system dependent. Maybe checking for the 0x42 byte being 0 
at the end of a record indicates a partial record when the zeros were flushed 
earlier, and EOF in the middle of a record means a partial write as well.
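
A rough sketch of those two checks, assuming a simplified FileTxnLog-style 
layout of a length prefix, the record bytes, and a trailing 0x42 end-of-record 
byte (illustrative only, not the actual FileTxnIterator code, and it skips the 
checksum):

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class TxnRecordCheck {
    enum Result { OK, CLEAN_EOF, PARTIAL_RECORD }

    static Result readRecord(DataInputStream in) throws IOException {
        int len;
        try {
            len = in.readInt();                   // record length prefix
        } catch (EOFException e) {
            return Result.CLEAN_EOF;              // ended exactly on a record boundary
        }
        if (len == 0) {
            return Result.CLEAN_EOF;              // hit the preallocated zeros: logical EOF
        }
        byte[] body = new byte[len];
        try {
            in.readFully(body);                   // EOF here means a truncated write
            int marker = in.readUnsignedByte();
            if (marker != 0x42) {
                // zeros (or garbage) where the end-of-record byte should be:
                // the record was only partially flushed before the crash
                return Result.PARTIAL_RECORD;
            }
        } catch (EOFException e) {
            return Result.PARTIAL_RECORD;         // EOF in the middle of a record
        }
        return Result.OK;
    }
}
{code}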


                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that 
> during server recovery 
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() 
> does not indicate whether the available logs are valid or not. In some cases 
> (say a truncated record and a single txnlog in the datadir) we will not 
> detect that the file is corrupt, as opposed to having simply reached the end 
> of the file.
