[ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408457#comment-13408457 ]

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

The ZooKeeper log records are not sector-aligned on disk. When a server 
crashes, there can be disk I/O in flight, with each write covering an integral 
number of sectors. Some of those sector writes will complete and some will 
not, and the completion order is not guaranteed to match the order of the 
sectors in the file. There is therefore a reasonable chance that the last log 
record will be partial. The file may also contain some unwritten sectors 
followed by blocks that do hold records.
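
To make the failure mode concrete, here is a small simulation. It is a sketch 
under stated assumptions, not ZooKeeper code: a 512-byte sector size, records 
packed back to back from offset 0, and a hypothetical helper 
`TornWriteDemo.classify` that reports, for a given set of sectors that 
reached the disk, whether each record survived whole, partially, or not at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TornWriteDemo {
    // Assumption: 512-byte sectors (many devices use 4096).
    static final int SECTOR = 512;

    enum State { INTACT, PARTIAL, MISSING }

    // Records are packed back to back from offset 0 with no sector alignment.
    // persistedSectors holds the indices of sectors that reached the disk.
    static List<State> classify(int[] recordSizes, Set<Integer> persistedSectors) {
        List<State> out = new ArrayList<>();
        long offset = 0;
        for (int size : recordSizes) {
            int first = (int) (offset / SECTOR);
            int last = (int) ((offset + size - 1) / SECTOR);
            int present = 0;
            for (int s = first; s <= last; s++) {
                if (persistedSectors.contains(s)) {
                    present++;
                }
            }
            int spanned = last - first + 1;
            if (present == spanned) {
                out.add(State.INTACT);
            } else if (present == 0) {
                out.add(State.MISSING);
            } else {
                out.add(State.PARTIAL);
            }
            offset += size;
        }
        return out;
    }

    public static void main(String[] args) {
        // Five 300-byte records span bytes 0..1499, i.e. sectors 0..2.
        // Sectors 0 and 2 completed before the crash; sector 1 did not.
        int[] sizes = {300, 300, 300, 300, 300};
        System.out.println(classify(sizes, Set.of(0, 2)));
        // prints [INTACT, PARTIAL, MISSING, PARTIAL, INTACT]
    }
}
```

Note the intact fifth record sitting after a missing one: a reader that stops 
at the first unreadable record cannot tell, from position alone, whether the 
bytes after it matter.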

The code needs to recognize that partial records at the end of a log are a 
possibility and treat them as if they were never written. One hazard with 
doing that is that corruption in the middle of a log might be mistaken for 
EOF. One sanity check would be to include in every log record the highest 
record id known to be persistent. After finding the apparent end of the log, 
the code could scan farther for a valid record and declare the log corrupt if 
that record implies the log was previously committed beyond the apparent EOF.
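
The sanity check above could look roughly like the following sketch. 
Everything here is hypothetical: the `Rec` type with a `highestPersisted` 
field, the `Verdict` values, and the `check` method are illustrative names, 
and the log is modelled as a list of already-decoded slots (null for a region 
that did not decode), so the hard part of resynchronizing on a record 
boundary after garbage is assumed away.

```java
import java.util.Arrays;
import java.util.List;

public class TailCheck {
    // Hypothetical decoded record: its own id plus the highest record id the
    // writer knew to be durable when this record was written. Ids start at 1;
    // highestPersisted == 0 means nothing was known durable yet.
    record Rec(long id, long highestPersisted) {}

    enum Verdict { CLEAN, TORN_TAIL, CORRUPT }

    // slots lists what a scan found, in file order; null marks a region
    // that did not decode.
    static Verdict check(List<Rec> slots) {
        int eof = 0;
        while (eof < slots.size() && slots.get(eof) != null) {
            eof++;
        }
        if (eof == slots.size()) {
            return Verdict.CLEAN;
        }
        long lastGoodId = (eof == 0) ? 0 : slots.get(eof - 1).id();
        // Look beyond the apparent EOF for any record that still decodes.
        for (int i = eof + 1; i < slots.size(); i++) {
            Rec r = slots.get(i);
            if (r != null && r.highestPersisted() > lastGoodId) {
                // That writer already knew a record past lastGoodId was
                // durable, so the unreadable region held committed data.
                return Verdict.CORRUPT;
            }
        }
        // No evidence that committed data was lost; treat the tail as torn.
        return Verdict.TORN_TAIL;
    }

    public static void main(String[] args) {
        List<Rec> torn = Arrays.asList(new Rec(1, 0), new Rec(2, 1), null, new Rec(4, 2));
        List<Rec> bad = Arrays.asList(new Rec(1, 0), null, new Rec(3, 2), new Rec(4, 3));
        System.out.println(check(torn)); // TORN_TAIL
        System.out.println(check(bad));  // CORRUPT
    }
}
```

In the first case record 4 was written while record 3 was still in flight, so 
losing record 3 is consistent with a crash. In the second, record 3 already 
claims record 2 was durable, yet record 2 is unreadable, so the gap is 
corruption rather than EOF. This is only a sanity check: a corrupted tail with 
no valid record after it still goes undetected.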
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that 
> during server recovery 
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() 
> does not indicate whether the available logs are valid. In some cases (say, 
> a truncated record and a single txnlog in the datadir) we will not detect 
> that the file is corrupt; it looks the same as reaching the end of the file.
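
One way to let callers tell these cases apart is a `next()` that reports why 
iteration stopped. The sketch below is not FileTxnIterator and does not use 
ZooKeeper's on-disk format; it assumes a simplified, hypothetical record 
layout (int length, long CRC32 of the payload, payload), assumes payloads are 
never empty so that a zero length marks zero-filled space after the last 
record, and the `RecordReader` class and `Status` values are illustrative 
names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class RecordReader {
    enum Status { RECORD, END_OF_FILE, TRUNCATED, CORRUPT }

    // Hypothetical layout per record: int length, long CRC32 of payload, payload.
    static final int HEADER = Integer.BYTES + Long.BYTES;

    private final ByteBuffer buf;
    private byte[] payload;

    RecordReader(ByteBuffer buf) {
        this.buf = buf;
    }

    // Reports why iteration stopped rather than just "no more records".
    Status next() {
        if (!buf.hasRemaining()) {
            return Status.END_OF_FILE;
        }
        if (buf.remaining() < HEADER) {
            return Status.TRUNCATED;        // partial header at the tail
        }
        int len = buf.getInt();
        long expected = buf.getLong();
        if (len == 0) {
            return Status.END_OF_FILE;      // assumed zero padding after the last record
        }
        if (len < 0) {
            return Status.CORRUPT;
        }
        if (len > buf.remaining()) {
            return Status.TRUNCATED;        // partial payload (or a garbage length)
        }
        byte[] data = new byte[len];
        buf.get(data);
        if (checksum(data) != expected) {
            return Status.CORRUPT;
        }
        payload = data;
        return Status.RECORD;
    }

    byte[] payload() {
        return payload;
    }

    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    static void append(ByteBuffer out, byte[] data) {
        out.putInt(data.length).putLong(checksum(data)).put(data);
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(64);
        append(file, "a".getBytes(StandardCharsets.UTF_8));     // 13 bytes
        append(file, "bcdef".getBytes(StandardCharsets.UTF_8)); // 17 bytes
        file.flip();
        file.limit(file.limit() - 2);  // drop the last two bytes: a torn record
        RecordReader reader = new RecordReader(file);
        Status s;
        while ((s = reader.next()) == Status.RECORD) {
            System.out.println(new String(reader.payload(), StandardCharsets.UTF_8));
        }
        System.out.println(s);  // prints "a", then TRUNCATED
    }
}
```

A CORRUPT result on the very last record can still be a torn write, so a 
caller would likely combine this status with the highest-persisted-id check 
proposed in the comment above before deciding to truncate or to fail recovery.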

--
This message is automatically generated by JIRA.

        
