[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400163#comment-13400163
 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Flavio, 

This is on Linux servers. We're trying to simulate non-graceful node 
failures, so we're calling "reboot -f". Since this doesn't call shutdown, 
ZooKeeper never gets a chance to shut down gracefully. What I suspect is 
happening is that if ZooKeeper is in the middle of writing its logs or 
snapshots to disk, the file gets truncated or suffers some other file 
system corruption. When the node comes back up we restart ZooKeeper, and it 
never rejoins the ensemble. 
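
To make the suspected failure mode concrete, here is a minimal sketch of a 
log scan that tells a clean end of file apart from a record cut off 
mid-write. It is not ZooKeeper's FileTxnIterator and does not use 
ZooKeeper's on-disk format: the class TxnLogScan, the Status names, and the 
record layout ([int length][long Adler-32 checksum][payload]) are all 
assumptions made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical illustration only; not ZooKeeper code or its file format.
class TxnLogScan {
    enum Status { CLEAN_EOF, TRUNCATED, CORRUPT }

    // Assumed header: 4-byte payload length + 8-byte Adler-32 checksum.
    static final int HEADER_BYTES = Integer.BYTES + Long.BYTES;

    // Encode one record in the assumed layout.
    static byte[] encode(byte[] payload) {
        Adler32 crc = new Adler32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putInt(payload.length)
                .putLong(crc.getValue())
                .put(payload)
                .array();
    }

    // Walk every record and report why the scan stopped.
    static Status scan(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        while (buf.hasRemaining()) {
            // Bytes remain but not enough for a header: the write was cut short.
            if (buf.remaining() < HEADER_BYTES) return Status.TRUNCATED;
            int len = buf.getInt();
            long expected = buf.getLong();
            if (len < 0) return Status.CORRUPT;
            // The header promises more payload than the file contains.
            if (len > buf.remaining()) return Status.TRUNCATED;
            byte[] payload = new byte[len];
            buf.get(payload);
            Adler32 crc = new Adler32();
            crc.update(payload, 0, len);
            if (crc.getValue() != expected) return Status.CORRUPT;
        }
        // Input ended exactly on a record boundary.
        return Status.CLEAN_EOF;
    }

    public static void main(String[] args) {
        byte[] a = encode("create /a".getBytes(StandardCharsets.UTF_8));
        byte[] b = encode("setData /a".getBytes(StandardCharsets.UTF_8));
        byte[] log = new byte[a.length + b.length];
        System.arraycopy(a, 0, log, 0, a.length);
        System.arraycopy(b, 0, log, a.length, b.length);

        System.out.println(scan(log)); // prints CLEAN_EOF
        // Simulate a forced reboot that cut off the last 3 bytes of the final record.
        System.out.println(scan(Arrays.copyOf(log, log.length - 3))); // prints TRUNCATED
    }
}
```

The point of the sketch is only that the reader returns a distinct status 
for "ran out of bytes inside a record" instead of treating it the same as 
"no more records", which is the distinction the issue description says 
next() does not currently make.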
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that 
> during server recovery 
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() 
> does not indicate if the available logs are valid or not. In some cases (say 
> a truncated record and a single txnlog in the datadir) we will not detect 
> that the file is corrupt, vs reaching the end of the file.


        
