Sergey Maslyakov created ZOOKEEPER-1747:
-------------------------------------------

             Summary: Zookeeper server fails to start if transaction log file is corrupted
                 Key: ZOOKEEPER-1747
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.5
         Environment: Solaris10/x86, Java 1.6
            Reporter: Sergey Maslyakov


On multiple occasions, when ZK was unable to write out a transaction log or
snapshot file, the subsequent attempt to restart the server failed. This usually
happens when the underlying file system fills up, which prevents the ZK server
from writing out a consistent data file.

Upon start-up, the server reads in the snapshot and the transaction log. If the
deserializer fails and throws an exception, the server terminates. Please see
the stack trace below.

A server that fails to come up, for whatever reason, is usually an undesirable
condition. It would be useful to have an option to force-ignore parsing errors,
especially in the transaction log. A checksum on the data could be one way to
ensure its integrity and "parsability".
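One way to combine the checksum and "force-ignore" ideas is to frame every log
record with a checksum and, on startup, replay only the longest valid prefix,
stopping at the first truncated or corrupt record instead of throwing. The
sketch below is a minimal illustration under stated assumptions: the record
layout ([CRC32][length][payload]) and the names TolerantLogReader,
readValidPrefix and MAX_RECORD_BYTES are hypothetical. It does not describe
ZooKeeper's actual on-disk format or API, and it uses Java 7+ language
features.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class TolerantLogReader {

    // Hypothetical sanity bound so a garbage length field cannot trigger a huge allocation.
    static final int MAX_RECORD_BYTES = 1 << 20;

    // Hypothetical framing: [long crc32][int length][payload bytes].
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeLong(crc.getValue());
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Returns every record up to (not including) the first truncated or corrupt one.
    static List<byte[]> readValidPrefix(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            try {
                long expectedCrc = in.readLong();
                int length = in.readInt();
                if (length < 0 || length > MAX_RECORD_BYTES) {
                    break; // implausible length: treat the rest of the file as corrupt
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                CRC32 crc = new CRC32();
                crc.update(payload, 0, payload.length);
                if (crc.getValue() != expectedCrc) {
                    break; // checksum mismatch: stop at the last good record
                }
                records.add(payload);
            } catch (EOFException endOrTruncated) {
                break; // clean end of file, or a record cut off mid-write
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "create /a".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "create /b".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "create /c".getBytes(StandardCharsets.UTF_8));
        out.flush();
        byte[] full = buf.toByteArray();
        // Simulate a full disk: the last record loses its final four bytes.
        byte[] truncated = Arrays.copyOf(full, full.length - 4);
        List<byte[]> recovered =
            readValidPrefix(new DataInputStream(new ByteArrayInputStream(truncated)));
        System.out.println("recovered " + recovered.size() + " of 3 records");
    }
}
```

Running main prints "recovered 2 of 3 records". Whether discarding a corrupt
tail is actually safe depends on whether those transactions can be recovered
elsewhere (for example, from other ensemble members), so a real option of this
kind would likely need to be opt-in.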

Another robustness enhancement would be proper handling of the case where a
snapshot or transaction log cannot be completely written to disk; in short,
better handling of write errors.
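For whole-file writes such as snapshots, a common pattern is to write to a
temporary file in the same directory, force it to disk, and only then rename it
over the target, so a failure part-way through (for example, a full file
system) leaves the previous file intact. The sketch below assumes that
pattern; SafeSnapshotWriter and writeAtomically are hypothetical names, not
ZooKeeper APIs. It relies on java.nio.file (Java 7+), which is newer than the
Java 1.6 listed in the environment, and it does not apply directly to an
append-only transaction log.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SafeSnapshotWriter {

    // Writes data next to the target, fsyncs it, then renames it into place.
    // If any step throws (e.g. ENOSPC), the old target is left untouched.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf); // a write error here aborts before the rename
                }
                ch.force(true); // flush contents and metadata before publishing
            }
            // Replacing an existing target with ATOMIC_MOVE is implementation-specific
            // per the Files.move docs; on POSIX systems it maps to rename(2), which replaces.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a half-written file behind
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-demo");
        Path snap = dir.resolve("snapshot.demo"); // hypothetical file name
        writeAtomically(snap, "v1".getBytes(StandardCharsets.UTF_8));
        writeAtomically(snap, "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(snap), StandardCharsets.UTF_8));
    }
}
```

A fully durable version would usually also fsync the containing directory
after the rename; that step is omitted here because opening a directory as a
FileChannel is not portable.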


{noformat}
2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{noformat}
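The top frames show the failure occurring while FileTxnLog's iterator
deserializes a log file's header (FileHeader.deserialize), where
DataInputStream.readInt reached end of stream. That is consistent with a log
file that ended before its header was complete, for example one that was
created but never filled in when the disk ran out of space. A minimal sketch of
detecting that case up front, so a caller could skip or quarantine the file
instead of aborting, is shown below. It assumes a hypothetical 16-byte header
of two ints and a long; HeaderCheck, hasCompleteHeader and HEADER_BYTES are
illustrative names, not ZooKeeper APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class HeaderCheck {

    // Hypothetical header layout used only for this illustration: int + int + long.
    static final int HEADER_BYTES = 4 + 4 + 8;

    // Returns true if a full header can be read, false if the stream ends early.
    static boolean hasCompleteHeader(DataInputStream in) throws IOException {
        try {
            in.readInt();  // e.g. a magic number
            in.readInt();  // e.g. a format version
            in.readLong(); // e.g. a database id
            return true;
        } catch (EOFException e) {
            // Same exception as the top frame of the trace: fewer bytes remained
            // than the read required.
            return false;
        }
    }

    static DataInputStream stream(byte[] bytes) {
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hasCompleteHeader(stream(new byte[0])));            // empty file
        System.out.println(hasCompleteHeader(stream(new byte[6])));            // header cut off
        System.out.println(hasCompleteHeader(stream(new byte[HEADER_BYTES]))); // complete header
    }
}
```

Running main prints false, false, true. Skipping a short file in this way is
only reasonable if it is the newest log and holds no acknowledged
transactions; an incomplete header in an older log would point to a more
serious problem.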
