Sergey Maslyakov created ZOOKEEPER-1747:
-------------------------------------------
Summary: Zookeeper server fails to start if transaction log file is corrupted
Key: ZOOKEEPER-1747
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Environment: Solaris10/x86, Java 1.6
Reporter: Sergey Maslyakov
On multiple occasions when ZK was unable to write out a transaction log or a
snapshot file, the subsequent attempt to restart the server failed. This usually
happens when the underlying file system fills up, which prevents the ZK server
from writing out a consistent data file.
Upon start-up, the server reads in the snapshot and the transaction log. If the
deserializer fails and throws an exception, the server terminates. Please see
the stack trace below.
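
The innermost frames of the trace are DataInputStream.readInt called while
deserializing the file header. As a minimal illustration (not ZooKeeper code;
the class and method names are hypothetical, and it is written against Java 8
or later rather than the Java 1.6 of this environment), any input shorter than
four bytes, such as a log file that was created but never written to, is
enough to raise the same EOFException:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it only mimics the first read of a file header.
class TruncatedHeaderDemo {

    // Returns true if a 4-byte int can be read from the given bytes,
    // false if the data ends first (DataInputStream throws EOFException).
    static boolean canReadInt(byte[] fileContents) {
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(fileContents));
        try {
            in.readInt();
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canReadInt(new byte[0])); // empty file
        System.out.println(canReadInt(new byte[4])); // at least one int present
    }
}
```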
A server that does not come up, for whatever reason, is often an undesirable
condition. It would be nice to have an option to force-ignore parsing errors,
especially in the transaction log. A checksum on the data could be a possible
solution to ensure integrity and "parsability".
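
To make the checksum idea concrete, here is a rough sketch of a log reader
that stops at the first truncated or corrupted record instead of failing. The
record layout (length, CRC32, payload) and the names ChecksummedLog, encode
and readValidRecords are assumptions made for this example, not ZooKeeper's
on-disk format or API. It assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical record format: int length, long CRC32 of payload, payload bytes.
class ChecksummedLog {

    // Serializes the records into one byte array using the format above.
    static byte[] encode(List<byte[]> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (byte[] payload : records) {
                CRC32 crc = new CRC32();
                crc.update(payload);
                out.writeInt(payload.length);
                out.writeLong(crc.getValue());
                out.write(payload);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns every record up to, but not including, the first one that is
    // truncated or whose checksum does not match. Later records are ignored.
    static List<byte[]> readValidRecords(byte[] log) {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                int length = in.readInt();
                long expected = in.readLong();
                if (length < 0 || length > in.available()) {
                    break; // implausible length: treat as a corrupted tail
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                CRC32 crc = new CRC32();
                crc.update(payload);
                if (crc.getValue() != expected) {
                    break; // checksum mismatch: stop at the last good record
                }
                records.add(payload);
            }
        } catch (EOFException e) {
            // Clean end of log, or a header cut short: keep what was read.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Whether silently dropping the tail is acceptable depends on the deployment, so
in this sketch it would sit behind the opt-in "force-ignore" option mentioned
above rather than being the default.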
Another robustness enhancement could be proper handling of the condition when a
snapshot or transaction log cannot be completely written to disk; basically,
better handling of write errors.
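
One common way to handle such write errors, sketched here under assumptions,
is to write to a temporary file, force it to disk, and only then rename it
into place, so that a full file system leaves the previous file untouched
instead of a half-written one. AtomicFileWriter and writeAtomically are
hypothetical names, not part of ZooKeeper, and the sketch needs the
java.nio.file API (Java 7 or later, so not the Java 1.6 of this environment).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: write-to-temp, fsync, then rename.
class AtomicFileWriter {

    // Returns true if the data is fully on disk under the target name,
    // false if any step failed (the temporary file is removed in that case).
    static boolean writeAtomically(Path target, byte[] data) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // flush data and metadata before renaming
            }
            // Whether an existing target is replaced by ATOMIC_MOVE is
            // implementation specific; POSIX file systems replace it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException e) {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException ignored) {
                // Best-effort cleanup; the target itself was never touched.
            }
            return false;
        }
    }
}
```

This pattern suits whole-file writes such as snapshots; an append-only
transaction log would more likely need per-record validation on read.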
{noformat}
2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{noformat}