[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623589#comment-15623589 ]

Abraham Fine commented on ZOOKEEPER-1621:
-----------------------------------------

[~hanm] I do not see an issue with the generation of invalid log files as long as no data is lost and the system knows how to handle them without user intervention especially if preventing this would have an impact on performance.

bq. while in other cases it might not be straightforward to tell which data is good and which is bad

Would you mind explaining what cases you are referring to?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:282)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
>     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)