[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304970#comment-16304970 ]

Jeff Widman commented on ZOOKEEPER-1621:

Any update here?

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:282)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
>     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to
> avoid such situations. Is this not the case?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
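On the reporter's question about atomicity: one common way to keep a reader from ever seeing a half-written file header is to write the header to a temporary file, force it to disk, and only then rename it into place. The sketch below illustrates that general technique only; it is not how ZooKeeper writes its transaction logs. The class name, the `.tmp` suffix, and the 16-byte header layout (magic, version, dbid) are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class AtomicHeaderWrite {
    // Assumed header layout for this sketch: int magic, int version, long dbid.
    static final int MAGIC = 0x5A4B4C47; // the ASCII bytes "ZKLG"
    static final int HEADER_BYTES = 16;

    /**
     * Creates a new log file whose header is either fully present or the
     * file does not exist at all (hypothetical helper, not ZooKeeper API).
     */
    static void createLogWithHeader(Path target, int version, long dbId) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        ByteBuffer header = ByteBuffer.allocate(HEADER_BYTES);
        header.putInt(MAGIC).putInt(version).putLong(dbId).flip();
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // A full disk fails here, while only the temporary file is affected.
            while (header.hasRemaining()) {
                ch.write(header);
            }
            ch.force(true); // flush data and metadata before the rename
        }
        // Publish the file under its final name in one step.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the disk fills up during the write, the `IOException` is raised before the rename, so a later startup would find either a complete header or no file under the final name. A leftover `.tmp` file would still need cleaning up, which this sketch does not attempt.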
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306470#comment-16306470 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

GitHub user abhishekrai opened a pull request:

    https://github.com/apache/zookeeper/pull/439

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header,
    the old behavior was to crash due to the resulting EOFException. The new
    behavior is to catch the exception and skip the txn log. Additionally, the
    txn log is deleted so that future loads and PurgeTxnLog runs are not misled
    into believing that it is the only txn log before the following snapshot
    that they need to load or retain.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #439

commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai
Date:   2017-12-29T17:38:52Z

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header,
    the old behavior was to crash due to the resulting EOFException. The new
    behavior is to catch the exception and skip the txn log. Additionally, the
    txn log is deleted so that future loads and PurgeTxnLog runs are not misled
    into believing that it is the only txn log before the following snapshot
    that they need to load or retain.
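The behavior described in the pull request (catch the EOFException from an incomplete header, skip that log, and delete it) can be sketched in isolation as follows. This is a simplified illustration, not the code from the patch: the class and method names are hypothetical, and the header layout (int magic, int version, long dbid) and magic value are assumptions of the example. A wrong magic number is still reported as corruption rather than skipped, which matches the distinction the tests in this thread draw between a truncated header and a corrupted one.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.util.ArrayList;
import java.util.List;

final class SkipIncompleteHeader {
    // Assumed magic value for this sketch: the ASCII bytes "ZKLG".
    static final int MAGIC = 0x5A4B4C47;

    /**
     * Returns the candidate files whose header can be read in full.
     * Files whose header is cut short are deleted and left out of the result.
     */
    static List<File> usableLogs(List<File> candidates) throws IOException {
        List<File> usable = new ArrayList<>();
        for (File f : candidates) {
            boolean incomplete = false;
            try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
                int magic = in.readInt();
                in.readInt();  // version, unused in this sketch
                in.readLong(); // dbid, unused in this sketch
                if (magic != MAGIC) {
                    // A complete but wrong header is real corruption: do not hide it.
                    throw new StreamCorruptedException("bad magic number in " + f);
                }
                usable.add(f);
            } catch (EOFException e) {
                // Header ended early, for example after a crash on a full disk.
                incomplete = true;
            }
            // Delete only after the stream is closed.
            if (incomplete && !f.delete()) {
                throw new IOException("could not delete incomplete log " + f);
            }
        }
        return usable;
    }
}
```

Deleting the file, rather than only skipping it, follows the reasoning in the pull request description: a leftover empty log could otherwise be mistaken for the only log preceding a snapshot.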
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306483#comment-16306483 ]

Hadoop QA commented on ZOOKEEPER-1621:

-1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316970#comment-16316970 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160244035

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() throws Exception {
             startServer();
         }
    +
    +    /**
    +     * Verify that FileTxnIterator doesn't throw an EOFException when the
    +     * transaction log header is incomplete.
    +     */
    +    @Test
    +    public void testIncompleteHeader() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        int numTransactions = 0;
    +        while (fileItr.next()) {
    +            numTransactions++;
    +        }
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +        Assert.assertTrue("Verify the number of transactions",
    +                          numTransactions >= NUM_MESSAGES);
    +
    +        // Truncate the last log file.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        FileChannel channel = new FileOutputStream(lastLogFile).getChannel();
    +        channel.truncate(0);
    +        channel.close();
    +
    +        // This shouldn't thow Exception.
    +        fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        logFiles = fileItr.getStoredFiles();
    +        numTransactions = 0;
    +        while (fileItr.next()) {
    +        }
    +
    +        // Verify that the truncated log file does not exist anymore.
    +        Assert.assertFalse("Verify truncated log file has been deleted",
    +                           lastLogFile.exists());
    +    }
    +
    +    /**
    +     * Verifies that FileTxnIterator throws CorruptedStreamException if the
    +     * magic number is corrupted.
    +     */
    +    @Test(expected = StreamCorruptedException.class)
    +    public void testCorruptMagicNumber() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +
    +        // Corrupt the magic number.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
    +        file.seek(0);
    +        file.writeByte(123);
    +        file.close();
    +
    +        // This should throw CorruptedStreamException.
    +        while (fileItr.next()) {
    +        }
    +    }
    +
    +    /**
    +     * Starts a standalone server and create znodes.
    +     */
    +    public void loadDatabase(File dataDir, int numEntries) throws Exception {
    +        final String hostPort = HOST + PortAssignment.unique();
    +        // setup a single server cluster
    +        ZooKeeperServer zks = new ZooKeeperServer(dataDir, dataDir, 3000);
    +        SyncRequestProcessor.setSnapCount(100);
    +        final int PORT = Integer.parseInt(hostPort.split(":")[1]);
    +        ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
    +        f.startup(zks);
    +        Assert.assertTrue("waiting for server being up ",
    +                ClientBase.waitForServerUp(hostPort,CONNECTION_TIMEOUT));
    +        ZooKeeper zk = ClientBase.createZKClient(hostPort);
    +
    +        // Generate some transactions that will get logged.
    +        try {
    +            for (int i = 0; i < numEntries; i++) {
    +                zk.create("/load-database-" + i, new byte[0],
    +                          Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +            }
    +        } finally {
    +            zk.close();
    +        }
    +        f.shutdown();
    --- End diff --

    Starting the server is already implemented in base class' setUp() method and shutdown
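The two tests in the quoted diff damage a log file in two ways: truncating it to zero bytes and overwriting its first byte. A minimal sketch of those two operations as standalone helpers is shown below, using try-with-resources so the file handle is closed even if a write fails. The class and method names are hypothetical and not part of the patch or of ZooKeeper's test utilities.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class LogFileDamage {
    /** Empties the file, simulating a log whose header was never written. */
    static void truncateToEmpty(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
        }
    }

    /** Overwrites the first byte, simulating a corrupted magic number. */
    static void overwriteFirstByte(File f, int value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(0);
            raf.writeByte(value);
        }
    }
}
```

A test could call `LogFileDamage.truncateToEmpty(lastLogFile)` or `LogFileDamage.overwriteFirstByte(lastLogFile, 123)` in place of the inline channel and file handling. Note that opening a `FileOutputStream` on an existing file without the append flag already truncates it, so the explicit `channel.truncate(0)` in the quoted test is redundant rather than wrong.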
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316971#comment-16316971 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160244385

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() throws Exception {
             startServer();
         }
    +
    +    /**
    +     * Verify that FileTxnIterator doesn't throw an EOFException when the
    +     * transaction log header is incomplete.
    +     */
    +    @Test
    +    public void testIncompleteHeader() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    --- End diff --

    Startup / shutdown logic should be in setUp() / tearDown() methods.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316972#comment-16316972 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160243609

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() throws Exception {
             startServer();
         }
    +
    +    /**
    +     * Verify that FileTxnIterator doesn't throw an EOFException when the
    +     * transaction log header is incomplete.
    +     */
    +    @Test
    +    public void testIncompleteHeader() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        int numTransactions = 0;
    +        while (fileItr.next()) {
    +            numTransactions++;
    +        }
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +        Assert.assertTrue("Verify the number of transactions",
    +                          numTransactions >= NUM_MESSAGES);
    +
    +        // Truncate the last log file.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        FileChannel channel = new FileOutputStream(lastLogFile).getChannel();
    +        channel.truncate(0);
    +        channel.close();
    +
    +        // This shouldn't thow Exception.
    +        fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        logFiles = fileItr.getStoredFiles();
    +        numTransactions = 0;
    +        while (fileItr.next()) {
    +        }
    +
    +        // Verify that the truncated log file does not exist anymore.
    +        Assert.assertFalse("Verify truncated log file has been deleted",
    +                           lastLogFile.exists());
    +    }
    +
    +    /**
    +     * Verifies that FileTxnIterator throws CorruptedStreamException if the
    +     * magic number is corrupted.
    +     */
    +    @Test(expected = StreamCorruptedException.class)
    +    public void testCorruptMagicNumber() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +
    +        // Corrupt the magic number.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
    +        file.seek(0);
    +        file.writeByte(123);
    +        file.close();
    +
    +        // This should throw CorruptedStreamException.
    +        while (fileItr.next()) {
    +        }
    +    }
    +
    +    /**
    +     * Starts a standalone server and create znodes.
    +     */
    +    public void loadDatabase(File dataDir, int numEntries) throws Exception {
    +        final String hostPort = HOST + PortAssignment.unique();
    +        // setup a single server cluster
    +        ZooKeeperServer zks = new ZooKeeperServer(dataDir, dataDir, 3000);
    +        SyncRequestProcessor.setSnapCount(100);
    +        final int PORT = Integer.parseInt(hostPort.split(":")[1]);
    +        ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
    +        f.startup(zks);
    +        Assert.assertTrue("waiting for server being up ",
    +                ClientBase.waitForServerUp(hostPort,CONNECTION_TIMEOUT));
    +        ZooKeeper zk = ClientBase.createZKClient(hostPort);
    --- End diff --

    Down to this line the logic is already implemented in the base class. Please consider re-using it in your tests.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316973#comment-16316973 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160244289

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -38,10 +40,16 @@ import org.slf4j.LoggerFactory;
     import java.io.File;
    +import java.io.FileOutputStream;
     import java.io.IOException;
    +import java.io.RandomAccessFile;
    +import java.io.StreamCorruptedException;
    +import java.nio.channels.FileChannel;
    +import java.util.List;
     public class LoadFromLogTest extends ClientBase {
         private static final int NUM_MESSAGES = 300;
    +    private static final String HOST = "127.0.0.1:";
    --- End diff --

    It's already in the base class.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316969#comment-16316969 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160243418

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() throws Exception {
             startServer();
         }
    +
    +    /**
    +     * Verify that FileTxnIterator doesn't throw an EOFException when the
    +     * transaction log header is incomplete.
    +     */
    +    @Test
    +    public void testIncompleteHeader() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        int numTransactions = 0;
    +        while (fileItr.next()) {
    +            numTransactions++;
    +        }
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +        Assert.assertTrue("Verify the number of transactions",
    +                          numTransactions >= NUM_MESSAGES);
    +
    +        // Truncate the last log file.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        FileChannel channel = new FileOutputStream(lastLogFile).getChannel();
    +        channel.truncate(0);
    +        channel.close();
    +
    +        // This shouldn't thow Exception.
    +        fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        logFiles = fileItr.getStoredFiles();
    +        numTransactions = 0;
    +        while (fileItr.next()) {
    +        }
    +
    +        // Verify that the truncated log file does not exist anymore.
    +        Assert.assertFalse("Verify truncated log file has been deleted",
    +                           lastLogFile.exists());
    +    }
    +
    +    /**
    +     * Verifies that FileTxnIterator throws CorruptedStreamException if the
    +     * magic number is corrupted.
    +     */
    +    @Test(expected = StreamCorruptedException.class)
    +    public void testCorruptMagicNumber() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +
    +        // Corrupt the magic number.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
    +        file.seek(0);
    +        file.writeByte(123);
    +        file.close();
    +
    +        // This should throw CorruptedStreamException.
    +        while (fileItr.next()) {
    +        }
    +    }
    +
    +    /**
    +     * Starts a standalone server and create znodes.
    +     */
    +    public void loadDatabase(File dataDir, int numEntries) throws Exception {
    --- End diff --

    This method could be private.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316984#comment-16316984 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:
-------------------------------------------

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/439#discussion_r160245528

    --- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
    @@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() throws Exception {
             startServer();
         }
    +
    +    /**
    +     * Verify that FileTxnIterator doesn't throw an EOFException when the
    +     * transaction log header is incomplete.
    +     */
    +    @Test
    +    public void testIncompleteHeader() throws Exception {
    +        ClientBase.setupTestEnv();
    +        File dataDir = ClientBase.createTmpDir();
    +        loadDatabase(dataDir, NUM_MESSAGES);
    +
    +        File logDir = new File(dataDir, FileTxnSnapLog.version +
    +                                        FileTxnSnapLog.VERSION);
    +        FileTxnLog.FileTxnIterator fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        List<File> logFiles = fileItr.getStoredFiles();
    +        int numTransactions = 0;
    +        while (fileItr.next()) {
    +            numTransactions++;
    +        }
    +        Assert.assertTrue("Verify the number of log files",
    +                          logFiles.size() > 0);
    +        Assert.assertTrue("Verify the number of transactions",
    +                          numTransactions >= NUM_MESSAGES);
    +
    +        // Truncate the last log file.
    +        File lastLogFile = logFiles.get(logFiles.size() - 1);
    +        FileChannel channel = new FileOutputStream(lastLogFile).getChannel();
    +        channel.truncate(0);
    +        channel.close();
    +
    +        // This shouldn't thow Exception.
    +        fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
    +        logFiles = fileItr.getStoredFiles();
    +        numTransactions = 0;
    --- End diff --

    logFiles & numTransactions variables are not being used in the rest of this test.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317193#comment-16317193 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:
-------------------------------------------

Github user afine commented on the issue:

    https://github.com/apache/zookeeper/pull/439

    @abhishekrai Looking through the JIRA I found:

    > This has been a recurring problem for us in production because our app's operating environment occasionally causes a Zookeeper server's disk to become full. After that, the server invariably runs into this problem - perhaps because there's something else that deterministically triggers a log rotation when the previous txn log throws an IOException due to disk full?

    Do we have evidence that the log roll is being triggered "deterministically"? It would be great to know for sure that we are handling the disk filling up appropriately all the time rather than just a workaround for special cases.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028195#comment-16028195 ]

Johannes Grassler commented on ZOOKEEPER-1621:
----------------------------------------------

This has been open and unchanged for quite a while now, and the existing patch targets 3.5. Has there been any progress on fixing this in the 3.4 branch? (I am maintaining a Zookeeper 3.4.x package for OpenSUSE, and if there is a fix that targets 3.4.x I'd like to include it.)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165322#comment-16165322 ]

Jeff Widman commented on ZOOKEEPER-1621:

Any update on this? It says 3.5.4, but looks like it hasn't been merged yet... despite (as best I can tell from the comments) consensus that the patch is an improvement over the current behavior.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560526#comment-15560526 ]

Abhishek Rai commented on ZOOKEEPER-1621:
-----------------------------------------

Reviving this old thread. [~shralex] has a valid concern about trading off consistency for availability. However, for the specific issue being addressed here, we can have both. The patch skips transaction logs with an incomplete header (the first 16 bytes). Skipping such files should not cause any loss of data as the header is an internal bookkeeping write from Zookeeper and does not contain any user data. This avoids the current behavior of Zookeeper crashing on encountering an incomplete header, which compromises availability.

This has been a recurring problem for us in production because our app's operating environment occasionally causes a Zookeeper server's disk to become full. After that, the server invariably runs into this problem - perhaps because there's something else that deterministically triggers a log rotation when the previous txn log throws an IOException due to disk full?

That said, we can tighten the exception being caught in [~michim]'s patch to EOFException instead of IOException to make sure that the log we are skipping indeed only has a partially written header and nothing else (in FileTxnLog.goToNextLog). Additionally, I have written a test to verify that EOFException is thrown if and only if the header is truncated. Zookeeper already ignores any other partially written transactions in the txn log. If that's useful, I can upload the test, thanks.
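The distinction drawn in Abhishek Rai's comment, between a header that is merely cut short and a header that is present but wrong, can be illustrated with a small sketch. This is not the code from the patch or from ZooKeeper's FileTxnLog: the class name `TxnLogHeaderSketch`, its methods, and the `MAGIC` placeholder are assumptions made for illustration, and the 16-byte layout (an int, an int, and a long) is only inferred from the "first 16 bytes" mentioned in the thread.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;

/** Illustrative sketch only; not ZooKeeper's FileTxnLog. */
class TxnLogHeaderSketch {
    // Placeholder magic value for this sketch (the bytes "ZKLG").
    static final int MAGIC = 0x5A4B4C47;

    /**
     * Reads an assumed 16-byte header: int magic, int version, long dbId.
     * DataInputStream throws EOFException if the stream ends first.
     */
    static void readHeader(DataInputStream in) throws IOException {
        int magic = in.readInt();
        in.readInt();   // version, ignored in this sketch
        in.readLong();  // dbId, ignored in this sketch
        if (magic != MAGIC) {
            throw new StreamCorruptedException("unexpected magic: " + magic);
        }
    }

    /**
     * Returns true only when the file ends inside the header, which means
     * it cannot hold any transaction. Any other IOException, such as a bad
     * magic number, is deliberately not swallowed.
     */
    static boolean hasTruncatedHeader(File log) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(log))) {
            readHeader(in);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }
}
```

Catching only `EOFException` keeps the skip narrow: a file that is long enough but carries a wrong magic number still surfaces as an error instead of being silently ignored.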
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560533#comment-15560533 ]

Hadoop QA commented on ZOOKEEPER-1621:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision df5519ab9dac9940f35cc4b308b560f2603aec7f.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    -1 javac. The patch appears to cause tar ant target to fail.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616169#comment-15616169 ]

Meyer Kizner commented on ZOOKEEPER-1621:
-----------------------------------------

Agreed. Forcing users to manually clean up the partial/empty header in this scenario seems undesirable, and if we only catch EOFException instead of IOException, we shouldn't run into any problems with correctness. Additionally, since this issue should only occur "legitimately" in the most recent txn log file, we can be even more conservative and only continue in that case.
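The more conservative rule suggested by Meyer Kizner, tolerating an incomplete header only in the most recent txn log, might look roughly like the following. This is a sketch under stated assumptions rather than the change that was actually proposed: `LastLogOnlySketch`, `headerIncomplete`, and `logsToReplay` are hypothetical names, the input list is assumed to be sorted oldest first, and a plain file-length check stands in for catching EOFException while reading the header.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and structure are hypothetical. */
class LastLogOnlySketch {
    // Header size taken from the "first 16 bytes" mentioned in the thread.
    static final int HEADER_BYTES = 16;

    /** A file shorter than the header cannot contain a complete header. */
    static boolean headerIncomplete(File log) {
        return log.length() < HEADER_BYTES;
    }

    /**
     * Returns the logs to replay, assuming sortedLogs is ordered oldest
     * first. An incomplete header is skipped only in the final log; in any
     * earlier log it is reported as an error.
     */
    static List<File> logsToReplay(List<File> sortedLogs) throws IOException {
        List<File> result = new ArrayList<>();
        for (int i = 0; i < sortedLogs.size(); i++) {
            File log = sortedLogs.get(i);
            if (!headerIncomplete(log)) {
                result.add(log);
                continue;
            }
            boolean isLast = (i == sortedLogs.size() - 1);
            if (!isLast) {
                throw new EOFException("incomplete header in non-final log: " + log);
            }
            // Final log with an incomplete header: skip it.
        }
        return result;
    }
}
```

Restricting the skip to the last file means a short header found in the middle of the log sequence is still treated as corruption, which is the case the consistency concern in the thread is about.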
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616237#comment-15616237 ]

Abhishek Rai commented on ZOOKEEPER-1621:
-----------------------------------------

Thanks [~mkizner]. Your suggestion of doing this only for the most recent txn log file is sound. Are you also suggesting that we delete this truncated txn log file? If we skip it without deleting it, newer txn log files will be created later, so the truncated file will no longer be the latest txn log when we do a purge afterwards. Deletion seems consistent with this approach as well as with PurgeTxnLog's behavior.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up.
During a snapshot write, I got > the following exception > 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - > Severe unrecoverable error, exiting > java.io.IOException: No space left on device > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:282) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) > Then many subsequent exceptions like: > 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was > partial. 
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected > exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > It 
seems to me that writing the transaction log should be fully atomic to > avoid such situations. Is this not the case? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616298#comment-15616298 ]

Meyer Kizner commented on ZOOKEEPER-1621:
-----------------------------------------

Yes, we would have to delete such a log file upon encountering it. I don't believe this would cause any problems, and it seems desirable to have the extra check this enables.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
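The recovery rule discussed in the two comments above (treat a truncated header as recoverable only on the most recent txn log file, and delete that file) could be sketched roughly as follows. This is an illustration, not the attached patch: the class `TruncatedLogCleaner` and both method names are hypothetical, and the header layout (int magic, int version, long dbid) is an assumption made for the sketch.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not part of ZooKeeper. */
final class TruncatedLogCleaner {

    /**
     * Returns false if the file ends before a full header could be read.
     * Assumed header layout for this sketch: int magic, int version, long dbid.
     */
    static boolean hasCompleteHeader(Path log) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
            in.readInt();   // magic
            in.readInt();   // version
            in.readLong();  // dbid
            return true;
        } catch (EOFException e) {
            return false;   // the file ended before the header did
        }
    }

    /**
     * Deletes the newest txn log if its header is truncated. Older files are
     * never touched: a short header there is not explained by a full disk.
     *
     * @param logsOldestFirst txn log files, oldest first
     * @return true if the newest file was truncated and has been deleted
     */
    static boolean deleteNewestIfTruncated(List<Path> logsOldestFirst) throws IOException {
        if (logsOldestFirst.isEmpty()) {
            return false;
        }
        Path newest = logsOldestFirst.get(logsOldestFirst.size() - 1);
        if (hasCompleteHeader(newest)) {
            return false;
        }
        Files.delete(newest);
        return true;
    }
}
```

Restricting the check to the newest file keeps the deletion conservative: only the file that was being created when the disk filled up can plausibly have a short header for that reason.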
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616950#comment-15616950 ]

Michael Han commented on ZOOKEEPER-1621:
----------------------------------------

The proposed fix makes sense to me.

Is it feasible to make a stronger guarantee for ZooKeeper's serialization semantics, that is, that under no circumstances (disk full, power failure, hardware failure) would ZooKeeper generate invalid persistent files (for both snapshots and tx logs)? This might be possible by serializing to a swap file first and then, at some point, atomically renaming the file. With the on-disk formats guaranteed to be sane, the deserialization logic would be simpler, as there would be few corner cases to consider besides the existing basic checksum check.

I can think of two potential drawbacks of this approach:
* Performance: if we write to a swap file and then rename for every write, we make more system calls per write, which might impact write performance / latency.
* Potential data loss during recovery: to improve performance, we could batch writes and only rename at certain points (e.g. every 1000 writes). In case of a failure, some data might be lost, because data living in the swap file (possibly corrupted or partially serialized) will not be parsed by ZK during startup (we will only load and parse renamed files).

My feeling is that the best approach might be a mix of efforts on both the serialization and deserialization sides:
* When serializing, we make a best effort to avoid generating corrupted files (e.g. through atomic writes to files).
* When deserializing, we make a best effort to detect corrupt files and recover conservatively. The success of recovery might vary case by case: for this disk-full case the proposed fix sounds pretty safe to perform, while in other cases it might not be straightforward to tell which data is good and which is bad.
* As a result, the expectation is that when things crash and files are corrupted, ZK should be able to recover later without manual intervention. This would be good for users.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
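The "write to a swap file, then rename atomically" idea from the comment above can be illustrated with plain `java.nio`. This is a minimal sketch under stated assumptions, not ZooKeeper code: the class `AtomicFileWriter`, the method `writeAtomically`, and the `.tmp` suffix are all hypothetical, and the pattern suits whole-file writes such as snapshots better than an append-only txn log.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of ZooKeeper. */
final class AtomicFileWriter {

    /**
     * Writes contents to a sibling temp file, forces it to disk, then renames
     * it over the target. Readers see either the previous file or the complete
     * new one, never a partially written file.
     */
    static void writeAtomically(Path target, byte[] contents) throws IOException {
        // The ".tmp" suffix is an arbitrary choice for this sketch.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(contents);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush data and metadata before the rename
        } catch (IOException e) {
            // For example "No space left on device": drop the partial temp
            // file and leave the target as it was.
            Files.deleteIfExists(tmp);
            throw e;
        }
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A disk-full error then surfaces while the temp file is being written, so a half-written file never appears under the name the recovery code reads. Whether `ATOMIC_MOVE` replaces an existing target is implementation specific according to the JDK documentation, and the extra `force` plus rename per write is exactly the cost raised in the first drawback.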
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623589#comment-15623589 ]

Abraham Fine commented on ZOOKEEPER-1621:
-----------------------------------------

[~hanm] I do not see an issue with the generation of invalid log files as long as no data is lost and the system knows how to handle them without user intervention, especially if preventing them would have an impact on performance.

bq. while in other cases it might not be straightforward to tell which data is good and which is bad

Would you mind explaining which cases you are referring to?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636622#comment-15636622 ]

Hadoop QA commented on ZOOKEEPER-1621:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
against trunk revision bcb07a09b06c91243ed244f04a71b8daf629e286.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 19 new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342934#comment-15342934 ]

Hadoop QA commented on ZOOKEEPER-1621:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
against trunk revision 1748630.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to cause Findbugs (version 2.0.3) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3215//testReport/
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3215//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.6.0, 3.5.3
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707846#comment-14707846 ]

Arshad Mohammad commented on ZOOKEEPER-1621:
--------------------------------------------

Apart from these corrective measures, there should be some preventive measures as well. Can we have a disk space availability checker that periodically checks whether disk space is available and, if it is not, shuts ZooKeeper down gracefully?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
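A periodic disk space check of the kind proposed above might look roughly like the following sketch. Nothing here exists in ZooKeeper: the class `DiskSpaceMonitor`, its parameters `minUsableBytes` and `intervalSeconds`, and the `onLowSpace` callback (standing in for a graceful shutdown) are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a periodic disk space check, not part of ZooKeeper. */
final class DiskSpaceMonitor {
    private final File dataDir;
    private final long minUsableBytes;
    private final Runnable onLowSpace;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "DiskSpaceMonitor");
                t.setDaemon(true); // never keep the JVM alive just for the check
                return t;
            });

    DiskSpaceMonitor(File dataDir, long minUsableBytes, Runnable onLowSpace) {
        this.dataDir = dataDir;
        this.minUsableBytes = minUsableBytes;
        this.onLowSpace = onLowSpace;
    }

    /** True when the partition holding dataDir has less usable space than the threshold. */
    boolean isSpaceLow() {
        // getUsableSpace() returns 0 if the path does not name a partition,
        // which this sketch also treats as "low".
        return dataDir.getUsableSpace() < minUsableBytes;
    }

    /** Checks immediately, then again after every intervalSeconds. */
    void start(long intervalSeconds) {
        scheduler.scheduleWithFixedDelay(() -> {
            if (isSpaceLow()) {
                onLowSpace.run();     // e.g. trigger a graceful shutdown
                scheduler.shutdown(); // fire once, then stop checking
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

A check like this can only narrow the window: the disk can still fill up between two checks, so it complements the recovery-side fix and does not replace it.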
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708423#comment-14708423 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1621:
---------------------------------------------------

You mean, like a ZK thread dedicated to this? What would the behavior be: shut down only if it's the leader?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708873#comment-14708873 ]

Arshad Mohammad commented on ZOOKEEPER-1621:
--------------------------------------------

* Yes, a dedicated thread for this, like {{org.apache.zookeeper.server.DatadirCleanupManager}}
* Shut down in every case, because without disk space ZooKeeper cannot serve any purpose
* The idea is as follows:
** Add two new ZooKeeper properties:
diskspace.min.threshold=5% (values can be a % of the data directory's available space, or in GB)
diskspace.check.interval=5 seconds (default:5,min:1,max:Long.MAX_VALUE)
** Add a dedicated disk check thread
*** which runs every {{diskspace.check.interval}} seconds
*** if disk space is less than {{diskspace.min.threshold}}, then shut down the ZooKeeper instance
* Some clarifications:
** Query: Suppose {{diskspace.check.interval=5}} and the disk can be filled within 5 seconds by ZooKeeper or by another process. How is this handled?
Ans: Users should know their usage scenario and which other processes use the same disk, and based on that they should tune the {{diskspace.check.interval}} value.
** Query: Let's say {{diskspace.check.interval=1}}, but the disk can be filled even within 1 second by ZooKeeper and other processes.
Ans: Yes, it can be filled if {{diskspace.min.threshold}} is too low; again, users need to tune {{diskspace.min.threshold}} based on their disk space usage.
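The disk check thread proposed in Arshad Mohammad's comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `DiskSpaceMonitor`, the `onLowSpace` callback, and the decision to report "not low" when the partition size cannot be determined are all hypothetical and do not come from any ZooKeeper patch. The threshold is assumed to be a fraction of the partition size (0.05 for the proposed 5%).

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed check; not part of ZooKeeper.
class DiskSpaceMonitor {
    private final File dataDir;
    private final double minFreeFraction; // e.g. 0.05 for a 5% threshold
    private final Runnable onLowSpace;    // hypothetical shutdown callback

    DiskSpaceMonitor(File dataDir, double minFreeFraction, Runnable onLowSpace) {
        this.dataDir = dataDir;
        this.minFreeFraction = minFreeFraction;
        this.onLowSpace = onLowSpace;
    }

    /** True when the free fraction of the partition is below the threshold. */
    static boolean isBelowThreshold(long usableBytes, long totalBytes, double minFreeFraction) {
        if (totalBytes <= 0) {
            // File.getTotalSpace() returns 0 when the path names no partition;
            // this sketch assumes "unknown" should not trigger a shutdown.
            return false;
        }
        return (double) usableBytes / totalBytes < minFreeFraction;
    }

    /** Starts a daemon thread that checks the data directory at a fixed interval. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "DiskSpaceCheck");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            if (isBelowThreshold(dataDir.getUsableSpace(), dataDir.getTotalSpace(), minFreeFraction)) {
                onLowSpace.run();
                ses.shutdown(); // stop checking once the callback has fired
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

With these assumptions, `start(5)` on a monitor built with a threshold of `0.05` would correspond to the proposed `diskspace.check.interval=5` and `diskspace.min.threshold=5%`. As the comment's own clarifications point out, a periodic check cannot rule out the disk filling up between two runs.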
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709593#comment-14709593 ]

Hadoop QA commented on ZOOKEEPER-1621:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1697227.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717014#comment-14717014 ]

Arshad Mohammad commented on ZOOKEEPER-1621:
--------------------------------------------

Hi [~rgs], does this make sense? Can we create a new JIRA for this?
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340429#comment-14340429 ]

David Arthur commented on ZOOKEEPER-1621:
-----------------------------------------

I actually like [~shralex]'s suggestion. However, if this is going to be the way you recommended recovering a corrupt log file, there should be a script that does it for users: zk-recover.sh or some such.

In this world of deployment automation, it's not a nice thing to say "go delete the most recent log segment from ZK's data dir". Much better for the application to handle it through a script or command.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.1
>
>         Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
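As a rough illustration of the first step a recovery command like the one David Arthur asks for would need, the sketch below picks the most recent transaction log segment from a list of file names. It relies on ZooKeeper's naming of transaction log files as `log.<zxid>` with the zxid in hexadecimal; the class `LatestLogSegment` and its methods are hypothetical. The sketch deliberately stops at identifying the file, since whether removing that segment is safe is exactly what this issue debates.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helper; not an existing ZooKeeper tool.
class LatestLogSegment {
    /** Parses the hex zxid suffix of a "log.<zxid>" name, or returns -1 if the name does not match. */
    static long zxidOf(String fileName) {
        if (!fileName.startsWith("log.")) {
            return -1L;
        }
        try {
            return Long.parseLong(fileName.substring("log.".length()), 16);
        } catch (NumberFormatException e) {
            return -1L; // e.g. "log." with no suffix, or a non-hex suffix
        }
    }

    /** Returns the transaction log file name with the highest zxid, if there is one. */
    static Optional<String> latest(Collection<String> fileNames) {
        return fileNames.stream()
                .filter(name -> zxidOf(name) >= 0)
                .max(Comparator.comparingLong(LatestLogSegment::zxidOf));
    }
}
```

A wrapper script could list the log directory, pass the names to `latest`, and then move the reported file aside instead of deleting it, so that the operator can still inspect it afterwards.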
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340511#comment-14340511 ]

Hadoop QA commented on ZOOKEEPER-1621:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1662055.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.1
>
>         Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002556#comment-14002556 ]

Michi Mutsuzaki commented on ZOOKEEPER-1621:
--------------------------------------------

Should FileTxnIterator.goToNextLog() return false if the header is corrupted/incomplete, or should it skip the log file and go to the next log file if it exists?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.0
>
>         Attachments: zookeeper.log.gz
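The two options in Michi Mutsuzaki's question can be contrasted with a small standalone sketch. It does not use ZooKeeper's own classes: `TxnLogHeaderCheck`, `hasCompleteHeader`, and `nextReadableLog` are hypothetical names, and log files are modelled as byte arrays for brevity. The header layout assumed here (an int magic spelling "ZKLG", an int version, and a long dbid) matches ZooKeeper's transaction log file header; a truncated header surfaces as the same `EOFException` seen in the stack trace in the issue description.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration; not the code from the attached patch.
class TxnLogHeaderCheck {
    // "ZKLG" read as a big-endian int.
    static final int MAGIC = ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    /** True if the stream starts with a complete header, false if the header is truncated. */
    static boolean hasCompleteHeader(InputStream in) {
        DataInputStream data = new DataInputStream(in);
        try {
            int magic = data.readInt();
            data.readInt();  // version
            data.readLong(); // dbid
            if (magic != MAGIC) {
                throw new IOException("Invalid magic number " + magic);
            }
            return true;
        } catch (EOFException e) {
            return false; // file ended inside the header, e.g. the disk filled up
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the index of the next log with a complete header, or -1.
     * skipIncomplete=false stops at the first truncated header ("return false");
     * skipIncomplete=true moves on to the following log file ("skip").
     */
    static int nextReadableLog(List<byte[]> logs, int from, boolean skipIncomplete) {
        for (int i = from; i < logs.size(); i++) {
            if (hasCompleteHeader(new ByteArrayInputStream(logs.get(i)))) {
                return i;
            }
            if (!skipIncomplete) {
                return -1;
            }
        }
        return -1;
    }
}
```

In the scenario from the issue, where the truncated file is the last one written, both options end the iteration at the same point; they differ only when a later log file with a valid header exists.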
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003956#comment-14003956 ]

Michi Mutsuzaki commented on ZOOKEEPER-1621:
--------------------------------------------

https://reviews.apache.org/r/21732/

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003996#comment-14003996 ] Hadoop QA commented on ZOOKEEPER-1621: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch against trunk revision 1596284. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console This message is automatically generated. > ZooKeeper does not recover from crash when disk was full > > > Key: ZOOKEEPER-1621 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.3 > Environment: Ubuntu 12.04, Amazon EC2 instance >Reporter: David Arthur >Assignee: Michi Mutsuzaki > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz > > > The disk that ZooKeeper was using filled up. 
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004109#comment-14004109 ]

Alexander Shraer commented on ZOOKEEPER-1621:

Here's a different option. Intuitively, once ZooKeeper fails to write to disk, by continuing to operate normally it violates its promise to users (namely, that if a majority acked, the data is always there even if reboots happen). Once we realize the promise can't be kept, it may be better to crash the server at that point and violate liveness (no availability) rather than continue and risk coming up with a partial log at a later point, violating safety (inconsistent state, lost transactions, etc.).

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
> The disk that ZooKeeper was using filled up.
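The fail-stop idea in Alexander's comment can be sketched roughly as follows. This is only an illustration under stated assumptions: FailStopLog is a hypothetical class, not part of ZooKeeper, and a real server would more likely exit the process than merely refuse further writes. The point it shows is that once a write or flush fails, nothing further is accepted or acknowledged.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; FailStopLog is not a ZooKeeper class.
final class FailStopLog {
    private final OutputStream out;
    private boolean failed = false;

    FailStopLog(OutputStream out) {
        this.out = out;
    }

    // Writes and flushes one record. Returning normally is the only signal
    // that the caller may acknowledge the record.
    void append(byte[] record) throws IOException {
        if (failed) {
            // A previous write failed, so the log may end in a partial record.
            throw new IllegalStateException("log failed earlier; refusing further writes");
        }
        try {
            out.write(record);
            out.flush();
        } catch (IOException e) {
            failed = true; // fail-stop: never accept writes after an I/O error
            throw e;
        }
    }

    boolean isFailed() {
        return failed;
    }

    public static void main(String[] args) throws IOException {
        FailStopLog log = new FailStopLog(new ByteArrayOutputStream());
        log.append(new byte[] {1, 2, 3});
        System.out.println("failed = " + log.isFailed());
    }
}
```

The trade-off is the one described in the comment: availability is given up on the first I/O error so that a later restart does not have to guess how much of the log can be trusted.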
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004117#comment-14004117 ]

Michi Mutsuzaki commented on ZOOKEEPER-1621:

I'm fine with Alex's suggestion. We should document how to manually recover when the server doesn't start because the log file doesn't contain the complete header.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
> The disk that ZooKeeper was using filled up.
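As a rough aid for the manual recovery mentioned in Michi's comment, the sketch below checks whether a file is long enough to hold a complete log header. It assumes the header is an int magic, an int version and a long dbid (16 bytes in total); that layout is an assumption here, although it is consistent with the FileHeader.deserialize and readInt frames in the quoted stack trace. LogHeaderCheck is a hypothetical helper, not a ZooKeeper class, and it only reports files; deciding what to do with them is left to the operator.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of ZooKeeper.
final class LogHeaderCheck {

    // Returns true if the file holds at least a full header.
    // Assumed layout: int magic, int version, long dbid.
    static boolean hasCompleteHeader(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            in.readInt();  // magic
            in.readInt();  // version
            in.readLong(); // dbid
            return true;
        } catch (EOFException e) {
            // The file ended before the header did, as in the reported crash.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogHeaderCheck <file> [<file> ...]
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + ": " + (hasCompleteHeader(p) ? "header complete" : "header truncated"));
        }
    }
}
```

Note that this catches EOFException only while reading the header; other I/O errors still propagate to the caller.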
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562446#comment-16562446 ]

Hadoop QA commented on ZOOKEEPER-1621:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
against trunk revision 78e4a1047c701006dd4ec8d09065eda0e7adedb5.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3700//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723374#comment-16723374 ]

Jeff Widman commented on ZOOKEEPER-1621:

I was checking into this and noticed that the PR containing the patch for this includes a comment indicating this problem may be resolved by another, already-merged PR:
https://github.com/apache/zookeeper/pull/439#issuecomment-408994085

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723383#comment-16723383 ]

Hadoop QA commented on ZOOKEEPER-1621:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
against trunk revision 46fc819622bf08cbd0781dea279aff734b492902.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3721//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555110#comment-13555110 ]

David Arthur commented on ZOOKEEPER-1621:

I was able to work around the issue by deleting the partially written snapshot file.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
>
> The disk that ZooKeeper was using filled up.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555158#comment-13555158 ]

Flavio Junqueira commented on ZOOKEEPER-1621:

I believe the exception is being thrown while reading the snapshot and the partial transaction message is not an indication of what is causing it to crash. It sounds right that we should try a different snapshot, but according to the log messages you posted, it sounds like the problem is that we are not catching EOFException.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
>
> The disk that ZooKeeper was using filled up.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555169#comment-13555169 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

David,
So these exceptions are thrown when ZooKeeper is running? I'm not sure why it's exiting so many times. Do you guys restart the ZK server if it dies?

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>             Fix For: 3.5.0
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555189#comment-13555189 ]

David Arthur commented on ZOOKEEPER-1621:
-----------------------------------------

We run ZooKeeper with runit, so yes it is restarted when it dies. It ends up in a loop of:
* No space left on device
* Starting server
* Last transaction was partial
* Snapshotting: 0x19a3d to /opt/zookeeper-3.4.3/data/version-2/snapshot.19a3d
* No space left on device
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555192#comment-13555192 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

David,
I thought you said it does not recover when disk was full, but looks like the disk is still full? No?
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555215#comment-13555215 ]

David Arthur commented on ZOOKEEPER-1621:
-----------------------------------------

Here is the full sequence of events (sorry for the confusion):

* Noticed disk was full
* Cleaned up disk space
* Tried zkCli.sh, got errors
* Checked ZK log, loop of:

2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
    at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

* Stopped ZK
* Listed ZK data directory

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat
total 18096
drwxr-xr-x 2 zookeeper zookeeper     4096 Jan 16 06:41 .
-rw-r--r-- 1 zookeeper zookeeper        0 Jan 16 06:41 log.19a3e
-rw-r--r-- 1 zookeeper zookeeper   585377 Jan 16 06:41 snapshot.19a3d
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a
-rw-r--r-- 1 zookeeper zookeeper   585911 Jan 16 03:11 snapshot.19a29
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549
-rw-r--r-- 1 zookeeper zookeeper   585190 Jan 15 17:28 snapshot.11547
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1
-rw-r--r-- 1 zookeeper zookeeper      296 Jan 14 16:44 snapshot.0
drwxr-xr-x 3 zookeeper zookeeper     4096 Jan 14 16:44 ..

* Removed log.19a3e and snapshot.19a3d

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm snapshot.19a3d

* Started ZK
* Back to normal

> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>             Fix For: 3.5.0
>
>         Attachments: zookeeper.log.gz
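The manual check in the listing above (spotting the zero-length log.19a3e) could be automated roughly as follows. This is a minimal sketch and not part of ZooKeeper: the class name EmptyTxnLogFinder and the method findEmptyLogs are hypothetical, and the only assumption about the data directory is the log.* file naming visible in the listing. It only reports candidates and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a ZooKeeper class.
public class EmptyTxnLogFinder {

    /** Returns the names of zero-length "log.*" files in dataDir, sorted by name. */
    static List<String> findEmptyLogs(Path dataDir) throws IOException {
        List<String> empty = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, "log.*")) {
            for (Path f : files) {
                if (Files.isRegularFile(f) && Files.size(f) == 0) {
                    empty.add(f.getFileName().toString());
                }
            }
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory (an assumption).
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : findEmptyLogs(dataDir)) {
            System.out.println("zero-length transaction log: " + name);
        }
    }
}
```

Run against the version-2 directory from the listing, this would flag log.19a3e only. The snapshot removed alongside it was not empty, so whether it also needs removing remains a manual judgment.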
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555243#comment-13555243 ]

Edward Ribeiro commented on ZOOKEEPER-1621:
-------------------------------------------

Hi folks,
FYI, this issue is a duplicate of ZOOKEEPER-1612 (curiously, a permutation of the last two digits, heh). I'd suggest closing 1612 as the dup instead, if possible.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555318#comment-13555318 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

I'll mark 1612 as dup. Thanks for pointing that out, Edward.
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

Looks like the header was incomplete. Unfortunately we do not handle a corrupt header, but we do handle corrupt txns later. I am surprised that this happened twice in a row for 2 users. I'll upload a patch and test case.
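The idea of tolerating an incomplete header can be illustrated with a small standalone sketch. It is not the patch for this issue and does not use ZooKeeper's classes: TolerantHeaderReader, Header, and readHeaderOrNull are hypothetical names, and the header layout (int magic, int version, long dbId) is an assumption made for illustration. The point is only that an EOFException raised while reading the header is treated as "no usable header" instead of propagating out of startup.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not ZooKeeper code.
public class TolerantHeaderReader {

    /** Stand-in for a log file header; the field layout is an assumption. */
    static final class Header {
        final int magic;
        final int version;
        final long dbId;

        Header(int magic, int version, long dbId) {
            this.magic = magic;
            this.version = version;
            this.dbId = dbId;
        }
    }

    /**
     * Reads a header from the stream. Returns null when the stream ends
     * before the header is complete (empty or truncated file), so the
     * caller can skip that file rather than abort.
     */
    static Header readHeaderOrNull(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            int magic = data.readInt();
            int version = data.readInt();
            long dbId = data.readLong();
            return new Header(magic, version, dbId);
        } catch (EOFException e) {
            // Fewer bytes than a full header: report "no header" to the caller.
            return null;
        }
    }
}
```

A caller iterating over log files could skip any file for which readHeaderOrNull returns null and continue with the next one. Other I/O errors still propagate as IOException, since only truncation is being treated as recoverable here.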