[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2017-12-27 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304970#comment-16304970
 ] 

Jeff Widman commented on ZOOKEEPER-1621:


Any update here?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2017-12-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306470#comment-16306470
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

GitHub user abhishekrai opened a pull request:

https://github.com/apache/zookeeper/pull/439

ZOOKEEPER-1621: Delete and skip txn log with incomplete header

Based on the patch by Michi Mutsuzaki.

When Zookeeper server encounters a txn log with incomplete header,
the old behavior was to crash due to the resulting EOFException.
The new behavior is catch the exception and skip the txn log.

Additionally, the txn log is deleted to ensure that it does not
influence future loads/PurgeTxnLog in believing that this is
the only txn log before the following snapshot that they need to
load/retain.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #439


commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai 
Date:   2017-12-29T17:38:52Z

ZOOKEEPER-1621: Delete and skip txn log with incomplete header

Based on the patch by Michi Mutsuzaki.

When Zookeeper server encounters a txn log with incomplete header,
the old behavior was to crash due to the resulting EOFException.
The new behavior is catch the exception and skip the txn log.

Additionally, the txn log is deleted to ensure that it does not
influence future loads/PurgeTxnLog in believing that this is
the only txn log before the following snapshot that they need to
load/retain.




> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persisten

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2017-12-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306483#comment-16306483
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> 

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316970#comment-16316970
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160244035
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() 
throws Exception {
 
 startServer();
 }
+
+/**
+ * Verify that FileTxnIterator doesn't throw an EOFException when the
+ * transaction log header is incomplete.
+ */
+@Test
+public void testIncompleteHeader() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+int numTransactions = 0;
+while (fileItr.next()) {
+numTransactions++;
+}
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+Assert.assertTrue("Verify the number of transactions",
+  numTransactions >= NUM_MESSAGES);
+
+// Truncate the last log file.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+FileChannel channel = new 
FileOutputStream(lastLogFile).getChannel();
+channel.truncate(0);
+channel.close();
+
+// This shouldn't thow Exception.
+fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
+logFiles = fileItr.getStoredFiles();
+numTransactions = 0;
+while (fileItr.next()) {
+}
+
+// Verify that the truncated log file does not exist anymore.
+Assert.assertFalse("Verify truncated log file has been deleted",
+   lastLogFile.exists());
+}
+
+/**
+ * Verifies that FileTxnIterator throws CorruptedStreamException if the
+ * magic number is corrupted.
+ */
+@Test(expected = StreamCorruptedException.class)
+public void testCorruptMagicNumber() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+
+// Corrupt the magic number.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
+file.seek(0);
+file.writeByte(123);
+file.close();
+
+// This should throw CorruptedStreamException.
+while (fileItr.next()) {
+}
+}
+
+/**
+ * Starts a standalone server and create znodes.
+ */
+public void loadDatabase(File dataDir, int numEntries) throws 
Exception {
+final String hostPort = HOST + PortAssignment.unique();
+// setup a single server cluster
+ZooKeeperServer zks = new ZooKeeperServer(dataDir, dataDir, 3000);
+SyncRequestProcessor.setSnapCount(100);
+final int PORT = Integer.parseInt(hostPort.split(":")[1]);
+ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
+f.startup(zks);
+Assert.assertTrue("waiting for server being up ",
+ClientBase.waitForServerUp(hostPort,CONNECTION_TIMEOUT));
+ZooKeeper zk = ClientBase.createZKClient(hostPort);
+
+// Generate some transactions that will get logged.
+try {
+for (int i = 0; i < numEntries; i++) {
+zk.create("/load-database-" + i, new byte[0],
+  Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+}
+} finally {
+zk.close();
+}
+f.shutdown();
--- End diff --

Starting the server is already implemented in base class' setUp() method 
and shutdown

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316971#comment-16316971
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160244385
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() 
throws Exception {
 
 startServer();
 }
+
+/**
+ * Verify that FileTxnIterator doesn't throw an EOFException when the
+ * transaction log header is incomplete.
+ */
+@Test
+public void testIncompleteHeader() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
--- End diff --

Startup / shutdown logic should be in setUp() / tearDown() methods.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apach

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316972#comment-16316972
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160243609
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() 
throws Exception {
 
 startServer();
 }
+
+/**
+ * Verify that FileTxnIterator doesn't throw an EOFException when the
+ * transaction log header is incomplete.
+ */
+@Test
+public void testIncompleteHeader() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+int numTransactions = 0;
+while (fileItr.next()) {
+numTransactions++;
+}
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+Assert.assertTrue("Verify the number of transactions",
+  numTransactions >= NUM_MESSAGES);
+
+// Truncate the last log file.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+FileChannel channel = new 
FileOutputStream(lastLogFile).getChannel();
+channel.truncate(0);
+channel.close();
+
+// This shouldn't thow Exception.
+fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
+logFiles = fileItr.getStoredFiles();
+numTransactions = 0;
+while (fileItr.next()) {
+}
+
+// Verify that the truncated log file does not exist anymore.
+Assert.assertFalse("Verify truncated log file has been deleted",
+   lastLogFile.exists());
+}
+
+/**
+ * Verifies that FileTxnIterator throws CorruptedStreamException if the
+ * magic number is corrupted.
+ */
+@Test(expected = StreamCorruptedException.class)
+public void testCorruptMagicNumber() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+
+// Corrupt the magic number.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
+file.seek(0);
+file.writeByte(123);
+file.close();
+
+// This should throw CorruptedStreamException.
+while (fileItr.next()) {
+}
+}
+
+/**
+ * Starts a standalone server and create znodes.
+ */
+public void loadDatabase(File dataDir, int numEntries) throws 
Exception {
+final String hostPort = HOST + PortAssignment.unique();
+// setup a single server cluster
+ZooKeeperServer zks = new ZooKeeperServer(dataDir, dataDir, 3000);
+SyncRequestProcessor.setSnapCount(100);
+final int PORT = Integer.parseInt(hostPort.split(":")[1]);
+ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
+f.startup(zks);
+Assert.assertTrue("waiting for server being up ",
+ClientBase.waitForServerUp(hostPort,CONNECTION_TIMEOUT));
+ZooKeeper zk = ClientBase.createZKClient(hostPort);
--- End diff --

Down to this line the logic is already implemented in the base class. 
Please consider re-using it in your tests.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316973#comment-16316973
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160244289
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -38,10 +40,16 @@
 import org.slf4j.LoggerFactory;
 
 import java.io.File;
+import java.io.FileOutputStream;
 import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.io.StreamCorruptedException;
+import java.nio.channels.FileChannel;
+import java.util.List;
 
 public class LoadFromLogTest extends ClientBase {
 private static final int NUM_MESSAGES = 300;
+private static final String HOST = "127.0.0.1:";
--- End diff --

It's already in the base class.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316969#comment-16316969
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160243418
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() 
throws Exception {
 
 startServer();
 }
+
+/**
+ * Verify that FileTxnIterator doesn't throw an EOFException when the
+ * transaction log header is incomplete.
+ */
+@Test
+public void testIncompleteHeader() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+int numTransactions = 0;
+while (fileItr.next()) {
+numTransactions++;
+}
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+Assert.assertTrue("Verify the number of transactions",
+  numTransactions >= NUM_MESSAGES);
+
+// Truncate the last log file.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+FileChannel channel = new 
FileOutputStream(lastLogFile).getChannel();
+channel.truncate(0);
+channel.close();
+
+// This shouldn't thow Exception.
+fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
+logFiles = fileItr.getStoredFiles();
+numTransactions = 0;
+while (fileItr.next()) {
+}
+
+// Verify that the truncated log file does not exist anymore.
+Assert.assertFalse("Verify truncated log file has been deleted",
+   lastLogFile.exists());
+}
+
+/**
+ * Verifies that FileTxnIterator throws CorruptedStreamException if the
+ * magic number is corrupted.
+ */
+@Test(expected = StreamCorruptedException.class)
+public void testCorruptMagicNumber() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+
+// Corrupt the magic number.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+RandomAccessFile file = new RandomAccessFile(lastLogFile, "rw");
+file.seek(0);
+file.writeByte(123);
+file.close();
+
+// This should throw CorruptedStreamException.
+while (fileItr.next()) {
+}
+}
+
+/**
+ * Starts a standalone server and create znodes.
+ */
+public void loadDatabase(File dataDir, int numEntries) throws 
Exception {
--- End diff --

This method could be private.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStr

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316984#comment-16316984
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/439#discussion_r160245528
  
--- Diff: src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java ---
@@ -307,4 +315,104 @@ public void testReloadSnapshotWithMissingParent() 
throws Exception {
 
 startServer();
 }
+
+/**
+ * Verify that FileTxnIterator doesn't throw an EOFException when the
+ * transaction log header is incomplete.
+ */
+@Test
+public void testIncompleteHeader() throws Exception {
+ClientBase.setupTestEnv();
+File dataDir = ClientBase.createTmpDir();
+loadDatabase(dataDir, NUM_MESSAGES);
+
+File logDir = new File(dataDir, FileTxnSnapLog.version +
+FileTxnSnapLog.VERSION);
+FileTxnLog.FileTxnIterator fileItr = new 
FileTxnLog.FileTxnIterator(logDir, 0);
+List logFiles = fileItr.getStoredFiles();
+int numTransactions = 0;
+while (fileItr.next()) {
+numTransactions++;
+}
+Assert.assertTrue("Verify the number of log files",
+  logFiles.size() > 0);
+Assert.assertTrue("Verify the number of transactions",
+  numTransactions >= NUM_MESSAGES);
+
+// Truncate the last log file.
+File lastLogFile = logFiles.get(logFiles.size() - 1);
+FileChannel channel = new 
FileOutputStream(lastLogFile).getChannel();
+channel.truncate(0);
+channel.close();
+
+// This shouldn't thow Exception.
+fileItr = new FileTxnLog.FileTxnIterator(logDir, 0);
+logFiles = fileItr.getStoredFiles();
+numTransactions = 0;
--- End diff --

logFiles & numTransactions variables are not being used in the rest of this 
test.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317193#comment-16317193
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

Github user afine commented on the issue:

https://github.com/apache/zookeeper/pull/439
  
@abhishekrai Looking through the JIRA I found:

> This has been a recurring problem for us in production because our app's 
operating environment occasionally causes a Zookeeper server's disk to become 
full. After that, the server invariably runs into this problem - perhaps 
because there's something else that deterministically triggers a log rotation 
when the previous txn log throws an IOException due to disk full?

Do we have evidence that the log roll is being triggered 
"deterministically"? It would be great to know for sure that we are handling 
the disk filling up appropriately all the time rather than just a work around 
for special cases.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2017-05-29 Thread Johannes Grassler (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028195#comment-16028195
 ] 

Johannes Grassler commented on ZOOKEEPER-1621:
--

This has been open and unchanged for quite a while now, and the existing patch 
targets 3.5...has there been any progress on fixing this in the 3.4 branch (I 
am maintaining a Zookeeper 3.4.x package for OpenSUSE and if there is a fix 
that targets 3.4.x I'd like to include it).

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2017-09-13 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165322#comment-16165322
 ] 

Jeff Widman commented on ZOOKEEPER-1621:


Any update on this? 

It says 3.5.4, but looks like it hasn't been merged yet... despite (as best I 
can tell from the comments) consensus that the patch is an improvement over the 
current behavior.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-09 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560526#comment-15560526
 ] 

Abhishek Rai commented on ZOOKEEPER-1621:
-

Reviving this old thread.  [~shralex] has a valid concern about trading off 
consistency for availability.  However, for the specific issue being addressed 
here, we can have both.

The patch skips transaction logs with an incomplete header (the first 16 
bytes).  Skipping such files should not cause any loss of data as the header is 
an internal bookkeeping write from Zookeeper and does not contain any user 
data.  This avoids the current behavior of Zookeeper crashing on encountering 
an incomplete header, which compromises availability.

This has been a recurring problem for us in production because our app's 
operating environment occasionally causes a Zookeeper server's disk to become 
full.  After that, the server invariably runs into this problem - perhaps 
because there's something else that deterministically triggers a log rotation 
when the previous txn log throws an IOException due to disk full?

That said, we can tighten the exception being caught in [~michim]'s patch to 
EOFException instead of IOException to make sure that the log we are skipping 
indeed only has a partially written header and nothing else (in 
FileTxnLog.goToNextLog).

Additionally, I have written a test to verify that EOFException is thrown if 
and only if the header is truncated.  Zookeeper already ignores any other 
partially written transactions in the txn log.  If that's useful, I can upload 
the test, thanks.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>  

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560533#comment-15560533
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision df5519ab9dac9940f35cc4b308b560f2603aec7f.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3476//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.ser

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-28 Thread Meyer Kizner (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616169#comment-15616169
 ] 

Meyer Kizner commented on ZOOKEEPER-1621:
-

Agreed. Forcing users to manually clean up the partial/empty header in this 
scenario seems undesirable, and if we only catch EOFException instead of 
IOException, we shouldn't run into any problems with correctness. Additionally, 
since this issue should only occur "legitimately" in the most recent txn log 
file, we can be even more conservative and only continue in that case.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-28 Thread Abhishek Rai (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616237#comment-15616237
 ] 

Abhishek Rai commented on ZOOKEEPER-1621:
-

Thanks [~mkizner].  Your suggestion of doing this only for the most recent txn 
log file is sound.  Are you also suggesting that we delete this truncated txn 
log file?

Cause, if we skip it and don't delete, then in the future, newer txn log files 
will get created.  So, the truncated txn log file will no longer be the latest 
txn log when we do a purge afterwards.

Deletion seems consistent with this approach as well as consistent with 
PurgeTxnLog's behavior.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-28 Thread Meyer Kizner (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616298#comment-15616298
 ] 

Meyer Kizner commented on ZOOKEEPER-1621:
-

Yes, we would have to delete such a log file upon encountering it. I don't 
believe this would cause any problems, and it seems desirable to have the extra 
check this enables.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-28 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616950#comment-15616950
 ] 

Michael Han commented on ZOOKEEPER-1621:


The proposal of the fix makes sense to me.

Is it feasible to make a stronger guarantee for the ZooKeeper serialization 
semantics - that is, under no cases (disk full, power failure, hardware 
failure) would ZooKeeper generates invalid persistent files (for both snapshot 
and tx logs)? This might be possible by serializing things to a swap file first 
and then at one point do an atomic rename of the file. With the guarantee of 
the sanity of the on disk formats the deserializing logic would be simplified, 
as there will not be many corner cases to consider, besides the existing basic 
checksum check logic. 

I can think two potential drawback of this approach:
* Performance: if we write to swap file and then rename for every writes, we 
will be making more sys calls per write. Might impact performance / latency of 
write?
* Potential data loss during recover: to improve performance, we could batch 
writes and only do rename at certain points - (i.e. every 1000 writes). In case 
of a failure, part of the data might loss as those data (possibly corrupted / 
partially serialized) living in swap file will not be parsed by ZK during start 
up (we will only load and parse renamed files.).

My feeling is the best approach might be a mix of efforts on both serialization 
and deserialization side:
* When serializing, we do our best efforts to avoid generate corrupted files 
(i.e. through atomic writes to files.).
* When deserializing, we do best efforts to detect corrupt files and recover 
conservatively - the success of recovery might be case by case - for example 
for this disk full case the proposed fix sounds pretty safe to perform while in 
other cases it might not be straightforward to tell which data is good and 
which is bad.
* As a result - the expectation is when things crash and files corrupted, ZK 
should be able to recover later without manual intervention. This would be good 
for users.


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterat

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-10-31 Thread Abraham Fine (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623589#comment-15623589
 ] 

Abraham Fine commented on ZOOKEEPER-1621:
-

[~hanm] I do not see an issue with the generation of invalid log files as long 
as no data is lost and the system knows how to handle them without user 
intervention especially if preventing this would have an impact on performance.

bq. while in other cases it might not be straightforward to tell which data is 
good and which is bad
Would you mind explaining what cases you are referring to?



> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636622#comment-15636622
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
  against trunk revision bcb07a09b06c91243ed244f04a71b8daf629e286.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 19 new Findbugs (version 
3.0.1) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3513//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeepe

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2016-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342934#comment-15342934
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1748630.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 2.0.3) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3215//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3215//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.6.0, 3.5.3
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.ja

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-08-21 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707846#comment-14707846
 ] 

Arshad Mohammad commented on ZOOKEEPER-1621:


Apart from these corrective measures there should be some preventive measures 
as well. 
Can we have disk space availability checker which check periodically whether 
disk space is available  or not and if not available then close the Zookeeper 
gracefully.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-08-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708423#comment-14708423
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1621:
---

You mean, like a ZK thread dedicated to this? What would the behavior be, only 
shutdown if it's the leader?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-08-23 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708873#comment-14708873
 ] 

Arshad Mohammad commented on ZOOKEEPER-1621:


* Yes, dedicated thread for this like 
{{org.apache.zookeeper.server.DatadirCleanupManager}}
* shut-down in every case, because without disk space zookeeper can not serve 
any purpose
* The idea is as follows
** add two new zookeeper properties
 diskspace.min.threshold=5% (values can be % of data directory available space 
or in GB)
 diskspace.check.interval=5 second (default:5,min:1,max:Long.MAX_VALUE)
** add dedicated disk check thread 
*** which runs on every {{diskspace.check.interval)) second
*** if disk space is less than {{diskspace.min.threshold}} then shutdown 
zookeeper instance
* Some clarifications:
** Query: Suppose {{diskspace.check.interval=5}} and disk space can be full 
within 5 second  by zookeeper or by other process. What is handling for this?
Ans: User should know what is their usage scenario, and what other 
processes are using the same disk space and based on that they should optimize 
the {{diskspace.check.interval}} values
** Query: let say {{diskspace.check.interval=1}} but disk space can be filled 
even within 1 second by zookeeper and other process
Ans: yes it can be filled if {{diskspace.min.threshold}} is less, again based 
on disk space usage user need to optimize {{diskspace.min.threshold}}   

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709593#comment-14709593
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1697227.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2834//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.serve

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-08-27 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717014#comment-14717014
 ] 

Arshad Mohammad commented on ZOOKEEPER-1621:


Hi [~rgs], does it make sense, can we create new jira for this

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-02-27 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340429#comment-14340429
 ] 

David Arthur commented on ZOOKEEPER-1621:
-

I actually like [~shralex]'s suggestion. However, if this is going to be the 
way you recommended recovering a corrupt log file, there should be a script 
that does it for users: zk-recover.sh or some such. In this world of deployment 
automation, it's not a nice thing to say "go delete the most recent log segment 
from ZK's data dir". Much better for the application to handle it through a 
script or command.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2015-02-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340511#comment-14340511
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1662055.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2538//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKe

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2014-05-19 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002556#comment-14002556
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1621:


Should FileTxnIterator.goToNextLog() return false if the header is 
corrupted/incomplete, or should it skip the log file and go to the next log 
file if it exists?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2014-05-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003956#comment-14003956
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1621:


https://reviews.apache.org/r/21732/

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003996#comment-14003996
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
  against trunk revision 1596284.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKe

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2014-05-20 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004109#comment-14004109
 ] 

Alexander Shraer commented on ZOOKEEPER-1621:
-

Here's a different option - intuitively once zookeeper fails to write to disk, 
by continuing to operate normally it violates its promises to users (which is 
that if a majority acked the data is always there even if reboots happen). Once 
we realize the promise can't be kept it may be better to crash the server at 
that point and violate liveness (no availability) rather than to continue and 
risk coming up with a partial log at a later point violating safety 
(inconsistent state, lost transactions, etc).


> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2014-05-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004117#comment-14004117
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1621:


I'm fine with Alex's suggestion. We should document how to manually recover 
when the server doesn't start because the log file doesn't contain the complete 
header.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562446#comment-16562446
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
  against trunk revision 78e4a1047c701006dd4ec8d09065eda0e7adedb5.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3700//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situation

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-12-17 Thread Jeff Widman (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723374#comment-16723374
 ] 

Jeff Widman commented on ZOOKEEPER-1621:


I was checking into this and noticed that the PR containing the patch for this 
includes a comment indicating this problem may be resolved by another, 
already-merged PR: 
https://github.com/apache/zookeeper/pull/439#issuecomment-408994085

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2018-12-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723383#comment-16723383
 ] 

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12837140/ZOOKEEPER-1621.2.patch
  against trunk revision 46fc819622bf08cbd0781dea279aff734b492902.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3721//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Michi Mutsuzaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, 
> zookeeper.log.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situation

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555110#comment-13555110
 ] 

David Arthur commented on ZOOKEEPER-1621:
-

I was able to workaround the issue by deleting the partially written snapshot 
file

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555158#comment-13555158
 ] 

Flavio Junqueira commented on ZOOKEEPER-1621:
-

I believe the exception is being thrown while reading the snapshot and the 
partial transaction message is not an indication of what is causing it to 
crash. It sounds right that we should try a different snapshot, but according 
to the log messages you posted, it sounds like the problem is that we are not 
catching EOFException. 

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555169#comment-13555169
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 So there exceptions are thrown when ZooKeeper is running? Am not sure why its 
exiting so many times. Do you guys restart the ZK server if it dies?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555189#comment-13555189
 ] 

David Arthur commented on ZOOKEEPER-1621:
-

We run ZooKeeper with runit, so yes it is restarted when it dies. It ends up in 
a loop of:

* No space left on device
* Starting server
* Last transaction was partial
* Snapshotting: 0x19a3d to /opt/zookeeper-3.4.3/data/version-2/snapshot.19a3d
* No space left on device

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555192#comment-13555192
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 I thought you said it does not recover when disk was full, but looks like the 
disk is still full? No?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555215#comment-13555215
 ] 

David Arthur commented on ZOOKEEPER-1621:
-

Here is the full sequence of events (sorry for the confusion):

* Noticed disk was full
* Cleaned up disk space
* Tried zkCli.sh, got errors
* Checked ZK log, loop of:

2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at 
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at 
org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
at 
org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at 
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at 
org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at 
org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at 
org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at 
org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at 
org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

* Stopped ZK
* Listed ZK data directory

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat
total 18096
drwxr-xr-x 2 zookeeper zookeeper 4096 Jan 16 06:41 .
-rw-r--r-- 1 zookeeper zookeeper0 Jan 16 06:41 log.19a3e
-rw-r--r-- 1 zookeeper zookeeper   585377 Jan 16 06:41 snapshot.19a3d
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a
-rw-r--r-- 1 zookeeper zookeeper   585911 Jan 16 03:11 snapshot.19a29
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549
-rw-r--r-- 1 zookeeper zookeeper   585190 Jan 15 17:28 snapshot.11547
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1
-rw-r--r-- 1 zookeeper zookeeper  296 Jan 14 16:44 snapshot.0
drwxr-xr-x 3 zookeeper zookeeper 4096 Jan 14 16:44 ..

* Removed log.19a3e and snapshot.19a3d

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm 
snapshot.19a3d

* Started ZK
* Back to normal

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> a

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555243#comment-13555243
 ] 

Edward Ribeiro commented on ZOOKEEPER-1621:
---

Hi folks,

FYI, this issue is a duplication of ZOOKEEPER-1612 (curiously, a permutation of 
the last two digits, heh). I'd suggest to close 1612 as dup instead, if 
possible.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555318#comment-13555318
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

Ill makr 1612 as dup. Thanks for pointing that out Edward.



> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

Looks like the header was incomplete. Unfortunately we do not handle corrupt 
header but do handle corrupt txn's later. Am suprised that this happened twice 
in a row for 2 users. Ill upload a patch and test case.

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira