[
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306470#comment-16306470
]
ASF GitHub Bot commented on ZOOKEEPER-1621:
-------------------------------------------
GitHub user abhishekrai opened a pull request:
https://github.com/apache/zookeeper/pull/439
ZOOKEEPER-1621: Delete and skip txn log with incomplete header
Based on the patch by Michi Mutsuzaki.
When Zookeeper server encounters a txn log with incomplete header,
the old behavior was to crash due to the resulting EOFException.
The new behavior is catch the exception and skip the txn log.
Additionally, the txn log is deleted to ensure that it does not
influence future loads/PurgeTxnLog in believing that this is
the only txn log before the following snapshot that they need to
load/retain.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/439.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #439
----
commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai <abhishekrai@...>
Date: 2017-12-29T17:38:52Z
ZOOKEEPER-1621: Delete and skip txn log with incomplete header
Based on the patch by Michi Mutsuzaki.
When Zookeeper server encounters a txn log with incomplete header,
the old behavior was to crash due to the resulting EOFException.
The new behavior is catch the exception and skip the txn log.
Additionally, the txn log is deleted to ensure that it does not
influence future loads/PurgeTxnLog in believing that this is
the only txn log before the following snapshot that they need to
load/retain.
----
> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch,
> zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] -
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to
> avoid such situations. Is this not the case?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)