ZheyuanLin created ZOOKEEPER-4684:
-------------------------------------

             Summary: "Unable to load database on disk" exception occurs at cluster startup
                 Key: ZOOKEEPER-4684
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4684
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.4.10, 3.4.9, 3.4.6, 3.4.5
            Reporter: ZheyuanLin


I hit an error at cluster startup: two of the three nodes failed to start 
because of the exception below, so the cluster could not hold a leader 
election and eventually crashed:
{code:java}
2023-03-16 09:08:31,945 [myid:3] - INFO  [main:QuorumPeer@429] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,962 [myid:3] - INFO  [main:QuorumPeer@444] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,972 [myid:3] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
        at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
        at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2023-03-16 09:08:31,973 [myid:3] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
        at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
        at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
        ... 4 more
{code}
I have seen this exception almost exclusively on 3.4.x versions; the cluster 
starts normally on version 3.5.0 and above.

I think there are several potential causes for this error. One possibility is 
that there is a permission issue with the directory or file, which is 
preventing ZooKeeper from renaming the temporary file. Another possibility is 
that the disk is full or experiencing other issues, preventing the file rename 
from occurring. 
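
Both possibilities can be checked on an affected node with a small probe, run 
as the same user that starts ZooKeeper. The following is a minimal sketch and 
not ZooKeeper code: the class name DataDirCheck and the probe file names are 
hypothetical, and it uses Files.move rather than whatever rename call ZooKeeper 
itself makes, so that a failure is reported as an exception carrying the 
operating system's reason.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical diagnostic helper; not part of ZooKeeper.
class DataDirCheck {

    /** Returns true if a temp file can be written and then renamed inside dir. */
    static boolean canWriteAndRename(Path dir) {
        Path tmp = dir.resolve("renameProbe.tmp"); // assumed probe names
        Path dst = dir.resolve("renameProbe");
        try {
            Files.write(tmp, "0".getBytes());
            Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            // The exception type and message usually name the cause
            // (access denied, no space left, missing directory, ...).
            System.err.println("Probe failed: " + e);
            return false;
        } finally {
            try {
                Files.deleteIfExists(tmp);
                Files.deleteIfExists(dst);
            } catch (IOException ignored) {
                // Best-effort cleanup only.
            }
        }
    }

    /** Summarises write permission, usable space, and the rename probe. */
    static String describe(Path dir) throws IOException {
        long usableBytes = Files.getFileStore(dir).getUsableSpace();
        return "writable=" + Files.isWritable(dir)
                + " usableBytes=" + usableBytes
                + " renameOk=" + canWriteAndRename(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describe(dir));
    }
}
```

To use it, pass the directory from the stack trace, for example 
/zookeeper/zookeeper-3.4.5/zkdata/version-2, as the first argument. A result of 
writable=false points to permissions, a very small usableBytes value points to 
a full disk, and renameOk=false with neither of those suggests some other 
filesystem-level problem.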

I would appreciate any help in resolving this. Thanks a lot.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)