ZheyuanLin created ZOOKEEPER-4684:
-------------------------------------
Summary: "Unable to load database on disk" exception occurs when the
cluster starts up
Key: ZOOKEEPER-4684
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4684
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.10, 3.4.9, 3.4.6, 3.4.5
Reporter: ZheyuanLin
I hit an error at cluster startup: two of the three nodes failed to start
because of the exception below, so the cluster could not hold an election
and eventually crashed:
{code:java}
2023-03-16 09:08:31,945 [myid:3] - INFO [main:QuorumPeer@429] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,962 [myid:3] - INFO [main:QuorumPeer@444] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,972 [myid:3] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2023-03-16 09:08:31,973 [myid:3] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
    ... 4 more
{code}
I hit this exception on almost every 3.4.x version I tried, while the cluster
starts normally on version 3.5.0 and above.
I can think of two possible causes. One is a permission problem on the data
directory or the epoch files, which would prevent ZooKeeper from renaming
the temporary file. The other is that the disk is full or otherwise
unhealthy, so the rename cannot complete.
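To tell these two causes apart, the checks can be run by hand against the
data directory. The following is only a minimal diagnostic sketch, not
ZooKeeper code: it assumes the rename is done with java.io.File.renameTo, and
the class name RenameCheck, the Result fields, and the probe file names are
all hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of ZooKeeper.
public class RenameCheck {

    /** Outcome of the three checks; field names are illustrative only. */
    static final class Result {
        final boolean writable;        // directory exists and is writable by this user
        final long usableBytes;        // free space available to this JVM
        final boolean renameSucceeded; // temp file could be written and renamed

        Result(boolean writable, long usableBytes, boolean renameSucceeded) {
            this.writable = writable;
            this.usableBytes = usableBytes;
            this.renameSucceeded = renameSucceeded;
        }
    }

    static Result check(File dir) {
        boolean writable = dir.isDirectory() && dir.canWrite();
        long usable = dir.getUsableSpace();

        // Probe files use their own names so real epoch files are never touched.
        File tmp = new File(dir, "renameCheck.tmp");
        File dst = new File(dir, "renameCheck");
        boolean renamed = false;
        try {
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write("0".getBytes(StandardCharsets.UTF_8));
                out.getFD().sync(); // force the bytes to disk before renaming
            }
            renamed = tmp.renameTo(dst);
        } catch (IOException e) {
            // Writing the probe file failed (e.g. no permission or no space).
            renamed = false;
        } finally {
            // Clean up whichever probe file is left behind.
            tmp.delete();
            dst.delete();
        }
        return new Result(writable, usable, renamed);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        Result r = check(dir);
        System.out.println("directory:        " + dir.getAbsolutePath());
        System.out.println("writable:         " + r.writable);
        System.out.println("usable bytes:     " + r.usableBytes);
        System.out.println("rename succeeded: " + r.renameSucceeded);
    }
}
```

Running it as the same user that starts ZooKeeper, with the version-2
directory as the argument, should show whether the directory is writable,
how much space is left, and whether a plain rename works there at all.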
I hope someone can help me resolve this. Thanks a lot.