ZheyuanLin created ZOOKEEPER-4684:
-------------------------------------

             Summary: Unable to load database on disk exception occurs when the cluster startup
                 Key: ZOOKEEPER-4684
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4684
             Project: ZooKeeper
          Issue Type: Bug
   Affects Versions: 3.4.10, 3.4.9, 3.4.6, 3.4.5
            Reporter: ZheyuanLin

I hit an error at cluster startup: two of the three nodes failed to start because of the exception below, so the cluster could not hold an election and eventually crashed:

{code:java}
2023-03-16 09:08:31,945 [myid:3] - INFO [main:QuorumPeer@429] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,962 [myid:3] - INFO [main:QuorumPeer@444] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2023-03-16 09:08:31,972 [myid:3] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2023-03-16 09:08:31,973 [myid:3] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Could not rename temporary file /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch.tmp to /zookeeper/zookeeper-3.4.5/zkdata/version-2/currentEpoch
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1117)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:447)
    ... 4 more
{code}

I have seen this exception almost only on 3.4.x versions; the cluster starts normally on 3.5.0 and above.

I can think of several potential causes. One is a permission problem on the directory or file that prevents ZooKeeper from renaming the temporary file. Another is that the disk is full or has some other fault that prevents the rename.

I hope someone can help me solve this. Thanks a lot.
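
To tell the permission and disk-space hypotheses apart, a small standalone probe can repeat the write-then-rename step in the same directory. This is only a sketch under stated assumptions: the class name `RenameProbe` and the scratch file names are made up for illustration, and it assumes (not confirmed here) that the 3.4.x `AtomicFileOutputStream` renames with `File.renameTo`, which reports failure only as a boolean. `Files.move` throws an exception that names the reason instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical diagnostic helper; not part of ZooKeeper.
class RenameProbe {

    // Writes a scratch file in dir and renames it, mirroring the
    // currentEpoch.tmp -> currentEpoch step. Returns a one-line verdict.
    static String check(Path dir) {
        if (!Files.isDirectory(dir)) {
            return "not a directory: " + dir;
        }
        if (!Files.isWritable(dir)) {
            return "directory not writable by this user: " + dir;
        }
        // Scratch names chosen so that the real epoch files are never touched.
        Path tmp = dir.resolve("renameProbe.tmp");
        Path dst = dir.resolve("renameProbe");
        try {
            long usable = Files.getFileStore(dir).getUsableSpace();
            Files.write(tmp, "0".getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
            return "OK, rename succeeded (usable bytes: " + usable + ")";
        } catch (IOException e) {
            // The exception type and message carry the reason,
            // e.g. an AccessDeniedException for a permission problem.
            return "FAILED: " + e;
        } finally {
            deleteQuietly(tmp);
            deleteQuietly(dst);
        }
    }

    private static void deleteQuietly(Path p) {
        try {
            Files.deleteIfExists(p);
        } catch (IOException ignored) {
            // Best-effort cleanup only.
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(check(dir));
    }
}
```

Compile with `javac RenameProbe.java` and run it as the same OS user that starts ZooKeeper, passing the data directory from the log, for example `java RenameProbe /zookeeper/zookeeper-3.4.5/zkdata/version-2`. If the probe prints OK on a node where startup fails, permissions and free space on that directory are probably not the cause.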