On Wed, Jul 28, 2021 at 5:20 AM Damien Diederen <ddiede...@apache.org> wrote:
>
> Hi Li, all,
>
> > When load testing the write operation against Zookeeper 3.7.0, I observed a
> > couple of times that the server crashed because the txn log was too large
> > and it was not able to load it.
>
> Difficult to say without more details, but I suspect ZOOKEEPER-4306 to
> be the culprit:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-4306

Yes, ZOOKEEPER-4306 could be the culprit. In my write operation test, all
the nodes were created as ephemeral.

> Would it be possible for you to share the transaction log ZooKeeper
> fails to load?
>

I observed the issue a couple of times about one month ago. I tried to
investigate it again more recently, but was not able to reproduce it. I
remember reading the txn log file with zkTxnLogToolkit.sh, and its contents
were similar to what is described in
https://issues.apache.org/jira/browse/ZOOKEEPER-4306.

Unfortunately, I didn't save the txn log. I will save it if I can reproduce
the problem or if it happens again.

Thanks,

Li

> HTH, -D
>
> --8<---------------original message------------->8---
>
> Li Wang <li4w...@gmail.com> writes:
>
> > Hi,
> >
> > When load testing the write operation against Zookeeper 3.7.0, I observed a
> > couple of times that the server crashed because the txn log was too large
> > and it was not able to load it. However the data size of write is only 4
> > bytes in the load test and the *jute.maxbuffer* was set to default (i.e.
> > 1M). The error doesn't always happen.
> >
> > I wonder if anyone has also seen this error or has any idea on what may
> > cause the issue?
> >
> > StackTrace
> > =========
> >
> > 2021-07-01 16:02:00,837 [myid:3] - ERROR [main:QuorumPeerMain@114] - Unexpected exception, exiting abnormally
> > java.lang.RuntimeException: Unable to run quorum server
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1200)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1131)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:229)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
> > Caused by: java.io.IOException: Unreasonable length = 3175014
> >     at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
> >     at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
> >     at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:749)
> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:361)
> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:287)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1145)
> >
> > Thanks,
> >
> > Li
>
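
For readers following the "Unreasonable length = 3175014" line in the trace above, here is a rough sketch of the kind of size check that rejects a record during log replay. It is a simplified illustration, not ZooKeeper's actual source: the class name LengthCheckSketch and the helper exceedsLimit are hypothetical, and the real check in BinaryInputArchive may allow some extra headroom on top of jute.maxbuffer. The only facts assumed are that jute.maxbuffer is read as a Java system property and that its default is 0xfffff bytes (just under 1 MB).

```java
import java.io.IOException;

/** Simplified illustration only; NOT ZooKeeper's actual source code. */
public class LengthCheckSketch {
    // Default for the jute.maxbuffer system property: 0xfffff bytes (just under 1 MB).
    static final int DEFAULT_MAX_BUFFER = 0xfffff;

    // Hypothetical helper: true when a serialized record length is out of bounds.
    static boolean exceedsLimit(int len, int maxBuffer) {
        return len < 0 || len > maxBuffer;
    }

    // Mirrors the shape of the failure in the trace: reject oversized records.
    static void checkLength(int len, int maxBuffer) throws IOException {
        if (exceedsLimit(len, maxBuffer)) {
            throw new IOException("Unreasonable length = " + len);
        }
    }

    public static void main(String[] args) {
        // Reads -Djute.maxbuffer=<bytes> if set, otherwise falls back to the default.
        int maxBuffer = Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
        try {
            checkLength(3175014, maxBuffer); // record length reported in the trace
            System.out.println("length accepted");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this sketch, a 3175014-byte record is about three times the default limit, so it is rejected even though each client write carried only 4 bytes. The sketch shows only why the record is rejected on load; it does not explain how a transaction that large was written in the first place, which is the question ZOOKEEPER-4306 addresses.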