[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550136#comment-14550136 ]
Hadoop QA commented on ZOOKEEPER-2101:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12733762/ZOOKEEPER-2101-v6.diff
  against trunk revision 1679951.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 5 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2703//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2703//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2703//console

This message is automatically generated.

> Transaction larger than max buffer of jute makes zookeeper unavailable
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2101
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jute
>    Affects Versions: 3.4.4
>            Reporter: Liu Shaohui
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff,
> ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff,
> ZOOKEEPER-2101-v6.diff, test.diff
>
>
> *Problem*
> For a multi operation, PrepRequestProcessor may produce a transaction whose
> size is larger than the max buffer size of jute. There is a check of the
> buffer size in the readBuffer method of BinaryInputArchive, but no check in
> the writeBuffer method of BinaryOutputArchive, which causes the following:
> 1, The leader can sync the transaction to the txn log and send the large
> transaction to the followers, but the followers fail to read the transaction
> and can't sync with the leader.
> {code}
> 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
> java.io.IOException: Unreasonable length = 2054758
>         at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
> java.lang.Exception: shutdown Follower
>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> {code}
> 2, The leader loses all followers, which triggers a leader election. The old
> leader becomes leader again because it has the most up-to-date data.
> {code}
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
>         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
>         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> {code}
> 3, The leader cannot load the transaction from the txn log because the length
> of the data is larger than the max buffer of jute.
> {code}
> 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
> java.io.IOException: Unreasonable length = 2054758
>         at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
>         at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> {code}
> The zookeeper service will be unavailable until we enlarge jute.maxbuffer
> and restart the zookeeper hbase cluster.
> *Solution*
> Add a buffer size check in BinaryOutputArchive to prevent a large transaction
> from being written to the log and sent to followers.
> But I am not sure whether there are side effects of throwing an IOException in
> BinaryOutputArchive and the RequestProcessors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
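
For readers of this thread, the write-side check proposed under *Solution* can be sketched roughly as follows. This is a minimal illustration only, not the attached ZOOKEEPER-2101 patch: the class name BoundedBufferWriter and its constructor parameter are hypothetical, and the wire layout (a 4-byte length followed by the bytes, with -1 for a null buffer) is assumed to mirror what writeBuffer in BinaryOutputArchive does.

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: a length-prefixed buffer writer that rejects
// oversized buffers before any byte reaches the stream, so that a
// reader enforcing the same limit never sees an "Unreasonable length".
final class BoundedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer; // assumed to match the reader's limit

    BoundedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    void writeBuffer(byte[] buf) throws IOException {
        if (buf == null) {
            out.writeInt(-1); // null marker
            return;
        }
        if (buf.length > maxBuffer) {
            // Fail on the write path instead of on every later read.
            throw new IOException("Unreasonable length = " + buf.length);
        }
        out.writeInt(buf.length); // 4-byte length prefix
        out.write(buf);           // payload
    }
}
```

Checking before the length prefix is written means a rejected buffer leaves the stream untouched, so nothing unreadable ends up in the log or on the wire. Whether callers can safely handle the resulting IOException is the open question raised in the description.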