[jira] [Commented] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044328#comment-15044328 ] Liu Shaohui commented on ZOOKEEPER-2332: [~rgs] {quote} how did the empty txnlog happened in the first place? {quote} The ZooKeeper server was killed after it created a new txn log file but before it flushed the file header to that log. This leaves a txn log without a valid header, which makes the ZooKeeper server fail to start. See FileTxnLog.java#207:
{code}
if (logStream==null) {
    if(LOG.isInfoEnabled()){
        LOG.info("Creating new log file: log." + Long.toHexString(hdr.getZxid()));
    }
    logFileWrite = new File(logDir, ("log." + Long.toHexString(hdr.getZxid())));
    fos = new FileOutputStream(logFileWrite);
    logStream=new BufferedOutputStream(fos);
    oa = BinaryOutputArchive.getArchive(logStream);
    FileHeader fhdr = new FileHeader(TXNLOG_MAGIC,VERSION, dbId);
    fhdr.serialize(oa, "fileheader");
    // Make sure that the magic number is written before padding.
    logStream.flush();
    currentSize = fos.getChannel().position();
    streamsToFlush.add(fos);
}
{code}
> Zookeeper failed to start for empty txn log > --- > > Key: ZOOKEEPER-2332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Critical > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2332-v001.diff > > > We found that the zookeeper server with version 3.4.6 failed to start for > there is a empty txn log in log dir. > I think we should skip the empty log file during restoring the datatree. > Any suggestion? 
> {code} > 2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - > Unexpected exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:392) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158) > at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
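The suggestion in this thread, skipping a txn log that has no valid header, can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: `TxnLogHeaderCheck` and `hasValidHeader` are hypothetical names, and both the header layout (an int magic, an int version, a long dbId) and the "ZKLG" magic value are assumptions about what `FileHeader` serializes.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ZooKeeper.
final class TxnLogHeaderCheck {
    // Assumption: mirrors the magic number written by FileTxnLog ("ZKLG" as a big-endian int).
    static final int TXNLOG_MAGIC =
            ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    /** Returns true only if the file starts with a complete header carrying the expected magic. */
    static boolean hasValidHeader(File logFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(logFile))) {
            int magic = in.readInt(); // throws EOFException on an empty file
            in.readInt();             // version (assumed layout)
            in.readLong();            // dbId (assumed layout)
            return magic == TXNLOG_MAGIC;
        } catch (EOFException e) {
            // Empty file or truncated header: the caller may choose to skip this log.
            return false;
        }
    }
}
```

A restore loop could call `hasValidHeader` before opening each log and skip files for which it returns false, instead of failing with the `EOFException` shown in the stack trace. Whether silently skipping such a file is safe in every case is a design question for the actual patch.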
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2101: --- Attachment: ZOOKEEPER-2101-v8.diff Rebase on the trunk [~brahmareddy] [~hdeng] Could you help to push this patch? > Transaction larger than max buffer of jute makes zookeeper unavailable > -- > > Key: ZOOKEEPER-2101 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101 > Project: ZooKeeper > Issue Type: Bug > Components: jute >Affects Versions: 3.4.4 >Reporter: Liu Shaohui > Fix For: 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, > ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, > ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, ZOOKEEPER-2101-v8.diff, > test.diff > > > *Problem* > For multi operation, PrepRequestProcessor may produce a large transaction > whose size may be larger than the max buffer size of jute. There is check of > buffer size in readBuffer method of BinaryInputArchive, but no check in > writeBuffer method of BinaryOutputArchive, which will cause that > 1, Leader can sync transaction to txn log and send the large transaction to > the followers, but the followers failed to read the transaction and can't > sync with leader. 
> {code} > 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: > [myid:2] Exception when following the leader > java.io.IOException: Unreasonable length = 2054758 > at > org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > at > org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) > at > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) > 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: > [myid:2] shutdown called > java.lang.Exception: shutdown Follower > at > org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744) > {code} > 2, The leader lose all followers, which trigger the leader election. The old > leader will become leader again for it has up-to-date data. > {code} > 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: > [myid:3] Shutting down > 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: > [myid:3] Shutdown called > java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2 > at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496) > at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753) > {code} > 3, The leader can not load the transaction from the txn log for the length of > data is larger than the max buffer of jute. 
> {code} > 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: > [myid:3] Unable to load database on disk > java.io.IOException: Unreasonable length = 2054758 > at > org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) > at > org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417) > at > org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546) > at > org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690) > at > org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716) > {code} > The zookeeper service will be unavailable until we enlarge the jute.maxbuffer > and restart zookeeper hbase cluster. > *Solution* > Add buffer size check in BinaryOutputArchive to avoid large transaction be > written to log and sent to followers. > But I am not sure if there are side-effects of throwing an IOException in > BinaryOutputArchive and RequestProcessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
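The proposed solution, a size check on the write path that matches the one on the read path, might look roughly like the sketch below. It is an illustration under stated assumptions, not the attached patch: `BoundedBufferWriter` and its `maxBuffer` constructor parameter are hypothetical, and the real limit would come from the `jute.maxbuffer` setting mentioned above.

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a length-prefixed buffer writer; not the real BinaryOutputArchive.
final class BoundedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer; // assumed to be the same limit the reader enforces

    BoundedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    /** Writes a length-prefixed buffer, rejecting anything the reader would later refuse. */
    void writeBuffer(byte[] buf) throws IOException {
        if (buf == null) {
            out.writeInt(-1); // a null buffer is encoded as length -1 in this sketch
            return;
        }
        if (buf.length > maxBuffer) {
            // Fail before any bytes reach the txn log or the followers.
            throw new IOException("Unreasonable length = " + buf.length);
        }
        out.writeInt(buf.length);
        out.write(buf);
    }
}
```

Because the check runs before the length prefix is written, an oversized buffer leaves the stream untouched. The open question raised in the description still applies: callers must handle the `IOException` rather than swallow it.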
[jira] [Updated] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2332: --- Attachment: ZOOKEEPER-2332-v001.diff First patch to fix this issue. [~rgs] Could you help to review this small patch? Thanks a lot. > Zookeeper failed to start for empty txn log > --- > > Key: ZOOKEEPER-2332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Critical > Attachments: ZOOKEEPER-2332-v001.diff > > > We found that the zookeeper server with version 3.4.6 failed to start for > there is a empty txn log in log dir. > I think we should skip the empty log file during restoring the datatree. > Any suggestion? > {code} > 2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - > Unexpected exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:392) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158) > at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log
Liu Shaohui created ZOOKEEPER-2332: --
Summary: Zookeeper failed to start for empty txn log
Key: ZOOKEEPER-2332
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.6
Reporter: Liu Shaohui
Priority: Critical
We found that a ZooKeeper 3.4.6 server failed to start because there was an empty txn log in the log dir. I think we should skip empty log files while restoring the datatree. Any suggestions?
{code}
2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v5.patch Fix the failed test > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch, ZOOKEEPER-2290-v4.patch, ZOOKEEPER-2290-v5.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v2.patch Updated for [~eribeiro]'s review. {quote} 1/2/3/4/6 {quote} All done. {quote} 1. Rate's public void inc(final int incr) is defined, but not used, right? If so, no need to include it. {quote} Removed. {quote} 2. If getRate is the rate per _second_, why are you using time slots of 10,000 ms? {quote} An average over 10,000 ms may be more representative than one over 1,000 ms, but never mind; I have reverted it to 1,000 ms. Thanks very much for your careful review. > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
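For readers without the patch at hand, a per-second rate computed over a fixed time slot, as discussed in this review, could be sketched like this. It is only an illustration and not the code in ZOOKEEPER-2290-v2.patch: the `Rate` class body, the `TIME_SLOT_MS` constant, and the explicit `nowMs` parameters (used here so the behavior is deterministic) are all assumptions.

```java
// Hypothetical sketch of a slot-based rate meter; not the class from the patch.
final class Rate {
    static final long TIME_SLOT_MS = 1000; // slot length, assumed to be 1,000 ms as in the review

    private long slotStartMs;  // start of the slot currently being counted
    private long countInSlot;  // events seen in the current slot
    private double lastRate;   // events per second measured over the last completed slot

    Rate(long nowMs) {
        this.slotStartMs = nowMs;
    }

    /** Records one event at the given time. */
    synchronized void inc(long nowMs) {
        roll(nowMs);
        countInSlot++;
    }

    /** Returns events per second for the most recently completed slot. */
    synchronized double getRate(long nowMs) {
        roll(nowMs);
        return lastRate;
    }

    // Closes the current slot once at least TIME_SLOT_MS has elapsed.
    private void roll(long nowMs) {
        long elapsedMs = nowMs - slotStartMs;
        if (elapsedMs >= TIME_SLOT_MS) {
            lastRate = countInSlot * 1000.0 / elapsedMs;
            countInSlot = 0;
            slotStartMs = nowMs;
        }
    }
}
```

With this shape, a longer slot smooths out bursts while a shorter slot reacts faster, which is the trade-off behind the 10,000 ms versus 1,000 ms question above.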
[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953039#comment-14953039 ] Liu Shaohui commented on ZOOKEEPER-2290: The test failure is unrelated to patch v3. > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v3.patch ZOOKEEPER-2290-v3.patch Fix the failed tests > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: (was: ZOOKEEPER-2290-v3.patch) > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v3.patch Fix the failed tests > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954389#comment-14954389 ] Liu Shaohui commented on ZOOKEEPER-2290: [~eribeiro] {quote} *Also I think it is nice to expose those new metrics via JMX too.* {quote} Done in patch v3. Please see the class ZooKeeperServerBean. {quote} And you should put a 'default' case in the switch-case statement – throwing a IllegalArgumentException {quote} Done. {quote} I think getRate() should return a double instead of a long 'cause it will round results otherwise. Also, TIMES_SLOT should be final too. I would rename it as TIME_SLOT_MS {quote} Done. {quote} Finally, no need to call updateLatency() on every op type: you can set the operation type (READ/WRITE) instead and call updateLatency() at the end. {quote} The two earlier updateLatency() calls are there because some of the switch cases *return* early. I would rather not change the code structure. > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
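The reviewer's two suggestions, a `default` case that throws and a single classification of each operation as READ or WRITE, could be sketched as below. Everything here is hypothetical and for illustration only: `OpClassifier`, `OpKind`, and the integer constants are made-up names and values, not ZooKeeper's real op codes or the code in the patch.

```java
// Hypothetical sketch; the constants are illustrative and not claimed to match ZooKeeper's OpCode.
final class OpClassifier {
    enum OpKind { READ, WRITE }

    static final int CREATE = 1;
    static final int DELETE = 2;
    static final int EXISTS = 3;
    static final int GET_DATA = 4;
    static final int SET_DATA = 5;

    /** Maps an op code to READ or WRITE, rejecting codes it does not know. */
    static OpKind classify(int opCode) {
        switch (opCode) {
            case CREATE:
            case DELETE:
            case SET_DATA:
                return OpKind.WRITE;
            case EXISTS:
            case GET_DATA:
                return OpKind.READ;
            default:
                // Fail loudly rather than silently counting an unknown op.
                throw new IllegalArgumentException("Unknown op code: " + opCode);
        }
    }
}
```

A caller could compute the kind once and update the matching counter in one place. As the reply above notes, that only works cleanly when the surrounding switch has no early returns.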
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v4.patch Update for [~eribeiro]'s review~ > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch, ZOOKEEPER-2290-v4.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954316#comment-14954316 ] Liu Shaohui commented on ZOOKEEPER-2290: [~cconroy] {quote} I would suggest also publishing the raw counters. Monotonically increasing counters are a bit more versatile than locally computed rates. {quote} In my opinion, the qps metrics are more direct than raw counters, and a qps value that an external monitoring system derives from the counters may deviate from the actual rate. > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.6 >Reporter: Liu Shaohui >Priority: Minor > Labels: monitor > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, > ZOOKEEPER-2290-v3.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
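To make the trade-off in this exchange concrete, here is a rough sketch of how an external monitor might derive a rate from two samples of a monotonically increasing counter. The `CounterRate` class and its method are hypothetical and are not part of ZooKeeper or of any patch on this issue.

```java
// Hypothetical helper showing the external-monitor side of the discussion.
final class CounterRate {
    /**
     * Events per second between two samples of a monotonically increasing counter.
     * Returns NaN when the samples cannot be compared, for example after a
     * counter reset caused by a server restart.
     */
    static double perSecond(long prevCount, long prevMs, long curCount, long curMs) {
        long elapsedMs = curMs - prevMs;
        if (elapsedMs <= 0 || curCount < prevCount) {
            return Double.NaN;
        }
        return (curCount - prevCount) * 1000.0 / elapsedMs;
    }
}
```

The result is an average over the polling interval, so a monitor that samples once a minute would not see a short burst that a server-side rate over a one-second slot would catch. On the other hand, raw counters let each consumer pick its own interval.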
[jira] [Created] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
Liu Shaohui created ZOOKEEPER-2290: -- Summary: Add read/write qps metrics in monitor cmd Key: ZOOKEEPER-2290 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 Project: ZooKeeper Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Read/write qps are important metrics to show the pressure of the cluster. We can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated ZOOKEEPER-2290: --- Attachment: ZOOKEEPER-2290-v1.patch Patch for trunk > Add read/write qps metrics in monitor cmd > - > > Key: ZOOKEEPER-2290 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Liu Shaohui >Priority: Minor > Attachments: ZOOKEEPER-2290-v1.patch > > > Read/write qps are important metrics to show the pressure of the cluster. We > can also use it to alert about some abuse of zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557939#comment-14557939 ] Liu Shaohui commented on ZOOKEEPER-2101: {quote} Swallowing a system failure exception doesn't look like a good choice. I usually prefer to let the system crash if not recoverable. {quote} Agreed. {quote} So I will leave it to others to comment. {quote} According to the log, the last changes to this code were made by [~fpj] in ZOOKEEPER-106. [~fpj] Do you know why the exceptions are ignored in ZKDatabase.java or Leader.java? Thanks~
{code}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
try {
    request.hdr.serialize(boa, "hdr");
    if (request.txn != null) {
        request.txn.serialize(boa, "txn");
    }
    baos.close();
} catch (IOException e) {
    LOG.error("This really should be impossible", e);
}
{code}
Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101 Project: ZooKeeper Issue Type: Bug Components: jute Affects Versions: 3.4.4 Reporter: Liu Shaohui Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff *Problem* For multi operation, PrepRequestProcessor may produce a large transaction whose size may be larger than the max buffer size of jute. There is check of buffer size in readBuffer method of BinaryInputArchive, but no check in writeBuffer method of BinaryOutputArchive, which will cause that 1, Leader can sync transaction to txn log and send the large transaction to the followers, but the followers failed to read the transaction and can't sync with leader. 
{code} 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader java.io.IOException: Unreasonable length = 2054758 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744) {code} 2, The leader lose all followers, which trigger the leader election. The old leader will become leader again for it has up-to-date data. {code} 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753) {code} 3, The leader can not load the transaction from the txn log for the length of data is larger than the max buffer of jute. 
{code} 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk java.io.IOException: Unreasonable length = 2054758 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417) at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546) at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557930#comment-14557930 ] Liu Shaohui commented on ZOOKEEPER-2101: [~hdeng] Any suggestion? Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101 Project: ZooKeeper Issue Type: Bug Components: jute Affects Versions: 3.4.4 Reporter: Liu Shaohui Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff *Problem* For multi operation, PrepRequestProcessor may produce a large transaction whose size may be larger than the max buffer size of jute. There is check of buffer size in readBuffer method of BinaryInputArchive, but no check in writeBuffer method of BinaryOutputArchive, which will cause that 1, Leader can sync transaction to txn log and send the large transaction to the followers, but the followers failed to read the transaction and can't sync with leader. 
{code} 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader java.io.IOException: Unreasonable length = 2054758 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744) {code} 2, The leader lose all followers, which trigger the leader election. The old leader will become leader again for it has up-to-date data. {code} 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753) {code} 3, The leader can not load the transaction from the txn log for the length of data is larger than the max buffer of jute. 
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
    at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service stays unavailable until we enlarge jute.maxbuffer and restart the zookeeper hbase cluster.

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent large transactions from being written to the log and sent to followers. But I am not sure whether throwing an IOException in BinaryOutputArchive and RequestProcessors has side effects.
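The asymmetry described in the problem statement can be reproduced in a few lines. The sketch below is not jute's code: `BufferLimitDemo`, `MAX_BUFFER`, and `roundTrip` are hypothetical names, and the 1024-byte limit is an arbitrary stand-in for the value that jute takes from jute.maxbuffer. It only assumes a length-prefixed buffer layout, with a size check on the read side and none on the write side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class BufferLimitDemo {
    // Hypothetical limit for this demo only (assumption, not jute's default).
    static final int MAX_BUFFER = 1024;

    // Write side WITHOUT a size check: a 4-byte length prefix, then the bytes.
    static void writeBuffer(DataOutputStream out, byte[] barr) throws IOException {
        out.writeInt(barr.length);
        out.write(barr);
    }

    // Read side WITH a size check: rejects any length above the limit.
    static byte[] readBuffer(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_BUFFER) {
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] barr = new byte[len];
        in.readFully(barr);
        return barr;
    }

    // Writes a buffer of the given size, then tries to read it back.
    static String roundTrip(int size) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            writeBuffer(new DataOutputStream(baos), new byte[size]);
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
            return "ok " + readBuffer(in).length;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(100));      // small buffer survives the round trip
        System.out.println(roundTrip(2054758));  // accepted by the writer, rejected by the reader
    }
}
```

The second call shows the failure mode from the logs: the write succeeds, so the oversized record is already persisted or on the wire by the time any reader rejects it.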
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1435#comment-1435 ]

Liu Shaohui commented on ZOOKEEPER-2101:

[~hdeng] Actually, this code was just moved from ZKDatabase.java or Leader.java. Please see the patch. I am not sure why it just ignores those exceptions. Maybe they really are impossible. Or we can open another jira issue to discuss it.

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
-----------------------------------
Attachment: ZOOKEEPER-2101-v6.diff

Update for [~rakeshr]'s review.
- Use IOUtils.cleanup(LOG, baos) instead of try-catch.
- Update the log messages:
{code}
throw new IOException("Len error " + barr.length
    + ", less than 0 or larger than max buffer: "
    + BinaryInputArchive.maxBuffer + " set by jute.maxbuffer");
{code}

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, test.diff
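A write-side check of the kind this issue proposes might look roughly like the sketch below. It is an illustration under stated assumptions, not the attached patch: `CheckedBufferWriter` and `tryWrite` are hypothetical names, the limit is passed in explicitly instead of being read from jute.maxbuffer, and writing -1 as the length of a null buffer is assumed here as the encoding convention.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class CheckedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer; // stand-in for the jute.maxbuffer setting

    CheckedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    // Validates the length BEFORE writing anything, so a rejected buffer
    // leaves no partial record in the stream.
    void writeBuffer(byte[] barr) throws IOException {
        if (barr == null) {
            out.writeInt(-1); // assumed convention for a null buffer
            return;
        }
        if (barr.length > maxBuffer) {
            throw new IOException("Len error " + barr.length
                + ", larger than max buffer: " + maxBuffer
                + " set by jute.maxbuffer");
        }
        out.writeInt(barr.length);
        out.write(barr);
    }

    // Hypothetical helper: returns how many bytes reached the stream.
    static int tryWrite(int size, int maxBuffer) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        CheckedBufferWriter writer =
            new CheckedBufferWriter(new DataOutputStream(baos), maxBuffer);
        try {
            writer.writeBuffer(new byte[size]);
        } catch (IOException e) {
            // Rejected: nothing was written for this buffer.
        }
        return baos.size();
    }

    public static void main(String[] args) {
        System.out.println(tryWrite(10, 100));   // 4-byte length prefix + 10 bytes
        System.out.println(tryWrite(101, 100));  // over the limit, nothing written
    }
}
```

Checking before the length prefix is written matters: a check placed after `writeInt` would leave a dangling length in the stream that a reader could not interpret.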
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551684#comment-14551684 ]

Liu Shaohui commented on ZOOKEEPER-2101:

[~rakeshr] Do we need another +1 for commit? Or could you help to push this issue?

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, test.diff
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
-----------------------------------
Attachment: ZOOKEEPER-2101-v5.diff

Update for [~rakeshr]'s review.

{quote} Move {{baos.close();}} to finally block {quote}
Done.

{quote} Please format the lines, few lines exceed 80 characters. {quote}
Done.

{quote} In tests, any specific reason to increase the value of TEST_MAXBUFFER to 1000? {quote}
The size of the extra fields in a transaction is larger than 100, so we increased TEST_MAXBUFFER to 1000.

{quote} checking < 0 condition also. {quote}
Done.

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, test.diff
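The review item about moving the close call into a finally block can be pictured with a minimal sketch. Everything named here is hypothetical (`SerializeWithCleanup`, `closeQuietly`, `serialize`, `sizeOf`), and the record layout of a long id followed by a length-prefixed payload is made up for illustration; the patch itself is not reproduced.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;

final class SerializeWithCleanup {
    // Hypothetical cleanup helper: closes the resource and reports, rather
    // than propagates, any failure from close().
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException e) {
            System.err.println("Exception while closing " + c + ": " + e);
        }
    }

    // Serializes an id and a payload; the stream is closed in the finally
    // block, so it is released on both the normal and the exceptional path.
    static byte[] serialize(long zxid, byte[] payload) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(baos);
            out.writeLong(zxid);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return baos.toByteArray();
        } finally {
            closeQuietly(baos);
        }
    }

    // Convenience wrapper for callers that only need the serialized size;
    // returns -1 if serialization fails.
    static int sizeOf(long zxid, byte[] payload) {
        try {
            return serialize(zxid, payload).length;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(1L, new byte[3])); // 8 + 4 + 3 bytes
    }
}
```

On Java 7 and later, try-with-resources would give the same guarantee with less code; the explicit finally block is used here only because it matches the wording of the review comment.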
[jira] [Commented] (ZOOKEEPER-2191) Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549713#comment-14549713 ]

Liu Shaohui commented on ZOOKEEPER-2191:

LGTM. Could someone help to push this issue?

Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task.
-----------------------------------------------------------------------------------------------------

Key: ZOOKEEPER-2191
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2191
Project: ZooKeeper
Issue Type: Improvement
Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: ZOOKEEPER-2191.001.patch

ZOOKEEPER-2183 introduced usage of the threads attribute on the junit task call in build.xml to speed up test execution. This attribute is only available since Ant 1.9.4. However, we can continue to support older Ant versions by calling the antversion task and dispatching to a clone of our junit task call that doesn't use the threads attribute. Users of older Ant versions will get the slower single-process test execution.
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547460#comment-14547460 ]

Liu Shaohui commented on ZOOKEEPER-2101:

[~michim] Could you help to review this patch? I saw you helped to update the fix versions. :)

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543561#comment-14543561 ]

Liu Shaohui commented on ZOOKEEPER-2101:

[~rakeshr] Sorry for the late reply. Could you help to review the new patch? Thanks.

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.1
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Fix Version/s: 3.5.1

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.1
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff

*Problem*
For a multi operation, PrepRequestProcessor may produce a transaction that is larger than the max buffer size of jute. The readBuffer method of BinaryInputArchive checks the buffer size, but the writeBuffer method of BinaryOutputArchive does not. This causes the following sequence:

1, The leader can sync the transaction to the txn log and send it to the followers, but the followers fail to read the transaction and can't sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}

2, The leader loses all followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.
{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}

3, The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
    at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service will be unavailable until we enlarge jute.maxbuffer and restart the zookeeper hbase cluster.

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to the followers. But I am not sure whether throwing an IOException in BinaryOutputArchive and the RequestProcessors has side effects.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
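The asymmetry described under *Problem* can be illustrated with a minimal sketch. This is not the jute code: `BoundedBufferCodec`, its method names, and the `maxBuffer` constructor parameter are hypothetical stand-ins, and only the "Unreasonable length" message is taken from the logs above. It shows an unchecked writer producing bytes that a checked reader later refuses, and a checked writer that rejects the same data up front, as the *Solution* proposes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical length-prefixed codec; a stand-in, not the real jute archives. */
final class BoundedBufferCodec {
    private final int maxBuffer;

    BoundedBufferCodec(int maxBuffer) {
        this.maxBuffer = maxBuffer;
    }

    /** Writes a length prefix and the data with no size check (the reported behavior). */
    byte[] writeUnchecked(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(data.length);
        out.write(data);
        out.flush();
        return bytes.toByteArray();
    }

    /** Applies the reader's limit on the write side too (the proposed direction). */
    byte[] writeChecked(byte[] data) throws IOException {
        if (data.length > maxBuffer) {
            throw new IOException("Unreasonable length = " + data.length);
        }
        return writeUnchecked(data);
    }

    /** Reads a length-prefixed buffer and rejects lengths beyond the limit. */
    byte[] read(byte[] encoded) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        int len = in.readInt();
        if (len < 0 || len > maxBuffer) {
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] data = new byte[len];
        in.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        BoundedBufferCodec codec = new BoundedBufferCodec(16);
        byte[] oversized = codec.writeUnchecked(new byte[32]); // write succeeds
        try {
            codec.read(oversized); // read fails, as on the followers
        } catch (IOException e) {
            System.out.println("read rejected: " + e.getMessage());
        }
    }
}
```

With the unchecked writer the failure surfaces only on the reading side, which matches the follower and restart failures in the logs; with the checked writer the caller sees the error before anything is persisted or sent.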
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v4.diff

Update for [~rakeshr] review.
- Add unit tests
- Fix the log problems
{quote}
The attached log is comparing request.request.capacity() and data.length. But data.length contains both request and additional fields. So comparing these both won't give exact values.
{quote}
Just add more info in the log.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543584#comment-14543584 ]

Liu Shaohui commented on ZOOKEEPER-2101:
---
[~iandi]
{quote}
However, perhaps an existing error code would be suited to this, such as BADARGUMENTS?
{quote}
Good advice. Changed the error code to BADARGUMENTS. Thanks.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.1
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544843#comment-14544843 ]

Liu Shaohui commented on ZOOKEEPER-2101:
---
The failed test is unrelated to this patch. I reran it on my machine many times and it passed every time.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Fix For: 3.5.1
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v3.diff

Check the proposal size in PrepRequestProcessor and throw a ProposalTooLargeException when the proposal is larger than the max jute buffer size.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, test.diff
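The check described for the v3 patch might look roughly like the sketch below. It is an illustration only and not taken from the attached diff: `ProposalSizeGuard`, its constructor parameter, and the shape of `ProposalTooLargeException` are assumptions (only the exception's name appears in the comment above). The point is that the size test runs on the serialized proposal before it is logged or sent, so an oversized multi request fails for one client rather than for the whole ensemble.

```java
/** Hypothetical guard; names and structure are assumptions, not the v3 patch. */
final class ProposalSizeGuard {

    /** Hypothetical exception; only its name is taken from the JIRA comment. */
    static final class ProposalTooLargeException extends Exception {
        ProposalTooLargeException(int size, int limit) {
            super("Proposal size " + size + " exceeds max buffer " + limit);
        }
    }

    private final int maxBuffer;

    ProposalSizeGuard(int maxBuffer) {
        this.maxBuffer = maxBuffer;
    }

    /**
     * Returns the serialized proposal unchanged when it fits, and throws
     * before the caller would log or broadcast it when it does not.
     */
    byte[] check(byte[] serializedProposal) throws ProposalTooLargeException {
        if (serializedProposal.length > maxBuffer) {
            throw new ProposalTooLargeException(serializedProposal.length, maxBuffer);
        }
        return serializedProposal;
    }

    public static void main(String[] args) {
        ProposalSizeGuard guard = new ProposalSizeGuard(1024);
        try {
            guard.check(new byte[2048]);
        } catch (ProposalTooLargeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```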
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v2.diff

Limit the packet size to less than half of the jute max buffer size.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, test.diff
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: test.diff

[~rakeshr] I added a log statement in ZKDatabase to show that the Proposal can be larger than the request.
{code}
2015-01-16 17:56:07,469 [myid:] - INFO [SyncThread:0:ZKDatabase@261] - Request type 14 size: 5499 zxid: 2, Proposal size:5526
{code}

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: ZOOKEEPER-2101-v1.diff, test.diff
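The log line above reports a 5499-byte request that became a 5526-byte proposal, so checking only the incoming request size is not enough. The sketch below shows why in general terms: wrapping a payload in header fields and a length prefix always yields more bytes than the payload alone. The field layout here is an assumption chosen for illustration (`ProposalOverheadSketch` is a hypothetical class), and its 36-byte overhead is not meant to reproduce the 27-byte difference in the log.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of header overhead; the field layout is assumed. */
final class ProposalOverheadSketch {

    /** Serializes assumed header fields plus a length-prefixed payload and returns the total size. */
    static int proposalSize(long sessionId, int cxid, long zxid, long time, int type, byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(sessionId);     // 8 bytes
            out.writeInt(cxid);           // 4 bytes
            out.writeLong(zxid);          // 8 bytes
            out.writeLong(time);          // 8 bytes
            out.writeInt(type);           // 4 bytes
            out.writeInt(payload.length); // 4-byte length prefix
            out.write(payload);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[5499];
        int proposal = proposalSize(1L, 1, 2L, 0L, 14, payload);
        System.out.println("request=" + payload.length + " proposal=" + proposal);
    }
}
```

A request that sits just under the limit can therefore still produce a proposal that exceeds it, which is why the size has to be checked on the serialized proposal rather than on the request.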
[jira] [Created] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
Liu Shaohui created ZOOKEEPER-2101:
--
Summary: Transaction larger than max buffer of jute makes zookeeper unavailable
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v1.diff

Patch for trunk.

Transaction larger than max buffer of jute makes zookeeper unavailable
--
Key: ZOOKEEPER-2101
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
Project: ZooKeeper
Issue Type: Bug
Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: ZOOKEEPER-2101-v1.diff

*Problem*
For a multi operation, PrepRequestProcessor may produce a large transaction whose size is larger than the max buffer size of jute. There is a buffer size check in the readBuffer method of BinaryInputArchive, but no check in the writeBuffer method of BinaryOutputArchive, which causes the following:

1, The leader can sync the transaction to the txn log and send the large transaction to the followers, but the followers fail to read the transaction and cannot sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}

2, The leader loses all followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.
{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}

3, The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
    at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service will be unavailable until we enlarge jute.maxbuffer and restart the zookeeper hbase cluster.

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to followers.
However, I am not sure whether throwing an IOException in BinaryOutputArchive and the RequestProcessors has side effects.
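One possible shape for the proposed write-side check, as a sketch only. It is not the attached ZOOKEEPER-2101-v1.diff: the class `CheckedBufferWriter`, its constructor, and the explicit `maxBuffer` parameter are hypothetical, and the error text is simply borrowed from the read-side message in the traces above. The point is that the check runs before any byte reaches the stream, so an oversized buffer never lands in the log or on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CheckedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer;

    // maxBuffer is passed in explicitly here; a real archive would more
    // likely share the limit that the reader uses (assumption).
    CheckedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    // Writes a length-prefixed buffer, rejecting oversized data up front.
    void writeBuffer(byte[] data) throws IOException {
        if (data == null) {
            out.writeInt(-1); // null encoded as length -1 (convention assumed here)
            return;
        }
        if (data.length > maxBuffer) {
            // Fail before touching the stream so nothing partial is written.
            throw new IOException("Unreasonable length = " + data.length);
        }
        out.writeInt(data.length);
        out.write(data);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CheckedBufferWriter writer =
                new CheckedBufferWriter(new DataOutputStream(sink), 0xfffff);
        writer.writeBuffer(new byte[]{1, 2, 3});
        System.out.println("bytes written: " + sink.size());
        try {
            writer.writeBuffer(new byte[2054758]);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("bytes written after rejection: " + sink.size());
    }
}
```

Where the resulting IOException should be caught is the open question raised above: a caller that has already assigned a zxid or queued the request would still need to unwind cleanly.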
[jira] [Commented] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228244#comment-14228244 ]

Liu Shaohui commented on ZOOKEEPER-2092:

[~fpj] The thread waits at ZooKeeperServer.java:634 forever and never accepts new connections. The reason is that the running variable in ZooKeeperServer is false.

A zk instance can not be connected for ZooKeeperServer is not running
-
Key: ZOOKEEPER-2092
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2092
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: stack

In our 5-node zk cluster, we found a zk node that can never be connected to. From the stack we found that the ZooKeeperServer hangs waiting for the server to be running, but the node is running normally and is synced with the leader.
{code}
$ ./zkCli.sh -server 10.101.10.67:11000 ls /
2014-11-27 20:57:11,843 [myid:] - WARN [main-SendThread(lg-com-master02.bj:11000):ClientCnxn$SendThread@1089] - Session 0x0 for server lg-com-master02.bj/10.101.10.67:11000, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:353)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
Exception in thread main org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1469)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1497)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:726)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:594)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:355)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:283)
{code}

ZooKeeperServer stack
{code}
NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11000 daemon prio=10 tid=0x7f60143f7800 nid=0x31fd in Object.wait() [0x7f5fd4678000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.zookeeper.server.ZooKeeperServer.submitRequest(ZooKeeperServer.java:634)
    - locked 0x0007602756a0 (a org.apache.zookeeper.server.quorum.FollowerZooKeeperServer)
    at org.apache.zookeeper.server.ZooKeeperServer.submitRequest(ZooKeeperServer.java:626)
    at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:525)
    at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:841)
    at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:410)
    at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:200)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
{code}

Any suggestions about this problem? Thanks.
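The TIMED_WAITING state inside Object.wait() in the stack above is what a timed wait loop on a flag looks like in a thread dump. The sketch below illustrates that pattern and is not the ZooKeeper source: `StartupGate`, `awaitRunning`, and `markRunning` are hypothetical names. The unbounded variant spins on timed waits for as long as the flag stays false, which matches the reported symptom. The bounded variant is one way a caller could give up instead of blocking its thread indefinitely.

```java
class StartupGate {
    private boolean running = false;

    // Unbounded variant: wakes once per second, re-checks the flag, and
    // goes back to waiting. If nobody ever sets the flag, it never returns.
    synchronized void awaitRunning() throws InterruptedException {
        while (!running) {
            wait(1000);
        }
    }

    // Bounded variant: returns false if the flag is still unset after
    // timeoutMs, or if the waiting thread is interrupted.
    synchronized boolean awaitRunning(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            while (!running) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false;
                }
                wait(remainingMs);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Sets the flag and wakes every waiter.
    synchronized void markRunning() {
        running = true;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        StartupGate gate = new StartupGate();
        Thread worker = new Thread(() -> {
            try {
                gate.awaitRunning(); // blocks until markRunning() is called
                System.out.println("request accepted");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        System.out.println("bounded wait succeeded: " + gate.awaitRunning(200));
        gate.markRunning();
        worker.join();
    }
}
```

In this sketch the thread that performs the wait is the one that would otherwise be accepting connections, so a flag that is never set stalls every new client, not just the first one.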
[jira] [Created] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running
Liu Shaohui created ZOOKEEPER-2092:
--
Summary: A zk instance can not be connected for ZooKeeperServer is not running
Key: ZOOKEEPER-2092
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2092
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.4
Reporter: Liu Shaohui
[jira] [Updated] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated ZOOKEEPER-2092:
---
Attachment: stack

The full stack

A zk instance can not be connected for ZooKeeperServer is not running
-
Key: ZOOKEEPER-2092
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2092
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.4
Reporter: Liu Shaohui
Attachments: stack