[jira] [Commented] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log

2015-12-06 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044328#comment-15044328
 ] 

Liu Shaohui commented on ZOOKEEPER-2332:


[~rgs]
{quote}
how did the empty txnlog happened in the first place?
{quote}
The ZooKeeper server was killed after creating a new txn log file but before 
flushing the file header to it.
This left a txn log without a valid header, which makes the ZooKeeper server 
fail to start.
See: FileTxnLog.java#207
{code}
if (logStream==null) {
   if(LOG.isInfoEnabled()){
      LOG.info("Creating new log file: log." +
            Long.toHexString(hdr.getZxid()));
   }

   logFileWrite = new File(logDir, ("log." +
         Long.toHexString(hdr.getZxid())));
   fos = new FileOutputStream(logFileWrite);
   logStream=new BufferedOutputStream(fos);
   oa = BinaryOutputArchive.getArchive(logStream);
   FileHeader fhdr = new FileHeader(TXNLOG_MAGIC,VERSION, dbId);
   fhdr.serialize(oa, "fileheader");
   // Make sure that the magic number is written before padding.
   logStream.flush();
   currentSize = fos.getChannel().position();
   streamsToFlush.add(fos);
}
{code}
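
To illustrate the failure window described above, here is a minimal sketch of how a restore path could detect a log file whose header was never flushed. It is not the attached patch. The class name TxnLogHeaderCheck is hypothetical, and the header layout (int magic, int version, long dbId) and the "ZKLG" magic value are assumptions made for this example.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of ZooKeeper: detects txn logs without a flushed header. */
final class TxnLogHeaderCheck {
    // Assumed header layout: int magic, int version, long dbId (16 bytes in total).
    static final int HEADER_BYTES = 4 + 4 + 8;

    // Assumed magic value: the ASCII bytes "ZKLG" read as a big-endian int.
    static final int ASSUMED_MAGIC =
            ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    /** Returns true only if the file starts with a complete header carrying the assumed magic. */
    static boolean hasCompleteHeader(File logFile) {
        if (logFile.length() < HEADER_BYTES) {
            return false; // empty, or cut off before the header was flushed
        }
        try (DataInputStream in = new DataInputStream(new FileInputStream(logFile))) {
            return in.readInt() == ASSUMED_MAGIC;
        } catch (IOException e) {
            return false; // an unreadable file is treated as having no valid header
        }
    }
}
```

A caller iterating over log files could skip (and log a warning for) any file for which hasCompleteHeader returns false, instead of failing with an EOFException.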

> Zookeeper failed to start for empty txn log
> ---
>
> Key: ZOOKEEPER-2332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Critical
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2332-v001.diff
>
>
> We found that a ZooKeeper 3.4.6 server failed to start because there was an 
> empty txn log in the log dir.
> I think we should skip empty log files when restoring the data tree. 
> Any suggestions?
> {code}
> 2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-12-01 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v8.diff

Rebased on trunk.

[~brahmareddy] [~hdeng]
Could you help push this patch?

> Transaction larger than max buffer of jute makes zookeeper unavailable
> --
>
> Key: ZOOKEEPER-2101
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jute
>Affects Versions: 3.4.4
>Reporter: Liu Shaohui
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
> ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
> ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, ZOOKEEPER-2101-v8.diff, 
> test.diff
>
>
> *Problem*
> For a multi operation, PrepRequestProcessor may produce a transaction whose 
> size exceeds the max buffer size of jute. The readBuffer method of 
> BinaryInputArchive checks the buffer size, but the writeBuffer method of 
> BinaryOutputArchive does not, which causes the following:
> 1, The leader can sync the large transaction to its txn log and send it to 
> the followers, but the followers fail to read the transaction and can't 
> sync with the leader.
> {code}
> 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
> java.io.IOException: Unreasonable length = 2054758
>     at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> {code}
> 2, The leader loses all its followers, which triggers a leader election. The 
> old leader becomes leader again because it has the most up-to-date data.
> {code}
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
>     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
>     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> {code}
> 3, The leader cannot load the transaction from its txn log because the data 
> length is larger than the max buffer of jute.
> {code}
> 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
> java.io.IOException: Unreasonable length = 2054758
>     at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
>     at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>     at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>     at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> {code}
> The ZooKeeper service stays unavailable until we enlarge jute.maxbuffer 
> and restart the zookeeper hbase cluster.
> *Solution*
> Add a buffer size check in BinaryOutputArchive to prevent oversized 
> transactions from being written to the log and sent to followers.
> But I am not sure whether throwing an IOException in BinaryOutputArchive 
> and the RequestProcessors has side effects.
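
The proposed solution can be sketched as a write-side length check that mirrors the read-side check. This is only an illustration and not the attached patch or the real BinaryOutputArchive: the class CheckedBufferWriter and its maxBuffer constructor parameter are hypothetical, and the error text simply reuses the wording seen in the logs above.

```java
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch, not the actual BinaryOutputArchive: length-checked buffer writes. */
final class CheckedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer; // assumed to carry the same limit the reader enforces

    CheckedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    /** Writes a length-prefixed buffer, rejecting sizes a length-checked reader would refuse. */
    void writeBuffer(byte[] buf) throws IOException {
        if (buf == null) {
            out.writeInt(-1); // a null buffer is encoded as length -1 in this sketch
            return;
        }
        if (buf.length > maxBuffer) {
            // Fail before anything reaches the log or the wire.
            throw new IOException("Unreasonable length = " + buf.length);
        }
        out.writeInt(buf.length);
        out.write(buf);
    }
}
```

Rejecting the buffer before the length prefix is written keeps the stream consistent; what the caller should do with the resulting IOException is exactly the open question raised in the issue.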





[jira] [Updated] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log

2015-11-30 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2332:
---
Attachment: ZOOKEEPER-2332-v001.diff

First patch to fix this issue.

[~rgs]
Could you help review this small patch? Thanks a lot.

> Zookeeper failed to start for empty txn log
> ---
>
> Key: ZOOKEEPER-2332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Critical
> Attachments: ZOOKEEPER-2332-v001.diff
>
>


[jira] [Created] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log

2015-11-27 Thread Liu Shaohui (JIRA)
Liu Shaohui created ZOOKEEPER-2332:
--

 Summary: Zookeeper failed to start for empty txn log
 Key: ZOOKEEPER-2332
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6
Reporter: Liu Shaohui
Priority: Critical


We found that a ZooKeeper 3.4.6 server failed to start because there was an 
empty txn log in the log dir.
I think we should skip empty log files when restoring the data tree. 
Any suggestions?

{code}
2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}





[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-13 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v5.patch

Fix the failed test

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch, ZOOKEEPER-2290-v4.patch, ZOOKEEPER-2290-v5.patch
>
>
> Read/write qps are important metrics that show the load on the cluster. We 
> can also use them to alert on abuse of zookeeper.





[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v2.patch

Update for [~eribeiro]'s review
{quote}
1/2/3/4/6
{quote}
All done

{quote}
1. Rate's public void inc(final int incr) is defined, but not used, right? If 
so, no need to include it.
{quote}
Removed.

{quote}
2. If getRate is the rate per _second_, why are you using time slots of 10,000 
ms?
{quote}
The average over 10,000 ms may be more representative than the average over 
1,000 ms, but never mind: I have reverted it to 1,000 ms.

Thanks very much for your careful review~
 

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch
>
>


[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953039#comment-14953039
 ] 

Liu Shaohui commented on ZOOKEEPER-2290:


The failed test is unrelated to patch v3.

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v3.patch
ZOOKEEPER-2290-v3.patch

Fix the failed tests

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: (was: ZOOKEEPER-2290-v3.patch)

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: (was: ZOOKEEPER-2290-v3.patch)

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v3.patch

Fix the failed tests

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954389#comment-14954389
 ] 

Liu Shaohui commented on ZOOKEEPER-2290:


[~eribeiro]
{quote}
*Also I think it is nice to expose those new metrics via JMX too.*
{quote}
Done in patch v3. Please see the class: ZooKeeperServerBean

{quote}
And you should put a 'default' case in the switch-case statement – throwing a 
IllegalArgumentException
{quote}
Done.

{quote}
 I think getRate() should return a double instead of a long 'cause it will 
round results otherwise. Also, TIMES_SLOT should be final too. I would rename 
it as TIME_SLOT_MS
{quote}
Done.

{quote}
Finally, no need to call updateLatency() on every op type: you can set the 
operation type (READ/WRITE) instead and call updateLatency() at the end. 
{quote}
The previous two updateLatency() calls exist because there are *return* 
statements in the switch cases. I don't want to change the code structure.
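
The review points about getRate() returning a double and a final TIME_SLOT_MS constant can be illustrated with a small sketch. This is not the code from the attached patches: the class SlotRate is hypothetical, and passing the current time in as a parameter is a choice made here so the example is deterministic.

```java
/** Hypothetical sketch of a per-second rate meter; not the code from the attached patch. */
final class SlotRate {
    static final long TIME_SLOT_MS = 1000L;

    private long slotStartMs;
    private long countInSlot;
    private double lastRate;

    SlotRate(long nowMs) {
        this.slotStartMs = nowMs;
    }

    /** Records one operation observed at time nowMs. */
    synchronized void inc(long nowMs) {
        roll(nowMs);
        countInSlot++;
    }

    /** Returns operations per second for the most recently closed slot. */
    synchronized double getRate(long nowMs) {
        roll(nowMs);
        return lastRate;
    }

    // Closes the current slot once at least TIME_SLOT_MS has elapsed.
    private void roll(long nowMs) {
        long elapsed = nowMs - slotStartMs;
        if (elapsed >= TIME_SLOT_MS) {
            // Divide as double so fractional rates are not rounded away.
            lastRate = countInSlot * 1000.0 / elapsed;
            countInSlot = 0;
            slotStartMs = nowMs;
        }
    }
}
```

With a long return type, a slot that saw 3 operations over 2 seconds would report 1 instead of 1.5, which is the rounding concern raised in the review.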


> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v4.patch

Update for [~eribeiro]'s review~

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch, ZOOKEEPER-2290-v4.patch
>
>


[jira] [Commented] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-12 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954316#comment-14954316
 ] 

Liu Shaohui commented on ZOOKEEPER-2290:


[~cconroy]
{quote}
I would suggest also publishing the raw counters. Monotonically increasing 
counters are a bit more versatile than locally computed rates.
{quote}
In my opinion, the qps metrics are more direct than the raw counters, and the 
qps that an external monitoring system calculates from the counters may 
deviate from the actual rate.
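
For context on both positions, the following sketch shows how an external monitor might derive qps from two samples of a monotonically increasing counter. The class CounterQps and its parameters are hypothetical and exist only for this example.

```java
/** Hypothetical sketch: deriving qps from two samples of a monotonically increasing counter. */
final class CounterQps {
    /** Returns the average operations per second between two samples, or NaN if undefined. */
    static double qps(long prevCount, long prevTimeMs, long currCount, long currTimeMs) {
        long dtMs = currTimeMs - prevTimeMs;
        if (dtMs <= 0 || currCount < prevCount) {
            // No usable interval, or the counter went backwards (for example after a restart).
            return Double.NaN;
        }
        return (currCount - prevCount) * 1000.0 / dtMs;
    }
}
```

The result is an average over the polling interval, so short bursts are smoothed out when the monitor polls infrequently. That is one way the externally computed value can deviate from a rate measured inside the server.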


> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.6
>Reporter: Liu Shaohui
>Priority: Minor
>  Labels: monitor
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2290-v1.patch, ZOOKEEPER-2290-v2.patch, 
> ZOOKEEPER-2290-v3.patch
>
>


[jira] [Created] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-11 Thread Liu Shaohui (JIRA)
Liu Shaohui created ZOOKEEPER-2290:
--

 Summary: Add read/write qps metrics in monitor cmd
 Key: ZOOKEEPER-2290
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Liu Shaohui
Priority: Minor


Read/write qps are important metrics that show the load on the cluster. We 
can also use them to alert on abuse of zookeeper.





[jira] [Updated] (ZOOKEEPER-2290) Add read/write qps metrics in monitor cmd

2015-10-11 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2290:
---
Attachment: ZOOKEEPER-2290-v1.patch

Patch for trunk

> Add read/write qps metrics in monitor cmd
> -
>
> Key: ZOOKEEPER-2290
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2290
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Priority: Minor
> Attachments: ZOOKEEPER-2290-v1.patch
>
>


[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-24 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557939#comment-14557939
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


{quote}
Swallowing a system failure exception doesn't look like a good choice. I 
usually prefer to let the system crash if not recoverable.
{quote}
Agreed

{quote}
So I will leave it to others to comment.
{quote}
From the log, the last changes to this code were made by [~fpj] in 
ZOOKEEPER-106.

[~fpj]
Do you know why the exception is ignored in ZKDatabase.java and Leader.java? 
Thanks~
{code}
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
  try {
      request.hdr.serialize(boa, "hdr");
      if (request.txn != null) {
          request.txn.serialize(boa, "txn");
      }
      baos.close();
  } catch (IOException e) {
      LOG.error("This really should be impossible", e);
  }
{code}
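
One alternative to logging and continuing, in line with the preference for failing loudly expressed above, is to rethrow the failure to the caller. The sketch below is an illustration only and is not taken from any patch: SerializeOrFail and its Writer interface are hypothetical stand-ins for the serialization step in the snippet above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch: surface serialization failures instead of swallowing them. */
final class SerializeOrFail {
    /** Hypothetical stand-in for a record's serialize step. */
    interface Writer {
        void writeTo(ByteArrayOutputStream out) throws IOException;
    }

    /** Serializes via the writer, rethrowing any IOException as an unchecked exception. */
    static byte[] serialize(Writer writer) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            writer.writeTo(baos);
        } catch (IOException e) {
            // Propagate to the caller so a partial buffer is never used by mistake.
            throw new UncheckedIOException("failed to serialize request", e);
        }
        return baos.toByteArray();
    }
}
```

Whether an unchecked exception is appropriate at these call sites depends on how the surrounding request processors handle failures, which is the question still open in this thread.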

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff


  

[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-24 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557930#comment-14557930
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~hdeng]
Any suggestion?

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff


 *Problem*
 For multi operation, PrepRequestProcessor may produce a large transaction 
 whose size may be larger than the max buffer size of jute. There is check of 
 buffer size in readBuffer method  of BinaryInputArchive, but no check in 
 writeBuffer method  of BinaryOutputArchive, which will cause that 
 1, Leader can sync transaction to txn log and send the large transaction to 
 the followers, but the followers failed to read the transaction and can't 
 sync with leader.
 {code}
 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
 java.io.IOException: Unreasonable length = 2054758
 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
 java.lang.Exception: shutdown Follower
 at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
 {code}
 2, The leader loses all its followers, which triggers a leader election. The 
 old leader becomes leader again because it has the most up-to-date data.
 {code}
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 {code}
 3, The leader cannot load the transaction from its own txn log because the 
 length of the data is larger than the max buffer of jute.
 {code}
 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
 java.io.IOException: Unreasonable length = 2054758
 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
 at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
 at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
 at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
 at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
 at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
 at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
 {code}
 The zookeeper service stays unavailable until we enlarge jute.maxbuffer and 
 restart the zookeeper hbase cluster.
 *Solution*
 Add a buffer size check in BinaryOutputArchive so that an oversized 
 transaction is never written to the log or sent to followers.
 But I am not sure whether throwing an IOException in BinaryOutputArchive and 
 the RequestProcessors has side effects.
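
The asymmetry described above, and the proposed write-side guard, can be sketched roughly as follows. This is only an illustration under stated assumptions: BufferLimitSketch, MAX_BUFFER and roundTrips are hypothetical names, the limit of 1024 is arbitrary, and the code stands in for the jute archives rather than reproducing them or the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the jute archives; not the real classes.
public class BufferLimitSketch {
    // Assumed limit, playing the role of jute.maxbuffer.
    static final int MAX_BUFFER = 1024;

    // Write side with the proposed guard: refuse oversized buffers up front,
    // so nothing unreadable reaches the log or the followers.
    static void writeBuffer(DataOutputStream out, byte[] barr) throws IOException {
        if (barr.length > MAX_BUFFER) {
            throw new IOException("Len error " + barr.length
                    + ", larger than max buffer: " + MAX_BUFFER);
        }
        out.writeInt(barr.length);
        out.write(barr);
    }

    // Read side: rejects negative or oversized lengths, like the check
    // that produced "Unreasonable length" in the logs above.
    static byte[] readBuffer(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_BUFFER) {
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] barr = new byte[len];
        in.readFully(barr);
        return barr;
    }

    // Returns true if a buffer of the given size survives a write/read round trip.
    static boolean roundTrips(int size) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            writeBuffer(new DataOutputStream(baos), new byte[size]);
            byte[] back = readBuffer(new DataInputStream(
                    new ByteArrayInputStream(baos.toByteArray())));
            return back.length == size;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(MAX_BUFFER));     // within the limit
        System.out.println(roundTrips(MAX_BUFFER + 1)); // rejected at write time
    }
}
```

With the guard on the write path, the oversized buffer fails before anything is persisted, instead of failing later on every reader.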



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-21 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1435#comment-1435
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~hdeng]
Actually, this code was just moved from ZKDatabase.java or Leader.java. Please 
see the patch.
I am not sure why it just ignores those exceptions. Maybe they are really 
impossible.

Or we can open another jira issue to discuss it.

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff



[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-19 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v6.diff

Update for [~rakeshr]'s review.
- Use IOUtils.cleanup(LOG, baos) instead of try-catch.
- Update the log messages:
{code}
throw new IOException("Len error " + barr.length
+ ", less than 0 or larger than max buffer: "
+ BinaryInputArchive.maxBuffer + " set by jute.maxbuffer");
{code}
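
For readers unfamiliar with the first bullet, a cleanup-style helper typically closes each resource and logs, rather than propagates, any failure. The sketch below is a hypothetical re-implementation for illustration only: QuietCloseSketch and its return value are assumptions, it uses java.util.logging for self-containment, and it is not the IOUtils code the patch calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper illustrating the "cleanup instead of try-catch" idea.
public class QuietCloseSketch {
    private static final Logger LOG = Logger.getLogger("QuietCloseSketch");

    // Closes each non-null resource; an IOException is logged, not rethrown.
    // Returns how many resources closed without error (an addition made here
    // so the behaviour is easy to check).
    static int cleanup(Logger log, Closeable... closeables) {
        int closed = 0;
        for (Closeable c : closeables) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
                closed++;
            } catch (IOException e) {
                log.log(Level.WARNING, "Exception in closing " + c, e);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            baos.write(42);
        } finally {
            // One call replaces a nested try-catch around baos.close().
            cleanup(LOG, baos);
        }
    }
}
```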

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 ZOOKEEPER-2101-v6.diff, test.diff



[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-19 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551684#comment-14551684
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~rakeshr]
Do we need another +1 for commit? Or could you help to push this issue?


 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 ZOOKEEPER-2101-v6.diff, test.diff




[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-18 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v5.diff

Update for [~rakeshr]'s review.
{quote}
Move {{ baos.close();}} to finally block
{quote}
Done.
{quote}
Please format the lines; a few lines exceed 80 characters.
{quote}
Done.
{quote}
In tests, any specific reason to increase the value of TEST_MAXBUFFER to 1000?
{quote}
The size of the extra fields in a transaction is larger than 100,
so we increased TEST_MAXBUFFER to 1000.

{quote}
checking < 0 condition also.
{quote}
Done.
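
The point about TEST_MAXBUFFER can be made concrete with a rough sketch: a serialized record is larger than its payload because of the fields written around it. The field layout below is invented for illustration (TxnOverheadSketch and every field comment are hypothetical), so the byte counts do not match real ZooKeeper transactions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record layout; it only shows that payload size != serialized size.
public class TxnOverheadSketch {
    // Serializes made-up header fields, a length-prefixed path, and
    // length-prefixed data, then returns the total number of bytes written.
    static int serializedSize(String path, byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeLong(1L);  // hypothetical session id      (8 bytes)
        out.writeInt(1);    // hypothetical request counter (4 bytes)
        out.writeLong(1L);  // hypothetical transaction id  (8 bytes)
        out.writeLong(0L);  // hypothetical timestamp       (8 bytes)
        out.writeInt(1);    // hypothetical operation type  (4 bytes)
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        out.writeInt(p.length);    // path length prefix (4 bytes)
        out.write(p);
        out.writeInt(data.length); // data length prefix (4 bytes)
        out.write(data);
        out.flush();
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        int total = serializedSize("/a", data);
        // The difference is what the surrounding fields add to the payload.
        System.out.println("payload=" + data.length + " serialized=" + total);
    }
}
```

A test limit therefore has to leave room for these extra fields, which is why a limit close to the payload size is too tight.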


 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
 test.diff



[jira] [Commented] (ZOOKEEPER-2191) Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task.

2015-05-18 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549713#comment-14549713
 ] 

Liu Shaohui commented on ZOOKEEPER-2191:


LGTM
Could someone help to push this issue?

 Continue supporting prior Ant versions that don't implement the threads 
 attribute for the JUnit task.
 -

 Key: ZOOKEEPER-2191
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2191
 Project: ZooKeeper
  Issue Type: Improvement
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: ZOOKEEPER-2191.001.patch


 ZOOKEEPER-2183 introduced usage of the threads attribute on the junit task 
 call in build.xml to speed up test execution.  This attribute is only 
 available since Ant 1.9.4.  However, we can continue to support older Ant 
 versions by calling the antversion task and dispatching to a clone of our 
 junit task call that doesn't use the threads attribute.  Users of older Ant 
 versions will get the slower single-process test execution.





[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-17 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547460#comment-14547460
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~michim]
Could you help review this patch? I saw you helped update the fix versions. :)

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff




[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-14 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543561#comment-14543561
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~rakeshr]
Sorry for the late reply.
Could you help review the new patch? Thanks.

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff




[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-14 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Fix Version/s: 3.5.1

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff


 *Problem*
 For multi operations, PrepRequestProcessor may produce a transaction whose 
 size is larger than the max buffer size of jute. The readBuffer method of 
 BinaryInputArchive checks the buffer size, but the writeBuffer method of 
 BinaryOutputArchive does not, which causes the following:
 1, The leader can sync the transaction to the txn log and send the large 
 transaction to the followers, but the followers fail to read it and can't 
 sync with the leader.
 {code}
 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
 java.io.IOException: Unreasonable length = 2054758
     at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
 java.lang.Exception: shutdown Follower
     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
 {code}
 2, The leader loses all followers, which triggers a leader election. The old 
 leader becomes leader again because it has the most up-to-date data.
 {code}
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 {code}
 3, The leader cannot load the transaction from the txn log because the length 
 of the data is larger than the max buffer of jute.
 {code}
 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
 java.io.IOException: Unreasonable length = 2054758
     at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
     at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
     at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
     at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
     at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
 {code}
 The zookeeper service will be unavailable until we enlarge jute.maxbuffer 
 and restart the zookeeper hbase cluster.
 *Solution*
 Add a buffer size check in BinaryOutputArchive to prevent large transactions 
 from being written to the log and sent to followers.
 But I am not sure whether throwing an IOException in BinaryOutputArchive and 
 RequestProcessors has side effects.
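
 A minimal sketch of the write-side check described in the solution, written as a standalone class rather than as a patch to jute. The class name BoundedBufferWriter and its constructor are hypothetical, and the limit is passed in explicitly instead of being read from the jute.maxbuffer system property. The error text copies the "Unreasonable length" message from the logs above. Treating a null buffer as length -1 is an assumption of this sketch.

```java
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for a bounded output archive; not ZooKeeper's BinaryOutputArchive. */
final class BoundedBufferWriter {
    private final DataOutputStream out;
    private final int maxBuffer; // plays the role of jute.maxbuffer in this sketch

    BoundedBufferWriter(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    /** Writes a length-prefixed buffer, refusing anything a bounded reader would reject. */
    void writeBuffer(byte[] data) throws IOException {
        if (data == null) {
            out.writeInt(-1); // assumption: -1 marks a null buffer
            return;
        }
        if (data.length > maxBuffer) {
            // Fail before any byte is written, so the stream stays readable.
            throw new IOException("Unreasonable length = " + data.length);
        }
        out.writeInt(data.length);
        out.write(data);
    }
}
```

 With a limit of 0xfffff bytes (which I believe is the default for jute.maxbuffer, but treat that as an assumption), a 2054758-byte buffer like the one in the logs would be rejected at write time, before it reaches the txn log or the followers. Whether an IOException is the right signal at this layer is exactly the open question raised above.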





[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-14 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v4.diff

Updated for [~rakeshr]'s review.
- Added unit tests
- Fixed the log problems

{quote}
The attached log is comparing request.request.capacity() and data.length. But 
data.length contains both request and additional fields. So comparing these 
both won't give exact values.
{quote}
I just added more info to the log.

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff







[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-14 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543584#comment-14543584
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


[~iandi]
{quote}
However, perhaps an existing error code would be suited to this, such as 
BADARGUMENTS?
{quote}
Good advice. I changed the error code to BADARGUMENTS. Thanks.


 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff







[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-14 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544843#comment-14544843
 ] 

Liu Shaohui commented on ZOOKEEPER-2101:


The failed test is unrelated to this patch. 
I reran it on my machine many times and it passed every time.


 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff







[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-01-22 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v3.diff

Check the proposal size in PrepRequestProcessor and throw a 
ProposalTooLargeException when the proposal is larger than the max jute buffer 
size.
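
A rough sketch of what such a check might look like, assuming the proposal has already been serialized to a byte array. Both class names below are hypothetical stand-ins: ProposalTooLargeSketchException is not the exception class from the attached patch, and ProposalSizeGuard does not exist in ZooKeeper. The limit is passed in rather than read from jute.maxbuffer.

```java
/** Hypothetical stand-in for the exception named in the patch note. */
final class ProposalTooLargeSketchException extends Exception {
    ProposalTooLargeSketchException(int size, int limit) {
        super("Proposal of " + size + " bytes exceeds limit of " + limit + " bytes");
    }
}

/** Hypothetical guard that a request-preparation stage could call before proposing. */
final class ProposalSizeGuard {
    private final int maxBuffer; // plays the role of the max jute buffer size

    ProposalSizeGuard(int maxBuffer) {
        this.maxBuffer = maxBuffer;
    }

    /** Rejects a serialized proposal that a bounded reader could not load later. */
    void check(byte[] serializedProposal) throws ProposalTooLargeSketchException {
        if (serializedProposal.length > maxBuffer) {
            throw new ProposalTooLargeSketchException(serializedProposal.length, maxBuffer);
        }
    }
}
```

Checking at this stage means the oversized request fails for the one client that sent it, instead of being logged and proposed to the quorum.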


 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, test.diff







[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-01-19 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v2.diff

Limit the packet size to less than half of the jute max buffer size.

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, test.diff







[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-01-16 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: test.diff

[~rakeshr]
Added a log in ZKDatabase to show that the size of a Proposal may be larger 
than the request size.

{code}
2015-01-16 17:56:07,469 [myid:] - INFO  [SyncThread:0:ZKDatabase@261] - Request type 14 size: 5499 zxid: 2, Proposal size:5526
{code}
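
To illustrate why a serialized proposal can be larger than the request it came from, here is a generic sketch of framing overhead. The header layout (two longs and an int, followed by a length-prefixed payload) is invented for illustration. It is not ZooKeeper's actual proposal format, and it is not meant to account for the exact 27-byte difference in the log above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical framing example; the header fields are assumptions, not jute records. */
final class FramingOverheadSketch {
    /** Serializes an illustrative header followed by a length-prefixed payload. */
    static byte[] frame(long sessionId, long zxid, int type, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(sessionId);     // 8 bytes
        out.writeLong(zxid);          // 8 bytes
        out.writeInt(type);           // 4 bytes
        out.writeInt(payload.length); // 4-byte length prefix
        out.write(payload);
        out.flush();
        return bos.toByteArray();
    }
}
```

In this sketch every frame is 24 bytes larger than its payload, so a request that just fits under a size limit can produce a frame that exceeds it. That is why the size check has to look at the serialized proposal rather than the incoming request.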

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Attachments: ZOOKEEPER-2101-v1.diff, test.diff


 *Problem*
 For multi operation, PrepRequestProcessor may produce a large transaction 
 whose size may be larger than the max buffer size of jute. There is check of 
 buffer size in readBuffer method  of BinaryInputArchive, but no check in 
 writeBuffer method  of BinaryOutputArchive, which will cause that 
 1, Leader can sync transaction to txn log and send the large transaction to 
 the followers, but the followers failed to read the transaction and can't 
 sync with leader.
 {code}
 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: 
 [myid:2] Exception when following the leader
 java.io.IOException: Unreasonable length = 2054758
 at 
 org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: 
 [myid:2] shutdown called
 java.lang.Exception: shutdown Follower
 at 
 org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
 {code}
 2, The leader lose all followers, which trigger the leader election. The old 
 leader will become leader again for it has up-to-date data.
 {code}
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
 [myid:3] Shutting down
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
 [myid:3] Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 {code}
 3, The leader can not load the transaction from the txn log for the length of 
 data is larger than the max buffer of jute.
 {code}
 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: 
 [myid:3] Unable to load database on disk
 java.io.IOException: Unreasonable length = 2054758
 at 
 org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at 
 org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
 {code}
 The zookeeper service stays unavailable until we enlarge jute.maxbuffer 
 and restart the zookeeper hbase cluster.
 *Solution*
 Add a buffer size check in BinaryOutputArchive to prevent an oversized 
 transaction from being written to the log and sent to followers.
 But I am not sure whether throwing an IOException in BinaryOutputArchive and 
 the RequestProcessors has side effects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-01-04 Thread Liu Shaohui (JIRA)
Liu Shaohui created ZOOKEEPER-2101:
--

 Summary: Transaction larger than max buffer of jute makes 
zookeeper unavailable
 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui


*Problem*
For a multi operation, PrepRequestProcessor may produce a transaction whose 
size is larger than the max buffer size of jute (jute.maxbuffer). The 
readBuffer method of BinaryInputArchive checks the buffer size, but the 
writeBuffer method of BinaryOutputArchive does not, which causes the following:

1, The leader can sync the transaction to the txn log and send it to the 
followers, but the followers fail to read it and cannot sync with the 
leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}

2, The leader loses all of its followers, which triggers a leader election. The 
old leader becomes leader again because it has the most up-to-date data.
{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}
3, The leader cannot load the transaction from the txn log because the length 
of the data is larger than the max buffer of jute.
{code}

2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
    at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service stays unavailable until we enlarge jute.maxbuffer 
and restart the zookeeper hbase cluster.
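
To make the asymmetry concrete, here is a minimal Java sketch. It does not use 
the jute classes: AsymmetricBufferSketch, its two methods and the 1024-byte 
MAX_BUFFER are stand-ins invented for illustration. The writer emits a length 
prefix with no size check, and the reader rejects any length above the limit 
with the same kind of message that appears in the logs above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for the jute archives; not the ZooKeeper classes. */
final class AsymmetricBufferSketch {
    // Assumed limit for this demo only; the real limit comes from jute.maxbuffer.
    static final int MAX_BUFFER = 1024;

    /** Writer without a size check: a length prefix followed by the bytes. */
    static byte[] writeBuffer(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(data.length);
        out.write(data);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reader with a size check: refuses any length above MAX_BUFFER. */
    static byte[] readBuffer(byte[] encoded) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        int len = in.readInt();
        if (len < 0 || len > MAX_BUFFER) {
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] data = new byte[len];
        in.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // The write side accepts a buffer one byte over the limit...
        byte[] encoded = writeBuffer(new byte[MAX_BUFFER + 1]);
        try {
            // ...but the read side can never get it back.
            readBuffer(encoded);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Unreasonable length = 1025"
        }
    }
}
```

Once such a record is on disk or on the wire, every reader that applies the 
limit fails on it, which matches both the follower and the restart failures 
described in this report.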

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent an oversized 
transaction from being written to the log and sent to followers.

But I am not sure whether throwing an IOException in BinaryOutputArchive and 
the RequestProcessors has side effects.
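
One possible shape for such a check, as a sketch only and not the actual 
ZooKeeper change: CheckedWriteSketch and its writeBuffer(DataOutputStream, 
byte[]) method are hypothetical names. The limit is read from the 
jute.maxbuffer system property; the fallback value 0xfffff is an assumption of 
this sketch. The check runs before the length prefix is written, so a rejected 
buffer leaves nothing half-written in the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical write-side guard; not the real BinaryOutputArchive. */
final class CheckedWriteSketch {
    // Limit taken from the jute.maxbuffer system property; the fallback is assumed.
    static final int MAX_BUFFER = Integer.getInteger("jute.maxbuffer", 0xfffff);

    /** Writes a length-prefixed buffer, refusing anything above MAX_BUFFER. */
    static void writeBuffer(DataOutputStream out, byte[] data) throws IOException {
        if (data == null) {
            out.writeInt(-1); // marker for "no buffer" in this sketch
            return;
        }
        // Check first, so nothing is emitted for an oversized buffer.
        if (data.length > MAX_BUFFER) {
            throw new IOException("Unreasonable length = " + data.length);
        }
        out.writeInt(data.length);
        out.write(data);
    }

    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        writeBuffer(out, new byte[16]);
        System.out.println("bytes written: " + out.size());
        try {
            writeBuffer(out, new byte[MAX_BUFFER + 1]);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The open question above still applies: the caller has to turn this 
IOException into a failed request rather than let it escape from a processor 
thread.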






[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-01-04 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2101:
---
Attachment: ZOOKEEPER-2101-v1.diff

Patch for trunk.



[jira] [Commented] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running

2014-11-28 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228244#comment-14228244
 ] 

Liu Shaohui commented on ZOOKEEPER-2092:


[~fpj]
The thread waits at ZooKeeperServer.java:634 forever and never accepts new 
connections.
The reason is that the running variable in ZooKeeperServer is false. 
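
To illustrate why a false running flag shows up as a thread stuck in 
Object.wait(), here is a small Java sketch. It is not the ZooKeeperServer 
code: StartupGateSketch, markRunning and awaitRunning are hypothetical names, 
and the deadline is an addition of this sketch so that the example terminates. 
A loop of this general shape that has no deadline never returns if nothing 
ever sets the flag, and a thread parked in a timed wait is reported as 
TIMED_WAITING in a thread dump.

```java
/** Hypothetical gate guarded by a "running" flag; not the ZooKeeperServer code. */
final class StartupGateSketch {
    private boolean running = false;

    /** Flips the flag and wakes up every waiting thread. */
    synchronized void markRunning() {
        running = true;
        notifyAll();
    }

    /**
     * Waits until the flag is set or maxWaitMs has elapsed.
     * Returns true if the gate is running, false if the wait timed out.
     */
    synchronized boolean awaitRunning(long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!running) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
                return false; // give up instead of waiting forever
            }
            // Timed wait in slices of at most one second; left is always > 0 here.
            wait(Math.min(left, 1000));
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        StartupGateSketch gate = new StartupGateSketch();
        System.out.println("before markRunning: " + gate.awaitRunning(100));
        gate.markRunning();
        System.out.println("after markRunning: " + gate.awaitRunning(100));
    }
}
```

If the waiting thread is also the one that accepts and reads connections, as 
the NIOServerCxn.Factory thread in the attached stack appears to be, no new 
client can get through while it waits.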



[jira] [Created] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running

2014-11-27 Thread Liu Shaohui (JIRA)
Liu Shaohui created ZOOKEEPER-2092:
--

 Summary: A zk instance can not be connected for ZooKeeperServer is 
not running
 Key: ZOOKEEPER-2092
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2092
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.4
Reporter: Liu Shaohui


In our 5-node zk cluster, we found that one zk node can never be connected to. 
From the stack we found that the ZooKeeperServer hangs waiting for the server 
to be running, although the node is running normally and is synced with the 
leader.

{code}
$ ./zkCli.sh -server 10.101.10.67:11000 ls /
2014-11-27 20:57:11,843 [myid:] - WARN  [main-SendThread(lg-com-master02.bj:11000):ClientCnxn$SendThread@1089] - Session 0x0 for server lg-com-master02.bj/10.101.10.67:11000, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:353)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1469)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1497)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:726)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:594)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:355)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:283)
{code}

ZooKeeperServer stack
{code}
"NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11000" daemon prio=10 tid=0x7f60143f7800 nid=0x31fd in Object.wait() [0x7f5fd4678000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.zookeeper.server.ZooKeeperServer.submitRequest(ZooKeeperServer.java:634)
    - locked <0x0007602756a0> (a org.apache.zookeeper.server.quorum.FollowerZooKeeperServer)
    at org.apache.zookeeper.server.ZooKeeperServer.submitRequest(ZooKeeperServer.java:626)
    at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:525)
    at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:841)
    at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:410)
    at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:200)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
{code}

Any suggestions about this problem? Thanks.






[jira] [Updated] (ZOOKEEPER-2092) A zk instance can not be connected for ZooKeeperServer is not running

2014-11-27 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated ZOOKEEPER-2092:
---
Attachment: stack

The full stack
