RE: Re: [ANNOUNCE] New ZooKeeper committer: Abraham Fine

2018-01-31 Thread Mohammad arshad
Congratulations and Welcome Abe!

-Arshad

-Original Message-
From: Michael Han [mailto:h...@apache.org] 
Sent: Wednesday, January 31, 2018 12:09 PM
To: dev@zookeeper.apache.org
Subject: Re: Re: [ANNOUNCE] New ZooKeeper committer: Abraham Fine

Congratulations Abe!

On Tue, Jan 30, 2018 at 10:35 Brian Nixon  wrote:

> Congratulations, Abe!
>
> On Tue, Jan 30, 2018 at 3:23 AM, Michelle Tan 
> wrote:
>
> > Congratulations Abe! :D
> >
> > Regards,
> > Michelle
> >
> > On Tue, Jan 30, 2018 at 11:18 AM, 岭秀  wrote:
> >
> > > Congratulations to Abe!  A well-deserved honor
> > > 
> > > 
> > > -
> > > > On Tue, Jan 30, 2018 at 1:22 AM, Patrick Hunt 
> > wrote:
> > > >
> > > > > The Apache ZooKeeper PMC recently extended committer karma to Abe
> and
> > > he
> > > > > has accepted. Abe has made some great contributions and we are
> > looking
> > > > > forward to even more :)
> > > > >
> > > > > Congratulations and welcome aboard Abe!
> > > > >
> > > > > Patrick
> > > > >
> > > >
> > >
> >
>


Success: ZOOKEEPER- PreCommit Build #1441

2018-01-31 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1441/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 40.37 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1441//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1441//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1441//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 37 minutes 25 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347901#comment-16347901
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/455

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

This is the version of #453 for the 3.4 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #455


commit b035df19616424036afb1f31f345dedf26e3b2ae
Author: Robert Evans 
Date:   2018-02-01T02:09:53Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), or 
> that the txn file is ahead of the snapshot because no commit message has been 
> received yet. 
> If the snapshot is ahead of the txn file, this is not an issue: the 
> SyncRequestProcessor queue is drained during shutdown, so the snapshot and txn 
> file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue.
> Say we have 3 servers in the ensemble, servers A and B are followers, C is the 
> leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it's converted to txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> syncs with leader B with last zxid T0, which results in an empty diff sync.
> 5. C restarts before taking a snapshot and replays the txns on disk, which 
> include T1. Now C has node N, but A and B don't.
> I have also included a test case that reproduces this issue consistently. 
> We have a completely different RetainDB version that avoids this issue by 
> doing consensus between the snapshot and txn files before leader election; we 
> will submit it for review.
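The five-step scenario above can be sketched as a toy model. Everything below (class and method names included) is hypothetical illustration, not ZooKeeper code; it only models "in-memory database rebuilt by replaying the on-disk txn log":

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the five steps above -- illustrative only, not ZooKeeper code.
class RetainDbScenario {
    static class Server {
        final List<String> diskLog = new ArrayList<>(); // fsync'd txn log
        final Set<String> db = new HashSet<>();         // in-memory data tree

        void logTxn(String txn) { diskLog.add(txn); }   // synced, not yet proposed

        // On restart the in-memory database is rebuilt by replaying the disk log.
        void restartAndReplay() { db.clear(); db.addAll(diskLog); }
    }

    // Returns true when the ensemble diverges: C holds node N, A and B do not.
    static boolean reproduce() {
        Server a = new Server(), b = new Server(), c = new Server();
        // Steps 1-2: leader C syncs txn T1 ("create N") to disk, but A and B
        // restart before the proposal reaches them, so they never see T1.
        c.logTxn("T1:create-N");
        a.restartAndReplay();
        b.restartAndReplay();
        // Steps 3-4: A and B form a quorum at T0; C rejoins with last zxid T0
        // and receives an empty DIFF, so nothing truncates its txn log.
        // Step 5: C restarts before snapshotting and replays its disk log,
        // which still contains T1.
        c.restartAndReplay();
        return c.db.contains("T1:create-N")
            && !a.db.contains("T1:create-N")
            && !b.db.contains("T1:create-N");
    }
}
```

The empty DIFF in step 4 is the crux: it leaves C's on-disk log untouched, so the divergence only surfaces once C replays that log.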



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #455: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/455

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

This is the version of #453 for the 3.4 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #455


commit b035df19616424036afb1f31f345dedf26e3b2ae
Author: Robert Evans 
Date:   2018-02-01T02:09:53Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


ZooKeeper_branch34_jdk8 - Build # 1282 - Still Failing

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1282/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 10.32 KB...]
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git config 
remote.origin.url git://git.apache.org/zookeeper.git" returned status code 4:
stdout: 
stderr: error: failed to write new configuration file 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/.git/config.lock

at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1970)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1938)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1934)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1572)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1584)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.setRemoteUrl(CliGitAPIImpl.java:1218)
at hudson.plugins.git.GitAPI.setRemoteUrl(GitAPI.java:160)
at sun.reflect.GeneratedMethodAccessor237.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:922)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:896)
at 
hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
at hudson.remoting.UserRequest.perform(UserRequest.java:207)
at hudson.remoting.UserRequest.perform(UserRequest.java:53)
at hudson.remoting.Request$2.run(Request.java:358)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to 
H27
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:310)
at hudson.remoting.Channel.call(Channel.java:908)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:281)
at com.sun.proxy.$Proxy109.setRemoteUrl(Unknown Source)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl.setRemoteUrl(RemoteGitImpl.java:295)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:813)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1092)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1123)
at hudson.scm.SCM.checkout(SCM.java:495)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
ERROR: Error fetching remote repo 'origin'
Archiving artifacts
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build/test/logs/TEST-org.apache.jute.BinaryInputArchiveTest.xml
 is 26 days old

Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 

[jira] [Updated] (ZOOKEEPER-2972) When use SSL on zookeeper server, counts of watches may increase more than forty thousands and lead zoookeeper process outofmemory error

2018-01-31 Thread wuyiyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyiyun updated ZOOKEEPER-2972:
---
Labels: features  (was: )

> When use SSL on zookeeper server, counts of watches may increase more than 
> forty thousands and lead zoookeeper process outofmemory error
> 
>
> Key: ZOOKEEPER-2972
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2972
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.5.3
> Environment: I deploy a ZooKeeper cluster on three nodes and enable SSL 
> following this 
> guideline: [https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]
> I use a ZooKeeper client, also with SSL enabled, to connect to this 
> server and set the same data on two nodes, as in this demo:
> CuratorFramework client;
> // each time, a new zookeeper client is instantiated
> String path1, path2;
> // initialize path1, path2 ..
> String status = "ok";
> client.setData().forPath(path1, status.getBytes());
> client.setData().forPath(path2, status.getBytes());
> // close the zookeeper client...
> This function is called every five seconds and works fine while SSL is 
> disabled. With SSL enabled, after the server has run for about a day, an 
> OutOfMemory error occurs and the ZooKeeper process automatically produces a 
> java_pidXXX.hprof file. I used Eclipse Memory Analyzer to analyze the hprof 
> file and found that a DataTree instance used more than six hundred MB of 
> memory, with more than eighty-seven percent of it held by the dataTree field 
> named dataWatches. Using the four-letter commands, I found far too many 
> watches on all three nodes. I guess these watches cause the error, but I 
> don't know why there are so many!
> Additionally, with SSL disabled, the four-letter commands show only a few 
> watches on each node, and the watch count does not increase.
>  
> Each ZooKeeper node runs in a VM with eight cores and eight GB of memory; the 
> OSes are centos6.5/centos7.3/redhat6.5/redhat7, running ZooKeeper and this 
> demo on JDK 1.8.
> This issue happens under ZooKeeper 3.5.1, 3.5.2, and 3.5.3. 
>  
>  
>  
> ...
>Reporter: wuyiyun
>Priority: Major
>  Labels: features
>
> With SSL enabled on the ZooKeeper server, the watch count may increase to 
> more than forty thousand and lead the ZooKeeper process to an OutOfMemory 
> error after the server has run for about a day.
> check command:
> echo wchs | nc localhost 2181
> check result:
> [zookeeper@localhost bin]$ echo wchs | nc localhost 2181
> 44412 connections watching 1 paths
> Total watches:44412
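For monitoring watch growth over time, the `wchs` output above can be parsed programmatically; a minimal sketch (the helper class name is made up):

```java
// Hypothetical helper: extract the total watch count from `wchs` output,
// e.g. "44412 connections watching 1 paths\nTotal watches:44412".
class WchsParser {
    static int totalWatches(String wchsOutput) {
        for (String line : wchsOutput.split("\n")) {
            if (line.startsWith("Total watches:")) {
                // The count follows the fixed "Total watches:" prefix.
                return Integer.parseInt(
                        line.substring("Total watches:".length()).trim());
            }
        }
        throw new IllegalArgumentException("no 'Total watches:' line in wchs output");
    }
}
```

Sampling this value every few minutes would make the reported monotonic growth under SSL easy to confirm or rule out.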





[jira] [Updated] (ZOOKEEPER-2972) When use SSL on zookeeper server, counts of watches may increase more than forty thousands and lead zoookeeper process outofmemory error

2018-01-31 Thread wuyiyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyiyun updated ZOOKEEPER-2972:
---
Environment: 
I deploy a ZooKeeper cluster on three nodes and enable SSL following this 
guideline: [https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]

I use a ZooKeeper client, also with SSL enabled, to connect to this server and 
set the same data on two nodes, as in this demo:

CuratorFramework client;

// each time, a new zookeeper client is instantiated

String path1, path2;

// initialize path1, path2 ..

String status = "ok";

client.setData().forPath(path1, status.getBytes());

client.setData().forPath(path2, status.getBytes());

// close the zookeeper client...

This function is called every five seconds and works fine while SSL is 
disabled. With SSL enabled, after the server has run for about a day, an 
OutOfMemory error occurs and the ZooKeeper process automatically produces a 
java_pidXXX.hprof file. I used Eclipse Memory Analyzer to analyze the hprof 
file and found that a DataTree instance used more than six hundred MB of 
memory, with more than eighty-seven percent of it held by the dataTree field 
named dataWatches. Using the four-letter commands, I found far too many 
watches on all three nodes. I guess these watches cause the error, but I 
don't know why there are so many!

Additionally, with SSL disabled, the four-letter commands show only a few 
watches on each node, and the watch count does not increase.

Each ZooKeeper node runs in a VM with eight cores and eight GB of memory; the 
OSes are centos6.5/centos7.3/redhat6.5/redhat7, running ZooKeeper and this 
demo on JDK 1.8.

This issue happens under ZooKeeper 3.5.1, 3.5.2, and 3.5.3. 

 

 

 

...

  was:
I deploy a zookeeper cluster on three nodes. And enable ssl capability under 
below 
guiline:[https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]

And i use zookeeper client which also enable ssl capability to connect this 
zookeeper server and set same data to two node under below demo:

CuratorFramework client;

// each time we instance a new zookeeper client 

String path1, path2;

// instance path1, path2 ..

String status = "ok"

client.client.setData().forPath(path1,status.getBytes());

client.client.setData().forPath(path2,status.getBytes());

// close zookeeper client...

This function will be called each five seconds and it work good while ssl 
capability disabled. when ssl capability enabled, zookeeper server run about 
one day, and an outofmemory error will occurred and auto produce 
java_pidXXX.hprof file by zookeeper process . i use Eclipse Memory Analyzer to 
analize the hprof file and found instance of DataTree used more than six 
handreds MB memory and more than eighty seven percent memory used by dataTree's 
field which name is dataWatches. And i use four letter command to check and 
found too many watches on all of this three nodes. I guess those too many 
watches cause the error  but i don't know why there are so many watches!

Additional, if disabled the ssl capability. use four letter command and can 
only found there are several  watches on each node. and count of watches will 
not increased.

 

Each zookeeper node run under VM which has eight core and eight GB memory, and 
it's os are centos6.5/centos7.3/redhat6.5/redhat7 and run zookeeper and this 
demo with JDK1.8.

This issue will happened under zookeeper 3.5.1 and 3.5.2 and 3.5.3. 

 

 

 

...

Summary: When use SSL on zookeeper server, counts of watches may 
increase more than forty thousands and lead zoookeeper process outofmemory 
error  (was: When use SSL on zookeeper server, counts of watches may increase 
more than forty thousands and lead zoookeeper process outofmemroy error)

> When use SSL on zookeeper server, counts of watches may increase more than 
> forty thousands and lead zoookeeper process outofmemory error
> 
>
> Key: ZOOKEEPER-2972
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2972
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.5.3
> Environment: I deploy a zookeeper cluster on three nodes. And enable 
> ssl capability under below 
> guidline:[https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]
> And i use zookeeper client which also enable ssl capability to connect this 
> zookeeper server and set same data to two node under below demo:
> CuratorFramework client;
> // each time we instance a new zookeeper client 
> String path1, path2;
> // 

[jira] [Commented] (ZOOKEEPER-2973) "Unreasonable length" exception

2018-01-31 Thread wuyiyun (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347842#comment-16347842
 ] 

wuyiyun commented on ZOOKEEPER-2973:


Me too! I hit an OutOfMemory error that caused ZooKeeper to shut down, but you 
can add a crontab task to restart ZooKeeper as a workaround for this issue.

Of course, you should first ensure that the ZooKeeper client jar has the same 
version as the ZooKeeper server.

> "Unreasonable length" exception 
> 
>
> Key: ZOOKEEPER-2973
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2973
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: wanggang_123
>Priority: Blocker
>
> I am running a three-node ZooKeeper cluster. At 2018-01-28 17:56:30, the 
> leader node logged this error:
> 2018-01-28 17:56:30 
> [UTC:20180128T175630+0800]|ERROR||LearnerHandler-/118.123.180.23:44836hread|Coordination
>  > Unexpected exception causing shutdown while sock still open 
> (LearnerHandler.java:633)
> java.io.IOException: Unreasonable length = 1885430131
>  at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
>  at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>  at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|WARN 
> ||LearnerHandler-/118.123.180.23:44836hread|Coordination > *** GOODBYE 
> /118.123.180.23:44836  (LearnerHandler.java:646)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|INFO ||ProcessThread(sid:2 
> cport:-1):hread|Coordination > Got user-level KeeperException when processing 
> sessionid:0x16138593ad43cf9 type:delete cxid:0x5 zxid:0xc104b59e9 txntype:-1 
> reqpath:n/a Error 
> Path:/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  Error:KeeperErrorCode = NoNode for 
> /VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  (PrepRequestProcessor.java:645)
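Incidentally, 1885430131 is 0x70616173, which is the big-endian ASCII bytes "paas" -- a length that decodes to printable text usually suggests that non-ZooKeeper bytes (plain text or another protocol) arrived on the port, or that the peers speak incompatible versions. Below is a simplified sketch of the kind of length check that raises this error. It is an assumption modeled on jute's length-prefixed framing, not the actual org.apache.jute.BinaryInputArchive code; ZooKeeper's default jute.maxbuffer is 0xfffff (1048575) bytes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch of a jute-style length-prefixed buffer read (simplified; not the
// actual org.apache.jute.BinaryInputArchive implementation).
class LengthCheck {
    // ZooKeeper's default jute.maxbuffer: 0xfffff bytes (~1 MB).
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    static byte[] readBuffer(ByteBuffer in, int maxBuffer) throws IOException {
        int len = in.getInt(); // 4-byte big-endian length prefix
        if (len < 0 || len > maxBuffer) {
            // Garbage on the wire decodes as an absurd length and is rejected.
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] buf = new byte[len];
        in.get(buf);
        return buf;
    }
}
```

Raising jute.maxbuffer only helps when the frames are legitimately large; a prefix that decodes to ASCII text points at something other than a healthy ZooKeeper peer writing to the port.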





Success: ZOOKEEPER- PreCommit Build #1440

2018-01-31 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1440/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.97 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1440//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1440//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1440//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 59 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper pull request #454: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/454

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. (3.5)

This is the version of #453 for the 3.5 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #454


commit 70436249c830af0b129caf3d1bed2f55a2498b6b
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347678#comment-16347678
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/454

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. (3.5)

This is the version of #453 for the 3.5 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #454


commit 70436249c830af0b129caf3d1bed2f55a2498b6b
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), or 
> that the txn file is ahead of the snapshot because no commit message has been 
> received yet. 
> If the snapshot is ahead of the txn file, this is not an issue: the 
> SyncRequestProcessor queue is drained during shutdown, so the snapshot and txn 
> file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue.
> Say we have 3 servers in the ensemble, servers A and B are followers, C is the 
> leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it's converted to txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> syncs with leader B with last zxid T0, which results in an empty diff sync.
> 5. C restarts before taking a snapshot and replays the txns on disk, which 
> include T1. Now C has node N, but A and B don't.
> I have also included a test case that reproduces this issue consistently. 
> We have a completely different RetainDB version that avoids this issue by 
> doing consensus between the snapshot and txn files before leader election; we 
> will submit it for review.





Success: ZOOKEEPER- PreCommit Build #1439

2018-01-31 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1439/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.93 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1439//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1439//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1439//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 34 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347557#comment-16347557
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/453

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

I will be creating a patch/pull request for 3.4 and 3.5 too, but I wanted 
to get a pull request up for others to look at ASAP.

I have a version of this based off of #310 at 
https://github.com/revans2/zookeeper/tree/ZOOKEEPER-2845-orig-test-patch, but 
the test itself is flaky.  Frequently leader election does not go as planned in 
the test, and it ends up failing, but not because it ended up in an 
inconsistent state.

I am happy to answer any questions anyone has about the patch.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #453


commit 0219b2c9e44527067cd5fed4b642729171721886
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of txn file (due to slow disk on the server, etc), or the 
> txn file is ahead of snapshot due to no commit message being received yet. 
> If snapshot is ahead of txn file, since the SyncRequestProcessor queue will 
> be drained during shutdown, the snapshot and txn file will keep consistent 
> before leader election happening, so this is not an issue.
> But if the txn is ahead of the snapshot, it's possible that the ensemble 
> will have a data inconsistency issue; here is a simplified scenario to show 
> the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> and C is leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1 
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state due to not enough followers; it will sync with 
> leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> doing consensus between snapshot and txn files before leader election; we 
> will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/453

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

I will be creating a patch/pull request for 3.4 and 3.5 too, but I wanted 
to get a pull request up for others to look at ASAP.

I have a version of this based off of #310 at 
https://github.com/revans2/zookeeper/tree/ZOOKEEPER-2845-orig-test-patch, but 
the test itself is flaky.  Frequently leader election does not go as planned in 
the test, and it ends up failing, but not because it ended up in an 
inconsistent state.

I am happy to answer any questions anyone has about the patch.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #453


commit 0219b2c9e44527067cd5fed4b642729171721886
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


Re: [ANNOUNCE] New ZooKeeper committer: Abraham Fine

2018-01-31 Thread Rakesh Radhakrishnan
Congrats Abe! Well deserved.

Thanks,
Rakesh

On Tue, Jan 30, 2018 at 5:52 AM, Patrick Hunt  wrote:

> The Apache ZooKeeper PMC recently extended committer karma to Abe and he
> has accepted. Abe has made some great contributions and we are looking
> forward to even more :)
>
> Congratulations and welcome aboard Abe!
>
> Patrick
>


[jira] [Comment Edited] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347026#comment-16347026
 ] 

Robert Joseph Evans edited comment on ZOOKEEPER-2845 at 1/31/18 3:39 PM:
-

I have a fix that I will be posting shortly.  I need to clean up the patch and 
make sure that I get pull requests ready for all of the branches that 
ZOOKEEPER-2926 went into.

 

The following table describes the situation that allows a node to get into an 
inconsistent state.

 
|| ||N1||N2||N3||
|Start with cluster in sync N1 is leader|0x0 0x5|0x0 0x5|0x0 0x5|
|N2 and N3 go down|0x0 0x5| | |
|Proposal to N1 (fails with no quorum)|0x0 0x6| | |
|N2 and N3 return, but N1 is restarting.  N2 elected leader| |0x1 0x0|0x1 0x0|
|A proposal is accepted| |0x1 0x1|0x1 0x1|
|N1 returns and is trying to sync with the new leader N2|0x0 0x6|0x1 0x1|0x1 0x1|

 

At this point the code in {{LearnerHandler.syncFollower}} takes over to bring 
N1 into sync with N2, the new leader.

That code checks the following, in order:
 # Is there a {{forceSync}}? Not in this case.
 # Are the two zxids already in sync?  No, {{0x0 0x6 != 0x1 0x1}}.
 # Is the peer zxid > the local zxid (and the peer didn't just rotate to a new 
epoch)? No, {{0x0 0x6 < 0x1 0x1}}.
 # Is the peer zxid between the max committed log and the min committed log?  
In this case yes it is, but it shouldn't be.  The max committed log is {{0x1 
0x1}}.  The min committed log is {{0x0 0x5}}, or something likely below it, 
because it is based on distance in the edit log.  The issue is that once the 
epoch changes from {{0x0}} to {{0x1}}, the leader has no idea whether the 
peer's edits are in its own edit log without explicitly checking for them.
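A zxid packs the epoch into the high 32 bits and a per-epoch counter into the low 32 bits, which is why comparing raw zxids across epochs is misleading. The arithmetic, and the epoch guard the fix describes, can be sketched as follows (a hypothetical illustration with made-up names, not the actual ZooKeeper source):

```java
// Hypothetical sketch of zxid arithmetic and the cross-epoch DIFF guard
// described above; names are illustrative, not ZooKeeper's actual code.
final class ZxidSketch {
    // A zxid is a 64-bit value: epoch in the high 32 bits, counter in the low 32.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Only a peer in the leader's own epoch can safely receive a DIFF; across
    // epochs the leader cannot prove the peer's log is a prefix of its own,
    // so it must fall back to a full SNAP.
    static boolean canSendDiff(long leaderZxid, long peerZxid) {
        return epochOf(leaderZxid) == epochOf(peerZxid);
    }
}
```

In the table above, N1's uncommitted {{0x0 0x6}} compares below N2's {{0x1 0x1}} numerically, yet it is not in N2's log; an epoch comparison catches exactly that case.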

 

The reason that ZOOKEEPER-2926 exposed this is that previously, when a leader 
was elected, the in-memory DB was dropped and everything was reread from disk.  
When that happened, the {{0x0 0x6}} proposal was lost.  But it is not 
guaranteed to be lost in all cases.  In theory a snapshot triggered by that 
proposal could be taken, either on the leader, or on a follower that also 
received the proposal but does not join the new quorum in time.  As such, 
ZOOKEEPER-2926 really just extended the window of an already existing race.  
But it extended it almost indefinitely, so the race is much more likely to 
happen.

 

My fix is to update {{LearnerHandler.syncFollower}} to only send a {{DIFF}} if 
the epochs are the same.  If they are not the same, the peer may have txns in 
its log that the leader doesn't know about.

 


was (Author: revans2):
I have a fix that I will be posting shortly.  I need to clean up the patch and 
make sure that I get pull requests ready for all of the branches that 
ZOOKEEPER-2926 went into.

 

The following table describes the situation that allows a node to get into an 
inconsistent state.

 
|| ||N1||N2||N3||
|Start with cluster in sync N1 is leader|0x0 0x5|0x0 0x5|0x0 0x5|
|N2 and N3 go down|0x0 0x5| | |
|Proposal to N1 (fails with no quorum)|0x0 0x6| | |
|N2 and N3 return, but N1 is restarting.  N2 elected leader| |0x1 0x0|0x1 0x0|
|A proposal is accepted| |0x1 0x1|0x1 0x1|
|N1 returns and is trying to sync with the new leader N2|0x0 0x6|0x1 0x1|0x1 0x1|

 

At this point the code in {{LearnerHandler.syncFollower}} takes over to bring 
N1 into sync with N2 the new leader.

That code checks the following in order
 # Is there a {{forceSync}}? Not in this case
 # Are the two zxids in sync already?  No {{0x0 0x6 != 0x1 0x1}}
 # is the peer zxid > the local zxid (and peer didn't just rotate to a new 
epoch)? No {{0x0 0x6 < 0x1 0x1}}
 # is the peer zxid in between the max committed log and the min committed log? 
 In this case yes it is, but it shouldn't be.  The max committed log is {{0x1 
0x1}}.  The min committed log is {{0x0 0x5}} or something likely below it 
because it is based off of distance in the edit log.  The issue is that once 
the epoch changes, {{0x0}} to {{0x1}}, the leader has no idea if the edits are 
in its edit log without explicitly checking for them.

 

The reason that ZOOKEEPER-2926 exposed this is because previously when a leader 
was elected the in memory DB was dropped and everything was reread from disk.  
When this happens the {{0x0 0x6}} proposal was lost.  But it is not guaranteed 
to be lost in all cases.  In theory a snapshot could be taken triggered by that 
proposal, either on the leader, or on a follower that also allied the proposal, 
but does not join the new quorum in time.   As such ZOOKEEPER-2926 really just 
extended the window of an already existing race.  But it extended it almost 
indefinitely so it is much more likely to happen.

 

My fix is to update {{LearnerHandler.syncFollower}} to only send a {{DIFF}} if 
the epochs are the same.  If they are not the same we don't know if something 
we inserted that we don't know about.

 

> Data inconsistency issue due to retain database in 

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347026#comment-16347026
 ] 

Robert Joseph Evans commented on ZOOKEEPER-2845:


I have a fix that I will be posting shortly.  I need to clean up the patch and 
make sure that I get pull requests ready for all of the branches that 
ZOOKEEPER-2926 went into.

 

The following table describes the situation that allows a node to get into an 
inconsistent state.

 
|| ||N1||N2||N3||
|Start with cluster in sync N1 is leader|0x0 0x5|0x0 0x5|0x0 0x5|
|N2 and N3 go down|0x0 0x5| | |
|Proposal to N1 (fails with no quorum)|0x0 0x6| | |
|N2 and N3 return, but N1 is restarting.  N2 elected leader| |0x1 0x0|0x1 0x0|
|A proposal is accepted| |0x1 0x1|0x1 0x1|
|N1 returns and is trying to sync with the new leader N2|0x0 0x6|0x1 0x1|0x1 0x1|

 

At this point the code in {{LearnerHandler.syncFollower}} takes over to bring 
N1 into sync with N2, the new leader.

That code checks the following, in order:
 # Is there a {{forceSync}}? Not in this case.
 # Are the two zxids already in sync?  No, {{0x0 0x6 != 0x1 0x1}}.
 # Is the peer zxid > the local zxid (and the peer didn't just rotate to a new 
epoch)? No, {{0x0 0x6 < 0x1 0x1}}.
 # Is the peer zxid between the max committed log and the min committed log?  
In this case yes it is, but it shouldn't be.  The max committed log is {{0x1 
0x1}}.  The min committed log is {{0x0 0x5}}, or something likely below it, 
because it is based on distance in the edit log.  The issue is that once the 
epoch changes from {{0x0}} to {{0x1}}, the leader has no idea whether the 
peer's edits are in its own edit log without explicitly checking for them.

 

The reason that ZOOKEEPER-2926 exposed this is that previously, when a leader 
was elected, the in-memory DB was dropped and everything was reread from disk.  
When that happened, the {{0x0 0x6}} proposal was lost.  But it is not 
guaranteed to be lost in all cases.  In theory a snapshot triggered by that 
proposal could be taken, either on the leader, or on a follower that also 
received the proposal but does not join the new quorum in time.  As such, 
ZOOKEEPER-2926 really just extended the window of an already existing race.  
But it extended it almost indefinitely, so the race is much more likely to 
happen.

 

My fix is to update {{LearnerHandler.syncFollower}} to only send a {{DIFF}} if 
the epochs are the same.  If they are not the same, the peer may have txns in 
its log that the leader doesn't know about.

 

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of txn file (due to slow disk on the server, etc), or the 
> txn file is ahead of snapshot due to no commit message being received yet. 
> If snapshot is ahead of txn file, since the SyncRequestProcessor queue will 
> be drained during shutdown, the snapshot and txn file will keep consistent 
> before leader election happening, so this is not an issue.
> But if the txn is ahead of the snapshot, it's possible that the ensemble 
> will have a data inconsistency issue; here is a simplified scenario to show 
> the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> and C is leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1 
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state due to not enough followers; it will sync with 
> leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> doing consensus between snapshot and txn files before leader election; we 
> will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-01-31 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reassigned ZOOKEEPER-2845:
--

Assignee: Robert Joseph Evans

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of txn file (due to slow disk on the server, etc), or the 
> txn file is ahead of snapshot due to no commit message being received yet. 
> If snapshot is ahead of txn file, since the SyncRequestProcessor queue will 
> be drained during shutdown, the snapshot and txn file will keep consistent 
> before leader election happening, so this is not an issue.
> But if the txn is ahead of the snapshot, it's possible that the ensemble 
> will have a data inconsistency issue; here is a simplified scenario to show 
> the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> and C is leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1 
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state due to not enough followers; it will sync with 
> leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> doing consensus between snapshot and txn files before leader election; we 
> will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper_branch35_jdk8 - Build # 829 - Still Failing

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/829/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 59.29 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.075 sec, Thread: 3, Class: org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.709 sec, Thread: 2, Class: 
org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 3
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.792 sec, Thread: 3, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.81 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.431 sec, Thread: 2, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.085 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
45.444 sec, Thread: 7, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
41.855 sec, Thread: 6, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.386 sec, Thread: 6, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.671 sec, Thread: 7, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 6
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.063 sec, Thread: 7, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.772 sec, Thread: 7, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.997 sec, Thread: 6, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 6
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
10.143 sec, Thread: 6, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
23.369 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.114 sec, Thread: 6, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 2
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 6
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.226 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 2
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.162 sec, Thread: 2, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
35.424 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.568 sec, Thread: 7, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.847 sec, Thread: 3, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
17.368 sec, Thread: 2, Class: 

ZooKeeper-trunk-jdk8 - Build # 1359 - Still Failing

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1359/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 59.27 KB...]
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.436 sec, Thread: 6, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.149 sec, Thread: 6, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
124.991 sec, Thread: 4, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
4
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.079 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 4
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
36.898 sec, Thread: 6, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
91.686 sec, Thread: 5, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.5 
sec, Thread: 6, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.753 sec, Thread: 5, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.073 sec, Thread: 5, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 5
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.085 sec, Thread: 6, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.551 sec, Thread: 5, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
26.108 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.102 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
16.523 sec, Thread: 6, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 4
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 6
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.501 sec, Thread: 4, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.098 sec, Thread: 4, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
24.703 sec, Thread: 5, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.352 sec, Thread: 5, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.379 sec, Thread: 4, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.019 sec, Thread: 6, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
297.931 sec, Thread: 7, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 
sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.Zab1_0Test

[jira] [Commented] (ZOOKEEPER-2973) "Unreasonable length" exception

2018-01-31 Thread wanggang_123 (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346549#comment-16346549
 ] 

wanggang_123 commented on ZOOKEEPER-2973:
-

Hi, on my side ZooKeeper also shuts down only intermittently. Is that what you are seeing too?

What are the trigger conditions for this problem, and how can it be reproduced?

> "Unreasonable length" exception 
> 
>
> Key: ZOOKEEPER-2973
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2973
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: wanggang_123
>Priority: Blocker
>
> I am running a three-node ZooKeeper cluster. At 2018-01-28 17:56:30, the 
> leader node logged this error:
> 2018-01-28 17:56:30 
> [UTC:20180128T175630+0800]|ERROR||LearnerHandler-/118.123.180.23:44836hread|Coordination
>  > Unexpected exception causing shutdown while sock still open 
> (LearnerHandler.java:633)
> java.io.IOException: Unreasonable length = 1885430131
>  at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
>  at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>  at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|WARN 
> ||LearnerHandler-/118.123.180.23:44836hread|Coordination > *** GOODBYE 
> /118.123.180.23:44836  (LearnerHandler.java:646)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|INFO ||ProcessThread(sid:2 
> cport:-1):hread|Coordination > Got user-level KeeperException when processing 
> sessionid:0x16138593ad43cf9 type:delete cxid:0x5 zxid:0xc104b59e9 txntype:-1 
> reqpath:n/a Error 
> Path:/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  Error:KeeperErrorCode = NoNode for 
> /VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  (PrepRequestProcessor.java:645)
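The "Unreasonable length" IOException comes from a sanity check on the declared length of an incoming serialized record: before allocating a buffer, the reader compares the length against a configured ceiling (in ZooKeeper this is governed by the jute.maxbuffer system property, roughly 1 MB by default on the 3.4 line). A minimal sketch of that kind of guard follows; it is hypothetical illustration, not the actual org.apache.jute source:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch of the length guard that produces
// "Unreasonable length = N"; names and the constant are illustrative.
final class BufferGuardSketch {
    // Illustrative ceiling; ZooKeeper takes its real limit from -Djute.maxbuffer.
    static final int MAX_BUFFER = 0xfffff;

    static byte[] readBuffer(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_BUFFER) {
            // A wildly large value such as 1885430131 usually means the peer
            // is not speaking the expected protocol, or the stream is
            // corrupt, rather than a genuinely huge record.
            throw new IOException("Unreasonable length = " + len);
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }
}
```

Raising jute.maxbuffer can hide legitimate large-znode problems, but a length in the billions, as in this report, points at stream corruption or a foreign client rather than an undersized limit.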



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper_branch35_openjdk7 - Build # 830 - Failure

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/830/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 59.60 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.475 sec, Thread: 1, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.182 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
40.292 sec, Thread: 7, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.708 sec, Thread: 7, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.119 sec, Thread: 7, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 7
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.763 sec, Thread: 7, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.1 
sec, Thread: 7, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.792 sec, Thread: 7, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
28.197 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 1
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.363 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.764 sec, Thread: 7, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.268 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 7
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 4
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.97 sec, Thread: 7, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.346 sec, Thread: 7, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
15.41 sec, Thread: 7, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
35.503 sec, Thread: 1, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.787 sec, Thread: 7, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
35.958 sec, Thread: 4, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
477.517 sec, Thread: 6, Class: org.apache.zookeeper.test.DisconnectedWatcherTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
457.207 sec, Thread: 3, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
478.794 sec, Thread: 8, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
290.653 sec, Thread: 5, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Running org.apache.zookeeper.server.quorum.StandaloneDisabledTest 
in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 
sec, Thread: 2, Class: 

[jira] [Commented] (ZOOKEEPER-2973) "Unreasonable length" exception

2018-01-31 Thread wuyiyun (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346522#comment-16346522
 ] 

wuyiyun commented on ZOOKEEPER-2973:


I hit this problem too when the OpenSSL capability was enabled on both the ZooKeeper server and client; after disabling SSL, the problem no longer occurs.

You can modify the source of BinaryInputArchive as below to log what is actually being deserialized:

public String readString(String tag) throws IOException {
    int len = in.readInt();
    if (len == -1) return null;

    checkLength(len);
    byte[] b = new byte[len];
    in.readFully(b);
    String toBeReturned = new String(b, "UTF-8");
    LOG.info("readString: length=" + len + ", value=" + toBeReturned);

    return toBeReturned;
}

public static final int maxBuffer = Integer.getInteger("jute.maxbuffer", 0xfffff);

public byte[] readBuffer(String tag) throws IOException {
    int len = readInt(tag);
    if (len == -1) return null;
    checkLength(len);
    byte[] arr = new byte[len];
    in.readFully(arr);
    LOG.info("readBuffer: length=" + len + ", value=" + new String(arr, "UTF-8"));
    return arr;
}
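A complementary trick (my own debugging sketch, not part of ZooKeeper): when the exception reports "Unreasonable length = N", decode N back into its four bytes. If they come out as printable ASCII, a non-jute stream (for example a TLS handshake or plain HTTP hitting the client port) is almost certainly being misparsed as a length prefix. The class name `LengthDecoder` below is hypothetical.

```java
// Hypothetical helper: interpret a bogus jute length as its raw bytes.
// Printable output suggests foreign protocol bytes leaking into the stream.
public class LengthDecoder {
    static String decodeLength(int len) {
        byte[] b = {
            (byte) (len >>> 24), (byte) (len >>> 16),
            (byte) (len >>> 8),  (byte) len
        };
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            // Map non-printable bytes to '.' so the result is readable.
            sb.append(x >= 32 && x < 127 ? (char) x : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The length from this report, 1885430131, is 0x70616173.
        System.out.println(decodeLength(1885430131)); // prints "paas"
    }
}
```

The value from the stack trace decodes to ASCII "paas", which supports the theory that non-protocol (likely SSL) bytes were read where a jute length was expected.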

> "Unreasonable length" exception 
> 
>
> Key: ZOOKEEPER-2973
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2973
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: wanggang_123
>Priority: Blocker
>
> I am running a three-node ZooKeeper cluster. At 2018-01-28 17:56:30, the 
> leader node logged this error:
> 2018-01-28 17:56:30 
> [UTC:20180128T175630+0800]|ERROR||LearnerHandler-/118.123.180.23:44836hread|Coordination
>  > Unexpected exception causing shutdown while sock still open 
> (LearnerHandler.java:633)
> java.io.IOException: Unreasonable length = 1885430131
>  at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
>  at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>  at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|WARN 
> ||LearnerHandler-/118.123.180.23:44836hread|Coordination > *** GOODBYE 
> /118.123.180.23:44836  (LearnerHandler.java:646)
> 2018-01-28 17:56:30 [UTC:20180128T175630+0800]|INFO ||ProcessThread(sid:2 
> cport:-1):hread|Coordination > Got user-level KeeperException when processing 
> sessionid:0x16138593ad43cf9 type:delete cxid:0x5 zxid:0xc104b59e9 txntype:-1 
> reqpath:n/a Error 
> Path:/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  Error:KeeperErrorCode = NoNode for 
> /VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
>  (PrepRequestProcessor.java:645)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-2973) "Unreasonable length" exception

2018-01-31 Thread wanggang_123 (JIRA)
wanggang_123 created ZOOKEEPER-2973:
---

 Summary: "Unreasonable length" exception 
 Key: ZOOKEEPER-2973
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2973
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6
Reporter: wanggang_123


I am running a three-node ZooKeeper cluster. At 2018-01-28 17:56:30, the leader 
node logged this error:

2018-01-28 17:56:30 
[UTC:20180128T175630+0800]|ERROR||LearnerHandler-/118.123.180.23:44836hread|Coordination
 > Unexpected exception causing shutdown while sock still open 
(LearnerHandler.java:633)
java.io.IOException: Unreasonable length = 1885430131
 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
 at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
 at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2018-01-28 17:56:30 [UTC:20180128T175630+0800]|WARN 
||LearnerHandler-/118.123.180.23:44836hread|Coordination > *** GOODBYE 
/118.123.180.23:44836  (LearnerHandler.java:646)
2018-01-28 17:56:30 [UTC:20180128T175630+0800]|INFO ||ProcessThread(sid:2 
cport:-1):hread|Coordination > Got user-level KeeperException when processing 
sessionid:0x16138593ad43cf9 type:delete cxid:0x5 zxid:0xc104b59e9 txntype:-1 
reqpath:n/a Error 
Path:/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
 Error:KeeperErrorCode = NoNode for 
/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-093037
 (PrepRequestProcessor.java:645)
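For context, the exception comes from jute's length sanity check: any length outside (0, jute.maxbuffer] is rejected before allocating a buffer. The sketch below is a simplified stand-in for that check (the class name `LengthCheck` and the exact condition are assumptions; the real code lives in org.apache.jute.BinaryInputArchive), showing why 1885430131 is rejected while ordinary packet sizes pass.

```java
// Minimal sketch of jute's length validation, assuming the default
// jute.maxbuffer of 0xfffff (1 MB) when the system property is unset.
public class LengthCheck {
    static final int MAX_BUFFER = Integer.getInteger("jute.maxbuffer", 0xfffff);

    static void checkLength(int len) throws java.io.IOException {
        // Negative or oversized lengths indicate a corrupt or foreign stream.
        if (len < 0 || len > MAX_BUFFER) {
            throw new java.io.IOException("Unreasonable length = " + len);
        }
    }

    public static void main(String[] args) throws Exception {
        checkLength(1024); // a normal packet length: accepted
        try {
            checkLength(1885430131); // the value from this report: rejected
        } catch (java.io.IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Raising jute.maxbuffer only helps when legitimate payloads exceed the limit; a length like 1885430131 (~1.8 GB) points at corrupt or misrouted bytes, not a too-small limit.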





ZooKeeper-trunk-openjdk7 - Build # 1786 - Failure

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1786/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 61.23 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest 
in thread 5
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.133 sec, Thread: 3, Class: org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.972 sec, Thread: 5, Class: 
org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.17 sec, Thread: 7, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.711 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 7
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.107 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.539 sec, Thread: 3, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.527 sec, Thread: 3, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.9 
sec, Thread: 3, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 3
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.193 sec, Thread: 3, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.115 sec, Thread: 3, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.095 sec, Thread: 3, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
72.497 sec, Thread: 8, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 8
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.185 sec, Thread: 3, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.163 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.087 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 5
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.087 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.119 sec, Thread: 3, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 3
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
84.872 sec, Thread: 2, Class: org.apache.zookeeper.test.QuorumTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.094 sec, Thread: 2, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.087 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
14.317 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, 

ZooKeeper_branch34_jdk8 - Build # 1281 - Failure

2018-01-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1281/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 38.23 KB...]
[junit] Running org.apache.zookeeper.test.RecoveryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
24.962 sec
[junit] Running org.apache.zookeeper.test.RepeatStartupTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.5 
sec
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
18.614 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.992 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.897 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.189 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.839 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.917 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.242 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.106 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.744 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
33.473 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.967 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.512 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.372 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.797 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.622 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.39 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.124 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.803 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
29.464 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.415 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.873 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1382: 
The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1385: 
Tests failed!

Total time: 40 minutes 40 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.SaslAuthTest.testZKOperationsAfterClientSaslAuthFailure

Error Message:
Did not connect

Stack Trace:
java.util.concurrent.TimeoutException: Did not connect
at 
org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:151)
at 
org.apache.zookeeper.SaslAuthTest.testZKOperationsAfterClientSaslAuthFailure(SaslAuthTest.java:174)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)