[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/715

Rollup of blocker/critical fixes for 3.5 (to trigger CI)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper rollup-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #715


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 5038179e217a0b80805fbb6780a6fec024f9e29d
Author: Michael Edwards 
Date:   2018-11-22T19:21:19Z

[ZOOKEEPER-2778] QuorumPeer: warn when clobbering existing election 
algorithm

commit 78df674c4413561336e3435eaa692a9ec2ede0ca
Author: Michael Edwards 
Date:   2018-11-22T19:29:27Z

[ZOOKEEPER-2778] QuorumPeer: halt old QCM when clobbering existing election 
algorithm

commit d6898072947cc908ca802fa542e636d615ac29f0
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

ZOOKEEPER-1818: Correctly handle potentially inconsistent 
zxid/electionEpoch and peerEpoch during leader election

commit ab30d5c8fb64b9fc93bd5e691855f6c17abd3d17
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)

commit e05ae82437e8a7124c5501772ce188d29ac9bfc2
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

commit bf685a609c991ac4642da42ab41c7d43d6e35246
Author: Andor Molnar 
Date:   2018-11-19T16:25:52Z

ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use 
CountDownLatch to sync with watcher.

commit b964efbaa363fc873205acba4aefc76940a358f1
Author: Michael Edwards 
Date:   2018-11-21T21:33:01Z

Bump library versions, fix 'ant package-native tar' targets

commit bba8aebae6c153d21913f7391f2201c148f24f1e
Author: Michael Edwards 
Date:   2018-11-22T08:38:28Z

Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr

commit 984189537df99a6c55b058362047d272a5c59145
Author: Michael Edwards 
Date:   2018-11-22T12:07:38Z

ZOOKEEPER-3198: plumb BindException up to Leader and its peers

* Normalize bind failures to java.net.BindException in both implementations 
of ServerCnxnFactory
* Throw BindException out as far as the caller of QuorumPeer.processReconfig
* Catch BindException in tryToCommit(); the transaction needs to be 
committed




---


[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
GitHub user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/715


---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potentially inconsistent zxid/electionEpoch and peerEpoch 
during leader election




---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/718


---


Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Michael K. Edwards
For what it's worth, builds 2732 and 2733 ran concurrently on H19, and
both failed for what I think are resource-conflict reasons.  It would
probably help to modify the PreCommit-ZOOKEEPER-github-pr-build queue
so that it doesn't attempt concurrent builds on the same
(uncontainerized) host.
On Thu, Nov 22, 2018 at 1:44 PM Michael K. Edwards
 wrote:
>
> Thanks for the guidance.  Feel free to assign ZOOKEEPER-2778 to me (I
> don't seem to be able to do it myself).  I've updated that pull
> request against 3.5 to address all reviewer comments.  When it looks
> ready to land, I'll port it to master as well.
>
> I have updated ZOOKEEPER-1636 and ZOOKEEPER-1818 with clean pull
> requests based on Thawan's and Fangmin's patches.  I'll poke at them
> until they build green, and try to handle anything reviewers bring up.
>
> With regard to flaky tests:  a fair fraction of spurious test failures
> appear to result from failure to bind a dynamically-assigned
> client/election/quorum port.  The prevailing hypothesis is that
> something else, running concurrently on the machine, is binding the
> port in between the check in PortAssignment (which binds it, to verify
> that it's not otherwise in use, and then closes that socket to free it
> again) and the subsequent use as a service port.  If that's the case,
> then we could eliminate this class of test failures by running the
> tests inside a container (with a dedicated network namespace).  Any
> failures of this kind that persist in a containerized test setup are
> the test fighting itself, not fighting unrelated concurrent processes.
> On Thu, Nov 22, 2018 at 8:23 AM Andor Molnar  wrote:
> >
> > Hi Michael!
> >
> > Thanks for the great help to get 3.5 out of the door. We're getting closer 
> > with each commit.
> >
> > You asked a lot of questions in your email, which I'm trying to answer, but 
> > I believe the best approach is to deal with one problem at a time. 
> > Especially in email, it is not ideal to mix different topics, because it 
> > makes things hard to follow.
> >
> > I focus on 3.5 release in this thread according to the subject. There's 
> > another thread btw I usually update every so often, but your list is pretty 
> > much accurate too. I use the following query for 3.5 blockers:
> >
> > project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5 AND 
> > priority in (blocker, critical) ORDER BY priority DESC, key ASC
> >
> > ZOOKEEPER-1818 - Fangmin is working on it and a patch is available on GitHub.
> > ZOOKEEPER-2778 - You're working on it, and a patch is available. You should 
> > assign the Jira to yourself to avoid somebody else picking it up.
> > ZOOKEEPER-1636 - An ancient C issue which has a patch available in Jira. I'm 
> > planning to rebase it on master, but haven't had a chance yet.
> >
> > All of the others are Maven/Doc related which Tamas and Norbert are working 
> > on.
> >
> > Flaky tests are related, but we don't treat them as a blocker issue. Here's 
> > the umbrella Jira that I've created to track the progress:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-3170
> >
> > Feel free to pick up any of the open ones or create new ones if you think 
> > it's necessary. It's generally better to open individual Jiras for every 
> > issue you're working on and discuss the details there. You can open an 
> > email thread too, if you find that convenient, but Jira is preferred.
> >
> > Preferred workflow is Open Jira -> GitHub PR -> Commit to master -> 
> > Backport to 3.5/3.4 if necessary -> Close Jira.
> >
> > Thank you for your contribution again!
> >
> > Andor
> >
> >
> >
> > On Thu, Nov 22, 2018 at 12:51 PM Michael K. Edwards  
> > wrote:
> >>
> >> I think it's mostly a problem in CI, where other processes on the same
> >> machine may compete for the port range, producing spurious Jenkins
> >> failures.  The only failures I'm seeing locally are unrelated SSL
> >> issues.
> >> On Thu, Nov 22, 2018 at 3:45 AM Enrico Olivelli  
> >> wrote:
> >> >
> >> > On Thu, Nov 22, 2018 at 12:44 PM Michael K. Edwards
> >> >  wrote:
> >> > >
> >> > > I'm glad to be able to help.
> >> > >
> >> > > It appears as though some of the "flaky tests" result from another
> >> > > process stealing a server port between the time that it is assigned
> >> > > (in org.apache.zookeeper.PortAssignment.unique()) and the time that it
> >> > > is bound.
> >> >
> >> > You can try running tests using a single thread, this will "mitigate"
> >> > the problem a bit
> >> >
> >> > Enrico
> >> >
> >> > > This happened, for example, in
> >> > > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
> >> > > looking in the console text, I found:
> >> > >
> >> > >  [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
> >> > > [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
> >> > > port: localhost/127.0.0.1:19459
> >> > >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
> >> > > [QuorumPeer[myid=1

[jira] [Created] (ZOOKEEPER-3198) Handle port-binding failures in a systematic and documented fashion

2018-11-22 Thread Michael K. Edwards (JIRA)
Michael K. Edwards created ZOOKEEPER-3198:
-

 Summary: Handle port-binding failures in a systematic and 
documented fashion
 Key: ZOOKEEPER-3198
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3198
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.13, 3.5.3, 3.6.0
Reporter: Michael K. Edwards
 Fix For: 3.6.0, 3.5.5, 3.4.14


Many test failures appear to result from bind failures due to port conflicts.  
This can arise in normal use as well.  Presently the code swallows the 
exception (with an error log) at a low level.  It would probably be useful to 
throw the exception far enough up the stack to trigger retry with a new port 
(in tests) or a high-level (perhaps even fatal) error message (in normal use).
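
A minimal sketch of the "retry with a new port" idea, using only java.net. 
The class and method names (BindRetrySketch, bindFirstAvailable) are 
hypothetical and are not part of ZooKeeper; the sketch only assumes that a 
failed bind surfaces as java.net.BindException, so the caller can react to it 
instead of it being swallowed at a low level.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindRetrySketch {
    /**
     * Hypothetical helper: tries each candidate port in order and returns the
     * first socket that binds. A BindException on one port is not swallowed;
     * it either triggers a retry on the next port or is rethrown to the caller.
     */
    static ServerSocket bindFirstAvailable(int[] candidatePorts) throws IOException {
        BindException lastFailure = null;
        for (int port : candidatePorts) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                return socket; // caller owns the bound socket
            } catch (BindException e) {
                socket.close();
                lastFailure = e; // remember the failure and try the next port
            }
        }
        if (lastFailure == null) {
            throw new BindException("no candidate ports supplied");
        }
        throw lastFailure; // every candidate failed: report it, don't hide it
    }
}
```

In a test, the candidate list could come from whatever port allocator is in 
use; in normal operation the list would hold just the configured port, so the 
exception reaches the caller on the first failure.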



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Michael K. Edwards
Thanks for the guidance.  Feel free to assign ZOOKEEPER-2778 to me (I
don't seem to be able to do it myself).  I've updated that pull
request against 3.5 to address all reviewer comments.  When it looks
ready to land, I'll port it to master as well.

I have updated ZOOKEEPER-1636 and ZOOKEEPER-1818 with clean pull
requests based on Thawan's and Fangmin's patches.  I'll poke at them
until they build green, and try to handle anything reviewers bring up.

With regard to flaky tests:  a fair fraction of spurious test failures
appear to result from failure to bind a dynamically-assigned
client/election/quorum port.  The prevailing hypothesis is that
something else, running concurrently on the machine, is binding the
port in between the check in PortAssignment (which binds it, to verify
that it's not otherwise in use, and then closes that socket to free it
again) and the subsequent use as a service port.  If that's the case,
then we could eliminate this class of test failures by running the
tests inside a container (with a dedicated network namespace).  Any
failures of this kind that persist in a containerized test setup are
the test fighting itself, not fighting unrelated concurrent processes.
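
To make the race window concrete, here is a rough sketch of the two patterns. 
It is not the actual PortAssignment code; PortProbeSketch, probeFreePort and 
reserve are hypothetical names, and the only assumption is the 
bind-check-then-close behaviour described above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortProbeSketch {
    /**
     * Check-then-release: bind to confirm the port is free, then close the
     * socket and hand back only the number. Between the close and the later
     * bind by the real service, any other process may take the port.
     */
    static int probeFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return probe.getLocalPort();
        } // probe is closed here, which opens the race window
    }

    /**
     * Alternative for comparison: keep the socket open and pass it on, so the
     * port is never released between the check and the use.
     */
    static ServerSocket reserve() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }
}
```

The second form only helps when the consumer can accept an already-bound 
socket; where it needs a bare port number, isolating the tests in their own 
network namespace is the way to close the window against other processes.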
On Thu, Nov 22, 2018 at 8:23 AM Andor Molnar  wrote:
>
> Hi Michael!
>
> Thanks for the great help to get 3.5 out of the door. We're getting closer 
> with each commit.
>
> You asked a lot of questions in your email, which I'm trying to answer, but I 
> believe the best approach is to deal with one problem at a time. Especially 
> in email, it is not ideal to mix different topics, because it makes 
> things hard to follow.
>
> I focus on 3.5 release in this thread according to the subject. There's 
> another thread btw I usually update every so often, but your list is pretty 
> much accurate too. I use the following query for 3.5 blockers:
>
> project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5 AND 
> priority in (blocker, critical) ORDER BY priority DESC, key ASC
>
> ZOOKEEPER-1818 - Fangmin is working on it and a patch is available on GitHub.
> ZOOKEEPER-2778 - You're working on it, and a patch is available. You should 
> assign the Jira to yourself to avoid somebody else picking it up.
> ZOOKEEPER-1636 - An ancient C issue which has a patch available in Jira. I'm 
> planning to rebase it on master, but haven't had a chance yet.
>
> All of the others are Maven/Doc related which Tamas and Norbert are working 
> on.
>
> Flaky tests are related, but we don't treat them as a blocker issue. Here's 
> the umbrella Jira that I've created to track the progress:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3170
>
> Feel free to pick up any of the open ones or create new ones if you think 
> it's necessary. It's generally better to open individual Jiras for every 
> issue you're working on and discuss the details there. You can open an email 
> thread too, if you find that convenient, but Jira is preferred.
>
> Preferred workflow is Open Jira -> GitHub PR -> Commit to master -> Backport 
> to 3.5/3.4 if necessary -> Close Jira.
>
> Thank you for your contribution again!
>
> Andor
>
>
>
> On Thu, Nov 22, 2018 at 12:51 PM Michael K. Edwards  
> wrote:
>>
>> I think it's mostly a problem in CI, where other processes on the same
>> machine may compete for the port range, producing spurious Jenkins
>> failures.  The only failures I'm seeing locally are unrelated SSL
>> issues.
>> On Thu, Nov 22, 2018 at 3:45 AM Enrico Olivelli  wrote:
>> >
>> > On Thu, Nov 22, 2018 at 12:44 PM Michael K. Edwards
>> >  wrote:
>> > >
>> > > I'm glad to be able to help.
>> > >
>> > > It appears as though some of the "flaky tests" result from another
>> > > process stealing a server port between the time that it is assigned
>> > > (in org.apache.zookeeper.PortAssignment.unique()) and the time that it
>> > > is bound.
>> >
>> > You can try running tests using a single thread, this will "mitigate"
>> > the problem a bit
>> >
>> > Enrico
>> >
>> > > This happened, for example, in
>> > > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
>> > > looking in the console text, I found:
>> > >
>> > >  [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
>> > > [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
>> > > port: localhost/127.0.0.1:19459
>> > >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
>> > > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
>> > > - binding to port localhost/127.0.0.1:19466
>> > >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR
>> > > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
>> > > - Error while reconfiguring
>> > >  [exec] [junit] org.jboss.netty.channel.ChannelException:
>> > > Failed to bind to: localhost/127.0.0.1:19466
>> > >  [exec] [junit] at
>> > > org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.ja

[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/717

ZOOKEEPER-1636: cleanup completion list of a failed multi request

(from Thawan Kooburat)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #717


commit 40eb35efb56ddf4ef06243ec433d57200e9029f5
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/717


---


[jira] [Commented] (ZOOKEEPER-1818) Fix don't care for trunk

2018-11-22 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696267#comment-16696267
 ] 

Michael K. Edwards commented on ZOOKEEPER-1818:
---

#718 is just Fangmin's patch against current master.

> Fix don't care for trunk
> 
>
> Key: ZOOKEEPER-1818
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1818
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Fangmin Lv
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1818.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See umbrella jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potentially inconsistent zxid/electionEpoch and peerEpoch 
during leader election




---


[jira] [Commented] (ZOOKEEPER-1818) Fix don't care for trunk

2018-11-22 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696264#comment-16696264
 ] 

Michael K. Edwards commented on ZOOKEEPER-1818:
---

#714 now has just Fangmin's patch, ported, without the previous extraneous 
changes.  It may not build green until #707 (or an alternate fix for 
ZOOKEEPER-2778) lands on branch-3.5.

> Fix don't care for trunk
> 
>
> Key: ZOOKEEPER-1818
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1818
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Fangmin Lv
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1818.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See umbrella jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1636) c-client crash when zoo_amulti failed

2018-11-22 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696263#comment-16696263
 ] 

Michael K. Edwards commented on ZOOKEEPER-1636:
---

#717 is Thawan's patch as a pull request against master.  #713 is the same 
patch against branch-3.5.

> c-client crash when zoo_amulti failed 
> --
>
> Key: ZOOKEEPER-1636
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1636
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, 
> ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> deserialize_response for a multi operation doesn't handle the case where the 
> server fails to send back a response (e.g. when the multi packet is too large). 
> c-client will try to process the completions of all sub-requests as if the 
> operation were successful, and will eventually cause a SIGSEGV.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1636) c-client crash when zoo_amulti failed

2018-11-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-1636:
--
Labels: pull-request-available  (was: )

> c-client crash when zoo_amulti failed 
> --
>
> Key: ZOOKEEPER-1636
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1636
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, 
> ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch
>
>
> deserialize_response for a multi operation doesn't handle the case where the 
> server fails to send back a response (e.g. when the multi packet is too large). 
> c-client will try to process the completions of all sub-requests as if the 
> operation were successful, and will eventually cause a SIGSEGV.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/717

ZOOKEEPER-1636: cleanup completion list of a failed multi request

(from Thawan Kooburat)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #717


commit 40eb35efb56ddf4ef06243ec433d57200e9029f5
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[jira] [Commented] (ZOOKEEPER-3152) Port ZK netty stack to netty 4

2018-11-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696226#comment-16696226
 ] 

Hudson commented on ZOOKEEPER-3152:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #278 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/278/])
ZOOKEEPER-3152: Port ZK netty stack to netty4 (andor: rev 
caca062767c36525e6ecead2ae0f34c447394809)
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/NettyNettySuiteBase.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocket.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ClientTest.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ReconfigTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/ClientCnxnSocketTest.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/common/NettyUtils.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/UnifiedServerSocket.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNIO.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/NioNettySuiteBase.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/TestByteBufAllocator.java
* (edit) ivy.xml
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/TestByteBufAllocatorTestHelper.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/NettyServerCnxnTest.java
* (edit) build.xml


> Port ZK netty stack to netty 4
> --
>
> Key: ZOOKEEPER-3152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3152
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.6.0
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> Netty 3 is super old. Let's port ZK's netty stack to netty 4. I'm working on 
> a patch that I will put up as a pull request on github once we finish testing 
> it internally at Facebook, just getting the Jira ticket ready ahead of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread mkedwards
GitHub user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235808196
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -108,7 +109,11 @@
 LocalPeerBean jmxLocalPeerBean;
 private Map jmxRemotePeerBean;
 LeaderElectionBean jmxLeaderElectionBean;
-private QuorumCnxManager qcm;
+
+// The QuorumCnxManager is held through an AtomicReference to ensure cross-thread visibility
+// of updates; see the implementation comment at setLastSeenQuorumVerifier().
+private AtomicReference<QuorumCnxManager> qcmRef = new AtomicReference<>();
--- End diff --

I am hoping to reduce the need for synchronized blocks in follow-up 
changes.  For now, I added a simple use of getAndSet() to detect multiple calls 
to createElectionAlgorithm() and ensure that the QCM that's being dropped on 
the floor gets halted first.
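
For readers following along, the getAndSet() pattern looks roughly like the 
sketch below. This is not the code in the pull request: SwapAndHaltSketch, 
ConnectionManager and install are hypothetical stand-ins, and the only 
assumption is that the old manager exposes a halt() method.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SwapAndHaltSketch {
    /** Hypothetical stand-in for QuorumCnxManager; only halt() matters here. */
    static class ConnectionManager {
        private volatile boolean halted;
        void halt() { halted = true; }
        boolean isHalted() { return halted; }
    }

    private final AtomicReference<ConnectionManager> managerRef = new AtomicReference<>();

    /**
     * Installs a new manager. getAndSet() atomically swaps the reference and
     * returns the previous value, so a repeated call is detected without a
     * synchronized block and the replaced manager is halted rather than leaked.
     * Returns the replaced manager, or null on the first call.
     */
    ConnectionManager install(ConnectionManager fresh) {
        ConnectionManager old = managerRef.getAndSet(fresh);
        if (old != null) {
            // A second call replaced an existing manager: stop the old one.
            old.halt();
        }
        return old;
    }

    ConnectionManager current() {
        return managerRef.get();
    }
}
```

Note that in this sketch the swap happens before the halt, so other threads 
may briefly see the new manager while the old one is still shutting down.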


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread mkedwards
GitHub user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235807260
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -121,6 +126,18 @@
  */
 private ZKDatabase zkDb;
 
+public static class AddressTuple {
--- End diff --

Good idea.  Done.


---


[jira] [Commented] (ZOOKEEPER-3152) Port ZK netty stack to netty 4

2018-11-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696198#comment-16696198
 ] 

Hudson commented on ZOOKEEPER-3152:
---

SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #118 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/118/])
ZOOKEEPER-3152: Port ZK netty stack to netty4 (andor: rev 
caca062767c36525e6ecead2ae0f34c447394809)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocket.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/NioNettySuiteBase.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ReconfigTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNIO.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/ClientCnxnSocketTest.java
* (edit) ivy.xml
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/NettyServerCnxnTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/UnifiedServerSocket.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/TestByteBufAllocatorTestHelper.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/common/NettyUtils.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/TestByteBufAllocator.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ClientTest.java
* (edit) build.xml
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/NettyNettySuiteBase.java


> Port ZK netty stack to netty 4
> --
>
> Key: ZOOKEEPER-3152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3152
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.6.0
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> Netty 3 is super old. Let's port ZK's netty stack to netty 4. I'm working on 
> a patch that I will put up as a pull request on github once we finish testing 
> it internally at Facebook, just getting the Jira ticket ready ahead of time.





[jira] [Resolved] (ZOOKEEPER-3152) Port ZK netty stack to netty 4

2018-11-22 Thread Andor Molnar (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar resolved ZOOKEEPER-3152.
-
   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 669
[https://github.com/apache/zookeeper/pull/669]

> Port ZK netty stack to netty 4
> --
>
> Key: ZOOKEEPER-3152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3152
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.6.0
>Reporter: Ilya Maykov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Netty 3 is super old. Let's port ZK's netty stack to netty 4. I'm working on 
> a patch that I will put up as a pull request on github once we finish testing 
> it internally at Facebook, just getting the Jira ticket ready ahead of time.





[GitHub] zookeeper issue #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-22 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/669
  
Merged to master branch. Thanks @ivmaykov !


---


[jira] [Assigned] (ZOOKEEPER-3152) Port ZK netty stack to netty 4

2018-11-22 Thread Andor Molnar (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar reassigned ZOOKEEPER-3152:
---

Assignee: Ilya Maykov

> Port ZK netty stack to netty 4
> --
>
> Key: ZOOKEEPER-3152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3152
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.6.0
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Netty 3 is super old. Let's port ZK's netty stack to netty 4. I'm working on 
> a patch that I will put up as a pull request on github once we finish testing 
> it internally at Facebook, just getting the Jira ticket ready ahead of time.





[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/669


---


[GitHub] zookeeper issue #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thread and...

2018-11-22 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/689
  
@lvfangmin Any more concerns?


---


[jira] [Updated] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-11-22 Thread Colm O hEigeartaigh (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colm O hEigeartaigh updated ZOOKEEPER-3197:
---
Description: 
A security scan flagged the use of a hard-coded secret 
(ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
generate a password:

byte[] generatePasswd(long id)
{
    Random r = new Random(id ^ superSecret);
    byte p[] = new byte[16];
    r.nextBytes(p);
    return p;
}

superSecret has the following javadoc:

 /**
    * This is the secret that we use to generate passwords, for the moment it
    * is more of a sanity check.
    */

It is unclear from this comment and from the code why this is not a 
security risk. It would be very helpful for anyone else looking at the code if 
the javadoc were updated along the lines of "Using a hard-coded secret with 
Random to generate a password is not a security risk because the resulting 
passwords are used for X and not for authentication".

  was:
A security scan flagged the use of a hard-coded secret 
(ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
generate a password:

byte[] generatePasswd(long id) {
    Random r = new Random(id ^ superSecret);
    byte p[] = new byte[16];
    r.nextBytes(p);
    return p;
    }

superSecret has the following javadoc:

 /**
   * This is the secret that we use to generate passwords, for the moment it
   * is more of a sanity check.
   */

It is unclear from this comment and looking at the code why it is not a 
security risk. It would be good to update the javadoc along the lines of "Using 
a hard-coded secret with Random to generate is not a security risk because the 
resulting passwords are used for X and not for authentication" or something 
would be very helpful for anyone else looking at the code.


> Improve documentation in ZooKeeperServer.superSecret
> 
>
> Key: ZOOKEEPER-3197
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Colm O hEigeartaigh
>Priority: Trivial
>
> A security scan flagged the use of a hard-coded secret 
> (ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
> generate a password:
> byte[] generatePasswd(long id)
> {    
>     Random r = new Random(id ^ superSecret);    
>     byte p[] = new byte[16];    
>     r.nextBytes(p);    
>     return p;    
> }
> superSecret has the following javadoc:
>  /**
>     * This is the secret that we use to generate passwords, for the moment it
>     * is more of a sanity check.
>     */
> It is unclear from this comment and from the code why this is not a 
> security risk. It would be very helpful for anyone else looking at the code 
> if the javadoc were updated along the lines of "Using a hard-coded secret 
> with Random to generate a password is not a security risk because the 
> resulting passwords are used for X and not for authentication".





[GitHub] zookeeper issue #294: ZOOKEEPER-2822: Wrong `ObjectName` about `MBeanServer`...

2018-11-22 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/294
  
retest this please


---


Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Andor Molnar
Hi Michael!

Thanks for the great help in getting 3.5 out the door. We're getting closer
with each commit.

You asked a lot of questions in your email, which I'll try to answer, but
I believe the best approach is to deal with one problem at a time.
In email especially, it is not ideal to mix different topics,
because it makes things hard to follow.

I'll focus on the 3.5 release in this thread, in line with the subject. There's
another thread that I update every so often, by the way, but your list is
pretty accurate too. I use the following query for 3.5 blockers:

project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5 AND
priority in (blocker, critical) ORDER BY priority DESC, key ASC

ZOOKEEPER-1818 - Fangmin is working on it and patch is available on github.
ZOOKEEPER-2778 - You're working on it, patch is available. You should
assign the Jira to yourself to avoid somebody else picking it up.
ZOOKEEPER-1636 - An ancient C issue which has a patch available in Jira. I'm
planning to rebase it on master, but haven't had a chance yet.

All of the others are Maven/doc related, and Tamas and Norbert are working
on them.

Flaky tests are related, but we don't treat them as blockers. Here's
the umbrella Jira that I've created to track the progress:
https://issues.apache.org/jira/browse/ZOOKEEPER-3170

Feel free to pick up any of the open ones or create new ones if you think
it's necessary. It's generally better to open an individual Jira for every
issue you're working on and discuss the details in it. You can open an
email thread too, if you find that more convenient, but Jira is preferred.

The preferred workflow is Open Jira -> GitHub PR -> Commit to master ->
Backport to 3.5/3.4 if necessary -> Close Jira.

Thank you for your contribution again!

Andor



On Thu, Nov 22, 2018 at 12:51 PM Michael K. Edwards 
wrote:

> I think it's mostly a problem in CI, where other processes on the same
> machine may compete for the port range, producing spurious Jenkins
> failures.  The only failures I'm seeing locally are unrelated SSL
> issues.
> On Thu, Nov 22, 2018 at 3:45 AM Enrico Olivelli 
> wrote:
> >
> > On Thu, Nov 22, 2018 at 12:44 PM Michael K. Edwards wrote:
> > >
> > > I'm glad to be able to help.
> > >
> > > It appears as though some of the "flaky tests" result from another
> > > process stealing a server port between the time that it is assigned
> > > (in org.apache.zookeeper.PortAssignment.unique()) and the time that it
> > > is bound.
> >
> > You can try running tests using a single thread, this will "mitigate"
> > the problem a bit
> >
> > Enrico
> >
> > > This happened, for example, in
> > > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
> > > looking in the console text, I found:
> > >
> > >  [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
> > > [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
> > > port: localhost/127.0.0.1:19459
> > >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
> > > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
> > > - binding to port localhost/127.0.0.1:19466
> > >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR
> > > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
> > > - Error while reconfiguring
> > >  [exec] [junit] org.jboss.netty.channel.ChannelException:
> > > Failed to bind to: localhost/127.0.0.1:19466
> > >  [exec] [junit] at
> > > org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
> > >  [exec] [junit] at
> > > org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
> > >  [exec] [junit] at
> > > org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
> > >  [exec] [junit] at
> > > org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
> > >  [exec] [junit] at
> > > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
> > >  [exec] [junit] at
> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
> > >  [exec] [junit] Caused by: java.net.BindException: Address
> > > already in use
> > >  [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
> > >  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
> > >  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
> > >  [exec] [junit] at
> > > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> > >  [exec] [junit] at
> > > sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> > >  [exec] [junit] at
> > > org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
> > >  [exec] [junit] at
> > > org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractN

[jira] [Updated] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-11-22 Thread Colm O hEigeartaigh (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colm O hEigeartaigh updated ZOOKEEPER-3197:
---
Description: 
A security scan flagged the use of a hard-coded secret 
(ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
generate a password:

byte[] generatePasswd(long id)

{
    Random r = new Random(id ^ superSecret);
    byte p[] = new byte[16];
    r.nextBytes(p);
    return p;
}

superSecret has the following javadoc:

 /**
    * This is the secret that we use to generate passwords, for the moment it
    * is more of a sanity check.
    */

It is unclear from this comment and from the code why this is not a 
security risk. It would be very helpful for anyone else looking at the code if 
the javadoc were updated along the lines of "Using a hard-coded secret with 
Random to generate a password is not a security risk because the resulting 
passwords are used for X, Y, Z and not for authentication etc".

  was:
A security scan flagged the use of a hard-coded secret 
(ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
generate a password:

byte[] generatePasswd(long id)

{    

    Random r = new Random(id ^ superSecret);    

    byte p[] = new byte[16];    

    r.nextBytes(p);    

    return p;    

}

superSecret has the following javadoc:

 /**
    * This is the secret that we use to generate passwords, for the moment it
    * is more of a sanity check.
    */

It is unclear from this comment and looking at the code why it is not a 
security risk. It would be good to update the javadoc along the lines of "Using 
a hard-coded secret with Random to generate is not a security risk because the 
resulting passwords are used for X and not for authentication" or something 
would be very helpful for anyone else looking at the code.


> Improve documentation in ZooKeeperServer.superSecret
> 
>
> Key: ZOOKEEPER-3197
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Colm O hEigeartaigh
>Priority: Trivial
>
> A security scan flagged the use of a hard-coded secret 
> (ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
> generate a password:
> byte[] generatePasswd(long id)
> {
>     Random r = new Random(id ^ superSecret);
>     byte p[] = new byte[16];
>     r.nextBytes(p);
>     return p;
> }
> superSecret has the following javadoc:
>  /**
>     * This is the secret that we use to generate passwords, for the moment it
>     * is more of a sanity check.
>     */
> It is unclear from this comment and from the code why this is not a 
> security risk. It would be very helpful for anyone else looking at the code 
> if the javadoc were updated along the lines of "Using a hard-coded secret 
> with Random to generate a password is not a security risk because the 
> resulting passwords are used for X, Y, Z and not for authentication etc".





[jira] [Created] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-11-22 Thread Colm O hEigeartaigh (JIRA)
Colm O hEigeartaigh created ZOOKEEPER-3197:
--

 Summary: Improve documentation in ZooKeeperServer.superSecret
 Key: ZOOKEEPER-3197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197
 Project: ZooKeeper
  Issue Type: Task
Reporter: Colm O hEigeartaigh


A security scan flagged the use of a hard-coded secret 
(ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
generate a password:

byte[] generatePasswd(long id) {
    Random r = new Random(id ^ superSecret);
    byte p[] = new byte[16];
    r.nextBytes(p);
    return p;
    }

superSecret has the following javadoc:

 /**
   * This is the secret that we use to generate passwords, for the moment it
   * is more of a sanity check.
   */

It is unclear from this comment and from the code why this is not a 
security risk. It would be very helpful for anyone else looking at the code if 
the javadoc were updated along the lines of "Using a hard-coded secret with 
Random to generate a password is not a security risk because the resulting 
passwords are used for X and not for authentication".
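
To illustrate why a scanner flags this pattern, here is a small self-contained sketch. It assumes a placeholder SUPER_SECRET value chosen for the example (not the constant in ZooKeeperServer) and a hypothetical PasswdDemo class. It shows only that java.util.Random is deterministic for a given seed, so anyone who knows the id and the secret can recompute the bytes.

```java
import java.util.Arrays;
import java.util.Random;

class PasswdDemo {
    // Placeholder for illustration only; NOT the value used by ZooKeeper.
    static final long SUPER_SECRET = 0x1234ABCDL;

    // Same shape as the method quoted above: seed a Random with
    // (id XOR secret) and take the first 16 bytes it produces.
    static byte[] generatePasswd(long id) {
        Random r = new Random(id ^ SUPER_SECRET);
        byte[] p = new byte[16];
        r.nextBytes(p);
        return p;
    }

    public static void main(String[] args) {
        byte[] a = generatePasswd(42L);
        byte[] b = generatePasswd(42L);
        // Two Random instances with the same seed yield the same sequence.
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```

Whether that determinism matters depends on what the generated bytes are used for, which is exactly what the javadoc should spell out.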





[GitHub] zookeeper pull request #716: Enable secure processing and disallow DTDs in t...

2018-11-22 Thread coheigea
GitHub user coheigea opened a pull request:

https://github.com/apache/zookeeper/pull/716

Enable secure processing and disallow DTDs in the SAXParserFactory

It's good security practice to set the secure processing feature on 
SAXParserFactory and to disallow Doctypes if they aren't needed.
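
As a rough sketch of what such hardening can look like (this is not the code from the pull request; SecureSaxDemo, hardenedFactory() and parses() are names invented for the example), the two settings map onto setFeature() calls with XMLConstants.FEATURE_SECURE_PROCESSING and the Xerces feature URI for disallowing DOCTYPE declarations:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SecureSaxDemo {
    // Builds a factory with secure processing on and DOCTYPEs rejected.
    static SAXParserFactory hardenedFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory;
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            SAXParser parser = hardenedFactory().newSAXParser();
            parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. a DOCTYPE declaration was encountered
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this configuration, an ordinary document such as `<a><b/></a>` should still parse, while any input carrying a DOCTYPE is rejected before entity expansion can take place.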

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/coheigea/zookeeper sax_secureproc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #716


commit c3086a962925dc8c3a6aa85e8a8f58ee5e0c4354
Author: Colm O hEigeartaigh 
Date:   2018-11-22T15:51:10Z

Enable secure processing and disallow DTDs in the SAXParserFactory




---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235712847
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -121,6 +126,18 @@
  */
 private ZKDatabase zkDb;
 
+public static class AddressTuple {
--- End diff --

nit: final?


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235713462
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -108,7 +109,11 @@
 LocalPeerBean jmxLocalPeerBean;
 private Map jmxRemotePeerBean;
 LeaderElectionBean jmxLeaderElectionBean;
-private QuorumCnxManager qcm;
+
+// The QuorumCnxManager is held through an AtomicReference to ensure 
cross-thread visibility
+// of updates; see the implementation comment at 
setLastSeenQuorumVerifier().
+private AtomicReference qcmRef = new 
AtomicReference<>();
--- End diff --

If we are not using compare-and-set, why isn't a volatile enough?


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
This PR now has the extraneous changes removed and is green in CI.  Please 
re-review at your convenience.


---


[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/715

Rollup of blocker/critical fixes for 3.5 (to trigger CI)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper rollup-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #715


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 0e2b571d452306ae151b106e05d0511b09a237d3
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)

commit 3683a1b451fa334ba51227db327557966d319a4d
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

ZOOKEEPER-1818: Correctly handle potentially inconsistent 
zxid/electionEpoch and peerEpoch during leader election

commit a1b56505671b756448f7c0126de764cd25633e2f
Author: Michael Edwards 
Date:   2018-11-21T21:33:01Z

Bump library versions, fix 'ant package-native tar' targets

commit 3dfd49f6bfea357c838e21d5a2e4f1486ed753e9
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

commit 1cbaec427037b8ac10004e8198821da524949843
Author: Andor Molnar 
Date:   2018-11-19T16:25:52Z

ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use 
CountDownLatch to sync with watcher.

commit 0d4e7839eaab8b3222be31a705f9edded1ad98a5
Author: Michael Edwards 
Date:   2018-11-22T08:38:28Z

Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr

commit 3b19067c2a5b213e53f6e2ce7638b76250076fe6
Author: Michael Edwards 
Date:   2018-11-22T12:07:38Z

Throw BindException out as far as the caller of QuorumPeer.processReconfig




---


[jira] [Commented] (ZOOKEEPER-2916) startSingleServerTest may be flaky

2018-11-22 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695827#comment-16695827
 ] 

Michael K. Edwards commented on ZOOKEEPER-2916:
---

The root cause is hidden inside {{...[truncated 395348 chars]...}}.  But it 
looks to me like the server failed to bind the port, which seems to be a common 
cause of spurious test failures in CI.  See 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/consoleText
 for an example (search for {{BindException}}).

> startSingleServerTest may be flaky
> --
>
> Key: ZOOKEEPER-2916
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2916
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: tests
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Patrick Hunt
>Assignee: Bogdan Kanivets
>Priority: Major
>  Labels: flaky, newbie
>
> startSingleServerTest seems to be failing intermittently. 10 times in the 
> first few days of this month. Can someone take a look?





Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Michael K. Edwards
I think it's mostly a problem in CI, where other processes on the same
machine may compete for the port range, producing spurious Jenkins
failures.  The only failures I'm seeing locally are unrelated SSL
issues.
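
To make the failure mode concrete, here is a minimal Java sketch (not ZooKeeper code; PortRaceDemo and lateBindFails() are hypothetical names). One socket plays the competing process that already holds the port, and a second bind to the same port number fails with BindException, which is what the server sees when a port handed out earlier has been taken in the meantime.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class PortRaceDemo {
    // Returns true if binding to a port that another socket already
    // holds fails with BindException ("Address already in use").
    static boolean lateBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for a free port; it stays reserved only
        // while this socket remains open.
        try (ServerSocket holder = new ServerSocket(0, 50, loopback)) {
            int port = holder.getLocalPort();
            // A latecomer that was told to use the same port number.
            try (ServerSocket late = new ServerSocket(port, 50, loopback)) {
                return false;
            } catch (BindException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("late bind failed: " + lateBindFails());
    }
}
```

Binding to port 0 and keeping the socket open, as the holder does here, is one way to avoid the gap between choosing a port and binding it, though it may not fit tests that need to know port numbers before any server starts.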
On Thu, Nov 22, 2018 at 3:45 AM Enrico Olivelli  wrote:
>
> On Thu, Nov 22, 2018 at 12:44 PM Michael K. Edwards wrote:
> >
> > I'm glad to be able to help.
> >
> > It appears as though some of the "flaky tests" result from another
> > process stealing a server port between the time that it is assigned
> > (in org.apache.zookeeper.PortAssignment.unique()) and the time that it
> > is bound.
>
> You can try running tests using a single thread, this will "mitigate"
> the problem a bit
>
> Enrico
>
> > This happened, for example, in
> > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
> > looking in the console text, I found:
> >
> >  [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
> > [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
> > port: localhost/127.0.0.1:19459
> >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
> > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
> > - binding to port localhost/127.0.0.1:19466
> >  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR
> > [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
> > - Error while reconfiguring
> >  [exec] [junit] org.jboss.netty.channel.ChannelException:
> > Failed to bind to: localhost/127.0.0.1:19466
> >  [exec] [junit] at
> > org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
> >  [exec] [junit] at
> > org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
> >  [exec] [junit] at
> > org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
> >  [exec] [junit] at
> > org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
> >  [exec] [junit] at
> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
> >  [exec] [junit] at
> > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
> >  [exec] [junit] Caused by: java.net.BindException: Address
> > already in use
> >  [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
> >  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
> >  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
> >  [exec] [junit] at
> > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> >  [exec] [junit] at
> > sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> >  [exec] [junit] at
> > org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
> >  [exec] [junit] at
> > org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
> >  [exec] [junit] at
> > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
> >  [exec] [junit] at
> > org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
> >  [exec] [junit] at
> > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> >  [exec] [junit] at
> > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> >  [exec] [junit] at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >  [exec] [junit] at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >  [exec] [junit] at java.lang.Thread.run(Thread.java:748)
> >
> > We currently log-and-swallow this exception (and many others) down in
> > NettyServerCnxnFactory.reconfigure() and
> > NIOServerCnxnFactory.reconfigure(), which is ... not ideal.
> >
> > How should we handle a bind failure in the real world?  Seems like we
> > ought to throw a BindException out at least as far as the caller of
> > QuorumPeer.processReconfig().  That's either
> > Follower/Leader/Learner/Observer or FastLeaderElection.  Presumably
> > they should immediately go read-only when they can't bind the client
> > port?
> > On Thu, Nov 22, 2018 at 1:23 AM Enrico Olivelli  wrote:
> > >
> > > Thank you very much Michael
> > > I am following and reviewing your patches
> > >
> > > Enrico
> > > On Thu, Nov 22, 2018 at 10:14 AM Michael K. Edwards wrote:
> > > >
> > > > Hmm.  Jira's a bit of a boneyard, isn't it?  And timeouts in flaky
> > > > tests are a problem.
> > > >
> > > > I scrubbed through the open bugs and picked the ones that looked to me
> > > > like they might deserve attention for 3.5.5 or soon thereafter.
> > > > They're all on my watchlist:
> > > > https://issues.apache.org/jira/issues/?filter=-1&jql=watcher%20%3D%20mkedwards%20AND%20resoluti

Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Enrico Olivelli
On Thu, Nov 22, 2018 at 12:44 PM Michael K. Edwards wrote:
>
> I'm glad to be able to help.
>
> It appears as though some of the "flaky tests" result from another
> process stealing a server port between the time that it is assigned
> (in org.apache.zookeeper.PortAssignment.unique()) and the time that it
> is bound.

You can try running the tests using a single thread; this will "mitigate"
the problem a bit.

Enrico

> This happened, for example, in
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
> looking in the console text, I found:
>
>  [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
> [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
> port: localhost/127.0.0.1:19459
>  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
> [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
> - binding to port localhost/127.0.0.1:19466
>  [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR
> [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
> - Error while reconfiguring
>  [exec] [junit] org.jboss.netty.channel.ChannelException:
> Failed to bind to: localhost/127.0.0.1:19466
>  [exec] [junit] at
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>  [exec] [junit] at
> org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
>  [exec] [junit] at
> org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
>  [exec] [junit] at
> org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
>  [exec] [junit] at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
>  [exec] [junit] at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
>  [exec] [junit] Caused by: java.net.BindException: Address
> already in use
>  [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
>  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
>  [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
>  [exec] [junit] at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>  [exec] [junit] at
> sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>  [exec] [junit] at
> org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
>  [exec] [junit] at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>  [exec] [junit] at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>  [exec] [junit] at
> org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
>  [exec] [junit] at
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  [exec] [junit] at
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  [exec] [junit] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [exec] [junit] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [exec] [junit] at java.lang.Thread.run(Thread.java:748)
>
> We currently log-and-swallow this exception (and many others) down in
> NettyServerCnxnFactory.reconfigure() and
> NIOServerCnxnFactory.reconfigure(), which is ... not ideal.
>
> How should we handle a bind failure in the real world?  Seems like we
> ought to throw a BindException out at least as far as the caller of
> QuorumPeer.processReconfig().  That's either
> Follower/Leader/Learner/Observer or FastLeaderElection.  Presumably
> they should immediately go read-only when they can't bind the client
> port?
> On Thu, Nov 22, 2018 at 1:23 AM Enrico Olivelli  wrote:
> >
> > Thank you very much Michael
> > I am following and reviewing your patches
> >
> > Enrico
> > On Thu, Nov 22, 2018 at 10:14 Michael K. Edwards
> > wrote:
> > >
> > > Hmm.  Jira's a bit of a boneyard, isn't it?  And timeouts in flaky
> > > tests are a problem.
> > >
> > > I scrubbed through the open bugs and picked the ones that looked to me
> > > like they might deserve attention for 3.5.5 or soon thereafter.
> > > They're all on my watchlist:
> > > https://issues.apache.org/jira/issues/?filter=-1&jql=watcher%20%3D%20mkedwards%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
> > > (I'm not counting the Ant->Maven transition in that, which I don't
> > > know much about.)
> > >
> > > I'm trying out some more verbose logging for the junit tests, to try
> > > to understand test flakiness.  But the Jenkins pre-commit pipeline
> > > appears to be down?
> > > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/
> > > On Wed, Nov 21, 2018 at 2:29 PM Michael K. Edwards

Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Michael K. Edwards
I'm glad to be able to help.

It appears as though some of the "flaky tests" result from another
process stealing a server port between the time that it is assigned
(in org.apache.zookeeper.PortAssignment.unique()) and the time that it
is bound.  This happened, for example, in
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/;
looking in the console text, I found:

 [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO
[QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind
port: localhost/127.0.0.1:19459
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
- binding to port localhost/127.0.0.1:19466
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
- Error while reconfiguring
 [exec] [junit] org.jboss.netty.channel.ChannelException:
Failed to bind to: localhost/127.0.0.1:19466
 [exec] [junit] at
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
 [exec] [junit] at
org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
 [exec] [junit] at
org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
 [exec] [junit] at
org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
 [exec] [junit] at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
 [exec] [junit] at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
 [exec] [junit] Caused by: java.net.BindException: Address
already in use
 [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
 [exec] [junit] at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 [exec] [junit] at
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
 [exec] [junit] at
org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
 [exec] [junit] at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
 [exec] [junit] at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
 [exec] [junit] at
org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
 [exec] [junit] at
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 [exec] [junit] at
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 [exec] [junit] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 [exec] [junit] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 [exec] [junit] at java.lang.Thread.run(Thread.java:748)

We currently log-and-swallow this exception (and many others) down in
NettyServerCnxnFactory.reconfigure() and
NIOServerCnxnFactory.reconfigure(), which is ... not ideal.

How should we handle a bind failure in the real world?  Seems like we
ought to throw a BindException out at least as far as the caller of
QuorumPeer.processReconfig().  That's either
Follower/Leader/Learner/Observer or FastLeaderElection.  Presumably
they should immediately go read-only when they can't bind the client
port?
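
A minimal sketch of the difference between the two behaviours, using a plain
java.net.ServerSocket rather than the real NettyServerCnxnFactory or
NIOServerCnxnFactory code. The class and method names here
(ReconfigureSketch, rebindSwallowing, rebindPropagating) are hypothetical and
exist only for illustration; what the callers of
QuorumPeer.processReconfig() should do with the exception is still the open
question above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReconfigureSketch {

    /** Log-and-swallow: the caller only sees null and cannot react to the cause. */
    static ServerSocket rebindSwallowing(InetSocketAddress addr) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket();
            socket.bind(addr);
            return socket;
        } catch (IOException e) {
            System.err.println("Error while reconfiguring: " + e);
            closeQuietly(socket);
            return null;
        }
    }

    /** Propagating: a bind failure (e.g. java.net.BindException) reaches the caller. */
    static ServerSocket rebindPropagating(InetSocketAddress addr) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(addr);
            return socket;
        } catch (IOException e) {
            closeQuietly(socket);
            throw e;
        }
    }

    private static void closeQuietly(ServerSocket socket) {
        if (socket != null) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails in this sketch.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Occupy a loopback port so that the binds below collide with it.
        try (ServerSocket holder = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress taken =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), holder.getLocalPort());
            System.out.println("swallowing variant returned: " + rebindSwallowing(taken));
            try {
                rebindPropagating(taken);
            } catch (IOException e) {
                System.out.println("caller saw: " + e.getClass().getName());
            }
        }
    }
}
```

With the propagating variant the caller at least gets to choose between
retrying, going read-only, or shutting down, instead of carrying on as if the
client port were bound.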
On Thu, Nov 22, 2018 at 1:23 AM Enrico Olivelli  wrote:
>
> Thank you very much Michael
> I am following and reviewing your patches
>
> Enrico
> On Thu, Nov 22, 2018 at 10:14 Michael K. Edwards
> wrote:
> >
> > Hmm.  Jira's a bit of a boneyard, isn't it?  And timeouts in flaky
> > tests are a problem.
> >
> > I scrubbed through the open bugs and picked the ones that looked to me
> > like they might deserve attention for 3.5.5 or soon thereafter.
> > They're all on my watchlist:
> > https://issues.apache.org/jira/issues/?filter=-1&jql=watcher%20%3D%20mkedwards%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
> > (I'm not counting the Ant->Maven transition in that, which I don't
> > know much about.)
> >
> > I'm trying out some more verbose logging for the junit tests, to try
> > to understand test flakiness.  But the Jenkins pre-commit pipeline
> > appears to be down?
> > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/
> > On Wed, Nov 21, 2018 at 2:29 PM Michael K. Edwards
> >  wrote:
> > >
> > > Looks like we're really close.  Can I help?
> > >
> > > I think this is the list of release blockers:
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ZooKeeper%20and%20resolution%20%3D%20Unresolved%20and%20fixVersion%20%3D%203.5.5%20AND%20priority%20in%20(blocker%2C%20critical)%20ORDER%20BY%20priority%20DESC%2C

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
After groveling through 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/consoleText,
 I think this may be a contributing factor to flaky tests:
```
 [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO  
[QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind port: 
localhost/127.0.0.1:19459
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO  
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
 - binding to port localhost/127.0.0.1:19466
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR 
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
 - Error while reconfiguring
 [exec] [junit] org.jboss.netty.channel.ChannelException: Failed to 
bind to: localhost/127.0.0.1:19466
 [exec] [junit] at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
 [exec] [junit] at 
org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
 [exec] [junit] Caused by: java.net.BindException: Address already 
in use
 [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
 [exec] [junit] at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 [exec] [junit] at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
 [exec] [junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 [exec] [junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 [exec] [junit] at java.lang.Thread.run(Thread.java:748)
```
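
For comparison, one common way to avoid a window between choosing a port and binding it is to bind to port 0 and let the kernel pick a free port, then read the chosen port back from the socket. The sketch below is only an illustration of that idea with a plain `java.net.ServerSocket`; the class name `EphemeralPortSketch` is hypothetical, and it is not a description of how the ZooKeeper test harness assigns ports.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {

    /**
     * Binds to port 0 on loopback. The kernel chooses a free port and the
     * socket holds it from that moment, so no other process can take it
     * between "assignment" and "bind".
     */
    static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bindEphemeral();
             ServerSocket second = bindEphemeral()) {
            // The actual port numbers are only known after the bind.
            System.out.println("first bound to port " + first.getLocalPort());
            System.out.println("second bound to port " + second.getLocalPort());
        }
    }
}
```

The trade-off is that the port is not known before the server starts, so anything that needs the port in a config string has to be built after the bind.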


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
Thanks maoling; I'm familiar with the procedure, just hadn't gotten around 
to juggling branches.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
@mkedwards 
Make sure the code in your origin branch-3.5 is what you want for 
`ZOOKEEPER-2778`; then `git push origin branch-3.5 -f` should be fine.


---


Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Enrico Olivelli
Thank you very much Michael
I am following and reviewing your patches

Enrico
On Thu, Nov 22, 2018 at 10:14 Michael K. Edwards
wrote:
>
> Hmm.  Jira's a bit of a boneyard, isn't it?  And timeouts in flaky
> tests are a problem.
>
> I scrubbed through the open bugs and picked the ones that looked to me
> like they might deserve attention for 3.5.5 or soon thereafter.
> They're all on my watchlist:
> https://issues.apache.org/jira/issues/?filter=-1&jql=watcher%20%3D%20mkedwards%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
> (I'm not counting the Ant->Maven transition in that, which I don't
> know much about.)
>
> I'm trying out some more verbose logging for the junit tests, to try
> to understand test flakiness.  But the Jenkins pre-commit pipeline
> appears to be down?
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/
> On Wed, Nov 21, 2018 at 2:29 PM Michael K. Edwards
>  wrote:
> >
> > Looks like we're really close.  Can I help?
> >
> > I think this is the list of release blockers:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ZooKeeper%20and%20resolution%20%3D%20Unresolved%20and%20fixVersion%20%3D%203.5.5%20AND%20priority%20in%20(blocker%2C%20critical)%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> >
> > I currently see 7 issues in that search, of which 4 are aspects of the
> > ongoing switch from ant to maven.  Setting that aside for the moment,
> > there are 3 critical bugs:
> >
> > ZOOKEEPER-2778  Potential server deadlock between follower sync with
> > leader and follower receiving external connection requests.
> >
> > ZOOKEEPER-1636  c-client crash when zoo_amulti failed
> >
> > ZOOKEEPER-1818  Fix don't care for trunk
> >
> > I put them in that order because that's the order in which I've
> > stacked the fixes in
> > https://github.com/mkedwards/zookeeper/tree/branch-3.5.  Then on top
> > of that, I've updated the versions of the external library
> > dependencies I think it's important to update: Jetty, Jackson, and
> > BouncyCastle.  The result seems to be a green build in Jenkins:
> > https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2705/
> >
> > Are these fixes in principle landable on the 3.5 branch, or do they
> > have to go to master first?  Does master need help to build green
> > before these can land there?  Are there other bugs that are similarly
> > critical to fix, and not tagged for 3.5.5 in Jira?  Is there other
> > testing that I can help with?  Are more hands needed on the Maven
> > work?
> >
> > Thanks for all the work that goes into keeping Zookeeper healthy and
> > advancing; it's a critical infrastructure component in several systems
> > I help develop and operate, and I like being able to rely on it.
> >
> > Cheers,
> > - Michael


Re: Glide path to getting 3.5.x out of beta

2018-11-22 Thread Michael K. Edwards
Hmm.  Jira's a bit of a boneyard, isn't it?  And timeouts in flaky
tests are a problem.

I scrubbed through the open bugs and picked the ones that looked to me
like they might deserve attention for 3.5.5 or soon thereafter.
They're all on my watchlist:
https://issues.apache.org/jira/issues/?filter=-1&jql=watcher%20%3D%20mkedwards%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
(I'm not counting the Ant->Maven transition in that, which I don't
know much about.)

I'm trying out some more verbose logging for the junit tests, to try
to understand test flakiness.  But the Jenkins pre-commit pipeline
appears to be down?
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/
On Wed, Nov 21, 2018 at 2:29 PM Michael K. Edwards
 wrote:
>
> Looks like we're really close.  Can I help?
>
> I think this is the list of release blockers:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ZooKeeper%20and%20resolution%20%3D%20Unresolved%20and%20fixVersion%20%3D%203.5.5%20AND%20priority%20in%20(blocker%2C%20critical)%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
>
> I currently see 7 issues in that search, of which 4 are aspects of the
> ongoing switch from ant to maven.  Setting that aside for the moment,
> there are 3 critical bugs:
>
> ZOOKEEPER-2778  Potential server deadlock between follower sync with
> leader and follower receiving external connection requests.
>
> ZOOKEEPER-1636  c-client crash when zoo_amulti failed
>
> ZOOKEEPER-1818  Fix don't care for trunk
>
> I put them in that order because that's the order in which I've
> stacked the fixes in
> https://github.com/mkedwards/zookeeper/tree/branch-3.5.  Then on top
> of that, I've updated the versions of the external library
> dependencies I think it's important to update: Jetty, Jackson, and
> BouncyCastle.  The result seems to be a green build in Jenkins:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2705/
>
> Are these fixes in principle landable on the 3.5 branch, or do they
> have to go to master first?  Does master need help to build green
> before these can land there?  Are there other bugs that are similarly
> critical to fix, and not tagged for 3.5.5 in Jira?  Is there other
> testing that I can help with?  Are more hands needed on the Maven
> work?
>
> Thanks for all the work that goes into keeping Zookeeper healthy and
> advancing; it's a critical infrastructure component in several systems
> I help develop and operate, and I like being able to rely on it.
>
> Cheers,
> - Michael


[jira] [Commented] (ZOOKEEPER-1814) Reduction of waiting time during Fast Leader Election

2018-11-22 Thread Daniel Peon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695632#comment-16695632
 ] 

Daniel Peon commented on ZOOKEEPER-1814:


Hi Michael,

I'm afraid the patch no longer applies; it has gone a long time without 
receiving approval. It was designed for 3.4.5 and will need adaptation.

I'll try to find some time to adapt it to the new code, but if someone else is 
interested in it and can get it done sooner, don't hesitate to take it 
over.

BR,

Dani.

> Reduction of waiting time during Fast Leader Election
> -
>
> Key: ZOOKEEPER-1814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
>Reporter: Daniel Peon
>Assignee: Daniel Peon
>Priority: Major
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1814.patch, ZOOKEEPER-1814.patch, 
> ZOOKEEPER-1814.patch, ZOOKEEPER-1814.patch, ZOOKEEPER-1814.patch, 
> ZOOKEEPER-1814.patch, ZOOKEEPER-1814.patch, ZOOKEEPER-1814.patch, 
> ZOOKEEPER-1814.patch, ZOOKEEPER-1814_release_3_5_0.patch, 
> ZOOKEEPER-1814_trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Fast Leader Election takes a long time because of the exponential backoff. 
> Currently the wait can reach 60 seconds.
> It would be useful to make this parameter configurable, for example for a 
> server shutdown.
> Otherwise, it sometimes takes so long that a test failure has been detected 
> when executing: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.
> This test case waits up to 30 seconds, which is shorter than the 60 seconds 
> that the leader election can be waiting at the moment of shutting down.
> Given the failure in that test case, this issue was filed as a possible 
> bug.
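
To illustrate why a 30-second test wait can lose against a 60-second backoff
cap, here is a rough sketch of a doubling timeout with a configurable ceiling.
Only the 60-second cap and the 30-second test wait come from the description
above; the 200 ms starting value, the class name BackoffSketch and the method
nextTimeout are assumptions made for this example and are not taken from the
attached patches.

```java
public class BackoffSketch {

    /** Doubles the current timeout, clamped to the configured maximum. */
    static int nextTimeout(int currentMs, int maxMs) {
        return (int) Math.min((long) currentMs * 2, (long) maxMs);
    }

    public static void main(String[] args) {
        int maxMs = 60_000;      // cap mentioned in the issue description
        int testWaitMs = 30_000; // how long the test is said to wait
        int timeout = 200;       // assumed starting value, for illustration only

        // Walk the backoff sequence until it reaches the cap.
        while (timeout < maxMs) {
            timeout = nextTimeout(timeout, maxMs);
            String note = timeout > testWaitMs ? "  <- longer than the test wait" : "";
            System.out.println(timeout + " ms" + note);
        }
    }
}
```

Passing the ceiling in as a parameter is the point of the sketch: with a lower
maxMs, a single wait can no longer exceed the test's 30 seconds.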


