Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Enrico Olivelli
Good.

I will cancel the vote for 3.6.0rc2.

I would very much appreciate it if Mate and his colleagues have time to work on a
fix.
Otherwise I will have cycles next week.

I would also like to spend some time setting up a few minimal integration
tests for the upgrade story.

Enrico

On Tue, Feb 11, 2020, 07:30 Michael Han wrote:

> Kudos Enrico, very thorough work as the final gate keeper of the release!
>
> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
>
> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> pieces of software that puts so much emphasis on compatibility that it just
> works when you upgrade or downgrade, which is amazing. One guarantee we have
> always had is that during a rolling upgrade the quorum stays available, so
> there is no service interruption. It would be sad to lose that capability,
> given this is still a tractable problem.
>
> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> have it speak the old message format when it's talking to an old server?
> Basically, an ugly if/else check against the protocol version should work,
> and there would be no need for multiple passes in the rolling upgrade process.

Jenkins build became unstable: zookeeper-master-maven #663

2020-02-10 Thread Apache Jenkins Server
See 




Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Michael Han
Kudos Enrico, very thorough work as the final gate keeper of the release!

Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.

I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
pieces of software that puts so much emphasis on compatibility that it just
works when you upgrade or downgrade, which is amazing. One guarantee we have
always had is that during a rolling upgrade the quorum stays available, so
there is no service interruption. It would be sad to lose that capability,
given this is still a tractable problem.
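
To make the availability concern concrete, here is a minimal Java sketch. It is not ZooKeeper code: the class and method names are hypothetical, and it assumes that old and new servers cannot talk to each other at all and that servers are restarted strictly one at a time. It counts the restart steps during which neither the old nor the new group holds a strict majority.

```java
/** Illustrative model only; not part of ZooKeeper. */
final class RollingUpgradeSketch {

    /** True if `members` out of `ensembleSize` servers form a strict majority. */
    static boolean hasQuorum(int members, int ensembleSize) {
        return members > ensembleSize / 2;
    }

    /**
     * Counts restart steps in which neither partition has a quorum.
     * At each step one server is down for its restart, `upgraded` servers
     * already run the new version, and the rest still run the old one.
     */
    static int stepsWithoutQuorum(int ensembleSize) {
        int gaps = 0;
        for (int upgraded = 0; upgraded < ensembleSize; upgraded++) {
            int oldUp = ensembleSize - upgraded - 1; // old servers still running
            int newUp = upgraded;                    // new servers already running
            if (!hasQuorum(oldUp, ensembleSize) && !hasQuorum(newUp, ensembleSize)) {
                gaps++;
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 5, 7}) {
            System.out.println(size + " servers: "
                    + stepsWithoutQuorum(size) + " step(s) without quorum");
        }
    }
}
```

Under these assumptions an odd-sized ensemble has exactly one such step: the moment when N old and N new servers are up and the remaining server is restarting.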

Regarding the fix, can we just make 3.6.0 aware of the old protocol and
have it speak the old message format when it's talking to an old server?
Basically, an ugly if/else check against the protocol version should work,
and there would be no need for multiple passes in the rolling upgrade process.
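
A rough Java sketch of what such a version check could look like follows. Everything in it is an assumption made for illustration: the version constants, the wire layout (version, sender id, length-prefixed payload) and the class name are hypothetical and are not taken from QuorumCnxManager. How the sender learns which version the peer speaks is also left outside the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: constants and message layout are hypothetical. */
final class InitialMessageSketch {
    static final long OLD_VERSION = 1L; // hypothetical marker for the old format
    static final long NEW_VERSION = 2L; // hypothetical marker for the new format

    /** Encodes the initial message in the format the peer is assumed to understand. */
    static byte[] encode(long peerVersion, long sid, List<String> addresses) {
        String payload;
        if (peerVersion == OLD_VERSION) {
            // Assumed old format: exactly one "host:port" entry.
            payload = addresses.get(0);
        } else if (peerVersion == NEW_VERSION) {
            // Assumed new format: several entries separated by '|'.
            payload = String.join("|", addresses);
        } else {
            throw new IllegalArgumentException("unknown protocol version " + peerVersion);
        }
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 * Long.BYTES + Integer.BYTES + body.length)
                .putLong(peerVersion).putLong(sid).putInt(body.length).put(body)
                .array();
    }

    /** Decodes either format by branching on the leading version field. */
    static List<String> decode(byte[] message) {
        ByteBuffer in = ByteBuffer.wrap(message);
        long version = in.getLong();
        in.getLong(); // sender id, not needed for this sketch
        byte[] body = new byte[in.getInt()];
        in.get(body);
        String payload = new String(body, StandardCharsets.UTF_8);
        if (version == OLD_VERSION) {
            return Collections.singletonList(payload);
        }
        if (version == NEW_VERSION) {
            return Arrays.asList(payload.split("\\|"));
        }
        throw new IllegalArgumentException("unknown protocol version " + version);
    }

    public static void main(String[] args) {
        List<String> addrs = Arrays.asList("zk1:3888", "zk1-alt:3888");
        System.out.println(decode(encode(OLD_VERSION, 1L, addrs)));
        System.out.println(decode(encode(NEW_VERSION, 1L, addrs)));
    }
}
```

The point of the sketch is only that both directions (writing for an old peer, reading from an old peer) need the branch; a real fix would have to follow the actual message formats of both releases.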



Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Enrico Olivelli
I suggest this plan:
- release 3.6.0 now
- improve the migration story, the flow outlined by Mate is
interesting, but it will take time

3.6.0rc2 got enough binding votes so I am going to finalize the
release this evening (within 8-10 hours) if no one comes out in the
VOTE thread with a -1

Enrico



Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Patrick Hunt
On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar  wrote:

> Hi,
>
> Answers inline.
>
>
> > In my experience when you are close to a release it is better not to
> > make big changes. (I am among the approvers of that patch, so I am
> > responsible for this change)
>
>
>
> Although this statement is acceptable to me, I don’t think it was wrong to
> merge this patch into 3.6.0. Its submission was preceded by a long argument
> with the MAPR folks, who originally wanted it merged into the 3.4 branch
> (considering the pace at which the ZooKeeper community moves forward), and
> we reached an agreement to release it with 3.6.0.
>
> To make a long story short, this patch had been outstanding for ages without
> much attention from the community, and the contributors made a lot of effort
> to get it done before the release.
>
>
> > I would like to hear from people who have been in the community for
> > a long time, then I am ready to complete the release process for
> > 3.6.0rc2.
>
>
> Me too.
>
> I tend to accept the way rolling restart works now - as you described,
> Enrico - and given that the situation was pretty much the same between 3.4
> and 3.5, I don’t feel we have to make additional changes.
>
> On the other hand, the fix that Mate suggested sounds quite cool, I’m also
> happy to work on getting it in.
>
> Fyi, Release Management page says the following:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>
> "major.minor release of ZooKeeper must be backwards compatible with the
> previous minor release, major.(minor-1)"
>
>
Our users, direct and indirect, value the ability to migrate to newer
versions - esp. as we drop support for older ones. Friction such as this can be
a reason to go elsewhere. I'm "pro" b/w compat - esp. given our published
guidelines.

Patrick


> Andor

Build failed in Jenkins: zookeeper-branch36-java11 #46

2020-02-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 58.29 KB...]

Jenkins build is back to normal : zookeeper-master-maven-jdk12 #368

2020-02-10 Thread Apache Jenkins Server
See 




Re: [VOTE] Apache ZooKeeper release 3.5.7 candidate 2

2020-02-10 Thread Andor Molnar
+1 (binding)

- release notes are OK,
- documentation looks good,
- verified signatures, checksum,
- Java & C unit tests passed,
- verified 3-node cluster with zk-latencies.py (create, get, delete, setAcl, 
getAcl, watchers)
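
For readers who want to repeat the checksum step, here is a minimal Java sketch. It assumes the published checksum is a SHA-512 hex digest (the message above does not say which algorithm was used), and the class name and command-line arguments are hypothetical. Signature verification is a separate step and is not covered here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper; assumes a SHA-512 hex digest is published next to the artifact. */
final class ChecksumSketch {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares digests, ignoring case and surrounding whitespace. */
    static boolean matches(String computed, String published) {
        return computed.equalsIgnoreCase(published.trim());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded artifact, args[1]: published hex digest
        byte[] artifact = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(sha512Hex(artifact), args[1]) ? "OK" : "MISMATCH");
    }
}
```

Reading the whole file into memory keeps the sketch short; a streaming digest would be the better choice for large tarballs.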

Andor



> On 2020. Feb 10., at 12:52, Norbert Kalmar  wrote:
> 
> This is the third bugfix release candidate for 3.5.7. It fixes 25 issues,
> including third party CVE fixes, potential data loss and potential split
> brain if some rare conditions exist.
> 
> There are 4 additional patches compared to rc0 and rc1:
> - ZOOKEEPER-3453: missing 'SET' in zkCli on windows
> - ZOOKEEPER-3716: upgrade netty 4.1.42 to address CVE-2019-20444 CVE-20…
> - ZOOKEEPER-3718: The tarball generated by assembly is missing some files
> - ZOOKEEPER-3719: Fix C Client compilation issues
> 
> The full release notes are available at:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12346098
> 
> *** Please download, test and vote by February 13th 2020, 23:59 UTC+0. ***
> 
> Source files:
> https://people.apache.org/~nkalmar/zookeeper-3.5.7-candidate-2/
> 
> Maven staging repo:
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.7/
> 
> The release candidate tag in git to be voted upon: release-3.5.7-rc2
> 
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> https://www.apache.org/dist/zookeeper/KEYS
> 
> Should we release this candidate?



Build failed in Jenkins: zookeeper-branch36-java8 #45

2020-02-10 Thread Apache Jenkins Server
See 

Changes:


--
[...truncated 51.28 KB...]

Re: [VOTE] Apache ZooKeeper release 3.5.7 candidate 2

2020-02-10 Thread Jordan Zimmerman
I ran Curator tests and they pass

+1 (non binding)

-Jordan

> On Feb 10, 2020, at 6:52 AM, Norbert Kalmar  wrote:
> 
> This is the third bugfix release candidate for 3.5.7. It fixes 25 issues,
> including third party CVE fixes, potential data loss and potential split
> brain if some rare conditions exist.
> 
> There are 4 additional patches compared to rc0 and rc1:
> - ZOOKEEPER-3453: missing 'SET' in zkCli on windows
> - ZOOKEEPER-3716: upgrade netty 4.1.42 to address CVE-2019-20444 CVE-20…
> - ZOOKEEPER-3718: The tarball generated by assembly is missing some files
> - ZOOKEEPER-3719: Fix C Client compilation issues
> 
> The full release notes are available at:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12346098
> 
> *** Please download, test and vote by February 13th 2020, 23:59 UTC+0. ***
> 
> Source files:
> https://people.apache.org/~nkalmar/zookeeper-3.5.7-candidate-2/
> 
> Maven staging repo:
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.7/
> 
> The release candidate tag in git to be voted upon: release-3.5.7-rc2
> 
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> https://www.apache.org/dist/zookeeper/KEYS
> 
> Should we release this candidate?



[VOTE] Apache ZooKeeper release 3.5.7 candidate 2

2020-02-10 Thread Norbert Kalmar
This is the third bugfix release candidate for 3.5.7. It fixes 25 issues,
including third party CVE fixes, potential data loss and potential split
brain if some rare conditions exist.

There are 4 additional patches compared to rc0 and rc1:
- ZOOKEEPER-3453: missing 'SET' in zkCli on windows
- ZOOKEEPER-3716: upgrade netty 4.1.42 to address CVE-2019-20444 CVE-20…
- ZOOKEEPER-3718: The tarball generated by assembly is missing some files
- ZOOKEEPER-3719: Fix C Client compilation issues

The full release notes are available at:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12346098

*** Please download, test and vote by February 13th 2020, 23:59 UTC+0. ***

Source files:
https://people.apache.org/~nkalmar/zookeeper-3.5.7-candidate-2/

Maven staging repo:
https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.7/

The release candidate tag in git to be voted upon: release-3.5.7-rc2

ZooKeeper's KEYS file containing PGP keys we use to sign the release:
https://www.apache.org/dist/zookeeper/KEYS

Should we release this candidate?


Re: [VOTE] Apache ZooKeeper release 3.5.7 candidate 1

2020-02-10 Thread Norbert Kalmar
Hi Jordan,

It is available again. Rc1 got downvoted, so I created rc2, which is now
available in the staging repo. I'll also send the email about it, but
first I'll run a few more tests.

- Norbert

On Sun, Feb 9, 2020 at 4:43 PM Jordan Zimmerman 
wrote:

> 3.5.7 is not in the staging repo. I'd like to test with Curator.
>
>
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/
>
> -Jordan
>
> > On Feb 7, 2020, at 7:29 AM, Norbert Kalmar  wrote:
> >
> > This is the second bugfix release candidate for 3.5.7. It fixes 21 issues,
> > including third party CVE fixes, potential data loss and potential split
> > brain if some rare conditions exist.
> >
> > (I have signed rc0 with the wrong key - sorry for that). Everything else is
> > unchanged from rc0.
> >
> > The full release notes are available at:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12346098
> >
> > *** Please download, test and vote by February 11th 2020, 23:59 UTC+0. ***
> >
> > Source files:
> > https://people.apache.org/~nkalmar/zookeeper-3.5.7-candidate-1/
> >
> > Maven staging repo:
> >
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.7/
> >
> > The release candidate tag in git to be voted upon: release-3.5.7-rc1
> > (points to the same commit as rc0)
> >
> > ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> > https://www.apache.org/dist/zookeeper/KEYS
> >
> > Should we release this candidate?
>
>


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Andor Molnar
Hi,

Answers inline.


> In my experience when you are close to a release it is better not to
> make big changes. (I am among the approvers of that patch, so I am
> responsible for this change)



Although this statement is acceptable to me, I don’t think it was wrong to merge
this patch into 3.6.0. Its submission was preceded by a long argument with the
MAPR folks, who originally wanted it merged into the 3.4 branch (considering the
pace at which the ZooKeeper community moves forward), and we reached an
agreement to release it with 3.6.0.

To make a long story short, this patch had been outstanding for ages without much
attention from the community, and the contributors made a lot of effort to get it
done before the release.


> I would like to hear from people who have been in the community for
> a long time, then I am ready to complete the release process for
> 3.6.0rc2.


Me too.

I tend to accept the way rolling restart works now - as you described, Enrico -
and given that the situation was pretty much the same between 3.4 and 3.5, I don’t
feel we have to make additional changes.

On the other hand, the fix that Mate suggested sounds quite cool, I’m also 
happy to work on getting it in.

Fyi, Release Management page says the following: 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

"major.minor release of ZooKeeper must be backwards compatible with the 
previous minor release, major.(minor-1)"

Andor




> On 2020. Feb 10., at 11:32, Enrico Olivelli  wrote:
> 
> Thank you Mate for checking and explaining this story.
> 
> I find it very interesting that the cause is ZOOKEEPER-3188 as:
> - it is the last "big patch" committed to 3.6 before starting the
> release process
> - it is the cause of the failure of the first RC
> 
> In my experience when you are close to a release it is better not to
> make big changes. (I am among the approvers of that patch, so I am
> responsible for this change)
>
> This is a pointer to the change, for anyone who wants to better understand
> the context:
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> 
> IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> if this statement holds then I feel we can continue
> with this release.
> 
> - Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
> - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> have tools to certify this compatibility (at least not in the short
> term)
> 
> I would like to hear from people who have been in the community for
> a long time, then I am ready to complete the release process for
> 3.6.0rc2.
> 
> I will update the website and the release notes with a specific
> warning about the upgrade, we should also update the Wiki
> 
> Enrico
> 
> 
> On Mon, Feb 10, 2020 at 11:17, Szalay-Bekő Máté wrote:
>> 
>> Hi Enrico!
>> 
>> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
>> The protocol version was last changed in ZOOKEEPER-2186, first released
>> in 3.4.7 and 3.5.1, to avoid some crashes and fix some bugs. Later I
>> also changed the protocol version when the format of the initial message
>> changed in ZOOKEEPER-3188. So the quorum protocol is actually not
>> compatible in this case, and this is the 'expected' behavior if you upgrade,
>> e.g., from 3.4.6 to 3.4.7, from 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.
>> 
>> We had some discussion in the PR of ZOOKEEPER-3188 back then and came to the
>> conclusion that it is not that bad, as there will be no data loss, as you
>> wrote. The tricky thing is that during a rolling upgrade we should ensure
>> both backward and forward compatibility, to make sure that the old and the
>> new parts of the quorum can still speak to each other. The current solution
>> (simply failing if the protocol versions mismatch) is simpler and still
>> works just fine: as the servers are restarted one by one, the nodes with
>> the old protocol version and the nodes with the new protocol version will
>> form two partitions, but at any given time only one partition will have the
>> quorum.
>> 
>> Still, thinking it through, as a side effect in these cases there will be a
>> short time when neither partition has a quorum (when we have N
>> servers with the old protocol version, N servers with the new protocol
>> version, and one server just being restarted). I am not sure if we
>> can accept this.
>> 
>> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
>> the initial message of the old protocol version with the new code. But I am
>> not sure if it would be enough (as the old code will not be able to parse
>> the new initial message).
>> 
>> One option can be to make a patch also for 3.5 to have a version which
>> supports both protocol versions. (let's say in 3.5.8) Then we can write to
>> the release note, that if you need rolling upgrade from any versions since
>> 3.4.7, then you have to first upgrade from 3

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Enrico Olivelli
Thank you Mate for checking and explaining this story.

I find it very interesting that the cause is ZOOKEEPER-3188 as:
- it is the last "big patch" committed to 3.6 before starting the
release process
- it is the cause of the failure of the first RC

In my experience, when you are close to a release it is better not to
make big changes. (I am among the approvers of that patch, so I am
responsible for this change.)

Here is a pointer to the change for anyone who wants to understand the
context better:
https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11

IIUC the story was the same even for the upgrade from 3.4 to 3.5, and
if this statement holds then I feel we can continue with this release.

- Reverting ZOOKEEPER-3188 is not an option for me; it is too complex.
- Making 3.5 and 3.6 "compatible" can be very tricky and we do not
have tools to certify this compatibility (at least not in the short
term)

I would like to hear from people who have been in the community for
a long time; then I am ready to complete the release process for
3.6.0rc2.

I will update the website and the release notes with a specific
warning about the upgrade. We should also update the Wiki.

Enrico



Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Szalay-Bekő Máté
Actually, we have another option: we can follow the way rolling
restart support for QuorumSSL was implemented.
- we can make 3.6.0 able to read both protocol versions
- we can add a parameter that tells 3.6.0 which protocol version to use
(using the old one breaks / disables the MultiAddress feature, but I think
that is OK during the upgrade)
- then we can do a rolling upgrade with the old protocol version
- then we can change the parameter to use the new protocol version (at this
point all nodes can understand both versions)
- then we can do a rolling restart with the new config

I would vote for this solution.
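
To illustrate why the two passes matter, here is a minimal simulation sketch
in Java. It is not ZooKeeper code: the Node model, the Protocol enum and all
method names are hypothetical. Each node has a set of protocol versions it
can read and one version it sends; a connection works only if the receiver
can read what the sender sends. The sketch replaces nodes one by one and
checks that every pair of nodes can still connect after each replacement.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two-pass rolling upgrade; not ZooKeeper code.
final class TwoPassUpgradeSketch {
    enum Protocol { OLD, NEW }

    static final class Node {
        final Set<Protocol> understands; // versions this node can parse
        final Protocol sends;            // version this node writes
        Node(Set<Protocol> understands, Protocol sends) {
            this.understands = understands;
            this.sends = sends;
        }
        boolean canConnectTo(Node other) {
            return other.understands.contains(sends);
        }
    }

    // Old release: reads and writes only the old protocol.
    static Node oldRelease() {
        return new Node(EnumSet.of(Protocol.OLD), Protocol.OLD);
    }

    // New release: reads both, writes whatever the (assumed) parameter says.
    static Node newRelease(Protocol configured) {
        return new Node(EnumSet.allOf(Protocol.class), configured);
    }

    static boolean fullyConnected(List<Node> nodes) {
        for (int i = 0; i < nodes.size(); i++) {
            for (int j = 0; j < nodes.size(); j++) {
                if (i != j && !nodes.get(i).canConnectTo(nodes.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    // Replaces nodes one by one; false if any intermediate state is split.
    static boolean rollOut(List<Node> nodes, Protocol configured) {
        for (int i = 0; i < nodes.size(); i++) {
            nodes.set(i, newRelease(configured));
            if (!fullyConnected(nodes)) {
                return false;
            }
        }
        return true;
    }

    static List<Node> oldCluster(int size) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            nodes.add(oldRelease());
        }
        return nodes;
    }

    // Pass 1: new binaries sending the old protocol. Pass 2: switch the parameter.
    static boolean twoPassUpgradeStaysConnected(int size) {
        List<Node> nodes = oldCluster(size);
        return rollOut(nodes, Protocol.OLD) && rollOut(nodes, Protocol.NEW);
    }

    // Direct upgrade to new binaries that immediately send the new protocol.
    static boolean singlePassUpgradeStaysConnected(int size) {
        return rollOut(oldCluster(size), Protocol.NEW);
    }

    public static void main(String[] args) {
        System.out.println("two passes:  " + twoPassUpgradeStaysConnected(3));
        System.out.println("single pass: " + singlePassUpgradeStaysConnected(3));
    }
}
```

In this model the two-pass upgrade never splits the ensemble, while the
single-pass upgrade splits it as soon as the first node is replaced.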

Kind regards,
Mate



Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Szalay-Bekő Máté
Hi Enrico!

This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
The protocol version was last changed in ZOOKEEPER-2186, first released in
3.4.7 and 3.5.1, to avoid some crashes and fix some bugs. Later I also
changed the protocol version when the format of the initial message changed
in ZOOKEEPER-3188. So the quorum protocol is indeed not compatible in this
case, and this is the 'expected' behavior if you upgrade e.g. from 3.4.6 to
3.4.7, from 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.

We had some discussion in the PR of ZOOKEEPER-3188 back then and came to the
conclusion that it is not that bad, as there will be no data loss, as you
wrote. The tricky thing is that during a rolling upgrade we should ensure
both backward and forward compatibility, to make sure that the old and the
new parts of the quorum can still speak to each other. The current solution
(simply failing if the protocol versions mismatch) is simpler and still
works just fine: as the servers are restarted one by one, the nodes with
the old protocol version and the nodes with the new protocol version will
form two partitions, but at any given time only one partition will have the
quorum.

Still, thinking it through, as a side effect in these cases there will be a
short time when neither partition has a quorum (when we have N servers with
the old protocol version, N servers with the new protocol version, and one
server just being restarted). I am not sure if we can accept this.
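
The arithmetic behind this window can be sketched as follows. This is a
hypothetical Java simulation, not ZooKeeper code: it assumes a plain majority
quorum, that servers are restarted strictly one at a time, and that old and
new servers cannot talk to each other at all. Step i is the moment when i
servers already run the new protocol and one more is being restarted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of quorum availability during a rolling upgrade.
final class RollingUpgradeQuorum {

    // Majority quorum size for an ensemble of the given size.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // For each restart step, reports whether the old or the new partition
    // still has a majority while one server is down.
    static List<Boolean> quorumAvailableDuringRestarts(int ensembleSize) {
        int quorum = quorumSize(ensembleSize);
        List<Boolean> available = new ArrayList<>();
        for (int upgraded = 0; upgraded < ensembleSize; upgraded++) {
            int newPartition = upgraded;
            int oldPartition = ensembleSize - upgraded - 1; // one server is restarting
            available.add(newPartition >= quorum || oldPartition >= quorum);
        }
        return available;
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 5}) {
            System.out.println(size + " servers: " + quorumAvailableDuringRestarts(size));
        }
    }
}
```

For an ensemble of 2N+1 servers the only step without a quorum is the middle
one, where N servers are old, N are new and one is restarting.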

For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
the initial message of the old protocol version with the new code. But I am
not sure if it would be enough (as the old code will not be able to parse
the new initial message).
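
A rough sketch of what "parsing both versions with the new code" could look
like, in Java. Everything here is an assumption made for illustration: the
class name, the simplified wire layout (version, server id, payload length,
payload), the "|" separator for multiple addresses, and the value of the old
protocol version. Only the value -65535 comes from the error log in this
thread. This is not the actual InitialMessage code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not ZooKeeper code: the wire layout is simplified and assumed.
final class DualVersionInitialMessage {
    static final long OLD_PROTOCOL_VERSION = -65536L; // assumed value of the old version
    static final long NEW_PROTOCOL_VERSION = -65535L; // value from the error log in this thread

    final long sid;
    final List<String> addresses;

    DualVersionInitialMessage(long sid, List<String> addresses) {
        this.sid = sid;
        this.addresses = addresses;
    }

    // Assumed layout: version (long), server id (long), payload length (int), payload bytes.
    static byte[] encode(long version, long sid, List<String> addresses) {
        if (version == OLD_PROTOCOL_VERSION && addresses.size() != 1) {
            throw new IllegalArgumentException("old format carries exactly one address");
        }
        byte[] payload = String.join("|", addresses).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES + payload.length)
                .putLong(version).putLong(sid).putInt(payload.length).put(payload)
                .array();
    }

    // Accepts both versions instead of rejecting everything but the newest one.
    static DualVersionInitialMessage parse(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        long version = in.getLong();
        if (version != OLD_PROTOCOL_VERSION && version != NEW_PROTOCOL_VERSION) {
            throw new IllegalArgumentException("Got unrecognized protocol version " + version);
        }
        long sid = in.getLong();
        int length = in.getInt();
        if (length <= 0 || length > in.remaining()) {
            throw new IllegalArgumentException("Bad payload length " + length);
        }
        byte[] payload = new byte[length];
        in.get(payload);
        String text = new String(payload, StandardCharsets.UTF_8);
        List<String> addresses = version == NEW_PROTOCOL_VERSION
                ? Arrays.asList(text.split("\\|"))   // new format: possibly several addresses
                : Collections.singletonList(text);   // old format: a single address
        return new DualVersionInitialMessage(sid, addresses);
    }

    public static void main(String[] args) {
        byte[] oldMsg = encode(OLD_PROTOCOL_VERSION, 1L, Collections.singletonList("host1:3888"));
        byte[] newMsg = encode(NEW_PROTOCOL_VERSION, 2L, Arrays.asList("host2:3888", "host2b:3888"));
        System.out.println(parse(oldMsg).addresses);
        System.out.println(parse(newMsg).addresses);
    }
}
```

As noted above, this covers only one direction: a server still running the
old code would keep rejecting messages written with the new version.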

One option would be to also patch 3.5, so that we have a version which
supports both protocol versions (let's say 3.5.8). Then we can write in
the release notes that if you need a rolling upgrade from any version since
3.4.7, you first have to upgrade to 3.5.8 before upgrading to 3.6.0.
We can even do the same thing on the 3.4 branch.

But I am also new to the community... It would be great to hear the opinion
of more experienced people.
Whatever the decision will be, I am happy to make the changes.

And sorry for breaking the RC (if we decide that this needs to be
changed...).  ZOOKEEPER-3188 was a complex patch.

Kind regards,
Mate



Rolling upgrade from 3.5 to 3.6 - expected behaviour

2020-02-10 Thread Enrico Olivelli
Hi,
Even though we had enough binding +1s on 3.6.0rc2, I wanted to finish my
tests before closing the VOTE on 3.6.0, and I have run into an apparent
blocker.

I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
peers are not able to talk to each other.
I have a cluster of 3: server1, server2 and server3.
When I upgrade server1 to 3.6.0rc2 I see this kind of error on the 3.5 nodes:

2020-02-10 09:35:07,745 [myid:3] - INFO
[localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
connection request 127.0.0.1:62591
2020-02-10 09:35:07,746 [myid:3] - ERROR
[localhost/127.0.0.1:3334:QuorumCnxManager@527] -
org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
Got unrecognized protocol version -65535

Once I upgrade all of the peers the system is up and running,
apparently without any data loss.

During the upgrade, as soon as I upgrade the first node, say server1,
server1 is not able to accept connections from clients (error "Close of
session 0x0 java.io.IOException: ZooKeeperServer not running"). This
is expected: as long as it cannot talk to the other peers, it
is practically partitioned away from the cluster.

My questions are:
1) Is this expected? I can't remember any protocol changes from 3.5 to
3.6, but 3.6 diverged from the 3.5 branch long ago, when I was not yet
in the community as a dev, so I cannot tell.
2) Is this a viable option for users: to have some temporary glitch
during the upgrade and hope that the upgrade completes without
trouble?

In theory, as long as two servers are running the same major version
(3.5 or 3.6) we have a quorum and the system is able to make progress
and to serve clients.
I feel that this is quite dangerous, but I don't have enough context
to understand how this problem is possible and when we decided to
break compatibility.

The other option is that I am wrong in my test and I am messing up :-)

The other upgrade path I would like to see working like a charm is the
upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
should encourage users to move to 3.6 and not to 3.5.

Regards
Enrico