FYI: I created these scripts for my local tests: https://github.com/symat/zk-rolling-upgrade-test
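The rolling-upgrade check discussed in this thread (Andor's steps 1-4 quoted below, plus the idea of watching the quorum state after every restart) could be driven by a small Python script along these lines. This is a minimal sketch, not the code in the repository linked above. `restart_with_new_version` and `smoke_test` are hypothetical callables you would supply, for example by wrapping your deployment tooling and a client library. The quorum check sends the `srvr` four-letter-word command to each client port and reads the `Mode:` line of the reply; as far as I know, `srvr` is allowed by the default `4lw.commands.whitelist` on 3.5+.

```python
import socket
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. 'leader', 'follower'), or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # e.g. the server is up but not serving requests


def query_mode(host: str, port: int, timeout: float = 2.0) -> Optional[str]:
    """Send the `srvr` four-letter word to a client port and parse the mode."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the socket after answering
                    break
                chunks.append(data)
    except OSError:
        return None  # unreachable, refused or timed out: treat as "not serving"
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def quorum_healthy(modes: Dict[str, Optional[str]]) -> bool:
    """Exactly one leader, and a strict majority of nodes serving as leader/follower."""
    serving = [m for m in modes.values() if m in ("leader", "follower")]
    return serving.count("leader") == 1 and len(serving) > len(modes) // 2


Report = Tuple[str, bool, List[str]]  # (step label, quorum ok?, nodes failing smoke test)


def rolling_upgrade(
    nodes: Sequence[str],
    restart_with_new_version: Callable[[str], None],  # hypothetical hook
    get_mode: Callable[[str], Optional[str]],
    smoke_test: Callable[[str], bool],                 # hypothetical hook
) -> List[Report]:
    """Steps 1-4: check every node, upgrade one node, repeat until all are upgraded."""
    reports: List[Report] = []

    def check(label: str) -> None:
        modes = {n: get_mode(n) for n in nodes}
        failed = [n for n in nodes if not smoke_test(n)]
        reports.append((label, quorum_healthy(modes), failed))

    check("initial (all old)")
    for node in nodes:
        restart_with_new_version(node)
        check(f"after upgrading {node}")
    return reports
```

In practice `get_mode` could be `lambda n: query_mode(n, 2181)` for hostnames `n` (2181 being the default client port). A real harness would also retry each check for a while after a restart, since leader election takes time, and would keep client traffic running in the background rather than relying on one-off smoke tests.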
For the long term I would also add a script that actually monitors the state of the quorum and runs continuous traffic, not just one or two smoke tests after each restart. But I don't know how important this would be.

On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> On Tue, Feb 11, 2020 at 5:17 PM Andor Molnar <an...@apache.org> wrote:
>>
>> The most obvious one that crosses my mind is what I previously worked on:
>>
>> 1) run an old-version cluster,
>> 2) connect to each node and run smoke tests,
>> 3) restart one node with the new code,
>> 4) go to 2) until all nodes are upgraded
>>
>> I think this wouldn't work in a "unit test"; we probably need a separate Jenkins job and a nice Python script to do this.
>>
>> Andor
>>
>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
>>>
>>> Anyone have ideas how we could add testing for upgrade? Obviously something we're missing, esp. given its importance.
>>
> I will send an email in the next few days with a proposal.
> BTW, my idea is very similar to Andor's.
>
> Once we have an automated environment, we can launch it from Jenkins.
>
> Enrico
>
>>>
>>> Patrick
>>>
>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>
>>>> On Tue, Feb 11, 2020 at 9:12 AM Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> about the question from Michael:
>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol and speak the old message format when it's talking to an old server?
>>>>>
>>>>> In this particular case, it might be enough. This time the protocol change happened in the 'initial message' sent by the QuorumCnxManager.
>>>>> Maybe it is not a problem if the new servers cannot initiate channels to the old servers; maybe it is enough if these channels get initiated by the old servers only. I will test it quickly.
>>>>>
>>>>> However, I have no idea whether anything else changed in the quorum protocol between 3.5 and 3.6. In other cases it might not be enough for the new servers to understand the old messages, as the old servers can break by not understanding the messages from the new servers. Also, AFAIK the code currently has no generic knowledge of protocol versions: the servers do not store which protocol version they can/should use to communicate with which particular other server. Maybe we don't even need this, but I would feel better if we had more tests around these things.
>>>>>
>>>>> My suggestion for the long term:
>>>>> - let's fix this particular issue in 3.6.0 quickly (I'll start on it today)
>>>>> - let's build some automation (backed by Jenkins) that tests a whole range of ZooKeeper upgrade paths by doing rolling upgrades under some light traffic. Let's define more precisely what we expect (e.g. the quorum stays up, but some clients may get disconnected? What happens to the ephemeral nodes? Do we want to gracefully close or transfer the user sessions before stopping the old server?) and let's see where this breaks. Just from checking the code, I don't think the quorum will always stay up (e.g. between older 3.4 versions and 3.5).
>>>>
>>>> I am happy to work on this topic.
>>>>
>>>>> - we need to update the Wiki with the rolling upgrade paths that work, and maybe with workarounds if needed
>>>>> - we might need to do some fixes (adding backward-compatible versions and/or specific parameters that temporarily enforce the old protocol during the rolling upgrade, which can later be switched to the new protocol either by dynamic reconfig or by a rolling restart)
>>>>
>>>> It would be much better to have some support in the 3.6 code for compatibility with 3.5 servers.
>>>> We can't require old code to be forward compatible, but we can make new code compatible with old code to a certain extent.
>>>> If we can achieve this compatibility goal without a flag, that is better: users won't have to care about this part and they can simply "trust" us.
>>>>
>>>> The rollback story is also important, but maybe we are still not ready for it. In case of local changes to the store, it is better to have a clear design and plan, and to work on it for a new release (3.7?).
>>>>
>>>> Enrico
>>>>
>>>>>
>>>>> Depending on your comments, I am happy to create a few Jira tickets around these topics.
>>>>>
>>>>> Kind regards,
>>>>> Mate
>>>>>
>>>>> P.S. Enrico, sorry about your RC... I owe you a beer, let me know if you are near Budapest ;)
>>>>>
>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>
>>>>>> Good.
>>>>>>
>>>>>> I will cancel the vote for 3.6.0rc2.
>>>>>>
>>>>>> I would very much appreciate it if Mate and his colleagues have time to work on a fix.
>>>>>> Otherwise I will have cycles next week.
>>>>>>
>>>>>> I would also like to spend some time setting up a few minimal integration tests for the upgrade story.
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>> On Tue, Feb 11, 2020, 07:30 Michael Han <h...@apache.org> wrote:
>>>>>>
>>>>>>> Kudos Enrico, very thorough work as the final gatekeeper of the release!
>>>>>>>
>>>>>>> Now with this, I'd like to *vote -1* on 3.6.0 RC2.
>>>>>>>
>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare pieces of software that put so much emphasis on compatibility that it just works on upgrade/downgrade, which is amazing. One guarantee we have always had is that during a rolling upgrade the quorum is always available, so there is no service interruption. It would be sad to lose that capability, given this is still a tractable problem.
>>>>>>>
>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol and speak the old message format when it's talking to an old server? Basically, an ugly if/else check against the protocol version should work, and there would be no need for multiple passes in the rolling upgrade process.
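Michael's "ugly if/else" suggestion, together with Mate's idea of a temporary parameter that forces the old format, can be illustrated with a Python sketch in which the receiver switches on the protocol version of the initial message. This is a simplified, hypothetical model, not ZooKeeper's QuorumCnxManager code: the field layout (8-byte version, 8-byte server id, 4-byte length, address string), the `|` separator for multiple addresses, the `speak_old_protocol` flag and the value of the old constant are all assumptions. Only -65535 comes from the 3.5 error log quoted further down in this thread.

```python
import struct
from typing import List, NamedTuple, Sequence

# Assumed constants: NEW matches "Got unrecognized protocol version -65535"
# in the 3.5 log; OLD is an assumption to verify against QuorumCnxManager.
OLD_PROTOCOL_VERSION = -65536
NEW_PROTOCOL_VERSION = -65535
# Hypothetical header: version (int64), server id (int64), payload length (int32).
HEADER = struct.Struct(">qqi")


class InitialMessage(NamedTuple):
    protocol_version: int
    sid: int
    addresses: List[str]


def encode_initial_message(sid: int, addresses: Sequence[str], speak_old_protocol: bool) -> bytes:
    """Sender side: a (hypothetical) temporary flag selects the old single-address format."""
    if speak_old_protocol:
        version, payload = OLD_PROTOCOL_VERSION, addresses[0]
    else:
        version, payload = NEW_PROTOCOL_VERSION, "|".join(addresses)
    body = payload.encode("utf-8")
    return HEADER.pack(version, sid, len(body)) + body


def decode_initial_message(data: bytes) -> InitialMessage:
    """Receiver side: the 'ugly if/else' on the protocol version."""
    version, sid, length = HEADER.unpack_from(data, 0)
    body = data[HEADER.size:HEADER.size + length].decode("utf-8")
    if version == OLD_PROTOCOL_VERSION:
        addresses = [body]
    elif version == NEW_PROTOCOL_VERSION:
        addresses = body.split("|")
    else:
        raise ValueError(f"Got unrecognized protocol version {version}")
    return InitialMessage(version, sid, addresses)
```

As Mate points out in the thread, a tolerant receiver only fixes one direction: an old server still rejects anything that is not in the old format, which is why the sender-side choice (a flag, or per-peer knowledge of protocol versions) matters as well.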
>>>>>>>
>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I suggest this plan:
>>>>>>>> - release 3.6.0 now
>>>>>>>> - improve the migration story; the flow outlined by Mate is interesting, but it will take time
>>>>>>>>
>>>>>>>> 3.6.0rc2 got enough binding votes, so I am going to finalize the release this evening (within 8-10 hours) unless someone comes out with a -1 in the VOTE thread.
>>>>>>>>
>>>>>>>> Enrico
>>>>>>>>
>>>>>>>> On Mon, Feb 10, 2020 at 7:33 PM Patrick Hunt <ph...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Answers inline.
>>>>>>>>>>
>>>>>>>>>>> In my experience, when you are close to a release it is better not to make big changes. (I am among the approvers of that patch, so I am responsible for this change.)
>>>>>>>>>>
>>>>>>>>>> Although this statement is acceptable to me, I don't think this patch should have been kept out of 3.6.0. Its submission was preceded by a long argument with the MapR folks, who originally wanted it merged into the 3.4 branch (considering the pace at which the ZooKeeper community is moving forward), and we reached an agreement to release it with 3.6.0.
>>>>>>>>>>
>>>>>>>>>> To make a long story short, this patch had been outstanding for ages without much attention from the community, and the contributors made a lot of effort to get it done before the release.
>>>>>>>>>>
>>>>>>>>>>> I would like to hear from people who have been in the community for a long time; then I am ready to complete the release process for 3.6.0rc2.
>>>>>>>>>>
>>>>>>>>>> Me too.
>>>>>>>>>>
>>>>>>>>>> I tend to accept the way rolling restart works now, as you described it, Enrico, and given that the situation was pretty much the same between 3.4 and 3.5, I don't feel we have to make additional changes.
>>>>>>>>>>
>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite cool; I'm also happy to work on getting it in.
>>>>>>>>>>
>>>>>>>>>> FYI, the Release Management page says the following:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>>>>>>>>>>
>>>>>>>>>> "major.minor release of ZooKeeper must be backwards compatible with the previous minor release, major.(minor-1)"
>>>>>>>>>
>>>>>>>>> Our users, direct and indirect, value the ability to migrate to newer versions, especially as we drop support for older ones. Friction such as this can be a reason to go elsewhere. I'm "pro" backward compatibility, especially given our published guidelines.
>>>>>>>>>
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>>> Andor
>>>>>>>>>>
>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you Mate for checking and explaining this story.
>>>>>>>>>>>
>>>>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188, as:
>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before starting the release process
>>>>>>>>>>> - it is the cause of the failure of the first RC
>>>>>>>>>>>
>>>>>>>>>>> In my experience, when you are close to a release it is better not to make big changes. (I am among the approvers of that patch, so I am responsible for this change.)
>>>>>>>>>>>
>>>>>>>>>>> Here is a pointer to the change for anyone who wants to understand the context better:
>>>>>>>>>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
>>>>>>>>>>>
>>>>>>>>>>> IIUC, even for the upgrade from 3.4 to 3.5 the story was the same, and if this statement holds then I feel we can continue with this release.
>>>>>>>>>>>
>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me; it is too complex.
>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky, and we do not have tools to certify this compatibility (at least not in the short term).
>>>>>>>>>>>
>>>>>>>>>>> I would like to hear from people who have been in the community for a long time; then I am ready to complete the release process for 3.6.0rc2.
>>>>>>>>>>>
>>>>>>>>>>> I will update the website and the release notes with a specific warning about the upgrade; we should also update the Wiki.
>>>>>>>>>>>
>>>>>>>>>>> Enrico
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 10, 2020 at 11:17 AM Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Enrico!
>>>>>>>>>>>>
>>>>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager. The protocol version was last changed in ZOOKEEPER-2186, first released in 3.4.7 and 3.5.1, to avoid some crashes / fix some bugs. Later I also changed the protocol version when the format of the initial message changed in ZOOKEEPER-3188. So the quorum protocol is actually not compatible in this case, and this is the 'expected' behavior if you upgrade e.g. from 3.4.6 to 3.4.7, from 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.
>>>>>>>>>>>>
>>>>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back then and came to the conclusion that it is not that bad, as there will be no data loss, as you wrote. The tricky thing is that during a rolling upgrade we would have to ensure both backward and forward compatibility, so that the old and the new part of the quorum can still speak to each other. The current solution (simply failing if the protocol versions mismatch) is simpler and still works just fine: as the servers are restarted one by one, the nodes with the old protocol version and the nodes with the new protocol version form two partitions, but at any given time only one partition will have the quorum.
>>>>>>>>>>>>
>>>>>>>>>>>> Still, thinking it through, as a side effect there will be a short time in these cases when neither partition has a quorum: when we have N servers with the old protocol version, N servers with the new protocol version, and one server just being restarted, neither side of the 2N+1 ensemble reaches the N+1 servers needed for a majority. I am not sure if we can accept this.
>>>>>>>>>>>>
>>>>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it possible for the new code to parse the initial message of the old protocol version. But I am not sure it would be enough (as the old code will not be able to parse the new initial message).
>>>>>>>>>>>>
>>>>>>>>>>>> One option is to also patch 3.5 to have a version which supports both protocol versions (let's say 3.5.8). Then we can write in the release notes that if you need a rolling upgrade from any version since 3.4.7, you have to upgrade to 3.5.8 first, before upgrading to 3.6.0. We could even do the same thing on the 3.4 branch.
>>>>>>>>>>>>
>>>>>>>>>>>> But I am also new to the community... It would be great to hear the opinion of more experienced people. Whatever the decision, I am happy to make the changes.
>>>>>>>>>>>>
>>>>>>>>>>>> And sorry for breaking the RC (if we decide that this needs to be changed...). ZOOKEEPER-3188 was a complex patch.
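The quorum arithmetic behind Mate's analysis, and why a dual-protocol release would close the gap, can be checked with a small Python simulation. It is a deliberately simplified model: each release is reduced to the set of quorum protocol versions it speaks, two servers are assumed to cooperate only if they share one, a quorum is taken to be a strict majority of the whole ensemble sharing a protocol, and `3.5.8-bridge` stands for the hypothetical release Mate proposes, not a real build.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional

# Simplified, hypothetical model: release label -> quorum protocol versions it speaks.
SUPPORTS: Dict[str, FrozenSet[str]] = {
    "3.5.6": frozenset({"v1"}),
    "3.5.8-bridge": frozenset({"v1", "v2"}),  # hypothetical dual-protocol release
    "3.6.0": frozenset({"v2"}),
}

State = List[Optional[str]]  # release per server, None while it is restarting


def has_quorum(state: State) -> bool:
    """True if a strict majority of the ensemble shares at least one protocol version."""
    up = [v for v in state if v is not None]
    protocols = set().union(*(SUPPORTS[v] for v in up))
    largest = max((sum(p in SUPPORTS[v] for v in up) for p in protocols), default=0)
    return largest > len(state) // 2


def rolling_upgrade_states(n: int, old: str, new: str) -> Iterator[State]:
    """Every intermediate state of a one-server-at-a-time rolling upgrade."""
    state: State = [old] * n
    yield list(state)
    for i in range(n):
        state[i] = None  # server i is down while it is being restarted
        yield list(state)
        state[i] = new
        yield list(state)


def quorum_gaps(n: int, old: str, new: str) -> List[State]:
    """States in which no quorum can form."""
    return [s for s in rolling_upgrade_states(n, old, new) if not has_quorum(s)]
```

With 3 servers going straight from 3.5.6 to 3.6.0, the only gap is the moment the second server restarts (one upgraded, one down, one old), which matches Enrico's three-node test. Splitting the upgrade into two rolling passes through a release that speaks both protocols keeps a majority at every step, at the cost of the extra pass that Michael would like to avoid.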
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>> Mate
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> even though we had enough binding +1s on 3.6.0rc2, I wanted to finish my tests before closing the VOTE for 3.6.0, and I have run into an apparent blocker.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like the peers are not able to talk to each other.
>>>>>>>>>>>>> I have a cluster of 3: server1, server2 and server3.
>>>>>>>>>>>>> When I upgrade server1 to 3.6.0rc2, I see errors like this on the 3.5 nodes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received connection request 127.0.0.1:62591
>>>>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR [localhost/127.0.0.1:3334:QuorumCnxManager@527] - org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: Got unrecognized protocol version -65535
>>>>>>>>>>>>>
>>>>>>>>>>>>> Once I upgrade all of the peers, the system is up and running, apparently without any data loss.
>>>>>>>>>>>>>
>>>>>>>>>>>>> During the upgrade, as soon as I upgrade the first node, say server1, server1 is not able to accept connections from clients (error "Close of session 0x0 java.io.IOException: ZooKeeperServer not running"). This is expected: as long as it cannot talk to the other peers, it is practically partitioned away from the cluster.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My questions are:
>>>>>>>>>>>>> 1) Is this expected? I can't remember any protocol changes from 3.5 to 3.6, but 3.6 diverged from the 3.5 branch so long ago, and I was not in the community as a dev back then, so I cannot tell.
>>>>>>>>>>>>> 2) Is this a viable option for users: to have some temporary glitch during the upgrade and hope that the upgrade completes without trouble?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In theory, as long as two servers are running the same release line (3.5 or 3.6), we have a quorum and the system is able to make progress and serve clients.
>>>>>>>>>>>>> I feel that this is quite dangerous, but I don't have enough context to understand how this problem is possible and when we decided to break compatibility.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The other option is that my test is wrong and I am messing up :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The other upgrade path I would like to see working like a charm is the upgrade from 3.4 to 3.6: as soon as we release 3.6, we should encourage users to move to 3.6 and not to 3.5.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Enrico