Btw, we actually observed the described issue (data loss), thankfully in a test environment, so I thought it was important to share with the community.
Unfortunately I don’t have time to run a new ZK release for this, so I’m not going to -1 your candidate, but we are actively working on a fix (i.e., a test at this point) and I can commit it as soon as we have it. It may be worthwhile to delay the release by a few more days, but it’s totally up to you since you’re running it.

Cheers,
Alex

On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <an...@cloudera.com> wrote:

> Got that. I still believe it's a completely valid issue which has to be
> addressed, but it's not a showstopper. I'm afraid we're not going to
> convince each other, so it's probably Abe's call if he wants to create
> another release candidate for the fix.
>
> I reviewed the code on GitHub and I think it just needs to be covered with
> a unit test to be complete.
>
> Regards,
> Andor
>
> On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <shra...@gmail.com> wrote:
>
> > Yes, sort of: FLE is finished, then enough observers' messages reach the
> > leader before the participants' messages do.
> > Whether it's rare depends on the number of observers and participants. For
> > example, with very few participants and many observers
> > your chances of hitting this are quite high.
> >
> > Alex
> >
> > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <an...@cloudera.com> wrote:
> >
> > > Maybe I'm missing something here, but this looks like a rare edge case to
> > > me. Participants must finish the leader election successfully, and right
> > > after that enough followers must fail to send their epoch to the leader,
> > > so that observers can take over.
> > >
> > > Is that description accurate?
> > >
> > > Andor
> > >
> > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <shra...@gmail.com> wrote:
> > >
> > > > To clarify - in a deployment with observers this bug can potentially cause
> > > > data loss. A server could be elected leader based just on the support of
> > > > observers, even if this server's data is stale wrt other followers.
> > > >
> > > > It is certainly a blocker, just not sure if for 3.4.11 or 3.4.12.
> > > >
> > > > Alex
> > > >
> > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <an...@cloudera.com> wrote:
> > > >
> > > > > I don't think it's a blocker.
> > > > > The JIRA and PR have been open since last December and 3.4.11 was
> > > > > released without the fix.
> > > > >
> > > > > Although this bug is also important to fix, I believe it's more important
> > > > > to release a fix for the regression we've found in 3.4.11 asap.
> > > > >
> > > > > Abe, any thoughts?
> > > > >
> > > > > Regards,
> > > > > Andor
> > > > >
> > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <shra...@gmail.com> wrote:
> > > > >
> > > > > > Sorry for coming in at the last moment. I'm not sure when the next 3.4
> > > > > > release is scheduled, so just wanted to mention this bug,
> > > > > > which I believe is a blocker for either this or next release:
> > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > > > >
> > > > > > Best,
> > > > > > Alex
> > > > > >
> > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > > Can the vote be closed?
> > > > > > >
> > > > > > > It seems we have enough +1's.
> > > > > > >
> > > > > > > Thanks
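
For readers following the thread, here is a minimal sketch of the failure mode discussed above: a quorum check that counts every server that reached the leader, compared with one that counts only voting participants. This is not the ZooKeeper code and not the fix in the PR. The class name, method names, and server ids are all hypothetical and exist only to illustrate the idea.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: hypothetical names, not taken from the ZooKeeper code base.
public class EpochQuorumSketch {

    // Problematic variant: every server that contacted the leader is counted,
    // so messages from observers can push the count past the majority threshold.
    static boolean countsEveryAck(Set<Long> ackedIds, Set<Long> votingMembers) {
        return ackedIds.size() > votingMembers.size() / 2;
    }

    // Safer variant: acks from non-voting servers (observers) are discarded
    // before comparing against a strict majority of the voting members.
    static boolean countsVotersOnly(Set<Long> ackedIds, Set<Long> votingMembers) {
        Set<Long> votingAcks = new HashSet<>(ackedIds);
        votingAcks.retainAll(votingMembers); // keep only ids that may vote
        return votingAcks.size() > votingMembers.size() / 2;
    }
}
```

As an example under these assumptions, take voting members {1, 2, 3} and observers {4, 5}. If the leader (server 1) has heard only from itself and the two observers, the first check sees three acks and reports a quorum, while the second sees a single voting ack and does not. That matches the scenario in the thread where observers' messages arrive before the participants' messages.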