Hi Andor,

Is anyone working on ZK-2778? I can pick it up if there is no one working on it yet.
I'll open a 3.5 PR for ZK-3104 today.

Fangmin

On Fri, Oct 26, 2018 at 3:33 AM Andor Molnar <an...@apache.org> wrote:

> Hi folks,
>
> You’ve probably realised lots of update emails coming from Jira. Please be
> aware that we’ve updated a bunch of open blocker/critical 3.5 tickets to
> reflect to what we discussed in this email.
>
> If you open up the following jira filter:
>
> project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5 AND
> priority in (blocker, critical) ORDER BY priority DESC, key ASC
>
> You’ll see the most up-to-date list of tickets which need to be addressed
> before the stable 3.5 release.
>
> Thank you for your efforts to get this done.
>
> Fangmin, ZK-3104 is waiting for backport, but ticket has already been
> resolved. Have you created a separate ticket for the backport or shall I
> just reopen it with the right fix versions?
>
> Thanks,
> Andor
>
> > On 2018. Oct 8., at 12:34, Andor Molnar <an...@apache.org> wrote:
> >
> > Hi,
> >
> > Let me summarize and give a quick update on the outstanding issues for
> > 3.5 GA:
> >
> > - ZOOKEEPER-1818 (Fix don't care for trunk)
> > - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> > leader and follower receiving external connection requests.)
> > - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing)
> > - ZOOKEEPER-925 Docs generation to Maven
> > - ZOOKEEPER-3104 (waiting for backport)
> > - ZOOKEEPER-3125 (waiting for backport PR #647)
> >
> > The 2 Maven related tickets are no-brainers as well as the backports.
> > ZK-2778 has been picked up by Maoling (thanks!) as far as I can see,
> > ZK-1818 is the only one waiting for a volunteer.
> >
> > Please correct me if I’ve missed something.
> >
> > Regards,
> > Andor
> >
> >> On 2018. Sep 28., at 18:32, Tamas Penzes <tam...@cloudera.com.INVALID> wrote:
> >>
> >> Hi All,
> >>
> >> I would add ZOOKEEPER-3021
> >> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project
> >> structure to Maven build as a blocker too. Since the migration has started
> >> it would be good to finish before releasing ZK 3.5.x GA.
> >>
> >> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> replace
> >> our forrest site and documentation generation might also be a good idea,
> >> since then we could deliver the new MarkDown based documentation.
> >>
> >> Regards, Tamaas
> >>
> >> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote:
> >>
> >>> Oh, sorry for the confusion, I should provide more context.
> >>>
> >>> Leader will use on disk txn sync with followers to if the peer zxid is not
> >>> in it's in memory commit logs, the code is here: Leader on disk txn sync
> >>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774>.
> >>> There is bug that potentially there will be gap in the txn files, like
> >>> after snap sync, etc, so it's possible the peer will miss txns due to this.
> >>>
> >>> The option to disable it is snapshotSizeFactor
> >>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81>,
> >>> set it to -1 will disable this feature. On 3.5, it's better to have a PR to
> >>> set this to -1 by default. It might have more SNAP sync, but from our prod
> >>> it doesn't seem to be a big problem to me.
> >>>
> >>> I can send out the diff to disable it by default on 3.5 if you guys think
> >>> this is the right way to do.
> >>>
> >>> Thanks,
> >>> Fangmin
> >>>
> >>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote:
> >>>
> >>>> What’s needed to turn it off?
> >>>> Do we need a PR or it’s just a config option?
> >>>> Shall we implement a feature switch for that and turn it off by default?
> >>>>
> >>>> Sorry I don’t have too much insight on disk txn sync.
> >>>>
> >>>> Andor
> >>>>
> >>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote:
> >>>>>
> >>>>> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency
> >>>>> which could caused by on disk txn sync, as I mentioned in a newer JIRA
> >>>>> ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the
> >>>>> snap sync or txn sync could also leave txns gap in the txn file, which is a
> >>>>> more common case could trigger this issue.
> >>>>>
> >>>>> I would suggest to turn off the on disk txn sync by default for now to
> >>>>> avoid this issue, after we finished ZOOKEEPER-3114, we can use that to
> >>>>> validate the on disk txns during syncing.
> >>>>>
> >>>>> Thanks,
> >>>>> Fangmin
> >>>>>
> >>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:
> >>>>>
> >>>>>> Andor,
> >>>>>>
> >>>>>> ZOOKEEPER-3114 is about adding real time digest checking to help detecting
> >>>>>> inconsistency, it's a new feature with amounts of code change. I'll start
> >>>>>> upstream it part by part, but I don't expect it's being merged in the next
> >>>>>> few weeks. So yes, it's a nice to have, but definitely not a block for 3.5.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Fangmin
> >>>>>>
> >>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
> >>>>>>
> >>>>>>> Fangmin,
> >>>>>>>
> >>>>>>> Sorry, I just noticed that you want to include the consistency fixes in
> >>>>>>> the stable version which is fine. Let’s finish the backports and we’ll be
> >>>>>>> done with them.
> >>>>>>>
> >>>>>>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 with
> >>>>>>> that. What do you think?
> >>>>>>>
> >>>>>>> Andor
> >>>>>>>
> >>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>> Cool, thanks for the clarification.
> >>>>>>>>
> >>>>>>>> The updated list is as follows:
> >>>>>>>>
> >>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> >>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> >>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> >>>>>>>> leader and follower receiving external connection requests.)
> >>>>>>>>
> >>>>>>>> The following are not critical and no blockers for the stable release:
> >>>>>>>>
> >>>>>>>> Waiting for to be ported to 3.5:
> >>>>>>>> - ZOOKEEPER-3104
> >>>>>>>> - ZOOKEEPER-3125
> >>>>>>>> - ZOOKEEPER-3127
> >>>>>>>>
> >>>>>>>> New feature:
> >>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Andor
> >>>>>>>>
> >>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Andor,
> >>>>>>>>>
> >>>>>>>>> That's the on disk txn feature, which was disabled internally after we
> >>>>>>>>> found the potentially inconsistent issue. The only solution we have for now
> >>>>>>>>> is waiting for the new digest checking feature I mentioned in
> >>>>>>>>> ZOOKEEPER-3114.
> >>>>>>>>>
> >>>>>>>>> I think there are some other critical consistent issues we just fixed on
> >>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127, I think we
> >>>>>>>>> should include that in the official 3.5 release as well.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Fangmin
> >>>>>>>>>
> >>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Jeelani,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for letting me know.
> >>>>>>>>>> I'm happy to remove it from the list to get
> >>>>>>>>>> closer to a stable release. :)
> >>>>>>>>>>
> >>>>>>>>>> What's the feature which can be disabled to avoid data inconsistency?
> >>>>>>>>>>
> >>>>>>>>>> Andor
> >>>>>>>>>>
> >>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
> >>>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring ZOOKEEPER-2418 as
> >>>>>>>>>>> well? This exists in 3.4 as well and the feature can be disabled. We are
> >>>>>>>>>>> working on a longer term fix for it in 3.6.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> Jeelani
> >>>>>>>>>>>
> >>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Fine.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the list of:
> >>>>>>>>>>>
> >>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> >>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> >>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
> >>>>>>>>>>> followers)
> >>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> >>>>>>>>>>> leader and follower receiving external connection requests.)
> >>>>>>>>>>>
> >>>>>>>>>>> SSL (ZK-236) is a feature which essential for the 3.5 release, hence I
> >>>>>>>>>>> wouldn't leave it out or postpone it for the next stable release. PR has
> >>>>>>>>>>> been out for a long time, get on reviewing please.
> >>>>>>>>>>> The rest are also long outstanding issues which have been found in the 3.5
> >>>>>>>>>>> branch.
> >>>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, but never has
> >>>>>>>>>>> been fixed in 3.5.
> >>>>>>>>>>> Quite a serious issue if still present.
> >>>>>>>>>>>
> >>>>>>>>>>> I think we should at least run some manual testing and see if we could
> >>>>>>>>>>> repro any of these issues before going ahead with a stable release.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Andor
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I haven't went through the entire list, but looks like lots of the JIRA
> >>>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also affects
> >>>>>>>>>>>> 3.4 releases. Should we scope these issues out?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5
> >>>>>>>>>>>> release is the reconfig feature and security concerns around it (somehow
> >>>>>>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases were created
> >>>>>>>>>>>> to stabilize that feature.
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
> >>>>>>>>>>>> SSL-release-date-tt7581744.html
> >>>>>>>>>>>>
> >>>>>>>>>>>> So it looks like we are in good shape to release.
> >>>>>>>>>>>> Something might worth
> >>>>>>>>>>>> doing to claim the quality of 3.5 is on par with 3.4
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record
> >>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
> >>>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky tests at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks Maoling! That would be huge help, I appreciate it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Andor