Hi team,
I'm proud to announce that, thanks to the joint effort of the community, the 3.5 blockers list has become empty:

project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5 AND priority in (blocker, critical) ORDER BY priority DESC, key ASC

Well... almost. All the blocker issues are gone, but we still have the Maven migration to complete before the stable release. If you have some free cycles, please join us in testing the Maven build on this PR:

https://github.com/apache/zookeeper/pull/708

I hope we can merge it pretty soon.

In terms of the builds, the weather on the 3.5 branch is quite sunny nowadays:

https://builds.apache.org/view/S-Z/view/ZooKeeper/

The Java 11 build is still having some difficulties, which I hope to address before the holidays:

https://issues.apache.org/jira/browse/ZOOKEEPER-3204

If you know of anything that is important from 3.5's perspective and missing from the above, please don't hesitate to share.

Happy ZooKeeping!
Andor

On 11/2/18 21:12, Fangmin Lv wrote:
> Andor,
>
> Here is the PR to port ZK-3104 from master to 3.4:
> https://github.com/apache/zookeeper/pull/685.
>
> Fangmin
>
> On Fri, Nov 2, 2018 at 11:46 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>
>> Hi Andor,
>>
>> Is anyone working on ZK-2778? I can pick it up if no one is working
>> on it yet.
>>
>> I'll open a 3.5 PR for ZK-3104 today.
>>
>> Fangmin
>>
>> On Fri, Oct 26, 2018 at 3:33 AM Andor Molnar <an...@apache.org> wrote:
>>
>>> Hi folks,
>>>
>>> You’ve probably noticed lots of update emails coming from Jira. Please
>>> be aware that we’ve updated a bunch of open blocker/critical 3.5 tickets to
>>> reflect what we discussed in this email.
>>>
>>> If you open up the following Jira filter:
>>>
>>> project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5
>>> AND priority in (blocker, critical) ORDER BY priority DESC, key ASC
>>>
>>> you’ll see the most up-to-date list of tickets that need to be addressed
>>> before the stable 3.5 release.
>>>
>>> Thank you for your efforts to get this done.
>>>
>>> Fangmin, ZK-3104 is waiting for a backport, but the ticket has already been
>>> resolved. Have you created a separate ticket for the backport, or shall I
>>> just reopen it with the right fix versions?
>>>
>>> Thanks,
>>> Andor
>>>
>>>> On 2018. Oct 8., at 12:34, Andor Molnar <an...@apache.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Let me summarize and give a quick update on the outstanding issues for 3.5 GA:
>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>> - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing)
>>>> - ZOOKEEPER-925 Docs generation to Maven
>>>> - ZOOKEEPER-3104 (waiting for backport)
>>>> - ZOOKEEPER-3125 (waiting for backport PR #647)
>>>>
>>>> The two Maven-related tickets are no-brainers, as are the backports.
>>>> ZK-2778 has been picked up by Maoling (thanks!) as far as I can see, so
>>>> ZK-1818 is the only one waiting for a volunteer.
>>>> Please correct me if I’ve missed something.
>>>>
>>>> Regards,
>>>> Andor
>>>>
>>>>> On 2018. Sep 28., at 18:32, Tamas Penzes <tam...@cloudera.com.INVALID> wrote:
>>>>> Hi All,
>>>>>
>>>>> I would add ZOOKEEPER-3021
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project
>>>>> structure to Maven build as a blocker too. Since the migration has
>>>>> started, it would be good to finish it before releasing ZK 3.5.x GA.
>>>>>
>>>>> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925>
>>>>> (replacing our Forrest site and documentation generation) might also be a
>>>>> good idea, since then we could deliver the new Markdown-based documentation.
>>>>>
>>>>> Regards, Tamaas
>>>>>
>>>>> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>> Oh, sorry for the confusion, I should have provided more context.
>>>>>>
>>>>>> The leader will use on-disk txn sync with followers if the peer zxid is not
>>>>>> in its in-memory committed log; the code is here: Leader on-disk txn sync
>>>>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774>.
>>>>>> There is a bug where there can be gaps in the txn files, for example
>>>>>> after a snap sync, so it's possible the peer will miss txns because of this.
>>>>>> The option to disable it is snapshotSizeFactor
>>>>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81>;
>>>>>> setting it to -1 disables this feature. On 3.5, it would be better to have a
>>>>>> PR that sets this to -1 by default. It might cause more SNAP syncs, but from
>>>>>> what we see in prod it doesn't seem to be a big problem.
>>>>>>
>>>>>> I can send out the diff to disable it by default on 3.5 if you think
>>>>>> this is the right way to go.
>>>>>>
>>>>>> Thanks,
>>>>>> Fangmin
>>>>>>
>>>>>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote:
>>>>>>> What’s needed to turn it off?
>>>>>>> Do we need a PR, or is it just a config option?
>>>>>>> Shall we implement a feature switch for it and turn it off by default?
>>>>>>> Sorry, I don’t have much insight into on-disk txn sync.
>>>>>>>
>>>>>>> Andor
>>>>>>>
>>>>>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency
>>>>>>>> that could be caused by on-disk txn sync. As I mentioned in a newer JIRA,
>>>>>>>> ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the
>>>>>>>> snap sync or txn sync could also leave txn gaps in the txn file, which is a
>>>>>>>> more common case that could trigger this issue.
>>>>>>>>
>>>>>>>> I would suggest turning off on-disk txn sync by default for now to
>>>>>>>> avoid this issue. After we finish ZOOKEEPER-3114, we can use it to
>>>>>>>> validate the on-disk txns during syncing.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Fangmin
>>>>>>>>
>>>>>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>>>>> Andor,
>>>>>>>>>
>>>>>>>>> ZOOKEEPER-3114 is about adding real-time digest checking to help detect
>>>>>>>>> inconsistency; it's a new feature with a large amount of code change. I'll
>>>>>>>>> start upstreaming it part by part, but I don't expect it to be merged in the
>>>>>>>>> next few weeks. So yes, it's a nice-to-have, but definitely not a blocker
>>>>>>>>> for 3.5.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Fangmin
>>>>>>>>>
>>>>>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
>>>>>>>>>> Fangmin,
>>>>>>>>>>
>>>>>>>>>> Sorry, I just noticed that you want to include the consistency fixes in
>>>>>>>>>> the stable version, which is fine. Let’s finish the backports and we’ll be
>>>>>>>>>> done with them.
>>>>>>>>>>
>>>>>>>>>> ZOOKEEPER-3114 is essentially a new feature; I wouldn’t block 3.5 on
>>>>>>>>>> it. What do you think?
>>>>>>>>>>
>>>>>>>>>> Andor
>>>>>>>>>>
>>>>>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
>>>>>>>>>>> Cool, thanks for the clarification.
>>>>>>>>>>>
>>>>>>>>>>> The updated list is as follows:
>>>>>>>>>>>
>>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>>>>>>>>
>>>>>>>>>>> The following are neither critical nor blockers for the stable release:
>>>>>>>>>>>
>>>>>>>>>>> Waiting to be ported to 3.5:
>>>>>>>>>>> - ZOOKEEPER-3104
>>>>>>>>>>> - ZOOKEEPER-3125
>>>>>>>>>>> - ZOOKEEPER-3127
>>>>>>>>>>>
>>>>>>>>>>> New feature:
>>>>>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Andor
>>>>>>>>>>>
>>>>>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>>>>>>>> Hi Andor,
>>>>>>>>>>>>
>>>>>>>>>>>> That's the on-disk txn feature, which we disabled internally after we
>>>>>>>>>>>> found the potential inconsistency issue. The only solution we have for now
>>>>>>>>>>>> is to wait for the new digest checking feature I mentioned in
>>>>>>>>>>>> ZOOKEEPER-3114.
>>>>>>>>>>>>
>>>>>>>>>>>> I think there are some other critical consistency issues we just fixed on
>>>>>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125 and ZOOKEEPER-3127. I think
>>>>>>>>>>>> we should include those in the official 3.5 release as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Fangmin
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> wrote:
>>>>>>>>>>>>> Hi Jeelani,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for letting me know. I'm happy to remove it from the list to get
>>>>>>>>>>>>> closer to a stable release. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> What's the feature that can be disabled to avoid data inconsistency?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>>>>>>>>>>>>>> Thanks, Andor, for compiling this. Should we be ignoring ZOOKEEPER-2418 as
>>>>>>>>>>>>>> well? It exists in 3.4 too, and the feature can be disabled. We are
>>>>>>>>>>>>>> working on a longer-term fix for it in 3.6.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeelani
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID> wrote:
>>>>>>>>>>>>>> Fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. We still have the following list:
>>>>>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to followers)
>>>>>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SSL (ZK-236) is a feature that is essential for the 3.5 release, hence I
>>>>>>>>>>>>>> wouldn't leave it out or postpone it to the next stable release. The PR has
>>>>>>>>>>>>>> been out for a long time, so please get on reviewing it.
>>>>>>>>>>>>>> The rest are also long-outstanding issues that have been found in the 3.5
>>>>>>>>>>>>>> branch.
>>>>>>>>>>>>>> ZK-1818 was found in 3.4 and fixed in 3.4, but has never been fixed
>>>>>>>>>>>>>> in 3.5. Quite a serious issue if still present.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we should at least run some manual testing and see if we could
>>>>>>>>>>>>>> repro any of these issues before going ahead with a stable release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> wrote:
>>>>>>>>>>>>>>> I haven't gone through the entire list, but it looks like lots of the JIRA
>>>>>>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549 and 2846, also affect
>>>>>>>>>>>>>>> 3.4 releases. Should we scope these issues out?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5
>>>>>>>>>>>>>>> release has been the reconfig feature and the security concerns around it
>>>>>>>>>>>>>>> (somewhat addressed in ZOOKEEPER-2014), and the alpha and beta releases
>>>>>>>>>>>>>>> were created to stabilize that feature.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So it looks like we are in good shape to release.
>>>>>>>>>>>>>>> Some things that might be worth doing to show that the quality of 3.5 is on
>>>>>>>>>>>>>>> par with 3.4:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * Run Jepsen on 3.5 (for the record, 3.4 passed the test):
>>>>>>>>>>>>>>>   https://aphyr.com/posts/291-jepsen-zookeeper
>>>>>>>>>>>>>>> * Fix all flaky tests on 3.5 (3.4 has few or no flaky tests at all).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks, Maoling! That would be a huge help; I appreciate it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andor
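Fangmin's explanation in this thread covers on-disk txn sync and why setting snapshotSizeFactor to -1 turns it off. The Java sketch below is a simplified model of the choice a leader makes for a lagging follower. It can replay from the in-memory committed log, replay from the on-disk txn log, or fall back to a full SNAP. This is a hypothetical illustration, not ZooKeeper's LearnerHandler/ZKDatabase code. The names TxnLogSyncSketch, SyncMode and chooseSync are made up, and TRUNC and the other sync cases are left out. The sketch also assumes that the txn-log budget is snapshotSizeFactor times the size of the latest snapshot, and that a negative factor disables the on-disk path. That is our reading of ZKDatabase, where the factor comes from the zookeeper.snapshotSizeFactor system property.

```java
/**
 * Hypothetical, simplified model of how a leader might pick a sync mode for a
 * follower. Not ZooKeeper's actual code; TRUNC and other cases are left out.
 */
public class TxnLogSyncSketch {

    /** Illustrative sync modes (names invented for this sketch). */
    public enum SyncMode { DIFF_FROM_MEMORY, DIFF_FROM_TXNLOG, SNAP }

    /**
     * @param peerZxid           last zxid the follower has
     * @param minCommittedLog    oldest zxid in the leader's in-memory committed log
     * @param maxCommittedLog    newest zxid in the in-memory committed log
     * @param txnLogBytesNeeded  bytes of on-disk txn log the leader would replay
     * @param snapshotBytes      size of the leader's latest snapshot file
     * @param snapshotSizeFactor assumed knob: a negative value disables on-disk txn sync
     */
    public static SyncMode chooseSync(long peerZxid,
                                      long minCommittedLog,
                                      long maxCommittedLog,
                                      long txnLogBytesNeeded,
                                      long snapshotBytes,
                                      double snapshotSizeFactor) {
        // Follower is still covered by the in-memory committed log: a plain DIFF works.
        if (peerZxid >= minCommittedLog && peerZxid <= maxCommittedLog) {
            return SyncMode.DIFF_FROM_MEMORY;
        }
        // A negative factor (e.g. -1) turns the on-disk txn sync path off entirely.
        if (snapshotSizeFactor < 0) {
            return SyncMode.SNAP;
        }
        // Replay the txn log only if it is no bigger than factor * snapshot size;
        // otherwise sending the snapshot is assumed to be cheaper.
        long sizeLimit = (long) (snapshotBytes * snapshotSizeFactor);
        return txnLogBytesNeeded <= sizeLimit ? SyncMode.DIFF_FROM_TXNLOG : SyncMode.SNAP;
    }

    public static void main(String[] args) {
        long snapshot = 100_000_000L; // hypothetical 100 MB snapshot
        long needed = 20_000_000L;    // hypothetical 20 MB of txn log to replay
        // Peer zxid 50 is older than the in-memory log range [100, 200].
        System.out.println(chooseSync(50, 100, 200, needed, snapshot, 0.33));
        System.out.println(chooseSync(50, 100, 200, needed, snapshot, -1));
    }
}
```

In main, 0.33 is only an assumed example value. With that factor, the 20 MB txn-log replay stays under the roughly 33 MB budget, so the follower is synced from disk. With -1, the same follower always gets a SNAP. That is the trade-off Fangmin describes: more SNAP syncs in exchange for never replaying a txn log that may contain gaps.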