Andor, Here is the PR to port ZK-3104 from master to 3.4: https://github.com/apache/zookeeper/pull/685.
Fangmin On Fri, Nov 2, 2018 at 11:46 AM Fangmin Lv <lvfang...@gmail.com> wrote: > Hi Andor, > > Is anyone working on ZK-2778? I can pick it up if there is no one working > on it yet. > > I'll open a 3.5 PR for ZK-3104 today. > > Fangmin > > On Fri, Oct 26, 2018 at 3:33 AM Andor Molnar <an...@apache.org> wrote: > >> Hi folks, >> >> You’ve probably realised lots of update emails coming from Jira. Please >> be aware that we’ve updated a bunch of open blocker/critical 3.5 tickets to >> reflect to what we discussed in this email. >> >> If you open up the following jira filter: >> >> project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5 >> AND priority in (blocker, critical) ORDER BY priority DESC, key ASC >> >> You’ll see the most up-to-date list of tickets which need to be addressed >> before the stable 3.5 release. >> >> Thank you for your efforts to get this done. >> >> Fangmin, ZK-3104 is waiting for backport, but ticket has already been >> resolved. Have you created a separate ticket for the backport or shall I >> just reopen it with the right fix versions? >> >> Thanks, >> Andor >> >> >> >> > On 2018. Oct 8., at 12:34, Andor Molnar <an...@apache.org> wrote: >> > >> > Hi, >> > >> > Let me summarize and give a quick update on the outstanding issues for >> 3.5 GA: >> > >> > - ZOOKEEPER-1818 (Fix don't care for trunk) >> > - ZOOKEEPER-2778 (Potential server deadlock between follower sync with >> leader and follower receiving external connection requests.) >> > - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing) >> > - ZOOKEEPER-925 Docs generation to Maven >> > - ZOOKEEPER-3104 (waiting for backport) >> > - ZOOKEEPER-3125 (waiting for backport PR #647) >> > >> > The 2 Maven related tickets are no-brainers as well as the backports. >> ZK-2778 has been picked up by Maoling (thanks!) as far as I can see, >> ZK-1818 is the only one waiting for a volunteer. >> > >> > Please correct me if I’ve missed something. >> > >> > Regards, >> > Andor >> > >> > >> > >> > >> >> On 2018. Sep 28., at 18:32, Tamas Penzes <tam...@cloudera.com.INVALID> >> wrote: >> >> >> >> Hi All, >> >> >> >> I would add ZOOKEEPER-3021 >> >> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project >> >> structure to Maven build as a blocker too. Since the migration has >> started >> >> it would be good to finish before releasing ZK 3.5.x GA. >> >> >> >> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> >> replace >> >> our forrest site and documentation generation might also be a good >> idea, >> >> since then we could deliver the new MarkDown based documentation. >> >> >> >> Regards, Tamaas >> >> >> >> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> >> wrote: >> >> >> >>> Oh, sorry for the confusion, I should provide more context. >> >>> >> >>> Leader will use on disk txn sync with followers to if the peer zxid >> is not >> >>> in it's in memory commit logs, the code is here: Leader on disk txn >> sync >> >>> < >> >>> >> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774 >> >>>> . >> >>> There is bug that potentially there will be gap in the txn files, like >> >>> after snap sync, etc, so it's possible the peer will miss txns due to >> this. >> >>> >> >>> The option to disable it is snapshotSizeFactor >> >>> < >> >>> >> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81 >> >>>> , >> >>> set it to -1 will disable this feature. On 3.5, it's better to have a >> PR to >> >>> set this to -1 by default. It might have more SNAP sync, but from our >> prod >> >>> it doesn't seem to be a big problem to me. >> >>> >> >>> I can send out the diff to disable it by default on 3.5 if you guys >> think >> >>> this is the right way to do. >> >>> >> >>> Thanks, >> >>> Fangmin >> >>> >> >>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> >> wrote: >> >>> >> >>>> What’s needed to turn it off? >> >>>> Do we need a PR or it’s just a config option? >> >>>> Shall we implement a feature switch for that and turn it off by >> default? >> >>>> >> >>>> Sorry I don’t have too much insight on disk txn sync. >> >>>> >> >>>> Andor >> >>>> >> >>>> >> >>>> >> >>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote: >> >>>>> >> >>>>> And to be clear, ZOOKEEPER-2418 is actually just one case of >> >>>> inconsistency >> >>>>> which could caused by on disk txn sync, as I mentioned in a newer >> JIRA >> >>>>> ZOOKEEPER-2846 < >> https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, >> >>>> the >> >>>>> snap sync or txn sync could also leave txns gap in the txn file, >> which >> >>>> is a >> >>>>> more common case could trigger this issue. >> >>>>> >> >>>>> I would suggest to turn off the on disk txn sync by default for now >> to >> >>>>> avoid this issue, after we finished ZOOKEEPER-3114, we can use that >> to >> >>>>> validate the on disk txns during syncing. >> >>>>> >> >>>>> Thanks, >> >>>>> Fangmin >> >>>>> >> >>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> >> >>> wrote: >> >>>>> >> >>>>>> Andor, >> >>>>>> >> >>>>>> ZOOKEEPER-3114 is about adding real time digest checking to help >> >>>> detecting >> >>>>>> inconsistency, it's a new feature with amounts of code change. I'll >> >>>> start >> >>>>>> upstream it part by part, but I don't expect it's being merged in >> the >> >>>> next >> >>>>>> few weeks. So yes, it's a nice to have, but definitely not a block >> for >> >>>> 3.5. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Fangmin >> >>>>>> >> >>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> >> >>> wrote: >> >>>>>> >> >>>>>>> Fangmin, >> >>>>>>> >> >>>>>>> Sorry, I just noticed that you want to include the consistency >> fixes >> >>> in >> >>>>>>> the stable version which is fine. Let’s finish the backports and >> >>> we’ll >> >>>> be >> >>>>>>> done with them. >> >>>>>>> >> >>>>>>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 >> >>> with >> >>>>>>> that. What do you think? >> >>>>>>> >> >>>>>>> Andor >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> >> wrote: >> >>>>>>>> >> >>>>>>>> Cool, thanks for the clarification. >> >>>>>>>> >> >>>>>>>> The updated list is as follows: >> >>>>>>>> >> >>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol) >> >>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk) >> >>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync >> >>> with >> >>>>>>> leader and follower receiving external connection requests.) >> >>>>>>>> >> >>>>>>>> The following are not critical and no blockers for the stable >> >>> release: >> >>>>>>>> >> >>>>>>>> Waiting for to be ported to 3.5: >> >>>>>>>> - ZOOKEEPER-3104 >> >>>>>>>> - ZOOKEEPER-3125 >> >>>>>>>> - ZOOKEEPER-3127 >> >>>>>>>> >> >>>>>>>> New feature: >> >>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too) >> >>>>>>>> >> >>>>>>>> Regards, >> >>>>>>>> Andor >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> >> wrote: >> >>>>>>>>> >> >>>>>>>>> Hi Andor, >> >>>>>>>>> >> >>>>>>>>> That's the on disk txn feature, which was disabled internally >> after >> >>>> we >> >>>>>>>>> found the potentially inconsistent issue. The only solution we >> have >> >>>>>>> for now >> >>>>>>>>> is waiting for the new digest checking feature I mentioned in >> >>>>>>>>> ZOOKEEPER-3114. >> >>>>>>>>> >> >>>>>>>>> I think there are some other critical consistent issues we just >> >>> fixed >> >>>>>>> on >> >>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, >> ZOOKEEPER-3127, I >> >>>>>>> think we >> >>>>>>>>> should include that in the official 3.5 release as well. >> >>>>>>>>> >> >>>>>>>>> Thanks, >> >>>>>>>>> Fangmin >> >>>>>>>>> >> >>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org >> > >> >>>>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> Hi Jeelani, >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Thanks for letting me know. I'm happy to remove it from the >> list >> >>> to >> >>>>>>> get >> >>>>>>>>>> closer to a stable release. :) >> >>>>>>>>>> >> >>>>>>>>>> What's the feature which can be disabled to avoid data >> >>>> inconsistency? >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Andor >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote: >> >>>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring >> >>>>>>> ZOOKEEPER-2418 as >> >>>>>>>>>> well? This exists in 3.4 as well and the feature can be >> disabled. >> >>> We >> >>>>>>> are >> >>>>>>>>>> working on a longer term fix for it in 3.6. >> >>>>>>>>>>> >> >>>>>>>>>>> Regards, >> >>>>>>>>>>> >> >>>>>>>>>>> Jeelani >> >>>>>>>>>>> >> >>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" >> <an...@cloudera.com.INVALID >> >>>> >> >>>>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>> Fine. >> >>>>>>>>>>> >> >>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the >> list >> >>>> of: >> >>>>>>>>>>> >> >>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast >> protocol) >> >>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk) >> >>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some >> >>>>>>>>>> transactions to >> >>>>>>>>>>> followers) >> >>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower >> >>> sync >> >>>>>>>>>> with >> >>>>>>>>>>> leader and follower receiving external connection requests.) >> >>>>>>>>>>> >> >>>>>>>>>>> SSL (ZK-236) is a feature which essential for the 3.5 release, >> >>>>>>> hence >> >>>>>>>>>> I >> >>>>>>>>>>> wouldn't leave it out or postpone it for the next stable >> >>> release. >> >>>>>>> PR >> >>>>>>>>>> has >> >>>>>>>>>>> been out for a long time, get on reviewing please. >> >>>>>>>>>>> The rest are also long outstanding issues which have been >> found >> >>> in >> >>>>>>>>>> the 3.5 >> >>>>>>>>>>> branch. >> >>>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, >> >>> but >> >>>>>>>>>> never has >> >>>>>>>>>>> been fixed in 3.5. Quite a serious issue if still present. >> >>>>>>>>>>> >> >>>>>>>>>>> I think we should at least run some manual testing and see if >> we >> >>>>>>>>>> could >> >>>>>>>>>>> repro any of these issues before going ahead with a stable >> >>>> release. >> >>>>>>>>>>> >> >>>>>>>>>>> Regards, >> >>>>>>>>>>> Andor >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> >> >>>>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>>> I haven't went through the entire list, but looks like lots >> of >> >>> the >> >>>>>>>>>> JIRA >> >>>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, >> also >> >>>>>>>>>> affects >> >>>>>>>>>>>> 3.4 releases. Should we scope these issues out? >> >>>>>>>>>>>> >> >>>>>>>>>>>> I think historically the single outstanding blocking issue >> for a >> >>>>>>>>>> stable 3.5 >> >>>>>>>>>>>> release is the reconfig feature and security concerns around >> it >> >>>>>>>>>> (somehow >> >>>>>>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases >> >>> were >> >>>>>>>>>> created >> >>>>>>>>>>>> to stabilize that feature. >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>> >> >>>> >> >>> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e= >> >>>>>>>>>>>> SSL-release-date-tt7581744.html >> >>>>>>>>>>>> >> >>>>>>>>>>>> So it looks like we are in good shape to release. Something >> >>> might >> >>>>>>>>>> worth >> >>>>>>>>>>>> doing to claim the quality of 3.5 is on par with 3.4 >> >>>>>>>>>>>> >> >>>>>>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record >> >>>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>> >> >>>> >> >>> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e= >> >>>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky >> tests >> >>> at >> >>>>>>>>>> all. >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar >> >>>>>>>>>> <an...@cloudera.com.invalid> >> >>>>>>>>>>>> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> Thanks Maoling! That would be huge help, I appreciate it. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Andor >> >>>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>>>> >> >>>> >> >>>> >> > >> >>