What’s needed to turn it off? Do we need a PR, or is it just a config option? Shall we implement a feature switch for it and turn it off by default?
Sorry, I don’t have much insight into on-disk txn sync.

Andor

> On 2018. Sep 13., at 9:16, Fangmin Lv wrote:
>
> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency that could be caused by on-disk txn sync. As I mentioned in a newer JIRA, ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, snap sync or txn sync could also leave a gap of txns in the txn file, which is a more common case that could trigger this issue.
>
> I would suggest turning off on-disk txn sync by default for now to avoid this issue. After we finish ZOOKEEPER-3114, we can use it to validate the on-disk txns during syncing.
>
> Thanks,
> Fangmin
>
> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv wrote:
>
>> Andor,
>>
>> ZOOKEEPER-3114 is about adding real-time digest checking to help detect inconsistency. It's a new feature with a large amount of code change. I'll start upstreaming it part by part, but I don't expect it to be merged in the next few weeks. So yes, it's nice to have, but definitely not a blocker for 3.5.
>>
>> Thanks,
>> Fangmin
>>
>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar wrote:
>>
>>> Fangmin,
>>>
>>> Sorry, I just noticed that you want to include the consistency fixes in the stable version, which is fine. Let’s finish the backports and we’ll be done with them.
>>>
>>> ZOOKEEPER-3114 is essentially a new feature; I wouldn’t block 3.5 with it. What do you think?
>>>
>>> Andor
>>>
>>>> On 2018. Sep 12., at 11:52, Andor Molnar wrote:
>>>>
>>>> Cool, thanks for the clarification.
>>>>
>>>> The updated list is as follows:
>>>>
>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>
>>>> The following are not critical and are not blockers for the stable release:
>>>>
>>>> Waiting to be ported to 3.5:
>>>> - ZOOKEEPER-3104
>>>> - ZOOKEEPER-3125
>>>> - ZOOKEEPER-3127
>>>>
>>>> New feature:
>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>>>>
>>>> Regards,
>>>> Andor
>>>>
>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv wrote:
>>>>>
>>>>> Hi Andor,
>>>>>
>>>>> That's the on-disk txn feature, which was disabled internally after we found the potential inconsistency issue. The only solution we have for now is waiting for the new digest checking feature I mentioned in ZOOKEEPER-3114.
>>>>>
>>>>> I think there are some other critical consistency issues we just fixed on master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127. I think we should include them in the official 3.5 release as well.
>>>>>
>>>>> Thanks,
>>>>> Fangmin
>>>>>
>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár wrote:
>>>>>
>>>>>> Hi Jeelani,
>>>>>>
>>>>>> Thanks for letting me know. I'm happy to remove it from the list to get closer to a stable release. :)
>>>>>>
>>>>>> What's the feature that can be disabled to avoid data inconsistency?
>>>>>>
>>>>>> Andor
>>>>>>
>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>>>>>>> Thanks, Andor, for compiling this. Should we be ignoring ZOOKEEPER-2418 as well? It exists in 3.4 too, and the feature can be disabled. We are working on a longer-term fix for it in 3.6.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Jeelani
>>>>>>>
>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" wrote:
>>>>>>>
>>>>>>>     Fine.
>>>>>>>
>>>>>>>     I'm happy to ignore 1549, 2846 and 2930.
>>>>>>>     We still have this list:
>>>>>>>
>>>>>>>     - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>     - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>     - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to followers)
>>>>>>>     - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>>>>
>>>>>>>     SSL (ZK-236) is a feature that is essential for the 3.5 release, so I wouldn't leave it out or postpone it to the next stable release. The PR has been out for a long time; please get on reviewing it.
>>>>>>>     The rest are also long-outstanding issues that have been found in the 3.5 branch.
>>>>>>>     ZK-1818 was found in 3.4 and fixed in 3.4, but has never been fixed in 3.5. That is quite a serious issue if it is still present.
>>>>>>>
>>>>>>>     I think we should at least run some manual testing and see if we can repro any of these issues before going ahead with a stable release.
>>>>>>>
>>>>>>>     Regards,
>>>>>>>     Andor
>>>>>>>
>>>>>>>     On Fri, Sep 7, 2018 at 3:24 AM, Michael Han wrote:
>>>>>>>
>>>>>>>> I haven't gone through the entire list, but it looks like many of the JIRA issues listed in this thread, such as ZOOKEEPER-1549 and 2846, also affect 3.4 releases. Should we scope these issues out?
>>>>>>>>
>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5 release is the reconfig feature and the security concerns around it (somewhat addressed in ZOOKEEPER-2014), and the alpha and beta releases were created to stabilize that feature.
>>>>>>>>
>>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html
>>>>>>>>
>>>>>>>> So it looks like we are in good shape to release. Some things that might be worth doing to show that the quality of 3.5 is on par with 3.4:
>>>>>>>>
>>>>>>>> * Run Jepsen on 3.5; 3.4 passed the test, for the record: https://aphyr.com/posts/291-jepsen-zookeeper
>>>>>>>> * Fix all flaky tests on 3.5; 3.4 has few, if any, flaky tests.
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar wrote:
>>>>>>>>
>>>>>>>>> Thanks Maoling! That would be a huge help, I appreciate it.
>>>>>>>>>
>>>>>>>>> Andor
