And to be clear, ZOOKEEPER-2418 is just one case of inconsistency that can be caused by on-disk txn sync. As I mentioned in a newer JIRA, ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, snap sync or txn sync can also leave a gap in the txn file, which is a more common way to trigger this issue.
I would suggest turning off on-disk txn sync by default for now to avoid this issue. Once ZOOKEEPER-3114 is finished, we can use it to validate the on-disk txns during syncing.

Thanks,
Fangmin

On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:

> Andor,
>
> ZOOKEEPER-3114 is about adding real-time digest checking to help detect
> inconsistency. It's a new feature with a large amount of code change. I'll
> start upstreaming it part by part, but I don't expect it to be merged in the
> next few weeks. So yes, it's a nice-to-have, but definitely not a blocker
> for 3.5.
>
> Thanks,
> Fangmin
>
> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
>
>> Fangmin,
>>
>> Sorry, I just noticed that you want to include the consistency fixes in
>> the stable version, which is fine. Let’s finish the backports and we’ll be
>> done with them.
>>
>> ZOOKEEPER-3114 is essentially a new feature; I wouldn’t block 3.5 on
>> that. What do you think?
>>
>> Andor
>>
>> > On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
>> >
>> > Cool, thanks for the clarification.
>> >
>> > The updated list is as follows:
>> >
>> > - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>> > - ZOOKEEPER-1818 (Fix don't care for trunk)
>> > - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
>> >   leader and follower receiving external connection requests.)
>> >
>> > The following are not critical and not blockers for the stable release:
>> >
>> > Waiting to be ported to 3.5:
>> > - ZOOKEEPER-3104
>> > - ZOOKEEPER-3125
>> > - ZOOKEEPER-3127
>> >
>> > New feature:
>> > - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>> >
>> > Regards,
>> > Andor
>> >
>> >> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
>> >>
>> >> Hi Andor,
>> >>
>> >> That's the on-disk txn feature, which was disabled internally after we
>> >> found the potential inconsistency issue.
>> >> The only solution we have for now is waiting for the new digest
>> >> checking feature I mentioned in ZOOKEEPER-3114.
>> >>
>> >> I think there are some other critical consistency issues we just fixed
>> >> on master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127. I
>> >> think we should include those in the official 3.5 release as well.
>> >>
>> >> Thanks,
>> >> Fangmin
>> >>
>> >> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> wrote:
>> >>
>> >>> Hi Jeelani,
>> >>>
>> >>> Thanks for letting me know. I'm happy to remove it from the list to get
>> >>> closer to a stable release. :)
>> >>>
>> >>> What's the feature which can be disabled to avoid data inconsistency?
>> >>>
>> >>> Andor
>> >>>
>> >>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>> >>>> Thanks Andor for compiling this. Should we be ignoring ZOOKEEPER-2418 as
>> >>>> well? This exists in 3.4 as well and the feature can be disabled. We are
>> >>>> working on a longer term fix for it in 3.6.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Jeelani
>> >>>>
>> >>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID> wrote:
>> >>>>
>> >>>> Fine.
>> >>>>
>> >>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the list of:
>> >>>>
>> >>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>> >>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>> >>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
>> >>>>   followers)
>> >>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
>> >>>>   leader and follower receiving external connection requests.)
>> >>>>
>> >>>> SSL (ZK-236) is a feature which is essential for the 3.5 release, hence I
>> >>>> wouldn't leave it out or postpone it for the next stable release. The PR
>> >>>> has been out for a long time; please get on reviewing it.
>> >>>> The rest are also long-outstanding issues which have been found in the
>> >>>> 3.5 branch.
>> >>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, but has
>> >>>> never been fixed in 3.5. Quite a serious issue if still present.
>> >>>>
>> >>>> I think we should at least run some manual testing and see if we could
>> >>>> repro any of these issues before going ahead with a stable release.
>> >>>>
>> >>>> Regards,
>> >>>> Andor
>> >>>>
>> >>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> wrote:
>> >>>>
>> >>>>> I haven't gone through the entire list, but it looks like lots of the
>> >>>>> JIRA issues listed in this thread, such as ZOOKEEPER-1549, 2846, also
>> >>>>> affect 3.4 releases. Should we scope these issues out?
>> >>>>>
>> >>>>> I think historically the single outstanding blocking issue for a stable
>> >>>>> 3.5 release is the reconfig feature and security concerns around it
>> >>>>> (somehow addressed in ZOOKEEPER-2014), and the alpha and beta releases
>> >>>>> were created to stabilize that feature.
>> >>>>>
>> >>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
>> >>>>> SSL-release-date-tt7581744.html
>> >>>>>
>> >>>>> So it looks like we are in good shape to release.
>> >>>>> Something that might be worth doing to claim the quality of 3.5 is on
>> >>>>> par with 3.4:
>> >>>>>
>> >>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record
>> >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
>> >>>>> * Fix all flaky tests on 3.5 - 3.4 has few or no flaky tests at all.
>> >>>>>
>> >>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid> wrote:
>> >>>>>
>> >>>>>> Thanks Maoling! That would be huge help, I appreciate it.
>> >>>>>>
>> >>>>>> Andor