Hi All,

I would add ZOOKEEPER-3021 <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> (Migrate project structure to Maven build) as a blocker too. Since the migration has already started, it would be good to finish it before releasing ZK 3.5.x GA.

ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> (replace our forrest site and documentation generation) might also be a good idea, since then we could deliver the new Markdown-based documentation.

Regards,
Tamaas

On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote:

> Oh, sorry for the confusion, I should have provided more context.
>
> The leader uses on-disk txn sync with a follower if the peer's zxid is not in its in-memory committed log. The code is here: Leader on disk txn sync <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774>.
> There is a bug: there can be gaps in the txn files (for example after a snap sync), so the peer may miss txns because of this.
>
> The option to disable it is snapshotSizeFactor <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81>; setting it to -1 disables this feature. On 3.5, it's better to have a PR that sets this to -1 by default. It might cause more SNAP syncs, but judging from our prod it doesn't seem to be a big problem.
>
> I can send out the diff to disable it by default on 3.5 if you think this is the right way to go.
>
> Thanks,
> Fangmin
>
> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote:
>
>> What’s needed to turn it off?
>> Do we need a PR or is it just a config option?
>> Shall we implement a feature switch for that and turn it off by default?
>>
>> Sorry, I don’t have much insight into on-disk txn sync.
>>
>> Andor
>>
>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>
>>> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency that could be caused by on-disk txn sync. As I mentioned in a newer JIRA, ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the snap sync or txn sync could also leave txn gaps in the txn file, which is a more common case that could trigger this issue.
>>>
>>> I would suggest turning off the on-disk txn sync by default for now to avoid this issue. After we finish ZOOKEEPER-3114, we can use that to validate the on-disk txns during syncing.
>>>
>>> Thanks,
>>> Fangmin
>>>
>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>>>
>>>> Andor,
>>>>
>>>> ZOOKEEPER-3114 is about adding real-time digest checking to help detect inconsistency; it's a new feature with a large amount of code change. I'll start upstreaming it part by part, but I don't expect it to be merged in the next few weeks. So yes, it's a nice-to-have, but definitely not a blocker for 3.5.
>>>>
>>>> Thanks,
>>>> Fangmin
>>>>
>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
>>>>
>>>>> Fangmin,
>>>>>
>>>>> Sorry, I just noticed that you want to include the consistency fixes in the stable version, which is fine. Let’s finish the backports and we’ll be done with them.
>>>>>
>>>>> ZOOKEEPER-3114 is essentially a new feature; I wouldn’t block 3.5 with that. What do you think?
>>>>>
>>>>> Andor
>>>>>
>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
>>>>>>
>>>>>> Cool, thanks for the clarification.
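
The sync decision Fangmin describes can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual LearnerHandler code. It assumes the following behaviour:

- The leader first tries a DIFF from its in-memory committed log.
- It falls back to a DIFF replayed from the on-disk txn log only when snapshotSizeFactor is non-negative and the txn range to send is no larger than snapshotSizeFactor times the latest snapshot's size.
- In every other case it sends a SNAP.

The property name `zookeeper.snapshotSizeFactor` and the 0.33 default reflect my reading of ZKDatabase at the time and should be checked against the linked source. The class, enum and method names are illustrative only.

```java
/**
 * Hypothetical, simplified model of how a leader could choose a sync mode for a
 * follower. NOT the real LearnerHandler code: names and the exact ordering of
 * checks are assumptions for illustration. TRUNC (follower ahead of the leader)
 * is left out; such a follower simply ends up with SNAP in this sketch.
 */
public class SyncModeSketch {

    public enum SyncMode { DIFF_FROM_MEMORY, DIFF_FROM_TXNLOG, SNAP }

    // Assumed property name and default, based on ZKDatabase around 3.5.
    public static final String SNAPSHOT_SIZE_FACTOR = "zookeeper.snapshotSizeFactor";
    public static final double DEFAULT_SNAPSHOT_SIZE_FACTOR = 0.33;

    /** Reads the factor; a negative value (e.g. -1) means on-disk txn sync is disabled. */
    public static double snapshotSizeFactor() {
        String raw = System.getProperty(SNAPSHOT_SIZE_FACTOR);
        if (raw == null) {
            return DEFAULT_SNAPSHOT_SIZE_FACTOR;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_SNAPSHOT_SIZE_FACTOR; // fall back on unparsable input
        }
    }

    /**
     * @param peerLastZxid      last zxid the follower reports
     * @param minCommittedLog   oldest zxid in the leader's in-memory committed log
     * @param maxCommittedLog   newest zxid in the leader's in-memory committed log
     * @param factor            snapshotSizeFactor; negative disables on-disk txn sync
     * @param snapshotBytes     size of the leader's latest snapshot on disk
     * @param txnLogBytesToSend size of the txn log range the follower would need
     */
    public static SyncMode choose(long peerLastZxid, long minCommittedLog, long maxCommittedLog,
                                  double factor, long snapshotBytes, long txnLogBytesToSend) {
        // 1. Follower's zxid is covered by the in-memory committed log: DIFF from memory.
        if (peerLastZxid >= minCommittedLog && peerLastZxid <= maxCommittedLog) {
            return SyncMode.DIFF_FROM_MEMORY;
        }
        // 2. Follower is older than the in-memory window: replay txns from disk, but only
        //    if the feature is enabled and the replay is no larger than factor * snapshot size.
        boolean txnLogSyncEnabled = factor >= 0;
        long sizeLimit = (long) (snapshotBytes * factor);
        if (txnLogSyncEnabled && peerLastZxid < minCommittedLog && txnLogBytesToSend <= sizeLimit) {
            return SyncMode.DIFF_FROM_TXNLOG;
        }
        // 3. Anything else falls back to sending a full snapshot.
        return SyncMode.SNAP;
    }

    public static void main(String[] args) {
        // Follower is behind the in-memory window; 10 MB of txns vs a 100 MB snapshot.
        System.out.println(choose(50, 100, 200, 0.33, 100_000_000L, 10_000_000L)); // DIFF_FROM_TXNLOG
        // Same follower with the feature switched off (factor = -1).
        System.out.println(choose(50, 100, 200, -1, 100_000_000L, 10_000_000L));   // SNAP
    }
}
```

This assumes the factor is read from that system property. If so, turning the feature off on a server would amount to starting its JVM with `-Dzookeeper.snapshotSizeFactor=-1`. The trade-off matches the one mentioned above: followers outside the in-memory window then always get a SNAP.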
>>>>>>
>>>>>> The updated list is as follows:
>>>>>>
>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>>>
>>>>>> The following are not critical and are not blockers for the stable release:
>>>>>>
>>>>>> Waiting to be ported to 3.5:
>>>>>> - ZOOKEEPER-3104
>>>>>> - ZOOKEEPER-3125
>>>>>> - ZOOKEEPER-3127
>>>>>>
>>>>>> New feature:
>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>>>>>>
>>>>>> Regards,
>>>>>> Andor
>>>>>>
>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Andor,
>>>>>>>
>>>>>>> That's the on-disk txn feature, which was disabled internally after we found the potential inconsistency issue. The only solution we have for now is waiting for the new digest checking feature I mentioned in ZOOKEEPER-3114.
>>>>>>>
>>>>>>> I think there are some other critical consistency issues we just fixed on master recently: ZOOKEEPER-3104, ZOOKEEPER-3125 and ZOOKEEPER-3127. I think we should include those in the official 3.5 release as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Fangmin
>>>>>>>
>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Jeelani,
>>>>>>>>
>>>>>>>> Thanks for letting me know. I'm happy to remove it from the list to get closer to a stable release. :)
>>>>>>>>
>>>>>>>> What's the feature which can be disabled to avoid data inconsistency?
>>>>>>>>
>>>>>>>> Andor
>>>>>>>>
>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring ZOOKEEPER-2418 as well? It exists in 3.4 too, and the feature can be disabled. We are working on a longer-term fix for it in 3.6.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Jeelani
>>>>>>>>>
>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID> wrote:
>>>>>>>>>
>>>>>>>>>> Fine.
>>>>>>>>>>
>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. We still have the following list:
>>>>>>>>>>
>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to followers)
>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.)
>>>>>>>>>>
>>>>>>>>>> SSL (ZK-236) is a feature which is essential for the 3.5 release, hence I wouldn't leave it out or postpone it to the next stable release. The PR has been out for a long time; please get on reviewing it.
>>>>>>>>>> The rest are also long-outstanding issues which have been found in the 3.5 branch.
>>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, but has never been fixed in 3.5. Quite a serious issue if still present.
>>>>>>>>>>
>>>>>>>>>> I think we should at least run some manual testing and see if we can repro any of these issues before going ahead with a stable release.
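
As discussed in this thread, ZOOKEEPER-2418 and ZOOKEEPER-2846 both come down to a follower missing txns because of gaps in the txn log. One cheap way to look for such gaps while trying to repro them is to check that zxids read back from a log are contiguous.

A zxid carries the leader epoch in its high 32 bits and a per-epoch counter in its low 32 bits. Within one epoch, consecutive logged txns should therefore normally differ by exactly 1. The sketch below is a hypothetical helper, not ZooKeeper code. It takes the zxids as a plain array, so reading actual log files is out of scope. It also accepts any move to a higher epoch as legitimate, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of ZooKeeper: scans zxids in log order (for
 * example as read back from a txn log) and reports suspicious jumps.
 */
public class ZxidGapCheck {

    /** High 32 bits of a zxid: the leader epoch. */
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits of a zxid: the counter within that epoch. */
    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Builds a zxid from an epoch and a counter. */
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /**
     * Within one epoch the counter must grow by exactly 1. A move to a higher
     * epoch is accepted without further checks (a simplification); any zxid that
     * does not increase is reported as out of order.
     */
    public static List<String> findGaps(long[] zxids) {
        List<String> problems = new ArrayList<>();
        for (int i = 1; i < zxids.length; i++) {
            long prev = zxids[i - 1];
            long cur = zxids[i];
            if (cur <= prev) {
                problems.add(String.format("out of order: 0x%x after 0x%x", cur, prev));
            } else if (epochOf(cur) == epochOf(prev) && counterOf(cur) != counterOf(prev) + 1) {
                problems.add(String.format("gap: 0x%x followed by 0x%x", prev, cur));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        long[] log = {
            makeZxid(1, 1), makeZxid(1, 2), makeZxid(1, 5), // counters 3 and 4 are missing
            makeZxid(2, 1), makeZxid(2, 2)                  // new epoch: accepted
        };
        for (String problem : findGaps(log)) {
            System.out.println(problem); // prints: gap: 0x100000002 followed by 0x100000005
        }
    }
}
```

A real check would also need to handle txn log file boundaries and the snapshot zxid a follower starts from. Those are exactly the places ZOOKEEPER-2846 points at, and they are not modelled here.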
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Andor
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> I haven't gone through the entire list, but it looks like many of the JIRA issues listed in this thread, such as ZOOKEEPER-1549 and 2846, also affect the 3.4 releases. Should we scope these issues out?
>>>>>>>>>>>
>>>>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5 release has been the reconfig feature and the security concerns around it (somewhat addressed in ZOOKEEPER-2014), and the alpha and beta releases were created to stabilize that feature.
>>>>>>>>>>>
>>>>>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html
>>>>>>>>>>>
>>>>>>>>>>> So it looks like we are in good shape to release. Some things that might be worth doing to claim that the quality of 3.5 is on par with 3.4:
>>>>>>>>>>>
>>>>>>>>>>> * Run Jepsen on 3.5 - for the record, 3.4 passed the test: https://aphyr.com/posts/291-jepsen-zookeeper
>>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has few or no flaky tests at all.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Maoling! That would be a huge help, I appreciate it.
>>>>>>>>>>>>
>>>>>>>>>>>> Andor