Hi All,

I would add ZOOKEEPER-3021
<https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project
structure to Maven build as a blocker too. Since the migration has started
it would be good to finish before releasing ZK 3.5.x GA.

ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> replace
our forrest site and documentation generation might also be a good idea,
since then we could deliver the new MarkDown based documentation.

Regards, Tamaas

On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote:

> Oh, sorry for the confusion, I should provide more context.
>
> Leader will use on disk txn sync with followers to if the peer zxid is not
> in it's in memory commit logs, the code is here: Leader on disk txn sync
> <
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774
> >.
> There is bug that potentially there will be gap in the txn files, like
> after snap sync, etc, so it's possible the peer will miss txns due to this.
>
> The option to disable it is snapshotSizeFactor
> <
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81
> >,
> set it to -1 will disable this feature. On 3.5, it's better to have a PR to
> set this to -1 by default. It might have more SNAP sync, but from our prod
> it doesn't seem to be a big problem to me.
>
> I can send out the diff to disable it by default on 3.5 if you guys think
> this is the right way to do.
>
> Thanks,
> Fangmin
>
> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote:
>
> > What’s needed to turn it off?
> > Do we need a PR or it’s just a config option?
> > Shall we implement a feature switch for that and turn it off by default?
> >
> > Sorry I don’t have too much insight on disk txn sync.
> >
> > Andor
> >
> >
> >
> > > On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote:
> > >
> > > And to be clear, ZOOKEEPER-2418 is actually just one case of
> > inconsistency
> > > which could caused by on disk txn sync, as I mentioned in a newer JIRA
> > > ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>,
> > the
> > > snap sync or txn sync could also leave txns gap in the txn file, which
> > is a
> > > more common case could trigger this issue.
> > >
> > > I would suggest to turn off the on disk txn sync by default for now to
> > > avoid this issue, after we finished ZOOKEEPER-3114, we can use that to
> > > validate the on disk txns during syncing.
> > >
> > > Thanks,
> > > Fangmin
> > >
> > > On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com>
> wrote:
> > >
> > >> Andor,
> > >>
> > >> ZOOKEEPER-3114 is about adding real time digest checking to help
> > detecting
> > >> inconsistency, it's a new feature with amounts of code change. I'll
> > start
> > >> upstream it part by part, but I don't expect it's being merged in the
> > next
> > >> few weeks. So yes, it's a nice to have, but definitely not a block for
> > 3.5.
> > >>
> > >> Thanks,
> > >> Fangmin
> > >>
> > >> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org>
> wrote:
> > >>
> > >>> Fangmin,
> > >>>
> > >>> Sorry, I just noticed that you want to include the consistency fixes
> in
> > >>> the stable version which is fine. Let’s finish the backports and
> we’ll
> > be
> > >>> done with them.
> > >>>
> > >>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5
> with
> > >>> that. What do you think?
> > >>>
> > >>> Andor
> > >>>
> > >>>
> > >>>
> > >>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
> > >>>>
> > >>>> Cool, thanks for the clarification.
> > >>>>
> > >>>> The updated list is as follows:
> > >>>>
> > >>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> > >>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> > >>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync
> with
> > >>> leader and follower receiving external connection requests.)
> > >>>>
> > >>>> The following are not critical and no blockers for the stable
> release:
> > >>>>
> > >>>> Waiting for to be ported to 3.5:
> > >>>> - ZOOKEEPER-3104
> > >>>> - ZOOKEEPER-3125
> > >>>> - ZOOKEEPER-3127
> > >>>>
> > >>>> New feature:
> > >>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
> > >>>>
> > >>>> Regards,
> > >>>> Andor
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Andor,
> > >>>>>
> > >>>>> That's the on disk txn feature, which was disabled internally after
> > we
> > >>>>> found the potentially inconsistent issue. The only solution we have
> > >>> for now
> > >>>>> is waiting for the new digest checking feature I mentioned in
> > >>>>> ZOOKEEPER-3114.
> > >>>>>
> > >>>>> I think there are some other critical consistent issues we just
> fixed
> > >>> on
> > >>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127, I
> > >>> think we
> > >>>>> should include that in the official 3.5 release as well.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Fangmin
> > >>>>>
> > >>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org>
> > >>> wrote:
> > >>>>>
> > >>>>>> Hi Jeelani,
> > >>>>>>
> > >>>>>>
> > >>>>>> Thanks for letting me know. I'm happy to remove it from the list
> to
> > >>> get
> > >>>>>> closer to a stable release. :)
> > >>>>>>
> > >>>>>> What's the feature which can be disabled to avoid data
> > inconsistency?
> > >>>>>>
> > >>>>>>
> > >>>>>> Andor
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
> > >>>>>>> Thanks Andor for compiling this. Should we be ignoring
> > >>> ZOOKEEPER-2418 as
> > >>>>>> well? This exists in 3.4 as well and the feature can be disabled.
> We
> > >>> are
> > >>>>>> working on a longer term fix for it in 3.6.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>>
> > >>>>>>> Jeelani
> > >>>>>>>
> > >>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID
> >
> > >>> wrote:
> > >>>>>>>
> > >>>>>>>  Fine.
> > >>>>>>>
> > >>>>>>>  I'm happy to ignore 1549, 2846 and 2930. Still we have the list
> > of:
> > >>>>>>>
> > >>>>>>>  - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> > >>>>>>>  - ZOOKEEPER-1818 (Fix don't care for trunk)
> > >>>>>>>  - ZOOKEEPER-2418 (txnlog diff sync can skip sending some
> > >>>>>> transactions to
> > >>>>>>>  followers)
> > >>>>>>>  - ZOOKEEPER-2778 (Potential server deadlock between follower
> sync
> > >>>>>> with
> > >>>>>>>  leader and follower receiving external connection requests.)
> > >>>>>>>
> > >>>>>>>  SSL (ZK-236) is a feature which essential for the 3.5 release,
> > >>> hence
> > >>>>>> I
> > >>>>>>>  wouldn't leave it out or postpone it for the next stable
> release.
> > >>> PR
> > >>>>>> has
> > >>>>>>>  been out for a long time, get on reviewing please.
> > >>>>>>>  The rest are also long outstanding issues which have been found
> in
> > >>>>>> the 3.5
> > >>>>>>>  branch.
> > >>>>>>>  ZK-1818 is something which was found in 3.4 and fixed in 3.4,
> but
> > >>>>>> never has
> > >>>>>>>  been fixed in 3.5. Quite a serious issue if still present.
> > >>>>>>>
> > >>>>>>>  I think we should at least run some manual testing and see if we
> > >>>>>> could
> > >>>>>>>  repro any of these issues before going ahead with a stable
> > release.
> > >>>>>>>
> > >>>>>>>  Regards,
> > >>>>>>>  Andor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>  On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org>
> > >>> wrote:
> > >>>>>>>
> > >>>>>>>> I haven't went through the entire list, but looks like lots of
> the
> > >>>>>> JIRA
> > >>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also
> > >>>>>> affects
> > >>>>>>>> 3.4 releases. Should we scope these issues out?
> > >>>>>>>>
> > >>>>>>>> I think historically the single outstanding blocking issue for a
> > >>>>>> stable 3.5
> > >>>>>>>> release is the reconfig feature and security concerns around it
> > >>>>>> (somehow
> > >>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases
> were
> > >>>>>> created
> > >>>>>>>> to stabilize that feature.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
> > >>>>>>>> SSL-release-date-tt7581744.html
> > >>>>>>>>
> > >>>>>>>> So it looks like we are in good shape to release. Something
> might
> > >>>>>> worth
> > >>>>>>>> doing to claim the quality of 3.5 is on par with 3.4
> > >>>>>>>>
> > >>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record
> > >>>>>>>>
> > >>>>>>
> > >>>
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
> > >>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky tests
> at
> > >>>>>> all.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar
> > >>>>>> <an...@cloudera.com.invalid>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks Maoling! That would be huge help, I appreciate it.
> > >>>>>>>>>
> > >>>>>>>>> Andor
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>>
> >
> >

Reply via email to