And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency
which could caused by on disk txn sync, as I mentioned in a newer JIRA
ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the
snap sync or txn sync could also leave txns gap in the txn file, which is a
more common case could trigger this issue.

I would suggest to turn off the on disk txn sync by default for now to
avoid this issue, after we finished ZOOKEEPER-3114, we can use that to
validate the on disk txns during syncing.

Thanks,
Fangmin

On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:

> Andor,
>
> ZOOKEEPER-3114 is about adding real time digest checking to help detecting
> inconsistency, it's a new feature with amounts of code change. I'll start
> upstream it part by part, but I don't expect it's being merged in the next
> few weeks. So yes, it's a nice to have, but definitely not a block for 3.5.
>
> Thanks,
> Fangmin
>
> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
>
>> Fangmin,
>>
>> Sorry, I just noticed that you want to include the consistency fixes in
>> the stable version which is fine. Let’s finish the backports and we’ll be
>> done with them.
>>
>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 with
>> that. What do you think?
>>
>> Andor
>>
>>
>>
>> > On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
>> >
>> > Cool, thanks for the clarification.
>> >
>> > The updated list is as follows:
>> >
>> > - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>> > - ZOOKEEPER-1818 (Fix don't care for trunk)
>> > - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
>> leader and follower receiving external connection requests.)
>> >
>> > The following are not critical and no blockers for the stable release:
>> >
>> > Waiting for to be ported to 3.5:
>> > - ZOOKEEPER-3104
>> > - ZOOKEEPER-3125
>> > - ZOOKEEPER-3127
>> >
>> > New feature:
>> > - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>> >
>> > Regards,
>> > Andor
>> >
>> >
>> >
>> >> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
>> >>
>> >> Hi Andor,
>> >>
>> >> That's the on disk txn feature, which was disabled internally after we
>> >> found the potentially inconsistent issue. The only solution we have
>> for now
>> >> is waiting for the new digest checking feature I mentioned in
>> >> ZOOKEEPER-3114.
>> >>
>> >> I think there are some other critical consistent issues we just fixed
>> on
>> >> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127, I
>> think we
>> >> should include that in the official 3.5 release as well.
>> >>
>> >> Thanks,
>> >> Fangmin
>> >>
>> >> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org>
>> wrote:
>> >>
>> >>> Hi Jeelani,
>> >>>
>> >>>
>> >>> Thanks for letting me know. I'm happy to remove it from the list to
>> get
>> >>> closer to a stable release. :)
>> >>>
>> >>> What's the feature which can be disabled to avoid data inconsistency?
>> >>>
>> >>>
>> >>> Andor
>> >>>
>> >>>
>> >>>
>> >>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>> >>>> Thanks Andor for compiling this. Should we be ignoring
>> ZOOKEEPER-2418 as
>> >>> well? This exists in 3.4 as well and the feature can be disabled. We
>> are
>> >>> working on a longer term fix for it in 3.6.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Jeelani
>> >>>>
>> >>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID>
>> wrote:
>> >>>>
>> >>>>   Fine.
>> >>>>
>> >>>>   I'm happy to ignore 1549, 2846 and 2930. Still we have the list of:
>> >>>>
>> >>>>   - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>> >>>>   - ZOOKEEPER-1818 (Fix don't care for trunk)
>> >>>>   - ZOOKEEPER-2418 (txnlog diff sync can skip sending some
>> >>> transactions to
>> >>>>   followers)
>> >>>>   - ZOOKEEPER-2778 (Potential server deadlock between follower sync
>> >>> with
>> >>>>   leader and follower receiving external connection requests.)
>> >>>>
>> >>>>   SSL (ZK-236) is a feature which essential for the 3.5 release,
>> hence
>> >>> I
>> >>>>   wouldn't leave it out or postpone it for the next stable release.
>> PR
>> >>> has
>> >>>>   been out for a long time, get on reviewing please.
>> >>>>   The rest are also long outstanding issues which have been found in
>> >>> the 3.5
>> >>>>   branch.
>> >>>>   ZK-1818 is something which was found in 3.4 and fixed in 3.4, but
>> >>> never has
>> >>>>   been fixed in 3.5. Quite a serious issue if still present.
>> >>>>
>> >>>>   I think we should at least run some manual testing and see if we
>> >>> could
>> >>>>   repro any of these issues before going ahead with a stable release.
>> >>>>
>> >>>>   Regards,
>> >>>>   Andor
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>   On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org>
>> wrote:
>> >>>>
>> >>>>> I haven't went through the entire list, but looks like lots of the
>> >>> JIRA
>> >>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also
>> >>> affects
>> >>>>> 3.4 releases. Should we scope these issues out?
>> >>>>>
>> >>>>> I think historically the single outstanding blocking issue for a
>> >>> stable 3.5
>> >>>>> release is the reconfig feature and security concerns around it
>> >>> (somehow
>> >>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases were
>> >>> created
>> >>>>> to stabilize that feature.
>> >>>>>
>> >>>>>
>> >>>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
>> >>>>> SSL-release-date-tt7581744.html
>> >>>>>
>> >>>>> So it looks like we are in good shape to release. Something might
>> >>> worth
>> >>>>> doing to claim the quality of 3.5 is on par with 3.4
>> >>>>>
>> >>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record
>> >>>>>
>> >>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
>> >>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky tests at
>> >>> all.
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar
>> >>> <an...@cloudera.com.invalid>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Thanks Maoling! That would be huge help, I appreciate it.
>> >>>>>>
>> >>>>>> Andor
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>>
>>

Reply via email to