If Helix components have not actually started using ttl, I believe it is
doable (although risky) to build Helix with a newer ZK lib version and
connect to older ZK servers. Otherwise, if ttl is already used, then I
don't think there is a way to support older versions without creating a
parallel branch.

My feeling is that Helix internally does not need ttl for now (correct me
if I am wrong). In this case, we can keep the older ZK version as the
default, but release a separate zookeeper-lib built against the new ZK
version for customers who need it.
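
To make the "newer lib, older server" idea a bit more concrete, here is a
rough sketch of a guard that only allows ttl usage when the server is new
enough. Everything in it is hypothetical: the class and method names are
made up for illustration and are not Helix or ZooKeeper APIs, the 3.5.3
threshold is my understanding of when ttl nodes first appeared in ZK, and
how the server version string is obtained is left to the caller.

```java
public final class ZkFeatureGate {
    // Assumption: TTL nodes are first available in ZooKeeper 3.5.3.
    private static final int[] TTL_MIN_VERSION = {3, 5, 3};

    private ZkFeatureGate() {
    }

    /** Parses strings such as "3.4.13" or "3.5.9-SNAPSHOT" into {major, minor, patch}. */
    public static int[] parseVersion(String version) {
        // Drop any qualifier after the first '-' before splitting on dots.
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] parsed = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            parsed[i] = Integer.parseInt(parts[i].trim());
        }
        return parsed;
    }

    /** Returns true if the given server version is at least TTL_MIN_VERSION. */
    public static boolean supportsTtl(String serverVersion) {
        int[] v = parseVersion(serverVersion);
        for (int i = 0; i < 3; i++) {
            if (v[i] != TTL_MIN_VERSION[i]) {
                return v[i] > TTL_MIN_VERSION[i];
            }
        }
        return true; // exactly the minimum version
    }

    public static void main(String[] args) {
        System.out.println("3.4.13 -> " + supportsTtl("3.4.13"));
        System.out.println("3.5.9  -> " + supportsTtl("3.5.9"));
    }
}
```

A caller would check supportsTtl(...) once per connection and fall back to
plain persistent nodes when it returns false. This only covers the version
check itself; it does not prove that the rest of the newer client library
is safe against an older server.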

Best Regards,
Jiajun


On Mon, Jul 18, 2022 at 4:27 PM Junkai Xue <[email protected]> wrote:

> Thanks Brent for raising this concern! Previously, we were not aware of
> this issue of ZK level backward incompatibility.
>
> I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
> to make it a hotfix. But I am not sure whether we can do a release for
> that, since Apache Helix versions do not include a build number.
>
> I added the dev list to see whether there are any other suggestions for
> this scenario.
>
> Best,
>
> Junkai
>
> On Mon, Jul 18, 2022 at 3:34 PM Brent <[email protected]> wrote:
>
> > Hey Helix folks,
> >
> > We ran into a fun issue recently.  Between the time that Apache Helix
> > v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it
> > looks like a backward-incompatible change may have been introduced on
> > June 3rd that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x
> > clusters.
> >
> > I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
> > https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> > that certainly factors in, but it's what our organizational team
> > is supporting.  So unfortunately we're stuck between a rock and a hard
> > place at the moment:
> > - We can't go back to v1.0.2 because it lacks the Log4j fixes
> > - We can't use v1.0.3 due to the corruption issue
> > - We can't move ahead to v1.0.4 due to the compatibility issue with
> > Zookeeper
> > I have a fork we were previously using (
> > https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> > but that's not a long-term solution either.
> >
> > The issue is a bit subtle.  From v1.0.2 to v1.0.3, the
> > org.apache.zookeeper version requirement in helix/zookeeper-api was
> > bumped from 3.4.13 to 3.5.9:
> > - v1.0.2:
> > https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> > - v1.0.3:
> > https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> > So that, in and of itself, was not breaking.
> >
> > And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> > PR (https://github.com/apache/helix/pull/2138/files) that relied
> > specifically on that 3.5.x Zookeeper version.  For example, the "import
> > org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> > "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> > in that PR introduces a backward-incompatible change.
> >
> > So the net result is that, unfortunately, there has been a drift over the
> > past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper
> > 3.4.x clusters incompatible with Apache Helix.
> >
> > I wanted to post this here:
> >
> > 1.  To see if you were all aware of it (since it may hit other customers
> > as well and we were a bit blind-sided by it)
> > 2.  To see if you had any ideas on how to work with/around this
> >
> > Our long-term plan will obviously be to get on newer Zookeeper clusters
> > as we can, but that's likely not going to be a quick turn-around for
> > us.  In the short-term we'll need to revert back to our v1.0.2 fork.
> >
> > Does the team happen to have any other comments or suggestions on dealing
> > with this issue?  Is this correctable at the project level (I suspect
> > that will be tough)?
> >
> > Thanks much!
> >
> > ~Brent
> >
>
