If Helix components have not actually started using ttl, I believe it is doable (although risky) to build Helix with a newer ZK lib version and connect to older ZK servers. Otherwise, if ttl is already used, then I don't think there is a way to support older versions without creating a parallel branch.
My feeling is that Helix internally does not need ttl for now (correct me if I am wrong). In this case, we can keep the older ZK version as the default, but release a separate zookeeper-lib built against the new ZK version for customers who need it.

Best Regards,
Jiajun

On Mon, Jul 18, 2022 at 4:27 PM Junkai Xue <[email protected]> wrote:

> Thanks Brent for raising this concern! Previously, we were not aware of
> this issue of ZK level backward incompatibility.
>
> I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
> to make it a hotfix. But I am not sure whether we can do a release for that
> as long as there is no build number version in Apache Helix.
>
> I added the dev list to see whether there are any other suggestions for
> this scenario.
>
> Best,
>
> Junkai
>
> On Mon, Jul 18, 2022 at 3:34 PM Brent <[email protected]> wrote:
>
> > Hey Helix folks,
> >
> > We ran into a fun issue recently. Between the time that Apache Helix
> > v1.0.3 was released on April 14 and v1.0.4 was released recently on
> > June 9, it looks like a backward-incompatible change may have been
> > introduced on June 3rd that makes Helix v1.0.4 not work correctly on
> > Zookeeper 3.4.x clusters.
> >
> > I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020
> > (https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> > that certainly factors in, but it's what our organizational team is
> > supporting. So unfortunately we're stuck between a rock and a hard
> > place at the moment:
> > - We can't go back to v1.0.2 because it lacks the Log4j fixes
> > - We can't use v1.0.3 due to the corruption issue
> > - We can't move ahead to v1.0.4 due to the compatibility issue with
> >   Zookeeper
> > I have a fork we were previously using
> > (https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> > but that's not a long-term solution either.
> >
> > The issue is a bit subtle. From v1.0.2 to v1.0.3, the
> > org.apache.zookeeper version requirement in the helix/zookeeper-api was
> > bumped from 3.4.13 to 3.5.9:
> > - v1.0.2:
> >   https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> > - v1.0.3:
> >   https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> > So that, in and of itself, was not breaking.
> >
> > And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> > PR (https://github.com/apache/helix/pull/2138/files) that relied
> > specifically on that 3.5.x Zookeeper version. For example, the "import
> > org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> > "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> > in that PR introduces a backward-incompatible change.
> >
> > So the net result is that, unfortunately, there has been a drift over the
> > past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper
> > 3.4.x clusters incompatible with Apache Helix.
> >
> > I wanted to post this here:
> >
> > 1. To see if you were all aware of it (since it may hit other customers
> >    as well and we were a bit blind-sided by it)
> > 2. To see if you had any ideas on how to work with/around this
> >
> > Our long-term plan will obviously be to get on newer Zookeeper clusters
> > as we can, but that's likely not going to be a quick turn-around for us.
> > In the short term we'll need to revert to our v1.0.2 fork.
> >
> > Does the team happen to have any other comments or suggestions on dealing
> > with this issue? Is this correctable at the project level (I suspect
> > that will be tough)?
> >
> > Thanks much!
> >
> > ~Brent
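
The compatibility concern in this thread boils down to a version gate: code paths that rely on 3.5.x behaviour should only run against a server that is new enough. Below is a minimal sketch of such a gate in plain Java. The class name `ZkVersionGate`, its methods, and the 3.5.0 threshold are hypothetical and chosen for illustration (the threshold is an assumption based on the thread's statement that the v1.0.4 changes rely on 3.5.x); none of this is part of Helix or ZooKeeper, and how the server version string is obtained is left to the caller.

```java
final class ZkVersionGate {
    // Assumption from the thread: the code added in Helix v1.0.4 needs a 3.5.x server.
    static final int[] MIN_VERSION = {3, 5, 0};

    // Parse "3.4.13" or "3.5.0-alpha" into {major, minor, patch}; missing parts default to 0.
    static int[] parse(String version) {
        String core = version.trim().split("-", 2)[0]; // drop suffixes such as "-alpha"
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // True when the given server version is at least MIN_VERSION.
    static boolean isNewEnough(String serverVersion) {
        int[] v = parse(serverVersion);
        for (int i = 0; i < MIN_VERSION.length; i++) {
            if (v[i] != MIN_VERSION[i]) {
                return v[i] > MIN_VERSION[i];
            }
        }
        return true; // exactly the minimum version
    }
}
```

A caller could check `ZkVersionGate.isNewEnough(serverVersion)` at startup and either fail fast with a clear message or fall back to the older code path, rather than discovering the mismatch at runtime.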
