... and I always thought replication would always be used within the same server.
Recently we added a test on replication versioning (compatibility test). I will see what I can do with the versioning. On Fri, Jul 15, 2022 at 11:43 AM Robbie Gemmell <robbie.gemm...@gmail.com> wrote: > > Perhaps, I didnt go looking at the year old commits to see the > relative sequence of when it changed. The problem being raised wasnt > that the particular PR didnt change the version though (albeit the > version either already had, or subsequently did change, which I was > simply noting in case it wasnt already clear to Jan). Instead its that > it changed that packet contents without adding a new packet version, > and its being said that the old server cant handle the new data now > being sent in the old packet, and also that the new server cant handle > the absence of the new data that the old server obviously doesnt know > about to send it. > > Which or both of those is true I dont know. I do recall other similar > cases before of suggesting not sending new fields to old servers, and > being told it shouldnt matter as theyd simply not use it, though > personally I argued it still should never be sent to them as then it > definitely cant cause any change in behaviour. > > On Fri, 15 Jul 2022 at 16:00, Clebert Suconic <clebert.suco...@gmail.com> > wrote: > > > > as far as I know that PR did not make a switch in the protocol version > > because there was already another change in there for the same > > version... right? > > > > On Fri, Jul 15, 2022 at 6:07 AM Robbie Gemmell <robbie.gemm...@gmail.com> > > wrote: > > > > > > This isnt an area I know about but what I vaguely recalled/can see is > > > that there was coincidentally a wire version bump in 2.18.0 as part of > > > other changes, see the ARTEMIS_2_18_0_VERSION constant in PacketImpl. > > > From that I would guess it should be possible for newer servers to > > > specifically tell whether they are connected to <=2.17.0 or >= > > > 2.18.0. Perhaps the new one could then handle the situation in some > > > way if the issue can be fixed from the new side only, by changing what > > > it sends and expects in the existing packet? > > > > > > If it can be handled that way, I doubt there would be appetite for > > > releasing fixes across all the superceded intermediate versions rather > > > than just the latest. It doesnt appear to be widely hit so far in > > > nearly a year, people using only any versions >=2.18.0 wont be > > > affected, and anyone not yet affected could become so should use a > > > more recent fixed release (or else can patch the old superceded > > > intermediate release with the fix themselves). > > > > > > On Fri, 15 Jul 2022 at 09:57, Jan Šmucr <jan.sm...@aimtecglobal.com> > > > wrote: > > > > > > > > Dear devs, > > > > > > > > I'd like to ask you for help with the communication incompatibility > > > > between pre-2.18.0 servers and the newer ones. What I've learned so far > > > > is that in 2.18.0 there's been a change in the > > > > REPLICATION_START_FINISH_SYNC packet, yet no new version of that packet > > > > has been introduced. There have been some additional data appended to > > > > that packet, so that newer servers expect older servers to send more > > > > data than they actually do, and older servers can't cope with the > > > > additional data they receive. The fact that until now nobody noticed > > > > that replication between pre-2.18.0 and post-2.18.0 does not work > > > > confuses me a little. > > > > > > > > Before learning the actual reason of the incompatibility, I have > > > > developed a test which would eventually pass after the issue has been > > > > fixed. But now I see that fixing it would mean releasing a set of at > > > > least five minor bugfix releases. Shall I even attempt? If not, will > > > > you accept at least the test suite so that nothing like that happens in > > > > the future? Also mentioning the incompatibility somewhere might help > > > > others as unfortunate as me. > > > > > > > > The WIP PR is here: https://github.com/apache/activemq-artemis/pull/4144 > > > > [https://opengraph.githubassets.com/1fef362275960b2364da60ecddb76ca361b56b67aca157a2a2d25e3145d32d99/apache/activemq-artemis/pull/4144]<https://github.com/apache/activemq-artemis/pull/4144> > > > > ARTEMIS-3767 Fix replication incompatibility between pre 2.18.0 and > > > > SNAPSHOT (WIP) by jsmucr · Pull Request #4144 · > > > > apache/activemq-artemis<https://github.com/apache/activemq-artemis/pull/4144> > > > > This PR attempts to solve the issue described in > > > > https://issues.apache.org/jira/browse/ARTEMIS-3767. TL;DR replication > > > > between =<2.17.0 and newer Artemis versions is broken since 2.18.0. > > > > github.com > > > > > > > > Thanks for your suggestions. > > > > > > > > Jan > > > > > > > > -- > > Clebert Suconic -- Clebert Suconic