Hi,

Thanks for your suggestion, Anthony. I gave it a try and the rolling downgrade worked, because peers are prepared to exchange messages in different versions by means of the toDataPre_GEODE_X_X_X_X() / fromDataPre_GEODE_X_X_X_X() mechanism.
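
To illustrate the mechanism mentioned above, here is a minimal, self-contained sketch of how a message can keep a "pre-version" pair of methods next to its current ones. It is not Geode code: the class VersionedMessageSketch, the constants V1 and V2, the toDataPre_V2()/fromDataPre_V2() names and the two fields are all made up for the example, and the sender simply picks the format that the remote peer understands.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a versioned p2p message; not a Geode class.
class VersionedMessageSketch {
    static final int V1 = 1; // assumed old wire version
    static final int V2 = 2; // assumed new wire version, adds timeoutMs

    String regionName;
    int timeoutMs = -1; // -1 means "not sent by the peer"

    // Current (V2) wire format: name followed by the new field.
    void toData(DataOutputStream out) throws IOException {
        out.writeUTF(regionName);
        out.writeInt(timeoutMs);
    }

    void fromData(DataInputStream in) throws IOException {
        regionName = in.readUTF();
        timeoutMs = in.readInt();
    }

    // Format used before V2: the new field is neither written nor read.
    void toDataPre_V2(DataOutputStream out) throws IOException {
        out.writeUTF(regionName);
    }

    void fromDataPre_V2(DataInputStream in) throws IOException {
        regionName = in.readUTF();
        timeoutMs = -1;
    }

    // Serialize in the format that a peer running peerVersion understands.
    static byte[] serialize(VersionedMessageSketch msg, int peerVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (peerVersion < V2) {
                msg.toDataPre_V2(out);
            } else {
                msg.toData(out);
            }
        }
        return bytes.toByteArray();
    }

    // Deserialize bytes that were written in the format of peerVersion.
    static VersionedMessageSketch deserialize(byte[] data, int peerVersion) throws IOException {
        VersionedMessageSketch msg = new VersionedMessageSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (peerVersion < V2) {
                msg.fromDataPre_V2(in);
            } else {
                msg.fromData(in);
            }
        }
        return msg;
    }

    public static void main(String[] args) throws IOException {
        VersionedMessageSketch msg = new VersionedMessageSketch();
        msg.regionName = "orders";
        msg.timeoutMs = 500;
        // A new member talking to an old peer falls back to the old format.
        VersionedMessageSketch seenByOldPeer = deserialize(serialize(msg, V1), V1);
        System.out.println(seenByOldPeer.regionName + " " + seenByOldPeer.timeoutMs);
    }
}
```

The point of the sketch is that this only covers messages exchanged at run time, where both sides know each other's version. It says nothing about files that a newer member has already written to disk.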
After all my investigation on this topic, I have come to the conclusion that the main obstacle to rolling downgrades is the compatibility of persistent files. If the persistent file format does not change between versions (not likely to happen), then the rolling downgrade should be straightforward. But, given that this cannot be guaranteed, supporting rolling downgrades would require (among other things) a tool to convert files from the new version to the old one because, in general, older members will not be able to read files written by newer members. Newer members are normally prepared to read files from older versions, as noted in the reference you sent about Geode backward compatibility, but not the other way around (as they would have to know the future :-)).

In my tests, I ran into a problem when trying to perform a rolling downgrade from version 1.11 to 1.10 because the format of the locators' view files had changed due to GEODE-7090. Nevertheless, I managed to do a rolling downgrade from version 1.10 to 1.9 because there were no changes to the format of persistent files.

If anybody could share any other insight on this subject, it would be appreciated.

Best regards,

/Alberto G.

On 9/10/19 21:50, Anthony Baker wrote:

Hi Alberto!

Another experiment that might be useful to try is changing a p2p message following [1]. If you follow the steps in the wiki, a rolling upgrade should work OK. But if you then try to do a rolling downgrade, what happens?

Anthony

[1] https://cwiki.apache.org/confluence/display/GEODE/Managing+Backward+Compatibility

On Sep 26, 2019, at 9:37 AM, Alberto Gomez <alberto.go...@est.tech> wrote:

Hi again,

I have been investigating a bit more the possibility of supporting "rolling downgrades" in Geode, similar to rolling upgrades, and I would like to share my findings and also ask for some help.
My tests were done upgrading from Geode 1.10 to a recent version in the develop branch and rolling back (downgrading) to 1.10. I was using one locator and two servers. I am sure my findings would have been different if I had used other Geode versions or another configuration.

By making some changes in the code, I managed to roll back the servers, but I got into trouble when starting the old locator. The changes I made were the following:

- I removed the equality check between the local and remote versions of Geode in ConnectCommand::connect() so that gfsh was allowed to connect to a Geode of a newer or older version.
- I started the locators and servers with the gemfire.allow_old_members_to_join_for_testing property to allow old members to join a newer Geode system.
- I changed the Version::fromOrdinal method to return CURRENT instead of throwing an exception when the ordinal passed corresponds to an unsupported version. I had to make this change so that old servers could progress when reading oplogs generated by newer servers.

After downgrading the servers successfully, I stopped the new locator, started the old one (with the old gfsh) and got an exception in the locator when reading from the view file:

The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /home/alberto/geode/geode-releases/apache-geode-1.0.0/locator1 for full details.

Exception in thread "main" org.apache.geode.InternalGemFireException: Unable to recover previous membership view from /home/alberto/geode/geode-releases/apache-geode-1.10.0/locator1/locator10334view.dat
    at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:492)
    ...
Caused by: java.io.StreamCorruptedException: invalid type code: 02
    at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2871)
    ...
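
For reference, the Version::fromOrdinal change listed above amounts to something like the following sketch. This is not the actual Geode Version class: the enum VersionSketch, its ordinal values and the IllegalArgumentException are made up for illustration, and only the strict versus lenient behaviour matters.

```java
// Hypothetical miniature of a version registry. The names and ordinal values
// are invented for illustration; this is not Geode's actual Version class.
enum VersionSketch {
    V1_9(100),
    V1_10(105);

    // The newest version this (old) member knows about.
    static final VersionSketch CURRENT = V1_10;

    private final int ordinalOnDisk;

    VersionSketch(int ordinalOnDisk) {
        this.ordinalOnDisk = ordinalOnDisk;
    }

    // Original behaviour as described: an unknown ordinal is an error.
    static VersionSketch fromOrdinalStrict(int ordinal) {
        for (VersionSketch v : values()) {
            if (v.ordinalOnDisk == ordinal) {
                return v;
            }
        }
        throw new IllegalArgumentException("Unsupported version ordinal: " + ordinal);
    }

    // Modified behaviour used in the test: an unknown (newer) ordinal is
    // treated as CURRENT so that reading can continue.
    static VersionSketch fromOrdinalLenient(int ordinal) {
        for (VersionSketch v : values()) {
            if (v.ordinalOnDisk == ordinal) {
                return v;
            }
        }
        return CURRENT;
    }

    public static void main(String[] args) {
        // 110 stands for a version written by a newer member.
        System.out.println(fromOrdinalLenient(110));
    }
}
```

With the lenient variant, the old member reads data stamped with a newer version as if it were in its own format, so it can only work as long as the format has not actually changed between the two versions.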

I think the problem is in the deserialization, because the format of the locator's view file changed between the two Geode versions with GEODE-7090. This leads me to think that I might have been successful in the "rolling downgrade" if I had selected other versions of Geode, or that I might have run into a different set of problems.

After this research, I would like to get some feedback from the community on the following questions:

- Would it be reasonable to restrict future changes in Geode between minor versions so that the rolling downgrade is supported? This would imply that changes such as the one made in GEODE-7090 would not be allowed in a minor version change.
- Could the changes in code and configuration I made in my tests to support the "rolling downgrade" have any negative side effects that should dissuade us from using them?
- Are there any other things I have not taken into account that would require changes in order to support rolling downgrades?
- Is it even feasible to implement "rolling downgrades" of Geode with some restrictions, or are there always possible incompatibilities between versions that make it impossible or unreasonably hard to support this kind of feature?

Thanks in advance for your help,

-Alberto G.

On 23/9/19 17:04, Alberto Gomez wrote:

Hi Anthony,

That's an option but, as you say, the cost in infrastructure is high and there are also other problems to solve, such as how to do the switch between systems and how to ensure data consistency between them.

I was thinking that in many cases it might be possible to support a rolling downgrade similar to the rolling upgrade, given that the rolling upgrade already allows the coexistence of old and new members in a cluster.

-Alberto

On 23/9/19 15:55, Anthony Baker wrote:

Have you considered using a blue / green deployment approach? It provides more flexibility for these scenarios, though the infrastructure cost is high.

Anthony

On Sep 23, 2019, at 5:59 AM, Alberto Gomez <alberto.go...@est.tech> wrote:

Hi,

Looking at the Geode documentation, I have not found any reference to rolling back a Geode upgrade. Running some tests, I have observed that once a Geode system has been upgraded to a later version, it is not possible to roll back the upgrade, even if no data modifications have been made after the upgrade.

The system protects itself in several places:

- gfsh does not allow you to connect to a newer version of Geode.
- The oplog files store the version of the server, which prevents an older server from starting from a file written by a newer server.
- The cluster does not allow older members to join a cluster with newer members.

There are probably other protections I did not hit.

Even if you tamper with some of those protections, you can run into trouble due to compatibility issues. I ran into one when I lifted the requirement to have the same gfsh version, using versions 1.8 and 1.10, because it seems some configuration is exchanged in JSON and its format has changed between those two versions.

My question is whether it has ever been considered to support rollback of Geode upgrades (preferably in rolling mode), at least between systems under the same major version. In our experience, customers often require the rollback of upgrades.

Thanks in advance for your help,

-Alberto G.