Re: Downgradability

C. Scott Andreas Tue, 21 Feb 2023 08:17:02 -0800
I realize my feedback on this has been spread across tickets and older mailing list / wiki discussions, so I'll offer a proposal here.

Starting with goals -

1. Cassandra users must be able to abort and revert an upgrade to a new version of the database that introduces a new major SSTable format.

This reduces risk of upgrading to a build that also introduces a non-data-format-related bug that is intolerable. This goal does not specify a mechanism or downgrade target - just the "downgradability" goal.

2. Where possible, Cassandra users should be able to opt into writing of a new major SSTable format.

This reduces that risk further by allowing users to decouple data format changes from the upgrade itself. There may be cases where new features or bug fixes prevent this from being possible, but I'll offer it as a goal.

3. It should be possible for users to perform the downgrade in-place by launching the database using a previous version's binary.

This avoids the need for complex orchestration of offline commands like a hypothetical `downgradesstables`.

The following approach would allow us to accomplish these goals:

1. Major SSTable changes should begin with forward-compatibility in a prior release.

In a release prior to one that revs major SSTable versions, we should implement the ability to read the SSTables that we intend to write in the next major version. This would allow someone to (e.g.) revert from 5.0 to 4.2 if they encountered a regression that caused an outage without data loss. This downgrade path should be well-specified and called out in NEWS.txt.

2. Where possible, major SSTable format changes should be opt-in (if the features / bugfixes introduced allow).

This would be via a flag to enable writing the new format once an operator has determined that post-upgrade their clusters are sufficiently stable. This is an approach that HDFS has adopted. Following a rolling upgrade of HDFS, downgrade remains possible until an operator executes a "finalize" operation to migrate NameNode metadata to the new version's. An approach like this would allow users to perform a staged upgrade in which they first rev the version of the database, followed by opting into its new format to derisk (e.g.) adoption of BTI-indexed SSTables.

These approaches aren't meant to discourage SSTable format evolution - but to make it safer, and ideally faster. They don't specify duplicative serialization or a game of Twister to hide fields in locations where old versions don't think to look. Forward compatibility in a prior release could be landed at the same time as the major format revision itself, so long as we cut releases from both branches.

Ability to back out an upgrade until finalized would dramatically lower the risk of adopting new releases of Apache Cassandra. For many users, the qualification cycle for a new release is more than a year - and a *lot* of work.

Reducing the risk of upgrading to new releases repositions Cassandra as a database that can be treated with greater trust -- especially for multi-petabyte, mission critical systems. Our user community will advance to newer releases more quickly and we'll be able to shorten the maintenance cycles for older releases. In the same way that CI stability enables us to move faster and more confidently in the project, safety features like this will enable our users (and indeed ourselves) to move more confidently to adopt them.

– Scott

On
Feb 21, 2023, at 4:51 AM, "Claude Warren, Jr via dev" <dev@cassandra.apache.org> wrote:

My goal in implementing CASSANDRA-8928 was to be able to take the current version 4.x and write it as the earliest 3.x version possible. The reasoning being that if that was possible then whatever 3.x version was executed would be able to automatically read the early 3.x version. My thought was that each release version would have the ability to downgrade to the earliest previous version. In this way, if users need to, they could string together a number of downgrader versions to move from 5.x to 3.x.

My testing has been pretty straightforward: I created 4 docker containers using the standard published Cassandra docker images for 3.1 and 4.0 with data mounted on an external drive. Two of the containers (one of each version) did not automatically start Cassandra. My process was then:

1. start and stop Cassandra 4.0 to create the default data files
2. start the Cassandra 4.0 container that does not automatically run Cassandra and execute the new downgrade functionality.
3. start the Cassandra 3.1 container and dump the logs.

If the system started then I knew that I at least had a proof of concept. So far no-go.

On
Tue, Feb 21, 2023 at 8:57 AM Branimir Lambov <branimir.lam...@datastax.com> wrote:

It appears to me that the first thing we need to start this feature off is a definition of a suite of tests together with a set of rules to keep the suite up to date with new features as they are introduced. The moment that suite is in place, we can start having some confidence that we can enforce downgradability.

Something like this will definitely catch incompatibilities in SSTable formats (such as the one in CASSANDRA-17698 that I managed to miss during review), but will also be able to identify incompatible system schema changes among others, and at the same time rightfully ignore non-breaking changes such as modifications to the key cache serialization formats.

Is downgradability in scope for 5.0? It is a feature like any other, and I don't see any difficulty adding it (with support for downgrade to 4.x) a little later in the 5.x timeline.

Regards,
Branimir

On Tue, Feb 21, 2023 at 9:40 AM Jacek
Lewandowski <lewandowski.ja...@gmail.com> wrote:

I'd like to mention CASSANDRA-17056 (CEP-17) here as it aims to introduce support for multiple sstable formats. It allows for providing an implementation of SSTableFormat along with SSTableReader and SSTableWriter. That could be extended easily to support different implementations for certain version ranges, like one impl for ma-nz, another for oa+, etc., without having a confusing implementation with a lot of conditional blocks. Old formats in such a case could be maintained separately from the main code and easily switched at any time.

thanks

- - -- --- ----- -------- -------------
Jacek Lewandowski

On Tue, 21 Feb 2023 at 01:46, Yuki Morishita <yu...@apache.org> wrote:

Hi,

What I wanted to address in my comment in
CASSANDRA-8110 (https://issues.apache.org/jira/browse/CASSANDRA-8110?focusedCommentId=17641705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17641705) is to focus on a better upgrade experience.

Upgrading the cluster can be painful for some orgs with mission critical Cassandra clusters, where they cannot tolerate reduced availability because of the inability to replace a downed node. They also need to plan for rolling back to the previous state when something happens along the way.

The change I proposed in CASSANDRA-8110 is to achieve the goal of at least enabling SSTable streaming during the upgrade by not upgrading the SSTable version. This can make it easy for the cluster to roll back to the previous version.

Downgrading SSTables is not the primary focus (though Cassandra needs to implement a way to write SSTables in older versions, so it is somewhat related).

I'm preparing the design doc for the change.

Also, if I should create a separate ticket from CASSANDRA-8110 for the clarity of the goal of the change, please let me know.

On Tue, Feb 21, 2023 at 5:31 AM Benedict <bened...@apache.org> wrote:

FWIW I think 8110 is the right approach, even if it isn’t a panacea. We will
have to eventually also tackle system schema changes (probably not hard), and may have to think a little carefully about other things, e.g. with TTLs the format change is only the contract about what values can be present, so we have to make sure the data validity checks are consistent with the format we write. It isn’t as simple as writing an earlier version in this case (unless we permit truncating the TTL, perhaps).

On 20 Feb 2023, at 20:24, Benedict <bened...@apache.org> wrote:

In a self-organising community, things that aren’t self-policed naturally end up policed in an ad hoc manner, and with difficulty. I’m not sure that’s the same as arbitrary enforcement. It seems to me the real issue is nobody noticed this was agreed and/or forgot and didn’t think about it much. But, even without any prior agreement, it’s perfectly reasonable to request that things do not break compatibility if they do not need to, as part of the normal patch integration process.

Issues with 3.1->4.0 aren’t particularly relevant as they predate any agreement to do this. But we can and should address the problem of new columns in schema tables, as this happens often between versions. I’m not sure it has in 4.1 though?

Regarding downgrade versions, surely this should simply be the same as upgrade versions we support?

On 20 Feb 2023, at 20:02, Jeff Jirsa <jji...@gmail.com> wrote:

I'm not even convinced
8110 addresses this - just writing sstables in old versions won't help if we ever add things like new types or new types of collections without other control abilities.

Claude's other email in another thread a few hours ago talks about some of these surprises - "Specifically during the 3.1 -> 4.0 changes a column broadcast_port was added to system/local. This means that 3.1 system can not read the table as it has no definition for it. I tried marking the column for deletion in the metadata and in the serialization header. The latter got past the column not found problem, but I suspect that it just means that data columns after broadcast_port shifted and so incorrectly read." - this is a harder problem to solve than just versioning sstables and network protocols.

Stepping back a bit, we have downgrade ability listed as a goal, but it's not (as far as I can tell) universally enforced, nor is it clear at which point we will be able to concretely say "this release can be downgraded to X". Until we actually define and agree that this is a real goal with a concrete version where downgrade-ability becomes real, it feels like things are somewhat arbitrarily enforced, which is probably very frustrating for people trying to commit work/tickets.

- Jeff

On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi <djo...@apache.org> wrote:

I’m a big fan of maintaining backward compatibility.
Downgradability implies that we could potentially roll back an upgrade at any time. While I don’t think we need to retain the ability to downgrade in perpetuity, it would be a good objective to maintain strict backward compatibility and therefore downgradability until a certain point. This would imply versioning metadata and extending it in such a way that prior version(s) could continue functioning. This can certainly be expensive to implement and might bloat on-disk storage. However, we could always offer an option for the operator to optimize the on-disk structures for the current version, at which point we can rewrite them in the latest version. This optimizes the storage and opens up new functionality. This means new features that can work with old on-disk structures will be available, while others that strictly require new versions of the data structures will be unavailable until the operator migrates to the new version. This migration IMO should be irreversible. Beyond this point the operator will lose the ability to downgrade, which is ok.

Dinesh

On Feb 20, 2023, at 10:40 AM, Jake Luciani <jak...@gmail.com> wrote:

There has been progress on
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928

Which is similar to what DataStax does for DSE. Would this be an acceptable solution?

Jake

On Mon, Feb 20, 2023 at 11:17 AM guo Maxwell <cclive1...@gmail.com> wrote:

It seems “An alternative solution is to implement/complete CASSANDRA-8110” can give us more options if it is finished😉

On Mon, Feb 20, 2023 at 11:03 PM, Branimir Lambov <blam...@apache.org> wrote:

Hi everyone,

There has been a discussion lately about changes to the sstable format in the context of being able to abort a cluster upgrade, and the fact
that changes to sstables can prevent downgraded nodes from reading any data written during their temporary operation with the new version.

Most of the discussion is in CASSANDRA-18134, and is spreading into CASSANDRA-14277 and CASSANDRA-17698, none of which is a good place to discuss the topic seriously.

Downgradability is a worthy goal and is listed in the current roadmap. I would like to open a discussion here on how it would be achieved.

My understanding of what has been suggested so far translates to:
- avoid changes to sstable formats;
- if there are changes, implement them in a way that is backwards-compatible, e.g. by duplicating data, so that a new version is presented in a component or portion of a component that legacy nodes will not try to read;
- if the latter is not feasible, make sure the changes are only applied if a feature flag has been enabled.

To me this approach introduces several risks:
- it bloats file and parsing complexity;
- it discourages improvement (e.g. CASSANDRA-17698 is no longer a LHF ticket once this requirement is in place);
- it needs care to avoid risky solutions to address technical issues with the format versioning (e.g. staying on n-versions for 5.0 and needing a bump for a 4.1 bugfix might require porting over support for new features);
- it requires separate and uncoordinated solutions to the problem and switching mechanisms for each individual change.

An alternative solution is to implement/complete CASSANDRA-8110, which provides a method of writing sstables for a target version. During upgrades, a node could be set to produce sstables corresponding to the older version, and there is a very straightforward way to implement modifications to formats like the tickets above to conform to its requirements.

What do people think should be the way forward?

Regards,
Branimir

--
you are the apple of my eye !

--
http://twitter.com/tjake

--
Branimir Lambov
e. branimir.lam...@datastax.com
w. www.datastax.com
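
For readers following the thread, the "write sstables for a target version until the operator opts in" idea (CASSANDRA-8110 combined with the HDFS-style "finalize" step proposed above) can be sketched roughly as follows. This is a minimal illustration only, not Cassandra code: `FormatPolicy`, `finalize`, and `can_downgrade` are hypothetical names, and the version strings `ma`, `nb`, and `oa` are used purely as example labels.

```python
from dataclasses import dataclass


@dataclass
class FormatPolicy:
    """Hypothetical per-node policy for choosing which sstable version to write."""
    readable: tuple          # versions this binary is able to read
    latest: str              # newest version this binary is able to write
    target: str              # older version written until the operator finalizes
    finalized: bool = False

    def write_version(self) -> str:
        # Before finalization, keep writing the older target version so that
        # a previous binary can still read everything on disk.
        return self.latest if self.finalized else self.target

    def finalize(self) -> None:
        # One-way switch: after this, new sstables use the latest format.
        self.finalized = True

    def can_read(self, version: str) -> bool:
        return version in self.readable


def can_downgrade(on_disk_versions, old_binary: FormatPolicy) -> bool:
    """In-place downgrade is only safe if the old binary reads every sstable."""
    return all(old_binary.can_read(v) for v in on_disk_versions)


# Illustrative usage: an "old" binary and a "new" binary that still targets
# the old format until the operator finalizes.
old = FormatPolicy(readable=("ma", "nb"), latest="nb", target="nb")
new = FormatPolicy(readable=("ma", "nb", "oa"), latest="oa", target="nb")

on_disk = ["nb", new.write_version()]
downgrade_ok_before = can_downgrade(on_disk, old)

new.finalize()
on_disk.append(new.write_version())
downgrade_ok_after = can_downgrade(on_disk, old)
```

In this sketch the downgrade check passes until `finalize()` is called and a new-format sstable is written. It only models sstable versions; the system schema changes raised earlier in the thread (such as the `broadcast_port` column) would need separate handling.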