[ https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187616#comment-17187616 ]
Sylvain Lebresne edited comment on CASSANDRA-16063 at 8/31/20, 10:04 AM:
-------------------------------------------------------------------------

Ok, I hadn't understood the strategy you described, and in particular why skipping errors during commit replay was part of this.

To rephrase, in case that helps other people as dense as me, the strategy implemented by the current patches is that if a user starts 4.0 with some compact tables, the server will not start but may (will?) write some CL segments. The user is then asked to restart on 3.x with a special startup flag that makes the server ignore any errors it hits while replaying those 4.0 CL segments.

But as mentioned in my previous comment, this is not an ideal experience (it also makes it harder to convince oneself that it is safe). Ideally, users should not have to pass special flags when restarting 3.x.

So back to my question above, what is technically preventing us from doing that startup check before any CL segment is written?


was (Author: slebresne):
Ok, I hadn't understood the strategy you described, and in particular was skipping errors during commit replay was part of this.

To rephrase, in case that helps other people as dense as me, the strategy implemented by the current patches is that if a users starts 4.0 with some compact tables, the server will not start but may (will?) write some CL segments. The user is then asked to restart on 3.x with a special startup flag that makes it so that if replaying those 4.0 CL segments fails, the server ignores those errors.

But as mentioned in my previous comment, this is not an ideal experience (it also makes it harder to convince oneself that it is safe). Ideally, users should not have to pass special flags when restarting 3.x.

So back to my question above, what is technically preventing us to do that startup check before any CL segment is written?
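
To make the ordering question in the comment concrete, here is a minimal, self-contained Java sketch of a startup sequence that runs read-only checks before any write happens. It is not Cassandra code: `StartupOrderSketch`, `PreflightCheck`, `NodeState`, and `NO_COMPACT_TABLES` are hypothetical names invented for illustration, and the list of strings stands in for commit log segments and flushed sstables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Cassandra code: every name here is invented for illustration. */
public class StartupOrderSketch {

    /** Thrown when a pre-flight check refuses to let the node start. */
    static class StartupException extends Exception {
        StartupException(String message) { super(message); }
    }

    /** A check that may only read node state, never write it. */
    interface PreflightCheck {
        void execute(NodeState state) throws StartupException;
    }

    /** Stand-in for on-disk state: compact tables found, and writes performed so far. */
    static class NodeState {
        final List<String> compactTables;
        final List<String> writes = new ArrayList<>();

        NodeState(List<String> compactTables) { this.compactTables = compactTables; }
    }

    /** Refuses startup if any compact table is still present. */
    static final PreflightCheck NO_COMPACT_TABLES = state -> {
        if (!state.compactTables.isEmpty())
            throw new StartupException("Compact tables found: " + state.compactTables
                    + ". Downgrade to 3.x and run ALTER ... DROP COMPACT STORAGE first.");
    };

    /** Runs every check first; the write steps are only reached if all checks pass. */
    static void start(NodeState state, List<PreflightCheck> checks) throws StartupException {
        for (PreflightCheck check : checks)
            check.execute(state);
        // Stand-ins for the steps that would reach the commit log or flush sstables.
        state.writes.add("persist local metadata");
        state.writes.add("migrate system keyspace");
    }

    public static void main(String[] args) {
        NodeState state = new NodeState(List.of("ks.legacy_table"));
        try {
            start(state, List.of(NO_COMPACT_TABLES));
        } catch (StartupException e) {
            System.out.println("Startup refused: " + e.getMessage());
        }
        System.out.println("Writes performed: " + state.writes);
    }
}
```

In this sketch a refused startup leaves `writes` empty, which is the property the comment asks for: nothing from the newer version is left on disk, so a restart on the older version would need no special flag.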
> Fix user experience when upgrading to 4.0 with compact tables
> -------------------------------------------------------------
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/CQL
> Reporter: Sylvain Lebresne
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Fix For: 4.0-beta
>
>
> The code to handle compact tables has been removed from 4.0, and the intended
> upgrade path to 4.0 for users having compact tables on 3.x is that they must
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or will miss a
> table) and may try upgrading despite still having compact tables. If they do
> so, the intent is that the node will _not_ start, with a message clearly
> indicating the pre-upgrade step the user has missed. The user will then
> downgrade the node(s) back to 3.x, run the proper
> {{ALTER ... DROP COMPACT STORAGE}}, and then upgrade again.
> But while 4.0 does currently fail startup with a decent message when it finds
> any compact tables, I believe the check is done too late during startup.
> Namely, that check is done as we read the table schemas, so within
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
> But by then, we've _at least_ called
> {{SystemKeyspace.persistLocalMetadata()}} and
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log,
> and even possibly flush new {{na}} format sstables. As a result, a user
> might not be able to seamlessly restart the node on 3.x (to drop compact
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0
> with compact storage, you can downgrade back with no intervention whatsoever).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)