Re: Schema inconsistency in mixed-version cluster

Sebastian Marsching Tue, 12 Dec 2023 14:05:21 -0800

> I assume these are column names of a non-system table.
> 
This is correct. It is one of our application tables. The table has the 
following schema:


CREATE TABLE pv_archive.channels (
    channel_name text,
    decimation_level int,
    bucket_start_time bigint,
    channel_data_id uuid static,
    control_system_type text static,
    server_id uuid static,
    decimation_levels set<int> static,
    bucket_end_time bigint,
    PRIMARY KEY (channel_name, decimation_level, bucket_start_time)
) WITH CLUSTERING ORDER BY (decimation_level ASC, bucket_start_time ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 2592000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

> From the stack trace, this looks like an error from a node which was running 
> 4.1.3, and this node was not the coordinator for this query.
> 
> I did some research and found these bug reports which may be related:
> 
> CASSANDRA-15899 <https://issues.apache.org/jira/browse/CASSANDRA-15899> 
> Dropping a column can break queries until the schema is fully propagated
> CASSANDRA-16735 <https://issues.apache.org/jira/browse/CASSANDRA-16735> 
> Adding columns via ALTER TABLE can generate corrupt sstables
> The solution for CASSANDRA-16735 was to revert CASSANDRA-15899, according to 
> the comments in the ticket. 
> 
> This does look like CASSANDRA-15899 is back, but I can't see why it was only 
> happening when the nodes were running mixed versions, and then stopped after 
> all nodes were upgraded.
> 
I do not think that it is either of these bugs. Both were triggered by 
altering a table, but I can say with certainty that this table has not been 
altered since it was created years ago.

It must be a very strange bug in which C* somehow gets confused about the 
schema of a table during an upgrade, even though the schema of that table did 
not change. I wonder whether it has anything to do with the use of static 
columns…

We have a second cluster with an almost identical setup that we have not 
upgraded yet. We are now scheduling some downtime for the upgrade there. As 
that cluster is rather small (only six nodes), upgrading the whole cluster 
should not take very long.

It will be interesting to see whether the problem appears there too. If it 
doesn’t, this may have been some kind of freak accident that does not warrant 
further investigation. If it happens again, I might be able to collect more 
information.
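If it does happen again, these are the kinds of checks I would run. This is 
only a sketch: it uses the standard system and system_schema tables, and 
because those tables reflect the view of the local node, the queries have to 
be run with cqlsh connected to each node in turn and the results compared by 
hand (nodetool describecluster also gives a cluster-wide summary of schema 
versions):

```cql
-- Cassandra release and schema version as seen by the node cqlsh is connected to
SELECT release_version, schema_version FROM system.local;

-- The same information for every peer, as known to this node
SELECT peer, release_version, schema_version FROM system.peers;

-- Table ID of pv_archive.channels; this should be identical on every node
SELECT id FROM system_schema.tables
    WHERE keyspace_name = 'pv_archive' AND table_name = 'channels';

-- Column definitions, including which columns are static (kind = 'static')
SELECT column_name, kind, type FROM system_schema.columns
    WHERE keyspace_name = 'pv_archive' AND table_name = 'channels';
```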

