[ https://issues.apache.org/jira/browse/CASSANDRA-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921661#comment-16921661 ]
Charlemange Lasse commented on CASSANDRA-15298:
-----------------------------------------------

How should sstablescrub know about deleted columns? It doesn't have the schema (and no information about what is dropped and what is not), right? And doing a "nodetool scrub" before taking the snapshot doesn't remove the columns, which means that the restore will also not work with that snapshot on the other node. And I am not the only person running into this problem, so I would say that the documentation is misleading. [~cassio rossi], [~jjordan], ...

> Cassandra node cannot be restored using documented backup method
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15298
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15298
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Charlemange Lasse
>            Priority: Normal
>
> I have a single Cassandra 3.11.4 node. It contains various tables and UDFs.
> The [documentation describes a method to back up this node|https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html]:
> * use "DESCRIBE SCHEMA" in cqlsh to get the schema
> * create a snapshot using nodetool
> * copy the snapshot + schema to a new (completely disconnected) node
> * load the schema into the new node
> * load the sstables again using nodetool
>
> But this is a completely bogus method. It will result in errors like:
> {noformat}
> java.lang.RuntimeException: Unknown column deleted_column during deserialization
> {noformat}
> And all data in this column is now lost.
> The problem is that the "DESCRIBE SCHEMA" CQL output doesn't correctly add already dropped (but still existing) columns to the schema.
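> For reference, the documented backup/restore procedure boils down to roughly the following commands (a sketch only; the keyspace, table, snapshot tag, and paths are illustrative, and the snapshot directory layout can differ between Cassandra versions):
> {noformat}
> # On the source node: dump the schema and take a snapshot
> cqlsh -e "DESCRIBE SCHEMA" > schema.cql
> nodetool snapshot -t mybackup mykeyspace
>
> # Copy schema.cql and the snapshot directory
> # (e.g. data/mykeyspace/testcf-<id>/snapshots/mybackup/) to the new node.
>
> # On the new node: recreate the schema, place the sstables into the
> # table's data directory, then pick them up
> cqlsh -f schema.cql
> nodetool refresh mykeyspace testcf
> {noformat}
> It is the last step that fails with "Unknown column ... during deserialization", because the schema recreated from DESCRIBE SCHEMA knows nothing about the dropped column that the sstables still contain.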
> The DESCRIBE SCHEMA output looks, for example, like:
> {noformat}
> CREATE TABLE mykeyspace.testcf (
>     primary_uuid uuid,
>     secondary_uuid uuid,
>     name text,
>     PRIMARY KEY (primary_uuid, secondary_uuid)
> ) WITH CLUSTERING ORDER BY (secondary_uuid ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {noformat}
> But it must actually look like:
> {noformat}
> CREATE TABLE IF NOT EXISTS mykeyspace.testcf (
>     primary_uuid uuid,
>     secondary_uuid uuid,
>     name text,
>     deleted_column boolean,
>     PRIMARY KEY (primary_uuid, secondary_uuid))
>     WITH ID = a1afdd4d-b61e-4f2a-b806-57c296be3948
>     AND CLUSTERING ORDER BY (secondary_uuid ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND dclocal_read_repair_chance = 0.1
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND min_index_interval = 128
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE'
>     AND comment = ''
>     AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
>     AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
>     AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
>     AND cdc = false
>     AND extensions = { };
>
> ALTER TABLE mykeyspace.testcf DROP deleted_column
> USING TIMESTAMP 1563978151561000;
> {noformat}
> This was taken from the snapshot's (column-family-specific) schema.cql, which of course is not compatible with the main schema: it only creates the tables when they don't exist (which they do, because the main "DESCRIBE SCHEMA" file already creates them), and it is missing all the other kinds of objects, like UDFs.
>
> It is currently not possible (using the built-in mechanisms of Cassandra 3.11.4) to migrate a keyspace from one separated server to another separated server.
>
> This behavior also breaks various backup systems which try to store Cassandra cluster information to offline storage.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org