[ 
https://issues.apache.org/jira/browse/CASSANDRA-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921665#comment-16921665
 ] 

Jeremy Hanna edited comment on CASSANDRA-15298 at 9/3/19 7:11 PM:
------------------------------------------------------------------

I was just speaking with Jeremiah about this as well, and you're right: scrub 
requires the schema to do its work.  The correct schema is recorded in the 
schema.cql file in the snapshot directory when a snapshot is created.  So, to 
your original question about the normal flow, users and tools can use that file 
to recreate the schema so that the restore works properly.  As you say, though, 
the output of DESCRIBE SCHEMA in cqlsh isn't going to contain that information.
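
As a rough illustration of how a tool could pick up the per-table schema.cql 
files from a snapshot, here is a minimal Python sketch.  It assumes the usual 
3.x on-disk layout of <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/schema.cql; 
the function name and the example path in the usage note are made up for 
illustration and are not part of Cassandra.

```python
from pathlib import Path


def find_snapshot_schemas(data_dir, tag):
    """Return {(keyspace, table): path} for each schema.cql in snapshot `tag`.

    Assumption: files live at
    <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/schema.cql
    and `tag` contains no glob wildcard characters.
    """
    found = {}
    for path in Path(data_dir).glob(f"*/*/snapshots/{tag}/schema.cql"):
        table_dir = path.parents[2]               # the <table>-<id> directory
        keyspace = table_dir.parent.name
        table = table_dir.name.rsplit("-", 1)[0]  # strip the trailing table id
        found[(keyspace, table)] = path
    return found
```

Something like find_snapshot_schemas("/var/lib/cassandra/data", "my_snapshot") 
would then list the files to load before restoring the sstables, assuming that 
is where the data directory lives on the node.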

And you're correct - the documentation is misleading.  I was just saying that 
if you haven't dropped columns in the past, then the documentation is correct.  
We should update the documentation to say that nodetool snapshot creates a 
schema.cql file that should be used when restoring the data, rather than 
relying on DESCRIBE SCHEMA.
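
To check whether a given table falls into the "dropped columns" case, one 
could scan its schema.cql for the ALTER TABLE ... DROP ... USING TIMESTAMP 
statements that the snapshot records (as in the example quoted from the issue 
below).  This is only a heuristic sketch: the regular expression and the 
function name are my own, and it does not handle quoted identifiers that 
contain whitespace.

```python
import re

# Matches statements such as:
#   ALTER TABLE ks.tbl DROP col USING TIMESTAMP 1563978151561000;
# \s+ also matches newlines, so statements wrapped across lines are found.
DROP_RE = re.compile(
    r"ALTER\s+TABLE\s+(\S+)\s+DROP\s+(\S+)\s+USING\s+TIMESTAMP\s+(\d+)\s*;",
    re.IGNORECASE,
)


def dropped_columns(schema_cql_text):
    """Return a list of (table, column, timestamp) tuples found in the text."""
    return [
        (table, column, int(timestamp))
        for table, column, timestamp in DROP_RE.findall(schema_cql_text)
    ]
```

An empty result suggests the DESCRIBE SCHEMA output is sufficient for that 
table; a non-empty one means the schema.cql from the snapshot is needed.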


was (Author: jeromatron):
I was just speaking with Jeremiah about this as well and you're right, scrub 
requires the schema to do the work.  The schema is recorded in the schema.cql 
that is used when doing the backup though, so to your original question about 
the normal flow, it sounds like that can be used by users and tools to recreate 
the schema in a way for the restore to work properly.  The schema describe 
using cql isn't going to contain that information though, like you say.

And you're correct - the documentation is misleading, I was just saying that if 
you haven't dropped columns in the past, then the documentation is correct.  We 
should probably update the documentation to include that a nodetool snapshot 
creates a schema.cql file that should be used when restoring the data - rather 
than relying on the schema describe.

> Cassandra node cannot be restored using documented backup method
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15298
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15298
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Charlemange Lasse
>            Priority: Normal
>
> I have a single Cassandra 3.11.4 node. It contains various tables and UDFs. 
> The [documentation describes a method to back up this 
> node|https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html]:
>  * use "DESCRIBE SCHEMA" in cqlsh to get the schema
>  * create a snapshot using nodetool
>  * copy the snapshot + schema to a new (completely disconnected) node
>  * load schema into new node
>  * load sstables again using nodetool
> But this method is completely bogus. It will result in errors like: 
>  
> {noformat}
> java.lang.RuntimeException: Unknown column deleted_column during deserialization
> {noformat}
> And all data in this column is now lost.
> The problem is that "DESCRIBE SCHEMA" doesn't correctly include already 
> deleted (but still existing) columns in the schema. For example, its output 
> looks like:
> {noformat}
> CREATE TABLE mykeyspace.testcf (
>     primary_uuid uuid,
>     secondary_uuid uuid,
>     name text,
>     PRIMARY KEY (primary_uuid, secondary_uuid)
> ) WITH CLUSTERING ORDER BY (secondary_uuid ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {noformat}
> But it must actually look like:
> {noformat}
> CREATE TABLE IF NOT EXISTS mykeyspace.testcf (
>         primary_uuid uuid,
>         secondary_uuid uuid,
>         name text,
>         deleted_column boolean,
>         PRIMARY KEY (primary_uuid, secondary_uuid))
>         WITH ID = a1afdd4d-b61e-4f2a-b806-57c296be3948
>         AND CLUSTERING ORDER BY (secondary_uuid ASC)
>         AND bloom_filter_fp_chance = 0.01
>         AND dclocal_read_repair_chance = 0.1
>         AND crc_check_chance = 1.0
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 864000
>         AND min_index_interval = 128
>         AND max_index_interval = 2048
>         AND memtable_flush_period_in_ms = 0
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99PERCENTILE'
>         AND comment = ''
>         AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
>         AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
>         AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
>         AND cdc = false
>         AND extensions = {  };
> ALTER TABLE mykeyspace.testcf DROP deleted_column USING TIMESTAMP 1563978151561000;
> {noformat}
> This was taken from the snapshot's (column family specific) schema.cql, which 
> of course is not compatible with the main schema: it only creates the tables 
> when they don't exist (which they do, because the main "DESCRIBE SCHEMA" file 
> already creates them), and it is missing everything else, such as UDFs.
> It is currently not possible (using the built-in mechanisms of Cassandra 
> 3.11.4) to migrate a keyspace from one separate server to another separate 
> server.
> This behavior also breaks various backup systems that try to store Cassandra 
> cluster information in offline storage.


