[jira] [Commented] (CASSANDRA-6717) Modernize schema tables

Tyler Hobbs (JIRA) Wed, 08 Jul 2015 15:39:37 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619513#comment-14619513
 ]


Tyler Hobbs commented on CASSANDRA-6717:
----------------------------------------

The first commit looks reasonable overall to me.  Some preliminary comments:

LegacySchemaMigrator.java:
* {{migrate()}}: should log when migration has completed successfully
* {{checkNeedsUpgrade()}}: the comment explaining the check for the 
"supercolumn map" isn't immediately clear -- maybe point to the 
{{CompactTables}} javadoc or explicitly mention that the supercolumn map is the 
only way {{column_name}} can be empty
* {{decodeTableMetadata()}}: when would {{cf_id}} not be present?
* {{serializeKind()}} is a dupe of {{SchemaKeyspaces.deserializeKind()}}
* {{decodeTableMetadata()}} is a near dupe of 
{{SchemaKeyspaces.createTableFromTableRowAndColumnRows()}} -- can we unify 
those somewhat?
* In {{parseAggregateFunctionName()}}, there's the following comment: {{// 
function name can be abbreviated (pre 2.2rc2) - it is in the same keyspace as 
the aggregate}}.  It sounds like we don't allow that any more, so we don't need 
to handle it.
* Perhaps rename {{LegacySchemaTables}} to {{LEGACY_SCHEMA_TABLES}}? (Looks 
like a class name, currently.)
* typo in javadoc: infromation

SchemaKeyspaces.java: 
* This comment should be updated (or maybe just removed): "Note that because 
Keyspaces is a COMPACT TABLE, we're really only setting static columns 
internally and shouldn't set any clustering"

Schema.java:
* You added a call to {{cfs.indexManager.setIndexRemoved()}} in 
{{dropTable()}}.  Why do we need that there now?

LegacySchemaMigratorTest.java:
* Needed test coverage:
** Legacy schema tables are removed 
** New schema tables are written to with the correct timestamp
** Legacy schema tables don't exist in new schema tables
** Migrating tables in general, especially COMPACT ones
*** Null values for any optional fields
** Maybe UDTs that refer to other UDTs?
** NTS keyspaces

> Modernize schema tables
> -----------------------
>
>                 Key: CASSANDRA-6717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6717
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Sylvain Lebresne
>            Assignee: Aleksey Yeschenko
>              Labels: client-impacting
>             Fix For: 3.0 beta 1
>
>
> There is a few problems/improvements that can be done with the way we store 
> schema:
> # CASSANDRA-4988: as explained on the ticket, storing the comparator is now 
> redundant (or almost, we'd need to store whether the table is COMPACT or not 
> too, which we don't currently is easy and probably a good idea anyway), it 
> can be entirely reconstructed from the infos in schema_columns (the same is 
> true of key_validator and subcomparator, and replacing default_validator by a 
> COMPACT_VALUE column in all case is relatively simple). And storing the 
> comparator as an opaque string broke concurrent updates of sub-part of said 
> comparator (concurrent collection addition or altering 2 separate clustering 
> columns typically) so it's really worth removing it.
> # CASSANDRA-4603: it's time to get rid of those ugly json maps. I'll note 
> that schema_keyspaces is a problem due to its use of COMPACT STORAGE, but I 
> think we should fix it once and for-all nonetheless (see below).
> # For CASSANDRA-6382 and to allow indexing both map keys and values at the 
> same time, we'd need to be able to have more than one index definition for a 
> given column.
> # There is a few mismatches in table options between the one stored in the 
> schema and the one used when declaring/altering a table which would be nice 
> to fix. The compaction, compression and replication maps are one already 
> mentioned from CASSANDRA-4603, but also for some reason 
> 'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' 
> in the schema table, and 'min/max_compaction_threshold' are column families 
> option in the schema but just compaction options for CQL (which makes more 
> sense).
> None of those issues are major, and we could probably deal with them 
> independently but it might be simpler to just fix them all in one shot so I 
> wanted to sum them all up here. In particular, the fact that 
> 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the replication map, 
> but it may limit future stuff too) which suggest we should migrate it to a 
> new, non COMPACT table. And while that's arguably a detail, it wouldn't hurt 
> to rename schema_columnfamilies to schema_tables for the years to come since 
> that's the prefered vernacular for CQL.
> Overall, what I would suggest is to move all schema tables to a new keyspace, 
> named 'schema' for instance (or 'system_schema' but I prefer the shorter 
> version), and fix all the issues above at once. Since we currently don't 
> exchange schema between nodes of different versions, all we'd need to do that 
> is a one shot startup migration, and overall, I think it could be simpler for 
> clients to deal with one clear migration than to have to handle minor 
> individual changes all over the place. I also think it's somewhat cleaner 
> conceptually to have schema tables in their own keyspace since they are 
> replicated through a different mechanism than other system tables.
> If we do that, we could, for instance, migrate to the following schema tables 
> (details up for discussion of course):
> {noformat}
> CREATE TYPE user_type (
>   name text,
>   column_names list<text>,
>   column_types list<text>
> )
> CREATE TABLE keyspaces (
>   name text PRIMARY KEY,
>   durable_writes boolean,
>   replication map<string, string>,
>   user_types map<string, user_type>
> )
> CREATE TYPE trigger_definition (
>   name text,
>   options map<tex, text>
> )
> CREATE TABLE tables (
>   keyspace text,
>   name text,
>   id uuid,
>   table_type text, // COMPACT, CQL or SUPER
>   dropped_columns map<text, bigint>,
>   triggers map<text, trigger_definition>,
>   // options
>   comment text,
>   compaction map<text, text>,
>   compression map<text, text>,
>   read_repair_chance double,
>   dclocal_read_repair_chance double,
>   gc_grace_seconds int,
>   caching text,
>   rows_per_partition_to_cache text,
>   default_time_to_live int,
>   min_index_interval int,
>   max_index_interval int,
>   speculative_retry text,
>   populate_io_cache_on_flush boolean,
>   bloom_filter_fp_chance double
>   memtable_flush_period_in_ms int,
>   PRIMARY KEY (keyspace, name)
> )
> CREATE TYPE index_definition (
>   name text,
>   index_type text,
>   options map<text, text>
> )
> CREATE TABLE columns (
>   keyspace text,
>   table text,
>   name text,
>   kind text, // PARTITION_KEY, CLUSTERING_COLUMN, REGULAR or COMPACT_VALUE
>   component_index int;
>   type text,
>   indexes map<text, index_definition>,
>   PRIMARY KEY (keyspace, table, name)
> )
> {noformat}
> Nit: wouldn't hurt to create a simple enum that is reuse by both CFMetaData 
> and CFPropDefs for table options names while we're at it once they are the 
> same instead of repeating string constants which is fragile.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-6717) Modernize schema tables

Reply via email to