[ 
https://issues.apache.org/jira/browse/CASSANDRA-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-15811:
------------------------------------
    Description: 
DROP COMPACT STORAGE was introduced in CASSANDRA-10857 as one of the steps to 
deprecate Thrift. However, the current semantics of dropping the compact 
storage flag from a table reveal several columns that are usually empty 
(column1 and value in the non-dense case, value in the dense case, and a column 
with an empty name for super column families). Showing these columns can 
confuse application developers, especially those who have never used Thrift or 
made writes that assumed the presence of those fields, and who used compact 
storage in 3.x only because it has “compact” in the name.

There’s not much we can do in the super column family case, especially 
considering there’s no way to create a super column family using CQL, but we 
can improve the dense and non-dense cases. We can scan sstables and make sure 
there are no signs of Thrift writes in them, and if all sstables conform to 
this rule, we can not only drop the flag, but also drop the columns that are 
supposed to be hidden. However, this is not very user-friendly, and is probably 
not worth the development effort. 
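
As a rough illustration of the scanning approach, here is a minimal Python 
sketch. It is a model of the decision only, not Cassandra code: 
{{SSTableSummary}}, {{has_thrift_writes}} and {{plan_drop}} are hypothetical 
names, and how a real scan would detect Thrift writes is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SSTableSummary:
    """Hypothetical per-sstable scan result (not a Cassandra class)."""
    name: str
    # Assumption: the scan sets this when it finds data in the hidden columns.
    has_thrift_writes: bool


def plan_drop(sstables: Iterable[SSTableSummary],
              hidden_columns: List[str]) -> List[str]:
    """Return the hidden columns that could be dropped together with the flag.

    The hidden columns are dropped only if no sstable shows signs of Thrift
    writes. Otherwise nothing is dropped and the columns are simply revealed.
    """
    if all(not s.has_thrift_writes for s in sstables):
        return list(hidden_columns)
    return []
```

A single sstable with Thrift writes is enough to keep every hidden column, 
which is part of why the approach is not very user-friendly.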

An alternative to scanning is to add {{FORCE DROP COMPACT}} syntax (or 
something similar) that would just drop the columns unconditionally. It is 
likely that people who were using compact storage with Thrift know they were 
doing that, so they'll usually use the "regular" {{DROP COMPACT}}, without 
force, which will simply reveal the columns as it does right now.
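
To make the difference between the two variants concrete, here is a small 
Python model of the resulting column lists. Everything in it is illustrative: 
{{CompactKind}}, {{HIDDEN_COLUMNS}} and {{columns_after_drop}} are hypothetical 
names, and treating the forced variant as a no-op for super column families is 
an assumption based on the remark that little can be done in that case.

```python
from enum import Enum
from typing import List


class CompactKind(Enum):
    NON_DENSE = "non-dense"
    DENSE = "dense"
    SUPER = "super"


# Hidden columns per case, as listed in the description.
HIDDEN_COLUMNS = {
    CompactKind.NON_DENSE: ["column1", "value"],
    CompactKind.DENSE: ["value"],
    CompactKind.SUPER: [""],  # a column with an empty name
}


def columns_after_drop(kind: CompactKind, user_columns: List[str],
                       force: bool = False) -> List[str]:
    """Model which columns are visible after dropping the compact flag.

    The regular drop reveals the hidden columns. The forced drop removes them
    unconditionally, except for super column families (an assumption here).
    """
    if force and kind is not CompactKind.SUPER:
        return list(user_columns)
    return list(user_columns) + HIDDEN_COLUMNS[kind]
```

For a non-dense table with columns pk and v, the regular variant yields pk, v, 
column1 and value, while the forced variant yields only pk and v.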

Since we had to remove EmptyType validation in order to fix CASSANDRA-15778 
and to allow an EmptyType column to actually have data[*], properly handling 
compact storage starts making more sense. We’ll solve it either by not having 
the columns, and hence not caring about their values, or by keeping values 
_and_ data, which requires no validation in this case. The EmptyType field will 
have to be handled differently though.

[*] It is possible to end up with sstables upgraded from 2.x or written in 
3.x before CASSANDRA-15373, which means that not every cluster upgraded from 
2.x or running 3.x is guaranteed to have empty values in this column, and this 
behaviour, even if undesired, might be relied on by people. 

Open question: CASSANDRA-15373 adds validation to EmptyType that disallows 
writing any non-empty value to it, but we already allow creating a table via 
CQL and still writing data into it with Thrift. This seems to have been 
unintended, but it might have become a feature people rely on. If we simply 
backport 15373 to 2.2 and 2.1, we’ll change and break that behaviour. Given 
that no one complained in 3.0 and 3.11, it is unlikely that anyone relies on 
it, though. 

  was:
DROP COMPACT STORAGE was introduced in CASSANDRA-10857 as one of the steps to 
deprecate Thrift. However, the current semantics of dropping the compact 
storage flag from a table reveal several columns that are usually empty 
(column1 and value in the non-dense case, value in the dense case, and a column 
with an empty name for super column families). Showing these columns can 
confuse application developers, especially those who have never used Thrift or 
made writes that assumed the presence of those fields, and who used compact 
storage in 3.x only because it has “compact” in the name.

There’s not much we can do in the super column family case, especially 
considering there’s no way to create a super column family using CQL, but we 
can improve the dense and non-dense cases. We can scan sstables and make sure 
there are no signs of Thrift writes in them, and if all sstables conform to 
this rule, we can not only drop the flag, but also drop the columns that are 
supposed to be hidden. However, this is not very user-friendly, and is probably 
not worth the development effort. 

An alternative to scanning is to add FORCE DROP syntax (or something similar) 
that would just drop the columns unconditionally. It is likely that people who 
were using compact storage with Thrift know they were doing that, so they'll 
usually use the "regular" DROP COMPACT, without force, which will simply reveal 
the columns as it does right now.

Since we had to remove EmptyType validation in order to fix CASSANDRA-15778 
and to allow an EmptyType column to actually have data[*], properly handling 
compact storage starts making more sense. We’ll solve it either by not having 
the columns, and hence not caring about their values, or by keeping values 
_and_ data, which requires no validation in this case. The EmptyType field will 
have to be handled differently though.

[*] It is possible to end up with sstables upgraded from 2.x or written in 
3.x before CASSANDRA-15373, which means that not every cluster upgraded from 
2.x or running 3.x is guaranteed to have empty values in this column, and this 
behaviour, even if undesired, might be relied on by people. 

Open question: CASSANDRA-15373 adds validation to EmptyType that disallows 
writing any non-empty value to it, but we already allow creating a table via 
CQL and still writing data into it with Thrift. This seems to have been 
unintended, but it might have become a feature people rely on. If we simply 
backport 15373 to 2.2 and 2.1, we’ll change and break that behaviour. Given 
that no one complained in 3.0 and 3.11, it is unlikely that anyone relies on 
it, though. 


> Improve DROP COMPACT STORAGE
> ----------------------------
>
>                 Key: CASSANDRA-15811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15811
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alex Petrov
>            Priority: Normal


