[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225968#comment-14225968 ]

Sylvain Lebresne commented on CASSANDRA-8354:

Thing is, I'm not sure how we could properly convert people out of those empty values completely without breaking thrift compatibility. In particular, I'm not sure how that {{strict_cql_values}} option would work in practice (would that be a global yaml option that affects thrift too, btw?).

That said, I've kind of mixed two issues in this ticket. The main reason I opened this was the UDF question, but I realize that this question is actually already a problem with {{null}}, so I've created a separate issue for it (CASSANDRA-8374). Provided we fix that latter issue, it's probably ok for UDFs to consider that empty values (for types for which they are not reasonable values) are always converted to {{null}} (which is in fact already how it works).

Still, it would be nice to change the default for CQL so that empty values are refused. I'm just not sure I see how to make that happen in practice without a syntax addition.

A better story for dealing with empty values

Key: CASSANDRA-8354
URL: https://issues.apache.org/jira/browse/CASSANDRA-8354
Project: Cassandra
Issue Type: Improvement
Reporter: Sylvain Lebresne
Fix For: 3.0

In CQL, a value of any type can be empty, even for types for which such values don't make any sense (int, uuid, ...). Note that this is different from having no value (i.e. a {{null}}). This is due to historical reasons, and we can't entirely disallow it for backward compatibility, but it's pretty painful when working with CQL since you always need to be defensive about such largely nonsensical values. This is particularly annoying with UDFs: those empty values are represented as {{null}} for UDFs, and that plays weirdly with UDFs that use unboxed native types.

So I would suggest that we introduce variations of the types that don't accept empty byte buffers, for those types for which it's not a particularly sensible value. Ideally we'd use those variants by default, that is:
{noformat}
CREATE TABLE foo (k text PRIMARY KEY, v int)
{noformat}
would not accept empty values for {{v}}. But
{noformat}
CREATE TABLE foo (k text PRIMARY KEY, v int ALLOW EMPTY)
{noformat}
would.

Similarly, for UDFs, a function like:
{noformat}
CREATE FUNCTION incr(v int) RETURNS int LANGUAGE JAVA AS 'return v + 1';
{noformat}
would be guaranteed that it can only be applied where no empty values are allowed. A function that wants to handle empty values could be created with:
{noformat}
CREATE FUNCTION incr(v int ALLOW EMPTY) RETURNS int ALLOW EMPTY LANGUAGE JAVA AS 'return (v == null) ? null : v + 1';
{noformat}

Of course, doing that has the problem of backward compatibility. One option could be to say that if a type doesn't accept empties, but we do have an empty internally, then we convert it to some reasonably sensible default value (0 for numeric values, the smallest possible uuid for uuids, etc.). This way, we could allow conversion of types to and from 'ALLOW EMPTY'. And maybe we'd say that existing compact tables get the 'ALLOW EMPTY' flag for their types by default.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
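To make the distinction in the description concrete, here is a minimal Java sketch of the three states a serialized {{int}} can be in (no value, empty, real value) and of the two conversions discussed: empty mapped to {{null}} for a UDF argument, and empty mapped to a default of 0 for a type that does not accept empties. The class and method names ({{EmptyValueSketch}}, {{classify}}, {{toUdfArgument}}, {{toStrictInt}}) are hypothetical and are not Cassandra code; the only assumption carried over from the ticket is that an empty value is a zero-length byte buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class EmptyValueSketch {
    enum Kind { NULL, EMPTY, VALUE }

    // A null reference means "no value"; a zero-length buffer means "empty".
    static Kind classify(ByteBuffer serialized) {
        if (serialized == null) return Kind.NULL;
        return serialized.remaining() == 0 ? Kind.EMPTY : Kind.VALUE;
    }

    // Behaviour described in the ticket: a UDF sees both null and empty as null.
    static Integer toUdfArgument(ByteBuffer serialized) {
        if (classify(serialized) != Kind.VALUE) return null;
        return serialized.getInt(serialized.position());
    }

    // Backward-compatibility option from the ticket: a type that does not
    // accept empties reads a stored empty as a default (0 for numerics).
    static Integer toStrictInt(ByteBuffer serialized) {
        switch (classify(serialized)) {
            case NULL:  return null;
            case EMPTY: return 0;
            default:    return serialized.getInt(serialized.position());
        }
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer seven = ByteBuffer.allocate(4).putInt(0, 7);
        System.out.println(classify(null) + " " + classify(empty) + " " + classify(seven));
        System.out.println(toUdfArgument(empty) + " " + toStrictInt(empty));
    }
}
```

The two conversions differ only in what an empty buffer turns into, which is exactly the choice the ticket leaves open.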
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224854#comment-14224854 ]

Sylvain Lebresne commented on CASSANDRA-8354:

The one data point is that I clearly remember having a debate with someone who argued that using empty values as a way to emulate nulls in clustering columns was useful for him. Now, as much as I personally disagree with that, I would slightly prefer to take the less radical option of keeping the old behavior possible, but hidden behind a non-default flag. Provided we properly document it and make it clear that it's a bad idea to use it except for backward compatibility's sake, I don't think having it in the syntax is such a big deal. In fact, that ALLOW EMPTY (or whatever equivalent syntax) could be useful for types like strings or blob when you know an empty string/blob doesn't make sense and you want the database to validate it (that is, allowing more precise validation server side is not a bad thing imo).

The other thing is that automatically and unconditionally converting empty values on upgrade could make for a pretty painful upgrade for users that do use those empty values.

Anyway, my point is, I wish as much as anyone else that we had had no empty values from day one for types for which they don't make sense, but since that's not the case, I'd have a preference for the option that gives us the proper default while making it as painless as possible for upgraders (even those upgraders that we disagree with).
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224867#comment-14224867 ]

Aleksey Yeschenko commented on CASSANDRA-8354:

Orthogonally to the primary question here, can we maybe start allowing explicit null in partition key columns/clustering columns? (encode size as -1, as we do for tuples and UDTs now).
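The "encode size as -1" idea from the comment above can be sketched as a length-prefixed encoding in which a negative size marks an explicit null and a size of 0 marks an empty value, so the two stay distinguishable. This is only an illustration of the principle: the class {{NullableComponentCodec}} and the 4-byte signed size prefix are assumptions made for the example, not the actual on-disk or wire format Cassandra uses for key components.

```java
import java.nio.ByteBuffer;

// Hypothetical codec; the 4-byte signed size prefix is an assumption of this sketch.
final class NullableComponentCodec {

    // Writes [size][bytes]; a null value is written as size -1 with no bytes.
    static ByteBuffer encode(ByteBuffer value) {
        if (value == null) {
            ByteBuffer out = ByteBuffer.allocate(4).putInt(-1);
            out.flip();
            return out;
        }
        ByteBuffer out = ByteBuffer.allocate(4 + value.remaining());
        out.putInt(value.remaining());
        out.put(value.duplicate()); // duplicate() leaves the caller's position untouched
        out.flip();
        return out;
    }

    // Reads one component; returns null for size -1 and a zero-length buffer for size 0.
    static ByteBuffer decode(ByteBuffer in) {
        int size = in.getInt();
        if (size < 0) return null;
        ByteBuffer value = in.slice();
        value.limit(size);
        in.position(in.position() + size);
        return value;
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println(decode(encode(null)) == null);      // explicit null survives
        System.out.println(decode(encode(empty)).remaining()); // empty stays empty
    }
}
```

With such a prefix, a reader never has to guess whether a zero-length component meant "null" or "empty".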
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225824#comment-14225824 ]

Jonathan Ellis commented on CASSANDRA-8354:

bq. ALLOW EMPTY (or whatever equivalent syntax) could be useful for types like strings or blob when you know an empty string/blob doesn't make sense and you want the database to validate it

This is too specific a use case to warrant special syntax. We can certainly add CHECK constraints using UDF though.
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223462#comment-14223462 ]

Jonathan Ellis commented on CASSANDRA-8354:

Is there a way we can avoid permanently enshrining this wart?

What if for instance we added an option {{strict_cql_values}} to 3.0 that defaults to false. When enabled it rejects nonsensical empty values. For 3.1 we default to true, and give people a tool to convert empty to null or some other value. For 4.0 it stays permanently true.
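As a rough sketch of what the proposed {{strict_cql_values}} switch would mean at write time, the Java snippet below rejects zero-length values for types where empty is nonsensical, but only when the flag is on. Everything here is hypothetical: the class {{StrictValueValidator}}, the {{CqlType}} enum, and the choice of text and blob as the types for which an empty value remains meaningful are assumptions for illustration, and the thread does not settle where such an option would live or whether it would also cover thrift.

```java
import java.nio.ByteBuffer;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical validator; not an existing Cassandra class or option.
final class StrictValueValidator {
    enum CqlType { INT, UUID, TEXT, BLOB }

    // Assumption: an empty string or blob is a legitimate value; an empty int or uuid is not.
    private static final Set<CqlType> EMPTY_IS_MEANINGFUL = EnumSet.of(CqlType.TEXT, CqlType.BLOB);

    private final boolean strictCqlValues;

    StrictValueValidator(boolean strictCqlValues) {
        this.strictCqlValues = strictCqlValues;
    }

    // Throws if the flag is enabled and the value is empty for a type where that makes no sense.
    void validate(CqlType type, ByteBuffer value) {
        if (value == null) return; // null means "no value" and is not affected by the flag
        boolean empty = value.remaining() == 0;
        if (strictCqlValues && empty && !EMPTY_IS_MEANINGFUL.contains(type)) {
            throw new IllegalArgumentException("empty value is not valid for type " + type);
        }
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        new StrictValueValidator(false).validate(CqlType.INT, empty); // accepted: legacy behaviour
        try {
            new StrictValueValidator(true).validate(CqlType.INT, empty);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under the proposed rollout, the constructor argument would default to false in 3.0, default to true in 3.1, and stop being configurable in 4.0.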
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223472#comment-14223472 ]

Aleksey Yeschenko commented on CASSANDRA-8354:

bq. What if for instance we added an option strict_cql_values to 3.0 that defaults to false. When enabled it rejects nonsensical empty values. For 3.1 we default to true, and give people a tool to convert empty to null or some other value. For 4.0 it stays permanently true.

That. Except it's not just CQL: there is thrift too, where we should enforce this, so maybe we should name it 'reject_empty_types' or something. As a tool, upgradesstables will probably do.

I don't want to legitimize it at the CQL syntax level either.
[jira] [Commented] (CASSANDRA-8354) A better story for dealing with empty values
[ https://issues.apache.org/jira/browse/CASSANDRA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220930#comment-14220930 ]

Robert Stupp commented on CASSANDRA-8354:

I'd prefer something like {{ALLOW NULL}} for UDFs since {{null}} and empty are equivalent for a UDF (it cannot handle an _empty int_ or _empty uuid_).
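The reason {{null}} (and therefore empty) is awkward for UDFs that use unboxed native types can be shown in plain Java: passing a {{null}} {{Integer}} to a method that takes a primitive {{int}} fails during unboxing, before the function body ever runs. The sketch below contrasts the two {{incr}} bodies from the ticket; the class {{UnboxedUdfSketch}} and the {{callUnboxed}} wrapper, which stands in for whatever code invokes the UDF, are hypothetical.

```java
// Hypothetical illustration of the two incr variants from the ticket.
final class UnboxedUdfSketch {

    // Body of 'return v + 1' with an unboxed argument: it cannot represent "no value".
    static int incrUnboxed(int v) {
        return v + 1;
    }

    // Body of 'return (v == null) ? null : v + 1' with a boxed argument.
    static Integer incrNullSafe(Integer v) {
        return (v == null) ? null : v + 1;
    }

    // Stand-in for the caller: it holds a possibly-null Integer and must unbox it
    // to call incrUnboxed, which throws NullPointerException when arg is null.
    static Integer callUnboxed(Integer arg) {
        return incrUnboxed(arg);
    }

    public static void main(String[] args) {
        System.out.println(incrNullSafe(null));
        try {
            callUnboxed(null);
        } catch (NullPointerException e) {
            System.out.println("unboxed variant cannot take null");
        }
    }
}
```

This is why a marker such as {{ALLOW NULL}} or {{ALLOW EMPTY}} on the argument would effectively decide whether the function may be declared with a primitive or needs the boxed type.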