Hi,

+1 I like doing it the SQL way. This makes sense to me.

Now, in Cassandra setting a column to null means deleting it and if all​ 
columns in a row are null the row is deleted. This might be another edge case...

German
________________________________
From: Benjamin Lerer <b.le...@gmail.com>
Sent: Wednesday, March 20, 2024 9:15 AM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: [EXTERNAL] [DISCUSS] NULL handling and the unfrozen collection issue

You don't often get email from b.le...@gmail.com. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
Hi everybody,

CEP-29 (CQL NOT Operator) is hitting the grey area of how we want as a 
community to handle NULL including for things like unfrozen (multi-cell) 
collections and I would like to make a proposal for moving forward with NULL 
related issues.

We have currently 2 tickets open about NULL handling (I might have missed 
others):

  1.  CASSANDRA-10715<https://issues.apache.org/jira/browse/CASSANDRA-10715>: 
Allowing Filtering on NULL
  2.  CASSANDRA-17762<https://issues.apache.org/jira/browse/CASSANDRA-17762>: 
LWT IF col = NULL is inconsistent with SQL NULL

We also had previously some discussion on which we touched the subject:

  *   [DISCUSS] LWT UPDATE semantics with + and - when null
  *   CEP-15 multi key transaction syntax

In all those tickets and discussions the consensus was to have a behavior 
similar to SQL.

For null comparisons, SQL uses the three-value logic 
(https://modern-sql.com/concept/three-valued-logic) introducing the need for IS 
NULL and IS NOT NULL operators. Those conflict with the col = NULL predicate 
supported in LWT conditions 
(CASSANDRA-17762<https://issues.apache.org/jira/browse/CASSANDRA-17762>).

So far, as Cassandra was only using inclusive operators, comparisons were 
behaving in an expected way. According to three-valued logic NULL CONTAINS 
'foo' should return UNKNOWN and the filtering behavior should exclude 
everything which is not true.Therefore the row should not be returned as 
expected. With exclusive operators things are more tricky. NULL NOT CONTAINS 
'foo' will also return UNKNOWN causing the row to not be returned which might 
not match people's expectations.
This behavior can be even more confusing once you take into account empty and 
null collections. NOT CONTAINS on an empty collection will return true while it 
will return UNKNOWN on a NULL collection. Unfortunately, for unfrozen 
(multicell) collections we are unable to differentiate between an empty and 
null collection and therefore always treat empty collections as NULL.
For predicates such as map[myKey] != 'foo' when myKey is not present the result 
can also be surprising as it will end up comparing NULL to 'foo' returning once 
more UNKNOWN and ignoring the row.
In order to respect the SQL three-valued logic and be able to allow the user to 
fetch all the rows which do not contains a specific value we would need support 
IS NULL, IS NOT NULL and OR to allow query like:
WHERE c IS NULL OR c NOT CONTAINS 'foo' / WHERE m IS NULL OR m[myKey] != foo

Supporting the three-valued logic makes sense to me even if some behavior might 
end up being confusing. In which case we can easily fix CASSANDRA-10715 and 
deprectate support for col = NULL/col != NULL in LWT.

What is people's opinion? Should we go for the three-valued logic everywhere? 
Should we try something else?





Reply via email to