Re: CEP-15 multi key transaction syntax

Avi Kivity via dev Sun, 14 Aug 2022 08:30:53 -0700


On 14/08/2022 17.50, Benedict Elliott Smith wrote:

> SELECT and LET incompatible once comparisons become valid selectors
I don’t think this would be ambiguous, as = is required in the LETsyntax as we have to bind the result to a variable name.
But, I like the deconstructed tuple syntax improvement over “Option6”. This would also seem to easily support assigning from non-querystatements, such as LET (a, b) = (someFunc(), someOtherFunc(?))
I don’t think it is ideal to depend on relative position in the tuplefor assigning results to a variable name, as it leaves more scope forerrors. It would be nice to have a simple way to deconstruct safely.But, I think this proposal is good, and I’d be fine with it as analternative if others concur. I agree that seeing the SELECTindependently may be more easily recognisable to users.
With this approach there remains the question of how we handle singlecolumn results. I’d be inclined to treat in the following way:
LET (a) = SELECT val FROM table
IF a > 1 THEN...

LET a = SELECT val FROM table
IF a.val > 1 THEN...

I think SQL dialects require subqueries to be parenthesized (not sure).If that's the case I think we should keep the tradition.

----

There is also the question of whether we support SELECT without a FROMclause, e.g.

LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2

Or just LET (since they are no longer equivalent)
e.g.
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())

I see no harm in making FROM optional, as it's recognized by other SQLdialects.

----
Also since LET is only binding variables, is there any reason weshouldn’t support multiple SELECT assignments in a single LET?, e.g.
LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))


What if an inner select returns a tuple? Would y be a tuple?

I think this is redundant and atypical enough to not be worthsupporting. Most people would use separate LETs.

----
Also whether we support tuples in SELECT statements anyway, e.g.
LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
IF tuple1.a > 1 AND tuple2.d > 1…

Absolutely, this just flows naturally from having tuples. There's nodifference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".


----
and whether we support nested deconstruction, e.g.
LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…

I think this can be safely deferred. Most people would again separate itinto separate LETs.

I'd add (to the specification) that LETs cannot override a previouslydefined variable, just to reduce ambiguity.

On 14 Aug 2022, at 13:55, Avi Kivity via dev<[email protected]> wrote:



On 14/08/2022 01.29, Benedict Elliott Smith wrote:

I’ll do my best to express with my thinking, as well as how I wouldexplain the feature to a user.
My mental model for LET statements is that they are simply SELECTstatements where the columns that are selected become variablesaccessible anywhere in the scope of the transaction. That is to say,you should be able to run something like s/LET/SELECT ands/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LETstatement and produce a valid SELECT statement, and vice versa. Bothshould perform identically.
e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table

"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQLsupports selecting comparisons:



$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)

Using "=" as a syntactic element in LET would make SELECT and LETincompatible once comparisons become valid selectors. Unless theybecome mandatory (and then you'd write "LET q = a = b" if you wantedto select a comparison).



I personally prefer the nested query syntax:


    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);

So there aren't two similar-but-not-quite-the-same syntaxes. SELECTis immediately recognizable by everyone as a query, LET is not.

Identical form, identical behaviour. Every statement should bedirectly translatable with some simple text manipulation.

We can then make this more powerful for users by simply expandingSELECT statements, e.g. by permitting them to declare constants andtuples in the column results. In this scheme LET x = * is simplysyntactic sugar for LET x = (pk, ck, field1, …) This scheme thensupports options 2, 4 and 5 all at once, consistently alongside eachother.

Option 6 is in fact very similar, but is strictly less flexible forthe user as they have no way to declare multiple scalar variableswithout scoping them inside a tuple.


e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if youprefer this naming scheme you can simply write


LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar,we do need a way for the user to say they care whether the row wasnull or the column. I think an implicit conversion here could besurprising. However we could implement tuple expressions anyway andlet the user explicitly declare v as a tuple as Caleb has suggestedfor the existing proposal as well.

Assigning constants or other values not selected from a table wouldalso be a little clunky:


LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is justslightly more verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. Itmight be that Option 6 would be slightly easier, but I’m unsure ifthere would be a huge difference.

On 13 Aug 2022, at 16:59, Patrick McFadin <[email protected]> wrote:

I'm really happy to see CEP-15 getting closer to a finalimplementation. I'm going to walk through my reasoning for yourproposals wrt trying to explain this to somebody new.

Looking at all the options, the first thing that comes up for me isthe Cassandra project's complicated relationship with NULL. Wehave prior art with EXISTS/NOT EXISTS when creating new tables. ISNULL/IS NOT NULL is used in materialized views similarly toproposals 2,4 and 5.


CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
  AS SELECT [ (column_list) ]
  FROM [keyspace_name.]table_name
  [ WHERE column_name IS NOT NULL
  [ AND column_name IS NOT NULL ... ] ]
  [ AND relation [ AND ... ] ]
  PRIMARY KEY ( column_list )
  [ WITH [ table_properties ]

[ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option)] ] ;

Based on that, I believe 1 and 3 would just confuse users, so -1on those.

Trying to explain the difference between row and column operationswith LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds moresyntax.

#5 is verbose and, in my opinion, easier to reason when writing aquery. Thinking top down, I need to know if these exact rows and/orcolumn values exist before changing them, so I'll define themfirst. Then I'll iterate over the state I created in my actualchanges so I know I'm changing precisely what I want.

#5 could use a bit more to be clearer to somebody who doesn't writeCQL queries daily and wouldn't require memorizing subtledifferences. It should be similar to all the other syntax, solearning a little about CQL will let you move into more withoutcompletely re-learning the new syntax.


So I propose #6)
BEGIN TRANSACTION

LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selectsall columns

  LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1, row2
  IF row1 IS NULL AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

I added the SELECT in the LET just so it's straightforward, you arereading, and it's just like doing a regular select, but you areassigning it to a variable.

I removed the confusing 'row1.v'and replaced it with 'row1'I can'tsee why you would need the '.v'vs having the complete variable Icreated in the statement above.


EOL

Patrick

On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe<[email protected]> wrote:


    ...and one more option...

    5.) Introduce tuple assignments, removing all ambiguity around
    row vs. column operations.

    BEGIN TRANSACTION
      LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
    columns
      LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
      SELECT row1.v, row2.v
      IF row1 IS NULL AND row2.v = 3 THEN
        INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
      END IF
    COMMIT TRANSACTION



    On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
    <[email protected]> wrote:

        via Benedict, here is a 4th option:

        4.) Similar to #2, but don't rely on the key element being
        NULL.

        If the read returns no result, x effectively becomes NULL.
        Otherwise, it remains true/NOT NULL.

        BEGIN TRANSACTION
          LET x = true FROM ks.tbl WHERE k=0 AND c=0;
          LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
          SELECT x, row2_v
          IF x IS NULL AND row2_v = 3 THEN
            INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
          END IF
        COMMIT TRANSACTION

        On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
        <[email protected]> wrote:

            Hello again everyone!

            I've been working on a prototype
            <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
            CASSANDRA-17719 for a grammar that roughly corresponds
            to what we've agreed on in this thread. One thing that
            isn't immediately obvious to me is how the LET syntax
            handles cases where we want to check for the plain
            existence of a row in IF. For example, in this hybrid
            of the originally proposed syntax and something more
            like what we've agreed on (and the RETURNING just to
            distinguish between that and SELECT), this could be
            pretty straightforward:

            BEGIN TRANSACTION
              SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
              SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
              RETURNING row1.v, row2.v
              IF row1 NOT EXISTS AND row2.v = 3 THEN
                INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
              END IF
            COMMIT TRANSACTION

            The NOT EXISTS operator has row1 to work with. One the
            other hand, w/ the LET syntax and no naming of reads,
            it's not clear what the best solution would be. Here
            are a few possibilities:

            1.) Provide a few built-in functions that operate on a
            whole result row. If we assume a SQL style IS NULL and
            IS NOT NULL (see my last post here) for operations on
            particular columns, this probably eliminates the need
            for EXISTS/NOT EXISTS as well.

            BEGIN TRANSACTION
              LET row1_missing = notExists() FROM ks.tbl WHERE k=0
            AND c=0;
              LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
              SELECT row1_missing, row2_v
            IF row1_missing AND row2_v = 3 THEN
                INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
              END IF
            COMMIT TRANSACTION

            2.) Assign and check the first primary key element to
            determine whether the row exists.

            BEGIN TRANSACTION
              LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
              LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
              SELECT row1_k, row2_v
              IF row1_k IS NULL AND row2_v = 3 THEN
                INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
              END IF
            COMMIT TRANSACTION

            3.) Reconsider the LET concept toward something that
            allows us to explicitly name our reads again.

            BEGIN TRANSACTION
              WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
              WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
            SELECT row1.v, row2.v
              IF row1 NOT EXISTS AND row2.v = 3 THEN
                INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
              END IF
            COMMIT TRANSACTION

            I don't have a strong affinity for any of these,
            although #1 seems the most awkward.

            Does anyone have any other alternatives? Preference for
            one of the above options?

            Thanks!

            On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
            <[email protected]> wrote:

                Avi brought up an interesting point around NULLness
                checking inCASSANDRA-17762
                <https://issues.apache.org/jira/browse/CASSANDRA-17762>...

                    In SQL, any comparison with NULL is NULL, which
                    is interpreted as FALSE in a condition. To test
                    for NULLness, you use IS NULL or IS NOT NULL.
                    But LWT uses IF col = NULL as a NULLness test.
                    This is likely to confuse people coming from
                    SQL and hamper attempts to extend the dialect.


                We can leave that Jira open to address what to do
                in the legacy LWT case, but I'd support a
                SQL-congruent syntax here (IS NULL or IS NOT NULL),
                where we have something closer to a blank slate.

                Thoughts?

                On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
                <[email protected]> wrote:

                    The new syntax looks great, and I’m really
                    excited to see this coming together.

                    One piece of feedback on the proposed syntax is
                    around the use of “=“ as a declaration in
                    addition to its current use as an equality
                    operator in a WHERE clause and an assignment
                    operator in an UPDATE:

                        BEGIN TRANSACTION
                          LET car_miles = miles_driven,
                        car_is_running = is_running FROM cars WHERE
                        model=’pinto’
                          LET user_miles = miles_driven FROM users
                        WHERE name=’blake’
                        SELECT something else from some other table
                          IF NOT car_is_running THEN ABORT
                          UPDATE users SET miles_driven =
                        user_miles + 30 WHERE name='blake';
                        UPDATE cars SET miles_driven = car_miles +
                        30 WHERE model='pinto';
                        COMMIT TRANSACTION

                    This is supported in languages like PL/pgSQL,
                    but in a normal SQL query kind of local
                    declaration is often expressed as an alias
                    (SELECT col AS new_col), subquery alias (SELECT
                    col) t, or common table expression (WITH t AS
                    (SELECT col)).

                    Here’s an example of an alternative to the
                    proposed syntax that I’d find more readable:

                        BEGIN TRANSACTION
                        WITH car_miles, car_is_running AS (SELECT
                        miles_driven, is_running FROM cars WHERE
                        model=’pinto’),
                        user_miles AS (SELECT miles_driven FROM
                        users WHERE name=’blake’)
                          IF NOT car_is_running THEN ABORT
                          UPDATE users SET miles_driven =
                        user_miles + 30 WHERE name='blake';
                        UPDATE cars SET miles_driven = car_miles +
                        30 WHERE model='pinto';
                        COMMIT TRANSACTION

                    There’s also the option of naming the
                    transaction like a subquery, and supporting LET
                    via AS (this one I’m less sure about but wanted
                    to propose anyway):

                        BEGIN TRANSACTION t1
                        SELECT miles_driven AS t1.car_miles,
                        is_running AS t1.car_is_running FROM cars
                        WHERE model=’pinto’;
                          SELECT miles_driven AS t1.user_miles FROM
                        users WHERE name=’blake’;
                          IF NOT car_is_running THEN ABORT
                          UPDATE users SET miles_driven =
                        user_miles + 30 WHERE name='blake';
                        UPDATE cars SET miles_driven = car_miles +
                        30 WHERE model='pinto';
                        COMMIT TRANSACTION

                    This also has the benefit of resolving
                    ambiguity in case of naming conflicts with
                    existing (or future) column names.

                    --
                    Abe

Re: CEP-15 multi key transaction syntax

Reply via email to