Re: CEP-15 multi key transaction syntax

Avi Kivity via dev Mon, 22 Aug 2022 09:20:35 -0700

I wasn't referring to specific syntax but to the concept. If a SQLdialect (or better, the standard) has a way to select data into avariable, let's adopt it.

If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab)is my preference.


On 8/22/22 19:13, Patrick McFadin wrote:

The replies got trashed pretty badly in the responses.

When you say: "Agree it's better to reuse existing syntax than inventnew syntax."


Which syntax are you referring to?

Patrick

On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev<[email protected]> wrote:


    Agree it's better to reuse existing syntax than invent new syntax.

    On 8/21/22 16:52, Konstantin Osipov wrote:
    > * Avi Kivity via dev <[email protected]> [22/08/14 15:59]:
    >
    > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
    > ...
    >
    > PostgreSQL supports pretty much the same syntax.
    >
    > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var
    TYPE and
    > MySQL/PostgreSQL SELECT ... INTO?
    >
    >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
    >>> 
    >>> I’ll do my best to express with my thinking, as well as how I
    would
    >>> explain the feature to a user.
    >>>
    >>> My mental model for LET statements is that they are simply SELECT
    >>> statements where the columns that are selected become variables
    >>> accessible anywhere in the scope of the transaction. That is
    to say, you
    >>> should be able to run something like s/LET/SELECT and
    >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET
    statement
    >>> and produce a valid SELECT statement, and vice versa. Both should
    >>> perform identically.
    >>>
    >>> e.g.
    >>> SELECT pk AS key, v AS value FROM table
    >>>
    >>> =>
    >>> LET key = pk, value = v FROM table
    >>
    >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet,
    but SQL
    >> supports selecting comparisons:
    >>
    >>
    >> $ psql
    >> psql (14.3)
    >> Type "help" for help.
    >>
    >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
    >>   ?column? | ?column? | ?column?
    >> ----------+----------+----------
    >>   f        | t        |
    >> (1 row)
    >>
    >>
    >> Using "=" as a syntactic element in LET would make SELECT and LET
    >> incompatible once comparisons become valid selectors. Unless
    they become
    >> mandatory (and then you'd write "LET q = a = b" if you wanted
    to select a
    >> comparison).
    >>
    >>
    >> I personally prefer the nested query syntax:
    >>
    >>
    >>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
    >>
    >>
    >> So there aren't two similar-but-not-quite-the-same syntaxes.
    SELECT is
    >> immediately recognizable by everyone as a query, LET is not.
    >>
    >>
    >>> Identical form, identical behaviour. Every statement should be
    directly
    >>> translatable with some simple text manipulation.
    >>>
    >>> We can then make this more powerful for users by simply
    expanding SELECT
    >>> statements, e.g. by permitting them to declare constants and
    tuples in
    >>> the column results. In this scheme LET x = * is simply
    syntactic sugar
    >>> for LET x = (pk, ck, field1, …) This scheme then supports
    options 2, 4
    >>> and 5 all at once, consistently alongside each other.
    >>>
    >>> Option 6 is in fact very similar, but is strictly less
    flexible for the
    >>> user as they have no way to declare multiple scalar variables
    without
    >>> scoping them inside a tuple.
    >>>
    >>> e.g.
    >>> LET key = pk, value = v FROM table
    >>> IF key > 1 AND value > 1 THEN...
    >>>
    >>> =>
    >>> LET row = SELECT pk AS key, v AS value FROM table
    >>> IF row.key > 1 AND row.value > 1 THEN…
    >>>
    >>> However, both are expressible in the existing proposal, as if
    you prefer
    >>> this naming scheme you can simply write
    >>>
    >>> LET row = (pk AS key, v AS value) FROM table
    >>> IF row.key > 1 AND row.value > 1 THEN…
    >>>
    >>> With respect to auto converting single column results to a
    scalar, we do
    >>> need a way for the user to say they care whether the row was
    null or the
    >>> column. I think an implicit conversion here could be
    surprising. However
    >>> we could implement tuple expressions anyway and let the user
    explicitly
    >>> declare v as a tuple as Caleb has suggested for the existing
    proposal as
    >>> well.
    >>>
    >>> Assigning constants or other values not selected from a table
    would also
    >>> be a little clunky:
    >>>
    >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
    >>> IF v1 > 1 AND v2 > 1 THEN…
    >>>
    >>> =>
    >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
    >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
    >>>
    >>> That said, the proposals are /close/ to identical, it is just
    slightly
    >>> more verbose and slightly less flexible.
    >>>
    >>> Which one would be most intuitive to users is hard to predict.
    It might
    >>> be that Option 6 would be slightly easier, but I’m unsure if
    there would
    >>> be a huge difference.
    >>>
    >>>
    >>>> On 13 Aug 2022, at 16:59, Patrick McFadin
    <[email protected]> wrote:
    >>>>
    >>>> I'm really happy to see CEP-15 getting closer to a final
    >>>> implementation. I'm going to walk through my reasoning for your
    >>>> proposals wrt trying to explain this to somebody new.
    >>>>
    >>>> Looking at all the options, the first thing that comes up for
    me is
    >>>> the Cassandra project's complicated relationship with NULL. 
    We have
    >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
    >>>> NULL/IS NOT NULL is used in materialized views similarly to
    >>>> proposals 2,4 and 5.
    >>>>
    >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ]
    [keyspace_name.]view_name
    >>>>    AS SELECT [ (column_list) ]
    >>>>    FROM [keyspace_name.]table_name
    >>>>    [ WHERE column_name IS NOT NULL
    >>>>    [ AND column_name IS NOT NULL ... ] ]
    >>>>    [ AND relation [ AND ... ] ]
    >>>>    PRIMARY KEY ( column_list )
    >>>>    [ WITH [ table_properties ]
    >>>>    [ [ AND ] CLUSTERING ORDER BY (cluster_column_name
    order_option) ] ] ;
    >>>>
    >>>>   Based on that, I believe 1 and 3 would just confuse users,
    so -1 on
    >>>> those.
    >>>>
    >>>> Trying to explain the difference between row and column
    operations
    >>>> with LET, I can't see the difference between a row and column
    in #2.
    >>>>
    >>>> #4 introduces a boolean instead of column names and just adds
    more
    >>>> syntax.
    >>>>
    >>>> #5 is verbose and, in my opinion, easier to reason when writing a
    >>>> query. Thinking top down, I need to know if these exact rows
    and/or
    >>>> column values exist before changing them, so I'll define them
    first.
    >>>> Then I'll iterate over the state I created in my actual
    changes so I
    >>>> know I'm changing precisely what I want.
    >>>>
    >>>> #5 could use a bit more to be clearer to somebody who doesn't
    write
    >>>> CQL queries daily and wouldn't require memorizing subtle
    >>>> differences. It should be similar to all the other syntax, so
    >>>> learning a little about CQL will let you move into more without
    >>>> completely re-learning the new syntax.
    >>>>
    >>>> So I propose #6)
    >>>> BEGIN TRANSACTION
    >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- *
    selects all
    >>>> columns
    >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
    >>>>    SELECT row1, row2
    >>>> IF row1 IS NULL AND row2.v = 3 THEN
    >>>>    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>> END IF
    >>>> COMMIT TRANSACTION
    >>>>
    >>>> I added the SELECT in the LET just so it's straightforward,
    you are
    >>>> reading, and it's just like doing a regular select, but you are
    >>>> assigning it to a variable.
    >>>>
    >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I
    can't
    >>>> see why you would need the '.v'vs having the complete variable I
    >>>> created in the statement above.
    >>>>
    >>>> EOL
    >>>>
    >>>> Patrick
    >>>>
    >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
    >>>> <[email protected]> wrote:
    >>>>
    >>>>      ...and one more option...
    >>>>
    >>>>      5.) Introduce tuple assignments, removing all ambiguity
    around
    >>>>      row vs. column operations.
    >>>>
    >>>>      BEGIN TRANSACTION
    >>>>        LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- *
    selects all
    >>>>      columns
    >>>>        LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
    >>>>      SELECT row1.v, row2.v
    >>>>        IF row1 IS NULL AND row2.v = 3 THEN
    >>>>          INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>        END IF
    >>>>      COMMIT TRANSACTION
    >>>>
    >>>>
    >>>>
    >>>>      On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
    >>>>      <[email protected]> wrote:
    >>>>
    >>>>          via Benedict, here is a 4th option:
    >>>>
    >>>>          4.) Similar to #2, but don't rely on the key element
    being NULL.
    >>>>
    >>>>          If the read returns no result, x effectively becomes
    NULL.
    >>>>          Otherwise, it remains true/NOT NULL.
    >>>>
    >>>>          BEGIN TRANSACTION
    >>>>            LET x = true FROM ks.tbl WHERE k=0 AND c=0;
    >>>>            LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
    >>>>            SELECT x, row2_v
    >>>>            IF x IS NULL AND row2_v = 3 THEN
    >>>>              INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>            END IF
    >>>>          COMMIT TRANSACTION
    >>>>
    >>>>          On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
    >>>>          <[email protected]> wrote:
    >>>>
    >>>>              Hello again everyone!
    >>>>
    >>>>              I've been working on a prototype
    >>>>             
    <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
    >>>>              CASSANDRA-17719 for a grammar that roughly
    corresponds to
    >>>>              what we've agreed on in this thread. One thing
    that isn't
    >>>>              immediately obvious to me is how the LET syntax
    handles
    >>>>              cases where we want to check for the plain
    existence of a
    >>>>              row in IF. For example, in this hybrid of the
    originally
    >>>>              proposed syntax and something more like what
    we've agreed
    >>>>              on (and the RETURNING just to distinguish
    between that
    >>>>              and SELECT), this could be pretty straightforward:
    >>>>
    >>>>              BEGIN TRANSACTION
    >>>>                SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
    >>>>                SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
    >>>>                RETURNING row1.v, row2.v
    >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
    >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>                END IF
    >>>>              COMMIT TRANSACTION
    >>>>
    >>>>              The NOT EXISTS operator has row1 to work with.
    One the
    >>>>              other hand, w/ the LET syntax and no naming of
    reads,
    >>>>              it's not clear what the best solution would be.
    Here are
    >>>>              a few possibilities:
    >>>>
    >>>>              1.) Provide a few built-in functions that
    operate on a
    >>>>              whole result row. If we assume a SQL style IS
    NULL and IS
    >>>>              NOT NULL (see my last post here) for operations on
    >>>>              particular columns, this probably eliminates the
    need for
    >>>>              EXISTS/NOT EXISTS as well.
    >>>>
    >>>>              BEGIN TRANSACTION
    >>>>                LET row1_missing = notExists() FROM ks.tbl
    WHERE k=0
    >>>>              AND c=0;
    >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
    >>>>                SELECT row1_missing, row2_v
    >>>>                IF row1_missing AND row2_v = 3 THEN
    >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>                END IF
    >>>>              COMMIT TRANSACTION
    >>>>
    >>>>              2.) Assign and check the first primary key
    element to
    >>>>              determine whether the row exists.
    >>>>
    >>>>              BEGIN TRANSACTION
    >>>>                LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
    >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
    >>>>                SELECT row1_k, row2_v
    >>>>                IF row1_k IS NULL AND row2_v = 3 THEN
    >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>                END IF
    >>>>              COMMIT TRANSACTION
    >>>>
    >>>>              3.) Reconsider the LET concept toward something that
    >>>>              allows us to explicitly name our reads again.
    >>>>
    >>>>              BEGIN TRANSACTION
    >>>>                WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0)
    AS row1;
    >>>>                WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0)
    AS row2;
    >>>>                SELECT row1.v, row2.v
    >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
    >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
    >>>>                END IF
    >>>>              COMMIT TRANSACTION
    >>>>
    >>>>              I don't have a strong affinity for any of these,
    although
    >>>>              #1 seems the most awkward.
    >>>>
    >>>>              Does anyone have any other alternatives?
    Preference for
    >>>>              one of the above options?
    >>>>
    >>>>              Thanks!
    >>>>
    >>>>              On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
    >>>>              <[email protected]> wrote:
    >>>>
    >>>>                  Avi brought up an interesting point around
    NULLness
    >>>>                  checking inCASSANDRA-17762
    >>>>                 
    <https://issues.apache.org/jira/browse/CASSANDRA-17762>...
    >>>>
    >>>>                      In SQL, any comparison with NULL is
    NULL, which
    >>>>                      is interpreted as FALSE in a condition.
    To test
    >>>>                      for NULLness, you use IS NULL or IS NOT
    NULL. But
    >>>>                      LWT uses IF col = NULL as a NULLness
    test. This
    >>>>                      is likely to confuse people coming from
    SQL and
    >>>>                      hamper attempts to extend the dialect.
    >>>>
    >>>>
    >>>>                  We can leave that Jira open to address what
    to do in
    >>>>                  the legacy LWT case, but I'd support a
    SQL-congruent
    >>>>                  syntax here (IS NULL or IS NOT NULL), where
    we have
    >>>>                  something closer to a blank slate.
    >>>>
    >>>>                  Thoughts?
    >>>>
    >>>>                  On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
    >>>>                  <[email protected]> wrote:
    >>>>
    >>>>                      The new syntax looks great, and I’m really
    >>>>                      excited to see this coming together.
    >>>>
    >>>>                      One piece of feedback on the proposed
    syntax is
    >>>>                      around the use of “=“ as a declaration in
    >>>>                      addition to its current use as an equality
    >>>>                      operator in a WHERE clause and an assignment
    >>>>                      operator in an UPDATE:
    >>>>
    >>>>                          BEGIN TRANSACTION
    >>>>                            LET car_miles = miles_driven,
    >>>>                          car_is_running = is_running FROM
    cars WHERE
    >>>>                          model=’pinto’
    >>>>                            LET user_miles = miles_driven FROM
    users
    >>>>                          WHERE name=’blake’
    >>>>                            SELECT something else from some
    other table
    >>>>                            IF NOT car_is_running THEN ABORT
    >>>>                            UPDATE users SET miles_driven =
    user_miles
    >>>>                          + 30 WHERE name='blake';
    >>>>                            UPDATE cars SET miles_driven =
    car_miles +
    >>>>                          30 WHERE model='pinto';
    >>>>                          COMMIT TRANSACTION
    >>>>
    >>>>                      This is supported in languages like
    PL/pgSQL, but
    >>>>                      in a normal SQL query kind of local
    declaration
    >>>>                      is often expressed as an alias (SELECT
    col AS
    >>>>                      new_col), subquery alias (SELECT col) t, or
    >>>>                      common table expression (WITH t AS
    (SELECT col)).
    >>>>
    >>>>                      Here’s an example of an alternative to the
    >>>>                      proposed syntax that I’d find more readable:
    >>>>
    >>>>                          BEGIN TRANSACTION
    >>>>                            WITH car_miles, car_is_running AS
    (SELECT
    >>>>                          miles_driven, is_running FROM cars WHERE
    >>>>                          model=’pinto’),
    >>>>                          user_miles AS (SELECT miles_driven
    FROM users
    >>>>                          WHERE name=’blake’)
    >>>>                            IF NOT car_is_running THEN ABORT
    >>>>                            UPDATE users SET miles_driven =
    user_miles
    >>>>                          + 30 WHERE name='blake';
    >>>>                          UPDATE cars SET miles_driven =
    car_miles + 30
    >>>>                          WHERE model='pinto';
    >>>>                          COMMIT TRANSACTION
    >>>>
    >>>>                      There’s also the option of naming the
    transaction
    >>>>                      like a subquery, and supporting LET via
    AS (this
    >>>>                      one I’m less sure about but wanted to
    propose
    >>>>                      anyway):
    >>>>
    >>>>                          BEGIN TRANSACTION t1
    >>>>                            SELECT miles_driven AS t1.car_miles,
    >>>>                          is_running AS t1.car_is_running FROM
    cars
    >>>>                          WHERE model=’pinto’;
    >>>>                            SELECT miles_driven AS
    t1.user_miles FROM
    >>>>                          users WHERE name=’blake’;
    >>>>                            IF NOT car_is_running THEN ABORT
    >>>>                            UPDATE users SET miles_driven =
    user_miles
    >>>>                          + 30 WHERE name='blake';
    >>>>                            UPDATE cars SET miles_driven =
    car_miles +
    >>>>                          30 WHERE model='pinto';
    >>>>                          COMMIT TRANSACTION
    >>>>
    >>>>                      This also has the benefit of resolving
    ambiguity
    >>>>                      in case of naming conflicts with
    existing (or
    >>>>                      future) column names.
    >>>>
    >>>>                      --
    >>>>                      Abe
    >>>>

Re: CEP-15 multi key transaction syntax

Reply via email to