[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

Cameron Hatfield (JIRA) Tue, 04 Oct 2016 12:45:40 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ]


Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:45 PM:
-----------------------------------------------------------------

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase
{code:sql}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
{code}
After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


inter-query (Same query as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
{code:sql}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
{code}
After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


inter-query (Same query as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> ----------------------------------
>
>                 Key: PHOENIX-6
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: James Taylor
>             Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> <column>=<expression>, ...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the <column> may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

Reply via email to