[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:34 PM:
------------------------------------------------------------------

For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, and the same PK is in 
both sides of the batch, an extra two rows should exist. The most likely time 
for this to show up, of course, is multiple upsert selects within the same 
commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.


was (Author: cameron.hatfield):
For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, an extra two rows should 
exist. The most likely time for this to show up, of course, is multiple upsert 
selects within the same commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.

> Support ON DUPLICATE KEY construct
> ----------------------------------
>
>                 Key: PHOENIX-6
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: James Taylor
>             Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> <column>=<expression>, ...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the <column> may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to