[
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607265#comment-15607265
]
James Taylor commented on PHOENIX-6:
------------------------------------
[[email protected]] - Can you give us a bit of information on usage
pattern for this feature to help guide [~mujtabachohan]'s performance
evaluation? In particular:
- How many rows will you be incrementing in one commit batch? Is it just a
single row commit? Or could it be hundreds of rows at a time?
- Will one row be updated multiple times in the same commit batch?
- Any idea of the velocity of updates? How many rows per second would you
expect to be updated?
- How frequently will the same rows be updated? Is it highly skewed in that the
same rows will be incremented over and over again? Or is it a pretty even
distribution across the board?
> Support ON DUPLICATE KEY construct
> ----------------------------------
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
> Issue Type: New Feature
> Reporter: James Taylor
> Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch,
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch,
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should
> support the "on duplicate key" construct for UPSERT. With this construct, the
> UPSERT VALUES statement would run atomically and would thus require a read
> before write which would obviously have a negative impact on performance. For
> an example of similar syntax , see MySQL documentation at
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail:
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J.
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0)
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0)
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE
> <column>=<expression>, ...] ]
> {code}
> The following restrictions will apply:
> * The <column> may not be part of the primary key constraint - only KeyValue
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in
> place which isn't allowed for immutable tables.
> ** Transactional tables because these use optimistic concurrency as their
> mechanism for consistency and isolation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)