[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576237#comment-15576237 ] Cameron Hatfield commented on PHOENIX-6: And in general, I agree with the heavyweightness of the merge statement. I could also see ON DUPLICATE KEY having slightly different semantics / optimization opportunities, ala Postgreses discussion around the feature, where having two statements makes sense, even if the more general statement can handle the same problem. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 7:15 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some of the results from statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149 ] Cameron Hatfield commented on PHOENIX-6: It would work for me, though I'm not sure if it would be consistent the statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576121#comment-15576121 ] Cameron Hatfield commented on PHOENIX-6: Here is what MSSQL does to handle not having to repeat the actual values (though the clause is still repeated), as well as how it supports multiple values for the same merge: {code:sql} MERGE INTO Sales.SalesReason AS Target USING (VALUES ('Recommendation','Other'), ('Review', 'Marketing'), ('Internet', 'Promotion')) AS Source (NewName, NewReasonType) ON Target.Name = Source.NewName WHEN MATCHED THEN UPDATE SET ReasonType = Source.NewReasonType WHEN NOT MATCHED BY TARGET THEN INSERT (Name, ReasonType) VALUES (NewName, NewReasonType) {code} > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576102#comment-15576102 ] Cameron Hatfield commented on PHOENIX-6: In general, the main use case I have is reducing RPC calls across "compare-and-set" requests. A tertiary use case would be reducing the amount of work the parser/compiler has to do (issuing and parsing one statement is generally easier/faster then issuing and parsing multiple of the same statements). The first could be handled by either supporting batching at the client level (so one RPC call for multiple statements being run), though there are questions on what autocommit means in that respect, or by supporting a single statement that supports multiple values (such as a values statement that supports multiple rows). The allowing multiple statements for the same row key would basically be equivalent to an "autocommit" after each individual statement, so that would work for me. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546874#comment-15546874 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:56 PM: -- For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. In playing with this in squirrel sql, I don't really have enough knowledge to what would actually trigger multiple batches to happen, so I have no repro for this :). Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. if the problem exists at all. was (Author: cameron.hatfield): For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546874#comment-15546874 ] Cameron Hatfield commented on PHOENIX-6: For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546874#comment-15546874 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:34 PM: -- For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. was (Author: cameron.hatfield): For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546605#comment-15546605 ] Cameron Hatfield commented on PHOENIX-6: Ahh, I saw where I was getting confused. I thought that Phoenix implemented executebatch according to the JDBC contract (multiple statements will be sent in one go to the server), however instead they send each statement separately to the underlying db. Would probably be worthwhile to mention that if you want to use this for counters / etc, that each separate statement hitting the same row requires a separate commit (just to make it more obvious that it is how it would work). This would also be an interesting interaction with how UPSERT ... SELECT batches inserts, when duplicates are involved, since the it would act differently depending on where duplicates where in the batch. Technically, I believe this is a problem now, as UPSERT .. SELECT would allow you to read data that is not yet "committed", so its possible that the documentation for that should be updated (as a separate bug): "Non transactional tables have no guarantees above and beyond the HBase guarantee of row level atomicity (see here). *In addition, non transactional tables will not see their updates until after a commit has occurred.*" > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:45 PM: - At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase {code:sql} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; {code} After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor {code:sql} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; {code} After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546436#comment-15546436 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:43 PM: - Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been inserted before. So wouldn't the result be 0,0 if it was in the same batch? was (Author: cameron.hatfield): Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been interested before. So wouldn't the result be 0,0 if it was in the same batch? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546436#comment-15546436 ] Cameron Hatfield commented on PHOENIX-6: Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been interested before. So wouldn't the result be 0,0 if it was in the same batch? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 6:01 PM: - At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:21 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543719#comment-15543719 ] Cameron Hatfield commented on PHOENIX-6: And as a general note, though inefficent, when combined with an IF-like function, this would also cover our CAS use cases. However, it will be highly dependent on the ordering guarantees above. Here's the first blog post example I found of doing CAS like things in another SQL engine, https://thewebfellas.com/blog/conditional-duplicate-key-updates-with-mysql > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:13 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:12 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Both run in two separate round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705 ] Cameron Hatfield commented on PHOENIX-6: At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Both run in two separate round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2909) Surface checkAndPut through UPDATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435860#comment-15435860 ] Cameron Hatfield commented on PHOENIX-2909: --- I would agree, and I would say is a good argument for having something else within the merge statement call out the use of whether this will be a fast / slow version. My basic worry is that when I see something like UPDATE vs UPSERT or INSERT vs INSERT ... ON DUPLICATE KEY IGNORE, I always expect the more complicated / does more for me statement to be either the same speed or slower in the general case. With the proposed change, the more constrained case UPDATE is actually slower. Whether that is a valid concern is up to debate. How does this update statement interact with transaction support? > Surface checkAndPut through UPDATE statement > > > Key: PHOENIX-2909 > URL: https://issues.apache.org/jira/browse/PHOENIX-2909 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > We can surface atomic checkAndPut like functionality through support of the > SQL UPSERT statement. > For example, the following could use do a get under row lock to perform the > row update atomically > {code} > UPDATE my_table SET counter=coalesce(counter,0) + 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > To force prior MVCC transactions to complete (making it serializable as an > Increment is), we'd have code like this: > {code} > mvcc = region.getMVCC(); > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > {code} > By users setting auto commit to true and issuing an UPDATE statement over a > non transactional table, they'd get a way for row updates to be atomic. This > would work especially well to support counters. > An UPDATE statement would simply be translated to an equivalent UPSERT SELECT > with a flag being passed to the server such that the row lock and read occurs > when executed. For example, the above statement would become: > {code} > UPSERT INTO my_table(pk1,pk2,counter) SELECT pk1, pk2, coalesce(counter,0) + > 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > Note that the coalesce call above handles the case where counter is null. > This could be made prettier with support for the DEFAULT clause at CREATE > TABLE time (PHOENIX-476). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2909) Surface checkAndPut through UPDATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435680#comment-15435680 ] Cameron Hatfield edited comment on PHOENIX-2909 at 8/24/16 9:04 PM: I like the idea of this (of course), but the issue I have is one that this doesn't support our use case because it requires an existing row, unlike an upsert statement. Also seems like it starts getting confusing when you have two separate statements, that are essentially doing the same thing, but have a slightly different use case due to the purely backend change to it flushing the memstore. I would expect an UPDATE statement to behave exactly the same as an UPSERT, in the case that a row already exists. But violating the principle of least surprise, it actually ends up being more expensive, with little to no warning to the user aside from a note in the documentation. Wouldn't this be better to add as a flag / etc on top of the existing upsert statement? Should we reconsider the work already done on PHOENIX-2271? was (Author: cameron.hatfield): I like the idea of this (of course), but the issue I have is one that this doesn't support our use case because it requires an existing row, unlike an upsert statement. Also seems like it starts getting confusing when you have two separate statements, that are essentially doing the same thing, but have a slightly different use case due to the purely backend change to it flushing the memstore. I would expect an UPDATE statement to behave exactly the same as an UPSERT, in the case that it already exists. But violating the principle of least surprise, it actually ends up being more expensive, with little to no warning to the user aside from a note in the documentation. Wouldn't this be better to add as a flag / etc on top of the existing upsert statement? Should we reconsider the work already done on PHOENIX-2271? > Surface checkAndPut through UPDATE statement > > > Key: PHOENIX-2909 > URL: https://issues.apache.org/jira/browse/PHOENIX-2909 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > We can surface atomic checkAndPut like functionality through support of the > SQL UPSERT statement. > For example, the following could use do a get under row lock to perform the > row update atomically > {code} > UPDATE my_table SET counter=coalesce(counter,0) + 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > To force prior MVCC transactions to complete (making it serializable as an > Increment is), we'd have code like this: > {code} > mvcc = region.getMVCC(); > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > {code} > By users setting auto commit to true and issuing an UPDATE statement over a > non transactional table, they'd get a way for row updates to be atomic. This > would work especially well to support counters. > An UPDATE statement would simply be translated to an equivalent UPSERT SELECT > with a flag being passed to the server such that the row lock and read occurs > when executed. For example, the above statement would become: > {code} > UPSERT INTO my_table(pk1,pk2,counter) SELECT pk1, pk2, coalesce(counter,0) + > 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > Note that the coalesce call above handles the case where counter is null. > This could be made prettier with support for the DEFAULT clause at CREATE > TABLE time (PHOENIX-476). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2909) Surface checkAndPut through UPDATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435680#comment-15435680 ] Cameron Hatfield commented on PHOENIX-2909: --- I like the idea of this (of course), but the issue I have is one that this doesn't support our use case because it requires an existing row, unlike an upsert statement. Also seems like it starts getting confusing when you have two separate statements, that are essentially doing the same thing, but have a slightly different use case due to the purely backend change to it flushing the memstore. I would expect an UPDATE statement to behave exactly the same as an UPSERT, in the case that it already exists. But violating the principle of least surprise, it actually ends up being more expensive, with little to no warning to the user aside from a note in the documentation. Wouldn't this be better to add as a flag / etc on top of the existing upsert statement? Should we reconsider the work already done on PHOENIX-2271? > Surface checkAndPut through UPDATE statement > > > Key: PHOENIX-2909 > URL: https://issues.apache.org/jira/browse/PHOENIX-2909 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > We can surface atomic checkAndPut like functionality through support of the > SQL UPSERT statement. > For example, the following could use do a get under row lock to perform the > row update atomically > {code} > UPDATE my_table SET counter=coalesce(counter,0) + 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > To force prior MVCC transactions to complete (making it serializable as an > Increment is), we'd have code like this: > {code} > mvcc = region.getMVCC(); > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > {code} > By users setting auto commit to true and issuing an UPDATE statement over a > non transactional table, they'd get a way for row updates to be atomic. This > would work especially well to support counters. > An UPDATE statement would simply be translated to an equivalent UPSERT SELECT > with a flag being passed to the server such that the row lock and read occurs > when executed. For example, the above statement would become: > {code} > UPSERT INTO my_table(pk1,pk2,counter) SELECT pk1, pk2, coalesce(counter,0) + > 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > Note that the coalesce call above handles the case where counter is null. > This could be made prettier with support for the DEFAULT clause at CREATE > TABLE time (PHOENIX-476). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2909) Surface checkAndPut through UPDATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435629#comment-15435629 ] Cameron Hatfield commented on PHOENIX-2909: --- What has changed since the original discussion that no longer makes PHOENIX-2275 the correct answer? > Surface checkAndPut through UPDATE statement > > > Key: PHOENIX-2909 > URL: https://issues.apache.org/jira/browse/PHOENIX-2909 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > We can surface atomic checkAndPut like functionality through support of the > SQL UPSERT statement. > For example, the following could use do a get under row lock to perform the > row update atomically > {code} > UPDATE my_table SET counter=coalesce(counter,0) + 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > To force prior MVCC transactions to complete (making it serializable as an > Increment is), we'd have code like this: > {code} > mvcc = region.getMVCC(); > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > {code} > By users setting auto commit to true and issuing an UPDATE statement over a > non transactional table, they'd get a way for row updates to be atomic. This > would work especially well to support counters. > An UPDATE statement would simply be translated to an equivalent UPSERT SELECT > with a flag being passed to the server such that the row lock and read occurs > when executed. For example, the above statement would become: > {code} > UPSERT INTO my_table(pk1,pk2,counter) SELECT pk1, pk2, coalesce(counter,0) + > 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > Note that the coalesce call above handles the case where counter is null. > This could be made prettier with support for the DEFAULT clause at CREATE > TABLE time (PHOENIX-476). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2909) Surface checkAndPut through UPDATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435628#comment-15435628 ] Cameron Hatfield commented on PHOENIX-2909: --- Previous discussion on check-and-put > Surface checkAndPut through UPDATE statement > > > Key: PHOENIX-2909 > URL: https://issues.apache.org/jira/browse/PHOENIX-2909 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > We can surface atomic checkAndPut like functionality through support of the > SQL UPSERT statement. > For example, the following could use do a get under row lock to perform the > row update atomically > {code} > UPDATE my_table SET counter=coalesce(counter,0) + 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > To force prior MVCC transactions to complete (making it serializable as an > Increment is), we'd have code like this: > {code} > mvcc = region.getMVCC(); > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > {code} > By users setting auto commit to true and issuing an UPDATE statement over a > non transactional table, they'd get a way for row updates to be atomic. This > would work especially well to support counters. > An UPDATE statement would simply be translated to an equivalent UPSERT SELECT > with a flag being passed to the server such that the row lock and read occurs > when executed. For example, the above statement would become: > {code} > UPSERT INTO my_table(pk1,pk2,counter) SELECT pk1, pk2, coalesce(counter,0) + > 1 > FROM my_table WHERE pk1 = 1 AND pk2 = 2; > {code} > Note that the coalesce call above handles the case where counter is null. > This could be made prettier with support for the DEFAULT clause at CREATE > TABLE time (PHOENIX-476). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908804#comment-14908804 ] Cameron Hatfield commented on PHOENIX-2271: --- For values: Thanks for the link, answers the problems that I was unsure about. The particular case I was mentioning isn't necessarily just count, but that the statement can return the keys themselves that were updated / inserted / deleted, generally used for finding out things such as what was already in the table, what wasn't inserted cause it already existed, etc. The count itself is also something that would be nice to have, and probably much easier to return compared to actual rows. I have actually used this feature in TSQL the past, though I can't remember if it was for anything more then counts. (Which I believe is how TSQL got around the fact that you only return one count). The next question would then be, can we get this patch in, with a limited subset of merge implemented (a insert/update only merge, for values only), with the assumption being that adding table support would be harder problem then just the values that it is now, or should we wait on the longer-burn timeline of calcite. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908805#comment-14908805 ] Cameron Hatfield commented on PHOENIX-2271: --- For values: Thanks for the link, answers the problems that I was unsure about. The particular case I was mentioning isn't necessarily just count, but that the statement can return the keys themselves that were updated / inserted / deleted, generally used for finding out things such as what was already in the table, what wasn't inserted cause it already existed, etc. The count itself is also something that would be nice to have, and probably much easier to return compared to actual rows. I have actually used this feature in TSQL the past, though I can't remember if it was for anything more then counts. (Which I believe is how TSQL got around the fact that you only return one count). The next question would then be, can we get this patch in, with a limited subset of merge implemented (a insert/update only merge, for values only), with the assumption being that adding table support would be harder problem then just the values that it is now, or should we wait on the longer-burn timeline of calcite. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908805#comment-14908805 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/25/15 10:49 PM: - For values: Thanks for the link, answers the problems that I was unsure about. The particular case I was mentioning isn't necessarily just count, but that the statement can return the keys themselves that were updated / inserted / deleted, generally used for finding out things such as what was already in the table, what wasn't inserted cause it already existed, etc. The count itself is also something that would be nice to have, and probably much easier to return compared to actual rows. I have actually used this feature in TSQL the past, though I can't remember if it was for anything more then counts. (Which I believe is how TSQL got around the fact that you only return one count). The next question would then be, can we get this patch in, with a limited subset of merge implemented (a insert/update only merge, for values only), with the assumption being that adding table support would be harder problem then just the values that it is now, or should we wait on the longer-burn timeline of calcite (since I believe we have previously removed the requirement of waiting for transactions). was (Author: cameron.hatfield): For values: Thanks for the link, answers the problems that I was unsure about. The particular case I was mentioning isn't necessarily just count, but that the statement can return the keys themselves that were updated / inserted / deleted, generally used for finding out things such as what was already in the table, what wasn't inserted cause it already existed, etc. The count itself is also something that would be nice to have, and probably much easier to return compared to actual rows. I have actually used this feature in TSQL the past, though I can't remember if it was for anything more then counts. (Which I believe is how TSQL got around the fact that you only return one count). The next question would then be, can we get this patch in, with a limited subset of merge implemented (a insert/update only merge, for values only), with the assumption being that adding table support would be harder problem then just the values that it is now, or should we wait on the longer-burn timeline of calcite. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905376#comment-14905376 ] Cameron Hatfield commented on PHOENIX-2271: --- For the second part, in reference to rowprocessor / multiple conditional checks / etc: The currently discussed patch supports all of those features, using the currently existing Indexer Region Observer, allowing for easy integration with the indexing process that exists. If it makes sense to pull that out to a rowprocessor, then that seems a reasonable change to make, but I don't believe we need to bring in or use any of mentioned extensions to HBase, as they don't necessarily integrate as nicely with Phoenixes indexes (though that is just from our understanding of the codebase of both Phoenix and HBase, which neither of us can claim expertise in), nor how the phoenix client works. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905383#comment-14905383 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/23/15 10:05 PM: - For the issue of trying to automatically determine when to do what for MERGE: The main issue I see is that transactions and compare-and-set offer similar, but subtly different concurrency guarantees, that I believe are impossible to automatically choose for a user. Specifically, imagine the following example: CREATE TABLE MyTable (id INT primary key, value INT) UPSERT INTO MyTable (id, value) values(1, 1) Now, you have merge statement A: MERGE INTO MyTable USING (VALUES (1, 2)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) And merge statement B MERGE INTO MyTable USING (VALUES (1, 3)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) The main difference between the two is that value is 2 in the first, and value is 3 in the other. Now what happens if they conflict? In a compare-and-set world, the last one to commit will win (since the condition will match, no matter the order). No exception will ever be thrown. But in a transaction world, the first one to commit will win, as a transaction error would be thrown when the second one is committed. So the options seem to become either adding a special option to merge, that allows you to force compare-and-set like functionality, which may mean the grammar has some cases where it has features that are not supported for compare-and-set vs a transaction based merged, as well as having two slightly different MERGE statements with the only difference being a single keywords, with two subtly different semantics; or use the already non-standard UPSERT command as to add that support, with a syntax that calls out that this is a different operation. Having is part of the UPSERT command would also help differentiate UPSERT from MERGE, allowing you to do more low-level / different types of concurrency, as otherwise UPSERT becomes essentially a subset of the MERGE command. was (Author: cameron.hatfield): For the issue of trying to automatically determine when to do what for MERGE: The main issue I see is that transactions and compare-and-set offer similar, but subtly different concurrency guarantees, that I believe are impossible to automatically choose for a user. Specifically, imagine the following example: CREATE TABLE MyTable (id INT primary key, value INT) UPSERT INTO MyTable (id, value) values(1, 1) Now, you have merge statement A: MERGE INTO MyTable USING (VALUES (1, 2)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) And merge statement B MERGE INTO MyTable USING (VALUES (1, 3)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) The main difference between the two is that value is 2 in the first, and value is 3 in the other. Now what happens if they conflict? In a compare-and-set world, the last one to commit will win (since the condition will match, no matter the order). No exception will ever be thrown. But in a transaction world, the first one to commit will win, as a transaction error would be thrown when the second one is committed. So the options seem to become either adding a special option to merge, that allows you to force compare-and-set like functionality, which may mean the grammar has some cases where it has features that are not supported for compare-and-set vs a transaction based merged, as well as having two slightly different MERGE statements with the only difference being a single keywords, with two subtly different semantics; or use the already non-standard UPSERT command as to add that support, with a syntax that calls out that this is a different operation. This would also help differentiate UPSERT from MERGE, allowing you to do more low-level / different types of concurrency, as otherwise UPSERT becomes essentially a subset of the merge operation. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905383#comment-14905383 ] Cameron Hatfield commented on PHOENIX-2271: --- For the issue of trying to automatically determine when to do what for MERGE: The main issue I see is that transactions and compare-and-set offer similar, but subtly different concurrency guarantees, that I believe are impossible to automatically choose for a user. Specifically, imagine the following example: CREATE TABLE MyTable (id INT primary key, value INT) UPSERT INTO MyTable (id, value) values(1, 1) Now, you have merge statement A: MERGE INTO MyTable USING (VALUES (1, 2)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) And merge statement B MERGE INTO MyTable USING (VALUES (1, 3)) AS NewData (id, value) ON MyTable.id = NewData.id and value >= 1 WHEN MATCHED THEN UPDATE SET MyTable.value = MyTable.value WHEN NOT MATCHED THEN INSERT VALUES (NewData.id, NewData.value) The main difference between the two is that value is 2 in the first, and value is 3 in the other. Now what happens if they conflict? In a compare-and-set world, the last one to commit will win (since the condition will match, no matter the order). No exception will ever be thrown. But in a transaction world, the first one to commit will win, as a transaction error would be thrown when the second one is committed. So the options seem to become either adding a special option to merge, that allows you to force compare-and-set like functionality, which may mean the grammar has some cases where it has features that are not supported for compare-and-set vs a transaction based merged, as well as having two slightly different MERGE statements with the only difference being a single keywords, with two subtly different semantics; or use the already non-standard UPSERT command as to add that support, with a syntax that calls out that this is a different operation. This would also help differentiate UPSERT from MERGE, allowing you to do more low-level / different types of concurrency, as otherwise UPSERT becomes essentially a subset of the merge operation. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905452#comment-14905452 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/23/15 10:56 PM: - Would you be thinking of extending the standard merge statement (which, from what I can tell without access to the ISO doc, only supports working on direct table references) with non-standard support for a value constructor, ala the syntax I stole from TSQL? Without that, it seems relatively hard to use it for our use case (compare-and-set for a row). The other minus I would see is the inability to mix and match transactions and compare-and-set within the same table, forcing you to have to choose one or the other. Though I am unsure how much sense that would make anyways. Another issue, with either path, would be if the MERGE / UPSERT would need to be extended to return information on success or not of the result, similar to POSTGRESQLs RETURNING. Currently we don't have a requirement for this, though I know there are use cases for doing so. was (Author: cameron.hatfield): Would you be thinking of extending the standard merge statement (which, from what I can tell without access to the ISO doc, only supports working on direct table references) with non-standard support for a value constructor, ala the syntax I stole from TSQL? Without that, it seems relatively hard to use it for our use case (compare-and-set for a row). The other minus I would see is the inability to mix and match transactions and compare-and-set within the same table, forcing you to have to choose one or the other. Though I am unsure how much sense that would make anyways. The another issue, with either decision should, would be if the MERGE / UPSERT would need to be extended to return information on success or not of the UPSERT, similar to POSTGRESQLs RETURNING. Currently we don't have a requirement for this, though I know there are use cases for doing so. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905452#comment-14905452 ] Cameron Hatfield commented on PHOENIX-2271: --- Would you be thinking of extending the standard merge statement (which, from what I can tell without access to the ISO doc, only supports working on direct table references) with non-standard support for a value constructor, ala the syntax I stole from TSQL? Without that, it seems relatively hard to use it for our use case (compare-and-set for a row). The other minus I would see is the inability to mix and match transactions and compare-and-set within the same table, forcing you to have to choose one or the other. Though I am unsure how much sense that would make anyways. The another issue, with either decision should, would be if the MERGE / UPSERT would need to be extended to return information on success or not of the UPSERT, similar to POSTGRESQLs RETURNING. Currently we don't have a requirement for this, though I know there are use cases for doing so. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905452#comment-14905452 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/23/15 10:57 PM: - Would you be thinking of extending the standard merge statement (which, from what I can tell without access to the ISO doc, only supports working on direct table references) with non-standard support for a value constructor, ala the syntax I stole from TSQL? Without that, it seems relatively hard to use it for our use case (compare-and-set for a row). The other minus I would see is the inability to mix and match transactions and compare-and-set within the same table, forcing you to have to choose one or the other. Though I am unsure how much sense that would make anyways. Another issue, with either path, would be if the MERGE / UPSERT would need to be extended to return information on success or not of the result, similar to POSTGRESQLs RETURNING or the OUTPUT clause in TSQL. Currently we don't have a requirement for this, though I know there are use cases for doing so. was (Author: cameron.hatfield): Would you be thinking of extending the standard merge statement (which, from what I can tell without access to the ISO doc, only supports working on direct table references) with non-standard support for a value constructor, ala the syntax I stole from TSQL? Without that, it seems relatively hard to use it for our use case (compare-and-set for a row). The other minus I would see is the inability to mix and match transactions and compare-and-set within the same table, forcing you to have to choose one or the other. Though I am unsure how much sense that would make anyways. Another issue, with either path, would be if the MERGE / UPSERT would need to be extended to return information on success or not of the result, similar to POSTGRESQLs RETURNING. Currently we don't have a requirement for this, though I know there are use cases for doing so. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903074#comment-14903074 ] Cameron Hatfield commented on PHOENIX-2271: --- It seems like it would be nice to expose an underlying primitive for compare and set, allowing you to get to features of HBase that aren't currently exposed, for when you need lighter weight features, especially since compare and set can open up more then a couple possibilities for different types of distributed computations (such as the one we are trying to support) For the upsert workaround, it seems a bit hacky to try to use udfs and a select upsert, instead of exposing a real primitive for it. That would also require a UDF call for each changed column, to compare the two versions, correct? (Don't have enough background with UDFs, as I haven't taken a good look at the underlying code yet). For the merge statement, you would also add the RPC and implementation overhead of defining both a when / when not pair, vs just the values + the compare statement, not including complicating the optimization detection for the case of simple compare and set. Of the two, the merge statement seems the most promising, but only if it: Supports inserting multiple values, instead of a single value (so that you get the advantage of doing compare and set in one statement, instead of one upsert per compare and set) Supports a way to ensure you're using the compare and set, vs relying on the optimizer to choose the option that does a compare and set under the hood. This is because it won't do the correct thing, *unless* you happened to start a transaction before running the statement. The main issue I see is the second one is that requires you to basically expose some non-sql standard way of doing things, in a sql standard statement, vs exposing it in the already non-sql standard statement. Aside from that, what is the timeline for calcite integration? > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903398#comment-14903398 ] Cameron Hatfield commented on PHOENIX-2271: --- Something along on the lines of (using TSQL syntax as an example, since it was the first I ran across): MERGE INTO A USING (VALUES ('foo', 1), ('bar', 2)) AS B (ColA, ColB) ON A.ColA = B.ColA WHEN MATCHED THEN UPDATE SET A.ColB = B.ColB WHEN NOT MATCHED THEN INSERT VALUES (B.ColA, B.ColB) And then in theory you would have some type of extension to say "Use compare and set". Though it still seems a bit like trying to force a square peg into a round hole. Yes, you can use merge to do it, but it seems like that would be more confusing then having a dedicated syntax for it. Another thing that might be worth considering is a syntax related to postgresql's syntax they are considering, or at least their discussion of the issues of MERGE vs their upsert statement: https://wiki.postgresql.org/wiki/UPSERT#SQL_MERGE_syntax Their syntax could also be extended to support a conditional clause in the case of a conflict as well. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903409#comment-14903409 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/22/15 9:10 PM: The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. The use of coprocessors for multiCheckAndPut is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458, and the reasonable extension is that since I have a rowlock, why not multiple compares. Multiple cell checkandput are referenced in this apparently dead patch https://issues.apache.org/jira/browse/HBASE-11274 was (Author: cameron.hatfield): The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. The use of coprocessors for multiCheckAndPut is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458, and the reasonable extension is that since I have a rowlock, why not multiple compares. > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903409#comment-14903409 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/22/15 9:10 PM: The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. The use of coprocessors for multiCheckAndPut is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458, and the reasonable extension is that since I have a rowlock, why not multiple compares. Multiple cell checkandput is referenced in this apparently dead patch https://issues.apache.org/jira/browse/HBASE-11274 was (Author: cameron.hatfield): The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. The use of coprocessors for multiCheckAndPut is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458, and the reasonable extension is that since I have a rowlock, why not multiple compares. Multiple cell checkandput are referenced in this apparently dead patch https://issues.apache.org/jira/browse/HBASE-11274 > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2271) Upsert - CheckAndPut like functionality
[ https://issues.apache.org/jira/browse/PHOENIX-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903409#comment-14903409 ] Cameron Hatfield edited comment on PHOENIX-2271 at 9/22/15 9:08 PM: The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. The use of coprocessors for multiCheckAndPut is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458, and the reasonable extension is that since I have a rowlock, why not multiple compares. was (Author: cameron.hatfield): The other possible point of confusion is "HBase CAS is based on a single cell. How would you see it extended to multiple cells?" The patch [~babartareen] has made actually does multi-cell CAS, due to how rowlocks are acquired in the coprocessor. So it is technically a more powerful extension of the built-in checkAndPut operations provided by default HBase. This is vaguely mentioned in https://issues.apache.org/jira/browse/HBASE-8458 > Upsert - CheckAndPut like functionality > --- > > Key: PHOENIX-2271 > URL: https://issues.apache.org/jira/browse/PHOENIX-2271 > Project: Phoenix > Issue Type: Improvement >Reporter: Babar Tareen > Attachments: patch.diff > > > The Upsert statement does not support HBase's checkAndPut api, thus making it > difficult to conditionally update a row. Based on the comments from > PHOENIX-6, I have implemented such functionality. The Upsert statement is > modified to support compare clause, which allows us to pass in an expression. > The expression is evaluated against the current record and Upsert is only > performed when the expression evaluates to true. More details > [here|https://github.com/babartareen/phoenix]. > h4. Examples > Given that the FirstName is always set for the users, create a user record if > one doesn't already exist. > {code:sql} > UPSERT INTO User (UserId, FirstName, LastName, Phone, Address, PIN) VALUES > (1, 'Alice', 'A', '123 456 7890', 'Some St. in a city', 1122) COMPARE > FirstName IS NULL; > {code} > Update the phone number for UserId '1' if the FirstName is set. Given that > the FirstName is always set for the users, this will only update the record > if it already exists. > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > IS NOT NULL; > {code} > Update the phone number if the first name for UserId '1' starts with 'Al' and > last name is 'A' > {code:sql} > UPSERT INTO User (UserId, Phone) VALUES (1, '987 654 3210') COMPARE FirstName > LIKE 'Al%' AND LastName = 'A'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)