[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645719#comment-15645719 ] Gary Horen commented on PHOENIX-6: -- in the near term this will probably be mostly single rows / commit. As time goes on other use cases might present larger sized batches. I would not expect a single row to be updated many times in a single commit. That would be rare for us. Peak arrival rate (occuring a handful of times / day) would be in the order of a handful per second, for now. Modal arrival rate during the day will probably be several / minute. The current scenario is counting views for feed items. Some feed items will be very popular, others will be viewed seldom. My wild guess would be that the ratio of popular to unpopular will be 10:1 with a gentle downward asymptote between them. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613510#comment-15613510 ] Hudson commented on PHOENIX-6: -- SUCCESS: Integrated in Jenkins build Phoenix-master #1455 (See [https://builds.apache.org/job/Phoenix-master/1455/]) PHOENIX-6 Support ON DUPLICATE KEY construct (jamestaylor: rev 927c61205c818fcfd3c743bcb8ddea47a8b04f7f) * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/BaseIndexBuilder.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/index/PhoenixIndexBuilder.java * (edit) phoenix-core/src/it/java/org/apache/phoenix/tx/TransactionIT.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/UpsertStatement.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/compile/DeleteCompiler.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/DelegateTable.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PColumnImpl.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java * (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/OnDuplicateKeyIT.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/util/ExpressionUtil.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/DelegateColumn.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuildManager.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PRow.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuilder.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/util/KeyValueBuilder.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/Indexer.java * (edit) phoenix-core/src/main/antlr3/PhoenixSQL.g * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PTableImpl.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/covered/IndexCodec.java * (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/IndexTestUtil.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by A
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613167#comment-15613167 ] Hadoop QA commented on PHOENIX-6: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12835641/PHOENIX-6_v4.patch against master branch at commit 2c53dac2867e688d7a9f83fe34a7b5ebb097dea5. ATTACHMENT ID: 12835641 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +String ddl = " create table " + tableName + "(pk varchar primary key, counter1 bigint, counter2 smallint)"; +String dml = "UPSERT INTO " + tableName + " VALUES('a',0) ON DUPLICATE KEY UPDATE counter1 = counter1 + 1"; +ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE counter1 >= 0"); +rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE counter1 >= 0"); +String ddl = " create table " + tableName + "(k1 varchar, k2 varchar, counter1 varchar, counter2 varchar, other1 char(3), other2 varchar default 'f', constraint pk primary key (k1,k2))"; + "ON DUPLICATE KEY UPDATE counter1 = counter1 || CASE WHEN LENGTH(counter1) < 10 THEN 'SMALL' ELSE 'LARGE' END || k2 || other2 || other1 "; +String ddl = " create table " + tableName + "(pk varchar primary key, counter1 varchar, counter2 smallint)"; +String dml = "UPSERT INTO " + tableName + " VALUES('a','b') ON DUPLICATE KEY UPDATE counter1 = counter1 || 'b'"; +ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE substr(counter1,1,1) = 'b'"); +rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE substr(counter1,1,1) = 'b'"); {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.atlas.hive.hook.HiveHookIT.testCreateExternalTable(HiveHookIT.java:208) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80) at org.testng.internal.Invoker.invokeMethod(Invoker.java:673) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:842) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1166) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109) at org.testng.TestRunner.runWorkers(TestRunner.java:1178) at org.testng.TestRunner.privateRun(TestRunner.java:757) at org.testng.TestRunner.run(TestRunner.java:608) at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) at org.testng.SuiteRunner.run(SuiteRunner.java:240) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1158) at org.testng.TestNG.runSuitesLocally(TestNG.java:1083) at org.testng.TestNG.run(TestNG.java:999) at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:132) at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112) at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:99) at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:147) Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//testReport/ Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//console This message is automatically generated. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: Jam
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613164#comment-15613164 ] Samarth Jain commented on PHOENIX-6: +1, this is a great addition, [~jamestaylor]! > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613077#comment-15613077 ] James Taylor commented on PHOENIX-6: bq. How about a covered column test case for local indexes too? Sounds good - I've made the parameterized local index covered. Do you think queries with a subquery in the update clause will compile? We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not possible to have a subquery (or any query) there. Also what if the UDF has a row key column. Will the expression compiler be able to figure that out? The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF would yield a syntax error. It's valid to use a UDF on the right-hand-side (and equally valid to refer to row key columns too). In that case, you'd be invoking the UDF not causing the row key to change. Thanks for the reviews, [~samarthjain]! > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, > PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612995#comment-15612995 ] Samarth Jain commented on PHOENIX-6: Thanks for the explanation, makes sense. bq. This is already covered as the test is parameterized with indexes added for both the indexed and covered column case. Ah, missed that. How about a covered column test case for local indexes too? Last two questions, :) Do you think queries with a subquery in the update clause will compile? Also what if the UDF has a row key column. Will the expression compiler be able to figure that out? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, > PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612838#comment-15612838 ] James Taylor commented on PHOENIX-6: bq. How about a couple more test cases where the expression to be evaluated is a bit more complex like a CASE statement Sure, I'll the test case with a CASE statement. The complexity of the expression doesn't make a lot of difference as it's going through the same ExpressionCompiler that everything goes through. bq. Also, some test cases around indexes with covered columns where the expression is updating covered and/or indexed columns. This is already covered as the test is parameterized with indexes added for both the indexed and covered column case. bq. Can you tell me more about the reasoning behind why we disallow the columns that are part of the row key in the UPDATE clause. If you update the row key, your inserting a new row that'll potentially land in a different region on a different region server. We can't support an atomic operation for that. We'll document that it's not a great idea to atomically update columns that are contained in global indexes, but it will work. The preIncrementAfterRowLock coprocessor hook has an exclusive lock on the row and while under lock the coprocessors will fire that update the index rows so they'd be updated atomically as well. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_wip1.patch, > PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612798#comment-15612798 ] Samarth Jain commented on PHOENIX-6: Patch looks good, [~jamestaylor]. How about a couple more test cases where the expression to be evaluated is a bit more complex like a CASE statement. Also, some test cases around indexes with covered columns where the expression is updating covered and/or indexed columns. Can you tell me more about the reasoning behind why we disallow the columns that are part of the row key in the UPDATE clause. Also, why does the same reasoning not apply for the case when we are updating an index on the data table? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_wip1.patch, > PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612601#comment-15612601 ] Samarth Jain commented on PHOENIX-6: Couple of questions, [~jamestaylor], otherwise the patch looks good to me. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612583#comment-15612583 ] Samarth Jain commented on PHOENIX-6: {code} +// We cannot neither use the time stamp in the Increment to set the Get time range +// nor set the Put/Delete time stamp and have this be atomic as HBase does not +// handle that. +long ts = HConstants.LATEST_TIMESTAMP; +byte[] rowKey = inc.getRow(); {code} Does that mean point in time queries/setting SCN won't work with this construct? Either way, it would be good to have a couple of test cases where the connection has an SCN set. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610914#comment-15610914 ] Samarth Jain commented on PHOENIX-6: [~jamestaylor] - is there any restriction on what kind of expressions are allowed on the UPDATE part? Would something like this compile? {code} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY UPDATE COUNTER1 = SUM(COUNTER2) {code} > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610459#comment-15610459 ] Hadoop QA commented on PHOENIX-6: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12835456/PHOENIX-6_v2.patch against master branch at commit 53ca2882722214e15a2f32afd0ca3ed3a4016227. ATTACHMENT ID: 12835456 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +String ddl = " create table " + tableName + "(pk varchar primary key, counter1 bigint, counter2 smallint)"; +String dml = "UPSERT INTO " + tableName + " VALUES('a',0) ON DUPLICATE KEY UPDATE counter1 = counter1 + 1"; +ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE counter1 >= 0"); +rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE counter1 >= 0"); +String ddl = " create table " + tableName + "(k1 varchar, k2 varchar, counter1 varchar, counter2 varchar, other1 char(3), other2 varchar default 'f', constraint pk primary key (k1,k2))"; +String ddl = " create table " + tableName + "(pk varchar primary key, counter1 varchar, counter2 smallint)"; +String dml = "UPSERT INTO " + tableName + " VALUES('a','b') ON DUPLICATE KEY UPDATE counter1 = counter1 || 'b'"; +ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE substr(counter1,1,1) = 'b'"); +rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName + " WHERE substr(counter1,1,1) = 'b'"); +String ddl = " create table " + tableName + "(pk varchar primary key, counter1 varchar, counter2 smallint)"; {color:red}-1 core tests{color}. The patch failed these unit tests: ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//testReport/ Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//console This message is automatically generated. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607551#comment-15607551 ] Hadoop QA commented on PHOENIX-6: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12835267/PHOENIX-6.patch against master branch at commit 202b8eb1eda29db01006cdeefd8199f0bd360692. ATTACHMENT ID: 12835267 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev patch that doesn't require tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/643//console This message is automatically generated. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, > PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607265#comment-15607265 ] James Taylor commented on PHOENIX-6: [~gho...@salesforce.com] - Can you give us a bit of information on usage pattern for this feature to help guide [~mujtabachohan]'s performance evaluation? In particular: - How many rows will you be incrementing in one commit batch? Is it just a single row commit? Or could it be hundreds of rows at a time? - Will one row be updated multiple times in the same commit batch? - Any idea of the velocity of updates? How many rows per second would you expect to be updated? - How frequently will the same rows be updated? Is it highly skewed in that the same rows will be incremented over and over again? Or is it a pretty even distribution across the board? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, > PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606891#comment-15606891 ] Hadoop QA commented on PHOENIX-6: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12835228/PHOENIX-6_4.x-HBase-0.98.patch against 4.x-HBase-0.98 branch at commit a5bcb3ea9a86b800b44b1d7815b094d7a952a11b. ATTACHMENT ID: 12835228 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/641//console This message is automatically generated. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, > PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606521#comment-15606521 ] Mujtaba Chohan commented on PHOENIX-6: -- Sure [~jamestaylor] > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_wip1.patch, > PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576237#comment-15576237 ] Cameron Hatfield commented on PHOENIX-6: And in general, I agree with the heavyweightness of the merge statement. I could also see ON DUPLICATE KEY having slightly different semantics / optimization opportunities, ala Postgreses discussion around the feature, where having two statements makes sense, even if the more general statement can handle the same problem. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield commented on PHOENIX-6: It would work for me, though I'm not sure if it would be consistent the statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576121#comment-15576121 ] Cameron Hatfield commented on PHOENIX-6: Here is what MSSQL does to handle not having to repeat the actual values (though the clause is still repeated), as well as how it supports multiple values for the same merge: {code:sql} MERGE INTO Sales.SalesReason AS Target USING (VALUES ('Recommendation','Other'), ('Review', 'Marketing'), ('Internet', 'Promotion')) AS Source (NewName, NewReasonType) ON Target.Name = Source.NewName WHEN MATCHED THEN UPDATE SET ReasonType = Source.NewReasonType WHEN NOT MATCHED BY TARGET THEN INSERT (Name, ReasonType) VALUES (NewName, NewReasonType) {code} > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576102#comment-15576102 ] Cameron Hatfield commented on PHOENIX-6: In general, the main use case I have is reducing RPC calls across "compare-and-set" requests. A tertiary use case would be reducing the amount of work the parser/compiler has to do (issuing and parsing one statement is generally easier/faster then issuing and parsing multiple of the same statements). The first could be handled by either supporting batching at the client level (so one RPC call for multiple statements being run), though there are questions on what autocommit means in that respect, or by supporting a single statement that supports multiple values (such as a values statement that supports multiple rows). The allowing multiple statements for the same row key would basically be equivalent to an "autocommit" after each individual statement, so that would work for me. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573250#comment-15573250 ] James Taylor commented on PHOENIX-6: I'm inclined to make the single-commit-for-multi-statements case work the same as the separate-commit-per-statement case. I think that makes more sense. So in that case, the result of the following statements (assuming the row doesn't already exist) would be (1,1): {code} AUTO COMMIT OFF UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; COMMIT {code} Does that make sense to you and help you with your use case, [~cameron.hatfield]? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563263#comment-15563263 ] James Taylor commented on PHOENIX-6: I'm exploring an alternate syntax (with equivalent functionality) so that when we move to Calcite there will be no need for users to make any changes. Calcite supports the MERGE statement which is a superset of this functionality. I'm trying to come up with the minimal support for MERGE we'd need. {code} MERGE MyTable AS t USING (VALUES('a','b',0,0)) AS s(pk1,pk2,counter1,counter2) ON (t.pk1,t.pk2) = (s.pk1, s.pk2) WHEN MATCHED THEN UPDATE SET t.counter1 = s.counter1 + 1, t.counter2 = s.counter2 + 1 WHEN NOT MATCHED THEN INSERT(pk1,pk2,counter1,counter2) VALUES('a', 'b', 0, 0) {code} Not sure how restrict/scope down the ON clause as the only construct allowed there would be a PK match. The verbosity of the MERGE statement is not ideal for this simple use case, as the ON DUPLICATE KEY construct only has to define the WHEN MATCHED part. Also, it's unfortunate that the VALUES clause has to be repeated. Is there any way around that, [~julianhyde]? IMHO, the above is harder to read and will be harder to implement too. Perhaps we can support the ON DUPLICATE KEY syntax and generate a MERGE statement from it when the Phoenix/Calcite integration is complete. Thoughts? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547135#comment-15547135 ] James Taylor commented on PHOENIX-6: Would you mind filing a separate JIRA for your use case? If I understand it correctly, you'd like to be able to have multiple atomic insert/update statements for the same row in the same commit batch and have them sequentially execute? Would the atomicity be for the entire commit batch (for all rows being operated on) or if not, how would potentially overlapping rows from other batches be handled? For your question, I think you meant to use the same table name for both statements, right? For non transactional tables, the UPSERT SELECT wouldn't see the data from the UPSERT VALUES. When auto commit is off, the data is kept in memory on the client until the commit occurs. The UPSERT VALUES would cause a new row to be cached on the client (in memory) and the UPSERT SELECT would cause N rows to be cached on the client (where N is the number of rows currently in the table). The commit essentially generates the HBase Puts and does a batch mutate. For transactional tables, you do see your own updates. In that case, the UPSERT SELECT would see the row from the UPSERT VALUES call. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874 ] Cameron Hatfield commented on PHOENIX-6: For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546680#comment-15546680 ] James Taylor commented on PHOENIX-6: With Phoenix, for non transactional tables, the updates to the HBase data all happen at commit time, so UPSERT SELECT won't see uncommitted data. It's true, though that: "Non transactional tables have no guarantees above and beyond the HBase guarantee of row level atomicity (see here). In addition, non transactional tables will not see their updates until after a commit has occurred." Will this new ON DUPLICATE KEY syntax and semantics help with your use case, [~cameron.hatfield]? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546605#comment-15546605 ] Cameron Hatfield commented on PHOENIX-6: Ahh, I saw where I was getting confused. I thought that Phoenix implemented executebatch according to the JDBC contract (multiple statements will be sent in one go to the server), however instead they send each statement separately to the underlying db. Would probably be worthwhile to mention that if you want to use this for counters / etc, that each separate statement hitting the same row requires a separate commit (just to make it more obvious that it is how it would work). This would also be an interesting interaction with how UPSERT ... SELECT batches inserts, when duplicates are involved, since the it would act differently depending on where duplicates where in the batch. Technically, I believe this is a problem now, as UPSERT .. SELECT would allow you to read data that is not yet "committed", so its possible that the documentation for that should be updated (as a separate bug): "Non transactional tables have no guarantees above and beyond the HBase guarantee of row level atomicity (see here). *In addition, non transactional tables will not see their updates until after a commit has occurred.*" > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546487#comment-15546487 ] James Taylor commented on PHOENIX-6: Yes, agreed, [~cameron.hatfield]. If these two statements are in the same commit batch, the result would be 0, 0: {code} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; -- With auto commit off, the result would be 'a', 0, 0 {code} > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546436#comment-15546436 ] Cameron Hatfield commented on PHOENIX-6: Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been interested before. So wouldn't the result be 0,0 if it was in the same batch? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546218#comment-15546218 ] James Taylor commented on PHOENIX-6: The UPSERT VALUES ... ON DUPLICATE KEY will be evaluated on the server side at commit time. As with existing UPSERT VALUES calls, only the last statement takes effect if multiple statements for the same row key are in the same batch. So in your example, assuming auto commit is true: {code} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; -- Result would be: 'a', 0, 0 UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; -- Result would be 'a', 1, 1 {code} Since we do a read under lock of the row, we'd get the values of row 'a' before any of the ON DUPLICATE KEY expressions are evaluated, and then we'd evaluate all of the expressions against the current values of row 'a'. I'll make sure to add a unit test like this - it's a good test. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543719#comment-15543719 ] Cameron Hatfield commented on PHOENIX-6: And as a general note, though inefficent, when combined with an IF-like function, this would also cover our CAS use cases. However, it will be highly dependent on the ordering guarantees above. Here's the first blog post example I found of doing CAS like things in another SQL engine, https://thewebfellas.com/blog/conditional-duplicate-key-updates-with-mysql > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield commented on PHOENIX-6: At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Both run in two separate round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531383#comment-15531383 ] James Taylor commented on PHOENIX-6: bq. but incrementing a numeric column can use the optimized path, right? - An Increment does a read+write under row lock like we're planning for this JIRA. See HRegion.java [1], doIncrement, applyIncrementsToColumnFamily and getIncrementCurrentValue calls. - Our mechanism will have only one code path - there's no need to use Increment (and it wouldn't work in the context of Phoenix for a variety of reasons). - Our mechanism will be more general as you can set a column value to the evaluation of *any* expression, not just column+1 - Our mechanism also gives you the ability to define the initial value while Increment doesn't [1] https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531316#comment-15531316 ] Gary Horen commented on PHOENIX-6: -- [~giacomotaylor]: >>would thus require a read before write For a clause ON DUPLICATE KEY counter = counter +1 you could just issue a put that contains an Increment, right? My understanding is that hbase would either instantiate the column (and row) if it doesn't exist, or apply the Increment to the existing column, right? We discussed this a couple of days ago but I'm not seeing any explicit description of it in this Jira. ON DUPLICATE KEY IGNORE, and ON DUPLICATE KEY would require read-then-write as far as I understand, but incrementing a numeric column can use the optimized path, right? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530929#comment-15530929 ] James Taylor commented on PHOENIX-6: FYI, [~gho...@salesforce.com] > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)