[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-11-07 Thread Gary Horen (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645719#comment-15645719
 ] 

Gary Horen commented on PHOENIX-6:
--

in the near term this will probably be mostly single rows / commit. As time 
goes on other use cases might present larger sized batches.

I would not expect a single row to be updated many times in a single commit. 
That would be rare for us.

Peak arrival rate (occuring a handful of times / day) would be in the order of 
a handful per second, for now. Modal arrival rate during the day will probably 
be several / minute.

The current scenario is counting views for feed items. Some feed items will be 
very popular, others will be viewed seldom. My wild guess would be that the 
ratio of popular to unpopular will be 10:1 with a gentle downward asymptote 
between them.


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613510#comment-15613510
 ] 

Hudson commented on PHOENIX-6:
--

SUCCESS: Integrated in Jenkins build Phoenix-master #1455 (See 
[https://builds.apache.org/job/Phoenix-master/1455/])
PHOENIX-6 Support ON DUPLICATE KEY construct (jamestaylor: rev 
927c61205c818fcfd3c743bcb8ddea47a8b04f7f)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/BaseIndexBuilder.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/index/PhoenixIndexBuilder.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/tx/TransactionIT.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* (edit) 
phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/parse/UpsertStatement.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/compile/DeleteCompiler.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/DelegateTable.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PColumnImpl.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java
* (add) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/OnDuplicateKeyIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/util/ExpressionUtil.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/schema/DelegateColumn.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuildManager.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PRow.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuilder.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/util/KeyValueBuilder.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/Indexer.java
* (edit) phoenix-core/src/main/antlr3/PhoenixSQL.g
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/PTableImpl.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/covered/IndexCodec.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/IndexTestUtil.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613167#comment-15613167
 ] 

Hadoop QA commented on PHOENIX-6:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12835641/PHOENIX-6_v4.patch
  against master branch at commit 2c53dac2867e688d7a9f83fe34a7b5ebb097dea5.
  ATTACHMENT ID: 12835641

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+String ddl = " create table " + tableName + "(pk varchar primary 
key, counter1 bigint, counter2 smallint)";
+String dml = "UPSERT INTO " + tableName + " VALUES('a',0) ON DUPLICATE 
KEY UPDATE counter1 = counter1 + 1";
+ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
tableName + " WHERE counter1 >= 0");
+rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName 
+ " WHERE counter1 >= 0");
+String ddl = " create table " + tableName + "(k1 varchar, k2 varchar, 
counter1 varchar, counter2 varchar, other1 char(3), other2 varchar default 'f', 
constraint pk primary key (k1,k2))";
+ "ON DUPLICATE KEY UPDATE counter1 = counter1 || CASE WHEN 
LENGTH(counter1) < 10 THEN 'SMALL' ELSE 'LARGE' END || k2 || other2 || other1 ";
+String ddl = " create table " + tableName + "(pk varchar primary key, 
counter1 varchar, counter2 smallint)";
+String dml = "UPSERT INTO " + tableName + " VALUES('a','b') ON 
DUPLICATE KEY UPDATE counter1 = counter1 || 'b'";
+ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
tableName + " WHERE substr(counter1,1,1) = 'b'");
+rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName 
+ " WHERE substr(counter1,1,1) = 'b'");

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.atlas.hive.hook.HiveHookIT.testCreateExternalTable(HiveHookIT.java:208)
at 
org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:673)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:842)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1166)
at 
org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
at org.testng.TestRunner.runWorkers(TestRunner.java:1178)
at org.testng.TestRunner.privateRun(TestRunner.java:757)
at org.testng.TestRunner.run(TestRunner.java:608)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1158)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1083)
at org.testng.TestNG.run(TestNG.java:999)
at 
org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:132)
at 
org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
at 
org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:99)
at 
org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:147)

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/649//console

This message is automatically generated.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>  

[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613164#comment-15613164
 ] 

Samarth Jain commented on PHOENIX-6:


+1, this is a great addition, [~jamestaylor]!

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613077#comment-15613077
 ] 

James Taylor commented on PHOENIX-6:


bq. How about a covered column test case for local indexes too?
Sounds good - I've made the parameterized local index covered.

Do you think queries with a subquery in the update clause will compile? 
We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not 
possible to have a subquery (or any query) there.

 Also what if the UDF has a row key column. Will the expression compiler be 
able to figure that out?
The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF 
would yield a syntax error. It's valid to use a UDF on the right-hand-side (and 
equally valid to refer to row key columns too). In that case, you'd be invoking 
the UDF not causing the row key to change.

Thanks for the reviews, [~samarthjain]!

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, 
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612995#comment-15612995
 ] 

Samarth Jain commented on PHOENIX-6:


Thanks for the explanation, makes sense. 

bq. This is already covered as the test is parameterized with indexes added for 
both the indexed and covered column case.
Ah, missed that. How about a covered column test case for local indexes too?

Last two questions, :)

Do you think queries with a subquery in the update clause will compile? Also 
what if the UDF has a row key column. Will the expression compiler be able to 
figure that out?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, 
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612838#comment-15612838
 ] 

James Taylor commented on PHOENIX-6:


bq. How about a couple more test cases where the expression to be evaluated is 
a bit more complex like a CASE statement
Sure, I'll the test case with a CASE statement. The complexity of the 
expression doesn't make a lot of difference as it's going through the same 
ExpressionCompiler that everything goes through.

bq. Also, some test cases around indexes with covered columns where the 
expression is updating covered and/or indexed columns.
This is already covered as the test is parameterized with indexes added for 
both the indexed and covered column case.

bq. Can you tell me more about the reasoning behind why we disallow the columns 
that are part of the row key in the UPDATE clause. 
If you update the row key, your inserting a new row that'll potentially land in 
a different region on a different region server. We can't support an atomic 
operation for that.

We'll document that it's not a great idea to atomically update columns that are 
contained in global indexes, but it will work. The preIncrementAfterRowLock 
coprocessor hook has an exclusive lock on the row and while under lock the 
coprocessors will fire that update the index rows so they'd be updated 
atomically as well.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_wip1.patch, 
> PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612798#comment-15612798
 ] 

Samarth Jain commented on PHOENIX-6:


Patch looks good, [~jamestaylor]. How about a couple more test cases where the 
expression to be evaluated is a bit more complex like a CASE statement. Also, 
some test cases around indexes with covered columns where the expression is 
updating covered and/or indexed columns. 

Can you tell me more about the reasoning behind why we disallow the columns 
that are part of the row key in the UPDATE clause. Also, why does the same 
reasoning not apply for the case when we are updating an index on the data 
table? 

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_wip1.patch, 
> PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612601#comment-15612601
 ] 

Samarth Jain commented on PHOENIX-6:


Couple of questions, [~jamestaylor], otherwise the patch looks good to me. 

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612583#comment-15612583
 ] 

Samarth Jain commented on PHOENIX-6:


{code}
+// We cannot neither use the time stamp in the Increment to set the 
Get time range
+// nor set the Put/Delete time stamp and have this be atomic as HBase 
does not
+// handle that.
+long ts = HConstants.LATEST_TIMESTAMP;
+byte[] rowKey = inc.getRow();
{code}

Does that mean point in time queries/setting SCN won't work with this 
construct? Either way, it would be good to have a couple of test cases where 
the connection has an SCN set.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610914#comment-15610914
 ] 

Samarth Jain commented on PHOENIX-6:


[~jamestaylor] - is there any restriction on what kind of expressions are 
allowed on the UPDATE part? Would something like this compile? 
{code}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY UPDATE COUNTER1 = SUM(COUNTER2)
{code}


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610459#comment-15610459
 ] 

Hadoop QA commented on PHOENIX-6:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12835456/PHOENIX-6_v2.patch
  against master branch at commit 53ca2882722214e15a2f32afd0ca3ed3a4016227.
  ATTACHMENT ID: 12835456

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+String ddl = " create table " + tableName + "(pk varchar primary 
key, counter1 bigint, counter2 smallint)";
+String dml = "UPSERT INTO " + tableName + " VALUES('a',0) ON DUPLICATE 
KEY UPDATE counter1 = counter1 + 1";
+ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
tableName + " WHERE counter1 >= 0");
+rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName 
+ " WHERE counter1 >= 0");
+String ddl = " create table " + tableName + "(k1 varchar, k2 varchar, 
counter1 varchar, counter2 varchar, other1 char(3), other2 varchar default 'f', 
constraint pk primary key (k1,k2))";
+String ddl = " create table " + tableName + "(pk varchar primary key, 
counter1 varchar, counter2 smallint)";
+String dml = "UPSERT INTO " + tableName + " VALUES('a','b') ON 
DUPLICATE KEY UPDATE counter1 = counter1 || 'b'";
+ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + 
tableName + " WHERE substr(counter1,1,1) = 'b'");
+rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName 
+ " WHERE substr(counter1,1,1) = 'b'");
+String ddl = " create table " + tableName + "(pk varchar primary key, 
counter1 varchar, counter2 smallint)";

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/646//console

This message is automatically generated.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because 

[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607551#comment-15607551
 ] 

Hadoop QA commented on PHOENIX-6:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12835267/PHOENIX-6.patch
  against master branch at commit 202b8eb1eda29db01006cdeefd8199f0bd360692.
  ATTACHMENT ID: 12835267

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
or dev patch that doesn't require tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/643//console

This message is automatically generated.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, 
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-25 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607265#comment-15607265
 ] 

James Taylor commented on PHOENIX-6:


[~gho...@salesforce.com] - Can you give us a bit of information on usage 
pattern for this feature to help guide [~mujtabachohan]'s performance 
evaluation? In particular:
- How many rows will you be incrementing in one commit batch? Is it just a 
single row commit? Or could it be hundreds of rows at a time?
- Will one row be updated multiple times in the same commit batch?
- Any idea of the velocity of updates? How many rows per second would you 
expect to be updated?
- How frequently will the same rows be updated? Is it highly skewed in that the 
same rows will be incremented over and over again? Or is it a pretty even 
distribution across the board?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, 
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606891#comment-15606891
 ] 

Hadoop QA commented on PHOENIX-6:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12835228/PHOENIX-6_4.x-HBase-0.98.patch
  against 4.x-HBase-0.98 branch at commit 
a5bcb3ea9a86b800b44b1d7815b094d7a952a11b.
  ATTACHMENT ID: 12835228

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/641//console

This message is automatically generated.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, 
> PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-25 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606521#comment-15606521
 ] 

Mujtaba Chohan commented on PHOENIX-6:
--

Sure [~jamestaylor]

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_wip1.patch, 
> PHOENIX-6_wip2.patch, PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576237#comment-15576237
 ] 

Cameron Hatfield commented on PHOENIX-6:


And in general, I agree with the heavyweightness of the merge statement. I 
could also see ON DUPLICATE KEY having slightly different semantics / 
optimization opportunities, ala Postgreses discussion around the feature, where 
having two statements makes sense, even if the more general statement can 
handle the same problem.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576149#comment-15576149
 ] 

Cameron Hatfield commented on PHOENIX-6:


It would work for me, though I'm not sure if it would be consistent the 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576121#comment-15576121
 ] 

Cameron Hatfield commented on PHOENIX-6:


Here is what MSSQL does to handle not having to repeat the actual values 
(though the clause is still repeated), as well as how it supports multiple 
values for the same merge:
{code:sql}
MERGE INTO Sales.SalesReason AS Target  
  USING (VALUES ('Recommendation','Other'), ('Review', 'Marketing'), 
('Internet', 'Promotion'))  
   AS Source (NewName, NewReasonType)  
  ON Target.Name = Source.NewName  
WHEN MATCHED THEN  
  UPDATE SET ReasonType = Source.NewReasonType  
WHEN NOT MATCHED BY TARGET THEN  
  INSERT (Name, ReasonType) VALUES (NewName, NewReasonType)
{code}

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576102#comment-15576102
 ] 

Cameron Hatfield commented on PHOENIX-6:


In general, the main use case I have is reducing RPC calls across 
"compare-and-set" requests. A tertiary use case would be reducing the amount of 
work the parser/compiler has to do (issuing and parsing one statement is 
generally easier/faster then issuing and parsing multiple of the same 
statements). The first could be handled by either supporting batching at the 
client level (so one RPC call for multiple statements being run), though there 
are questions on what autocommit means in that respect, or by supporting a 
single statement that supports multiple values (such as a values statement that 
supports multiple rows). 

The allowing multiple statements for the same row key would basically be 
equivalent to an "autocommit" after each individual statement, so that would 
work for me.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563263#comment-15563263
 ] 

James Taylor commented on PHOENIX-6:


I'm exploring an alternate syntax (with equivalent functionality) so that when 
we move to Calcite there will be no need for users to make any changes. Calcite 
supports the MERGE statement which is a superset of this functionality. I'm 
trying to come up with the minimal support for MERGE we'd need. 
{code}
MERGE MyTable AS t
USING (VALUES('a','b',0,0)) AS s(pk1,pk2,counter1,counter2)
ON (t.pk1,t.pk2) = (s.pk1, s.pk2)
WHEN MATCHED THEN UPDATE SET t.counter1 = s.counter1 + 1, t.counter2 = 
s.counter2 + 1
WHEN NOT MATCHED THEN INSERT(pk1,pk2,counter1,counter2) VALUES('a', 'b', 0, 0)
{code}
Not sure how restrict/scope down the ON clause as the only construct allowed 
there would be a PK match. The verbosity of the MERGE statement is not ideal 
for this simple use case, as the ON DUPLICATE KEY  construct only has to define 
the WHEN MATCHED part. Also, it's unfortunate that the VALUES clause has to be 
repeated. Is there any way around that, [~julianhyde]? 

IMHO, the above is harder to read and will be harder to implement too. Perhaps 
we can support the ON DUPLICATE KEY syntax and generate a MERGE statement from 
it when the Phoenix/Calcite integration is complete.

Thoughts?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547135#comment-15547135
 ] 

James Taylor commented on PHOENIX-6:


Would you mind filing a separate JIRA for your use case? If I understand it 
correctly, you'd like to be able to have multiple atomic insert/update 
statements for the same row in the same commit batch and have them sequentially 
execute? Would the atomicity be for the entire commit batch (for all rows being 
operated on) or if not, how would potentially overlapping rows from other 
batches be handled?

For your question, I think you meant to use the same table name for both 
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see 
the data from the UPSERT VALUES. When auto commit is off, the data is kept in 
memory on the client until the commit occurs. The UPSERT VALUES would cause a 
new row to be cached on the client (in memory) and the UPSERT SELECT would 
cause N rows to be cached on the client (where N is the number of rows 
currently in the table). The commit essentially generates the HBase Puts and 
does a batch mutate.

For transactional tables, you do see your own updates. In that case, the UPSERT 
SELECT would see the row from the UPSERT VALUES call.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546874#comment-15546874
 ] 

Cameron Hatfield commented on PHOENIX-6:


For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, an extra two rows should 
exist. The most likely time for this to show up, of course, is multiple upsert 
selects within the same commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546680#comment-15546680
 ] 

James Taylor commented on PHOENIX-6:


With Phoenix, for non transactional tables, the updates to the HBase data all 
happen at commit time, so UPSERT SELECT won't see uncommitted data. It's true, 
though that: "Non transactional tables have no guarantees above and beyond the 
HBase guarantee of row level atomicity (see here). In addition, non 
transactional tables will not see their updates until after a commit has 
occurred."

Will this new ON DUPLICATE KEY syntax and semantics help with your use case, 
[~cameron.hatfield]?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546605#comment-15546605
 ] 

Cameron Hatfield commented on PHOENIX-6:


Ahh, I saw where I was getting confused. I thought that Phoenix implemented 
executebatch according to the JDBC contract (multiple statements will be sent 
in one go to the server), however instead they send each statement separately 
to the underlying db.

Would probably be worthwhile to mention that if you want to use this for 
counters / etc, that each separate statement hitting the same row requires a 
separate commit (just to make it more obvious that it is how it would work). 
This would also be an interesting interaction with how UPSERT ... SELECT 
batches inserts, when duplicates are involved, since the it would act 
differently depending on where duplicates where in the batch. Technically, I 
believe this is a problem now, as UPSERT .. SELECT would allow you to read data 
that is not yet "committed", so its possible that the documentation for that 
should be updated (as a separate bug): "Non transactional tables have no 
guarantees above and beyond the HBase guarantee of row level atomicity (see 
here). *In addition, non transactional tables will not see their updates until 
after a commit has occurred.*"



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546487#comment-15546487
 ] 

James Taylor commented on PHOENIX-6:


Yes, agreed, [~cameron.hatfield]. If these two statements are in the same 
commit batch, the result would be 0, 0:
{code}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
-- With auto commit off, the result would be 'a', 0, 0
{code}


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546436#comment-15546436
 ] 

Cameron Hatfield commented on PHOENIX-6:


Except, in the second case (two statements, same batch), the value of row 'a' 
of would non-existent, as they have never been interested before. So wouldn't 
the result be 0,0 if it was in the same batch?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546218#comment-15546218
 ] 

James Taylor commented on PHOENIX-6:


The UPSERT VALUES ... ON DUPLICATE KEY will be evaluated on the server side at 
commit time. As with existing UPSERT VALUES calls, only the last statement 
takes effect if multiple statements for the same row key are in the same batch.

So in your example, assuming auto commit is true:
{code}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

-- Result would be: 'a', 0, 0

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

-- Result would be 'a', 1, 1

{code}

Since we do a read under lock of the row, we'd get the values of row 'a' before 
any of the ON DUPLICATE KEY expressions are evaluated, and then we'd evaluate 
all of the expressions against the current values of row 'a'.

I'll make sure to add a unit test like this - it's a good test.


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-03 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543719#comment-15543719
 ] 

Cameron Hatfield commented on PHOENIX-6:


And as a general note, though inefficent, when combined with an IF-like 
function, this would also cover our CAS use cases. However, it will be highly 
dependent on the ordering guarantees above. Here's the first blog post example 
I found of doing CAS like things in another SQL engine, 
https://thewebfellas.com/blog/conditional-duplicate-key-updates-with-mysql

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-03 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543705#comment-15543705
 ] 

Cameron Hatfield commented on PHOENIX-6:


At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Both run in two separate round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-09-28 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531383#comment-15531383
 ] 

James Taylor commented on PHOENIX-6:


bq. but incrementing a numeric column can use the optimized path, right?
- An Increment does a read+write under row lock like we're planning for this 
JIRA. See HRegion.java [1], doIncrement, applyIncrementsToColumnFamily and 
getIncrementCurrentValue calls.
- Our mechanism will have only one code path - there's no need to use Increment 
(and it wouldn't work in the context of Phoenix for a variety of reasons).
- Our mechanism will be more general as you can set a column value to the 
evaluation of *any* expression, not just column+1
- Our mechanism also gives you the ability to define the initial value while 
Increment doesn't

[1] 
https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-09-28 Thread Gary Horen (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531316#comment-15531316
 ] 

Gary Horen commented on PHOENIX-6:
--

[~giacomotaylor]:
>>would thus require a read before write

For a clause
 
ON DUPLICATE KEY counter = counter +1

you could just issue a put that contains an Increment, right? My understanding 
is that hbase would either instantiate the column (and row) if it doesn't 
exist, or apply the Increment to the existing column, right? We discussed this 
a couple of days ago but I'm not seeing any explicit description of it in this 
Jira.

ON DUPLICATE KEY IGNORE, and ON DUPLICATE KEY  would require read-then-write as far as I 
understand, but incrementing a numeric column can use the optimized path, right?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-09-28 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530929#comment-15530929
 ] 

James Taylor commented on PHOENIX-6:


FYI, [~gho...@salesforce.com]

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)