[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-11-07 Thread Gary Horen (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645719#comment-15645719
 ] 

Gary Horen commented on PHOENIX-6:
--

In the near term this will probably be mostly single-row commits. As time 
goes on, other use cases might bring larger batches.

I would not expect a single row to be updated many times in a single commit. 
That would be rare for us.

Peak arrival rate (occurring a handful of times / day) would be on the order of 
a handful per second, for now. Modal arrival rate during the day will probably 
be several / minute.

The current scenario is counting views for feed items. Some feed items will be 
very popular, others will be viewed seldom. My wild guess would be that the 
ratio of popular to unpopular will be 10:1 with a gentle downward asymptote 
between them.


> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
> Issue Type: New Feature
> Reporter: James Taylor
> Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax, see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the values of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> <column> = <expression>, ...] ]
> {code}
> The following restrictions will apply:
> * The <column> may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.
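
For readers following the semantics in the quoted description, here is a
minimal sketch in Python rather than Phoenix code. It models a table as a dict
keyed by primary key and treats a missing clause as a plain UPSERT. The helper
name `upsert`, its parameters, and the dict layout are all assumptions made for
illustration; nothing here reflects how the attached patches implement the
feature.

```python
# Toy in-memory model of the two ON DUPLICATE KEY forms; not Phoenix code.
from typing import Callable, Dict, Optional

Row = Dict[str, int]
Table = Dict[str, Row]


def upsert(table: Table, pk: str, values: Row,
           on_duplicate: Optional[str] = None,
           updates: Optional[Dict[str, Callable[[Row], int]]] = None) -> None:
    """Apply one UPSERT VALUES to `table` (hypothetical helper)."""
    existing = table.get(pk)  # the read before the write
    if existing is None or on_duplicate is None:
        # New row, or a plain UPSERT: write the VALUES as given.
        table.setdefault(pk, {}).update(values)
    elif on_duplicate == "IGNORE":
        # Row already exists: leave it untouched.
        pass
    elif on_duplicate == "UPDATE":
        # Evaluate every expression against the row as it was read,
        # then apply the results together.
        new_vals = {col: expr(existing) for col, expr in (updates or {}).items()}
        existing.update(new_vals)
    else:
        raise ValueError(f"unknown on_duplicate clause: {on_duplicate}")


# Counter example from the description: initialize to 0, then increment.
counters: Table = {}
bump = {
    "COUNTER1": lambda row: row["COUNTER1"] + 1,
    "COUNTER2": lambda row: row["COUNTER2"] + 1,
}
for _ in range(3):
    upsert(counters, "a", {"COUNTER1": 0, "COUNTER2": 0}, "UPDATE", bump)
```

In this model the first statement creates the row with both counters at 0, and
each later statement adds 1, which matches the behaviour the description
states for the second example.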



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-09-28 Thread Gary Horen (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531316#comment-15531316
 ] 

Gary Horen commented on PHOENIX-6:
--

[~giacomotaylor]:
>>would thus require a read before write

For a clause
 
ON DUPLICATE KEY counter = counter +1

you could just issue a put that contains an Increment, right? My understanding 
is that HBase would either instantiate the column (and row) if it doesn't 
exist, or apply the Increment to the existing column, right? We discussed this 
a couple of days ago but I'm not seeing any explicit description of it in this 
Jira.

ON DUPLICATE KEY IGNORE, and ON DUPLICATE KEY  would require read-then-write as far as I 
understand, but incrementing a numeric column can use the optimized path, right?
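
To make the question concrete, here is a rough Python sketch, not HBase or
Phoenix code. It assumes, as the comment above does, that an increment on a
missing cell creates the cell starting from 0. A dict stands in for a single
column, and both function names are made up for illustration.

```python
# Toy model: a dict stands in for one column, keyed by row.
from typing import Dict


def blind_increment(cells: Dict[str, int], row: str, amount: int = 1) -> int:
    """Create-or-add: a missing cell counts as 0, so the caller never
    has to check whether the row exists."""
    cells[row] = cells.get(row, 0) + amount
    return cells[row]


def init_or_bump(cells: Dict[str, int], row: str, initial: int = 0) -> int:
    """The form in the issue description: write `initial` when the row is
    absent, otherwise add 1. This one needs an existence check."""
    if row in cells:
        cells[row] += 1
    else:
        cells[row] = initial
    return cells[row]


a: Dict[str, int] = {}
b: Dict[str, int] = {}
blind = [blind_increment(a, "item1") for _ in range(3)]
checked = [init_or_bump(b, "item1") for _ in range(3)]
```

One thing the model suggests: a blind increment leaves the first write at 1,
while the counter example in the description initializes to 0 first. Whether
that difference matters for the optimized path is a question for this thread,
not something the sketch settles.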

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
> Issue Type: New Feature
> Reporter: James Taylor
> Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax, see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the values of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> <column> = <expression>, ...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the <column> may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.





[jira] [Created] (PHOENIX-2194) order by should not require all PK fields with = constraint

2015-08-20 Thread Gary Horen (JIRA)
Gary Horen created PHOENIX-2194:
---

 Summary: order by should not require all PK fields with = 
constraint
 Key: PHOENIX-2194
 URL: https://issues.apache.org/jira/browse/PHOENIX-2194
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.5.0
 Environment: linux
Reporter: Gary Horen


Here is a table:
CREATE TABLE IF NOT EXISTS FEEDS.STUFF
(
    STUFF CHAR(15) NOT NULL,
    NONSENSE CHAR(15) NOT NULL
    CONSTRAINT PK PRIMARY KEY
    (
        STUFF,
        NONSENSE
    )
) VERSIONS=1,MULTI_TENANT=TRUE,REPLICATION_SCOPE=1

Here is a query:
explain SELECT * FROM feeds.stuff
where stuff = ' '
and nonsense > ' '
order by nonsense

Here is the plan:
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN  
SERVER FILTER BY FIRST KEY ONLY   
SERVER TOP 100 ROWS SORTED BY [NONSE  
CLIENT MERGE SORT   

If I change to ORDER BY STUFF, NONSENSE I get:
CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN O  
SERVER FILTER BY FIRST KEY ONLY AND   
SERVER 100 ROW LIMIT  
CLIENT 100 ROW LIMIT  

Since the leading constraint is =, it cannot affect the ordering of the result, 
so ORDER BY should not need to name the leading column; it should only require 
the columns whose values can vary. Because the rows are already ordered by the 
key, naming just those columns should let the client-side sort be optimized 
out, as it is when the full key is named. Having to include the leading = 
constraints in the ORDER BY clause is very counter-intuitive.
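
The argument can be checked with a small Python sketch, not Phoenix code. It
keeps (STUFF, NONSENSE) pairs in primary-key order, as a range scan would
return them, and compares the two orderings. The sample values and the `scan`
helper are invented for illustration.

```python
# Rows of (STUFF, NONSENSE), stored in primary-key order like the table above.
rows = sorted([
    ("a", "x"), ("a", "m"), ("b", "c"), ("a", "z"), ("b", "a"),
])


def scan(rows, stuff, nonsense_gt):
    """Range scan: leading key column fixed by =, second column bounded by >.
    Rows come back in key order because `rows` is already sorted by key."""
    return [r for r in rows if r[0] == stuff and r[1] > nonsense_gt]


result = scan(rows, "a", "l")

# ORDER BY NONSENSE versus ORDER BY STUFF, NONSENSE on the scanned rows.
by_nonsense = sorted(result, key=lambda r: r[1])
by_full_key = sorted(result)
```

With STUFF pinned to a single value, the scan output, `by_nonsense`, and
`by_full_key` are the same list, so neither ORDER BY needs an extra sort in
this model.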


