Re: Setting a TTL in an upsert

2015-09-23 Thread James Taylor
Hi Alex,
I can think of a couple of ways to support this:
1) Surface support for per Cell TTLs (HBASE-10560) in Phoenix
(PHOENIX-1335). This could have the kind of syntax you mentioned (or
alternatively rely on a connection property and no syntactic change would
be necessary, and then in MutationState (where Phoenix produces HBase
Mutations), you'd need to use the HBase API to set the TTLs. You'd also
need to deal with setting secondary index rows to have the same TTLs as
their data rows.
2) Use the CurrentSCN property at connection time for UPSERT calls to
future date the cell timestamp. You'd also need to set the CurrentSCN
property for readers above any value you used at UPSERT time as otherwise
you wouldn't see the data you wrote.

If you're up for it, (1) would be a nice contribution and definitely a
viable feature.

Thanks,
James

On Wed, Sep 23, 2015 at 9:08 AM, Alex Loffler <alex.loff...@telus.com>
wrote:

> Hi,
>
> Thanks for the response – would this be a viable feature request? We’re
> moving from using raw HBase to Phoenix and would like to use this
> ‘countdown’ feature to allow for different rows in the same table to have
> different retention times. Instead of having to index a user created TTL
> column and create a script to manually garbage collect the stale rows, we
> could continue to leverage HBase’s TTL mechanism to automatically exclude
> the rows and physically delete them on the next major compaction.
>
> From the documentation, Phoenix supports TTL on secondary indexes as long
> as they are created with the same value as the base table, which would be
> perfect!
>
> Thanks,
>
> -Alex.
>
> *From:* Yuhao Bi [mailto:byh0...@gmail.com <byh0...@gmail.com>]
> *Sent:* September 23, 2015 00:31
> *To:* user
> *Subject:* Re: Setting a TTL in an upsert
>
> Hi,
>
> As I know, we can only set a ttl in create table stage corresponding to
> HBase table ttl.
>
> CREATE TABLE IF NOT EXISTS my_schema.my_table (
> org_id CHAR(15), entity_id CHAR(15), payload binary(1000),
> CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
> TTL=86400
>
> See *http://phoenix.apache.org/language/index.html#create_table*
> <http://phoenix.apache.org/language/index.html> for more grammar detail.
>
> Thanks,
>
> 2015-09-23 15:11 GMT+08:00 Alex Loffler <*alex.loff...@telus.com*
> <alex.loff...@telus.com>>:
>
> Hi,
>
>
>
> Is it possible to define the TTL of a row (or even each cell in the row)
> during an upsert e.g:
>
>
>
> upsert into test values(1,2,3) TTL=1442988643355;
>
>
>
> Assuming the table has a TTL this would allow per-row retention policies
> (with automatic garbage-collection by HBase) by e.g. setting the upsert TTL
> to a time in the future.
>
>
>
> For example if the TTL on the table is set to 60 (seconds), a row with a
> desired retention policy of 1 year could be upserted with a TTL=now() + 1
> year.
>
>
>
> Thanks in advance,
>
> -Alex.
>
>


Re: Setting a TTL in an upsert

2015-09-23 Thread James Taylor
Also, for more information on (2), see
https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API

On Wed, Sep 23, 2015 at 10:55 AM, James Taylor <jamestay...@apache.org>
wrote:

> Hi Alex,
> I can think of a couple of ways to support this:
> 1) Surface support for per Cell TTLs (HBASE-10560) in Phoenix
> (PHOENIX-1335). This could have the kind of syntax you mentioned (or
> alternatively rely on a connection property and no syntactic change would
> be necessary, and then in MutationState (where Phoenix produces HBase
> Mutations), you'd need to use the HBase API to set the TTLs. You'd also
> need to deal with setting secondary index rows to have the same TTLs as
> their data rows.
> 2) Use the CurrentSCN property at connection time for UPSERT calls to
> future date the cell timestamp. You'd also need to set the CurrentSCN
> property for readers above any value you used at UPSERT time as otherwise
> you wouldn't see the data you wrote.
>
> If you're up for it, (1) would be a nice contribution and definitely a
> viable feature.
>
> Thanks,
> James
>
> On Wed, Sep 23, 2015 at 9:08 AM, Alex Loffler <alex.loff...@telus.com>
> wrote:
>
>> Hi,
>>
>> Thanks for the response – would this be a viable feature request? We’re
>> moving from using raw HBase to Phoenix and would like to use this
>> ‘countdown’ feature to allow for different rows in the same table to have
>> different retention times. Instead of having to index a user created TTL
>> column and create a script to manually garbage collect the stale rows,
>> we could continue to leverage HBase’s TTL mechanism to automatically
>> exclude the rows and physically delete them on the next major compaction.
>>
>> From the documentation, Phoenix supports TTL on secondary indexes as long
>> as they are created with the same value as the base table, which would be
>> perfect!
>>
>> Thanks,
>>
>> -Alex.
>>
>> *From:* Yuhao Bi [mailto:byh0...@gmail.com <byh0...@gmail.com>]
>> *Sent:* September 23, 2015 00:31
>> *To:* user
>> *Subject:* Re: Setting a TTL in an upsert
>>
>> Hi,
>>
>> As I know, we can only set a ttl in create table stage corresponding to
>> HBase table ttl.
>>
>> CREATE TABLE IF NOT EXISTS my_schema.my_table (
>> org_id CHAR(15), entity_id CHAR(15), payload binary(1000),
>> CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
>> TTL=86400
>>
>> See *http://phoenix.apache.org/language/index.html#create_table*
>> <http://phoenix.apache.org/language/index.html> for more grammar detail.
>>
>> Thanks,
>>
>> 2015-09-23 15:11 GMT+08:00 Alex Loffler <*alex.loff...@telus.com*
>> <alex.loff...@telus.com>>:
>>
>> Hi,
>>
>>
>>
>> Is it possible to define the TTL of a row (or even each cell in the row)
>> during an upsert e.g:
>>
>>
>>
>> upsert into test values(1,2,3) TTL=1442988643355;
>>
>>
>>
>> Assuming the table has a TTL this would allow per-row retention policies
>> (with automatic garbage-collection by HBase) by e.g. setting the upsert TTL
>> to a time in the future.
>>
>>
>>
>> For example if the TTL on the table is set to 60 (seconds), a row with a
>> desired retention policy of 1 year could be upserted with a TTL=now() + 1
>> year.
>>
>>
>>
>> Thanks in advance,
>>
>> -Alex.
>>
>>
>


RE: Setting a TTL in an upsert

2015-09-23 Thread Alex Loffler
Hi James,

 

Thank you for the info/validation. What I had in mind was the ability to define 
the (HBase per cell) timestamp arbitrarily for each upsert statement, but more 
broadly there seems to be at least three levels of granularity:

 

1)  Per cell/column timestamp – where each column in the upsert statement 
could define a separate timestamp (this may make the syntax quite unwieldy, but 
offers the most flexibility). At first glance this hurts my head especially wrt 
honouring the timestamps on primary keys & secondary indexes that are defined 
over multiple columns where a parts of the index/row could disappear at 
different times.

2)  Per row timestamp – my original use-case where all of the cells in the 
row receive the same timestamp.

3)  Per connection – As you say, probably the simplest/least impacting as 
this could be passed as a connection property. I believe it will also be 
possible to emulate (2) by using multiple connections, one per retention policy 
duration, so maybe this is a good starting point.

 

I’m new to the project so will dive into the code to get my bearings before 
pulling together a plan of attack.

 

Thanks again,

-Alex.

 

From: James Taylor [mailto:jamestay...@apache.org] 
Sent: September 23, 2015 10:59
To: James Taylor
Cc: user
Subject: Re: Setting a TTL in an upsert

 

Also, for more information on (2), see 
https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API

 

On Wed, Sep 23, 2015 at 10:55 AM, James Taylor <jamestay...@apache.org> wrote:

Hi Alex,

I can think of a couple of ways to support this:

1) Surface support for per Cell TTLs (HBASE-10560) in Phoenix (PHOENIX-1335). 
This could have the kind of syntax you mentioned (or alternatively rely on a 
connection property and no syntactic change would be necessary, and then in 
MutationState (where Phoenix produces HBase Mutations), you'd need to use the 
HBase API to set the TTLs. You'd also need to deal with setting secondary index 
rows to have the same TTLs as their data rows.

2) Use the CurrentSCN property at connection time for UPSERT calls to future 
date the cell timestamp. You'd also need to set the CurrentSCN property for 
readers above any value you used at UPSERT time as otherwise you wouldn't see 
the data you wrote.

 

If you're up for it, (1) would be a nice contribution and definitely a viable 
feature.

 

Thanks,

James

 

On Wed, Sep 23, 2015 at 9:08 AM, Alex Loffler <alex.loff...@telus.com> wrote:

Hi,

Thanks for the response – would this be a viable feature request? We’re moving 
from using raw HBase to Phoenix and would like to use this ‘countdown’ feature 
to allow for different rows in the same table to have different retention 
times. Instead of having to index a user created TTL column and create a script 
to manually garbage collect the stale rows, we could continue to leverage 
HBase’s TTL mechanism to automatically exclude the rows and physically delete 
them on the next major compaction.

>From the documentation, Phoenix supports TTL on secondary indexes as long as 
>they are created with the same value as the base table, which would be perfect!

Thanks,

-Alex.

From: Yuhao Bi [mailto:byh0...@gmail.com]
Sent: September 23, 2015 00:31
To: user
Subject: Re: Setting a TTL in an upsert

Hi,

As I know, we can only set a ttl in create table stage corresponding to HBase 
table ttl.

CREATE TABLE IF NOT EXISTS my_schema.my_table (
org_id CHAR(15), entity_id CHAR(15), payload binary(1000),
CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
TTL=86400

See http://phoenix.apache.org/language/index.html#create_table 
<http://phoenix.apache.org/language/index.html>  for more grammar detail.

Thanks,

2015-09-23 15:11 GMT+08:00 Alex Loffler <alex.loff...@telus.com>:

Hi,

 

Is it possible to define the TTL of a row (or even each cell in the row) during 
an upsert e.g:

 

upsert into test values(1,2,3) TTL=1442988643355;

 

Assuming the table has a TTL this would allow per-row retention policies (with 
automatic garbage-collection by HBase) by e.g. setting the upsert TTL to a time 
in the future.

 

For example if the TTL on the table is set to 60 (seconds), a row with a 
desired retention policy of 1 year could be upserted with a TTL=now() + 1 year.

 

Thanks in advance,

-Alex.

 

 



smime.p7s
Description: S/MIME cryptographic signature


Re: Setting a TTL in an upsert

2015-09-23 Thread James Taylor
Thanks, Alex. I agree - starting with (3) would be best as we wouldn't need
any non standard SQL syntax.

On Wed, Sep 23, 2015 at 12:06 PM, Alex Loffler <alex.loff...@telus.com>
wrote:

> Hi James,
>
>
>
> Thank you for the info/validation. What I had in mind was the ability to
> define the (HBase per cell) timestamp arbitrarily for each upsert
> statement, but more broadly there seems to be at least three levels of
> granularity:
>
>
>
> 1)  Per cell/column timestamp – where each column in the upsert
> statement could define a separate timestamp (this may make the syntax quite
> unwieldy, but offers the most flexibility). At first glance this hurts my
> head especially wrt honouring the timestamps on primary keys & secondary
> indexes that are defined over multiple columns where a parts of the
> index/row could disappear at different times.
>
> 2)  Per row timestamp – my original use-case where all of the cells
> in the row receive the same timestamp.
>
> 3)  Per connection – As you say, probably the simplest/least
> impacting as this could be passed as a connection property. I believe it
> will also be possible to emulate (2) by using multiple connections, one per
> retention policy duration, so maybe this is a good starting point.
>
>
>
> I’m new to the project so will dive into the code to get my bearings
> before pulling together a plan of attack.
>
>
>
> Thanks again,
>
> -Alex.
>
>
>
> *From:* James Taylor [mailto:jamestay...@apache.org]
> *Sent:* September 23, 2015 10:59
> *To:* James Taylor
> *Cc:* user
>
> *Subject:* Re: Setting a TTL in an upsert
>
>
>
> Also, for more information on (2), see
> https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>
>
>
> On Wed, Sep 23, 2015 at 10:55 AM, James Taylor <jamestay...@apache.org>
> wrote:
>
> Hi Alex,
>
> I can think of a couple of ways to support this:
>
> 1) Surface support for per Cell TTLs (HBASE-10560) in Phoenix
> (PHOENIX-1335). This could have the kind of syntax you mentioned (or
> alternatively rely on a connection property and no syntactic change would
> be necessary, and then in MutationState (where Phoenix produces HBase
> Mutations), you'd need to use the HBase API to set the TTLs. You'd also
> need to deal with setting secondary index rows to have the same TTLs as
> their data rows.
>
> 2) Use the CurrentSCN property at connection time for UPSERT calls to
> future date the cell timestamp. You'd also need to set the CurrentSCN
> property for readers above any value you used at UPSERT time as otherwise
> you wouldn't see the data you wrote.
>
>
>
> If you're up for it, (1) would be a nice contribution and definitely a
> viable feature.
>
>
>
> Thanks,
>
> James
>
>
>
> On Wed, Sep 23, 2015 at 9:08 AM, Alex Loffler <alex.loff...@telus.com>
> wrote:
>
> Hi,
>
> Thanks for the response – would this be a viable feature request? We’re
> moving from using raw HBase to Phoenix and would like to use this
> ‘countdown’ feature to allow for different rows in the same table to have
> different retention times. Instead of having to index a user created TTL
> column and create a script to manually garbage collect the stale rows, we
> could continue to leverage HBase’s TTL mechanism to automatically exclude
> the rows and physically delete them on the next major compaction.
>
> From the documentation, Phoenix supports TTL on secondary indexes as long
> as they are created with the same value as the base table, which would be
> perfect!
>
> Thanks,
>
> -Alex.
>
> *From:* Yuhao Bi [mailto:byh0...@gmail.com <byh0...@gmail.com>]
> *Sent:* September 23, 2015 00:31
> *To:* user
> *Subject:* Re: Setting a TTL in an upsert
>
> Hi,
>
> As I know, we can only set a ttl in create table stage corresponding to
> HBase table ttl.
>
> CREATE TABLE IF NOT EXISTS my_schema.my_table (
> org_id CHAR(15), entity_id CHAR(15), payload binary(1000),
> CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
> TTL=86400
>
> See http://phoenix.apache.org/language/index.html#create_table
> <http://phoenix.apache.org/language/index.html> for more grammar detail.
>
> Thanks,
>
> 2015-09-23 15:11 GMT+08:00 Alex Loffler <alex.loff...@telus.com>:
>
> Hi,
>
>
>
> Is it possible to define the TTL of a row (or even each cell in the row)
> during an upsert e.g:
>
>
>
> upsert into test values(1,2,3) TTL=1442988643355;
>
>
>
> Assuming the table has a TTL this would allow per-row retention policies
> (with automatic garbage-collection by HBase) by e.g. setting the upsert TTL
> to a time in the future.
>
>
>
> For example if the TTL on the table is set to 60 (seconds), a row with a
> desired retention policy of 1 year could be upserted with a TTL=now() + 1
> year.
>
>
>
> Thanks in advance,
>
> -Alex.
>
>
>
>
>