[ 
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-914:
---------------------------------
    Attachment: PHOENIX-914.patch

Thanks for the review, [~jamestaylor]. Attached is the updated patch.

bq. This function can be simpler:
{code}
+    private static LiteralParseNode getNodeForRowTimestampColumn(PColumn col) {
{code}

The code in UpsertCompiler requires the value part in the LiteralExpressionNode 
to be of the same type. So I need to create new Date(1), new Timestamp(1), etc.

bq.For UPSERT SELECT, how about passing the rowTimestamp column PK slot 
position instead of a boolean as an indicator that we're using this 
optimization, as then we don't need to re-search for it again?

I need to pass both - whether row time stamp needs to be automatically selected 
from the table timestamp as well as the actual position of the row timestamp 
column. The former allows to check whether the value for row ts to be upserted 
is valid ( >=0 ) or not.

I have added tests for the following scenarios:
1) Query compiler tests to check for various validations associated with a 
row_timestamp column.
2) Tests in UpsertValuesIT and UpsertSelectIT to see whether the row timestamp 
column is being written correctly (for both asc, desc sort order). Also tested 
upserting from desc to asc, asc to desc, desc to desc and asc to asc 
row_timestamp columns.
3) For upsert select, added tests that execute upsert select on both client and 
server side.
4) Test to check that we don't allow setting SCN as negative.
5) Tests to check that we disallow setting negative values for row_timestamp 
column for both UPSERT VALUES and UPSERT SELECT

A couple of my test cases are commented since they are failing due to the bugs 
that I discovered - PHOENIX-2276, PHOENIX-2277, PHOENIX-2278. 

Please review, when you get a chance. Thanks!

> Native HBase timestamp support to optimize date range queries in Phoenix 
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-914
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-914
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-914.patch, PHOENIX-914.patch, wip.patch
>
>
> For many applications one of the column of a table can be (and must be) 
> naturally mapped 
> to HBase timestamp. What it gives us is the optimization on StoreScanner 
> where HFiles with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of 
> data (EVENTS) and custom compaction, where we create 
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID  VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>     DATA VARCHAR,
>     SVALUE VARCHAR
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, 
> TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the Hint to 
> Phoenix that the column
> TIME must be mapped to HBase timestamp. 
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 
> 7*24*3600000; (this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps  TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan 
> timerange 
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view 
> because for every INSERT/UPSERT 
> new Connection object and new Statement is created, beside this we still need 
> the optimization 2. (see above). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to