[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152891#comment-14152891
]
Dominik Wagenknecht edited comment on PHOENIX-914 at 9/30/14 7:29 AM:
----------------------------------------------------------------------
As far as I understand this discussion this is not so much about setting (this
can still be done with the aforementioned Properties) the timestamp or really
"using" multiple versions, but about _getting_ and _querying_ on it. In our
system we have one "quality assurance" job that we run every night to compare
upsert (= cell) timestamps to other timestamps that are stored as normal
"columns". We had to do that with a standard HBase SCAN (in a Map/Reduce job)
and use (internal) Phoenix API to convert the normal column byte[]s written by
SQL/Phoenix into a "real" Timestamp.
So here my suggestion: Instead of a "magic" Hint, wouldn't it solve the entire
thing by providing a function, e.g. {{CELL_TIME}} that can be applied on both
sides, e.g.:
{{SELECT mytime, CELL_TIME(mytime) AS mytime_cell FROM ...}} or
{{SELECT col_a, col_b, col_c FROM xzy WHERE CELL_TIME(col_a) > 2014-09-30...}}
To apply this for all elements in a row in a query (like querying "history", if
you really want to _actively_ use HBase's Cell-Timestamp which will always be
strange in SQL as it doesn't feature any _cell_ notation in its core logic)
additional functions could be provided that provide the minimum cell time over
a range of columns, etc...
was (Author: ledominik):
As far as I understand this discussion this is not so much about setting (this
can still be done with the aforementioned Properties) the timestamp or really
"using" multiple versions, but about _getting_ and _querying_ on it. In our
system we have one job that we run every night to compare upsert (= cell)
timestamps to other timestamps that are stored as normal "columns" and we had
to do that with a standard HBase SCAN (in a Map/Reduce job) and use (internal)
Phoenix API to convert the normal column byte[]s written by phoenix into a
"real" Timestamp.
So here my suggestion: Instead of a "magic" Hint, wouldn't it solve the entire
thing by providing a function, e.g. {{CELL_TIME}} that can be applied on both
sides, e.g.:
{{SELECT mytime, CELL_TIME(mytime) AS mytime_cell FROM ...}} or
{{SELECT col_a, col_b, col_c FROM xzy WHERE CELL_TIME(col_a) > 2014-09-30...}}
To apply this for all elements in a row in a query (like querying "history", if
you really want to _actively_ use HBase's Cell-Timestamp which will always be
strange in SQL as it doesn't feature any _cell_ notation in its core logic)
additional functions could be provided that provide the minimum cell time over
a range of columns, etc...
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
>
> For many applications one of the column of a table can be (and must be)
> naturally mapped
> to HBase timestamp. What it gives us is the optimization on StoreScanner
> where HFiles with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of
> data (EVENTS) and custom compaction, where we create
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
> METRICID VARCHAR NOT NULL,
> METRICNAME VARCHAR,
> SERVICENAME VARCHAR NOT NULL,
> ORIGIN VARCHAR NOT NULL,
> APPID VARCHAR,
> IPID VARCHAR,
> NVALUE DOUBLE,
> TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
> DATA VARCHAR,
> SVALUE VARCHAR
> CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
> TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to
> Phoenix that the column
> TIME must be mapped to HBase timestamp.
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() -
> 7*24*3600000; (this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan
> timerange
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view
> because for every INSERT/UPSERT
> new Connection object and new Statement is created, beside this we still need
> the optimization 2. (see above).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)