James Taylor created PHOENIX-2143:
-------------------------------------

             Summary: Use region start key instead of region name in stats 
primary key
                 Key: PHOENIX-2143
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: James Taylor


Our current SYSTEM.STATS table uses the region name as the last column in the 
primary key constraint. Instead, we should use the MIN_KEY column (which 
corresponds to the region start key). The advantage would be that the stats 
would then be ordered by region start key allowing us to approximate the number 
of guideposts which would be traversed given the start/stop row of a scan:
{code}
SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND min_key 
< :2
{code}
where :1 is the start row and :2 is the stop row of the scan. With an UNNEST 
operator for ARRAYs, we could get a better approximation.

As part of the upgrade to the new Phoenix version containing this fix, stats 
could simply be dropped and they'd be recalculated with the new schema.

An alternative, even more granular approach would be to *not* use arrays to 
store the guide posts, but instead store them as individual rows with a schema 
like this.
|PHYSICAL_NAME|VARCHAR|
|COLUMN_FAMILY|VARCHAR|
|GUIDE_POST_KEY|VARBINARY|
|GUIDE_POST_WIDTH|LONG|

In this alternative, the maintenance during compaction is higher, though, as 
you'd need to run a separate query to do the deletion of the old guideposts, 
followed by a commit of the new guideposts. The other disadvantage (besides 
requiring multiple queries) is that this couldn't be done transactionally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to