[jira] [Updated] (PHOENIX-2143) Use guidepost bytes instead of region name in stats primary key

Ankit Singhal (JIRA) Tue, 12 Jan 2016 23:48:07 -0800

     [ https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ankit Singhal updated PHOENIX-2143:
-----------------------------------
    Attachment: PHOENIX-2143_v4.patch


I think you were right that we don't need a lower timestamp for adding a 
transaction column. I tried without PHOENIX-2572 and with the timestamp equal 
to 4.7.0, and the upgrade still works fine from 4.5 to 4.7 and from 4.6 to 4.7.
I think the problem is with the 4.4 to 4.7 upgrade, so I tested with a patch 
from PHOENIX-2572, but it still didn't help with the 4.4 to 4.7 upgrade.

PFA the updated patch. It incorporates your review comment about not using a 
lower timestamp for adding a column, and I tested it manually for upgrades 
from 4.5 to 4.7 and from 4.6 to 4.7.


> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2143
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>         Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch, 
> PHOENIX-2143_v3.patch, PHOENIX-2143_v4.patch, PHOENIX-2143_wip.patch, 
> PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the 
> primary key constraint. Instead, we should use the MIN_KEY column (which 
> corresponds to the region start key). The advantage would be that the stats 
> would then be ordered by region start key allowing us to approximate the 
> number of guideposts which would be traversed given the start/stop row of a 
> scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND 
> min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST 
> operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats 
> could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to 
> store the guide posts, but instead store them as individual rows with a 
> schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as 
> you'd need to run a separate query to do the deletion of the old guideposts, 
> followed by a commit of the new guideposts. The other disadvantage (besides 
> requiring multiple queries) is that this couldn't be done transactionally.
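
As a rough illustration of the estimate described in the issue, the sketch 
below counts how many guideposts fall between the start and stop row of a scan 
once the guidepost keys are kept in sorted byte order. It is plain Python, not 
Phoenix code: the function name, the sample keys, and the choice of an 
inclusive start row with an exclusive stop row are all assumptions made for 
the example.

```python
from bisect import bisect_left


def guideposts_in_range(sorted_keys, start_row, stop_row):
    """Count guidepost keys g with start_row <= g < stop_row.

    sorted_keys must already be in ascending byte order. Python compares
    bytes lexicographically by unsigned byte value, so sorted() yields
    that order. All names here are hypothetical.
    """
    lo = bisect_left(sorted_keys, start_row)  # first key >= start_row
    hi = bisect_left(sorted_keys, stop_row)   # first key >= stop_row
    return max(hi - lo, 0)                    # 0 if the range is empty or inverted


# Hypothetical guidepost keys for one table and column family.
guide_post_keys = sorted([b"a", b"d", b"g", b"k", b"p", b"t"])

# The scan range [b"c", b"m") covers the guideposts d, g and k.
print(guideposts_in_range(guide_post_keys, b"c", b"m"))  # 3
```

Because the keys are ordered, the count needs only two binary searches, which 
is the property the issue is after when it asks for stats ordered by key 
rather than by region name.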



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
