[jira] [Commented] (PHOENIX-1333) Store statistics guideposts as VARBINARY

ramkrishna.s.vasudevan (JIRA) Thu, 09 Oct 2014 11:11:49 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165456#comment-14165456 ]


ramkrishna.s.vasudevan commented on PHOENIX-1333:
-------------------------------------------------

bq. I'd keep a running count of the number of bytes and serialize it in the 
front of the byte[] and use it for sizing (that plus the writeVInt size of the 
count and you should be able to get an exact byte amount).
Which byte[] do you mean? The row that is added to the list in the 
Pair<Integer,List<byte[]>>? If we add it there, we may not know the exact 
length until we traverse the guidePost map.
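For reference, here is a minimal sketch of one reading of the suggestion. The running byte count is updated as each guidepost is added, so the exact buffer size is known before serializing, without a second pass over the guidePost map. GuidePostBuffer and its methods are hypothetical names. An unsigned LEB128 varint stands in for Hadoop's WritableUtils.writeVInt; its byte layout differs, but the sizing idea is the same.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator: tracks the exact serialized size while guideposts are added.
final class GuidePostBuffer {
    private final List<byte[]> guidePosts = new ArrayList<>();
    // Running total of <vint length><bytes> for every guidepost added so far.
    private int byteCount = 0;

    // Stand-in for WritableUtils.getVIntSize (assumption: unsigned LEB128, not Hadoop's layout).
    static int vintSize(int value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Writes an unsigned LEB128 varint (assumption: not Hadoop's encoding).
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    void add(byte[] guidePost) {
        guidePosts.add(guidePost);
        byteCount += vintSize(guidePost.length) + guidePost.length;
    }

    // Exact size: the vint-encoded count in front plus the running byte count.
    int serializedSize() {
        return vintSize(guidePosts.size()) + byteCount;
    }

    // Allocates exactly serializedSize() bytes, writes the count, then each length-prefixed guidepost.
    byte[] toBytes() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(serializedSize());
        writeVInt(out, guidePosts.size());
        for (byte[] gp : guidePosts) {
            writeVInt(out, gp.length);
            out.write(gp, 0, gp.length);
        }
        return out.toByteArray();
    }
}
```

In this sketch the count is written in front of the entries. If the count lives only in the separate GUIDE_POSTS_COUNT column, drop the first writeVInt call and the vintSize(count) term.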
bq. Make sure that when you combine the guideposts for a given family, you add 
the previous guidePostCount and guidePostDepth to the new one here:
So does this mean I need a map, keyed by family, that accumulates 
guidePostDepth * guidePosts.size()? Every time we see the same family, we 
would get the previous value, add the new value to it, and put the sum back 
into the map?
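If so, Map.merge would do the get/add/put in a single call. This is a minimal sketch, assuming we accumulate guidePostDepth * guidePosts.size() per family as described above, plus a running guidepost count. FamilyGuidePostTotals and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-family accumulator for combining guidepost totals across regions.
final class FamilyGuidePostTotals {
    private final Map<String, Long> totalDepthByFamily = new HashMap<>();
    private final Map<String, Long> totalCountByFamily = new HashMap<>();

    // Adds one contribution; Map.merge sums it with any previous value for the family.
    void add(String family, long guidePostDepth, int guidePostCount) {
        totalDepthByFamily.merge(family, guidePostDepth * guidePostCount, Long::sum);
        totalCountByFamily.merge(family, (long) guidePostCount, Long::sum);
    }

    long totalDepth(String family) {
        return totalDepthByFamily.getOrDefault(family, 0L);
    }

    long totalCount(String family) {
        return totalCountByFamily.getOrDefault(family, 0L);
    }
}
```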


> Store statistics guideposts as VARBINARY
> ----------------------------------------
>
>                 Key: PHOENIX-1333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1333
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: Phoenix-1333.patch
>
>
> There's a potential problem with storing the guideposts as a VARBINARY ARRAY, 
> as pointed out by PHOENIX-1329. We'd run into it when collecting stats for a 
> table whose trailing row key column is VARBINARY and whose values contain 
> embedded null bytes. Because of this, we're better off storing the guideposts 
> as VARBINARY and serializing/deserializing them in the following format:
> <byte length as vint><bytes><byte length as vint><bytes>...
> We should also store the total number of guideposts as a separate KeyValue 
> column, so the schema of SYSTEM.STATS would now look like this:
> {code}
>     public static final String CREATE_STATS_TABLE_METADATA = 
>             "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE + "\"(\n" +
>             // PK columns
>             PHYSICAL_NAME + " VARCHAR NOT NULL," +
>             COLUMN_FAMILY + " VARCHAR," +
>             REGION_NAME + " VARCHAR," +
>             // Non-PK KeyValue columns
>             GUIDE_POSTS + " VARBINARY," +
>             GUIDE_POSTS_COUNT + " SMALLINT," +
>             MIN_KEY + " VARBINARY," +
>             MAX_KEY + " VARBINARY," +
>             LAST_STATS_UPDATE_TIME + " DATE, " +
>             "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
>             + PHYSICAL_NAME + ","
>             + COLUMN_FAMILY + "," + REGION_NAME + "))\n" +
>             // TODO: should we support versioned stats?
>             // Install split policy to prevent a physical table's stats from being split across regions.
>             HTableDescriptor.SPLIT_POLICY + "='" + MetaDataSplitPolicy.class.getName() + "'\n";
> {code}
> Then the serialization code in StatisticsTable.addStats() would need to 
> change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the 
> new format.
> The deserialization code is isolated to StatisticsUtil.readStatisitics(). It 
> would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then 
> deserialize the GUIDE_POSTS in the new format.
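A minimal round-trip sketch of the proposed GUIDE_POSTS layout follows. It is not the actual StatisticsTable.addStats() or StatisticsUtil code: GuidePostCodec is a hypothetical helper, and an unsigned LEB128 varint stands in for Hadoop's WritableUtils vint encoding. The value that would go into GUIDE_POSTS_COUNT is passed in separately and is only used to presize the result list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the <byte length as vint><bytes>... layout of GUIDE_POSTS.
// Assumption: unsigned LEB128 varints stand in for Hadoop's WritableUtils encoding.
final class GuidePostCodec {
    private GuidePostCodec() {}

    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteArrayInputStream in) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalArgumentException("truncated vint");
            }
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Serialization side: the guidepost count is stored in GUIDE_POSTS_COUNT, not here.
    static byte[] serialize(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] gp : guidePosts) {
            writeVInt(out, gp.length);
            out.write(gp, 0, gp.length);
        }
        return out.toByteArray();
    }

    // Deserialization side: guidePostsCount (read from GUIDE_POSTS_COUNT) presizes the list.
    static List<byte[]> deserialize(byte[] bytes, int guidePostsCount) {
        List<byte[]> guidePosts = new ArrayList<>(guidePostsCount);
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        while (in.available() > 0) {
            int length = readVInt(in);
            byte[] gp = new byte[length];
            // Guard length > 0: read() returns -1 at end of stream even for a zero-length request.
            if (length > 0 && in.read(gp, 0, length) != length) {
                throw new IllegalArgumentException("truncated guidepost");
            }
            guidePosts.add(gp);
        }
        return guidePosts;
    }
}
```

Because every value carries its own length prefix, guideposts with embedded 0x00 bytes (such as a trailing VARBINARY row key value) decode unchanged; no byte inside a value is ever treated as a separator.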



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
