[
https://issues.apache.org/jira/browse/PHOENIX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165883#comment-14165883
]
James Taylor commented on PHOENIX-1333:
---------------------------------------
Also, one more minor note: given that we need to store PStats on PTable, it's
probably better to expose an accessor for it and make the guideposts available
only through PStats. That way, you don't have to break it apart for the
separate column families (and the same goes for the byteCount). I'll try to
make that change.
> Store statistics guideposts as VARBINARY
> ----------------------------------------
>
> Key: PHOENIX-1333
> URL: https://issues.apache.org/jira/browse/PHOENIX-1333
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Attachments: Phoenix-1333.patch
>
>
> There's a potential problem with storing the guideposts as a VARBINARY ARRAY,
> as pointed out by PHOENIX-1329. We'd run into this issue when collecting
> stats for a table with a trailing VARBINARY row key column whose value
> contains embedded null bytes. Because of this, we're better off storing
> guideposts as VARBINARY and serializing/deserializing in the following manner:
> <byte length as vint><bytes><byte length as vint><bytes>...
> We should also store as a separate KeyValue column the total number of
> guideposts. So the schema of SYSTEM.STATS would look like this now instead:
> {code}
> public static final String CREATE_STATS_TABLE_METADATA =
>     "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" +
>     SYSTEM_STATS_TABLE + "\"(\n" +
>     // PK columns
>     PHYSICAL_NAME + " VARCHAR NOT NULL," +
>     COLUMN_FAMILY + " VARCHAR," +
>     REGION_NAME + " VARCHAR," +
>     // Non-PK columns
>     GUIDE_POSTS + " VARBINARY," +
>     GUIDE_POSTS_COUNT + " SMALLINT," +
>     MIN_KEY + " VARBINARY," +
>     MAX_KEY + " VARBINARY," +
>     LAST_STATS_UPDATE_TIME + " DATE, " +
>     "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
>     + PHYSICAL_NAME + ","
>     + COLUMN_FAMILY + "," + REGION_NAME + "))\n" +
>     // TODO: should we support versioned stats?
>     // Install split policy to prevent a physical table's stats from
>     // being split across regions.
>     HTableDescriptor.SPLIT_POLICY + "='" +
>     MetaDataSplitPolicy.class.getName() + "'\n";
> {code}
> Then the serialization code in StatisticsTable.addStats() would need to
> change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the
> new format.
> The deserialization code is isolated to StatisticsUtil.readStatisitics(). It
> would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then
> deserialize the GUIDE_POSTS in the new format.
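
To make the proposed layout concrete, here is a minimal sketch of the
<byte length as vint><bytes>... encoding and its inverse. It is not the code
from the attached patch. GuidePostCodec and its methods are hypothetical
names, and the variable-length integer is a plain unsigned LEB128 varint used
as a stand-in, which may differ from the vint encoding Phoenix actually
writes. The count passed to deserialize() plays the role of
GUIDE_POSTS_COUNT and is used only to presize the list.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Phoenix code): length-prefixed guidepost encoding. */
final class GuidePostCodec {
    private GuidePostCodec() {}

    /** Writes a non-negative int as an unsigned LEB128 varint (assumed stand-in for a vint). */
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
    }

    /** Serializes guideposts as <length><bytes><length><bytes>... */
    static byte[] serialize(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] guidePost : guidePosts) {
            writeVarInt(out, guidePost.length);
            out.write(guidePost, 0, guidePost.length);
        }
        return out.toByteArray();
    }

    /**
     * Decodes a blob produced by serialize(). The count is read separately
     * (GUIDE_POSTS_COUNT in the proposed schema) and only sizes the list.
     * Malformed input is not handled in this sketch.
     */
    static List<byte[]> deserialize(byte[] data, int count) {
        List<byte[]> guidePosts = new ArrayList<>(count);
        int pos = 0;
        while (pos < data.length) {
            int length = 0;
            int shift = 0;
            int b;
            do { // read one varint length prefix
                b = data[pos++] & 0xFF;
                length |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            byte[] guidePost = new byte[length];
            System.arraycopy(data, pos, guidePost, 0, length);
            pos += length;
            guidePosts.add(guidePost);
        }
        return guidePosts;
    }
}
```

Because every guidepost carries its own length, embedded null bytes in a
trailing VARBINARY row key are copied through untouched and are never treated
as separators, which is the problem the description raises for VARBINARY
ARRAY.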