[
https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103291#comment-15103291
]
ASF GitHub Bot commented on PHOENIX-2417:
-----------------------------------------
Github user JamesRTaylor commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/147#discussion_r49935021
--- Diff: phoenix-protocol/src/main/PTable.proto ---
@@ -52,11 +52,12 @@ message PColumn {
message PTableStats {
required bytes key = 1;
- repeated bytes values = 2;
+ optional bytes guidePosts = 2;
--- End diff --
There's a pretty big backward compatibility issue due to PHOENIX-2143 and
this one. The case you'll need to make work is an old pre-4.7.0 client
running against a new 4.7.0 server. The client will expect the stats to be in
the original format. In the following call:
public void getTable(RpcController controller, GetTableRequest request,
RpcCallback<MetaDataResponse> done) {
You'll need to pass request.getClientVersion() through doGetTable(), into
getTable() and finally into StatisticsUtil.readStatistics(). You should
preserve the old code (we can dump it when we do a major release), and use that
code path if the stats have not been regenerated yet. You can detect this based
on the existence of the GUIDE_POSTS key value (which you'll want to project
into the scan for the new code for this backward compatibility case). If the
stats have been regenerated, there'd be two cases: the client is pre-4.7.0, in
which case you'd want to use the new code but put the data in the old format,
or the client is 4.7.0 or above, in which case your existing code is fine.
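To make the three cases above concrete, here is a minimal sketch of the branching only. Everything in it is hypothetical and for illustration: the class StatsReadPath, the Path enum, and the one-byte-per-component version encoding are assumptions, not Phoenix APIs. The comment doesn't spell out what a 4.7.0+ client should receive while the stats are still in the old layout, so the sketch just reports "old code path" for that case.

```java
// Hypothetical sketch of the read-path decision; none of these names are Phoenix APIs.
final class StatsReadPath {
    enum Path { OLD_CODE_PATH, NEW_CODE_OLD_FORMAT, NEW_CODE_NEW_FORMAT }

    // Assumed encoding for illustration: one byte each for major, minor, patch.
    static int encodeVersion(int major, int minor, int patch) {
        return (major << 16) | (minor << 8) | patch;
    }

    // First client version that understands the new guidePosts field.
    static final int MIN_NEW_FORMAT_CLIENT = encodeVersion(4, 7, 0);

    // guidePostsKeyValuePresent: whether the GUIDE_POSTS key value was found in
    // the scanned stats row, i.e. whether the stats have been regenerated.
    static Path choose(boolean guidePostsKeyValuePresent, int clientVersion) {
        if (!guidePostsKeyValuePresent) {
            // Stats not regenerated yet: fall back to the preserved old code.
            return Path.OLD_CODE_PATH;
        }
        if (clientVersion < MIN_NEW_FORMAT_CLIENT) {
            // Regenerated stats, old client: read with new code, emit old format.
            return Path.NEW_CODE_OLD_FORMAT;
        }
        // Regenerated stats, 4.7.0+ client: the existing new code is fine.
        return Path.NEW_CODE_NEW_FORMAT;
    }
}
```

In the real change the client version would come from request.getClientVersion(); the sketch only shows how the two inputs combine.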
With PHOENIX-2143, when compaction runs, we'll generate stats in the new
format. It's possible that the SYSTEM.STATS table hasn't been updated yet (as
this gets triggered when a new 4.7.0 client connects to the server which may
not yet have happened). We'd need to issue the previous Delete marker based on
the old row key structure to ensure that the stats for the region are deleted.
We wouldn't want to issue the query that does the range delete in this case
because it might delete rows across multiple regions (ugh). So we'd need to
know if the schema upgrade has been done yet when compaction runs. We could
detect this by querying the SYSTEM.CATALOG table directly or by using the
MetaDataProtocol.getTable() call and pulling over the PTable, and then
conditionally doing the delete the old way versus the new way.
WDYT, @ankitsinghal? A more radical alternative would be to call this
release 5.0. Users could still upgrade the server and client as with a minor
release, but they'd need to truncate the SYSTEM.STATS table manually before
upgrading the server. In that case, I think it'd be acceptable to return an
empty guidepost for the protobuf values field (as essentially stats would be
disabled for older clients running against the newer server).
> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
> Key: PHOENIX-2417
> URL: https://issues.apache.org/jira/browse/PHOENIX-2417
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Ankit Singhal
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff,
> PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any
> increase in latency for point scans. However, this increases the amount of
> memory significantly when caching the guideposts on the client. Guideposts are
> equidistant row keys in the form of raw byte[] which are likely to have a
> large percentage of their leading bytes in common (as they're stored in
> sorted order). We should use a simple compression technique to mitigate this.
> I noticed that Apache Parquet has a run length encoding - perhaps we can use
> that.
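As a rough illustration of why shared leading bytes make sorted row keys cheap to compress, here is a minimal front-coding (shared-prefix) sketch. It is not Parquet's run length encoding and is not claimed to be what the attached patches do. The class name FrontCoding is hypothetical, and the single-byte length fields assume keys shorter than 256 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical front-coding sketch for sorted row keys; not a Phoenix class.
final class FrontCoding {
    // Writes each key as (sharedPrefixLen, suffixLen, suffixBytes), where the
    // prefix is shared with the previous key. Lengths are stored in one byte,
    // so keys must be at most 255 bytes long in this sketch.
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            if (key.length > 255) {
                throw new IllegalArgumentException("key too long for this sketch");
            }
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.write(shared);
            out.write(key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        return out.toByteArray();
    }

    // Rebuilds the keys by copying the shared prefix from the previous key
    // and appending the stored suffix.
    static List<byte[]> decode(byte[] encoded) {
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        int pos = 0;
        while (pos < encoded.length) {
            int shared = encoded[pos++] & 0xFF;
            int suffixLen = encoded[pos++] & 0xFF;
            byte[] key = Arrays.copyOf(prev, shared + suffixLen);
            System.arraycopy(encoded, pos, key, shared, suffixLen);
            pos += suffixLen;
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

The saving grows with the fraction of leading bytes that adjacent guideposts share. The trade-off is that a key can only be reconstructed by decoding sequentially from the start of the encoded run.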