[
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273135#comment-14273135
]
James Taylor commented on PHOENIX-1453:
---------------------------------------
Just a few minor items:
- Minor nit: would you mind naming the deserialize method in GuidePostsInfo as
deserializeGuidePostsInfo instead of deSerializeGuidePostsInfo?
- When calculating the percentage rowCount and byteCount on a split, you need
to have two values: leftRowCount, leftByteCount, and rightRowCount,
rightByteCount. Use the per variable as you've done to calculate the right
ones, but use (1.0 - per) for the other ones. Otherwise, you've mistakenly
increased the byte count and row count on a split if say the per variable is
0.7. Would be good to have a test case that would have caught this (i.e. that
the overall row count and byte count remain the same after a split).
{code}
+ double per = (double)(midEndIndex + 1) / size;
+ if (rowCountCell != null) {
+ rowCount =
PLong.INSTANCE.getCodec().decodeLong(rowCountCell.getValueArray(),
+ rowCountCell.getValueOffset(),
SortOrder.getDefault());
+ rowCount = (long)(per * rowCount);
+ }
+ if (byteSizeCell != null) {
+ byteSize =
PLong.INSTANCE.getCodec().decodeLong(byteSizeCell.getValueArray(),
+ byteSizeCell.getValueOffset(),
SortOrder.getDefault());
+ byteSize = (long)(per * byteSize);
+ }
if (midEndIndex > 0) {
- GuidePostsInfo lguidePosts = new
GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(0, midEndIndex));
- tracker.clear();
+ GuidePostsInfo lguidePosts = new
GuidePostsInfo(byteSize, guidePostsRegionInfo
+ .getGuidePosts().subList(0, midEndIndex),
rowCount);
+ tracker.clear();
tracker.addGuidePost(cfKey, lguidePosts, byteSize,
cell.getTimestamp());
addStats(l.getRegionName(), tracker, cfKey, mutations);
}
- if (midStartIndex < guidePosts.getGuidePosts().size()) {
- GuidePostsInfo rguidePosts = new
GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(midStartIndex,
guidePosts.getGuidePosts().size()));
+ if (midStartIndex < size) {
+ GuidePostsInfo rguidePosts = new
GuidePostsInfo(byteSize, guidePostsRegionInfo
+ .getGuidePosts().subList(midStartIndex, size),
+ rowCount);
{code}
- In PTableImpl, when the stats are serialized, it'd be a good idea to
serialize the byteWidth as before (in addition to the new way you're doing
which is definitely superior). The reason is that we allow a new server to be
used with an old client. If you serialize the byteWidth in the old way *and*
the new way, then old clients will continue to read the expected value.
+1 after those. Thanks for all the effort on this one, [~ramkrishna].
> Collect row counts per region in stats table
> --------------------------------------------
>
> Key: PHOENIX-1453
> URL: https://issues.apache.org/jira/browse/PHOENIX-1453
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch,
> Phoenix-1453_10.patch, Phoenix-1453_13.patch, Phoenix-1453_15.patch,
> Phoenix-1453_17.patch, Phoenix-1453_2.patch, Phoenix-1453_3.patch,
> Phoenix-1453_7.patch, Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)