[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

James Taylor (JIRA) Tue, 13 Jan 2015 20:26:04 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276467#comment-14276467
 ]


James Taylor commented on PHOENIX-1453:
---------------------------------------

Thanks, [~ramkrishna]. Not a huge deal, but I don't quite understand the 
special case you've created. If there are 2 guideposts and a split occurs, 
what is midEndIndex? Assuming midEndIndex=1, I don't think we should add 1 
when computing per, since per should be 0.5 in this case (the split will end 
up with 50% of the guideposts in the left region and 50% in the right 
region). If you removed the +1 and just set per = ((double)(midEndIndex)) / 
size, would that make the special case unnecessary?
{code}
+                double per = (double)(midEndIndex + 1) / size;
+                long leftRowCount = 0;
+                long rightRowCount = 0;
+                long leftByteCount = 0;
+                long rightByteCount = 0;
+                if (rowCountCell != null) {
+                    rowCount = PLong.INSTANCE.getCodec().decodeLong(rowCountCell.getValueArray(),
+                            rowCountCell.getValueOffset(), SortOrder.getDefault());
+                    leftRowCount = (long)(per * rowCount);
+                    if (leftRowCount == rowCount) {
+                        leftRowCount = (rightRowCount = rowCount / 2);
+                    } else {
+                        rightRowCount = (long)((1 - per) * rowCount);
+                    }
+                }
{code}
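
To make the arithmetic concrete, here is a minimal standalone sketch, not code from the patch. It assumes size=2 and midEndIndex=1 as in the question above, plus a hypothetical rowCount of 1000; the class and method names are made up for illustration. It only shows how the two ways of computing per divide a row count, without the special-case branch.

```java
// Illustrative sketch only: names and the sample rowCount are assumptions,
// not part of the Phoenix patch under review.
class GuidepostSplitSketch {

    // Fraction of guideposts assigned to the left region, as in the quoted patch.
    static double perWithPlusOne(int midEndIndex, int size) {
        return (double) (midEndIndex + 1) / size;
    }

    // Fraction of guideposts assigned to the left region, without the +1.
    static double perWithoutPlusOne(int midEndIndex, int size) {
        return (double) midEndIndex / size;
    }

    // Splits rowCount proportionally; returns {leftRowCount, rightRowCount}.
    // The special-case branch from the patch is deliberately left out.
    static long[] splitRowCount(double per, long rowCount) {
        long left = (long) (per * rowCount);
        long right = (long) ((1 - per) * rowCount);
        return new long[] { left, right };
    }

    public static void main(String[] args) {
        int size = 2;          // two guideposts, as in the comment
        int midEndIndex = 1;   // assumed value from the comment
        long rowCount = 1000L; // hypothetical row count

        long[] withPlusOne = splitRowCount(perWithPlusOne(midEndIndex, size), rowCount);
        long[] withoutPlusOne = splitRowCount(perWithoutPlusOne(midEndIndex, size), rowCount);

        System.out.println("with +1:    left=" + withPlusOne[0] + " right=" + withPlusOne[1]);
        System.out.println("without +1: left=" + withoutPlusOne[0] + " right=" + withoutPlusOne[1]);
    }
}
```

Under these assumptions the +1 variant gives per = 1.0, so the left side receives the whole row count, which is the condition the special case in the patch checks for. Without the +1, per = 0.5 and the counts come out even.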

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_10.patch, Phoenix-1453_13.patch, Phoenix-1453_15.patch, 
> Phoenix-1453_17.patch, Phoenix-1453_18.patch, Phoenix-1453_2.patch, 
> Phoenix-1453_20.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch, 
> Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
