[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

ramkrishna.s.vasudevan (JIRA) Sun, 11 Oct 2015 21:29:38 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952609#comment-14952609
 ]


ramkrishna.s.vasudevan commented on HBASE-14221:
------------------------------------------------

Thanks for the review.  Yes, that is how it has to be - but there is one 
difference in how this works.  Because of the various conditions involved, I 
thought of having 3 states - INIT, NOT_IN_NEXT_ROW and NEXT_ROW. 
Take a case where there are 2 CFs and the 2 cells are 'row1:cf1:q1' and 
'row1:cf2:q1' (assume we also have row2:cf1:q1 and row2:cf2:q1).
Once the first cell with cf1 is fetched, that StoreScanner is moved to 
nextRow, as it is now at row2.  But when fetching row1:cf2, that 
StoreScanner is still in the INIT state, so we do the matchingRows 
there, and that one is also moved to nextRow.  The KVHeap holding these 
StoreScanners will next fetch row2:cf1, but since this scanner is the 'current' 
StoreScanner and is already in nextRow, it avoids the compare this time. 
Internally, at the StoreScanner level, we also avoid the matchingRows call when 
setting the SQM to the current row.
The INIT state is also needed because sometimes, even though there are two CFs, 
the rows in the 2nd CF may always be lesser than the row fetched from 
CF1. So the KVHeap holding the StoreScanners may not actually use the 2nd 
CF at all, and at that point we really don't know whether that CF has crossed 
the row it is currently pointing to. 
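
To make the three states concrete, here is a minimal toy sketch of the idea, 
not the code from the patch. RowStateSketch, ToyStoreScanner, startRow() and 
finishRow() are hypothetical names used only for illustration, and the sketch 
assumes that "moved to nextRow" above means the scanner enters the NEXT_ROW 
state. It replays the 2 CF, 2 row example and counts how many explicit row 
compares each scanner does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowStateSketch {

  /** Hypothetical per-scanner state, named after the three cases in the comment. */
  enum RowState { INIT, NOT_IN_NEXT_ROW, NEXT_ROW }

  /** Toy stand-in for a StoreScanner; it only tracks state and counts row compares. */
  static final class ToyStoreScanner {
    RowState state = RowState.INIT;
    byte[] matcherRow;    // row the toy matcher is currently set to
    int rowCompares = 0;  // number of explicit row comparisons performed

    /** Begin returning cells for the given row. */
    void startRow(byte[] row) {
      if (state == RowState.NEXT_ROW) {
        // The previous call already crossed a row boundary, so this is
        // known to be a new row and the compare is skipped.
        matcherRow = row;
      } else {
        // INIT or NOT_IN_NEXT_ROW: nothing is known yet, so compare.
        rowCompares++;
        if (matcherRow == null || !Arrays.equals(matcherRow, row)) {
          matcherRow = row;
        }
      }
      state = RowState.NOT_IN_NEXT_ROW;
    }

    /** Finish the current row; the flag says whether the scanner reached the next row. */
    void finishRow(boolean crossedIntoNextRow) {
      state = crossedIntoNextRow ? RowState.NEXT_ROW : RowState.NOT_IN_NEXT_ROW;
    }
  }

  /** Replays the two-CF, two-row example and returns the compares per scanner. */
  static int[] simulate() {
    byte[] row1 = "row1".getBytes(StandardCharsets.UTF_8);
    byte[] row2 = "row2".getBytes(StandardCharsets.UTF_8);
    ToyStoreScanner cf1 = new ToyStoreScanner();
    ToyStoreScanner cf2 = new ToyStoreScanner();

    cf1.startRow(row1); cf1.finishRow(true);   // INIT: compare, then at row2
    cf2.startRow(row1); cf2.finishRow(true);   // INIT: compare, then at row2
    cf1.startRow(row2); cf1.finishRow(false);  // NEXT_ROW: compare skipped
    cf2.startRow(row2); cf2.finishRow(false);  // NEXT_ROW: compare skipped
    return new int[] { cf1.rowCompares, cf2.rowCompares };
  }

  public static void main(String[] args) {
    int[] c = simulate();
    System.out.println("cf1 compares=" + c[0] + ", cf2 compares=" + c[1]);
  }
}
```

In this toy run each scanner compares once, in its INIT state, and skips the 
compare for row2. A scanner that the heap never advanced stays in INIT, so it 
still does the compare the first time it is used.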

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we did some profiling with the PE tool, we found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>     int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) ||
>         matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>           scannerContext.setKeepProgress(true);
>           heap.next(results, scannerContext);
>           scannerContext.setKeepProgress(tmpKeepProgress);
>           nextKv = heap.peek();
>           moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to be careful for a MultiCF case.  I 
> was trying to solve this for the MultiCF case, but it has a lot of cases to 
> solve. But at least for a single CF case I think these comparisons can be 
> reduced.
> So for a single CF case, in the SQM we are able to find out if we have crossed 
> a row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF, the HRegion is going to have only one element in 
> the heap, and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() ended due to a MatchCode.DONE returned by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner.next() - even that 
> can be avoided if we know that the previous next() call ended due to a new 
> row. Doing all this, I found that compareRows in the profiler, which was at 
> 19%, got reduced to 13%. Initially we can solve for the single CF case, which 
> can then be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
