[ 
https://issues.apache.org/jira/browse/HBASE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157776#comment-17157776
 ] 

Bharath Vissapragada commented on HBASE-24742:
----------------------------------------------

To add more color, the following is the tight loop that Lars is talking about:

{noformat}
  protected boolean trySkipToNextColumn(Cell cell) throws IOException {
    Cell nextCell = null;
    // used to guard against a changed next indexed key by doing a identity comparison
    // when the identity changes we need to compare the bytes again
    Cell previousIndexedKey = null;
    do {
      Cell nextIndexedKey = getNextIndexedKey();
      if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY &&
          (nextIndexedKey == previousIndexedKey ||
          matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0)) { <=====
        this.heap.next();
        ++kvsScanned;
        previousIndexedKey = nextIndexedKey;
      } else {
        return false;
      }
    } while ((nextCell = this.heap.peek()) != null && CellUtil.matchingRowColumn(cell, nextCell));
    // We need this check because it may happen that the new scanner that we get
    // during heap.next() is requiring reseek due of fake KV previously generated for
    // ROWCOL bloom filter optimization. See HBASE-19863 for more details
    if (nextCell != null && matcher.compareKeyForNextColumn(nextCell, cell) < 0) { <===
      return false;
    }
    return true;
  }
{noformat}

Specifically, that was added to prevent SQM from matching the skipped rows, but 
it turns out that it does many more compare checks than before. To test our 
theory, we undid the loop and let the SQM match the rows, and we gained back 
almost 30% in scans with explicit column filters. But again, as discussed in 
HBASE-17958, that comes at the expense of correctness: filters shouldn't see 
skipped rows.
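
As a rough illustration of what the identity check on previousIndexedKey in 
the loop above buys, here is a standalone toy sketch. It is not HBase code: 
SkipCompareSketch, CountingComparator and countCompares are made-up names, and 
plain byte arrays stand in for Cells. It only models the indexed-key comparison 
while the next indexed key stays the same object, not the per-cell 
CellUtil.matchingRowColumn check or the extra compare after the loop.

```java
final class SkipCompareSketch {

    /** Hypothetical stand-in for a key comparator; counts how often it runs. */
    static final class CountingComparator {
        int calls = 0;

        int compare(byte[] a, byte[] b) {
            calls++;
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff); // unsigned byte order
                if (d != 0) {
                    return d;
                }
            }
            return a.length - b.length;
        }
    }

    /**
     * Simulates skipping cellsToSkip cells that all live in one block, so the
     * next indexed key is the same object on every iteration (an assumption
     * of this sketch). Returns how many byte comparisons were performed.
     */
    static int countCompares(int cellsToSkip, boolean identityGuard) {
        CountingComparator cmp = new CountingComparator();
        byte[] cell = {1};
        byte[] nextIndexedKey = {2}; // sorts after cell, so skipping is allowed
        byte[] previousIndexedKey = null;
        for (int i = 0; i < cellsToSkip; i++) {
            // Same object as last time: the earlier byte compare still holds.
            boolean sameKey = identityGuard && nextIndexedKey == previousIndexedKey;
            if (sameKey || cmp.compare(nextIndexedKey, cell) >= 0) {
                previousIndexedKey = nextIndexedKey; // stands in for heap.next()
            } else {
                break;
            }
        }
        return cmp.calls;
    }

    public static void main(String[] args) {
        System.out.println("with identity guard:    " + countCompares(1000, true));
        System.out.println("without identity guard: " + countCompares(1000, false));
    }
}
```

In this toy model the guard keeps the indexed-key compare to one per block, 
so whatever extra cost remains has to come from the other per-cell checks.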

[~zghao] [~zhangduo] FYI since you were involved in the original jira fix and 
implementation.

> Improve performance of SKIP vs SEEK logic
> -----------------------------------------
>
>                 Key: HBASE-24742
>                 URL: https://issues.apache.org/jira/browse/HBASE-24742
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Major
>         Attachments: hbase-24742-branch-1.txt
>
>
> In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% 
> slowdown in scanning scenarios.
> We tracked it back to HBASE-17958 and HBASE-19863.
> Both add comparisons to one of the tightest loops HBase has.
> [~bharathv]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
