[ https://issues.apache.org/jira/browse/HBASE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157776#comment-17157776 ]
Bharath Vissapragada commented on HBASE-24742:
----------------------------------------------

To add more color, the following is the tight loop that Lars is talking about:

{noformat}
  protected boolean trySkipToNextColumn(Cell cell) throws IOException {
    Cell nextCell = null;
    // used to guard against a changed next indexed key by doing a identity comparison
    // when the identity changes we need to compare the bytes again
    Cell previousIndexedKey = null;
    do {
      Cell nextIndexedKey = getNextIndexedKey();
      if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY &&
          (nextIndexedKey == previousIndexedKey ||
          matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0)) {  // <=====
        this.heap.next();
        ++kvsScanned;
        previousIndexedKey = nextIndexedKey;
      } else {
        return false;
      }
    } while ((nextCell = this.heap.peek()) != null && CellUtil.matchingRowColumn(cell, nextCell));
    // We need this check because it may happen that the new scanner that we get
    // during heap.next() is requiring reseek due of fake KV previously generated for
    // ROWCOL bloom filter optimization. See HBASE-19863 for more details
    if (nextCell != null && matcher.compareKeyForNextColumn(nextCell, cell) < 0) {  // <=====
      return false;
    }
    return true;
  }
{noformat}

Specifically, that loop was added to prevent the SQM (ScanQueryMatcher) from matching the skipped rows, but it turns out that it performs many more comparisons than the code did before. To test our theory, we undid the loop and let the SQM match the rows, and we gained back almost 30% in scans with explicit column filters. But again, as discussed in HBASE-17958, that comes at the expense of correctness: filters shouldn't see skipped rows.

[~zghao] [~zhangduo] FYI, since you were involved in the original jira fix and implementation.

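To illustrate why the number of comparisons per step matters, here is a minimal, standalone sketch of the identity guard used in the loop above. It is not HBase code: the class IdentityGuardSketch, the methods skip and compareBytes, and the byte[] stand-ins for cells are all hypothetical names invented for this illustration. It only counts how often a full byte comparison runs with and without the reference-equality shortcut.

```java
class IdentityGuardSketch {
    /** Number of full byte comparisons performed so far (illustration only). */
    static int byteCompares = 0;

    /** Unsigned lexicographic comparison; stands in for an expensive key compare. */
    static int compareBytes(byte[] a, byte[] b) {
        byteCompares++;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Walks the per-step "next indexed key" values and returns how many steps
     * were skipped. A step is skipped while the indexed key is >= target.
     * With the guard enabled, a key that is the same object as on the previous
     * step is accepted without comparing bytes again.
     */
    static int skip(byte[][] indexedKeys, byte[] target, boolean useIdentityGuard) {
        byte[] previous = null;
        int skipped = 0;
        for (byte[] next : indexedKeys) {
            boolean unchanged = useIdentityGuard && next == previous;
            if (next != null && (unchanged || compareBytes(next, target) >= 0)) {
                skipped++;
                previous = next;
            } else {
                break;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        byte[] blockKey = {5};                       // one key instance shared by every step
        byte[][] keys = {blockKey, blockKey, blockKey, blockKey};
        byte[] target = {3};

        byteCompares = 0;
        int skippedGuarded = skip(keys, target, true);
        System.out.println("guarded:   skipped=" + skippedGuarded + " compares=" + byteCompares);

        byteCompares = 0;
        int skippedPlain = skip(keys, target, false);
        System.out.println("unguarded: skipped=" + skippedPlain + " compares=" + byteCompares);
    }
}
```

With four steps that share one key instance, the guarded variant does a single byte comparison and the unguarded variant does four. The real loop also pays for CellUtil.matchingRowColumn on every iteration and for the final compareKeyForNextColumn check, neither of which this sketch models.
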
> Improve performance of SKIP vs SEEK logic
> -----------------------------------------
>
>                 Key: HBASE-24742
>                 URL: https://issues.apache.org/jira/browse/HBASE-24742
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Major
>         Attachments: hbase-24742-branch-1.txt
>
>
> In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% slowdown in scanning scenarios.
> We tracked it back to HBASE-17958 and HBASE-19863.
> Both add comparisons to one of the tightest loops HBase has.
> [~bharathv]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)