[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671944#comment-16671944 ]
Lars Hofhansl commented on HBASE-21418: --------------------------------------- Thanks for looking into this [~Jeongdae Kim]! Generally I am not a fan of adding more HBase and/or scan options that one has to know about. (which is why I had removed the LOOK_AHEAD hint that I myself had added a bit earlier). Is there no way to guess whether we should SKIP or SEEK? Patch looks good in general. Few comments: {code:java} - /** - * @deprecated without replacement - * This is now a no-op, SEEKs and SKIPs are optimizated automatically. - * Will be removed in 2.0+ - */ - @Deprecated + public static final String HINT_LOOKAHEAD = "_look_ahead_";{code} We might want to put the older (EXPERT_ONLY) comment back (or any comment explaining what this is doing). {code:java} + MemStoreScanner(long readPoint, ScanQueryMatcher scanQueryMatcher) { + this(readPoint); + + if (scanQueryMatcher != null) { + int lookAheadRows = conf.getInt("hbase.hregion.memstore.scanner.lookahead.rows", 0); + if (scanQueryMatcher.getLookAheadBeforeReseek() <= lookAheadRows) { + this.lookAheadRows = scanQueryMatcher.getLookAheadBeforeReseek(); + } + } + } +{code} Note that conf.getXXX can be slow itself. We might want to cache this value somewhere. {code:java} + + if (lookAheadRows > 0 && peek() != null) { + for (int i = lookAheadRows; i > 0; --i) { + next(); + if (peek() == null) { + return false; + } + if (comparator.compare(peek(), key) >= 0) { + return true; + } + } + } +{code} This is a nit, but since the value if {{i}} is not actually used inside the loop, I'd prefer {{for (int i=0; i< lookAheadRows; i++)...}} {code:java} + + byte[] attr = scan.getAttribute(Scan.HINT_LOOKAHEAD); + this.lookAheadBeforeSeek = Math.min(attr == null ? 0 : Bytes.toInt(attr), scanInfo.getMaxVersions()); +{code} Why max versions here? The SEEKing can also be an issue with many columns, right? If we can, let's find a heuristic to do this automatically (like I did with HFiles), so that a user won't have to hint. > Reduce a number of reseek operations in MemstoreScanner when seek point is > close to the current row. > ---------------------------------------------------------------------------------------------------- > > Key: HBASE-21418 > URL: https://issues.apache.org/jira/browse/HBASE-21418 > Project: HBase > Issue Type: Improvement > Components: scan, Scanners > Affects Versions: 1.2.5 > Reporter: Jeongdae Kim > Assignee: Jeongdae Kim > Priority: Minor > Labels: performance > Attachments: HBASE-21418.branch-1.2.001.patch, > HBASE-21418.branch-1.2.001.patch > > > We observed “responseTooSlow” logs for Get requests in our production > clusters. even some get requests were responded after 10 seconds. > Affected get requests were done with the timerange, and target rows have many > columns that have some versions. > We reproduced this issue, and found this behavior happens only when scanning > in the memstore. after flushing the HStore, this slow response issue for Get > disappeared and all same get requests are responded very quickly. > > We investigated this case, and found this performance difference between > memstore scanner and hfile scanner is caused by the number of reseek > operations executed while scanning. When a store scanner needs to reseek the > next column, Hfile scanner wisely decide whether it have to reseek or not by > checking the seek point is in current block, whereas memstore scanner just do > reseek without decision unlike Hfile scanner. In our case, almost all columns > in the memstore have older timestamp than scan(get)’s timerange, and so many > reseek operations occur as much as about the number of columns. This results > in increasing the response time of Get requests sporadically. > > To improve the reseek operation of the memstore scanner, i think it’s better > skipping than seeking when reseek requested, if seek point is quite close to > current cell that the scanner is pointing now.(Actually, i changed > MatchCode.SEEK_NEXT_COL to MatchCode.Skip in our case, and the response time > of Get was 6x faster than before) But we can’t decide whether seek point is > close to the current cell or not, because memstore scannner has no > information such as next block index. > Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle like this > case, and it may be deprecated someday. But, i think that hint is still be > useful for the memstore scanner to try to skip first, before reseeking, and > with this option we can make reseek operations of memstore scanner smarter. > > I tested this patch in our case, and got the same result as i changed > matchcode (mentioned above). -- This message was sent by Atlassian JIRA (v7.6.3#76005)