[ 
https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671944#comment-16671944
 ] 

Lars Hofhansl commented on HBASE-21418:
---------------------------------------

Thanks for looking into this [~Jeongdae Kim]!

Generally I am not a fan of adding more HBase and/or scan options that one has 
to know about. (which is why I had removed the LOOK_AHEAD hint that I myself 
had added a bit earlier).

Is there no way to guess whether we should SKIP or SEEK?

Patch looks good in general. Few comments:
{code:java}
 
-  /**
-   * @deprecated without replacement
-   *             This is now a no-op, SEEKs and SKIPs are optimizated 
automatically.
-   *             Will be removed in 2.0+
-   */
-  @Deprecated
+
   public static final String HINT_LOOKAHEAD = "_look_ahead_";{code}
We might want to put the older (EXPERT_ONLY) comment back (or any comment 
explaining what this is doing).
{code:java}
+    MemStoreScanner(long readPoint, ScanQueryMatcher scanQueryMatcher) {
+      this(readPoint);
+
+      if (scanQueryMatcher != null) {
+        int lookAheadRows = 
conf.getInt("hbase.hregion.memstore.scanner.lookahead.rows", 0);
+        if (scanQueryMatcher.getLookAheadBeforeReseek() <= lookAheadRows) {
+          this.lookAheadRows = scanQueryMatcher.getLookAheadBeforeReseek();
+        }
+      }
+    }
+{code}
Note that conf.getXXX can be slow itself. We might want to cache this value 
somewhere.
{code:java}
+
+      if (lookAheadRows > 0 && peek() != null) {
+        for (int i = lookAheadRows; i > 0; --i) {
+          next();
+          if (peek() == null) {
+            return false;
+          }
+          if (comparator.compare(peek(), key) >= 0) {
+            return true;
+          }
+        }
+      }
+{code}
This is a nit, but since the value if {{i}} is not actually used inside the 
loop, I'd prefer

{{for (int i=0; i< lookAheadRows; i++)...}}
{code:java}
+
+    byte[] attr = scan.getAttribute(Scan.HINT_LOOKAHEAD);
+    this.lookAheadBeforeSeek = Math.min(attr == null ? 0 : Bytes.toInt(attr), 
scanInfo.getMaxVersions());
+{code}
Why max versions here? The SEEKing can also be an issue with many columns, 
right?

 

If we can, let's find a heuristic to do this automatically (like I did with 
HFiles), so that a user won't have to hint.

 

> Reduce a number of reseek operations in MemstoreScanner when seek point is 
> close to the current row.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21418
>                 URL: https://issues.apache.org/jira/browse/HBASE-21418
>             Project: HBase
>          Issue Type: Improvement
>          Components: scan, Scanners
>    Affects Versions: 1.2.5
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Minor
>              Labels: performance
>         Attachments: HBASE-21418.branch-1.2.001.patch, 
> HBASE-21418.branch-1.2.001.patch
>
>
> We observed “responseTooSlow” logs for Get requests in our production 
> clusters. even some get requests were responded after 10 seconds.
> Affected get requests were done with the timerange, and target rows have many 
> columns that have some versions.
> We reproduced this issue, and found this behavior happens only when scanning 
> in the memstore. after flushing the HStore, this slow response issue for Get 
> disappeared and all same get requests are responded very quickly.
>  
> We investigated this case, and found this performance difference between 
> memstore scanner and hfile scanner is caused by the number of reseek 
> operations executed while scanning. When a store scanner needs to reseek the 
> next column, Hfile scanner wisely decide whether it have to reseek or not by 
> checking the seek point is in current block, whereas memstore scanner just do 
> reseek without decision unlike Hfile scanner. In our case, almost all columns 
> in the memstore have older timestamp than scan(get)’s timerange, and so many 
> reseek operations occur as much as about the number of columns. This results 
> in increasing the response time of Get requests sporadically.
>  
> To improve the reseek operation of the memstore scanner, i think it’s better 
> skipping than seeking when reseek requested, if seek point is quite close to 
> current cell that the scanner is pointing now.(Actually, i changed 
> MatchCode.SEEK_NEXT_COL to MatchCode.Skip in our case, and the response time 
> of Get was 6x faster than before) But we can’t decide whether seek point is 
> close to the current cell or not, because memstore scannner has no 
> information such as next block index.
>  Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle like this 
> case, and it may be deprecated someday. But, i think that hint is still be 
> useful for the memstore scanner to try to skip first, before reseeking, and 
> with this option we can make reseek operations of memstore scanner smarter.
>  
> I tested this patch in our case, and got the same result as i changed 
> matchcode (mentioned above).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to