[jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row

Jonathan Gray (JIRA) Wed, 08 Sep 2010 10:53:13 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907335#action_12907335 ]


Jonathan Gray commented on HBASE-2959:
--------------------------------------

Sorry, mis-pasted the first quote.

bq. Yes this issue is a pretty major performance regression for us
I'm just trying to be clear.  It is not _this_ issue that is causing the 
performance regression (though this issue is likely involved in the 
slowness).  This behavior has not changed since 0.20; what changed is the 
move to turning Gets into Scans.  Within that, it's probably that a Get would 
previously early-out if the incremented column was in MemStore, whereas now 
we seek every file before doing the get.
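
To make the difference concrete, here is a rough toy model of the two read paths. It is only a sketch: {{GetPathSketch}}, its fields, and the seek counter are made up for illustration and are not HBase classes. It also assumes the store files are ordered newest first and that each lookup on a file costs one seek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one store: a MemStore plus some store files. Not HBase code. */
final class GetPathSketch {
    final TreeMap<String, Long> memstore = new TreeMap<>();
    final List<TreeMap<String, Long>> storeFiles = new ArrayList<>(); // assumed newest first
    int fileSeeks = 0; // hypothetical counter: one per file we position on

    /** Builds a store whose newest value for "counter" lives in the MemStore. */
    static GetPathSketch withCounterInMemStore(int files) {
        GetPathSketch s = new GetPathSketch();
        s.memstore.put("counter", 42L);
        for (int i = 0; i < files; i++) {
            TreeMap<String, Long> file = new TreeMap<>();
            file.put("counter", 41L - i); // older values sit in the files
            s.storeFiles.add(file);
        }
        return s;
    }

    /** Old-style get: return as soon as the MemStore has the column. */
    Long getWithEarlyOut(String column) {
        Long v = memstore.get(column);
        if (v != null) {
            return v; // no store file is touched
        }
        return newestFromFiles(column);
    }

    /** Scan-style get: position on every file first, then let the newest source win. */
    Long getViaScan(String column) {
        Long fromFiles = newestFromFiles(column); // happens even if the MemStore has it
        Long v = memstore.get(column);
        return v != null ? v : fromFiles;
    }

    private Long newestFromFiles(String column) {
        Long result = null;
        for (TreeMap<String, Long> file : storeFiles) {
            fileSeeks++;
            Long v = file.get(column);
            if (result == null) {
                result = v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GetPathSketch s = withCounterInMemStore(3);
        s.getWithEarlyOut("counter");
        System.out.println("seeks after early-out get: " + s.fileSeeks);
        s.getViaScan("counter");
        System.out.println("seeks after scan-style get: " + s.fileSeeks);
    }
}
```

Both paths return the same value; the only thing that differs in this model is how many files get touched when the column is already in memory.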

> Scanning always starts at the beginning of a row
> ------------------------------------------------
>
>                 Key: HBASE-2959
>                 URL: https://issues.apache.org/jira/browse/HBASE-2959
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
>            Reporter: Benoit Sigoure
>            Priority: Blocker
>
> In HBASE-2248, the code in {{HRegion#get}} was changed like so:
> {code}
> -  private void get(final Store store, final Get get,
> -    final NavigableSet<byte []> qualifiers, List<KeyValue> result)
> -  throws IOException {
> -    store.get(get, qualifiers, result);
> +  /*
> +   * Do a get based on the get parameter.
> +   */
> +  private List<KeyValue> get(final Get get) throws IOException {
> +    Scan scan = new Scan(get);
> +
> +    List<KeyValue> results = new ArrayList<KeyValue>();
> +
> +    InternalScanner scanner = null;
> +    try {
> +      scanner = getScanner(scan);
> +      scanner.next(results);
> +    } finally {
> +      if (scanner != null)
> +        scanner.close();
> +    }
> +    return results;
>    }
> {code}
> So instead of doing a {{get}} straight on the {{Store}}, we now open a 
> scanner.  The problem is that we eventually end up in {{ScanQueryMatcher}} 
> where the constructor does: {{this.startKey = 
> KeyValue.createFirstOnRow(scan.getStartRow());}}.  This means that if we 
> have a very wide row (thousands of columns), the scanner will need to go 
> through thousands of {{KeyValue}}s before finding the right entry, because 
> it always starts from the beginning of the row, whereas before the {{get}} 
> went straight to the {{Store}}.
> This problem was under the radar for a while because the overhead isn't too 
> unreasonable, but later on, {{incrementColumnValue}} was changed to do a 
> {{get}} under the hood.  At StumbleUpon we do thousands of ICVs per second, so 
> thousands of times per second we're scanning some really wide rows.  When a 
> row is contended, this results in all the IPC threads being stuck on 
> acquiring a row lock, while one thread is doing the ICV (albeit slowly due to 
> the excessive scanning).  When all IPC threads are stuck, the region server 
> is unable to serve more requests.
> As a nice side effect, fixing this bug will make {{get}} and 
> {{incrementColumnValue}} faster, as well as the first call to {{next}} on a 
> scanner.
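
The cost described in the issue can be sketched with a sorted map standing in for one wide row. This is only an illustration under stated assumptions: {{WideRowSeekSketch}}, the qualifier naming scheme, and the "cells visited" counts are hypothetical and do not come from HBase. Starting at the first cell of the row models a start key built by {{KeyValue.createFirstOnRow}}; the direct seek models positioning on the wanted column instead.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a single wide row as a sorted map of qualifier to value. Not HBase code. */
final class WideRowSeekSketch {

    /** Builds a row with qualifiers q00000 .. q(columns-1), in sorted order. */
    static NavigableMap<String, Long> buildRow(int columns) {
        NavigableMap<String, Long> row = new TreeMap<>();
        for (int i = 0; i < columns; i++) {
            row.put(String.format("q%05d", i), (long) i);
        }
        return row;
    }

    /** Cells visited when the scan starts at the first cell of the row. */
    static int cellsVisitedFromRowStart(NavigableMap<String, Long> row, String qualifier) {
        int visited = 0;
        for (String q : row.keySet()) {
            visited++;
            if (q.compareTo(qualifier) >= 0) {
                break; // reached (or passed) the wanted column
            }
        }
        return visited;
    }

    /** Cells visited when we seek straight to the wanted qualifier. */
    static int cellsVisitedWithDirectSeek(NavigableMap<String, Long> row, String qualifier) {
        Map.Entry<String, Long> entry = row.ceilingEntry(qualifier); // one sorted lookup
        return entry == null ? 0 : 1;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> row = buildRow(5000);
        String wanted = "q04999"; // last column of the row, the worst case
        System.out.println("from row start: " + cellsVisitedFromRowStart(row, wanted));
        System.out.println("direct seek:    " + cellsVisitedWithDirectSeek(row, wanted));
    }
}
```

In this model the walk from the start of the row grows linearly with the position of the wanted column, while the direct seek stays at a single lookup however wide the row gets.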

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
