[ https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907333#action_12907333 ]

Jonathan Gray commented on HBASE-2959:
--------------------------------------

bq. Wrong or indeterminate behavior when there are duplicate versions of a 
column
There hasn't been any change in HBase behavior.  This issue has always existed. 
 And until recent patches from Pranav, we would actually read the entire row 
even if we only wanted a single column out of it.  So there have been distinct 
perf improvements added on trunk, not regressions.  I think the regressions are 
coming from your change in schema design.

bq. Jonathan, I'm missing some context about "delete family". 
Imagine I have a row with 20k columns and versions in it.  I want to delete 
this row.  I now have to read in all 20k KVs and then insert 20k new delete 
marker KVs.  When I do a read on this row later, I will have to shuffle through 
all 40k of these KVs, processing the delete marker before each deleted KV.  The 
actual delete operation will be quite slow and then all reads against this row 
will be slow (well beyond the hit of a start-of-row seek).  Deleting rows is 
fairly common IMO; I know that we are doing it here and I've made use of it in 
the past as well.
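
Roughly, the two options look like this (a minimal sketch against the 
0.20/0.89-era client API; the class and method choices here are my 
illustration, not code from this issue):

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class WideRowDeleteSketch {

  // Delete a wide row by reading every KV and writing one delete marker per
  // column/version.  For a 20k-KV row this reads 20k KVs, writes 20k markers,
  // and later reads must shuffle through all 40k until they are compacted away.
  static void deletePerColumn(HTable table, byte[] row) throws IOException {
    Result current = table.get(new Get(row));
    Delete delete = new Delete(row);
    for (KeyValue kv : current.raw()) {
      // one delete marker per existing version
      delete.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
    }
    table.delete(delete);
  }

  // The cheap alternative: a single delete-family marker per family.  No read
  // is needed, but every later read of the row has to start at the beginning
  // of the row to find the marker -- the start-of-row tax discussed above.
  static void deleteWholeRow(HTable table, byte[] row, byte[] family)
      throws IOException {
    Delete delete = new Delete(row);
    delete.deleteFamily(family);
    table.delete(delete);
  }
}
{code}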

The direction I'd like to see is to move delete families to the side if we 
don't want to pay the cost of having to start every row read at the start of 
the row.  It'd be an open question whether we'd want to do that with delete columns 
as well, since the same issue will exist if you have a single column with many 
versions (a schema used here).

A space-efficient way would be to have a delete bloom (could be: 
deletefam+row+ts as the key; deletecol+row+col+ts could be stuffed into the same 
bloom as well).  If you hit the bloom, then you have to do the start-of-row 
seek.  If not, you go straight to the column/version in question and don't pay 
any delete tax besides the bloom lookup.  If you have no deletes then there 
will be virtually no increase in memory usage or latency.
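
A rough sketch of the idea (using Guava's BloomFilter as a stand-in for 
whatever per-storefile bloom we'd actually build, and leaving timestamps out 
of the keys for simplicity -- all names here are hypothetical):

{code}
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import org.apache.hadoop.hbase.util.Bytes;

public class DeleteBloomSketch {

  // One bloom per store file, fed at write time with the key of every delete
  // marker the file contains: the row for delete-family markers, row+column
  // for delete-column markers.
  private final BloomFilter<byte[]> deleteBloom =
      BloomFilter.create(Funnels.byteArrayFunnel(), 100000, 0.01);

  void recordDeleteFamilyMarker(byte[] row) {
    deleteBloom.put(row);
  }

  void recordDeleteColumnMarker(byte[] row, byte[] family, byte[] qualifier) {
    deleteBloom.put(Bytes.add(row, family, qualifier));
  }

  // At read time, only pay the start-of-row seek when the bloom says a delete
  // marker might cover this row or column; otherwise seek straight to the
  // column/version in question.  Rows with no deletes pay nothing beyond the
  // bloom lookup (plus the usual false-positive rate).
  boolean mustSeekToStartOfRow(byte[] row, byte[] family, byte[] qualifier) {
    return deleteBloom.mightContain(row)
        || deleteBloom.mightContain(Bytes.add(row, family, qualifier));
  }
}
{code}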

> Scanning always starts at the beginning of a row
> ------------------------------------------------
>
>                 Key: HBASE-2959
>                 URL: https://issues.apache.org/jira/browse/HBASE-2959
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
>            Reporter: Benoit Sigoure
>            Priority: Blocker
>
> In HBASE-2248, the code in {{HRegion#get}} was changed like so:
> {code}
> -  private void get(final Store store, final Get get,
> -    final NavigableSet<byte []> qualifiers, List<KeyValue> result)
> -  throws IOException {
> -    store.get(get, qualifiers, result);
> +  /*
> +   * Do a get based on the get parameter.
> +   */
> +  private List<KeyValue> get(final Get get) throws IOException {
> +    Scan scan = new Scan(get);
> +
> +    List<KeyValue> results = new ArrayList<KeyValue>();
> +
> +    InternalScanner scanner = null;
> +    try {
> +      scanner = getScanner(scan);
> +      scanner.next(results);
> +    } finally {
> +      if (scanner != null)
> +        scanner.close();
> +    }
> +    return results;
>    }
> {code}
> So instead of doing a {{get}} straight on the {{Store}}, we now open a 
> scanner.  The problem is that we eventually end up in {{ScanQueryMatcher}} 
> where the constructor does: {{this.startKey = 
> KeyValue.createFirstOnRow(scan.getStartRow());}}.  This entails that if we 
> have a very wide row (thousands of columns), the scanner will need to go 
> through thousands of {{KeyValue}}'s before finding the right entry, because 
> it always starts from the beginning of the row, whereas before it was much 
> more straightforward.
> This problem was under the radar for a while because the overhead isn't too 
> unreasonable, but later on, {{incrementColumnValue}} was changed to do a 
> {{get}} under the hood.  At StumbleUpon we do thousands of ICV per second, so 
> thousands of times per second we're scanning some really wide rows.  When a 
> row is contended, this results in all the IPC threads being stuck on 
> acquiring a row lock, while one thread is doing the ICV (albeit slowly due to 
> the excessive scanning).  When all IPC threads are stuck, the region server 
> is unable to serve more requests.
> As a nice side effect, fixing this bug will make {{get}} and 
> {{incrementColumnValue}} faster, as well as the first call to {{next}} on a 
> scanner.
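
For illustration, the gist of a fix would be to build the scanner's start key 
from the requested column rather than just the row (a sketch only; the 
createFirstOnRow(row, family, qualifier) overload is assumed here, and the 
real patch may take a different route):

{code}
import org.apache.hadoop.hbase.KeyValue;

public class GetStartKeySketch {

  // What ScanQueryMatcher does today for a Get turned into a Scan: the start
  // key encodes only the row, so the scanner lands on the first KV of the row
  // and walks every column until it reaches the one the Get asked for.
  static KeyValue startKeyToday(byte[] row) {
    return KeyValue.createFirstOnRow(row);
  }

  // The gist of a fix: when the Get names a single column, build the start
  // key from row + family + qualifier so the scanner seeks directly to that
  // column's first version and skips the rest of a wide row.
  static KeyValue startKeyForSingleColumnGet(byte[] row, byte[] family,
      byte[] qualifier) {
    return KeyValue.createFirstOnRow(row, family, qualifier);
  }
}
{code}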

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
