[jira] Updated: (HBASE-3073) New APIs for Result, faster implementation for some calls

ryan rawson (JIRA) Fri, 01 Oct 2010 17:18:56 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ryan rawson updated HBASE-3073:
-------------------------------

    Attachment: HBASE-3073.txt

Updated patch, removing some WIP stuff that doesn't make sense.

> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
>                 Key: HBASE-3073
>                 URL: https://issues.apache.org/jira/browse/HBASE-3073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100924
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year.  In 
> the meantime, inefficiencies in the existing implementation have come to 
> light, causing issues with benchmarks.  Furthermore, some people find the 
> API both difficult to use and not useful enough (see HBASE-1937).
> I propose the following new APIs:
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
> The implementation of these uses a binary search on the underlying kvs array 
> (which is sorted).  I also have new implementations for
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
> which run faster in the small case but seem to run a bit slower in the big 
> case.  That is, if you call getValue() 10 times on a Result, the new 
> implementation is faster, but if you call getValue() 100 times on the same 
> Result, the old implementation is faster.  My tests indicated about 10% 
> slower when calling getValue() 100 times, with an overall 1000x iteration 
> on 1000 different Result objects.  Considering most people use getValue() to 
> retrieve named columns, and iteration when the qualifier list is unknown, I 
> think this is a reasonable trade-off.
> Along with the new API, there is a recommendation to use raw() to get the 
> array of KeyValue objects for iteration.  This increases the visibility of 
> KeyValue, and is also much faster to iterate (4.9 times faster in my mini 
> benchmark: 100 columns per Result, repeated 1000 times on different Result 
> objects).  Given the major speed boost I recently got by changing YCSB to use 
> the raw() interface, I think this is a must-have for 0.90.
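
The binary-search lookup described above could be sketched roughly as follows. This is not the code from the attached patch: Cell is a hypothetical stand-in for KeyValue, the class and helper names are made up for illustration, and the array is assumed to be sorted by family, then qualifier, then newest timestamp first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SortedCellLookup {
    // Hypothetical stand-in for KeyValue; not the real HBase class.
    static final class Cell {
        final byte[] family, qualifier, value;
        final long timestamp;
        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = bytes(family);
            this.qualifier = bytes(qualifier);
            this.timestamp = timestamp;
            this.value = bytes(value);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Compares a cell's (family, qualifier) against the requested column.
    static int compareColumn(Cell c, byte[] family, byte[] qualifier) {
        int d = compareBytes(c.family, family);
        return d != 0 ? d : compareBytes(c.qualifier, qualifier);
    }

    // Lower-bound binary search: index of the first cell of the column, or -1.
    static int firstIndexOf(Cell[] cells, byte[] family, byte[] qualifier) {
        int lo = 0, hi = cells.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareColumn(cells[mid], family, qualifier) < 0) lo = mid + 1;
            else hi = mid;
        }
        boolean found = lo < cells.length
            && compareColumn(cells[lo], family, qualifier) == 0;
        return found ? lo : -1;
    }

    // All versions of one column, in array order (newest first by assumption).
    static List<Cell> getColumn(Cell[] cells, byte[] family, byte[] qualifier) {
        List<Cell> out = new ArrayList<>();
        int i = firstIndexOf(cells, family, qualifier);
        if (i < 0) return out;
        for (; i < cells.length
               && compareColumn(cells[i], family, qualifier) == 0; i++) {
            out.add(cells[i]);
        }
        return out;
    }

    // Newest version of one column, or null if the column is absent.
    static Cell getColumnLatest(Cell[] cells, byte[] family, byte[] qualifier) {
        int i = firstIndexOf(cells, family, qualifier);
        return i < 0 ? null : cells[i];
    }

    // Small, already-sorted sample standing in for a Result's kvs array.
    static Cell[] sample() {
        return new Cell[] {
            new Cell("cf", "a", 2L, "a2"),
            new Cell("cf", "a", 1L, "a1"),
            new Cell("cf", "b", 5L, "b5"),
            new Cell("cf2", "a", 1L, "x"),
        };
    }

    public static void main(String[] args) {
        Cell latest = getColumnLatest(sample(), bytes("cf"), bytes("a"));
        System.out.println(new String(latest.value, StandardCharsets.UTF_8));
    }
}
```

Each lookup costs one binary search over the array, with no per-Result map to build first. That may be why a handful of getValue() calls per Result comes out ahead while a hundred calls on the same Result does not, though the description only reports the measurements and does not give the cause.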

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
