[jira] Commented: (HBASE-3073) New APIs for Result, faster implementation for some calls

ryan rawson (JIRA) Fri, 01 Oct 2010 17:35:55 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917128#action_12917128 ]


ryan rawson commented on HBASE-3073:
------------------------------------

This does indeed fix HBASE-1937, and HBASE-2753 is in progress under its own 
commit.  Once that is committed we can make raw/sorted do the same thing (just 
return 'kvs').

This fixes HBASE-1937 by introducing getColumnLatest(), which returns a 
KeyValue that the user can call getTimestamp() on.  Instead of creating Result 
API calls for every conceivable thing a user might want to do, let's just 
expose the KeyValue, which has a rich API for doing all sorts of things, such as 
getting various fields (timestamp, qualifier, family, row, value) and comparison.
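
As a rough illustration of the idea (not the code in the attached patch), the 
sketch below shows how a lookup like getColumnLatest() can binary-search a 
sorted array.  Cell and ResultSketch are hypothetical stand-ins for KeyValue 
and Result, and the sort order (family, then qualifier, then newest timestamp 
first) is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for KeyValue; carries only the fields used here. */
final class Cell {
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;
    final byte[] value;

    Cell(byte[] family, byte[] qualifier, long timestamp, byte[] value) {
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Hypothetical stand-in for Result, holding one row's cells in sorted order. */
final class ResultSketch {
    // Assumed order: family, then qualifier (unsigned byte order),
    // then timestamp descending so the newest version comes first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = Arrays.compareUnsigned(a.family, b.family);
        if (c != 0) return c;
        c = Arrays.compareUnsigned(a.qualifier, b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    private final Cell[] kvs;

    ResultSketch(Cell[] cells) {
        this.kvs = cells.clone();
        Arrays.sort(this.kvs, ORDER);
    }

    // Compares a cell's column (family + qualifier) against the target column.
    private static int compareColumn(Cell c, byte[] family, byte[] qualifier) {
        int cmp = Arrays.compareUnsigned(c.family, family);
        return cmp != 0 ? cmp : Arrays.compareUnsigned(c.qualifier, qualifier);
    }

    // Binary search for the first index whose column is >= the target column.
    private int lowerBound(byte[] family, byte[] qualifier) {
        int lo = 0;
        int hi = kvs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareColumn(kvs[mid], family, qualifier) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Newest cell for the column, or null if the column is absent. */
    Cell getColumnLatest(byte[] family, byte[] qualifier) {
        int i = lowerBound(family, qualifier);
        boolean found = i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0;
        return found ? kvs[i] : null;
    }

    /** All versions of the column, newest first; empty if the column is absent. */
    List<Cell> getColumn(byte[] family, byte[] qualifier) {
        List<Cell> out = new ArrayList<>();
        for (int i = lowerBound(family, qualifier);
             i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0;
             i++) {
            out.add(kvs[i]);
        }
        return out;
    }
}
```

With something of this shape, a caller who wants the timestamp reads it off the 
returned object (cell.timestamp in the sketch, getTimestamp() on a real 
KeyValue) rather than needing a dedicated Result method for it.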

I need to do a bit more perf testing and then I will clean up those javadocs.  

> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
>                 Key: HBASE-3073
>                 URL: https://issues.apache.org/jira/browse/HBASE-3073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100924
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year.  In 
> the meantime, inefficiencies in the existing implementation have come to 
> light, causing issues with benchmarks.  Furthermore, some people are finding 
> the API both difficult to use and not useful enough (see HBASE-1937).
> I propose the following new APIs:
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
> The implementations of these use a binary search on the underlying kvs array 
> (which is sorted).  I also have new implementations for
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
> These run faster in the small case, but in the big case seem to run a bit 
> slower.  That is, if you call getValue() 10 times on a Result it will be 
> faster with the new implementation, but if you call getValue() 100 times on 
> the same Result the old implementation is faster.  My tests 
> indicated about 10% slower on 'getValue' 100x with an overall 1000x iteration 
> on 1000 different Result objects.  Considering most people use getValue() to 
> retrieve named columns, and iteration when the qualifier list is unknown, I 
> think this is a reasonable trade-off.
> Along with the new API, there is a recommendation to use raw() to get the 
> list of KeyValue objects for iteration.  This increases the visibility of 
> KeyValue, and is also much faster to iterate (4.9 times faster in my mini 
> benchmark: 100 columns per Result, repeated 1000 times on different Result 
> objects).  Given the major speed boost I recently got by changing YCSB to use 
> the raw() interface, I think that this is a must-have for 0.90.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
