[ https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917128#action_12917128 ]
ryan rawson commented on HBASE-3073:
------------------------------------

This does indeed fix HBASE-1937, and HBASE-2753 is in progress under its own commit. At that time we can make raw/sorted do the same thing (just return 'kvs').

This fixes HBASE-1937 by introducing getColumnLatest(), which returns a KeyValue on which the user can call getTimestamp(). Instead of creating Result API calls for every conceivable thing a user might want to do, let's just expose the KeyValue, which has a rich API for all sorts of things, such as getting the various fields (timestamp, qualifier, family, row, value) and comparison.

I need to do a bit more perf testing, and then I will clean up those javadocs.

> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
>                 Key: HBASE-3073
>                 URL: https://issues.apache.org/jira/browse/HBASE-3073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100924
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year. In
> the meantime, inefficiencies in the existing implementation have come to
> light, causing issues with benchmarks. Furthermore, some people are finding
> the API both difficult to use and not useful enough (see HBASE-1937).
> I propose the following new APIs:
>
>   public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
>   public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
>
> The implementation of these uses a binary search on the underlying kvs array
> (which is sorted). I also have new implementations for
>
>   public boolean containsColumn(byte [] family, byte [] qualifier);
>   public byte [] getValue(byte [] family, byte [] qualifier);
>
> which in the small case run faster, but in the big case seem to run a bit
> slower.
> That is, if you call getValue() 10 times for a Result, it will be faster with
> the new implementation, but if you call getValue() 100 times for the same
> Result, it is faster using the old implementation. My tests indicated about
> 10% slower on 'getValue' 100x with an overall 1000x iteration on 1000
> different Result objects. Considering most people use getValue() to retrieve
> named columns and iteration when the qualifier list is unknown, I think this
> is a reasonable trade-off.
>
> Along with the new API, there is a recommendation to use raw() to get the
> list of KeyValue objects for iteration. This increases the visibility of
> KeyValue, and is also much faster to iterate (4.9 times on my mini benchmark,
> 100 columns per Result, redone 1000 times on different Result objects).
> Given my recent major speed boost from changing YCSB to use the raw()
> interface, I think that this is a must-have for 0.90.
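For readers who want to see what "a binary search on the underlying kvs array" could look like, here is a minimal, self-contained Java sketch of getColumn() and getColumnLatest(). It is not the HBASE-3073 patch and does not use the real HBase classes: Cell is a hypothetical stand-in for KeyValue that uses String instead of byte[], and the sort order (family, then qualifier, then timestamp newest-first) is assumed to mirror KeyValue's ordering within a single row. Searching for a probe cell with timestamp Long.MAX_VALUE lands on the first (newest) version of the requested column, so getColumnLatest() only has to check that one slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ResultColumnSketch {

    // Hypothetical, simplified stand-in for KeyValue: one cell of one row.
    // Strings replace byte[] to keep the sketch short.
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Assumed ordering within a row: family asc, qualifier asc, timestamp DESC.
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());

    // Classic lower bound: index of the first cell that is not less than probe.
    static int lowerBound(Cell[] kvs, Cell probe) {
        int lo = 0, hi = kvs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ORDER.compare(kvs[mid], probe) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    static boolean matches(Cell c, String family, String qualifier) {
        return c.family.equals(family) && c.qualifier.equals(qualifier);
    }

    // Sketch of getColumn: every version of one column, newest first.
    static List<Cell> getColumn(Cell[] kvs, String family, String qualifier) {
        List<Cell> out = new ArrayList<>();
        int i = lowerBound(kvs, new Cell(family, qualifier, Long.MAX_VALUE, null));
        while (i < kvs.length && matches(kvs[i], family, qualifier)) {
            out.add(kvs[i++]);
        }
        return out;
    }

    // Sketch of getColumnLatest: the newest version, or null if the column is absent.
    static Cell getColumnLatest(Cell[] kvs, String family, String qualifier) {
        int i = lowerBound(kvs, new Cell(family, qualifier, Long.MAX_VALUE, null));
        return (i < kvs.length && matches(kvs[i], family, qualifier)) ? kvs[i] : null;
    }

    public static void main(String[] args) {
        Cell[] kvs = {
            new Cell("cf", "b", 200L, "b-new"),
            new Cell("cf", "a", 100L, "a-old"),
            new Cell("cf", "a", 300L, "a-new"),
            new Cell("cf", "c", 50L, "c-only"),
        };
        Arrays.sort(kvs, ORDER); // the sketch, like Result, relies on kvs being sorted
        Cell latest = getColumnLatest(kvs, "cf", "a");
        System.out.println(latest.value + " @ " + latest.timestamp); // a-new @ 300
        System.out.println(getColumn(kvs, "cf", "a").size());        // 2
        System.out.println(getColumnLatest(kvs, "cf", "zz"));        // null
    }
}
```

Each lookup in this sketch costs O(log n) comparisons rather than a linear scan, and the caller gets the whole cell back, so the timestamp asked for in HBASE-1937 comes for free. Walking the sorted array front to back is the sketch's analogue of iterating raw().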