[ https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918365#action_12918365 ]
HBase Review Board commented on HBASE-3073:
-------------------------------------------

Message from: "Ryan Rawson" <ryano...@gmail.com>

bq. On 2010-10-05 16:25:58, Ryan Rawson wrote:
bq. > trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java, line 202
bq. > <http://review.cloudera.org/r/963/diff/1/?file=14002#file14002line202>
bq. >
bq. > after https://issues.apache.org/jira/browse/HBASE-2753 is resolved, raw()===sort() and there won't be a difference.
bq.
bq. Jonathan Gray wrote:
bq. Okay. Just saying, weird to use a deprecated method internally but agree that it doesn't matter if you're planning to remove the assert.

The code calls raw() now; hopefully our asserts don't trigger.

- Ryan

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/963/#review1428
-----------------------------------------------------------

> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
> Key: HBASE-3073
> URL: https://issues.apache.org/jira/browse/HBASE-3073
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.89.20100924
> Reporter: ryan rawson
> Assignee: ryan rawson
> Fix For: 0.90.0
>
> Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year. In
> the meantime, inefficiencies in the existing implementation have come to
> light, causing issues with benchmarks. Furthermore, some people are finding
> the API both difficult to use as well as not useful enough (See: HBASE-1937).
> I propose the following new APIs:
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
> These are implemented with a binary search on the underlying kvs array
> (which is sorted).
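For readers following the binary-search point, here is a minimal, self-contained sketch of how a column lookup over a sorted array could work. It is not the code from HBASE-3073.txt. ResultSketch, Cell, bytes() and firstIndexOf() are hypothetical names invented for illustration, and the sketch assumes the cells of a single row are sorted by family, then qualifier, then newest timestamp first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for Result; not the HBase implementation. */
final class ResultSketch {
    /** Hypothetical stand-in for KeyValue: one versioned cell of a single row. */
    static final class Cell {
        final byte[] family, qualifier, value;
        final long timestamp;
        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = bytes(family);
            this.qualifier = bytes(qualifier);
            this.timestamp = timestamp;
            this.value = bytes(value);
        }
    }

    // Assumed order: family, then qualifier, then newest timestamp first.
    private final Cell[] kvs;

    ResultSketch(Cell... cells) {
        kvs = cells.clone();
        Arrays.sort(kvs, (a, b) -> {
            int c = compareColumn(a, b.family, b.qualifier);
            return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
        });
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    private static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    private static int compareColumn(Cell c, byte[] family, byte[] qualifier) {
        int d = compareBytes(c.family, family);
        return d != 0 ? d : compareBytes(c.qualifier, qualifier);
    }

    /** Binary search for the first cell whose column is >= the requested one. */
    private int firstIndexOf(byte[] family, byte[] qualifier) {
        int lo = 0, hi = kvs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareColumn(kvs[mid], family, qualifier) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    /** All versions of the column, newest first; empty if the column is absent. */
    List<Cell> getColumn(byte[] family, byte[] qualifier) {
        List<Cell> out = new ArrayList<>();
        for (int i = firstIndexOf(family, qualifier);
             i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0; i++) {
            out.add(kvs[i]);
        }
        return out;
    }

    /** Newest version of the column, or null if the column is absent. */
    Cell getColumnLatest(byte[] family, byte[] qualifier) {
        int i = firstIndexOf(family, qualifier);
        return i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0 ? kvs[i] : null;
    }

    public static void main(String[] args) {
        ResultSketch r = new ResultSketch(
            new Cell("cf", "a", 1L, "old"),
            new Cell("cf", "a", 5L, "new"),
            new Cell("cf", "b", 2L, "x"));
        System.out.println(r.getColumn(bytes("cf"), bytes("a")).size());
        System.out.println(new String(
            r.getColumnLatest(bytes("cf"), bytes("a")).value, StandardCharsets.UTF_8));
    }
}
```

The search is a lower bound and not an exact-match lookup, so the newest version is the first hit and the remaining versions follow it contiguously. That lets both methods share one O(log n) search.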
> I also have new implementations for
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
> These run faster in the small case, but seem to run a bit slower in the big
> case. That is, if you call getValue() 10 times for a Result it will be
> faster with the new implementation, but if you call getValue() 100 times for
> the same Result it is faster using the old implementation. My tests
> indicated about 10% slower on 'getValue' 100x with an overall 1000x iteration
> on 1000 different Result objects. Considering most people use getValue() to
> retrieve named columns, and iteration when the qualifier list is unknown, I
> think this is a reasonable trade-off.
> Along with the new API, there is a recommendation to use raw() to get the
> list of KeyValue objects for iteration. This increases the visibility of
> KeyValue, and is also much faster to iterate (4.9 times faster on my mini
> benchmark, 100 columns per Result, redone 1000 times on different Result
> objects).
> Given my recent major speed boost from changing YCSB to use the raw()
> interface, I think that this is a must-have for 0.90.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.