[ https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ryan rawson updated HBASE-3073:
-------------------------------

    Attachment:     (was: HBASE-3073.txt)

> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
>                 Key: HBASE-3073
>                 URL: https://issues.apache.org/jira/browse/HBASE-3073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100924
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year. In
> the meantime, inefficiencies in the existing implementation have come to
> light, causing issues with benchmarks. Furthermore, some people are finding
> the API both difficult to use and not useful enough (see HBASE-1937).
>
> I propose the following new APIs:
>
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
>
> The implementations of these use a binary search on the underlying kvs array
> (which is sorted). I also have new implementations for
>
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
>
> which run faster in the small case but seem to run a bit slower in the big
> case. That is, if you call getValue() 10 times for a Result it will be
> faster with the new implementation, but if you call getValue() 100 times for
> the same Result it is faster using the old implementation. My tests
> indicated about 10% slower on 'getValue' 100x with an overall 1000x iteration
> on 1000 different Result objects. Considering most people use getValue() to
> retrieve named columns, and iteration when the qualifier list is unknown, I
> think this is a reasonable trade-off.
>
> Along with the new API, there is a recommendation to use raw() to get the
> list of KeyValue objects for iteration. This increases the visibility of
> KeyValue, and is also much faster to iterate (4.9 times faster on my mini
> benchmark, 100 columns per Result, redone 1000 times on different Result
> objects).
>
> Given my recent major speed boost from changing YCSB to use the raw()
> interface, I think that this is a must-have for 0.90.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
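
For readers following the proposal, here is a minimal sketch of how getColumn() and getColumnLatest() could be built on a binary search over a sorted array, as the description outlines. It is not the attached patch and does not use HBase classes: ResultSketch, Cell, lowerBound, compareColumn, compareBytes and bytes are hypothetical names introduced for illustration, and the sort order (family, then qualifier, then newest timestamp first) is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative stand-in for Result; NOT the HBase class or the attached patch. */
final class ResultSketch {

    /** Hypothetical simplified stand-in for KeyValue. */
    static final class Cell {
        final byte[] family;
        final byte[] qualifier;
        final long timestamp;
        final byte[] value;

        Cell(byte[] family, byte[] qualifier, long timestamp, byte[] value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Assumed order: family, then qualifier, then newest timestamp first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = compareBytes(a.family, b.family);
        if (c != 0) return c;
        c = compareBytes(a.qualifier, b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    private final Cell[] kvs; // kept sorted by ORDER

    ResultSketch(List<Cell> cells) {
        kvs = cells.toArray(new Cell[0]);
        Arrays.sort(kvs, ORDER);
    }

    /** Sorted backing array, for direct iteration without per-call lookups. */
    Cell[] raw() {
        return kvs;
    }

    /** All versions of one column, newest first; empty list if absent. */
    List<Cell> getColumn(byte[] family, byte[] qualifier) {
        List<Cell> out = new ArrayList<>();
        for (int i = lowerBound(family, qualifier);
             i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0; i++) {
            out.add(kvs[i]);
        }
        return out;
    }

    /** Newest version of one column, or null if the column is absent. */
    Cell getColumnLatest(byte[] family, byte[] qualifier) {
        int i = lowerBound(family, qualifier);
        return (i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0) ? kvs[i] : null;
    }

    // Binary search: index of the first cell whose column is >= (family, qualifier).
    private int lowerBound(byte[] family, byte[] qualifier) {
        int lo = 0, hi = kvs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareColumn(kvs[mid], family, qualifier) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    private static int compareColumn(Cell c, byte[] family, byte[] qualifier) {
        int r = compareBytes(c.family, family);
        return r != 0 ? r : compareBytes(c.qualifier, qualifier);
    }

    // Unsigned lexicographic byte comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

In this sketch a named lookup costs one binary search per call, while a caller that wants every cell can loop over raw() once, which is the access pattern the description recommends for iteration.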