Also, don't break it :-) Part of the goal of HFile was to build something quick and reliable. It can be hard to know you have all the corner cases down, and you don't want to find out in 6 months that every single piece of data you have put in HBase is corrupt. Keeping it simple is one strategy.
I have previously thought about prefix compression, and it seemed doable. You'd need a compression algorithm, and then in the Scanner you would expand KeyValues, so callers would end up with copies of, not views on, the original data. The JVM is fairly good about short-lived objects (up to a certain allocation rate, that is), and while the original goal was to reduce memory usage, it could make sense to take a higher short-term allocation rate if the wins from prefix compression are there.

Also note that in whole-system profiling, frequently called methods in KeyValue do show up. The goal of KeyValue was to have a format that didn't require deserialization into larger data structures (hence the lack of vint), and that would be simple and fast. Undoing that work should be accompanied by profiling evidence that new slowdowns were not introduced.

-ryan

On Sat, Jun 4, 2011 at 3:30 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> You'd have to change how the Scanner code works, etc. You'll find out.
>
> Nice! Sounds fun.
>
> On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> What are the specs/goals of a pluggable block index? Right now the
>> block index is fairly tied deep in how HFile works. You'd have to
>> change how the Scanner code works, etc. You'll find out.
>>
>> On Sat, Jun 4, 2011 at 3:17 PM, Stack <saint....@gmail.com> wrote:
>>> I do not know of one. FYI hfile is pretty standalone regards tests etc.
>>> There is even a perf testing class for hfile
>>>
>>> On Jun 4, 2011, at 14:44, Jason Rutherglen <jason.rutherg...@gmail.com>
>>> wrote:
>>>
>>>> I want to take a wh/hack at creating a pluggable block index, is there
>>>> an open issue for this? I looked and couldn't find one.
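
For reference, a rough sketch of the prefix compression idea from the reply above. This is not HBase code: the class and method names (PrefixCompressionSketch, encode, decode) are made up for illustration, and the on-disk layout (a fixed 4-byte shared-prefix length, a fixed 4-byte suffix length, then the suffix bytes) is only an assumption, chosen to avoid vints. It shows the trade-off mentioned above: each decoded key is a freshly allocated copy, not a view on the block.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
final class PrefixCompressionSketch {

    // Number of leading bytes that a and b have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Encode keys (assumed already sorted) as:
    // [shared prefix length: int][suffix length: int][suffix bytes]
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] prev = new byte[0];
        try {
            for (byte[] key : sortedKeys) {
                int shared = commonPrefix(prev, key);
                out.writeInt(shared);
                out.writeInt(key.length - shared);
                out.write(key, shared, key.length - shared);
                prev = key;
            }
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Expand the block back into full keys. Every key is a new array,
    // so callers get copies rather than views on the block.
    static List<byte[]> decode(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        while (in.hasRemaining()) {
            int shared = in.getInt();
            int suffixLen = in.getInt();
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
            "user0001/info:email".getBytes(StandardCharsets.UTF_8),
            "user0001/info:name".getBytes(StandardCharsets.UTF_8),
            "user0002/info:email".getBytes(StandardCharsets.UTF_8));
        byte[] block = encode(keys);
        for (byte[] key : decode(block)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

With fixed-width lengths, the encoding only saves space when consecutive keys share long prefixes, and the per-key allocation in decode is exactly the cost that would need to be profiled.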