I was wondering if anyone else out there would like to use HBase to store data that does not need random access, just insert/delete/scan. If we could support a table like this, it would require little to no memory but still allow sorted, scannable, updatable data to be stored in HBase without the need to keep an index of keys in memory.
We would still use memory for inserts held in the memcache, but there would be no key index in memory.

This would allow large datasets that do not need random access to be stored, and scans would still see new/live data without anyone having to manually merge/sort the data on disk before updates become visible.
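
To make the idea concrete, here is a rough sketch in Python. It is only a toy model and not HBase code: the class name ScanOnlyTable, the flush_size parameter, and the use of in-memory lists as stand-ins for sorted files on disk are all assumptions made for illustration. The point it tries to show is that a scan can merge the memcache with the sorted files in one pass, so no per-key index has to be held in memory.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy model of the proposal; none of these names come from HBase.
Row = Tuple[str, Optional[str]]  # a value of None marks a delete


class ScanOnlyTable:
    def __init__(self, flush_size: int = 3) -> None:
        self.memcache: Dict[str, Optional[str]] = {}  # recent writes only
        self.files: List[List[Row]] = []  # stand-ins for sorted files, oldest first
        self.flush_size = flush_size

    def insert(self, key: str, value: Optional[str]) -> None:
        self.memcache[key] = value
        if len(self.memcache) >= self.flush_size:
            self.flush()

    def delete(self, key: str) -> None:
        # A delete is just a marker that hides older values during a scan.
        self.insert(key, None)

    def flush(self) -> None:
        # Write the memcache out as one sorted file and start a fresh one.
        if self.memcache:
            self.files.append(sorted(self.memcache.items()))
            self.memcache = {}

    def scan(self) -> Iterator[Tuple[str, str]]:
        # Newest source first: the memcache, then files from newest to oldest.
        sources = [sorted(self.memcache.items())] + self.files[::-1]
        # Tag rows with the source age so the newest version of a key sorts first.
        tagged = [[(k, age, v) for k, v in src] for age, src in enumerate(sources)]
        last_key = None
        for key, _age, value in heapq.merge(*tagged):
            if key == last_key:
                continue  # an older version of a key already handled
            last_key = key
            if value is not None:
                yield key, value
```

In this sketch there is no get(key) at all: the only read path is a sorted scan, which is the trade-off being proposed. A real implementation would stream the files from disk rather than hold them in lists.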

I have a large amount of data coming in that needs to be expired over time. I store it in Hadoop and run MR jobs over it to produce an accessible index of the data in HBase. Ideally, I could import that data into HBase directly and then access subsets of it without having to read all of the data to find what I am looking for. With this, HBase could merge/sort/expire/split the data as needed and still give access to newly inserted data.

This might take some memory on the master node, but I would not think there would be a limit on the size of the data other than the Hadoop storage size.
Does anyone else think they could use something like this?

Billy Pearson


