If LOB means data larger than 10 MB or even 100 MB, why not just use a FileSystem instead of HBase? A FileSystem already has a stream interface...
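To illustrate what "stream interface" buys you here, below is a minimal sketch of copying a large object through a fixed-size buffer, so memory use stays bounded no matter how big the object is. It uses plain java.nio.file as a stand-in for whatever FileSystem you would actually use (an assumption made only to keep the example self-contained); the class name StreamCopySketch and the 64 KB buffer size are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of HBase or Hadoop.
class StreamCopySketch {

    // Copies everything from `in` to `out` through one reusable buffer.
    // Only `bufferSize` bytes are held in memory at any time.
    // Returns the total number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Streams a file of arbitrary size from `source` to `target`.
    // The 64 KB buffer size is an arbitrary choice for this sketch.
    static long copyFile(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out, 64 * 1024);
        }
    }
}
```

The point is only that the caller never needs a byte[] the size of the whole object, which is the memory limitation discussed in the quoted mail.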
2015-03-09 10:55 GMT+08:00 Wilm Schumacher <wilm.schumac...@gmail.com>:
> Hi,
>
> I have an idea for a feature in HBase which derives directly from the
> idea of the MOB feature. As Jonathan Hsieh pointed out, the only thing
> limiting the feature to MOBs instead of LOBs is the memory
> allocation on the client and server side. However, the "LOB feature" would
> be very handy for me and, I think, for some other users, too. Furthermore,
> the problem of fetching small files fast could be solved.
>
> The natural solution would be a "BigPut" and a "BigGet" class which
> address that problem and are capable of dealing with large amounts
> of data without using too much memory. My plan for now is to create
> classes that do e.g.
>     BigPut BigPut.add( byte[] , byte[] , inputstream )
> and
>     outputstream BigResult.value( byte[] , byte[] )
> (in addition to the normal byte[] to byte[] member functions)
>
> and pass the inputstreams through the AsyncProcess class to the RPC or,
> in reverse, the outputstream for the BigResult class. With this plan the
> client and server would have to spawn some threads to deal with
> multiple streams[1].
>
> So far I have dug into the hbase-client (2.0.0) sources and I think that my
> plan would be quite invasive to the existing code ... but is doable.
> However, given the very open development model of HBase features, I
> think it could be addressed.
>
> But I'm veeeery new to HBase development and have just started to read the
> source. Before I dig too deep into the problem I wanted to ask here if
> there is any show stopper I'm missing so far.
> To make a list of questions for that feature:
> * As this plan probably won't break the thread model of the
>   hbase-client, is there any problem on the (region) server side? Or is
>   there any blocking/race condition problem elsewhere that I'm missing?
> * Is it a bad plan to pump several 100s of MB through one RPC in a
>   separate thread? If yes ... why?
> * Are there any other fundamental problems I'm missing which make
>   this a horrible plan?
> * Is there already some development ongoing? I didn't find anything on JIRA.
>   But that doesn't mean anything :/
> * Does anyone have a better name than "BigPut" :D?
>
> And at last:
> * Is it a better plan to create a separate "MOB/LOB service"?[2]
>
> Best wishes
>
> Wilm
>
> [1] Or one could limit the number of streams to one. That would make the
> threading problem much simpler to handle, as only one
> "RPC" would be necessary.
>
> [2] On one hand it is easier to bear LOBs in mind if you create a
> service, e.g. with a REST interface (multipart data etc.); on the other
> hand you have to reinvent the wheel (compaction etc.).
>
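For discussion, here is a rough sketch of how the proposed BigPut.add( byte[] , byte[] , inputstream ) could behave on the client side: the value is read from the stream in bounded chunks and each chunk is handed on, instead of materializing the whole value as one byte[]. Everything here is hypothetical and nothing is taken from the hbase-client sources: the class name BigPutSketch, the sendTo method, and the Consumer that stands in for the RPC layer are all assumptions for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed "BigPut"; not an existing HBase class.
class BigPutSketch {
    private final byte[] row;
    private byte[] family;
    private byte[] qualifier;
    private InputStream value;

    BigPutSketch(byte[] row) {
        this.row = row;
    }

    // Mirrors the proposed add( byte[] , byte[] , inputstream ) signature.
    BigPutSketch add(byte[] family, byte[] qualifier, InputStream value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
        return this;
    }

    // Drains the value stream in chunks of at most `chunkSize` bytes and
    // passes each chunk to `sink`, which stands in for the RPC layer
    // (an assumption). Returns the total number of bytes sent.
    long sendTo(Consumer<byte[]> sink, int chunkSize) throws IOException {
        if (value == null) {
            throw new IllegalStateException("no value stream added");
        }
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = value.read(buffer)) != -1) {
            // Copy so the sink may keep the chunk while the buffer is reused.
            sink.accept(Arrays.copyOf(buffer, n));
            total += n;
        }
        return total;
    }
}
```

A caller would write something like new BigPutSketch(row).add(family, qualifier, in).sendTo(rpc, 1 << 20). Whether one long-lived RPC or many chunked calls is the better fit on the region server side is exactly the open question raised in the list above.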