I would compress at the DataOutputBuffer level -- in general, compressing bigger blocks is more efficient, since the compression algorithm has more room to find duplicates. Trying to stripe compression across buffers, on the other hand, would leave you with awkwardness whenever some of the data goes missing.
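To make the block-size point concrete, here is a rough sketch, not Chukwa code. It assumes java.util.zip.Deflater as a stand-in codec and a plain ByteArrayOutputStream in place of DataOutputBuffer; BlockSizeDemo, sampleChunks and the log lines are made up for illustration. It compares the total size when every chunk payload is deflated on its own against the size when the concatenated buffer is deflated once.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

class BlockSizeDemo {
    // Deflate one byte array and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] scratch = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(scratch);
        }
        deflater.end();
        return total;
    }

    // Hypothetical chunk payloads: short, similar-looking log lines.
    static List<byte[]> sampleChunks(int count) {
        List<byte[]> chunks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String line = "2012-07-28 20:19:" + (i % 60)
                    + " INFO datanode.DataNode: Received block blk_" + i + "\n";
            chunks.add(line.getBytes(StandardCharsets.UTF_8));
        }
        return chunks;
    }

    // Strategy A: compress each chunk's data separately and sum the sizes.
    static int perChunkSize(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += deflatedSize(chunk);
        }
        return total;
    }

    // Strategy B: concatenate all chunks into one buffer, compress once.
    static int wholeBufferSize(List<byte[]> chunks) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            buffer.write(chunk, 0, chunk.length);
        }
        return deflatedSize(buffer.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> chunks = sampleChunks(200);
        System.out.println("per-chunk total: " + perChunkSize(chunks) + " bytes");
        System.out.println("whole buffer:    " + wholeBufferSize(chunks) + " bytes");
    }
}
```

On repetitive input like this, the whole-buffer number should come out well below the per-chunk total, because each small chunk pays the stream overhead again and the compressor never gets to reuse what it saw in the previous chunk. Real Chunk data and a different codec will give different ratios, so treat this only as a way to measure the effect on your own data.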
I would start with the DataOutputBuffer strategy, since it's easy to do and not obviously the wrong thing -- if it seems to work satisfactorily, declare victory and contribute the patch.

On Sat, Jul 28, 2012 at 8:19 PM, Sourygna Luangsay <[email protected]> wrote:
> Hi Ari,
>
> Yes, we do need such feature for a project of us. So plan to develop it.
> When I come back from holidays I'll create a JIRA.
>
> Meanwhile, don't hesitate to tell me more if you have any idea of some
> interesting features linked with such compression, or any advice to
> implement it. For instance, I am not really sure right now at which level
> I should set the compression in the Chukwa Agent:
> - at the whole DataOutputBuffer level?
> - at the "data" field of every Chunk?
> - or at the adaptor level?
> (at first sight, the DataOutputBuffer level seems the easier to implement).
>
> Thanks,
>
> Sourygna

--
Ari Rabkin [email protected]
Princeton Computer Science Department
