Thanks, Stack, for your reply. I will work on this and put up a patch soon...

-Anoop-
________________________________________
From: [email protected] [[email protected]] on behalf of Stack 
[[email protected]]
Sent: Saturday, May 12, 2012 10:08 AM
To: [email protected]
Subject: Re: Usage of block encoding in bulk loading

On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <[email protected]> wrote:
> Hi Devs
>              When data is bulk loaded using HFileOutputFormat, I think we are 
> not using the block encoding or the HBase-handled checksum features. 
> When the writer is created for making the HFile, I do not see any such 
> info being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't 
> have this info and do not pass it to the writer either... So those HFiles will 
> not have these optimizations.
>
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically divide 
> one HFile (created by the MR job) if it cannot belong to just one region, I can 
> see we pass the data block encoding details and checksum details to the new 
> HFile writer. But this step won't happen in the normal case, I think.
>
> Please correct me if my understanding is wrong...
>

Sounds plausible, Anoop.  Sounds like something worth fixing too?

Good on you,
St.Ack
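
[Editor's note] The gap Anoop describes is that getNewWriter() has only the
family name and the job Configuration in hand, so any per-family data block
encoding choice would have to travel through the Configuration, the way
HFileOutputFormat already serializes per-family compression. The sketch below
is a minimal, self-contained illustration of that serialize/parse round trip
in plain Java; the configuration key name and the String representation of the
encoding are assumptions for illustration, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class FamilyEncodingConf {

    // Hypothetical configuration key, modeled on the existing per-family
    // compression key; the real fix may name it differently.
    static final String ENCODING_CONF_KEY =
        "hbase.hfileoutputformat.families.datablock.encoding";

    // Serialize family -> encoding pairs into one config value,
    // e.g. {"cf1" -> "PREFIX", "cf2" -> "NONE"} becomes "cf1=PREFIX&cf2=NONE".
    static String serialize(Map<String, String> familyToEncoding) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : familyToEncoding.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Parse the value back on the getNewWriter() side; a family with no
    // entry would fall back to no encoding, matching current behaviour.
    static Map<String, String> deserialize(String confValue) {
        Map<String, String> map = new HashMap<>();
        if (confValue == null || confValue.isEmpty()) return map;
        for (String pair : confValue.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) map.put(kv[0], kv[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> families = new HashMap<>();
        families.put("cf1", "PREFIX");
        families.put("cf2", "NONE");
        String value = serialize(families);
        System.out.println(deserialize(value).get("cf1")); // PREFIX
    }
}
```

In the real code path, configureIncrementalLoad() would write this value from
the table descriptor and getNewWriter() would read it and hand the encoding
(and similarly the checksum settings) to the writer builder.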
