Thamizh,
For a much older project I wrote a demo tool that computes the
Hadoop-style checksum locally:

https://github.com/jpatanooga/IvoryMonkey

The checksum generator is a single-threaded replica of Hadoop's
internal distributed hash-checksum mechanism.

What it's actually doing is saving the CRC32 of every 512 bytes within
each block and then computing an MD5 hash over those CRCs. When the
"getFileChecksum()" method is called, each block of the file sends its
MD5 hash to a collector, where they are gathered together and a final
MD5 hash is calculated over all of the block hashes.
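
For reference, that server-side result is what the stock FileSystem API
exposes. A minimal sketch of asking HDFS for it directly (the path
argument here is just a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ask HDFS for its MD5-of-MD5-of-CRC32 checksum of a file.
public class PrintHdfsChecksum {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Returns an MD5MD5CRC32FileChecksum on HDFS (may be null on
        // filesystems that don't support checksums).
        FileChecksum sum = fs.getFileChecksum(new Path(args[0]));
        System.out.println(sum.getAlgorithmName() + " -> " + sum);
    }
}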

My version includes code that can calculate the hash on the client
side: it breaks the data up the same way HDFS does and calculates the
checksum the same way, so the results match.
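
If it helps, here is a rough standalone sketch of that layered scheme.
This is not the actual IvoryMonkey code: it hardcodes 512 bytes per CRC
and a 64 MB block size, whereas real HDFS takes both from configuration,
so the output will only match a cluster that uses those same settings.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.zip.CRC32;

// Sketch of MD5( per-block MD5( per-512-byte CRC32s ) ).
public class Md5Md5Crc32Sketch {
    static final int BYTES_PER_CRC = 512;           // assumed chunk size
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // assumed block size

    public static byte[] checksum(InputStream in) throws Exception {
        MessageDigest fileMd5 = MessageDigest.getInstance("MD5");
        MessageDigest blockMd5 = MessageDigest.getInstance("MD5");
        byte[] chunk = new byte[BYTES_PER_CRC];
        long bytesInBlock = 0;
        int n;
        while ((n = readFully(in, chunk)) > 0) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, n);
            // fold each chunk's CRC in as a 4-byte big-endian int
            blockMd5.update(ByteBuffer.allocate(4)
                    .putInt((int) crc.getValue()).array());
            bytesInBlock += n;
            if (bytesInBlock >= BLOCK_SIZE) {
                // block boundary: the block's MD5 feeds the file-level MD5
                fileMd5.update(blockMd5.digest()); // digest() also resets
                bytesInBlock = 0;
            }
        }
        if (bytesInBlock > 0) {
            fileMd5.update(blockMd5.digest()); // trailing partial block
        }
        return fileMd5.digest(); // MD5 over all of the block MD5s
    }

    // InputStream.read() may return short counts, so fill the chunk fully.
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int r = in.read(buf, off, buf.length - off);
            if (r < 0) break;
            off += r;
        }
        return off;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (byte b : checksum(in)) System.out.printf("%02x", b);
            System.out.println();
        }
    }
}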

During development, we also discovered and filed:

https://issues.apache.org/jira/browse/HDFS-772

To invoke this method, use my shell wrapper:

https://github.com/jpatanooga/IvoryMonkey/blob/master/src/tv/floe/IvoryMonkey/hadoop/fs/Shell.java

Hope this provides some reference information for you.

On Sat, Apr 9, 2011 at 10:38 AM, Thamizh <tceg...@yahoo.co.in> wrote:
> Hi Harsh ,
> Thanks a lot for your reference.
> I am looking forward to knowing how Hadoop computes the CRC for a
> file. If you have a reference, please share it with me. It would be a
> great help to me.
>
> Regards,
>
>  Thamizhannal P
>
> --- On Sat, 9/4/11, Harsh J <ha...@cloudera.com> wrote:
>
> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Reg HDFS checksum
> To: common-user@hadoop.apache.org
> Date: Saturday, 9 April, 2011, 3:20 PM
>
> Hello Thamizh,
>
> Perhaps the discussion in the following link can shed some light on
> this: http://getsatisfaction.com/cloudera/topics/hadoop_fs_crc
>
> On Fri, Apr 8, 2011 at 5:47 PM, Thamizh <tceg...@yahoo.co.in> wrote:
>> Hi All,
>>
>> This is a question regarding "HDFS checksum" computation.
>
> --
> Harsh J
>



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv
