I copied a 230GB file into my Hadoop cluster.  After my MR job kept failing, I 
tracked the error down to a single line of formatted text.

I copied the file back out of HDFS, and when I compare it to the original file 
there are about 20 bytes on one line (out of 230GB) that differ.
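
A comparison like this can be sketched in a few lines of Python. This is only 
an illustration, not the exact tool used here; the function name 
`diff_offsets` and the chunk size are assumptions made for the example.

```python
def diff_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offsets at which two files differ.

    Hypothetical helper for illustration: reads both files in fixed-size
    chunks so that a very large file never has to fit in memory.
    """
    offsets = []
    base = 0  # offset of the current chunk within the files
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Scan byte by byte only inside chunks that differ.
                # Slicing past the end yields b"", so a length mismatch
                # is reported as a difference as well.
                for i in range(max(len(a), len(b))):
                    if a[i:i + 1] != b[i:i + 1]:
                        offsets.append(base + i)
            base += max(len(a), len(b))
    return offsets
```

Calling `diff_offsets("original.txt", "copy_from_hdfs.txt")` (both paths are 
placeholders) returns a list of offsets, which makes it easy to see whether 
the damage is confined to one small region of the file.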

Is there no CRC or checksum verification when copying files into HDFS?

(Just to be clear, I copied the original file out of HDFS - not the output of 
my MR job.)



      
