[ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12452033 ]
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

> Do we still want to support crc files [ ...]

MapReduce data spends a lot of time in memory (while sorting) and on local 
disks.  Most checksum errors folks see are from local disks during sorting, not 
HDFS.  So, yes, we'll still need it.

And per-block checksums are different.  They're not end-to-end.  Currently we 
checksum the data as it is written to the output stream's buffer, and validate 
it as it is read from the input stream's buffer.  A lot can happen between then 
and when the data winds up in a DFS block.  To replace this we'd ideally want to 
still compute the checksum as it is written, transmit it along with the block 
to datanodes, then transmit it back to the client when the data is read, and 
verify as it is read.  We'd also need sub-block checksums, not per-block, so 
that one can seek without checksumming an entire block.  Yes, TCP does 
checksums, but memory errors can be introduced on either end outside of the TCP 
stack, and if blocks are temporarily stored on local disk, that disk can also 
corrupt them.  So getting rid of CRC files, even for HDFS, will take more than 
just per-block checksums on datanodes.
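To make the sub-block idea concrete, here is a minimal Python sketch, not Hadoop code. It assumes a hypothetical fixed chunk size (CHUNK_SIZE = 512 bytes) and uses zlib.crc32 as the checksum. The writer computes one CRC per chunk as data is produced; the checksums would then travel with the block, and the reader verifies only the chunks that overlap the byte range it seeks to, rather than the whole block. The names write_with_checksums, read_range and ChecksumError are illustrative only.

```python
import zlib
from typing import List

CHUNK_SIZE = 512  # hypothetical bytes-per-checksum, chosen for illustration


class ChecksumError(IOError):
    """Raised when a chunk's CRC does not match the stored checksum."""


def write_with_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    # Compute one CRC32 per fixed-size chunk, as the writer would while
    # filling its output buffer; the list is meant to accompany the block.
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_range(data: bytes, sums: List[int], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    # Return data[offset:offset + length], verifying only the chunks the
    # requested range touches, so a seek never checksums the whole block.
    if length <= 0 or not sums:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(sums) - 1)
    for idx in range(first, last + 1):
        chunk = data[idx * chunk_size:(idx + 1) * chunk_size]
        if zlib.crc32(chunk) != sums[idx]:
            raise ChecksumError(f"checksum error in chunk {idx}")
    return data[offset:offset + length]
```

With this layout, corruption in one chunk is only reported when a read actually covers that chunk. Reads elsewhere in the block still succeed without paying to verify data they never touch.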

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files 
> including crc files are also copied. When we -put or -copyFromLocal again, 
> since the crc files already exist on DFS, this put fails. The solution is not 
> to copy checksum files when copying to local. Patch is forthcoming.
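The fix described in the issue (skip checksum files when copying to local) can be sketched as a recursive copy that filters them out. This is a minimal Python illustration over two local directories, not the actual patch and not any Hadoop API. It assumes checksum side files follow the ".<name>.crc" naming convention; is_checksum_file and copy_to_local are hypothetical names.

```python
import os
import shutil


def is_checksum_file(name: str) -> bool:
    # Assumption: checksum side files are hidden files named ".<original>.crc".
    return name.startswith(".") and name.endswith(".crc")


def copy_to_local(src_dir: str, dst_dir: str) -> None:
    # Recursively copy src_dir to dst_dir, skipping checksum side files so a
    # later put/copyFromLocal of dst_dir does not collide with existing .crc files.
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            if is_checksum_file(name):
                continue  # do not copy the checksum file itself
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))
```

Filtering by name keeps the copy loop simple; the checksums are still available on the source side, where they are needed for verification.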

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira