[ 
https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172006#comment-13172006
 ] 

Todd Lipcon commented on HDFS-2699:
-----------------------------------

bq. Modifying any portion of that region will require that the entire data for 
the region be read in, and the CRC recomputed for that entire region and the 
entire region written out again

But the cost of a random 4K read is essentially the same as the cost of a 
random 512-byte read. Once you seek to the offset, the data transfer time is 
insignificant.
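
As a rough illustration of that point, here is a back-of-the-envelope model 
of one random read on a spinning disk. The 8 ms seek, 7200 RPM, and 100 MB/s 
figures are assumed round numbers, not measurements from this issue, and 
random_read_ms is a hypothetical helper.

```python
def random_read_ms(nbytes, seek_ms=8.0, rpm=7200, transfer_mb_per_s=100.0):
    """Estimate the time for one random read, in milliseconds.

    All default parameters are illustrative assumptions.
    """
    # On average the platter must turn half a revolution before the
    # target sector passes under the head.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to stream the bytes once the head is positioned.
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms


t512 = random_read_ms(512)
t4k = random_read_ms(4096)
print(f"512 B: {t512:.3f} ms, 4 KB: {t4k:.3f} ms, ratio: {t4k / t512:.4f}")
```

Under these assumptions the 4K read costs well under 1% more than the 
512-byte read, because positioning time dominates.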

Plus, given the 4KB page size used by Linux, all IO is already at this 
granularity.
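
To make the granularity point concrete, this small sketch rounds a byte range 
out to the pages that cover it. It assumes buffered I/O through the page cache 
with a 4096-byte page; page_aligned_range is a hypothetical helper, not an 
HDFS or kernel API.

```python
def page_aligned_range(offset, length, page_size=4096):
    """Return (start, end) of the page-aligned span covering the request.

    Assumes length > 0. `end` is exclusive.
    """
    start = (offset // page_size) * page_size
    # Ceiling division to round the end of the request up to a page boundary.
    end = -(-(offset + length) // page_size) * page_size
    return start, end


# A 512-byte read at offset 5000 still touches one whole 4K page.
print(page_aligned_range(5000, 512))
```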

bq. An append happens a few days later to extend the file from 9K to 11K. CRC3 
is now recomputed for the 3K-sized region spanning offsets 8K-11K and written 
out as CRC3-new. But there is a crash...

This is an existing issue regardless of whether the checksums are interleaved 
or separate. The current solution is that we allow a checksum error on the last 
"checksum chunk" of a file when it is being recovered after a crash (iirc, 
only when _all_ replicas have this issue). If there is any valid replica, then 
we use that and truncate/rollback the other files to the sync boundary.
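
The rule described above could be sketched roughly as follows. This is only 
an illustration of the comment's description, not the actual HDFS recovery 
code; Replica, plan_recovery, and the returned dictionary layout are all 
hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    last_chunk_ok: bool  # does the last checksum chunk verify?


def plan_recovery(replicas, sync_boundary):
    """Decide which replicas to trust after a crash during append."""
    good = [r.name for r in replicas if r.last_chunk_ok]
    bad = [r.name for r in replicas if not r.last_chunk_ok]
    if good:
        # At least one replica is valid: use it, and roll the others
        # back to the last sync boundary.
        return {
            "trusted": good,
            "truncate_to": {name: sync_boundary for name in bad},
            "tolerate_bad_last_chunk": False,
        }
    # Every replica has a bad last chunk: allow the checksum error there.
    return {
        "trusted": bad,
        "truncate_to": {},
        "tolerate_bad_last_chunk": True,
    }


mixed = plan_recovery([Replica("r1", True), Replica("r2", False)], 8192)
all_bad = plan_recovery([Replica("r1", False), Replica("r2", False)], 8192)
print(mixed)
print(all_bad)
```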

                
> Store data and checksums together in block file
> -----------------------------------------------
>
>                 Key: HDFS-2699
>                 URL: https://issues.apache.org/jira/browse/HDFS-2699
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the 
> metadata (checksum) in another block file. This means that every read from 
> HDFS actually consumes two disk iops, one to the data file and one to the 
> checksum file. This is a major problem for scaling HBase, because HBase is 
> usually bottlenecked on the number of random disk iops that the 
> storage hardware offers.


        
