[ 
https://issues.apache.org/jira/browse/HADOOP-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488766
 ] 

Doug Cutting commented on HADOOP-1259:
--------------------------------------

> post-HADOOP-1134-upgrade, such a mismatch will cause a similar issue; e.g., 
> to join two blocks, we will need to re-checksum the entire second block

When do we anticipate that we'll need to join blocks?  We might, to de-fragment 
files that have been appended to many times.  But in that case the blocks will 
likely be on separate datanodes, so the data would have to be copied between 
nodes anyway, and re-checksumming would incur little additional cost.  
De-fragmenting could probably be done fairly efficiently by a map-reduce task 
in user code, by simply copying a file to a new, temporary name, then renaming 
it back to its original name.  So I don't yet see block joining as an argument 
for this.


> DFS should enforce block size is a multiple of io.bytes.per.checksum 
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-1259
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1259
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Raghu Angadi
>
> DFSClient currently does not enforce that dfs.block.size is a multiple of 
> io.bytes.per.checksum. This is not really a problem currently but can be for 
> future upgrades like HADOOP-1134 (see one of the comments 
> http://issues.apache.org/jira/browse/HADOOP-1134#action_12488542 there). 
> I propose DFSClient should fail loudly and ask the user politely to change 
> the config to meet this condition. Of course we will change the documentation 
> for dfs.block.size also.
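
For illustration only, the proposed check might look roughly like the sketch 
below. This is not the actual DFSClient code: the class and method names 
(BlockSizeCheck, validate, trailingPartialChunk) are hypothetical, and the 
sizes used in main are example values, not values read from any real config. 
The second helper shows why a mismatch matters: a non-zero remainder means the 
last checksum chunk of a block is partial, so joining another block after it 
would shift every chunk boundary in that second block.

```java
// Hypothetical helper sketching the proposed validation; not part of Hadoop.
final class BlockSizeCheck {

    /** Fails loudly unless blockSize is a positive multiple of bytesPerChecksum. */
    static void validate(long blockSize, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException(
                "io.bytes.per.checksum must be positive, got " + bytesPerChecksum);
        }
        if (blockSize <= 0 || blockSize % bytesPerChecksum != 0) {
            throw new IllegalArgumentException(
                "dfs.block.size (" + blockSize + ") must be a positive multiple of "
                + "io.bytes.per.checksum (" + bytesPerChecksum
                + "); please adjust the configuration");
        }
    }

    /** Bytes left in the final, partial checksum chunk of a full block (0 if aligned). */
    static long trailingPartialChunk(long blockSize, int bytesPerChecksum) {
        return blockSize % bytesPerChecksum;
    }

    public static void main(String[] args) {
        // Example values only: a 64 MB block with 512-byte checksum chunks is aligned.
        validate(64L * 1024 * 1024, 512);

        // A misaligned block size is rejected with a descriptive message.
        try {
            validate(1000, 512);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting the configuration up front, rather than rounding the block size 
silently, matches the "fail loudly" wording of the proposal; rounding would be 
an alternative design that this sketch does not attempt.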

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
