[
https://issues.apache.org/jira/browse/HADOOP-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488766
]
Doug Cutting commented on HADOOP-1259:
--------------------------------------
> post-HADOOP-1134-upgrade, such a mismatch will cause a similar issue, e.g.,
> to join two blocks, we will need to re-checksum the entire second block
When do we anticipate that we'll need to join blocks? We might, to de-fragment
files that have been appended to many times. But in that case the blocks will
likely be on separate datanodes, so the data must be copied anyway and
re-checksumming would incur no extra cost.
De-fragmenting could probably be done fairly efficiently by a map-reduce task
in user code, by simply copying a file to a new, temporary name, then renaming
it back to its original name. So I don't yet see block joining as an argument
for this.
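
For reference, the quoted concern can be illustrated with a small standalone
sketch. It is not DFS code: the class and method names (ChunkChecksums,
checksums, canReuse) are hypothetical, and CRC32 from java.util.zip is
assumed as the per-chunk checksum. It only shows that the checksums of the
second block can be reused after a join when the first block's length is a
multiple of the chunk size, and must be recomputed otherwise.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration, not part of Hadoop.
final class ChunkChecksums {

    /** CRC32 of each bytesPerChecksum-sized chunk; the last chunk may be short. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /**
     * True if the checksums of the joined data equal the checksums of
     * the first block followed by the checksums of the second block,
     * i.e. nothing has to be recomputed after the join.
     */
    static boolean canReuse(byte[] first, byte[] second, int bytesPerChecksum) {
        byte[] joined = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, joined, first.length, second.length);

        long[] a = checksums(first, bytesPerChecksum);
        long[] b = checksums(second, bytesPerChecksum);
        long[] concatenated = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, concatenated, a.length, b.length);

        return Arrays.equals(concatenated, checksums(joined, bytesPerChecksum));
    }
}
```

With 4-byte chunks, an 8-byte first block leaves the second block's chunk
boundaries where they were, while a 6-byte first block shifts every boundary
in the second block.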
> DFS should enforce block size is a multiple of io.bytes.per.checksum
> ---------------------------------------------------------------------
>
> Key: HADOOP-1259
> URL: https://issues.apache.org/jira/browse/HADOOP-1259
> Project: Hadoop
> Issue Type: Improvement
> Reporter: Raghu Angadi
>
> DFSClient currently does not enforce that dfs.block.size is a multiple of
> io.bytes.per.checksum. This is not really a problem currently but can affect
> future upgrades like HADOOP-1134 (see one of the comments
> http://issues.apache.org/jira/browse/HADOOP-1134#action_12488542 there).
> I propose DFSClient should fail loudly and ask the user politely to change
> the config to meet this condition. Of course we will change the documentation
> for dfs.block.size also.
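
A minimal sketch of the kind of check proposed above, assuming the two
settings have already been read into plain numbers. The class and method
names (BlockSizeCheck, validate) are hypothetical and are not taken from
DFSClient; where such a check would actually live is left open.

```java
// Hypothetical illustration, not part of Hadoop.
final class BlockSizeCheck {

    /**
     * Fails loudly if blockSize (dfs.block.size) is not a positive
     * multiple of bytesPerChecksum (io.bytes.per.checksum).
     */
    static void validate(long blockSize, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException(
                "io.bytes.per.checksum must be positive, got " + bytesPerChecksum);
        }
        if (blockSize <= 0 || blockSize % bytesPerChecksum != 0) {
            throw new IllegalArgumentException(
                "dfs.block.size (" + blockSize + ") must be a positive multiple of "
                + "io.bytes.per.checksum (" + bytesPerChecksum + "); "
                + "please adjust the configuration");
        }
    }
}
```

For example, validate(64L * 1024 * 1024, 512) returns normally, while
validate(1000, 512) throws an IllegalArgumentException naming both settings.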