from:"Dave Thompson \(JIRA\)"

[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-18 Thread Dave Thompson (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605205#comment-13605205
]

Dave Thompson commented on MAPREDUCE-5065:
--

Reviewed latest patch. Looks good. +1

DistCp should skip checksum comparisons if block-sizes are different on
source/target.
--

Key: MAPREDUCE-5065
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
Attachments: MAPREDUCE-5065.branch-0.23.patch,
MAPREDUCE-5065.branch-2.patch

When copying files between two clusters with different default block-sizes,
the copy fails with a checksum-mismatch, even though the files have identical
contents.
The reason is that on HDFS, a file's checksum is unfortunately a function of
the block-size of the file. So two files with identical contents (but
different block-sizes) can have different checksums.
(Thus, it's also possible for DistCp to fail to copy files within the same
file-system, if the source-file's block-size differs from the HDFS default and
-pb isn't used.)
I propose that we skip checksum comparisons under the following conditions:
1. -skipCrc is specified.
2. File-size is 0 (in which case the call to the checksum-servlet is moot).
3. source.getBlockSize() != target.getBlockSize(), since the checksums are
guaranteed to differ in this case.
I have a patch for #3.
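
The three conditions above can be sketched as a single predicate. This is only an illustration of the proposal, not the code in the attached patches: the class ChecksumSkipSketch, the FileInfo holder and the method shouldSkipChecksum are hypothetical names, and FileInfo merely stands in for whatever file metadata DistCp actually consults.

```java
/**
 * Illustrative sketch of the proposed skip conditions.
 * All names here are hypothetical; this is not the DistCp implementation.
 */
final class ChecksumSkipSketch {

    /** Hypothetical stand-in for the source/target file metadata. */
    static final class FileInfo {
        final long length;     // file size in bytes
        final long blockSize;  // block size in bytes

        FileInfo(long length, long blockSize) {
            this.length = length;
            this.blockSize = blockSize;
        }
    }

    /** Returns true when the checksum comparison should be skipped. */
    static boolean shouldSkipChecksum(boolean skipCrc, FileInfo source, FileInfo target) {
        if (skipCrc) {
            return true;   // condition 1: the user asked to skip the check
        }
        if (source.length == 0) {
            return true;   // condition 2: an empty file has nothing to compare
        }
        // condition 3: differing block sizes mean the checksums will not match
        return source.blockSize != target.blockSize;
    }
}
```

With this ordering, a non-empty file whose block size matches on both sides is the only case that still goes through the checksum comparison.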

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2593) Random read benchmark for DFS

2011-07-01 Thread Dave Thompson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058700#comment-13058700
 ] 

Dave Thompson commented on MAPREDUCE-2593:
--

Jenkins appears to be reporting a failure because the "delete a file in
archive" and "rename a file in archive" unit tests are failing, even though
those tests have nothing to do with this patch.



 Random read benchmark for DFS
 -

 Key: MAPREDUCE-2593
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2593
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Raghu Angadi
Assignee: Dave Thompson
 Attachments: HDFS-236.patch, RndRead-TestDFSIO-061011.patch, 
 RndRead-TestDFSIO.patch


 We should have at least one random read benchmark that can be run regularly
 with the rest of the Hadoop benchmarks.
 Please provide benchmark ideas or requirements.

