[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-18 Thread Dave Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605205#comment-13605205
 ] 

Dave Thompson commented on MAPREDUCE-5065:
--

Reviewed latest patch.  Looks good.  +1

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: MAPREDUCE-5065.branch-0.23.patch, 
 MAPREDUCE-5065.branch-2.patch


 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the file's block-size. So two files with identical contents (but different 
 block-sizes) can have different checksums. (Thus, it's also possible for 
 DistCp to fail to copy files on the same file-system, if the source-file's 
 block-size differs from the HDFS default, and -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.
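
 For illustration only, here is a minimal sketch of the three skip conditions 
 above as they might look in Java. The class, method, and flag names are 
 hypothetical and this is not the attached patch; only FileStatus.getLen() and 
 FileStatus.getBlockSize() are existing Hadoop APIs.

 import org.apache.hadoop.fs.FileStatus;

 public class ChecksumSkipSketch {

   // Hypothetical helper, not the attached patch: decide whether the
   // source/target checksum comparison can be skipped.
   static boolean skipChecksumCheck(boolean skipCrcRequested,
                                    FileStatus source,
                                    FileStatus target) {
     if (skipCrcRequested) {
       return true;                 // 1. the skip-CRC option was specified
     }
     if (source.getLen() == 0) {
       return true;                 // 2. empty file; nothing to verify
     }
     // 3. different block-sizes: the HDFS checksums are guaranteed to differ,
     //    even when the file contents are identical
     return source.getBlockSize() != target.getBlockSize();
   }
 }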

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2593) Random read benchmark for DFS

2011-07-01 Thread Dave Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058700#comment-13058700
 ] 

Dave Thompson commented on MAPREDUCE-2593:
--

Jenkins appears to be reporting a failure because the "delete a file in archive" 
and "rename a file in archive" unit tests are failing, even though those tests 
have nothing to do with this patch.



 Random read benchmark for DFS
 -

 Key: MAPREDUCE-2593
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2593
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Raghu Angadi
Assignee: Dave Thompson
 Attachments: HDFS-236.patch, RndRead-TestDFSIO-061011.patch, 
 RndRead-TestDFSIO.patch


 We should have at least one random-read benchmark that can be run regularly 
 with the rest of the Hadoop benchmarks.
 Please provide benchmark ideas or requirements.
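
 As a sketch of the kind of workload meant here (illustrative only, and not the 
 attached RndRead-TestDFSIO patch; the input path, read count, and buffer size 
 are made up), a random-read loop against HDFS using positioned reads might 
 look like this:

 import java.util.Random;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class RandomReadSketch {
   public static void main(String[] args) throws Exception {
     Path file = new Path(args[0]);      // file to read; assumed larger than 4 KB
     FileSystem fs = FileSystem.get(new Configuration());
     long fileLen = fs.getFileStatus(file).getLen();
     byte[] buf = new byte[4096];
     Random rnd = new Random();

     long start = System.currentTimeMillis();
     try (FSDataInputStream in = fs.open(file)) {
       for (int i = 0; i < 10000; i++) {
         // pick a random offset and do a positioned (pread-style) read
         long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
         in.readFully(pos, buf, 0, buf.length);
       }
     }
     long elapsed = System.currentTimeMillis() - start;
     System.out.println("10000 random 4 KB reads in " + elapsed + " ms");
   }
 }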

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira