[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449661#comment-13449661
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Mapreduce-trunk #1188 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = ABORTED
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java


> distcp -skipcrccheck has no effect
> --
>
> Key: HDFS-3054
> URL: https://issues.apache.org/jira/browse/HDFS-3054
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>Reporter: patrick white
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.2.0-alpha
>
> Attachments: HDFS-3054.002.patch, HDFS-3054.004.patch, hdfs-3054.patch
>
>
> Using distcp with '-skipcrccheck' still seems to cause CRC checksum 
> verification to happen. 
> I ran into this while debugging an issue with source and destination 
> having different block sizes, and not using the preserve-blocksize parameter 
> (-pb). In both the 0.23.1 and 0.23.2 builds, trying to bypass the checksum 
> verification with the '-skipcrccheck' parameter had no effect; the 
> distcp still failed on checksum errors.
> Test scenario to reproduce:
> Do not use '-pb' and try a distcp from 0.20.205 (default blksize=128M) to 0.23 
> (default blksize=256M). The distcp fails on checksum errors, which is 
> expected given how the checksum is calculated (tiered aggregation of all 
> blocks). Trying the same distcp with only '-skipcrccheck' added still fails 
> with the same checksum error; the checksum should now be bypassed and the 
> distcp should proceed.
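
To illustrate why a difference in block size alone can produce a checksum mismatch on identical data, here is a minimal Java sketch. It is not the HDFS checksum implementation; it only mimics the idea of a tiered checksum (a digest computed over per-block checksums). The class name, method name, and the tiny block sizes are assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration only: this is NOT the HDFS checksum code.
final class TieredChecksumSketch {

  // Computes an MD5 digest over the CRC32 of each fixed-size block of the data.
  static byte[] tieredChecksum(byte[] data, int blockSize) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      for (int off = 0; off < data.length; off += blockSize) {
        int len = Math.min(blockSize, data.length - off);
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        long v = crc.getValue();
        // Feed the 4-byte block CRC into the outer digest.
        md5.update(new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v});
      }
      return md5.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 should be available on every Java platform", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[1024];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    byte[] small = tieredChecksum(data, 128);
    byte[] large = tieredChecksum(data, 256);
    System.out.println("same block size matches: "
        + Arrays.equals(small, tieredChecksum(data, 128)));
    System.out.println("different block size matches: "
        + Arrays.equals(small, large));
  }
}
```

The bytes are the same in both calls; only the block boundaries differ, which changes the sequence of per-block checksums fed into the outer digest, so the two results should not match.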

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449624#comment-13449624
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Hdfs-trunk #1157 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java




[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449025#comment-13449025
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2709 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2709/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java




[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449014#comment-13449014
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2748 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2748/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java




[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449012#comment-13449012
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Common-trunk-Commit #2685 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2685/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java




[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449000#comment-13449000
 ] 

Todd Lipcon commented on HDFS-3054:
---

+1, will commit momentarily.



[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448943#comment-13448943
 ] 

Hadoop QA commented on HDFS-3054:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543879/HDFS-3054.004.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in hadoop-tools/hadoop-distcp.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3149//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3149//console

This message is automatically generated.



[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448883#comment-13448883
 ] 

Todd Lipcon commented on HDFS-3054:
---

style nits:
- missing space after ',' in RetriableFileCopyCommand's constructor definition, 
and its usage
- please reorder the '@param' in the javadoc to match the order of arguments

otherwise +1



[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448386#comment-13448386
 ] 

Colin Patrick McCabe commented on HDFS-3054:


testing done:

I confirmed that copying files from a cluster running branch-1 derived code to 
a cluster running branch-2 derived code did *not* work unless {{-skipcrccheck}} 
was supplied.

The exception was this:

{code}
Error: java.io.IOException: File copy failed: hftp://172.22.1.204:6001/a/xx --> hdfs://localhost:6000/b/a/xx
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:267)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:153)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:148)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://172.22.1.204:6001/a/xx to hdfs://localhost:6000/b/a/xx
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:263)
    ... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hftp://172.22.1.204:6001/a/xx and hdfs://localhost:6000/b/.distcp.tmp.attempt_1346456743556_0010_m_01_0
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:145)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:107)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:83)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
{code}

With {{-skipcrccheck}}, this problem did not occur.



[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-08-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446569#comment-13446569
 ] 

Colin Patrick McCabe commented on HDFS-3054:


I confirmed through manual testing that -skipcrccheck does indeed cause the crc 
checking paths to be bypassed.

However, I found this in the code, in {{DistCpUtils#checksumsAreEquals}}:
{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
}
{code} 

I think this should be a fatal error for the distcp operation unless 
{{-skipcrccheck}} is set.  Silently ignoring checksums when we can't retrieve 
them doesn't seem like good behavior.  Perhaps we should open a different JIRA for 
that, though...
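
As a rough illustration of the stricter behavior suggested here, the following Java sketch treats a checksum retrieval failure as fatal unless checking was explicitly skipped. It is not the DistCp code and not the committed patch: the {{ChecksumSource}} interface, the {{verifyCopy}} method, and the {{skipCrc}} parameter are hypothetical stand-ins so that the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch; the interface and names below are NOT the Hadoop API.
final class StrictChecksumSketch {

  // Stand-in for a file system that can report a checksum for a path.
  interface ChecksumSource {
    byte[] getFileChecksum(String path) throws IOException;
  }

  // Throws if checksums cannot be retrieved or differ, unless skipCrc is set.
  static void verifyCopy(ChecksumSource sourceFS, String source,
                         ChecksumSource targetFS, String target,
                         boolean skipCrc) throws IOException {
    if (skipCrc) {
      return; // the caller explicitly asked to bypass verification
    }
    byte[] sourceChecksum;
    byte[] targetChecksum;
    try {
      sourceChecksum = sourceFS.getFileChecksum(source);
      targetChecksum = targetFS.getFileChecksum(target);
    } catch (IOException e) {
      // Propagate the failure instead of logging it and carrying on.
      throw new IOException(
          "Unable to retrieve checksum for " + source + " or " + target, e);
    }
    if (!Arrays.equals(sourceChecksum, targetChecksum)) {
      throw new IOException(
          "Check-sum mismatch between " + source + " and " + target);
    }
  }

  public static void main(String[] args) throws IOException {
    ChecksumSource ok = path -> new byte[] {1, 2, 3};
    ChecksumSource broken = path -> {
      throw new IOException("checksum unavailable");
    };
    verifyCopy(ok, "/a/xx", ok, "/b/a/xx", false);    // matching checksums pass
    verifyCopy(ok, "/a/xx", broken, "/b/a/xx", true); // skipped, so no lookup
    try {
      verifyCopy(ok, "/a/xx", broken, "/b/a/xx", false);
      System.out.println("unexpected: no error");
    } catch (IOException e) {
      System.out.println("copy rejected: " + e.getMessage());
    }
  }
}
```

Checking the skip flag before any checksum lookup means a source that cannot provide checksums only fails the copy when verification was actually requested.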



[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-08-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439740#comment-13439740
 ] 

Colin Patrick McCabe commented on HDFS-3054:


bq. How about just corrupting the block files manually itself ala 
TestFSInputChecker?

Probably best to add a @VisibleForTesting method in MiniDFSCluster that 
corrupts the block.  MiniDFSCluster is part of HDFS, this isn't.
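
A corruption helper of the kind discussed here could be as simple as flipping a byte in a block file. The sketch below is only an illustration of that idea on a local temporary file; it does not use MiniDFSCluster, and the class name, method names, and file naming are assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical test helper; not the MiniDFSCluster API.
final class CorruptFileSketch {

  // Flips every bit of the byte at the given offset, in place.
  static void corruptByte(Path file, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
      raf.seek(offset);
      int original = raf.read();
      if (original < 0) {
        throw new IOException("Offset " + offset + " is past the end of " + file);
      }
      raf.seek(offset);
      raf.write(original ^ 0xFF);
    }
  }

  // Computes a CRC32 over the whole file, standing in for a stored checksum.
  static long crcOf(Path file) throws IOException {
    CRC32 crc = new CRC32();
    crc.update(Files.readAllBytes(file));
    return crc.getValue();
  }

  public static void main(String[] args) throws IOException {
    Path block = Files.createTempFile("blk_", ".data");
    try {
      Files.write(block, "some block contents".getBytes(StandardCharsets.UTF_8));
      long before = crcOf(block);
      corruptByte(block, 0);
      System.out.println("checksum changed: " + (before != crcOf(block)));
    } finally {
      Files.deleteIfExists(block);
    }
  }
}
```

A test built on such a helper could corrupt the data while leaving the recorded checksum alone, then check that a copy with {{-skipcrccheck}} still succeeds.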






[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-08-22 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439707#comment-13439707
 ] 

Eli Collins commented on HDFS-3054:
---

How about just corrupting the block files manually itself ala 
TestFSInputChecker?

Also, let's manually test that we can distcp from a v1 cluster to a v2 cluster 
with this patch (using skipcrc since HADOOP-8060 is not yet ready).





[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-08-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439207#comment-13439207
 ] 

Colin Patrick McCabe commented on HDFS-3054:


Hi Rahul,
This patch looks good to me overall.  I created a rebased version of it that 
incorporates changes from trunk.

I wish we could have a unit test for this.  It's a little difficult to create a 
good one without getting kind of uncomfortably coupled to the HDFS code (after 
all this is in tools).  We would probably have to create an API to corrupt file 
checksums in HDFS, and use that.  Then we could be sure that distcp 
-skipcrccheck was not consulting the CRC.  I'm not sure if this is worth the 
extra work, though... thoughts?





[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432582#comment-13432582
 ] 

Hadoop QA commented on HDFS-3054:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540262/hdfs-3054.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 javac.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2985//console

This message is automatically generated.




