[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049997#comment-14049997 ] liuwei commented on MAPREDUCE-2257:

Now that distcp has distcp2, is there a patch that lets distcp2 copy blocks in parallel?

distcp can copy blocks in parallel
Key: MAPREDUCE-2257
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distcp
Affects Versions: 0.21.0
Reporter: dhruba borthakur
Assignee: Mithun Radhakrishnan
Attachments: MAPREDUCE-2257.patch

The minimum unit of work for a distcp task is a file. We have files larger than 1 TB with a block size of 1 GB. If we use distcp to copy these files, the tasks either take a very long time or eventually fail. A better approach would be for distcp to copy all the source blocks in parallel and then stitch the blocks back into files at the destination via the HDFS Concat API (HDFS-222).

--
This message was sent by Atlassian JIRA (v6.2#6252)
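A minimal local sketch of the proposal, assuming plain local files stand in for HDFS and worker threads stand in for separate map tasks: the source is split into block-sized chunks, each chunk is copied into its own part file concurrently, and the parts are then appended in order to rebuild the file. The names `ParallelChunkCopy`, `Chunk`, `planChunks`, `copyChunk` and `stitch` are hypothetical and do not come from the attached patch. On HDFS the stitching step would be the concat operation from HDFS-222, which relinks blocks instead of rewriting bytes; here it is a plain byte copy.

```java
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;
import static java.nio.file.StandardOpenOption.WRITE;

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunkCopy {

    /** A byte range [offset, offset + length) of the source file (hypothetical type). */
    static final class Chunk {
        final int index;
        final long offset;
        final long length;

        Chunk(int index, long offset, long length) {
            this.index = index;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits a file of fileLen bytes into consecutive chunks of at most chunkSize bytes. */
    static List<Chunk> planChunks(long fileLen, long chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        for (long off = 0; off < fileLen; off += chunkSize) {
            chunks.add(new Chunk(index++, off, Math.min(chunkSize, fileLen - off)));
        }
        return chunks;
    }

    /** Transfers len bytes starting at pos in `in` to the current position of `out`. */
    static void transferFully(FileChannel in, long pos, long len, FileChannel out) throws IOException {
        long done = 0;
        while (done < len) {
            long n = in.transferTo(pos + done, len - done, out);
            if (n <= 0) throw new IOException("source ended early");
            done += n;
        }
    }

    /** Copies one chunk of src into its own part file under partDir. */
    static Path copyChunk(Path src, Path partDir, Chunk c) throws IOException {
        Path part = partDir.resolve("part-" + c.index);
        try (FileChannel in = FileChannel.open(src, READ);
             FileChannel out = FileChannel.open(part, CREATE, WRITE, TRUNCATE_EXISTING)) {
            transferFully(in, c.offset, c.length, out);
        }
        return part;
    }

    /** Appends the parts to dst in chunk order; a local stand-in for HDFS concat. */
    static void stitch(List<Path> parts, Path dst) throws IOException {
        try (FileChannel out = FileChannel.open(dst, CREATE, WRITE, TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, READ)) {
                    transferFully(in, 0, in.size(), out);
                }
                Files.delete(part);
            }
        }
    }

    /** Copies all chunks concurrently, then stitches them back into dst. */
    static void parallelCopy(Path src, Path dst, long chunkSize, int threads) throws Exception {
        Path partDir = Files.createTempDirectory("chunks");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Chunk c : planChunks(Files.size(src), chunkSize)) {
                futures.add(pool.submit(() -> copyChunk(src, partDir, c)));
            }
            List<Path> parts = new ArrayList<>();
            for (Future<Path> f : futures) {
                parts.add(f.get()); // futures were submitted in chunk order
            }
            stitch(parts, dst);
        } finally {
            pool.shutdown();
        }
        Files.delete(partDir);
    }

    public static void main(String[] args) throws Exception {
        // A 1 TB file with 1 GB chunks gives 1024 independent units of work.
        System.out.println(planChunks(1L << 40, 1L << 30).size());

        byte[] data = new byte[10_000];
        new Random(42).nextBytes(data);
        Path src = Files.createTempFile("src", ".bin");
        Path dst = Files.createTempFile("dst", ".bin");
        Files.write(src, data);
        parallelCopy(src, dst, 4096, 4);
        System.out.println(Arrays.equals(data, Files.readAllBytes(dst)));
    }
}
```

Running `main` prints `1024` and then `true`. The point of the design is that the unit of work becomes the chunk rather than the file, so one 1 TB file no longer pins a single task.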
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508414#comment-13508414 ] Mithun Radhakrishnan commented on MAPREDUCE-2257:

Sorry, I haven't been able to spare the time yet. I'll try to make time for it shortly.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508168#comment-13508168 ] Harsh J commented on MAPREDUCE-2257:

[~mithun] - I know it's been a while, but are you still working on this? Since HDFS-222 is getting some attention, I think it would be good for DistCp to use it as a built-in feature (and Dhruba has already said this would be a great improvement to DistCp).
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205551#comment-13205551 ] Mahadev konar commented on MAPREDUCE-2257:

Thanks for taking this up, Mithun!
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205280#comment-13205280 ] Mithun Radhakrishnan commented on MAPREDUCE-2257:

I'll take a look. I already have a patch that accomplishes the bulk of this; only the finishing touches remain. I'll post a patch shortly.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033623#comment-13033623 ] Rodrigo Schmidt commented on MAPREDUCE-2257:

+1. The patch looks good. Just make sure it passes the QA tests; Hadoop QA doesn't seem to have picked up the latest version.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026166#comment-13026166 ] Rodrigo Schmidt commented on MAPREDUCE-2257:

Maybe it's time to switch it to the non-deprecated classes.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025087#comment-13025087 ] Rodrigo Schmidt commented on MAPREDUCE-2257:

Shouldn't you change your code to use the class that replaced the deprecated one? :)
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025088#comment-13025088 ] Rosie Li commented on MAPREDUCE-2257:

The original code was already using the deprecated classes, such as JobConf and InputSplit.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025035#comment-13025035 ] Rosie Li commented on MAPREDUCE-2257:

[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:59: warning: [deprecation] org.apache.hadoop.mapred.FileSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.FileSplit;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:60: warning: [deprecation] org.apache.hadoop.mapred.InputFormat in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputFormat;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:61: warning: [deprecation] org.apache.hadoop.mapred.InputSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputSplit;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:63: warning: [deprecation] org.apache.hadoop.mapred.JobClient in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobClient;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:64: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobConf;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:66: warning: [deprecation] org.apache.hadoop.mapred.Mapper in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.Mapper;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:211: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] private JobConf conf;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:738: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] private static void checkSrcPath(JobConf jobConf, List<Path> srcPaths)
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:831: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] static private void finalize(Configuration conf, JobConf jobconf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1096: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] private static int setMapCount(long totalBytes, JobConf job)
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1120: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] private static JobConf createJobConf(Configuration conf) {
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1148: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] private static void setReplication(Configuration conf, JobConf jobConf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1190: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] static boolean setup(Configuration conf, JobConf jobConf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1562: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] FileSystem jobfs, Path jobdir, JobConf jobconf, Configuration conf
[javac] ^
[javac]
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020936#comment-13020936 ] Rodrigo Schmidt commented on MAPREDUCE-2257:

The class FileChunkPair is not really a pair, right? It stores 5 fields. Can't we somehow unify the if/else in copy()? At least doCopyFile() could use doCopyFileChunks().
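The unification suggested above can be sketched as follows, with local files standing in for HDFS. The comment does not list FileChunkPair's five fields, so the fields chosen here (source, destination, offset, length, index) are an assumption, and `FileChunk`, `doCopyFileChunks` and `doCopyFile` are hypothetical signatures rather than the patch's. The idea is that a whole-file copy is just the degenerate case of one chunk spanning `[0, size)`, so the if/else in `copy()` could collapse into a single path.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

public class UnifiedCopy {

    /** Hypothetical replacement for FileChunkPair: it has five fields, so it is not a pair. */
    static final class FileChunk {
        final Path src;     // source file
        final Path dst;     // file that receives this chunk
        final long offset;  // first byte of the range in src
        final long length;  // number of bytes to copy
        final int index;    // position of the chunk when stitching

        FileChunk(Path src, Path dst, long offset, long length, int index) {
            this.src = src;
            this.dst = dst;
            this.offset = offset;
            this.length = length;
            this.index = index;
        }
    }

    /** Copies each chunk's byte range of its source into its own destination file. */
    static void doCopyFileChunks(List<FileChunk> chunks) throws IOException {
        for (FileChunk c : chunks) {
            try (FileChannel in = FileChannel.open(c.src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(c.dst, StandardOpenOption.CREATE,
                         StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                long done = 0;
                while (done < c.length) {
                    long n = in.transferTo(c.offset + done, c.length - done, out);
                    if (n <= 0) throw new IOException("source ended early: " + c.src);
                    done += n;
                }
            }
        }
    }

    /** A whole-file copy is one chunk covering [0, size), so it reuses the chunk path. */
    static void doCopyFile(Path src, Path dst) throws IOException {
        doCopyFileChunks(Collections.singletonList(
                new FileChunk(src, dst, 0, Files.size(src), 0)));
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        Files.write(src, "hello, distcp".getBytes(StandardCharsets.UTF_8));
        doCopyFile(src, dst);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `hello, distcp`. With this shape only one copy loop has to be tested and maintained.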
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021230#comment-13021230 ] Hadoop QA commented on MAPREDUCE-2257:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476634/MAPREDUCE-2257.patch against trunk revision 1094093.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2256 javac compiler warnings (more than the trunk's current 2244 warnings).
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016023#comment-13016023 ] Rosie Li commented on MAPREDUCE-2257:

The failure of the contrib test is not related to the new distcp.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013901#comment-13013901 ] Hadoop QA commented on MAPREDUCE-2257:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12475059/MAPREDUCE-2257.patch against trunk revision 1087098.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2256 javac compiler warnings (more than the trunk's current 2244 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014085#comment-13014085 ] Allen Wittenauer commented on MAPREDUCE-2257:

By default, distcp.copy.by.chunk is set to true in the configuration. The user can set it to false to get the original distcp behavior. However, the destination type is checked afterward: distcp.copy.by.chunk remains true only if the destination file system is the distributed file system (HDFS).

This needs to be added to the release notes.
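The switch described above can be sketched as a small decision function. This sketch assumes a plain `Map` in place of Hadoop's `Configuration` and detects HDFS by the destination URI's `hdfs` scheme; the comment does not say how the patch actually checks the destination type (an `instanceof DistributedFileSystem` test would be another plausible option), so `CopyModeSelector` and `copyByChunk` are hypothetical. Only the property name `distcp.copy.by.chunk` and its default of true come from the comment.

```java
import java.net.URI;
import java.util.Map;

public class CopyModeSelector {

    /** Property name taken from the comment; everything else here is illustrative. */
    static final String COPY_BY_CHUNK = "distcp.copy.by.chunk";

    /**
     * Returns true if files should be copied chunk by chunk.
     * The user setting defaults to true, but it only takes effect when the
     * destination is HDFS (detected here by URI scheme, which is an assumption).
     */
    static boolean copyByChunk(Map<String, String> conf, URI destination) {
        boolean requested = Boolean.parseBoolean(conf.getOrDefault(COPY_BY_CHUNK, "true"));
        boolean destIsHdfs = "hdfs".equalsIgnoreCase(destination.getScheme());
        return requested && destIsHdfs;
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://nn:8020/backup/big.dat");
        URI local = URI.create("file:///tmp/big.dat");
        System.out.println(copyByChunk(Map.of(), hdfs));                         // true
        System.out.println(copyByChunk(Map.of(COPY_BY_CHUNK, "false"), hdfs));   // false
        System.out.println(copyByChunk(Map.of(), local));                        // false
    }
}
```

The user flag can only turn chunking off; it can never force chunking onto a destination that lacks a concat operation, which matches the behavior the release note needs to describe.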
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014214#comment-13014214 ] Hadoop QA commented on MAPREDUCE-2257:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12475126/MAPREDUCE-2257.patch against trunk revision 1087098.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2256 javac compiler warnings (more than the trunk's current 2244 warnings).
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012211#comment-13012211 ] Hadoop QA commented on MAPREDUCE-2257:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12474807/MAPREDUCE-2257.patch against trunk revision 1082703.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
-1 contrib tests. The patch failed contrib unit tests.
-1 system test framework. The patch failed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//testReport/
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012260#comment-13012260 ] Hadoop QA commented on MAPREDUCE-2257:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12474806/MAPREDUCE-2257.patch against trunk revision 1082703.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2256 javac compiler warnings (more than the trunk's current 2244 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008557#comment-13008557 ] gopikannan venugopalsamy commented on MAPREDUCE-2257:

I'd like to work on this. Hey nikhil, would you like to discuss it?
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008669#comment-13008669 ]
Rosie Li commented on MAPREDUCE-2257:

I'm working on this feature right now. I've already finished writing the code and am testing it now.
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13002925#comment-13002925 ]
nikhil commented on MAPREDUCE-2257:

Is anyone working on this feature?
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12989456#comment-12989456 ]
gopikannan venugopalsamy commented on MAPREDUCE-2257:

Hello, I wish to contribute to this issue, but I am new to this project. Can you give me some tips on where to start?
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980408#action_12980408 ]
Allen Wittenauer commented on MAPREDUCE-2257:

Won't changing the unit of work break hftp?
[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980490#action_12980490 ]
dhruba borthakur commented on MAPREDUCE-2257:

A new option to distcp could trigger parallel block copying. It cannot be used with hftp.
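The suggestion that parallel block copying be opt-in and refused for hftp could look roughly like the guard below. This is a hypothetical sketch: the class name, the boolean standing in for the new distcp option, and the choice to check both the source and target URIs are assumptions; the comment only says the option cannot be used with hftp.

```java
import java.net.URI;
import java.util.List;

/**
 * Hypothetical option check: parallel block copy is only allowed when no path
 * uses the hftp scheme. The default (file-at-a-time) mode is left untouched.
 */
public class ParallelBlockCheck {

    /** Throws IllegalArgumentException if parallel block copy was requested with an hftp path. */
    static void validate(boolean parallelBlocks, List<URI> sources, URI target) {
        if (!parallelBlocks) {
            return; // existing behaviour: one file per unit of work, nothing to check
        }
        for (URI src : sources) {
            rejectHftp(src);
        }
        rejectHftp(target);
    }

    private static void rejectHftp(URI uri) {
        String scheme = uri.getScheme();
        if (scheme != null && scheme.equalsIgnoreCase("hftp")) {
            throw new IllegalArgumentException(
                "parallel block copy cannot be used with hftp: " + uri);
        }
    }

    public static void main(String[] args) {
        URI target = URI.create("hdfs://nn2:8020/dst");
        // Accepted: HDFS to HDFS.
        validate(true, List.of(URI.create("hdfs://nn1:8020/src/big")), target);
        try {
            // Rejected: hftp source with the new option enabled.
            validate(true, List.of(URI.create("hftp://nn1:50070/src/big")), target);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the check behind the option means existing hftp-based copies keep working unchanged, which is the concern raised in the previous comment.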