[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2014-07-02 Thread liuwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049997#comment-14049997
 ] 

liuwei commented on MAPREDUCE-2257:
---

Since distcp now has distcp2, does a patch exist for distcp2 to copy blocks in 
parallel?

 distcp can copy blocks in parallel
 --

 Key: MAPREDUCE-2257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.21.0
Reporter: dhruba borthakur
Assignee: Mithun Radhakrishnan
 Attachments: MAPREDUCE-2257.patch


 The minimum unit of work for a distcp task is a file. We have files that are 
 greater than 1 TB with a block size of 1 GB. If we use distcp to copy these 
 files, the tasks either take a very long time or eventually fail. A better 
 way for distcp would be to copy all the source blocks in parallel, and then 
 stitch the blocks back into files at the destination via the HDFS Concat API 
 (HDFS-222).
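
 The approach in the description can be sketched as follows. This is a minimal
 illustration in plain Java, not the code from the attached patch: ChunkPlanner,
 Chunk, and blocksPerChunk are hypothetical names, and the sketch assumes that
 chunks should start on block boundaries so that only the final chunk is shorter
 than the rest. Each planned range could then be copied by a separate task, with
 the resulting pieces joined at the destination by the concat call from HDFS-222.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits one file into block-aligned copy ranges. */
final class ChunkPlanner {

    /** A byte range of the source file that one copy task would handle. */
    static final class Chunk {
        final long offset;
        final long length;

        Chunk(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Plans chunks of blocksPerChunk whole blocks each.
     * Only the last chunk may be shorter than blockSize * blocksPerChunk.
     */
    static List<Chunk> plan(long fileLength, long blockSize, int blocksPerChunk) {
        if (fileLength < 0 || blockSize <= 0 || blocksPerChunk <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long chunkSize = Math.multiplyExact(blockSize, (long) blocksPerChunk);
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            long length = Math.min(chunkSize, fileLength - offset);
            chunks.add(new Chunk(offset, length));
        }
        return chunks;
    }
}
```

 Under these assumptions, a file of exactly 1 TiB with 1 GiB blocks and one block
 per chunk would be planned as 1024 independent copy ranges.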



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-12-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508414#comment-13508414
 ] 

Mithun Radhakrishnan commented on MAPREDUCE-2257:
-

Sorry, I haven't been able to spare the time yet. I'll try to make the time 
shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-12-01 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508168#comment-13508168
 ] 

Harsh J commented on MAPREDUCE-2257:


[~mithun] - I know it's been a while, but are you still working on this?

Since HDFS-222 is getting some attention, I feel it would be good to have this 
as an inbuilt usage of the same (and since Dhruba has already mentioned it is a 
great improvement to DistCp).


[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-02-10 Thread Mahadev konar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205551#comment-13205551
 ] 

Mahadev konar commented on MAPREDUCE-2257:
--

Thanks for taking this up Mithun!



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-02-09 Thread Mithun Radhakrishnan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205280#comment-13205280
 ] 

Mithun Radhakrishnan commented on MAPREDUCE-2257:
-

I'll take a look. I already have a patch that accomplishes the bulk of this. 
The finishing touches remain.

I'll post a patch shortly.

 distcp can copy blocks in parallel
 --

 Key: MAPREDUCE-2257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.21.0
Reporter: dhruba borthakur
 Attachments: MAPREDUCE-2257.patch






[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-05-14 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033623#comment-13033623
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


+1
Patch looks good. Just make sure it passes the QA test. Hadoop QA doesn't seem 
to have picked up the latest version.

 distcp can copy blocks in parallel
 --

 Key: MAPREDUCE-2257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.21.0
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: MAPREDUCE-2257.patch




[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-28 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026166#comment-13026166
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


Maybe it's time to change it to non-deprecated classes.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-26 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025087#comment-13025087
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


Shouldn't you change your code to use the class that replaced the deprecated 
one? :)



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-26 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025088#comment-13025088
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

The original code was using the deprecated classes, such as JobConf and 
InputSplit.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-25 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025035#comment-13025035
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:59: warning: [deprecation] org.apache.hadoop.mapred.FileSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.FileSplit;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:60: warning: [deprecation] org.apache.hadoop.mapred.InputFormat in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputFormat;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:61: warning: [deprecation] org.apache.hadoop.mapred.InputSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputSplit;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:63: warning: [deprecation] org.apache.hadoop.mapred.JobClient in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobClient;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:64: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobConf;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:66: warning: [deprecation] org.apache.hadoop.mapred.Mapper in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.Mapper;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:211: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private JobConf conf;
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:738: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static void checkSrcPath(JobConf jobConf, List<Path> srcPaths)
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:831: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   static private void finalize(Configuration conf, JobConf jobconf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1096: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static int setMapCount(long totalBytes, JobConf job)
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1120: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static JobConf createJobConf(Configuration conf) {
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1148: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static void setReplication(Configuration conf, JobConf jobConf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1190: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   static boolean setup(Configuration conf, JobConf jobConf,
[javac] ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1562: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   FileSystem jobfs, Path jobdir, JobConf jobconf, Configuration conf
[javac] ^

[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-18 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020936#comment-13020936
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


The class FileChunkPair is not really a pair, right? It stores 5 fields.

Can't we somehow unify the if/else in copy()? At least doCopyFile() could use 
doCopyFileChunks().
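
One way to read the second suggestion is to treat a whole-file copy as the 
degenerate case of a chunk copy, so that only one code path does the actual 
byte transfer. The sketch below illustrates that idea only; it is not taken 
from the patch. ChunkCopySketch, copyChunk, and copyFile are hypothetical 
names, and an in-memory byte array stands in for the source file.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration: whole-file copy delegates to the chunk copy. */
final class ChunkCopySketch {

    /** Copies length bytes starting at offset from source into out. */
    static int copyChunk(byte[] source, int offset, int length,
                         ByteArrayOutputStream out) {
        if (offset < 0 || length < 0 || offset > source.length - length) {
            throw new IndexOutOfBoundsException(
                "range " + offset + "+" + length + " exceeds " + source.length);
        }
        out.write(source, offset, length);
        return length;
    }

    /** A whole file is simply one chunk that spans the entire source. */
    static int copyFile(byte[] source, ByteArrayOutputStream out) {
        return copyChunk(source, 0, source.length, out);
    }
}
```

With this shape, the if/else in the caller would only decide which ranges to 
copy, not how to copy them.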



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021230#comment-13021230
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476634/MAPREDUCE-2257.patch
  against trunk revision 1094093.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-05 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016023#comment-13016023
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

The failure of the contrib test is not related to the new distcp.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013901#comment-13013901
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12475059/MAPREDUCE-2257.patch
  against trunk revision 1087098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014085#comment-13014085
 ] 

Allen Wittenauer commented on MAPREDUCE-2257:
-

By default, distcp.copy.by.chunk is set to true in the configuration. The user 
can set it to false to use the original distcp. But the type of destination 
will be checked afterward. distcp.copy.by.chunk will remain true only if the 
destination file system is the distributed file system.

This needs to get added to the release notes.
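
The behaviour described in that note could be expressed roughly as below. This 
is a sketch under stated assumptions, not the patch's code: only the key 
distcp.copy.by.chunk comes from the note. CopyByChunkSetting, 
effectiveCopyByChunk, and the boolean destinationIsDistributedFs are 
hypothetical, and java.util.Properties stands in for the Hadoop configuration 
object.

```java
import java.util.Properties;

/** Hypothetical illustration of how the effective setting is resolved. */
final class CopyByChunkSetting {

    static final String KEY = "distcp.copy.by.chunk";

    /**
     * Chunked copying defaults to on, can be switched off by the user, and is
     * kept only when the destination is the distributed file system.
     */
    static boolean effectiveCopyByChunk(Properties conf,
                                        boolean destinationIsDistributedFs) {
        boolean requested = Boolean.parseBoolean(conf.getProperty(KEY, "true"));
        return requested && destinationIsDistributedFs;
    }
}
```

In other words, setting the key to false always selects the original distcp 
behaviour, while leaving it at the default still falls back to the original 
behaviour for other destination file systems.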



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014214#comment-13014214
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12475126/MAPREDUCE-2257.patch
  against trunk revision 1087098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012211#comment-13012211
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474807/MAPREDUCE-2257.patch
  against trunk revision 1082703.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//testReport/
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012260#comment-13012260
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474806/MAPREDUCE-2257.patch
  against trunk revision 1082703.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//console

This message is automatically generated.

 distcp can copy blocks in parallel
 --

 Key: MAPREDUCE-2257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.21.0
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: MAPREDUCE-2257.patch


 The minimum unit of work for a distcp task is a file. We have files that are 
 greater than 1 TB with a block size of 1 GB. If we use distcp to copy these 
 files, the tasks either take a very long time or eventually fail. A better 
 way for distcp would be to copy all the source blocks in parallel, and then 
 stitch the blocks back into files at the destination via the HDFS Concat API 
 (HDFS-222).
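
The idea in the description can be sketched roughly as follows. This is not the attached patch and it does not use HDFS. It is a local-filesystem stand-in, assuming ordinary files, a thread pool in place of map tasks, and plain appending in place of the HDFS concat call. The function names (block_ranges, copy_block, parallel_copy) are made up for illustration, and the real concat operation has its own preconditions that this sketch does not model.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs that cover the file block by block."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def copy_block(src, part_path, offset, length):
    """Copy one block of src into its own part file."""
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        fin.seek(offset)
        # Reads the whole block into memory; fine for a sketch only.
        fout.write(fin.read(length))
    return part_path


def parallel_copy(src, dst, block_size, workers=4):
    """Copy src to dst block by block in parallel, then stitch the parts."""
    ranges = block_ranges(os.path.getsize(src), block_size)
    with tempfile.TemporaryDirectory() as tmp:
        parts = [os.path.join(tmp, "part-%05d" % i)
                 for i in range(len(ranges))]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(copy_block, src, part, off, length)
                       for part, (off, length) in zip(parts, ranges)]
            for future in futures:
                future.result()  # re-raise any error from a block copy
        # Stitch step: stands in for the concat call at the destination.
        with open(dst, "wb") as out:
            for part in parts:
                with open(part, "rb") as fin:
                    out.write(fin.read())
```

The point of the design is that the unit of work becomes a block rather than a file, so a single very large file no longer pins one long-running task. The parts must be stitched in offset order, which is why the part list is built up front rather than collected as the copies finish.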

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-18 Thread gopikannan venugopalsamy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008557#comment-13008557
 ] 

gopikannan venugopalsamy commented on MAPREDUCE-2257:
-

I want to work on this. Hey nikhil, would you like to discuss it?



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-18 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008669#comment-13008669
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

I'm working on this feature right now. I have finished writing the code and am 
testing it now.



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-04 Thread nikhil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13002925#comment-13002925
 ] 

nikhil commented on MAPREDUCE-2257:
---

Is anyone working on this feature?






[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-02-01 Thread gopikannan venugopalsamy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12989456#comment-12989456
 ] 

gopikannan venugopalsamy commented on MAPREDUCE-2257:
-

Hello,
I wish to contribute to this issue, but I am new to this project. Can you 
give me some tips on where to start?





[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-01-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980408#action_12980408
 ] 

Allen Wittenauer commented on MAPREDUCE-2257:
-

Won't changing the unit of work break hftp?



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-01-11 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980490#action_12980490
 ] 

dhruba borthakur commented on MAPREDUCE-2257:
-

A new option to distcp could trigger parallel-block copy. It cannot be used 
with hftp.
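
As a rough illustration of that suggestion, the check might look something like the sketch below. It is not distcp code. The function name choose_copy_mode, the "block" and "file" mode labels, and the set of unsupported schemes are all hypothetical, and no actual distcp option name is implied.

```python
from urllib.parse import urlparse

# Assumption: source schemes that cannot be used with block-level copy.
# Only hftp is named in the comment above.
BLOCK_COPY_UNSUPPORTED_SCHEMES = {"hftp"}


def choose_copy_mode(source_uri, parallel_blocks_requested):
    """Return "block" or "file"; reject block mode for unsupported sources."""
    if not parallel_blocks_requested:
        # Default behaviour is unchanged: one file per unit of work.
        return "file"
    scheme = urlparse(source_uri).scheme.lower()
    if scheme in BLOCK_COPY_UNSUPPORTED_SCHEMES:
        raise ValueError(
            "parallel-block copy cannot be used with %s sources" % scheme)
    return "block"
```

Making block-level copy opt-in keeps existing hftp copies working as before, and only a user who asks for the new mode against an hftp source sees an error.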
