[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2014-07-02 Thread liuwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049997#comment-14049997
 ] 

liuwei commented on MAPREDUCE-2257:
---

Since distcp now has distcp2, does a patch exist for distcp2 to copy blocks in 
parallel?

> distcp can copy blocks in parallel
> --
>
> Key: MAPREDUCE-2257
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distcp
> Affects Versions: 0.21.0
> Reporter: dhruba borthakur
> Assignee: Mithun Radhakrishnan
> Attachments: MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of 1 GB. If we use distcp to copy these 
> files, the tasks either take a very long time or eventually fail. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stitch the blocks back into files at the destination via the HDFS Concat API 
> (HDFS-222).
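
As a rough illustration of the proposal quoted above, the following Java sketch splits one large file into block-sized byte ranges that separate tasks could copy independently. It is not taken from the attached patch: the class `ChunkPlanner`, its nested `Range` type, and the `plan` method are hypothetical names, and "1 TB" and "1 GB" are treated as binary units for the example. The stitching step through the HDFS Concat API is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the patch): splits a file into block-sized copy ranges. */
final class ChunkPlanner {

    /** A half-open byte range [offset, offset + length) of the source file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Plans one range per block so that each range can be copied by a separate task.
     * The last range may be shorter than blockSize.
     */
    static List<Range> plan(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<Range> ranges = new ArrayList<Range>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            ranges.add(new Range(offset, Math.min(blockSize, fileLength - offset)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Assumed sizes from the issue text: a 1 TiB file with 1 GiB blocks.
        List<Range> ranges = plan(1024 * gib, gib);
        System.out.println(ranges.size() + " ranges to copy in parallel");
    }
}
```

With ranges like these, the unit of work becomes a block rather than a whole file, which is the change the issue asks for.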



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-12-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508414#comment-13508414
 ] 

Mithun Radhakrishnan commented on MAPREDUCE-2257:
-

Sorry, I haven't been able to spare the time yet. I'll try to make the time 
shortly.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-12-01 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508168#comment-13508168
 ] 

Harsh J commented on MAPREDUCE-2257:


[~mithun] - I know it's been a while, but are you still working on this?

Since HDFS-222 is getting some attention, I feel it would be good to have this 
as an inbuilt usage of the same (and since Dhruba has already mentioned it is a 
great improvement to DistCp).



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-02-10 Thread Mahadev konar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205551#comment-13205551
 ] 

Mahadev konar commented on MAPREDUCE-2257:
--

Thanks for taking this up Mithun!





[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2012-02-09 Thread Mithun Radhakrishnan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205280#comment-13205280
 ] 

Mithun Radhakrishnan commented on MAPREDUCE-2257:
-

I'll take a look. I already have a patch that accomplishes the bulk of this. 
The finishing touches remain.

I'll post a patch shortly.





[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-05-14 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033623#comment-13033623
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


+1
Patch looks good. Just make sure it passes the QA test. Hadoop QA doesn't seem 
to have picked up the latest version.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-28 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026166#comment-13026166
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


Maybe it's time to change it to non-deprecated classes.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-25 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025088#comment-13025088
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

The original code was using the deprecated classes, such as JobConf and 
InputSplit.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-25 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025087#comment-13025087
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


Shouldn't you change your code to use the class that replaced the deprecated 
one? :)



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-25 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025035#comment-13025035
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:59: warning: [deprecation] org.apache.hadoop.mapred.FileSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.FileSplit;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:60: warning: [deprecation] org.apache.hadoop.mapred.InputFormat in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputFormat;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:61: warning: [deprecation] org.apache.hadoop.mapred.InputSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputSplit;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:63: warning: [deprecation] org.apache.hadoop.mapred.JobClient in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobClient;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:64: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobConf;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:66: warning: [deprecation] org.apache.hadoop.mapred.Mapper in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.Mapper;
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:211: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private JobConf conf;
[javac]   ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:738: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static void checkSrcPath(JobConf jobConf, List srcPaths)
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:831: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   static private void finalize(Configuration conf, JobConf jobconf,
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1096: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static int setMapCount(long totalBytes, JobConf job)
[javac]   ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1120: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static JobConf createJobConf(Configuration conf) {
[javac]  ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1148: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   private static void setReplication(Configuration conf, JobConf jobConf,
[javac]  ^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1190: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   static boolean setup(Configuration conf, JobConf jobConf,
[javac]^
[javac] /data/users/rosieli/hadoop_jira/hadoop-mapred-trunk/src/tools/org/apache/hadoop/tools/DistCp.java:1562: warning: [deprecation] org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated
[javac]   FileSystem jobfs, Path jobdir, JobConf jobconf, Configuration conf
[javac]  ^
[javac] /data/users/ro

[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021230#comment-13021230
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476634/MAPREDUCE-2257.patch
  against trunk revision 1094093.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/173//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-18 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021125#comment-13021125
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

FileChunkPair still holds a src/dst file pair; the other 3 fields give the 
starting point and offset of the file chunk.
I also merged doCopyFile() and doCopyFileChunks(), so we now have only one 
doCopyFile() method.
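
To make the shape of such a descriptor concrete, here is a minimal Java sketch of a five-field unit of copy work. Only "a src/dst pair plus three more fields" comes from this thread; the class name `FileChunk` and the three extra field names (`chunkIndex`, `offset`, `length`) are assumptions for illustration and may not match the fields in the attached patch.

```java
/**
 * Hypothetical five-field descriptor for one unit of copy work.
 * The three chunk fields are assumed names, not taken from the patch.
 */
final class FileChunk {
    final String src;      // source file path
    final String dst;      // destination file path
    final int chunkIndex;  // assumed: position of this chunk within the file
    final long offset;     // assumed: first byte of the chunk in the source
    final long length;     // assumed: number of bytes in the chunk

    FileChunk(String src, String dst, int chunkIndex, long offset, long length) {
        if (chunkIndex < 0 || offset < 0 || length < 0) {
            throw new IllegalArgumentException("chunkIndex, offset and length must be >= 0");
        }
        this.src = src;
        this.dst = dst;
        this.chunkIndex = chunkIndex;
        this.offset = offset;
        this.length = length;
    }

    /** Exclusive end offset of the chunk in the source file. */
    long end() {
        return offset + length;
    }

    /** True when this single chunk covers a whole file of the given length. */
    boolean coversWholeFile(long fileLength) {
        return offset == 0 && length == fileLength;
    }

    @Override
    public String toString() {
        return src + " -> " + dst + " [chunk " + chunkIndex + ", bytes " + offset + ".." + end() + ")";
    }
}
```

A whole-file copy is then just the special case where one chunk starts at offset 0 and spans the full file length, which is what makes a single copy method sufficient.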




[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-18 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020936#comment-13020936
 ] 

Rodrigo Schmidt commented on MAPREDUCE-2257:


The class FileChunkPair is not really a pair, right? It stores 5 fields.

Can't we somehow unify the if/else in copy()? At least doCopyFile() could use 
doCopyFileChunks().
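
One way to read this suggestion is that a whole-file copy is a range copy starting at offset 0. The Java sketch below shows that delegation on plain streams. It is not the patch's code: `UnifiedCopy`, `copyRange`, and `copyFile` are hypothetical names standing in for doCopyFileChunks() and doCopyFile(), and ordinary `java.io` streams stand in for HDFS streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: one range-copy routine that also serves whole-file copies. */
final class UnifiedCopy {

    /** Copies up to length bytes starting at offset; returns the number of bytes copied. */
    static long copyRange(InputStream in, OutputStream out, long offset, long length)
            throws IOException {
        long remainingSkip = offset;
        while (remainingSkip > 0) {
            long skipped = in.skip(remainingSkip);
            if (skipped > 0) {
                remainingSkip -= skipped;
            } else if (in.read() >= 0) {
                remainingSkip--;   // skip() made no progress, so consume one byte instead
            } else {
                return 0;          // source ended before the range starts
            }
        }
        byte[] buffer = new byte[8192];
        long copied = 0;
        while (copied < length) {
            int want = (int) Math.min(buffer.length, length - copied);
            int n = in.read(buffer, 0, want);
            if (n < 0) {
                break;             // end of source reached
            }
            out.write(buffer, 0, n);
            copied += n;
        }
        return copied;
    }

    /** Whole-file copy expressed as a range copy from offset 0 with no length limit. */
    static long copyFile(InputStream in, OutputStream out) throws IOException {
        return copyRange(in, out, 0, Long.MAX_VALUE);
    }
}
```

With this layout the if/else in the caller collapses to choosing the offset and length, since both paths run the same loop.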



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-04-05 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016023#comment-13016023
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

The failure of the contrib test is not related to the new distcp.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014214#comment-13014214
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12475126/MAPREDUCE-2257.patch
  against trunk revision 1087098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/152//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014085#comment-13014085
 ] 

Allen Wittenauer commented on MAPREDUCE-2257:
-

> By default, distcp.copy.by.chunk is set to true in the configuration. The user 
> can set it to false to use the original distcp. But the type of destination 
> will be checked afterward. distcp.copy.by.chunk will remain true only if the 
> destination file system is the distributed file system.

This needs to get added to the release notes.
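
For release-note purposes, the behavior quoted above can be summarized with a small Java sketch. The key name `distcp.copy.by.chunk` and its default of true come from the quoted text; the class `CopyByChunkSetting`, the `effective` method, and the boolean standing in for the destination file system check are assumptions, and `java.util.Properties` is used here in place of Hadoop's configuration object.

```java
import java.util.Properties;

/** Hypothetical sketch of how the chunked-copy flag might be resolved. */
final class CopyByChunkSetting {

    static final String KEY = "distcp.copy.by.chunk";

    /**
     * Returns whether chunked copy stays enabled: the flag defaults to true,
     * the user may set it to false, and it only remains true when the
     * destination is a distributed file system.
     */
    static boolean effective(Properties conf, boolean destinationIsDistributedFs) {
        boolean requested = Boolean.parseBoolean(conf.getProperty(KEY, "true"));
        return requested && destinationIsDistributedFs;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default, distributed destination: " + effective(conf, true));
        conf.setProperty(KEY, "false");
        System.out.println("user disabled: " + effective(conf, true));
    }
}
```

The point for users is that leaving the flag at its default does not guarantee chunked copy; the destination type has the final say.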



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013901#comment-13013901
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12475059/MAPREDUCE-2257.patch
  against trunk revision 1087098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/150//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012260#comment-13012260
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474806/MAPREDUCE-2257.patch
  against trunk revision 1082703.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2256 javac compiler warnings (more 
than the trunk's current 2244 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012211#comment-13012211
 ] 

Hadoop QA commented on MAPREDUCE-2257:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474807/MAPREDUCE-2257.patch
  against trunk revision 1082703.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//testReport/
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-18 Thread Rosie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008669#comment-13008669
 ] 

Rosie Li commented on MAPREDUCE-2257:
-

I'm working on this feature right now. I have finished writing the code and 
am testing it now.



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-18 Thread gopikannan venugopalsamy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008557#comment-13008557
 ] 

gopikannan venugopalsamy commented on MAPREDUCE-2257:
-

I want to work on this. Hey nikhil, would you like to discuss it?



[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-03-04 Thread nikhil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002925#comment-13002925
 ] 

nikhil commented on MAPREDUCE-2257:
---

Is anyone working on this feature?






[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-02-01 Thread gopikannan venugopalsamy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989456#comment-12989456
 ] 

gopikannan venugopalsamy commented on MAPREDUCE-2257:
-

Hello,
I would like to contribute to this issue, but I am new to this project. Can 
you give me some tips on where to start?





[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-01-11 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980490#action_12980490
 ] 

dhruba borthakur commented on MAPREDUCE-2257:
-

A new option to distcp could trigger parallel-block copy. It cannot be used 
with hftp.
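
One way to picture such an option is a guard that enables parallel-block copy 
only when the option is set and rejects hftp sources. This is a rough sketch 
under stated assumptions, not distcp code: the function name, the option 
flag, and the set of unsupported schemes are all hypothetical, and the only 
fact taken from this thread is that the mode cannot be used with hftp.

```python
from urllib.parse import urlparse

# Hypothetical: source schemes on which parallel-block copy is not available.
UNSUPPORTED_SCHEMES = {"hftp"}


def use_parallel_block_copy(source_uri: str, option_enabled: bool) -> bool:
    """Decide whether to copy block by block (True) or file by file (False)."""
    if not option_enabled:
        # Without the new option, the file stays the unit of work.
        return False
    scheme = urlparse(source_uri).scheme.lower()
    if scheme in UNSUPPORTED_SCHEMES:
        raise ValueError(
            "parallel-block copy cannot be used with %s sources" % scheme
        )
    return True
```

Failing loudly is a design choice made for this sketch; silently falling back 
to per-file copy would be an equally plausible behaviour.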




[jira] Commented: (MAPREDUCE-2257) distcp can copy blocks in parallel

2011-01-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980408#action_12980408
 ] 

Allen Wittenauer commented on MAPREDUCE-2257:
-

Won't changing the unit of work break hftp?

