[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012260#comment-13012260 ]

Hadoop QA commented on MAPREDUCE-2257:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12474806/MAPREDUCE-2257.patch
  against trunk revision 1082703.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 4 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The applied patch generated 2256 javac compiler warnings (more than the trunk's current 2244 warnings).
    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    -1 contrib tests. The patch failed contrib unit tests.
    +1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/147//console

This message is automatically generated.

> distcp can copy blocks in parallel
> ----------------------------------
>
>                 Key: MAPREDUCE-2257
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 0.21.0
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are
> greater than 1 TB with a block size of 1 GB. If we use distcp to copy these
> files, the tasks either take a very long time or eventually fail. A better
> way for distcp would be to copy all the source blocks in parallel, and then
> stitch the blocks back into files at the destination via the HDFS Concat API
> (HDFS-222)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2406) Failed validate copy in distcp
Failed validate copy in distcp
------------------------------

                 Key: MAPREDUCE-2406
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2406
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.21.0
            Reporter: Rosie Li
            Priority: Minor

Each time a distcp copy completes, {{validateCopy(srcstat, absdst)}} is called. If -pb (preserve block size) is not set, the destination file is written with the default block size. So when the source file uses a block size other than the default and -pb is not set, the source and destination end up with different block sizes, and the copy does not pass the validateCopy check.
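One possible shape for a check that tolerates this case is sketched below. It is an illustration only, not the actual distcp code or a proposed patch: `FileInfo`, `CopyValidator`, `isValidCopy`, and the `preserveBlockSize` flag are hypothetical names, and the sketch assumes the check otherwise only compares file lengths.

```java
// Hypothetical stand-in for the file metadata a copy check would look at.
final class FileInfo {
    final long length;
    final long blockSize;

    FileInfo(long length, long blockSize) {
        this.length = length;
        this.blockSize = blockSize;
    }
}

// Hypothetical validator; not the real validateCopy implementation.
final class CopyValidator {
    // Assumption: a copy is valid when the lengths match, and block sizes
    // are compared only if the user asked to preserve them (the -pb case).
    static boolean isValidCopy(FileInfo src, FileInfo dst, boolean preserveBlockSize) {
        if (src.length != dst.length) {
            return false;
        }
        return !preserveBlockSize || src.blockSize == dst.blockSize;
    }
}
```

With this shape, a source and destination that differ only in block size would pass when -pb is not set and fail when it is.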
[jira] [Commented] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012211#comment-13012211 ]

Hadoop QA commented on MAPREDUCE-2257:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12474807/MAPREDUCE-2257.patch
  against trunk revision 1082703.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 4 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause tar ant target to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these core unit tests:
    -1 contrib tests. The patch failed contrib unit tests.
    -1 system test framework. The patch failed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//testReport/
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/148//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rosie Li updated MAPREDUCE-2257:
-------------------------------

    Attachment: MAPREDUCE-2257.patch

Chop files into chunks before the copy, then stitch them back together after the copy.
[jira] [Updated] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rosie Li updated MAPREDUCE-2257:
-------------------------------

    Attachment:     (was: MAPREDUCE-2257.patch)
[jira] [Updated] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rosie Li updated MAPREDUCE-2257:
-------------------------------

    Attachment: MAPREDUCE-2257.patch

Chop files into chunks before the copy, and stitch them back together after the copy.
[jira] [Updated] (MAPREDUCE-2257) distcp can copy blocks in parallel
[ https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rosie Li updated MAPREDUCE-2257:
-------------------------------

    Affects Version/s: 0.21.0
         Release Note: Copy files in parallel by first chopping them into chunks, copying the chunks, and finally stitching the chunks back into files.
               Status: Patch Available  (was: Open)

By default, distcp.copy.by.chunk is set to true in the configuration. The user can set it to false to use the original distcp behavior. The type of the destination file system is checked afterward: distcp.copy.by.chunk remains true only if the destination file system is the distributed file system.
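The chopping step described in the release note can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: `ChunkPlanner`, `Chunk`, and the choice of chunk size are hypothetical. It only computes the (offset, length) ranges that separate tasks could copy in parallel; the stitching step at the destination is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative and not taken from the patch.
final class ChunkPlanner {

    // A contiguous byte range of the source file that one task could copy.
    static final class Chunk {
        final long offset;
        final long length;

        Chunk(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits a file of fileLength bytes into chunks of at most chunkSize bytes.
    // The last chunk is shorter when fileLength is not a multiple of chunkSize.
    static List<Chunk> plan(long fileLength, long chunkSize) {
        if (fileLength < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and chunkSize > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            chunks.add(new Chunk(offset, Math.min(chunkSize, fileLength - offset)));
        }
        return chunks;
    }
}
```

For example, assuming binary units and a chunk size equal to the 1 GB block size mentioned in the issue, a 1 TB file would be planned as 1024 chunks, each of which could go to a separate task instead of one task copying the whole file.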
[jira] [Commented] (MAPREDUCE-2307) Exception thrown in Jobtracker logs, when the Scheduler configured is FairScheduler.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012143#comment-13012143 ]

Matei Zaharia commented on MAPREDUCE-2307:
------------------------------------------

Oh OK, got it. This looks good then. I will commit it if there are no other comments.

> Exception thrown in Jobtracker logs, when the Scheduler configured is FairScheduler.
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2307
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2307
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.23.0
>            Reporter: Devaraj K
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2307.patch
>
>
> If we try to start the job tracker with the fair scheduler using the default
> configuration, it throws the exception below.
> {code:xml}
> 2010-07-03 10:18:27,142 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> 2010-07-03 10:18:27,143 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> 2010-07-03 10:18:28,037 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/linux172.site
> 2010-07-03 10:18:28,090 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/linux177.site
> 2010-07-03 10:18:40,074 ERROR org.apache.hadoop.mapred.PoolManager: Failed to reload allocations file - will use existing allocations.
> java.lang.NullPointerException
>       at java.io.File.<init>(File.java:222)
>       at org.apache.hadoop.mapred.PoolManager.reloadAllocsIfNecessary(PoolManager.java:127)
>       at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:234)
>       at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2785)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:513)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:984)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:980)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:978)
> {code}
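The top frames of the trace point at a `java.io.File` constructor call inside `reloadAllocsIfNecessary`, which throws `NullPointerException` when given a null path. A guard of the following kind would avoid that. This is a hedged sketch only: `AllocationFileGuard` and `resolve` are hypothetical names, and it is not the actual PoolManager code or the attached patch.

```java
import java.io.File;

// Hypothetical sketch; not the actual PoolManager code or the attached patch.
final class AllocationFileGuard {
    // Returns the allocations file, or null when no path is configured.
    // new File((String) null) throws NullPointerException, which is
    // consistent with the File constructor frame in the reported trace.
    static File resolve(String allocFilePath) {
        if (allocFilePath == null) {
            return null; // nothing configured: the caller keeps existing allocations
        }
        return new File(allocFilePath);
    }
}
```

A caller would check for a null result and skip the reload, rather than logging an error on every heartbeat.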
[jira] [Commented] (MAPREDUCE-2405) MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011941#comment-13011941 ]

Sharad Agarwal commented on MAPREDUCE-2405:
-------------------------------------------

The overall direction looks good.
Noticed in JobImpl:
{code}
.addTransition(JobState.INITED, JobState.KILL_WAIT, JobEventType.JOB_KILL, KILL_NEW_JOB_TRANSITION)
{code}
A kill event in the INITED state should go directly to the KILLED state. Also, it should not use KILL_NEW_JOB_TRANSITION, because that transition does not call the abort logic. Here we need a new transition that executes abort.

> MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2)
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2405
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2405
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Mahadev konar
>            Assignee: Greg Roelofs
>             Fix For: 0.23.0
>
>         Attachments: MR-2405-MR-1220-yarn.v8.MR-279-hadoop-yarn.patch.txt
>
>
> "Port" MAPREDUCE-1220 to MRv2. This is an optimization for small jobs
> wherein all tasks run on the same node in the same JVM/container.
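The behavior requested in the review comment can be illustrated with a toy state machine. This is a sketch under stated assumptions: it does not use the real JobImpl or its state machine classes, and `KillTransitionSketch`, `onKill`, and `abort` are hypothetical names.

```java
// Minimal illustration only; all names here are hypothetical and the real
// JobImpl state machine is not used.
final class KillTransitionSketch {
    enum State { INITED, KILL_WAIT, KILLED }

    private State state = State.INITED;
    private boolean aborted = false;

    // Handles a kill request as suggested in the review comment: a job in
    // INITED runs the abort logic and goes straight to KILLED, without
    // passing through KILL_WAIT.
    void onKill() {
        if (state == State.INITED) {
            abort();
            state = State.KILLED;
        }
    }

    private void abort() {
        aborted = true; // placeholder for the job's abort logic
    }

    State state() {
        return state;
    }

    boolean aborted() {
        return aborted;
    }
}
```

The point of the sketch is the pairing: the transition to KILLED and the abort call happen together, which is what a dedicated transition would provide.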