[jira] [Commented] (MAPREDUCE-5143) TestLineRecordReader was no test case for compressed files
[ https://issues.apache.org/jira/browse/MAPREDUCE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717986#comment-13717986 ]

Tsuyoshi OZAWA commented on MAPREDUCE-5143:
---

[~gelesh], I was able to create the review request on Review Board: https://reviews.apache.org/r/12892/ Thanks for your help :-) The reason my earlier attempt to attach the diff failed was its format: I needed to run the git diff command with the "--full-index" option, but I didn't.

> TestLineRecordReader was no test case for compressed files
> --
>
> Key: MAPREDUCE-5143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5143
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0, trunk, 2.1.0-beta
> Reporter: Sonu Prathap
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Attachments: MAPREDUCE-5143.1.patch, MAPREDUCE-5143.2.patch
>
> TestLineRecordReader was no test case for compressed files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
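The effect of the flag the comment mentions can be seen in a throwaway repository; the directory, file name, and commit below are purely illustrative, not taken from the actual patch:

```shell
# Build a disposable git repository (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > TestLineRecordReader.java
git add TestLineRecordReader.java
git -c user.name=demo -c user.email=demo@example.com commit -qm "base"
echo "v2" > TestLineRecordReader.java

# A plain "git diff" abbreviates the blob hashes on its "index" line;
# "--full-index" emits the full 40-character hashes, which Review Board
# relies on to resolve the parent blobs when rendering the diff.
git diff --full-index > full.patch
grep '^index' full.patch
```

With the flag, the index line carries two full 40-hex-digit hashes (`index <sha1>..<sha1> 100644`) instead of the abbreviated form a plain `git diff` produces.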
[jira] [Resolved] (MAPREDUCE-5412) Change MR to use multiple containers API of ContainerManager after YARN-926
[ https://issues.apache.org/jira/browse/MAPREDUCE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5412.
---

Resolution: Fixed
Fix Version/s: 2.1.0-beta
Hadoop Flags: Reviewed

Committed this together with YARN-926. Closing.

> Change MR to use multiple containers API of ContainerManager after YARN-926
> --
>
> Key: MAPREDUCE-5412
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5412
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
> Fix For: 2.1.0-beta
>
> Attachments: MAPREDUCE-5412.txt
[jira] [Updated] (MAPREDUCE-5412) Change MR to use multiple containers API of ContainerManager after YARN-926
[ https://issues.apache.org/jira/browse/MAPREDUCE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5412:
---

Attachment: MAPREDUCE-5412.txt

Attaching the MR part of YARN-926 on Jian's behalf. Reviewing and committing this as part of YARN-926.

> Change MR to use multiple containers API of ContainerManager after YARN-926
> --
>
> Key: MAPREDUCE-5412
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5412
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
>
> Attachments: MAPREDUCE-5412.txt
[jira] [Commented] (MAPREDUCE-5372) ControlledJob#getMapredJobID capitalization is inconsistent between MR1 and MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717863#comment-13717863 ]

Akira AJISAKA commented on MAPREDUCE-5372:
---

[~zjshen], thanks for the review. I don't want to make an incompatible change again, so I think this issue should be resolved as "Won't Fix". [~sandyr], what do you think?

> ControlledJob#getMapredJobID capitalization is inconsistent between MR1 and MR2
> --
>
> Key: MAPREDUCE-5372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5372
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Sandy Ryza
> Assignee: Akira AJISAKA
> Labels: newbie
> Attachments: MAPREDUCE-5372-1.patch, MAPREDUCE-5372-2.patch, MAPREDUCE-5372-3.patch, MAPREDUCE-5372-4.patch
>
> In MR2, the 'd' in Id is lowercase, but in MR1, it is capitalized. While ControlledJob is marked as Evolving, there is no reason to be inconsistent here.
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yeshavora updated MAPREDUCE-5413:
---

Attachment: syslog_MapredKillTask(1).docx

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
> Attachments: syslog_MapredKillTask(1).docx
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Steps:
> 1) Start a map reduce job:
> hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
> 2) Kill the job:
> hadoop job -kill
> 3) Try to kill the attempts for the above job.
> Result:
> mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated MAPREDUCE-5413:
---

Attachment: (was: syslog_MapredKillTask.docx)

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Steps:
> 1) Start a map reduce job:
> hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
> 2) Kill the job:
> hadoop job -kill
> 3) Try to kill the attempts for the above job.
> Result:
> mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yeshavora updated MAPREDUCE-5413:
---

Description:
Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
Steps:
1) Start a map reduce job:
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
2) Kill the job:
hadoop job -kill
3) Try to kill the attempts for the above job.
Result:
mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
Above steps leads to below exception,
org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

was:
Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
Steps:
1) Start a map reduce job:
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.3.22-alpha-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
2) Kill the job:
/usr/bin/mapred -kill
3) Try to kill the attempts for the above job.
Result:
mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
Above steps leads to below exception,
org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
> Attachments: syslog_MapredKillTask.docx
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Steps:
> 1) Start a map reduce job:
> hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
> 2) Kill the job:
> hadoop job -kill
> 3) Try to kill the attempts for the above job.
> Result:
> mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated MAPREDUCE-5413:
---

Attachment: syslog_MapredKillTask.docx

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
> Attachments: syslog_MapredKillTask.docx
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Steps:
> 1) Start a map reduce job:
> /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.3.22-alpha-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
> 2) Kill the job:
> /usr/bin/mapred -kill
> 3) Try to kill the attempts for the above job.
> Result:
> mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated MAPREDUCE-5413:
---

Description:
Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
Steps:
1) Start a map reduce job:
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.3.22-alpha-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
2) Kill the job:
/usr/bin/mapred -kill
3) Try to kill the attempts for the above job.
Result:
mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
Above steps leads to below exception,
org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

was:
Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
Above steps leads to below exception,
org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Steps:
> 1) Start a map reduce job:
> /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.3.22-alpha-tests.jar sleep -m 10 -r 10 -mt 5 -rt 5
> 2) Kill the job:
> /usr/bin/mapred -kill
> 3) Try to kill the attempts for the above job.
> Result:
> mapred job was not able to shut down properly. Attaching syslog. It does not kill its attempts also.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yeshavora updated MAPREDUCE-5413:
---

Target Version/s: 2.1.0-beta

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Updated] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yeshavora updated MAPREDUCE-5413:
---

Assignee: Omkar Vinit Joshi

> Killing mapred job in Initial stage leads to java.lang.NullPointerException
> --
>
> Key: MAPREDUCE-5413
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Omkar Vinit Joshi
>
> Run a MR job and kill it as soon as jobId is known. After killing the job, try to kill its attempts.
> Above steps leads to below exception,
> org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Created] (MAPREDUCE-5413) Killing mapred job in Initial stage leads to java.lang.NullPointerException
yeshavora created MAPREDUCE-5413:
---

Summary: Killing mapred job in Initial stage leads to java.lang.NullPointerException
Key: MAPREDUCE-5413
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5413
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: yeshavora

Run a MR job and kill it as soon as the jobId is known. After killing the job, try to kill its attempts.

The above steps lead to the exception below:
org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:811)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1249)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717751#comment-13717751 ]

Hadoop QA commented on MAPREDUCE-5367:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12593094/MAPREDUCE-5367-b1.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3887//console

This message is automatically generated.

> Local jobs all use same local working directory
> --
>
> Key: MAPREDUCE-5367
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5367-b1.patch
>
> This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup.
[jira] [Updated] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5367:
---

Status: Patch Available (was: Open)

> Local jobs all use same local working directory
> --
>
> Key: MAPREDUCE-5367
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5367-b1.patch
>
> This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717725#comment-13717725 ]

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

[~avnerb], sorry for the delay. I've just tried applying the patch to branch-1 and there is one hunk failing; it looks like a trivial rebase. Mind rebasing the patch against the current HEAD of branch-1? Once you do that, it can go in.

> plugin for generic shuffle service
> --
>
> Key: MAPREDUCE-4049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: performance, task, tasktracker
> Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
> Reporter: Avner BenHanoch
> Assignee: Avner BenHanoch
> Labels: merge, plugin, rdma, shuffle
> Fix For: 2.0.3-alpha
>
> Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, MAPREDUCE-4049--branch-1.patch, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance.
> # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding the hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with a suggested top-level design for both plugins (currently based on the 1.0 branch)
> # I am providing a link for downloading UDA - Mellanox's open source plugin that implements the generic shuffle service using RDMA and levitated merge. Note: at this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717714#comment-13717714 ]

Hadoop QA commented on MAPREDUCE-5251:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12593790/MAPREDUCE-5251-6.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3886//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3886//console

This message is automatically generated.

> Reducer should not implicate map attempt if it has insufficient space to fetch map output
> --
>
> Key: MAPREDUCE-5251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Jason Lowe
> Assignee: Ashwin Shankar
> Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt
>
> A job can fail if a reducer happens to run on a node with insufficient space to hold a map attempt's output. The reducer keeps reporting the map attempt as bad, and if the map attempt ends up being re-launched too many times before the reducer decides maybe it is the real problem the job can fail.
> In that scenario it would be better to re-launch the reduce attempt and hopefully it will run on another node that has sufficient space to complete the shuffle. Reporting the map attempt is bad and relaunching the map task doesn't change the fact that the reducer can't hold the output.
[jira] [Commented] (MAPREDUCE-5403) yarn.application.classpath requires client to know service internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717687#comment-13717687 ]

Sandy Ryza commented on MAPREDUCE-5403:
---

Test failures are related to YARN-960.

> yarn.application.classpath requires client to know service internals
> --
>
> Key: MAPREDUCE-5403
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 2.0.5-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5403.patch
>
> yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality.
[jira] [Commented] (MAPREDUCE-5403) yarn.application.classpath requires client to know service internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717685#comment-13717685 ]

Hadoop QA commented on MAPREDUCE-5403:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12593775/MAPREDUCE-5403.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
org.apache.hadoop.mapreduce.security.TestMRCredentials
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3884//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3884//console

This message is automatically generated.

> yarn.application.classpath requires client to know service internals
> --
>
> Key: MAPREDUCE-5403
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 2.0.5-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5403.patch
>
> yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality.
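As the MAPREDUCE-5403 description notes, MapReduce already has mapreduce.application.classpath. A hedged sketch of how a deployment might set it in mapred-site.xml; the property name is the real one, but the value shown only mirrors the kind of default shipped in Hadoop 2.x and can differ per release and installation:

```xml
<!-- Illustrative mapred-site.xml fragment (value is an assumption,
     modeled on the Hadoop 2.x mapred-default.xml default). -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```

Keeping the classpath under a mapreduce.* property means the MapReduce client does not need to read YARN's yarn.application.classpath, which is the service-internals coupling the issue complains about.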
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-6.txt Makes sense, both comments are addressed in the latest patch. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717615#comment-13717615 ] Hadoop QA commented on MAPREDUCE-1981: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593773/MAPREDUCE-1981.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3885//console This message is automatically generated. > Improve getSplits performance by using listFiles, the new FileSystem API > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombinedFileInputForm to use the new > API, thus reducing the number of RPCs to HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5403) yarn.application.classpath requires client to know service internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5403: -- Summary: yarn.application.classpath requires client to know service internals (was: Get rid of yarn.application.classpath) > yarn.application.classpath requires client to know service internals > > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5403) Get rid of yarn.application.classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5403: -- Status: Patch Available (was: Open) > Get rid of yarn.application.classpath > - > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5403) Get rid of yarn.application.classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5403: -- Attachment: MAPREDUCE-5403.patch > Get rid of yarn.application.classpath > - > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5403) Get rid of yarn.application.classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717603#comment-13717603 ] Sandy Ryza commented on MAPREDUCE-5403: --- bq. I wanted to point out there are other interesting tidbits of information in yarn-site.xml besides the classpath that clients may want to access, and I'm wondering what criteria qualifies a client-consumed property to graduate to an environment variable or some other mechanism for determining the value besides parsing yarn-site.xml. My saying that YARN configs should not be client configs was imprecise, and I totally agree that the reality has more shades of gray. Regarding the criteria, my general ideal would be to only keep client configs that relate to how to locate and communicate with the YARN services. Though that might need to be amended to cover some requirements I'm not thinking of? Uploading a patch that makes yarn.application.classpath a server-side config. Whatever is placed into a NodeManager's yarn.application.classpath gets placed in the container environment as $YARN_APPLICATION_CLASSPATH. > Get rid of yarn.application.classpath > - > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
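The mechanism Sandy describes can be sketched as follows. This is an illustrative sketch only, not code from the patch: the helper name buildContainerEnv and its surrounding setup are hypothetical, while the YARN_APPLICATION_CLASSPATH variable name comes from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContainerEnvSketch {
    // Hypothetical helper mirroring the described behavior: whatever the
    // NodeManager has configured locally is injected into each container's
    // environment, so clients never need to know the server's classpath.
    static Map<String, String> buildContainerEnv(String nmConfiguredClasspath) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("YARN_APPLICATION_CLASSPATH", nmConfiguredClasspath);
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> env =
            buildContainerEnv("$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*");
        System.out.println(env.get("YARN_APPLICATION_CLASSPATH"));
    }
}
```

The design choice is that the classpath becomes a server-side detail: the client reads an environment variable inside the container instead of parsing yarn-site.xml.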
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Attachment: MAPREDUCE-1981.branch-0.23.patch Thanks for the review, Kihwal. Here's the equivalent patch for branch-0.23. > Improve getSplits performance by using listFiles, the new FileSystem API > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombinedFileInputForm to use the new > API, thus reducing the number of RPCs to HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4959) bundling classpath into jar manifest on Windows does not expand environment variables or wildcards
[ https://issues.apache.org/jira/browse/MAPREDUCE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-4959: - Target Version/s: 1-win, 1.3.0 (was: 1-win) Expanding scope of target versions to include next unreleased 1.x version, 1.3.0. HBase and others would benefit from having this code in branch-1, as per discussion on YARN-358. [~ndimiduk], you mentioned needing this in 1.1.x. Would it be acceptable for HBase to upgrade to 1.3.0 to pick up this change, or would it really need to be a new 1.1.x version? > bundling classpath into jar manifest on Windows does not expand environment > variables or wildcards > -- > > Key: MAPREDUCE-4959 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4959 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1-win >Reporter: Chris Nauroth > > To support long classpaths on Windows, the class path entries get bundled > into a small temporary jar with a manifest that has a Class-Path attribute. > When a classpath is specified in a jar manifest like this, it does not expand > environment variables (i.e. %HADOOP_COMMON_HOME%), and it does not expand > wildcards (i.e. lib/*.jar). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
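The limitation described in this issue is a property of how the JVM reads the Class-Path manifest attribute: entries are taken as literal relative URLs, with no environment-variable or wildcard expansion. A minimal sketch (the class name and sample entries are made up for illustration; only the Class-Path attribute semantics are from the issue):

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestClasspathSketch {
    // Builds a manifest the way a long-classpath workaround jar would, with
    // the entries placed verbatim into the Class-Path attribute.
    static String classPathAttribute(String entries) {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.CLASS_PATH, entries);
        // The JVM later reads this value back as literal relative URLs:
        // %HADOOP_COMMON_HOME% and lib/*.jar are NOT expanded.
        return attrs.getValue(Attributes.Name.CLASS_PATH);
    }

    public static void main(String[] args) {
        System.out.println(classPathAttribute("%HADOOP_COMMON_HOME%/lib/a.jar lib/*.jar"));
    }
}
```

This is why the caller must expand environment variables and wildcards itself before writing the manifest.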
[jira] [Created] (MAPREDUCE-5412) Change MR to use multiple containers API of ContainerManager after YARN-926
Jian He created MAPREDUCE-5412: -- Summary: Change MR to use multiple containers API of ContainerManager after YARN-926 Key: MAPREDUCE-5412 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5412 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717531#comment-13717531 ] Jason Lowe commented on MAPREDUCE-5251: --- I see reportLocalError is now throwing UnknownHostException. Unfortunately since that is an IOException, if it ever does do that it will end up being caught by the outer try-catch block in copyMapOutput and a map attempt will be blamed for it. Also now that I think of it, we arguably should be incrementing the ioErrs counter before calling reportLocalError since this is an I/O error during the shuffle that prevented a successful map output transfer. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
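The pitfall Jason points out follows from the Java exception hierarchy: java.net.UnknownHostException extends java.io.IOException, so an outer catch (IOException) absorbs it. A minimal stand-alone sketch (the method names mirror the reportLocalError and copyMapOutput methods discussed above, but the bodies are simplified and hypothetical):

```java
import java.io.IOException;
import java.net.UnknownHostException;

public class CatchOrderSketch {
    // Hypothetical stand-in for reportLocalError, which can throw
    // UnknownHostException while resolving the local hostname.
    static void reportLocalError() throws UnknownHostException {
        throw new UnknownHostException("local hostname lookup failed");
    }

    static String copyMapOutput() {
        try {
            reportLocalError();
            return "ok";
        } catch (IOException ioe) {
            // UnknownHostException IS-A IOException, so it lands here and a
            // purely local failure ends up blamed on the map attempt.
            return "blamed map attempt: " + ioe.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(copyMapOutput());
    }
}
```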
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717519#comment-13717519 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593765/MAPREDUCE-5251-5.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3883//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3883//console This message is automatically generated. 
> Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5356) Ability to refresh aggregated log retention period and check interval
[ https://issues.apache.org/jira/browse/MAPREDUCE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5356: -- Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Ashwin! I committed this to trunk and branch-2. > Ability to refresh aggregated log retention period and check interval > -- > > Key: MAPREDUCE-5356 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5356 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Fix For: 3.0.0, 2.3.0 > > Attachments: MAPREDUCE-5266-2.txt, MAPREDUCE-5266-3.txt, > MAPREDUCE-5266-4.txt, MAPREDUCE-5356-5.txt, MAPREDUCE-5356-5.txt, > WHOLE_PATCH_NOT_TO_BE_CHKEDIN-MAPREDUCE-5356-5.txt > > > We want to be able to refresh log aggregation retention time > and 'check interval' time on the fly by changing configs so that we dont have > to bounce history server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5356) Ability to refresh aggregated log retention period and check interval
[ https://issues.apache.org/jira/browse/MAPREDUCE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5356: -- Summary: Ability to refresh aggregated log retention period and check interval (was: Refresh Log aggregation 'retention period' and 'check interval' ) > Ability to refresh aggregated log retention period and check interval > -- > > Key: MAPREDUCE-5356 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5356 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: MAPREDUCE-5266-2.txt, MAPREDUCE-5266-3.txt, > MAPREDUCE-5266-4.txt, MAPREDUCE-5356-5.txt, MAPREDUCE-5356-5.txt, > WHOLE_PATCH_NOT_TO_BE_CHKEDIN-MAPREDUCE-5356-5.txt > > > We want to be able to refresh log aggregation retention time > and 'check interval' time on the fly by changing configs so that we dont have > to bounce history server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-5.txt Thanks, patch updated. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5356) Refresh Log aggregation 'retention period' and 'check interval'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717494#comment-13717494 ] Jason Lowe commented on MAPREDUCE-5356: --- +1, lgtm. Committing this. > Refresh Log aggregation 'retention period' and 'check interval' > > > Key: MAPREDUCE-5356 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5356 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: MAPREDUCE-5266-2.txt, MAPREDUCE-5266-3.txt, > MAPREDUCE-5266-4.txt, MAPREDUCE-5356-5.txt, MAPREDUCE-5356-5.txt, > WHOLE_PATCH_NOT_TO_BE_CHKEDIN-MAPREDUCE-5356-5.txt > > > We want to be able to refresh log aggregation retention time > and 'check interval' time on the fly by changing configs so that we dont have > to bounce history server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717487#comment-13717487 ] Jason Lowe commented on MAPREDUCE-5251: --- Thanks for the update, Ashwin. Couple of minor things: * reportLocalError probably should just compute the hostname itself rather than requiring callers to do so * there is whitespace missing between arguments added in the latest patch (which will be fixed if we remove the reduceHost arg to reportLocalError) > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5356) Refresh Log aggregation 'retention period' and 'check interval'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717472#comment-13717472 ] Hadoop QA commented on MAPREDUCE-5356: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593755/MAPREDUCE-5356-5.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3882//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3882//console This message is automatically generated. 
> Refresh Log aggregation 'retention period' and 'check interval' > > > Key: MAPREDUCE-5356 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5356 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: MAPREDUCE-5266-2.txt, MAPREDUCE-5266-3.txt, > MAPREDUCE-5266-4.txt, MAPREDUCE-5356-5.txt, MAPREDUCE-5356-5.txt, > WHOLE_PATCH_NOT_TO_BE_CHKEDIN-MAPREDUCE-5356-5.txt > > > We want to be able to refresh log aggregation retention time > and 'check interval' time on the fly by changing configs so that we dont have > to bounce history server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716704#comment-13716704 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593044/MAPREDUCE-5251-4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3881//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3881//console This message is automatically generated. 
> Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716701#comment-13716701 ] Alejandro Abdelnur commented on MAPREDUCE-4366: --- [~acmurthy], if you don't have any further comments/concerns, I'll commit this later this week. > mapred metrics shows negative count of waiting maps and reduces > --- > > Key: MAPREDUCE-4366 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 1.0.2 >Reporter: Thomas Graves >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4366-branch-1-1.patch, > MAPREDUCE-4366-branch-1.patch > > > Negative waiting_maps and waiting_reduces count is observed in the mapred > metrics. MAPREDUCE-1238 partially fixed this but it appears there is still > issues as we are seeing it, but not as bad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5356) Refresh Log aggregation 'retention period' and 'check interval'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5356: -- Attachment: MAPREDUCE-5356-5.txt Now that MAPREDUCE-5265 is in, re-attaching latest version of patch so Jenkins can comment. > Refresh Log aggregation 'retention period' and 'check interval' > > > Key: MAPREDUCE-5356 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5356 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: MAPREDUCE-5266-2.txt, MAPREDUCE-5266-3.txt, > MAPREDUCE-5266-4.txt, MAPREDUCE-5356-5.txt, MAPREDUCE-5356-5.txt, > WHOLE_PATCH_NOT_TO_BE_CHKEDIN-MAPREDUCE-5356-5.txt > > > We want to be able to refresh log aggregation retention time > and 'check interval' time on the fly by changing configs so that we dont have > to bounce history server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716688#comment-13716688 ] Kihwal Lee commented on MAPREDUCE-1981: --- +1 The patch looks good. I also ran some tests and they worked successfully. Thanks for fixing both mapred and mapreduce. > Improve getSplits performance by using listFiles, the new FileSystem API > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombinedFileInputForm to use the new > API, thus reducing the number of RPCs to HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5251: -- Target Version/s: 3.0.0, 2.3.0, 0.23.10 (was: 3.0.0, 2.1.0-beta, 0.23.10) Status: Patch Available (was: Open) > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.4-alpha, 0.23.7 >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716639#comment-13716639 ] Zhijie Shen commented on MAPREDUCE-5409: [~devaraj.k], would you mind sharing more context of the exception? For example, the state of TaskAttemptImpl before KILLED. My guess is that before TaskAttemptImpl entered KILLED, it was probably at RUNNING or COMMIT_PENDING, where StatusUpdater transition could happen. This transition would send a JOB_TASK_ATTEMPT_FETCH_FAILURE event to JobImpl, TaskAttemptFetchFailureTransition would be triggered, and a TA_TOO_MANY_FETCH_FAILURE would be sent back to TaskAttemptImpl. Before this event was processed, TaskAttemptImpl went through KilledTransition as it received a TA_CONTAINER_CLEANED event in advance. Therefore, TA_TOO_MANY_FETCH_FAILURE at KILLED happened. > MRAppMaster throws InvalidStateTransitonException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl > - > > Key: MAPREDUCE-5409 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Devaraj K >Assignee: Devaraj K > > {code:xml} > 2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1374560536158_0003_m_40_0 is : 0.0 > 2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1374560536158_0003_m_07_0 ... 
raising > fetch failure to map > 2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1374560536158_0003_m_07_0 > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:662) > 2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1374560536158_0003Job Transitioned from RUNNING to ERROR > 2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from > attempt_1374560536158_0003_m_40_0 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
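The sequence Zhijie describes can be sketched with a toy transition table (a simplified model for illustration only, not Hadoop's actual StateMachineFactory API): an event with no transition registered for the current state produces exactly the "Invalid event ... at KILLED" failure in the log above, because TA_TOO_MANY_FETCH_FAILURE was queued while the attempt was still RUNNING but dispatched only after TA_CONTAINER_CLEANED had already moved it to KILLED.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Toy model of the race described above; NOT Hadoop's real TaskAttemptImpl
 * state machine. An event that has no transition registered for the current
 * state throws, mirroring the InvalidStateTransitonException in the log.
 */
public class TaskAttemptStateDemo {
    public enum State { RUNNING, KILLED }
    public enum Event { TA_CONTAINER_CLEANED, TA_TOO_MANY_FETCH_FAILURE }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<>(State.class);
    static {
        Map<Event, State> fromRunning = new EnumMap<>(Event.class);
        // container cleaned while running -> attempt is killed (KilledTransition)
        fromRunning.put(Event.TA_CONTAINER_CLEANED, State.KILLED);
        // fetch-failure handling is valid while still running
        fromRunning.put(Event.TA_TOO_MANY_FETCH_FAILURE, State.RUNNING);
        TRANSITIONS.put(State.RUNNING, fromRunning);
        // no transitions registered at KILLED -> late events are invalid
        TRANSITIONS.put(State.KILLED, new EnumMap<>(Event.class));
    }

    /** Returns the post-state, or throws if the event is invalid in this state. */
    public static State transition(State current, Event event) {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException(
                    "Invalid event: " + event + " at " + current);
        }
        return next;
    }
}
```

Under these assumptions, one possible remedy would be to register an ignore-style transition for TA_TOO_MANY_FETCH_FAILURE in the KILLED state, so the late event is dropped instead of escalating to a job ERROR.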
[jira] [Commented] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716609#comment-13716609 ] Hadoop QA commented on MAPREDUCE-5411: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593729/LOADED_JOB_CACHE_MR5411-1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3880//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3880//console This message is automatically generated. 
> Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt > > > We want to be able to refresh size of the loaded job > cache(mapreduce.jobhistory.loadedjobs.cache.size) of history server > through history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716584#comment-13716584 ] Hadoop QA commented on MAPREDUCE-5402: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593727/MAPREDUCE-5402.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3879//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3879//console This message is automatically generated. 
> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. 
> As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally s
ee any.
[jira] [Updated] (MAPREDUCE-5317) Stale files left behind for failed jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5317: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks, Ravi. I committed this to trunk, branch-2, and branch-0.23. > Stale files left behind for failed jobs > --- > > Key: MAPREDUCE-5317 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.8 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-5317.branch-0.23.patch, > MAPREDUCE-5317.branch-0.23.patch, MAPREDUCE-5317.branch-0.23.patch, > MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, > MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch > > > Courtesy [~amar_kamat]! > {quote} > We are seeing _temporary files left behind in the output folder if the job > fails. > The job were failed due to hitting quota issue. > I simply ran the randomwriter (from hadoop examples) with the default setting. > That failed and left behind some stray files. > {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Attachment: LOADED_JOB_CACHE_MR5411-1.txt > Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt > > > We want to be able to refresh size of the loaded job > cache(mapreduce.jobhistory.loadedjobs.cache.size) of history server > through history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Status: Patch Available (was: Open) Added a new command on history server's admin interface 'refreshLoadedJobCache' which refreshes the size of loaded job cache. > Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt > > > We want to be able to refresh size of the loaded job > cache(mapreduce.jobhistory.loadedjobs.cache.size) of history server > through history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
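The refresh-without-restart behavior described here can be modeled with a cache whose capacity is mutable at runtime. This is a self-contained sketch of the idea only; the history server's actual loaded-job cache and the refreshLoadedJobCache admin command are not shown, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified model (not the HistoryFileManager implementation) of a job cache
 * whose maximum size can be "refreshed" on the fly, as MAPREDUCE-5411 proposes
 * for mapreduce.jobhistory.loadedjobs.cache.size.
 */
public class ResizableJobCache<K, V> {
    private int capacity;
    private final LinkedHashMap<K, V> cache;

    public ResizableJobCache(int initialCapacity) {
        this.capacity = initialCapacity;
        // accessOrder=true gives LRU eviction order
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized void put(K key, V value) { cache.put(key, value); }
    public synchronized V get(K key) { return cache.get(key); }
    public synchronized int size() { return cache.size(); }

    /** Apply a new size limit without a restart, evicting LRU entries if needed. */
    public synchronized void refreshCapacity(int newCapacity) {
        this.capacity = newCapacity;
        while (cache.size() > capacity) {
            K eldest = cache.keySet().iterator().next();
            cache.remove(eldest);
        }
    }
}
```

The design point this illustrates: the admin command only needs to re-read the configured size and call something like refreshCapacity, so the history server never has to be bounced.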
[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5402: -- Attachment: MAPREDUCE-5402.3.patch Fixed to pass compile. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. 
I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
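The arithmetic in this report is easy to check (illustrative only; the constant and the Math.min cap are taken from the description above, not from DistCp's source): with numMaps=128 and splitRatio=10 the requested chunk count is 1280, which exceeds the hard-coded MAX_CHUNKS_TOLERABLE of 400 and triggers the "Too many chunks" exception, while the proposed Math.min(numMaps * splitRatio, numFiles) cap would simply bound it by the ~2800 files.

```java
/**
 * Illustrative arithmetic for the chunk-count discussion above; NOT DistCp's
 * actual implementation. MAX_CHUNKS_TOLERABLE mirrors the hard-coded limit
 * cited in the report.
 */
public class ChunkMath {
    static final int MAX_CHUNKS_TOLERABLE = 400; // current hard-coded limit

    /** Chunk count as requested today: numMaps * splitRatio. */
    static int requestedChunks(int numMaps, int splitRatio) {
        return numMaps * splitRatio;
    }

    /** The reporter's proposal: never create more chunks than files. */
    static int proposedChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    /** Files per chunk, rounded up. */
    static int entriesPerChunk(int numFiles, int numChunks) {
        return (numFiles + numChunks - 1) / numChunks;
    }
}
```

With the reporter's original settings the numbers also line up: 2800 files spread over 241 chunks gives the observed 12 entries per chunk, and capping chunks at the file count degrades gracefully to one file per chunk in the worst case.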
[jira] [Created] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
Ashwin Shankar created MAPREDUCE-5411: - Summary: Refresh size of loaded job cache on history server Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar We want to be able to refresh size of the loaded job cache(mapreduce.jobhistory.loadedjobs.cache.size) of history server through history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716490#comment-13716490 ] Arun C Murthy edited comment on MAPREDUCE-5408 at 7/23/13 3:57 PM: --- Thanks Hitesh, I've fixed default level to be INFO. W.r.t the first comment, let's keep backport as the same to ensure we are compatible with branch-2 (for binary compat). was (Author: acmurthy): Thanks Hitesh, I've fixed default level to be INFO. W.r.t the first comment, let's keep backport as the same. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch, MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5317) Stale files left behind for failed jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716512#comment-13716512 ] Jason Lowe commented on MAPREDUCE-5317: --- +1 > Stale files left behind for failed jobs > --- > > Key: MAPREDUCE-5317 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.8 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-5317.branch-0.23.patch, > MAPREDUCE-5317.branch-0.23.patch, MAPREDUCE-5317.branch-0.23.patch, > MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, > MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch > > > Courtesy [~amar_kamat]! > {quote} > We are seeing _temporary files left behind in the output folder if the job > fails. > The job were failed due to hitting quota issue. > I simply ran the randomwriter (from hadoop examples) with the default setting. > That failed and left behind some stray files. > {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5403) Get rid of yarn.application.classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716503#comment-13716503 ] Jason Lowe commented on MAPREDUCE-5403: --- bq. I don't think yarn.resourcemanager.address should be an environment variable. I think there is a conceptual difference between a client knowing how to contact a service and a client knowing about details of the service's internals. Do you disagree? I'm just pointing out that locating the YARN jars is not the only thing clients may need to do to communicate with YARN. e.g.: a non-Java YARN client still needs to locate the ResourceManager address somehow, and currently it can do this via parsing yarn-site.xml. To me, yarn-site.xml is a site-specific config regardless of whether it's a YARN server or YARN client processing it. For a site that doesn't host any YARN daemons (e.g.: gateway or launcher node), that becomes essentially a client-side-only config. I can see where you're coming from, and maybe for the classpath we should do something different (e.g.: environment variable and/or clients should use "yarn classpath" to get the classpath instead). I wanted to point out there are other interesting tidbits of information in yarn-site.xml besides the classpath that clients may want to access, and I'm wondering what criteria qualifies a client-consumed property to graduate to an environment variable or some other mechanism for determining the value besides parsing yarn-site.xml. > Get rid of yarn.application.classpath > - > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. 
[jira] [Updated] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-5408: - Attachment: MAPREDUCE-336_branch1.patch Thanks Hitesh, I've fixed default level to be INFO. W.r.t the first comment, let's keep backport as the same. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch, MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-5408: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch, MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716472#comment-13716472 ] Hadoop QA commented on MAPREDUCE-5402: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593707/MAPREDUCE-5402.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3878//console This message is automatically generated. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) 
> This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. 
> If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716470#comment-13716470 ] Hitesh Shah commented on MAPREDUCE-5408: Mostly looks good. A couple of minor comments: - DEFAULT_LOG_LEVEL could be renamed to DEFAULT_TASK_LOG_LEVEL and the type changed to a string. Having the type as Level is not buying much, as it always ends up being converted to a string when used. If the intention is to retain the backport as is, this comment can be ignored for now. - Level.toLevel() has an API which takes in a default value. In the event that the user has a typo, the current usage falls back to using DEBUG, whereas the default-based API can be made to fall back to INFO. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
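Hitesh's second point - falling back to a sane default when the configured level string contains a typo - can be sketched with a small self-contained helper. This is a hypothetical stand-in for illustration only; log4j 1.x's own Level.toLevel(String, Level) overload provides exactly this default-value behavior:

```java
import java.util.Set;

public class LevelParseDemo {
    private static final Set<String> KNOWN =
        Set.of("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "ALL", "OFF");

    // Mimics Level.toLevel(String, Level): return the recognized level name,
    // or the supplied default when the string is unrecognized (e.g. a typo),
    // rather than silently falling back to DEBUG.
    static String toLevel(String name, String defaultLevel) {
        if (name != null && KNOWN.contains(name.toUpperCase())) {
            return name.toUpperCase();
        }
        return defaultLevel;
    }

    public static void main(String[] args) {
        System.out.println(toLevel("warn", "INFO"));  // a valid level parses as-is
        System.out.println(toLevel("INOF", "INFO"));  // a typo falls back to INFO, not DEBUG
    }
}
```

With the default-based overload, a misconfigured job degrades to INFO logging instead of the far more verbose DEBUG.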
[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5402: -- Attachment: MAPREDUCE-5402.2.patch Fixed to pass findbugs warnings. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. 
I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
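The capping rule David proposes above (numberOfChunks = Math.min(numMaps * splitRatio, numFiles)) is simple enough to sketch. The class and method names below are hypothetical, not the actual DistCp DynamicInputFormat code:

```java
public class ChunkCountSketch {
    // Proposed replacement for the hard-coded MAX_CHUNKS_TOLERABLE check:
    // never create more chunks than there are files, since one file per
    // chunk is already the maximum parallelism possible.
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // The reporter's job: 128 maps, split ratio 10, ~2800 files.
        // Instead of failing against the 400-chunk limit, this yields 1280 chunks.
        System.out.println(numberOfChunks(128, 10, 2800));   // 1280
        // Even an extreme ratio is capped at one file per chunk.
        System.out.println(numberOfChunks(128, 100, 2800));  // 2800
    }
}
```

Under this rule the worst case is numFiles chunks, so the constant becomes unnecessary.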
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716386#comment-13716386 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- [~daryn], ping. > Include FS delegation token ID in job conf > -- > > Key: MAPREDUCE-5379 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission, security >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379.patch > > > Making a job's FS delegation token ID accessible will allow external services > to associate it with the file system operations it performs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5410) MapReduce output issue
Mullangi created MAPREDUCE-5410: --- Summary: MapReduce output issue Key: MAPREDUCE-5410 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5410 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples, job submission Affects Versions: 1.0.3 Environment: ubuntu Reporter: Mullangi
Hi, I am new to Hadoop concepts. While practicing with a custom MapReduce program, I found the result is not as expected when executing the code on an HDFS-based file. Please note that when I execute the same program on a local (Unix) file, I get the expected result. Below are the details of my code.
MapReduce in Java
==
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.*;

public class WordCount1 {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      String tokenedZone = null;
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        tokenedZone = tokenizer.nextToken();
        word.set(tokenedZone);
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      int val = 0;
      while (values.hasNext()) {
        val = values.next().get();
        sum += val;
      }
      if (sum > 1)
        output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    conf.setJarByClass(WordCount1.class);
    conf.setJobName("wordcount1");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    Path inPath = new Path(args[0]);
    Path outPath = new Path(args[0]);
    FileInputFormat.setInputPaths(conf, inPath);
    FileOutputFormat.setOutputPath(conf, outPath);
    JobClient.runJob(conf);
  }
}

input File
==
test my program during test and my hadoop your during get program

hadoop generated output file on HDFS file system
==
during 2
my 2
test 2

hadoop generated output file on local file system
==
during 2
my 2
program 2
test 2

Please help me with this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
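The missing words in the HDFS run are consistent with a classic combiner pitfall: the Reduce class is registered as the combiner, but its "if (sum > 1)" guard then runs per split, dropping any word that appears only once in a given split before the real reducer ever sees it. A word occurring once in each of several splits is therefore lost entirely, while a single local file is processed as one unit and nothing is dropped. (Note also that outPath is built from args[0], the input path - presumably a transcription slip for args[1].) A self-contained sketch of the effect, in plain Java rather than the MapReduce API, with hypothetical split contents:

```java
import java.util.*;

public class CombinerFilterDemo {
    // Count words in one split; 'filter' mimics the report's "if (sum > 1)" guard
    // running inside the combiner.
    static Map<String, Integer> combine(List<String> words, boolean filter) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        if (filter) counts.values().removeIf(c -> c <= 1);  // drops per-split singletons
        return counts;
    }

    // Final reduce: merge per-split counts, then apply the > 1 filter exactly once.
    static Map<String, Integer> reduce(List<Map<String, Integer>> perSplit) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> m : perSplit) {
            m.forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        total.values().removeIf(c -> c <= 1);
        return total;
    }

    public static void main(String[] args) {
        List<String> splitA = List.of("test", "my", "program", "during", "test");
        List<String> splitB = List.of("my", "program", "during");

        // Buggy: filtering inside the combiner loses "program" (once per split).
        Map<String, Integer> buggy =
            reduce(List.of(combine(splitA, true), combine(splitB, true)));
        // Correct: the combiner only sums; the filter runs once, in the reducer.
        Map<String, Integer> ok =
            reduce(List.of(combine(splitA, false), combine(splitB, false)));

        System.out.println(buggy.containsKey("program"));  // false
        System.out.println(ok.get("program"));             // 2
    }
}
```

The general rule: a combiner may run zero, one, or many times, so it must only pre-aggregate; any filtering or thresholding belongs solely in the reducer.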
[jira] [Created] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
Devaraj K created MAPREDUCE-5409: Summary: MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl Key: MAPREDUCE-5409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K
{code:xml}
2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1374560536158_0003_m_40_0 is : 0.0
2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1374560536158_0003_m_07_0 ... raising fetch failure to map
2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1374560536158_0003_m_07_0
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
	at java.lang.Thread.run(Thread.java:662)
2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1374560536158_0003Job Transitioned from RUNNING to ERROR
2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1374560536158_0003_m_40_0
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
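One common fix pattern for this class of bug - hedged, as it is not necessarily the patch that resolved this issue - is to register an explicit no-op (self) transition so that a TA_TOO_MANY_FETCH_FAILURE arriving after the attempt is already KILLED is ignored rather than rejected. A minimal self-contained sketch of the idea, with hypothetical enum and class names standing in for the YARN state-machine classes:

```java
import java.util.*;

public class LateEventDemo {
    enum State { RUNNING, SUCCEEDED, KILLED }
    enum Event { TA_DONE, TA_KILL, TA_TOO_MANY_FETCH_FAILURE }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    LateEventDemo() {
        table.put(State.RUNNING, new EnumMap<>(Map.of(
            Event.TA_DONE, State.SUCCEEDED,
            Event.TA_KILL, State.KILLED)));
        // The fix: KILLED explicitly maps the late fetch-failure event to a
        // self-transition, instead of having no entry and throwing.
        table.put(State.KILLED, new EnumMap<>(Map.of(
            Event.TA_TOO_MANY_FETCH_FAILURE, State.KILLED)));
        table.put(State.SUCCEEDED, new EnumMap<>(Event.class));
    }

    State handle(Event e) {
        State next = table.getOrDefault(current, Map.of()).get(e);
        if (next == null) {
            // Corresponds to the InvalidStateTransitonException in the log.
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        return current = next;
    }

    public static void main(String[] args) {
        LateEventDemo sm = new LateEventDemo();
        sm.handle(Event.TA_KILL);  // RUNNING -> KILLED
        // A late fetch-failure event is now absorbed instead of crashing the AM.
        System.out.println(sm.handle(Event.TA_TOO_MANY_FETCH_FAILURE));  // KILLED
    }
}
```

In an asynchronous dispatcher, events can always race a terminal transition, so terminal states generally need explicit "ignore" entries for events that may arrive late.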
[jira] [Commented] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716202#comment-13716202 ] Hadoop QA commented on MAPREDUCE-5408: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593650/MAPREDUCE-336_branch1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3877//console This message is automatically generated. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-5408: - Status: Patch Available (was: Open) > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira