[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739303#comment-13739303 ] Hitesh Shah commented on YARN-1055: --- [~kkambatl] In case of a network issue where the AM is running but cannot talk to the RM or say the NM on which the AM was running goes down, what knob would control handling these situations? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739304#comment-13739304 ] Hadoop QA commented on YARN-292: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597896/YARN-292.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1710//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1710//console This message is automatically generated. > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739302#comment-13739302 ] Junping Du commented on YARN-292: - Thanks for the patch, Zhijie! The patch looks good to me. However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers(). > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
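To make the non-empty check suggested above concrete, here is a minimal sketch. It is illustrative only: the class and method below are hypothetical helpers, not the actual RMAppAttemptImpl$AMContainerAllocatedTransition code, and the only assumption is that the transition receives the allocated containers as a possibly empty list.
{code:java}
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not the real AMContainerAllocatedTransition: it shows
// the kind of non-empty check suggested above before indexing into the
// allocated containers.
final class AmContainerGuardSketch {
  private static final Logger LOG =
      Logger.getLogger(AmContainerGuardSketch.class.getName());

  /** Returns the AM container, or null if the scheduler returned nothing yet. */
  static <C> C firstAllocatedContainer(List<C> containers, String attemptId) {
    if (containers == null || containers.isEmpty()) {
      // Exactly one container is expected for the AM at this point; logging
      // and returning null avoids the ArrayIndexOutOfBoundsException above.
      LOG.severe("No container allocated yet for attempt " + attemptId);
      return null;
    }
    return containers.get(0);
  }
}
{code}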
[jira] [Updated] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-292: - Attachment: YARN-292.1.patch Created a patch to use ConcurrentHashMap for applications in FifoScheduler and FairScheduler, which will make accessing applications thread-safe. > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
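A rough sketch of the change the patch describes, assuming the schedulers keep their running applications in a map keyed by application attempt. The class below is a placeholder rather than the actual FifoScheduler or FairScheduler code; the point is only the switch to ConcurrentHashMap.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Placeholder for the schedulers' applications map: K stands in for
// ApplicationAttemptId and V for the scheduler's per-attempt application type.
final class ApplicationsMapSketch<K, V> {
  // A TreeMap (FifoScheduler) or plain HashMap (FairScheduler) is unsafe when
  // the scheduler event thread adds/removes entries while the RMAppAttempt
  // state machine thread reads them; a ConcurrentHashMap makes each single
  // get/put/remove safe under that kind of concurrency.
  private final ConcurrentMap<K, V> applications = new ConcurrentHashMap<K, V>();

  void addApplication(K attemptId, V application) {
    applications.put(attemptId, application);
  }

  V getApplication(K attemptId) {
    // Returns the current value, or null if the attempt was already removed;
    // never an inconsistent read of a map being restructured concurrently.
    return applications.get(attemptId);
  }

  void removeApplication(K attemptId) {
    applications.remove(attemptId);
  }
}
{code}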
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739225#comment-13739225 ] Vinod Kumar Vavilapalli commented on YARN-451: -- Agreed about having it on the listing page, but that page is already dense. We have to do some basic UI design. Again, like I mentioned, Hadoop-1 was different in that the number of maps and reduces doesn't change after the job starts. In Hadoop-2, the memory/cores allocated slowly increase over time, so it may or may not be of much use. I am ambivalent about adding it. > Add more metrics to RM page > --- > > Key: YARN-451 > URL: https://issues.apache.org/jira/browse/YARN-451 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu >Priority: Minor > > ResourceManager webUI shows list of RUNNING applications, but it does not > tell which applications are requesting more resource compared to others. With > cluster running hundreds of applications at once it would be useful to have > some kind of metric to show high-resource usage applications vs low-resource > usage ones. At the minimum showing number of containers is good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-993) job can not recovery after restart resourcemanager
[ https://issues.apache.org/jira/browse/YARN-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739219#comment-13739219 ] prophy Yan commented on YARN-993: - Jian He, I have tried the patch file listed in YARN-513, but some errors occur when I apply the patch. My test version is Hadoop 2.0.5-alpha, so can this patch work with that version? Thank you. > job can not recovery after restart resourcemanager > -- > > Key: YARN-993 > URL: https://issues.apache.org/jira/browse/YARN-993 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: CentOS5.3 JDK1.7.0_11 >Reporter: prophy Yan >Priority: Critical > > Recently, I tested the job recovery function in the YARN framework, but it > failed. > First, I ran the wordcount example program, and then I killed (kill -9) the > resourcemanager process on the server while the wordcount job was at map 100%. > The job exited with an error within minutes. > Second, I restarted the resourcemanager on the server with the > 'start-yarn.sh' command, but the failed job (wordcount) could not continue. > The YARN log says "file not exist!" > Here is the YARN log: > 2013-07-23 16:05:21,472 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done > launching container Container: [ContainerId: > container_1374564764970_0001_02_01, NodeId: mv8.mzhen.cn:52117, > NodeHttpAddress: mv8.mzhen.cn:8042, Resource: , > Priority: 0, State: NEW, Token: null, Status: container_id {, app_attempt_id > {, application_id {, id: 1, cluster_timestamp: 1374564764970, }, attemptId: > 2, }, id: 1, }, state: C_NEW, ] for AM appattempt_1374564764970_0001_02 > 2013-07-23 16:05:21,473 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1374564764970_0001_02 State change from ALLOCATED to LAUNCHED > 2013-07-23 16:05:21,925 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1374564764970_0001_02 State change from LAUNCHED to FAILED > 2013-07-23 16:05:21,925 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application > application_1374564764970_0001 failed 1 times due to AM Container for > appattempt_1374564764970_0001_02 exited with exitCode: -1000 due to: > RemoteTrace: > java.io.FileNotFoundException: File does not exist: > hdfs://ns1:8020/tmp/hadoop-yarn/staging/supertool/.staging/job_1374564764970_0001/appTokens > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815) > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176) > at > org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284) > at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:280) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > at LocalTrace: > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: > File does not exist: > hdfs://ns1:8020/tmp/hadoop-yarn/staging/supertool/.staging/job_1374564764970_0001/appTokens > at > org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217) > at > org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:819) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739218#comment-13739218 ] Hudson commented on YARN-1060: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4256 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4256/]) YARN-1060. Two tests in TestFairScheduler are missing @Test annotation (Niranjan Singh via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Fix For: 2.3.0 > > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739216#comment-13739216 ] Sandy Ryza commented on YARN-1060: -- Committed to trunk and branch-2. Thanks Niranjan! > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739213#comment-13739213 ] Sangjin Lee commented on YARN-451: -- I think showing this information on the app list page is actually more valuable than the per-app page. If this information is present in the app list page, one can quickly scan the list and get a sense of which job/app is bigger than others in terms of resource consumption. Also, it makes sorting possible. One could in theory visit individual per-app pages one by one to get the same information, but it's so much more useful to have it ready at the overview page so one can get that information quickly. In hadoop 1.0, one could get the same information by looking at the number of total mappers and reducers. That way, we got a very good idea on which ones are big jobs (and thus need to be monitored more closely) without drilling into any of the apps. > Add more metrics to RM page > --- > > Key: YARN-451 > URL: https://issues.apache.org/jira/browse/YARN-451 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu >Priority: Minor > > ResourceManager webUI shows list of RUNNING applications, but it does not > tell which applications are requesting more resource compared to others. With > cluster running hundreds of applications at once it would be useful to have > some kind of metric to show high-resource usage applications vs low-resource > usage ones. At the minimum showing number of containers is good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739211#comment-13739211 ] Sandy Ryza commented on YARN-1060: -- +1 > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
[ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739178#comment-13739178 ] Rohith Sharma K S commented on YARN-1061: - The actual issue occurred on a 5-node cluster (1 RM and 5 NMs). It is hard to reproduce the scenario where the resourcemanager is in a hung state on a real cluster, but the same scenario can be simulated manually by bringing the resourcemanager into a hung state with the linux command "KILL -STOP ". All NM->RM calls then wait indefinitely. Another case where we can observe an indefinite wait is "Add a new NodeManager while the ResourceManager is in a hung state". > NodeManager is indefinitely waiting for nodeHeartBeat() response from > ResouceManager. > - > > Key: YARN-1061 > URL: https://issues.apache.org/jira/browse/YARN-1061 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha >Reporter: Rohith Sharma K S > > It is observed that in one scenario, the NodeManager waits indefinitely > for the nodeHeartbeat response from the ResourceManager, where the ResourceManager is in a > hung state. > The NodeManager should get a timeout exception instead of waiting indefinitely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
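The fix direction in the report above, getting a timeout exception rather than blocking forever, can be illustrated generically. This is not the NodeManager's actual RPC code; it is only a sketch of bounding a blocking call with java.util.concurrent, with all names hypothetical.
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic illustration: bound a blocking call (think of the heartbeat RPC) so
// the caller gets a TimeoutException instead of waiting forever on a hung peer.
public class BoundedCallSketch {

  static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(call);
      // Throws TimeoutException if the callee never answers within timeoutMs.
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    try {
      callWithTimeout(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          Thread.sleep(Long.MAX_VALUE); // stands in for a heartbeat that hangs
          return null;
        }
      }, 1000);
    } catch (TimeoutException expected) {
      System.out.println("heartbeat timed out as expected");
    }
  }
}
{code}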
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739109#comment-13739109 ] Karthik Kambatla commented on YARN-1055: From a YARN-user POV, I see it differently. I want to control separately whether my app should be recovered on AM failures and on RM failures. I might want to recover on RM restart but not on AM failures, or vice versa: # In case of an AM failure, the user might want to check for user errors and hence not recover, but still recover in case of RM failures. # Like Oozie, one might want to recover on AM failures but not on RM failures. Also, is there a disadvantage to having two knobs for the two failures? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739073#comment-13739073 ] Bikas Saha commented on YARN-1055: -- Restart on am failure is already determined by the default value of max am retries in yarn config. Setting that to 1 will prevent RM from restarting AM's on failure. Thus no need for new config. Restart after RM restart is already covered by setting max am retries to 1 by the app client on app submission. If an app cannot handle this situation it should create its own config and set the correct value of 1 on submission. YARN should not add a config IMO. If I remember right, this config is being imported from hadoop 1 and the impl of this config in hadoop 1 is what RM already does to handle user defined max am retries. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
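As a concrete illustration of the point above about "setting max am retries to 1 by the app client on app submission", here is a hedged sketch of a launcher-style client pinning its own attempt count at submission time. It assumes the 2.1-era client API (YarnClient, ApplicationSubmissionContext.setMaxAppAttempts); the class name and the elided container setup are placeholders, not an endorsed implementation.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Sketch only: a launcher-style client that opts out of AM restarts by
// requesting a single attempt on submission, relying on its own retry logic.
public class SingleAttemptSubmitterSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      ApplicationSubmissionContext appContext =
          yarnClient.createApplication().getApplicationSubmissionContext();
      // One attempt only: if the AM dies or the RM restarts, the application
      // fails outright and the caller decides whether to resubmit.
      appContext.setMaxAppAttempts(1);
      // ... set application name, queue, AM container spec, resources, then:
      // yarnClient.submitApplication(appContext);
    } finally {
      yarnClient.stop();
    }
  }
}
{code}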
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739040#comment-13739040 ] Vinod Kumar Vavilapalli commented on YARN-1055: --- This is entirely a new issue with Hadoop 2 - we've added new failure conditions. Having all apps handle AM restarts is really the right way forward, given AMs can now run on random compute nodes that can fail at any time. Offline, I started engaging some of the Pig/Hive community folks. For MR, enough work is already done. Oozie needs to follow suit too. Till work-preserving restart is finished, this is a real pain on RM restarts, which is why I am proposing that Oozie set max-attempts to 1 for its launcher action so that there are no split-brain issues - RM restart or otherwise. Oozie has a retry mechanism anyway, which will then submit a new application. Adding a separate knob just for restart is a hack I don't see any value in. If I read your proposal correctly, for launcher jobs, you will set restart.am.on.rm.restart to 1 and restart.am.on.am.failure > 1. Right? That is not correct, as I have repeated - node failures will cause the same split-brain issues. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739027#comment-13739027 ] Alejandro Abdelnur commented on YARN-1055: -- [~vinodkv], in theory I agree with you. In practice, there are 2 issues we in Oozie cannot address in the short term: * 1. Oozie is still using a launcher MRAM. * 2. The mr/pig/hive/sqoop/distcp/... fat clients are not aware of YARN restart/recovery. #1 will be addressed when Oozie implements an OozieLauncherAM instead of piggybacking on an MR map as the driver. #2 is more complicated, and I don't see it being addressed in the short/medium term. By having distinct knobs differentiating recovery after an AM failure from recovery after an RM restart, Oozie can handle/recover jobs in the same set of failure scenarios possible with Hadoop 1. In order to get folks onto YARN we need to provide functional parity. I suggest having the 2 knobs Karthik proposed, {{restart.am.on.rm.restart}} and {{restart.am.on.am.failure}}, with {{restart.am.on.rm.restart=$restart.am.on.am.failure}}. Does this sound reasonable? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739013#comment-13739013 ] Bikas Saha commented on YARN-1058: -- It could be that history service was not properly shutdown in the first AM. Earlier, the AM would receive proper reboot command from the RM and would shutdown properly based on the reboot flag being set. Now the AM is getting an exception from the RM and so not shutting down properly. This should get fixed when we refresh the AM RM token from the saved value. > Recovery issues on RM Restart with FileSystemRMStateStore > - > > Key: YARN-1058 > URL: https://issues.apache.org/jira/browse/YARN-1058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > App recovery doesn't work as expected using FileSystemRMStateStore. > Steps to reproduce: > - Ran sleep job with a single map and sleep time of 2 mins > - Restarted RM while the map task is still running > - The first attempt fails with the following error > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > Password not found for ApplicationAttempt > appattempt_1376294441253_0001_01 > at org.apache.hadoop.ipc.Client.call(Client.java:1404) > at org.apache.hadoop.ipc.Client.call(Client.java:1357) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at $Proxy28.finishApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91) > {noformat} > - The second attempt fails with a different error: > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on > /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: > File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have > any open files. > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739011#comment-13739011 ] Zhijie Shen commented on YARN-292: -- Did more investigation on this issue: {code} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 {code} This log indicates that the ArrayIndexOutOfBoundsException happens because the application is not found. There are three possibilities for the application not being found: 1. The application hasn't been added to FifoScheduler#applications. If that were the case, FifoScheduler would not send the APP_ACCEPTED event to the corresponding RMAppAttemptImpl. Without the APP_ACCEPTED event, RMAppAttemptImpl would not enter the SCHEDULED state, and consequently would not go through AMContainerAllocatedTransition to ALLOCATED_SAVING. Therefore, this case is impossible. 2. The application has already been removed from FifoScheduler#applications. To trigger the removal, the corresponding RMAppAttemptImpl needs to go through BaseFinalTransition. It is worth mentioning first that RMAppAttemptImpl's transitions are executed on the AsyncDispatcher thread, while YarnScheduler#handle is invoked on the SchedulerEventDispatcher thread. The two threads execute in parallel, so the processing of an RMAppAttemptEvent and that of a SchedulerEvent may interleave; however, the processing of two RMAppAttemptEvents, or of two SchedulerEvents, will not. Therefore, AMContainerAllocatedTransition cannot start while RMAppAttemptImpl is still in the middle of BaseFinalTransition. Nevertheless, when RMAppAttemptImpl goes through BaseFinalTransition, it also enters a final state, such that AMContainerAllocatedTransition will not happen at all. In conclusion, this case is impossible as well. 3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread-safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized; thus, reads and writes on the same map can happen simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will use FifoScheduler#applications#add|remove. Therefore, getting null even though the application actually exists can happen under a large number of concurrent operations. Please feel free to correct me if you think there's something wrong with or missing from the analysis. I'm going to work on a patch to fix the problem. 
> ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resour
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738985#comment-13738985 ] Sandy Ryza commented on YARN-1024: -- I've been thinking a lot about this, and wanted to propose a modified approach, inspired by an offline discussion with Arun and his max-vcores idea (https://issues.apache.org/jira/browse/YARN-1024?focusedCommentId=13730074&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13730074). First, my assumptions about how CPUs work: * A CPU is essentially a bathtub full of processing power that can be doled out to threads, with a limit per thread based on the power of each core within it. * To give X processing power to a thread means that within a standard unit of time, roughly some number of instructions proportional to X can be executed for that thread. * No more than a certain amount of processing power (the amount of processing power per core) can be given to each thread. * We can use CGroups to say that a task gets some fraction of the system's processing power. * This means that if we have 5 cores with Y processing power each, we can give 5 threads Y processing power each, or 6 threads 5Y/6 processing power each, but we can't give 4 threads 5Y/4 processing power each. * It never makes sense to use CGroups assign a higher fraction of the system's processing power than (numthreads the task can take advantage of / number of cores) to a task. * Equivalently, if my CPU has X processing power per core, it never makes sense to assign more than (numthreads the task can take advantage of) * X processing power to a task. So as long as we account for that last constraint, we can essentially view processing power as a fluid resource like memory. With this in mind, we can: 1. Split virtual cores into cores and yarnComputeUnitsPerCore. Requests can include both and nodes can be configured with both. 2. Have a cluster-defined maxComputeUnitsPerCore, which would be the smallest yarnComputeUnitsPerCore on any node. We min all yarnComputeUnitsPerCore requests with this number when they hit the RM. 3. Use YCUs, not cores, for scheduling. I.e. the scheduler thinks of a node's CPU capacity in terms of the number of YCUs it can handle and thinks of a resource's CPU request in terms of its (normalized yarnComputeUnitsPerCore * # cores). We use YCUs for DRF. 4. If we make YCUs small enough, no need for fractional anything. This reduces to a number-of-cores-based approach if all containers are requested with yarnComputeUnitsPerCore=infinity, and reduces to a YCU approach if maxComputeUnitsPerCore is set to infinity. Predictability, simplicity, and scheduling flexibility can be traded off per cluster without overloading the same concept with multiple definitions. This doesn't take into account heteregeneous hardware within a cluster, but I think (2) can be tweaked to handle this by holding a value for each node (can elaborate on how this would work). It also doesn't take into account pinning threads to CPUs, but I don't think it's any less extensible for ultimately dealing with this than other proposals. Sorry for the longwindedness. Bobby, would this provide the flexibility you're looking for? 
> Define a virtual core unambigiously > --- > > Key: YARN-1024 > URL: https://issues.apache.org/jira/browse/YARN-1024 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > We need to clearly define the meaning of a virtual core unambiguously so that > it's easy to migrate applications between clusters. > For e.g. here is Amazon EC2 definition of ECU: > http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it > Essentially we need to clearly define a YARN Virtual Core (YVC). > Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the > equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.* -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
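To make the normalization in the comment above concrete, here is a toy sketch. The names (yarnComputeUnitsPerCore, maxComputeUnitsPerCore) follow the proposal, but the class, methods, and numbers are hypothetical; nothing here reflects an existing YARN API.
{code:java}
// Toy sketch of the proposed YCU normalization; not an existing YARN API.
public class YcuNormalizationSketch {

  /**
   * Clamp a request's per-core compute units to the cluster-wide cap
   * (the smallest per-core rating of any node): giving a single thread more
   * processing power than one core can deliver never helps it.
   */
  static int normalizedYcusPerCore(int requestedYcusPerCore,
                                   int maxComputeUnitsPerCore) {
    return Math.min(requestedYcusPerCore, maxComputeUnitsPerCore);
  }

  /** The total CPU ask the scheduler (and DRF) would work with, in YCUs. */
  static int schedulableYcus(int cores, int requestedYcusPerCore,
                             int maxComputeUnitsPerCore) {
    return cores
        * normalizedYcusPerCore(requestedYcusPerCore, maxComputeUnitsPerCore);
  }

  public static void main(String[] args) {
    // Example: a task that can use 4 threads asks for 300 YCUs per core on a
    // cluster whose weakest core is rated 250 YCUs; it is scheduled as
    // 4 * min(300, 250) = 1000 YCUs.
    System.out.println(schedulableYcus(4, 300, 250));
  }
}
{code}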
[jira] [Commented] (YARN-337) RM handles killed application tracking URL poorly
[ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738968#comment-13738968 ] Thomas Graves commented on YARN-337: +1 looks good. Thanks Jason! Feel free to commit it. > RM handles killed application tracking URL poorly > - > > Key: YARN-337 > URL: https://issues.apache.org/jira/browse/YARN-337 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: usability > Attachments: YARN-337.patch > > > When the ResourceManager kills an application, it leaves the proxy URL > redirecting to the original tracking URL for the application even though the > ApplicationMaster is no longer there to service it. It should redirect it > somewhere more useful, like the RM's web page for the application, where the > user can find that the application was killed and links to the AM logs. > In addition, sometimes the AM during teardown from the kill can attempt to > unregister and provide an updated tracking URL, but unfortunately the RM has > "forgotten" the AM due to the kill and refuses to process the unregistration. > Instead it logs: > {noformat} > 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: > AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_01 > {noformat} > It should go ahead and process the unregistration to update the tracking URL > since the application offered it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1058) Recovery issues on RM Restart with FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738934#comment-13738934 ] Karthik Kambatla commented on YARN-1058: I was expecting the first one, and Bikas is right about the second one. When I kil the job client, the job does finish successfully. However, the AM for the recovered attempt fails to write the history. {noformat} 2013-08-13 13:57:32,440 ERROR [eventHandlingThread] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[eventHandlingThread,5,main] threw an Exception. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hadoop-yarn/staging/kasha/.staging/job_1376427059607_0002/job_1376427059607_0002_2.jhist: File does not exist. Holder DFSClient_NONMAPREDUCE_416024880_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543) ... at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2037) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:276) at java.lang.Thread.run(Thread.java:662) {noformat} > Recovery issues on RM Restart with FileSystemRMStateStore > - > > Key: YARN-1058 > URL: https://issues.apache.org/jira/browse/YARN-1058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > App recovery doesn't work as expected using FileSystemRMStateStore. > Steps to reproduce: > - Ran sleep job with a single map and sleep time of 2 mins > - Restarted RM while the map task is still running > - The first attempt fails with the following error > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > Password not found for ApplicationAttempt > appattempt_1376294441253_0001_01 > at org.apache.hadoop.ipc.Client.call(Client.java:1404) > at org.apache.hadoop.ipc.Client.call(Client.java:1357) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at $Proxy28.finishApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91) > {noformat} > - The second attempt fails with a different error: > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on > /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: > File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have > any open files. 
> at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-573: Fix Version/s: 0.23.10 +1 lgtm as well, thanks Mit and Omkar! I committed this to branch-0.23. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Fix For: 3.0.0, 0.23.10, 2.1.1-beta > > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch, > YARN-573.branch-0.23-08132013.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
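For readers following the YARN-573 description, here is a minimal sketch of the kind of synchronization the fix is about, assuming a shared pending collection touched by both the event-handling path (addResource) and the localizer thread (run()/findNextResource). This is a stand-in, not the actual PublicLocalizer/PrivateLocalizer code.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Stand-in for the localizers' shared "pending" list; the point is that every
// access, including iteration with removal, happens under the same lock.
final class PendingResourcesSketch<T> {
  private final List<T> pending =
      Collections.synchronizedList(new ArrayList<T>());

  /** Event-handling path (addResource). */
  void add(T resource) {
    pending.add(resource);
  }

  /** Localizer thread (run()/findNextResource): take the next pending item. */
  T takeNext() {
    // A synchronizedList protects single calls, but iterating and removing is
    // a compound action, so it still needs an explicit lock on the list.
    synchronized (pending) {
      Iterator<T> it = pending.iterator();
      if (!it.hasNext()) {
        return null;
      }
      T next = it.next();
      it.remove();
      return next;
    }
  }
}
{code}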
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738851#comment-13738851 ] Hadoop QA commented on YARN-1060: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597684/YARN-1060.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1709//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1709//console This message is automatically generated. > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738845#comment-13738845 ] Omkar Vinit Joshi commented on YARN-573: +1 ..lgtm for branch 0.23 > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Fix For: 3.0.0, 2.1.1-beta > > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch, > YARN-573.branch-0.23-08132013.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-573: --- Attachment: YARN-573.branch-0.23-08132013.patch Patch ported to Branch-23 > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Fix For: 3.0.0, 2.1.1-beta > > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch, > YARN-573.branch-0.23-08132013.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738754#comment-13738754 ] Jason Lowe commented on YARN-1036: -- +1 lgtm as well. Committing this. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-1036.branch-0.23.patch, > YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738686#comment-13738686 ] Hadoop QA commented on YARN-1056: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597782/yarn-1056-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1708//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1708//console This message is automatically generated. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738651#comment-13738651 ] Jian He commented on YARN-1056: --- Looks good, +1 > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-435) Make it easier to access cluster topology information in an AM
[ https://issues.apache.org/jira/browse/YARN-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738647#comment-13738647 ] shenhong commented on YARN-435: --- Firstly, if the AM gets all the nodes in the cluster, including their rack information, by calling the RM, this will increase pressure on the RM's network; for example, our cluster has more than 5000 datanodes. Secondly, if the YARN cluster only has 100 nodemanagers but the HDFS cluster it accesses has more than 5000 datanodes, the RM cannot return all of those nodes with their rack information. However, the AM needs the rack information for every datanode listed in its job.splitmetainfo file in order to init each TaskAttempt. In this case, we can't get all the nodes by calling the RM. > Make it easier to access cluster topology information in an AM > -- > > Key: YARN-435 > URL: https://issues.apache.org/jira/browse/YARN-435 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Omkar Vinit Joshi > > ClientRMProtocol exposes a getClusterNodes api that provides a report on all > nodes in the cluster including their rack information. > However, this requires the AM to open and establish a separate connection to > the RM in addition to one for the AMRMProtocol. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1056: --- Attachment: yarn-1056-2.patch Updated fs config to be fs.state-store.uri instead of fs.rm-state-store.uri > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
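For readers following the renames above, the snippet below shows how such keys are read through the standard {{Configuration}} API. The key strings and default values here are assumptions inferred from the JIRA summary (a single {{resourcemanager}} component, consistent naming); the committed patch and yarn-default.xml are the authority on the real names.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmConnectConfSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Hypothetical key names following the JIRA's intent of having
    // "resourcemanager" appear only once; check yarn-default.xml for
    // the names the patch actually introduces.
    long maxWaitSecs = conf.getLong(
        "yarn.resourcemanager.connect.max.wait.secs", 900);
    long retryIntervalSecs = conf.getLong(
        "yarn.resourcemanager.connect.retry_interval.secs", 30);
    System.out.println("max wait = " + maxWaitSecs
        + "s, retry interval = " + retryIntervalSecs + "s");
  }
}
{code}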
[jira] [Commented] (YARN-1030) Adding AHS as service of RM
[ https://issues.apache.org/jira/browse/YARN-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738622#comment-13738622 ] Hadoop QA commented on YARN-1030: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597781/YARN-1030.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1707//console This message is automatically generated. > Adding AHS as service of RM > --- > > Key: YARN-1030 > URL: https://issues.apache.org/jira/browse/YARN-1030 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-1030.1.patch, YARN-1030.2.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1030) Adding AHS as service of RM
[ https://issues.apache.org/jira/browse/YARN-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1030: -- Attachment: YARN-1030.2.patch Thanks [~devaraj.k] for your review. I've updated the patch according to your comments. If YARN-953 is committed first, I'll remove the pom.xml change from this patch; for now I'm keeping it so as not to break the build. > Adding AHS as service of RM > --- > > Key: YARN-1030 > URL: https://issues.apache.org/jira/browse/YARN-1030 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-1030.1.patch, YARN-1030.2.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1062) MRAppMaster take a long time to init taskAttempt
[ https://issues.apache.org/jira/browse/YARN-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong resolved YARN-1062. Resolution: Duplicate > MRAppMaster take a long time to init taskAttempt > > > Key: YARN-1062 > URL: https://issues.apache.org/jira/browse/YARN-1062 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 0.23.6 >Reporter: shenhong > > In our cluster, MRAppMaster take a long time to init taskAttempt, the > following log last one minute, > 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to > /r03b05 > 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > 2013-07-17 11:28:06,415 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b02006.yh.aliyun.com to > /r03b02 > 2013-07-17 11:28:06,436 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02045.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02034.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_01_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > The reason is: resolved one host to rack almost take 25ms (We resolve the > host to rack by a python script). Our hdfs cluster is more than 4000 > datanodes, then a large input job will take a long time to init TaskAttempt. > Is there any good idea to solve this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
[ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738601#comment-13738601 ] Omkar Vinit Joshi commented on YARN-1061: - Are you able to reproduce this scenario? Can you please enable DEBUG logs (HADOOP_ROOT_LOGGER & YARN_ROOT_LOGGER) and attach them to this JIRA? How big is your cluster? What is the frequency at which the nodemanagers are heartbeating? Can you also attach yarn-site.xml? Which version are you using? > NodeManager is indefinitely waiting for nodeHeartBeat() response from > ResouceManager. > - > > Key: YARN-1061 > URL: https://issues.apache.org/jira/browse/YARN-1061 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha >Reporter: Rohith Sharma K S > > It is observed that in one of the scenario, NodeManger is indefinetly waiting > for nodeHeartbeat response from ResouceManger where ResouceManger is in > hanged up state. > NodeManager should get timeout exception instead of waiting indefinetly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1062) MRAppMaster take a long time to init taskAttempt
[ https://issues.apache.org/jira/browse/YARN-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738596#comment-13738596 ] shenhong commented on YARN-1062: Thanks Vinod Kumar Vavilapalli, I think YARN-435 is okay to me. > MRAppMaster take a long time to init taskAttempt > > > Key: YARN-1062 > URL: https://issues.apache.org/jira/browse/YARN-1062 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 0.23.6 >Reporter: shenhong > > In our cluster, MRAppMaster take a long time to init taskAttempt, the > following log last one minute, > 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to > /r03b05 > 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > 2013-07-17 11:28:06,415 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b02006.yh.aliyun.com to > /r03b02 > 2013-07-17 11:28:06,436 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02045.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02034.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_01_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > The reason is: resolved one host to rack almost take 25ms (We resolve the > host to rack by a python script). Our hdfs cluster is more than 4000 > datanodes, then a large input job will take a long time to init TaskAttempt. > Is there any good idea to solve this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738595#comment-13738595 ] Karthik Kambatla commented on YARN-1056: [~jianhe], good point. Let me upload a patch including that change shortly. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738582#comment-13738582 ] Hadoop QA commented on YARN-1056: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597773/yarn-1056-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1706//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1706//console This message is automatically generated. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738572#comment-13738572 ] Jian He commented on YARN-1056: --- Hi [~kkambatl], do you think it's also necessary to change 'yarn.resourcemanager.fs.rm-state-store.uri' to 'yarn.resourcemanager.fs.state-store.uri' for consistency with 'zk.state-store' ? > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738570#comment-13738570 ] Hadoop QA commented on YARN-1021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597774/YARN-1021.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1705//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1705//console This message is automatically generated. > Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations are also made to improve scheduler performance for > different scenarios and workload. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm very well before we deploy it in a production > cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time and cost consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads in a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with reasonable amount of confidence, there-by aiding rapid innovation. > The simulator will exercise the real Yarn ResourceManager removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AMs heartbeat events from within the same JVM. > To keep tracking of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. 
> The simulator will produce real time metrics while executing, including: > * Resource usages for whole cluster and each queue, which can be utilized to > configure cluster and queue's capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs turn around time, throughput, fairness, capacity guarantee, > etc). > * Several key metrics of scheduler algorithm, such as time cost of each > scheduler operation (allocate, handle, etc), which can be utilized by Hadoop > developers to find the code spots and scalability limits. > The simulator will provide real time charts showing the behavior of the > scheduler and its performance. > A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738556#comment-13738556 ] Wei Yan commented on YARN-1021: --- Updates of the patch: Reduce the number of threads needed for NMSimulators. Before, each NMSimulator uses one thread (for its AsyncDispatcher). Currently removed AsyncDispatcher and the total number of threads needed only depends on the thread pool size. > Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations are also made to improve scheduler performance for > different scenarios and workload. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm very well before we deploy it in a production > cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time and cost consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads in a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with reasonable amount of confidence, there-by aiding rapid innovation. > The simulator will exercise the real Yarn ResourceManager removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AMs heartbeat events from within the same JVM. > To keep tracking of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. > The simulator will produce real time metrics while executing, including: > * Resource usages for whole cluster and each queue, which can be utilized to > configure cluster and queue's capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs turn around time, throughput, fairness, capacity guarantee, > etc). > * Several key metrics of scheduler algorithm, such as time cost of each > scheduler operation (allocate, handle, etc), which can be utilized by Hadoop > developers to find the code spots and scalability limits. > The simulator will provide real time charts showing the behavior of the > scheduler and its performance. > A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
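Wei Yan's update above replaces the per-NMSimulator AsyncDispatcher thread with a shared pool. The snippet below is not the YARN-1021 code; it is a small, hypothetical illustration, using only {{java.util.concurrent}}, of the general pattern: many simulated node heartbeats scheduled onto one fixed-size pool, so the thread count depends on the pool size rather than on the number of simulated nodes.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SharedHeartbeatPoolSketch {
  public static void main(String[] args) {
    final int numSimulatedNodes = 1000; // hypothetical
    final int poolSize = 10;            // total threads, independent of node count

    ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);

    for (int i = 0; i < numSimulatedNodes; i++) {
      final int nodeId = i;
      // Every simulated NM heartbeats on the shared pool; no per-node thread.
      pool.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
          System.out.println("heartbeat from simulated NM " + nodeId);
        }
      }, 0, 1, TimeUnit.SECONDS);
    }
  }
}
{code}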
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738560#comment-13738560 ] Arun C Murthy commented on YARN-1056: - Looks fine, I'll commit after jenkins. Thanks [~kkambatl]. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738541#comment-13738541 ] Omkar Vinit Joshi commented on YARN-1036: - +1 ... thanks for updating the patch..lgtm. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-1036.branch-0.23.patch, > YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: YARN-1021.patch A new patch updates some code in the NMSimulator. > Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations are also made to improve scheduler performance for > different scenarios and workload. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm very well before we deploy it in a production > cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time and cost consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads in a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with reasonable amount of confidence, there-by aiding rapid innovation. > The simulator will exercise the real Yarn ResourceManager removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AMs heartbeat events from within the same JVM. > To keep tracking of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. > The simulator will produce real time metrics while executing, including: > * Resource usages for whole cluster and each queue, which can be utilized to > configure cluster and queue's capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs turn around time, throughput, fairness, capacity guarantee, > etc). > * Several key metrics of scheduler algorithm, such as time cost of each > scheduler operation (allocate, handle, etc), which can be utilized by Hadoop > developers to find the code spots and scalability limits. > The simulator will provide real time charts showing the behavior of the > scheduler and its performance. > A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1056: --- Target Version/s: 2.1.0-beta (was: 2.1.1-beta) > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1056: --- Attachment: yarn-1056-1.patch Reuploading patch to kick Jenkins. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1056: --- Priority: Blocker (was: Major) > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: conf > Attachments: yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-979) [YARN-321] Adding application attempt and container to ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738519#comment-13738519 ] Zhijie Shen commented on YARN-979: -- There are some high-level comments on the patch: 1. To make the protocol work, the corresponding protos need to be defined in yarn_service.proto and application_history_service.proto needs to be updated. 2. The setters of the request/response APIs should be @Public, shouldn't they? 3. ApplicationHistoryProtocol needs to be marked as well. > [YARN-321] Adding application attempt and container to > ApplicationHistoryProtocol > - > > Key: YARN-979 > URL: https://issues.apache.org/jira/browse/YARN-979 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: YARN-979-1.patch > > > Adding application attempt and container to ApplicationHistoryProtocol > Thanks, > Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
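On point 2 above, the fragment below is a hedged sketch, with a placeholder class name rather than the actual YARN-979 records, of how Hadoop's interface-audience annotations are typically applied so that the setter carries {{@Public}} alongside the getter instead of being left {{@Private}}.
{code:java}
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

// Placeholder request class, for illustration only.
@Public
@Unstable
public abstract class GetApplicationAttemptReportRequestSketch {

  @Public
  @Unstable
  public abstract ApplicationAttemptId getApplicationAttemptId();

  // Per the review comment: the setter is annotated @Public as well.
  @Public
  @Unstable
  public abstract void setApplicationAttemptId(ApplicationAttemptId attemptId);
}
{code}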
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738484#comment-13738484 ] Hadoop QA commented on YARN-1036: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597762/YARN-1036.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1704//console This message is automatically generated. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-1036.branch-0.23.patch, > YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1036: --- Attachment: YARN-1036.branch-0.23.patch Thanks Jason and Omkar for your comments. Ok. Here is the updated patch which has src/main code exactly like Omkar suggested. I've tested it by using a pendrive to simulate drive failure, and the file is indeed localized again. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-1036.branch-0.23.patch, > YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1023: -- Summary: [YARN-321] Webservices REST API's support for Application History (was: [YARN-321] Weservices REST API's support for Application History) > [YARN-321] Webservices REST API's support for Application History > - > > Key: YARN-1023 > URL: https://issues.apache.org/jira/browse/YARN-1023 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-321 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-1023-v0.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1062) MRAppMaster take a long time to init taskAttempt
[ https://issues.apache.org/jira/browse/YARN-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738417#comment-13738417 ] Vinod Kumar Vavilapalli commented on YARN-1062: --- You should definitely see if you can improve your Python script by looking up a static resolution file instead of dynamically pinging DNS at run time. That'll clearly improve your performance. Overall, we wish to expose this information to AMs from the RM itself so that each AM doesn't need to do this on its own; that is tracked via YARN-435. If that's okay with you, please close this as a duplicate. > MRAppMaster take a long time to init taskAttempt > > > Key: YARN-1062 > URL: https://issues.apache.org/jira/browse/YARN-1062 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 0.23.6 >Reporter: shenhong > > In our cluster, MRAppMaster take a long time to init taskAttempt, the > following log last one minute, > 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to > /r03b05 > 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > 2013-07-17 11:28:06,415 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b02006.yh.aliyun.com to > /r03b02 > 2013-07-17 11:28:06,436 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02045.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02034.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_01_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > The reason is: resolved one host to rack almost take 25ms (We resolve the > host to rack by a python script). Our hdfs cluster is more than 4000 > datanodes, then a large input job will take a long time to init TaskAttempt. > Is there any good idea to solve this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
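To make the static-resolution suggestion above concrete, the sketch below caches host-to-rack lookups in front of YARN's {{RackResolver}} (the utility that appears in the quoted log) so each distinct host is resolved through the configured topology script at most once per JVM. The wrapper class and its names are hypothetical, and depending on which {{DNSToSwitchMapping}} implementation is configured, Hadoop may already cache results, so treat this as an illustration of the idea rather than the fix.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.yarn.util.RackResolver;

// Hypothetical caching wrapper; not part of Hadoop itself.
public class CachedRackResolverSketch {
  private final Configuration conf;
  private final Map<String, String> hostToRack =
      new ConcurrentHashMap<String, String>();

  public CachedRackResolverSketch(Configuration conf) {
    this.conf = conf;
  }

  public String resolveRack(String host) {
    String rack = hostToRack.get(host);
    if (rack == null) {
      // First lookup for this host goes through the configured topology
      // script/mapping; later lookups are served from the in-memory cache.
      Node node = RackResolver.resolve(conf, host);
      rack = node.getNetworkLocation();
      hostToRack.put(host, rack);
    }
    return rack;
  }
}
{code}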
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738413#comment-13738413 ] Vinod Kumar Vavilapalli commented on YARN-1056: --- Config changes ARE API changes. If you wish to rename it right now, mark this as a blocker and let the release manager know. Otherwise, you should deprecate this config, add a new one, and wait for the next release. I'm okay either way. > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: conf > Attachments: yarn-1056-1.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1062) MRAppMaster take a long time to init taskAttempt
[ https://issues.apache.org/jira/browse/YARN-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1062: --- Description: In our cluster, MRAppMaster take a long time to init taskAttempt, the following log last one minute, 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to /r03b05 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to UNASSIGNED 2013-07-17 11:28:06,415 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r03b02006.yh.aliyun.com to /r03b02 2013-07-17 11:28:06,436 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02045.yh.aliyun.com to /r02f02 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02034.yh.aliyun.com to /r02f02 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373523419753_4543_m_01_0 TaskAttempt Transitioned from NEW to UNASSIGNED The reason is: resolved one host to rack almost take 25ms (We resolve the host to rack by a python script). Our hdfs cluster is more than 4000 datanodes, then a large input job will take a long time to init TaskAttempt. Is there any good idea to solve this problem. was: In our cluster, MRAppMaster take a long time to init taskAttempt, the following log last one minute, 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to /r03b05 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to UNASSIGNED The reason is: resolved one host to rack almost take 25ms, our hdfs cluster is more than 4000 datanodes, then a large input job will take a long time to init TaskAttempt. Is there any good idea to solve this problem. 
> MRAppMaster take a long time to init taskAttempt > > > Key: YARN-1062 > URL: https://issues.apache.org/jira/browse/YARN-1062 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 0.23.6 >Reporter: shenhong > > In our cluster, MRAppMaster take a long time to init taskAttempt, the > following log last one minute, > 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to > /r01f11 > 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to > /r03b05 > 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > 2013-07-17 11:28:06,415 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r03b02006.yh.aliyun.com to > /r03b02 > 2013-07-17 11:28:06,436 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02045.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.yarn.util.RackResolver: Resolved r02f02034.yh.aliyun.com to > /r02f02 > 2013-07-17 11:28:06,457 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373523419753_4543_m_01_0 TaskAttempt Transitioned from NEW to > UNASSIGNED > The reason is: resolved one host to rack almost take 25ms (We resolve the > host to rack by a python script). Our hdfs cluster is more than 4000 > datanodes, then a large input job will take a long time to init TaskAttempt. > Is there any good idea to solve this
[jira] [Created] (YARN-1062) MRAppMaster take a long time to init taskAttempt
shenhong created YARN-1062: -- Summary: MRAppMaster take a long time to init taskAttempt Key: YARN-1062 URL: https://issues.apache.org/jira/browse/YARN-1062 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 0.23.6 Reporter: shenhong In our cluster, MRAppMaster take a long time to init taskAttempt, the following log last one minute, 2013-07-17 11:28:06,328 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11012.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,357 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r01f11004.yh.aliyun.com to /r01f11 2013-07-17 11:28:06,383 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved r03b05042.yh.aliyun.com to /r03b05 2013-07-17 11:28:06,384 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373523419753_4543_m_00_0 TaskAttempt Transitioned from NEW to UNASSIGNED The reason is: resolved one host to rack almost take 25ms, our hdfs cluster is more than 4000 datanodes, then a large input job will take a long time to init TaskAttempt. Is there any good idea to solve this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738337#comment-13738337 ] Jason Lowe commented on YARN-1036: -- Agree with Ravi that we should focus on porting the change to 0.23 and fix any issues that also apply to trunk/branch-2 in a separate JIRA. Therefore I agree with Omkar that we should simply break or omit the LOCALIZED case from the switch statement since 0.23 doesn't have localCacheDirectoryManager to match the trunk behavior. Otherwise patch looks good to me. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738175#comment-13738175 ] Alejandro Abdelnur commented on YARN-1008: -- What the patch does: it introduces a new configuration property in the RM, {{RM_SCHEDULER_USE_PORT_FOR_NODE_NAME}} (defaulting to {{false}}). This is an RM property, but it is used by the scheduler implementations when matching a resource request to a node. If the property is set to {{false}}, things work as they do today: the matching is done using only the HOSTNAME obtained from the NodeId. If the property is set to {{true}}, the matching is done using the HOSTNAME:PORT obtained from the NodeId. There are no changes on the NM or AM side. If the property is set to {{true}}, the AM must be aware of that setting and, when creating resource requests, must set the location to the HOSTNAME:PORT of the node instead of just the HOSTNAME. The renaming of {{SchedulerNode#getHostName()}} to {{SchedulerNode#getNodeName()}} makes it obvious to developers that the value may not be a HOSTNAME; javadocs were added explaining this clearly. This works with all 3 schedulers. The main motivation for this change is to be able to use the YARN minicluster with multiple NMs and target a specific NM instance. We could expose this for production use if there is a need. For that we would need to: * Expose the node matching mode (HOSTNAME or HOSTNAME:PORT) via ApplicationReport * Provide a mechanism for AMs only aware of the HOSTNAME matching mode to work with the HOSTNAME:PORT mode If there is a use case for this in a real deployment, we should follow it up with another JIRA. > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
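As a usage illustration of the motivation described above, the sketch below starts a MiniYARNCluster with several NodeManagers and turns the new behaviour on. The property key string is a guess derived from the constant named in the comment ({{RM_SCHEDULER_USE_PORT_FOR_NODE_NAME}}); the real key and constant are whatever the patch defines, so this shows only the intended shape of the usage.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterPerNodeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Hypothetical key corresponding to RM_SCHEDULER_USE_PORT_FOR_NODE_NAME;
    // consult the committed patch for the actual property name.
    conf.setBoolean("yarn.scheduler.include-port-in-node-name", true);

    // Four NodeManagers in one JVM; with the property on, the scheduler keys
    // them by HOSTNAME:PORT, so a test AM can target an individual NM.
    MiniYARNCluster cluster = new MiniYARNCluster("per-node-test", 4, 1, 1);
    cluster.init(conf);
    cluster.start();
    // ... submit an app whose resource requests name HOSTNAME:PORT ...
    cluster.stop();
  }
}
{code}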
[jira] [Commented] (YARN-1059) IllegalArgumentException while starting YARN
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738002#comment-13738002 ] rvller commented on YARN-1059: -- It was my fault, I've changed another parameter to one line in the XML, so that's why RM was not able to start. When I changed all of the parameters to one line (10.245.1.30:9030 and etc) RM wa able to start. I suppose that this is a bug, because it's confusing. > IllegalArgumentException while starting YARN > > > Key: YARN-1059 > URL: https://issues.apache.org/jira/browse/YARN-1059 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: Ubuntu 12.04, hadoop 2.0.5 >Reporter: rvller > > Here is the traceback while starting the yarn resourse manager: > 2013-08-12 12:53:29,319 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: > 10.245.1.30:9030 > (configuration property 'yarn.resourcemanager.resource-tracker.address') > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193) > at > org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105) > at > org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710) > And here is the yarn-site.xml: > > > > yarn.resourcemanager.address > > > 10.245.1.30:9010 > > > > > > > yarn.resourcemanager.scheduler.address > > > 10.245.1.30:9020 > > > > > > > yarn.resourcemanager.resource-tracker.address > > > 10.245.1.30:9030 > > > > > > > yarn.resourcemanager.admin.address > > > 10.245.1.30:9040 > > > > > > > yarn.resourcemanager.webapp.address > > > 10.245.1.30:9050 > > > > > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResourceManager.
[ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737990#comment-13737990 ] Rohith Sharma K S commented on YARN-1061: - The thread dump extracted from the NodeManager is: {noformat} "Node Status Updater" prio=10 tid=0x414dc000 nid=0x1d754 in Object.wait() [0x7fefa2dec000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.ipc.Client.call(Client.java:1231) - locked <0xdef4f158> (a org.apache.hadoop.ipc.Client$Call) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy28.nodeHeartbeat(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:70) at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy30.nodeHeartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:348) {noformat} > NodeManager is indefinitely waiting for nodeHeartBeat() response from > ResourceManager. > - > > Key: YARN-1061 > URL: https://issues.apache.org/jira/browse/YARN-1061 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha >Reporter: Rohith Sharma K S > > It is observed that in one scenario the NodeManager waits indefinitely for the > nodeHeartbeat response from the ResourceManager when the ResourceManager is in a > hung state. > The NodeManager should get a timeout exception instead of waiting indefinitely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
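Purely as an illustration of the behavior being requested (this is not the actual NodeStatusUpdater fix, and all names below are hypothetical), a blocking call such as nodeHeartbeat() can be bounded with a timeout so the caller detects a hung peer instead of waiting forever:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedHeartbeatSketch {
  // Hypothetical stand-in for the blocking nodeHeartbeat() RPC.
  static String nodeHeartbeat() throws InterruptedException {
    Thread.sleep(Long.MAX_VALUE); // simulates an RM that never answers
    return "heartbeat-response";
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<String> pending = executor.submit(BoundedHeartbeatSketch::nodeHeartbeat);
    try {
      // Bound the wait so a hung RM surfaces as a TimeoutException.
      System.out.println("heartbeat ok: " + pending.get(5, TimeUnit.SECONDS));
    } catch (TimeoutException e) {
      pending.cancel(true); // interrupt the stuck call
      System.err.println("nodeHeartbeat timed out; RM may be hung");
    } finally {
      executor.shutdownNow();
    }
  }
}
{code}
In the real NM the equivalent bound would come from the RPC layer (an IPC client timeout or a bounded retry policy) rather than a wrapper thread; the sketch only shows the desired failure mode.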
[jira] [Created] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResourceManager.
Rohith Sharma K S created YARN-1061: --- Summary: NodeManager is indefinitely waiting for nodeHeartBeat() response from ResourceManager. Key: YARN-1061 URL: https://issues.apache.org/jira/browse/YARN-1061 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.5-alpha Reporter: Rohith Sharma K S It is observed that in one scenario the NodeManager waits indefinitely for the nodeHeartbeat response from the ResourceManager when the ResourceManager is in a hung state. The NodeManager should get a timeout exception instead of waiting indefinitely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niranjan Singh updated YARN-1060: - Attachment: YARN-1060.patch Added @Test annotations > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
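As a quick illustration of why the missing annotations matter (a hypothetical class, not the TestFairScheduler code): JUnit 4 only runs methods carrying {{@Test}}, so an annotation-less test method is silently skipped and any assertion failures inside it go unnoticed.
{code:java}
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AnnotationExampleTest {
  // Without @Test, JUnit 4 never runs this method, so the failing
  // assertion inside it is silently ignored.
  public void testSilentlySkipped() {
    assertEquals(1, 2);
  }

  // With @Test, the runner discovers and executes the method.
  @Test
  public void testActuallyRuns() {
    assertEquals(2, 1 + 1);
  }
}
{code}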
[jira] [Assigned] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niranjan Singh reassigned YARN-1060: Assignee: Niranjan Singh > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira