[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713322#comment-13713322 ] Maysam Yabandeh commented on YARN-713: -- [~ojoshi], but the patch has been ready since June 14! Anyway, feel free to take over. > ResourceManager can exit unexpectedly if DNS is unavailable > --- > > Key: YARN-713 > URL: https://issues.apache.org/jira/browse/YARN-713 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe >Assignee: Maysam Yabandeh >Priority: Critical > Fix For: 2.1.0-beta > > Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, > YARN-713.patch > > > As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could > lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and > that ultimately would cause the RM to exit. The RM should not exit during > DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
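The quoted failure is an unchecked exception from a DNS lookup escaping on the dispatcher thread. As a hedged illustration only (not the attached patch; the helper name is hypothetical), the kind of guard involved looks like this:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SafeResolver {
  /**
   * Resolve a hostname, returning null on a DNS failure so the caller can
   * retry on a later event instead of letting the exception propagate and
   * kill the AsyncDispatcher thread (which takes the RM down with it).
   */
  public static InetAddress tryResolve(String host) {
    try {
      return InetAddress.getByName(host);
    } catch (UnknownHostException e) {
      // Transient DNS hiccup: report null and let the caller retry.
      return null;
    }
  }
}
{code}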
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713315#comment-13713315 ] Hadoop QA commented on YARN-875: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593101/YARN-875.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1529//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1529//console This message is automatically generated. > Application can hang if AMRMClientAsync callback thread has exception > - > > Key: YARN-875 > URL: https://issues.apache.org/jira/browse/YARN-875 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, > YARN-875.3.patch > > > Currently that thread will die and then never callback. App can hang. > Possible solution could be to catch Throwable in the callback and then call > client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713311#comment-13713311 ] Hadoop QA commented on YARN-853: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593117/YARN-853-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1528//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1528//console This message is automatically generated. > maximum-am-resource-percent doesn't work after refreshQueues command > > > Key: YARN-853 > URL: https://issues.apache.org/jira/browse/YARN-853 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, > YARN-853.patch > > > If we update yarn.scheduler.capacity.maximum-am-resource-percent / > yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent > configuration and then do the refreshNodes, it uses the new config value to > calculate Max Active Applications and Max Active Application Per User. If we > add a new node after issuing 'rmadmin -refreshQueues' command, it uses the old > maximum-am-resource-percent config value to calculate Max Active Applications > and Max Active Application Per User. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
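The description above boils down to a stale cached value: the AM limit should be derived from the current configuration whenever the cluster resource changes, not from the value read at startup. A self-contained sketch of that idea (the class, formula, and default are illustrative stand-ins, not CapacityScheduler internals):
{code}
import java.util.Properties;

public class AmLimitCalculator {
  private final Properties conf;

  public AmLimitCalculator(Properties conf) { this.conf = conf; }

  /**
   * Recompute the limit from the *current* config on every cluster-resource
   * change (e.g. a node joins), so a percent updated via refreshQueues is
   * honored for new nodes as well.
   */
  public int maxActiveApplications(long clusterMemoryMb, long minAllocationMb) {
    float percent = Float.parseFloat(conf.getProperty(
        "yarn.scheduler.capacity.maximum-am-resource-percent", "0.1"));
    return Math.max(1, (int) Math.ceil(percent * clusterMemoryMb / minAllocationMb));
  }
}
{code}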
[jira] [Updated] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-853: --- Attachment: YARN-853-3.patch > maximum-am-resource-percent doesn't work after refreshQueues command > > > Key: YARN-853 > URL: https://issues.apache.org/jira/browse/YARN-853 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, > YARN-853.patch > > > If we update yarn.scheduler.capacity.maximum-am-resource-percent / > yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent > configuration and then do the refreshNodes, it uses the new config value to > calculate Max Active Applications and Max Active Application Per User. If we > add a new node after issuing 'rmadmin -refreshQueues' command, it uses the old > maximum-am-resource-percent config value to calculate Max Active Applications > and Max Active Application Per User. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster
[ https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713285#comment-13713285 ] Omkar Vinit Joshi commented on YARN-685: [~raviprak] can you please tell me what is and ?? I think first one is nodes..what is second? also what do you mean here? {code} For 23, Reduce: 2 1 32 2 1 4 {code} > Capacity Scheduler is not distributing the reducers tasks across the cluster > > > Key: YARN-685 > URL: https://issues.apache.org/jira/browse/YARN-685 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.0.4-alpha >Reporter: Devaraj K > > If we have reducers whose total memory required to complete is less than the > total cluster memory, it is not assigning the reducers to all the nodes > uniformly(~uniformly). Also at that time there are no other jobs or job tasks > running in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713282#comment-13713282 ] Omkar Vinit Joshi commented on YARN-245: ignore thread comment ...realized it is L not 1 :D > Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, > YARN-245-trunk-3.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713280#comment-13713280 ] Omkar Vinit Joshi commented on YARN-245: Thanks [~mayank_bansal] for the patch.. I agree that checking heartbeat ids will test this issue... few comments.. {code} + conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true); {code} why are we doing this? {code} + NodeStatus nodeStatus = request.getNodeStatus(); + nodeStatus.setResponseId(heartBeatID++); {code} required? can be removed? * There is one issue at present with NodeStatusUpdaterImpl.java ...imagine if we get such a heartbeat then we will not wait but try again.. check finally code {} which won't get executed. and will keep pinging RM until we get correct response with response-id. Should we wait or immediately request? thoughts? {code} +Thread.sleep(1000l); {code} can we make it 1000? .. * test will need timeout. however I see there are certain tests without timeout... if adding timeout then add little larger value... :) {code} + if (nodeStatus.getKeepAliveApplications() != null + && nodeStatus.getKeepAliveApplications().size() > 0) { +for (ApplicationId appId : nodeStatus.getKeepAliveApplications()) { + List<Long> list = keepAliveRequests.get(appId); + if (list == null) { +list = new LinkedList<Long>(); +keepAliveRequests.put(appId, list); + } + list.add(System.currentTimeMillis()); +} + } {code} {code} + if (heartBeatID == 2) { +LOG.info("Sending FINISH_APP for application: [" + appId + "]"); +this.context.getApplications().put(appId, mock(Application.class)); + nhResponse.addAllApplicationsToCleanup(Collections.singletonList(appId)); + } {code} {code} + rt.context.getApplications().remove(rt.appId); {code} {code} +private Map<ApplicationId, List<Long>> keepAliveRequests = +new HashMap<ApplicationId, List<Long>>(); +private ApplicationId appId = BuilderUtils.newApplicationId(1, 1); {code} do we need this? can we remove all application related stuff? as we are now checking only heartbeat ids..we can remove this.. thoughts? 
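For readers following along, a minimal sketch of the duplicate-response check under discussion (simplified stand-in names, not the actual NodeStatusUpdaterImpl code):
{code}
public class HeartbeatLoop {
  private int lastResponseId = 0;

  /** Returns true if the response is new, false if it is a duplicate. */
  boolean accept(int responseId) {
    if (responseId == lastResponseId) {
      // The RM re-sent the previous response (e.g. our last heartbeat was
      // lost); skip it so events like FINISH_APPLICATION are not replayed.
      return false;
    }
    lastResponseId = responseId;
    return true;
  }
}
{code}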
> Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, > YARN-245-trunk-3.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-857) Errors when localizing end up with the localization failure not being seen by the NM
[ https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-857: - Issue Type: Sub-task (was: Bug) Parent: YARN-522 > Errors when localizing end up with the localization failure not being seen by > the NM > > > Key: YARN-857 > URL: https://issues.apache.org/jira/browse/YARN-857 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: YARN-857.1.patch, YARN-857.2.patch > > > at > org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978) > Traced this down to DefaultExecutor which does not look at the exit code for > the localizer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701
[ https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713243#comment-13713243 ] Arun C Murthy commented on YARN-937: Hey [~tucu00], is there a chance you can get to this one quickly? Much appreciated, thanks! > Fix unmanaged AM in non-secure/secure setup post YARN-701 > - > > Key: YARN-937 > URL: https://issues.apache.org/jira/browse/YARN-937 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arun C Murthy >Assignee: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.1.0-beta > > > Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens > will be used in both scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713244#comment-13713244 ] Hadoop QA commented on YARN-919: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593107/YARN.919.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1526//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1526//console This message is automatically generated. > Setting default heap sizes in yarn env > -- > > Key: YARN-919 > URL: https://issues.apache.org/jira/browse/YARN-919 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, > YARN-919-trunk-2.patch, YARN-919-trunk-3.patch > > > Right now there are no defaults in yarn env scripts for resource manager and > node manager and if user wants to override that, then user has to go to > documentation and find the variables and change the script. > There is no straightforward way to change it in script. Just updating the > variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713241#comment-13713241 ] Djellel Eddine Difallah commented on YARN-897: -- That is what I tried before, but because the parentQueue doesn't know which queue made the call, we don't know what to reinsert. Basically all we can do is to go down the tree to figure this out. Alternatively, we can modify the signature of completedContainer to include the Queue... That's why we figured that it's more efficient to have the leaf queue explicitly trigger the call. > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.0.4-alpha >Reporter: Djellel Eddine Difallah >Priority: Blocker > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
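For context, a self-contained demonstration of the underlying TreeSet pitfall (the Queue class here is a stand-in, not the CapacityScheduler type): mutating an element's sort key while it sits in the set corrupts the ordering, so the fix has to remove the queue, update its capacity, and reinsert it.
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class ResortDemo {
  static class Queue {
    final String name;
    float usedCapacity;
    Queue(String name, float used) { this.name = name; this.usedCapacity = used; }
  }

  public static void main(String[] args) {
    TreeSet<Queue> childQueues = new TreeSet<Queue>(new Comparator<Queue>() {
      public int compare(Queue x, Queue y) {
        int c = Float.compare(x.usedCapacity, y.usedCapacity);
        return c != 0 ? c : x.name.compareTo(y.name);
      }
    });
    Queue a = new Queue("a", 0.5f);
    Queue b = new Queue("b", 0.7f);
    childQueues.add(a);
    childQueues.add(b);

    // Wrong: updating the key in place (what happens today on container
    // completion) leaves the TreeSet ordered by the stale value.
    b.usedCapacity = 0.1f;

    // Right: remove, mutate, reinsert so the set re-sorts consistently.
    childQueues.remove(a);
    a.usedCapacity = 0.2f;
    childQueues.add(a);
  }
}
{code}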
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713235#comment-13713235 ] Vinod Kumar Vavilapalli commented on YARN-927: -- Seems like this is already committed. Close this? > Change ContainerRequest to not have more than 1 container count and remove > StoreContainerRequest > > > Key: YARN-927 > URL: https://issues.apache.org/jira/browse/YARN-927 > Project: Hadoop YARN > Issue Type: Task >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: YARN-927.1.patch, YARN-927.2.patch, YARN-927.3.patch, > YARN-927.4.patch > > > The downside is having to use more than 1 container request when requesting > more than 1 container at * priority. For most other use cases that have > specific locations we anyways need to make multiple container requests. This > will also remove unnecessary duplication caused by StoredContainerRequest. It > will make the getMatchingRequest() always available and easy to use > removeContainerRequest(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed
[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713234#comment-13713234 ] Devaraj K commented on YARN-403: Sorry Omkar, I don't have logs for this now; I faced it once during long runs when the load was high on the cluster. During that time the fetch request from the Reducer failed due to this error. We could wait for some time. If I get any further info I will update, or if anyone else faces this they could also help to check this further. > Node Manager throws java.io.IOException: Verification of the hashReply failed > - > > Key: YARN-403 > URL: https://issues.apache.org/jira/browse/YARN-403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.2-alpha, 0.23.6 >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > > {code:xml} > 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle > failure > java.io.IOException: Verification of the hashReply failed > at > org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) > at > org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) > at > 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713233#comment-13713233 ] Arun C Murthy commented on YARN-897: On second thoughts - we should get ParentQueue.completedContainer to resort, rather than relying on LeafQueue to make an explicit 'resort' call... > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Djellel Eddine Difallah > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Target Version/s: 2.1.0-beta > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.0.4-alpha >Reporter: Djellel Eddine Difallah >Priority: Blocker > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Priority: Blocker (was: Major) > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Djellel Eddine Difallah >Priority: Blocker > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Affects Version/s: 2.0.4-alpha > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.0.4-alpha >Reporter: Djellel Eddine Difallah >Priority: Blocker > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-919: - Attachment: YARN.919.4.patch [~mayank_bansal] Thanks for addressing the review comments. Looking at bin/yarn in a bit more detail, I noticed a couple of other gotchas that can affect users when setting heapsize. In that aspect, I have attached a patch with a bit more verbose documentation. Let me know what you think. > Setting default heap sizes in yarn env > -- > > Key: YARN-919 > URL: https://issues.apache.org/jira/browse/YARN-919 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, > YARN-919-trunk-2.patch, YARN-919-trunk-3.patch > > > Right now there are no defaults in yarn env scripts for resource manager and > node manager and if user wants to override that, then user has to go to > documentation and find the variables and change the script. > There is no straightforward way to change it in script. Just updating the > variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713231#comment-13713231 ] Arun C Murthy commented on YARN-897: Patch looks good. Can you please provide a patch which includes the test as well (rather than 2 files). Tx! > CapacityScheduler wrongly sorted queues > --- > > Key: YARN-897 > URL: https://issues.apache.org/jira/browse/YARN-897 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Djellel Eddine Difallah > Attachments: TestBugParentQueue.java, YARN-897-1.patch, > YARN-897-2.patch > > > The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity > defines the sort order. This ensures the queue with least UsedCapacity to > receive resources next. On containerAssignment we correctly update the order, > but we miss to do so on container completions. This corrupts the TreeSet > structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, > YARN-918-20130718.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713206#comment-13713206 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] yes... there is a similar but different bug though.. so [~mayank_bansal] is fixing it. There we are computing the response and then updating RMNodeImpl asynchronously. If this approach is correct then we can do a similar thing after YARN-245 is in. > Race condition in ApplicationMasterService.allocate .. It might process same > allocate request twice resulting in additional containers getting allocated. > - > > Key: YARN-744 > URL: https://issues.apache.org/jira/browse/YARN-744 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Omkar Vinit Joshi >Priority: Minor > Attachments: MAPREDUCE-3899-branch-0.23.patch, > YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch > > > Looks like the lock taken in this is broken. It takes a lock on lastResponse > object and then puts a new lastResponse object into the map. At this point a > new thread entering this function will get a new lastResponse object and will > be able to take its lock and enter the critical section. Presumably we want > to limit one response per app attempt. So the lock could be taken on the > ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713204#comment-13713204 ] Bikas Saha commented on YARN-321: - Sounds right. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713203#comment-13713203 ] Bikas Saha commented on YARN-744: - Does the same thing apply for ResourceTrackerService too? > Race condition in ApplicationMasterService.allocate .. It might process same > allocate request twice resulting in additional containers getting allocated. > - > > Key: YARN-744 > URL: https://issues.apache.org/jira/browse/YARN-744 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Omkar Vinit Joshi >Priority: Minor > Attachments: MAPREDUCE-3899-branch-0.23.patch, > YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch > > > Looks like the lock taken in this is broken. It takes a lock on lastResponse > object and then puts a new lastResponse object into the map. At this point a > new thread entering this function will get a new lastResponse object and will > be able to take its lock and enter the critical section. Presumably we want > to limit one response per app attempt. So the lock could be taken on the > ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
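The description quoted above pins the race on synchronizing on an object that is immediately replaced in the map. A simplified, self-contained sketch of the stable-lock alternative (names are hypothetical, not the ApplicationMasterService code):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AllocateLockSketch {
  // One lock object per app attempt; never replaced once created.
  private final ConcurrentMap<String, Object> attemptLocks =
      new ConcurrentHashMap<String, Object>();
  private final ConcurrentMap<String, Object> lastResponse =
      new ConcurrentHashMap<String, Object>();

  Object allocate(String appAttemptId, Object request) {
    // Broken variant: synchronized (lastResponse.get(appAttemptId)) {...}
    // lets a second thread in once the entry is swapped for a new response.
    Object lock = attemptLocks.get(appAttemptId);
    if (lock == null) {
      attemptLocks.putIfAbsent(appAttemptId, new Object());
      lock = attemptLocks.get(appAttemptId);
    }
    synchronized (lock) { // stable lock: one allocate per attempt at a time
      Object response = process(request);
      lastResponse.put(appAttemptId, response);
      return response;
    }
  }

  private Object process(Object request) { return new Object(); }
}
{code}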
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713181#comment-13713181 ] Hudson commented on YARN-814: - SUCCESS: Integrated in Hadoop-trunk-Commit #4115 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4115/]) YARN-814. Improving diagnostics when containers fail during launch due to various reasons like invalid env etc. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504732) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > Difficult to diagnose a failed container launch when error due to invalid > environment variable > -- > > Key: YARN-814 > URL: https://issues.apache.org/jira/browse/YARN-814 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jian He > Fix For: 2.1.1-beta > > Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, > YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, > YARN-814.patch > > > The container's launch script sets up environment variables, symlinks etc. > If there is any failure when setting up the basic context ( before the actual > user's process is launched ), nothing is captured by the NM. This makes it > impossible to diagnose the reason for the failure. > To reproduce, set an env var where the value contains characters that throw > syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713180#comment-13713180 ] Vinod Kumar Vavilapalli commented on YARN-919: -- [~hitesh], can you please review/commit this? Tx. > Setting default heap sizes in yarn env > -- > > Key: YARN-919 > URL: https://issues.apache.org/jira/browse/YARN-919 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, > YARN-919-trunk-3.patch > > > Right now there are no defaults in yarn env scripts for resource manager and > node manager and if user wants to override that, then user has to go to > documentation and find the variables and change the script. > There is no straightforward way to change it in script. Just updating the > variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713158#comment-13713158 ] Vinod Kumar Vavilapalli commented on YARN-814: -- +1. This looks good. Hopefully the tests run fine on Windows too. Checking this in. > Difficult to diagnose a failed container launch when error due to invalid > environment variable > -- > > Key: YARN-814 > URL: https://issues.apache.org/jira/browse/YARN-814 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jian He > Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, > YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, > YARN-814.patch > > > The container's launch script sets up environment variables, symlinks etc. > If there is any failure when setting up the basic context ( before the actual > user's process is launched ), nothing is captured by the NM. This makes it > impossible to diagnose the reason for the failure. > To reproduce, set an env var where the value contains characters that throw > syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-875: --- Attachment: YARN-875.3.patch > Application can hang if AMRMClientAsync callback thread has exception > - > > Key: YARN-875 > URL: https://issues.apache.org/jira/browse/YARN-875 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, > YARN-875.3.patch > > > Currently that thread will die and then never callback. App can hang. > Possible solution could be to catch Throwable in the callback and then call > client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713159#comment-13713159 ] Xuan Gong commented on YARN-875: Currently, if the callback handler throws an exception, this thread will stop, but the ApplicationMaster will keep running. We try to stop the ApplicationMaster if the callback handler throws an exception. That is why we add a try..catch block, and in the catch block we call handler.onError(). Calling stop() inside onError() is not required, but it is the recommended action; if the app calls stop() inside onError(), that is fine, too. Eventually, AMRMClientAsync will call unregisterApplicationMaster and set the keepRunning flag to false, which will stop the heartbeat thread. But it is good to let the heartbeat thread stop earlier. > Application can hang if AMRMClientAsync callback thread has exception > - > > Key: YARN-875 > URL: https://issues.apache.org/jira/browse/YARN-875 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, > YARN-875.3.patch > > > Currently that thread will die and then never callback. App can hang. > Possible solution could be to catch Throwable in the callback and then call > client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
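A minimal sketch of the pattern described here, with simplified stand-ins for the AMRMClientAsync internals (not the attached patch):
{code}
public class CallbackRunner implements Runnable {
  interface CallbackHandler {
    void onProgress();
    void onError(Throwable t);
  }

  private final CallbackHandler handler;
  private volatile boolean keepRunning = true;

  CallbackRunner(CallbackHandler handler) { this.handler = handler; }

  public void run() {
    while (keepRunning) {
      try {
        handler.onProgress();  // user callback; may throw anything
      } catch (Throwable t) {
        keepRunning = false;   // let the callback thread stop early
        handler.onError(t);    // app may (and ideally should) call stop() here
      }
    }
  }
}
{code}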
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713154#comment-13713154 ] Hadoop QA commented on YARN-245: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593097/YARN-245-trunk-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1525//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1525//console This message is automatically generated. > Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, > YARN-245-trunk-3.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713142#comment-13713142 ] Mayank Bansal commented on YARN-245: Thanks [~ojoshi] for the review. I had an offline discussion with Omkar. I removed {{private int lastHeartBeatId;}}. I changed the duplicate response id behavior as well. For tests we agreed to test the duplication of heart beat. Updating the patch. Thanks, Mayank > Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, > YARN-245-trunk-3.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Attachment: YARN-245-trunk-3.patch Attaching patch Thanks, Mayank > Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, > YARN-245-trunk-3.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713099#comment-13713099 ] Hadoop QA commented on YARN-918: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593085/YARN-918-20130718.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1524//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1524//console This message is automatically generated. > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, > YARN-918-20130718.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713087#comment-13713087 ] Hadoop QA commented on YARN-918: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593085/YARN-918-20130718.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1523//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1523//console This message is automatically generated. > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, > YARN-918-20130718.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed
[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713068#comment-13713068 ] Omkar Vinit Joshi commented on YARN-403: [~devaraj.k] Can you give some more information? RM / NM / application logs would help a lot. Are you able to reproduce this? Any steps to reproduce? What I can see is that the hash provided is not what the shuffle service was expecting. Did it fail for all shuffles? > Node Manager throws java.io.IOException: Verification of the hashReply failed > - > > Key: YARN-403 > URL: https://issues.apache.org/jira/browse/YARN-403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.2-alpha, 0.23.6 >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > > {code:xml} > 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle > failure > java.io.IOException: Verification of the hashReply failed > at > org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) > at > org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) > at > 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713059#comment-13713059 ] Xuan Gong commented on YARN-873: bq. When you say different errors, are you implying different error messages or different exit codes? For anyone building a script-based tool on this api, the latter would be preferred. Now I get it. Yes, you are right. We'd better set different exit codes. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
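As a rough sketch of what distinct exit codes could look like for a script-based tool, with made-up code values and a stand-in for the real client call (this is not the committed YARN-873 behavior):
{code:java}
// Illustrative only: map each failure mode to its own exit code so a
// shell script can branch on $? instead of parsing error messages.
public class GetReportCliSketch {
  static final int EXIT_OK = 0;
  static final int EXIT_APP_NOT_FOUND = 2;   // hypothetical value
  static final int EXIT_RM_UNREACHABLE = 3;  // hypothetical value

  static class ApplicationNotFoundException extends Exception {
    ApplicationNotFoundException(String msg) { super(msg); }
  }

  // Stand-in for YarnClient.getApplicationReport(appId).
  static String getApplicationReport(String appId)
      throws ApplicationNotFoundException {
    throw new ApplicationNotFoundException("Application " + appId + " not found");
  }

  public static void main(String[] args) {
    String appId = args.length > 0 ? args[0] : "application_0000000000000_0000";
    try {
      System.out.println(getApplicationReport(appId));
      System.exit(EXIT_OK);
    } catch (ApplicationNotFoundException e) {
      System.err.println(e.getMessage());
      System.exit(EXIT_APP_NOT_FOUND);
    } catch (Exception e) {
      System.err.println("Could not reach the ResourceManager: " + e);
      System.exit(EXIT_RM_UNREACHABLE);
    }
  }
}
{code}
A script could then run something like {{yarn application -status $APP_ID}} and inspect {{$?}} to distinguish "app does not exist" from a connection problem.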
[jira] [Assigned] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed
[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-403: -- Assignee: Omkar Vinit Joshi > Node Manager throws java.io.IOException: Verification of the hashReply failed > - > > Key: YARN-403 > URL: https://issues.apache.org/jira/browse/YARN-403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.2-alpha, 0.23.6 >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > > {code:xml} > 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle > failure > java.io.IOException: Verification of the hashReply failed > at > org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) > at > org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at 
java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-918: - Attachment: YARN-918-20130718.txt Found one potential test issue that could be causing this. Fixing it. > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, > YARN-918-20130718.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713006#comment-13713006 ] Omkar Vinit Joshi commented on YARN-880: Can you please provide the below information to help debug the issue? * RM / NM / AM logs (please enable debug). * The yarn-site.xml and mapred-site.xml files used. * Which scheduler are you using? > Configuring map/reduce memory equal to nodemanager's memory, hangs the job > execution > > > Key: YARN-880 > URL: https://issues.apache.org/jira/browse/YARN-880 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Nishan Shetty >Assignee: Omkar Vinit Joshi >Priority: Critical > > Scenario: > = > Cluster is installed with 2 Nodemanagers > Configuration: > NM memory (yarn.nodemanager.resource.memory-mb): 8 gb > map and reduce memory : 8 gb > Appmaster memory: 2 gb > If a map task is reserved on the same nodemanager where the appmaster of the same > job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712994#comment-13712994 ] Hitesh Shah commented on YARN-873: -- [~xgong] When you say different errors, are you implying different error messages or different exit codes? For anyone building a script-based tool on this api, the latter would be preferred. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712985#comment-13712985 ] Xuan Gong commented on YARN-873: bq.Requiring to parse a string message to determine whether an application exists or not is more work as compared to checking $? which can be used to indicate various errors such as connection issue/invalid application id/app does not exist in RM. Yes, but here we indicate different errors based on the different exceptions that we catch, such as ApplicationNotFoundException. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712969#comment-13712969 ] Vinod Kumar Vavilapalli commented on YARN-918: -- Checking.. This passes on my local machine. Jenkins is complaining about port issues. Will retrigger it and at the same time run all tests locally.. > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712968#comment-13712968 ] Xuan Gong commented on YARN-873: bq.Having a return statement in a catch/finally block is not recommended normally. We could print the message and re-throw the exception or simply not catch the exception. Also, this way the cmd line would exit with non-zero exit code. I still prefer "print the message, then exit" over "print the message, then re-throw (or simply not catch) the exception" in this scenario. If we re-throw the exception or don't catch it, it is no different from throwing a YarnException from YARNClient.getApplicationReport(unknownAppId). If users get the exception, they will need to check and debug whether anything is wrong. In this case, if a user passes an unknown application id, they simply get the message, which is the expected behavior. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712932#comment-13712932 ] Hadoop QA commented on YARN-903: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593065/YARN-903-20130718.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1522//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1522//console This message is automatically generated. > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. 
of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_13
[jira] [Assigned] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-880: -- Assignee: Omkar Vinit Joshi > Configuring map/reduce memory equal to nodemanager's memory, hangs the job > execution > > > Key: YARN-880 > URL: https://issues.apache.org/jira/browse/YARN-880 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Nishan Shetty >Assignee: Omkar Vinit Joshi >Priority: Critical > > Scenario: > = > Cluster is installed with 2 Nodemanagers > Configuration: > NM memory (yarn.nodemanager.resource.memory-mb): 8 gb > map and reduce memory : 8 gb > Appmaster memory: 2 gb > If a map task is reserved on the same nodemanager where the appmaster of the same > job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712931#comment-13712931 ] Karthik Kambatla commented on YARN-353: --- For the findbugs warning around NUM_RETRIES, we should probably make it non-static numRetries. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, > YARN-353.8.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
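A small sketch of the suggested change, presumably addressing a findbugs warning about a static field being written at runtime; the configuration key and default here are hypothetical, for illustration only:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative fix: an instance field initialized from the Configuration,
// instead of a static NUM_RETRIES that gets reassigned at runtime.
public class ZkStoreRetryConfigSketch {
  private final int numRetries;

  public ZkStoreRetryConfigSketch(Configuration conf) {
    // Key name and default value are made up for this sketch.
    this.numRetries =
        conf.getInt("yarn.resourcemanager.zk.state-store.num-retries", 3);
  }

  public int getNumRetries() { return numRetries; }
}
{code}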
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712902#comment-13712902 ] Hadoop QA commented on YARN-353: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console This message is automatically generated. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, > YARN-353.8.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712889#comment-13712889 ] Omkar Vinit Joshi commented on YARN-903: Attaching a simple test to verify this. > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. > Credentials list: > 2013-07-07 13:39:35,694 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user sunny > 2013-07-07 13:39:35,803 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying > from > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens > to > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/u
[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-903: --- Attachment: YARN-903-20130718.1.patch > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. > Credentials list: > 2013-07-07 13:39:35,694 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user sunny > 2013-07-07 13:39:35,803 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying > from > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens > to > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_13
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712882#comment-13712882 ] Hadoop QA commented on YARN-918: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592818/YARN-918-20130717.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1521//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1521//console This message is automatically generated. > ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload > after YARN-701 > - > > Key: YARN-918 > URL: https://issues.apache.org/jira/browse/YARN-918 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt > > > Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need > ApplicationAttemptId in the RPC pay load. This is an API change, so doing it > as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712839#comment-13712839 ] Hadoop QA commented on YARN-919: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593050/YARN-919-trunk-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1519//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1519//console This message is automatically generated. > Setting default heap sizes in yarn env > -- > > Key: YARN-919 > URL: https://issues.apache.org/jira/browse/YARN-919 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, > YARN-919-trunk-3.patch > > > Right now there are no defaults in the yarn env scripts for the resource manager and > node manager, and if users want to override them, they have to go to the > documentation, find the variables, and change the script. > There is no straightforward way to change them in the script. Just updating the > variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712796#comment-13712796 ] Zhijie Shen commented on YARN-321: -- bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. IIUC, to be independent, ApplicationHistoryService should have its own event dispatcher, shouldn't it? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
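To make the question concrete, here is a speculative sketch (not the YARN-321 design) of a history service that owns its own AsyncDispatcher, so it could be embedded in the RM or run standalone; the class and event names are hypothetical:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.yarn.event.AbstractEvent;
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.event.EventHandler;

public class ApplicationHistoryServiceSketch extends CompositeService {
  enum HistoryEventType { APP_START, APP_FINISH }

  static class HistoryEvent extends AbstractEvent<HistoryEventType> {
    HistoryEvent(HistoryEventType type) { super(type); }
  }

  // Owning the dispatcher (rather than reusing the RM's central one) is
  // what would let the service be split out into a separate process.
  private final AsyncDispatcher dispatcher = new AsyncDispatcher();

  public ApplicationHistoryServiceSketch() {
    super(ApplicationHistoryServiceSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    dispatcher.register(HistoryEventType.class, new EventHandler<HistoryEvent>() {
      @Override
      public void handle(HistoryEvent event) {
        // Write the event to the history store (omitted in this sketch).
      }
    });
    addService(dispatcher); // dispatcher lifecycle follows the service's
    super.serviceInit(conf);
  }
}
{code}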
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-919: --- Attachment: YARN-919-trunk-3.patch Thanks [~hitesh] and [~vinodkv] for review Updating the patch Thanks, Mayank > Setting default heap sizes in yarn env > -- > > Key: YARN-919 > URL: https://issues.apache.org/jira/browse/YARN-919 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, > YARN-919-trunk-3.patch > > > Right now there are no defaults in yarn env scripts for resource manager nad > node manager and if user wants to override that, then user has to go to > documentation and find the variables and change the script. > There is no straight forward way to change it in script. Just updating the > variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712793#comment-13712793 ] Karthik Kambatla commented on YARN-353: --- Looks good. +1 pending Jenkins. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, > YARN-353.8.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-353: - Attachment: YARN-353.8.patch The new patch makes NUM_RETRIES configurable and changes removeApplicationState to use the ZooKeeper multi API to remove both the app state and the attempt state in one atomic call. Also fixed the findbugs warnings. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, > YARN-353.8.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
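For reference, a hedged sketch of the ZooKeeper multi idea described above: the attempt znodes and the app znode are deleted in one atomic batch. The paths and surrounding store logic are illustrative, not the actual patch:
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class ZkRemoveAppStateSketch {
  static void removeApplicationState(ZooKeeper zk, String appRoot)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<Op>();
    // Queue deletes for the attempt znodes first: ZooKeeper refuses to
    // delete a node that still has children.
    for (String attempt : zk.getChildren(appRoot, false)) {
      ops.add(Op.delete(appRoot + "/" + attempt, -1)); // -1 matches any version
    }
    ops.add(Op.delete(appRoot, -1));
    // multi() applies the batch atomically: either every delete succeeds
    // or none are applied, so the store never ends up half-cleaned.
    zk.multi(ops);
  }
}
{code}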
[jira] [Updated] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-938: - Summary: Hadoop 2 benchmarking (was: Hadoop 2 Bench marking ) > Hadoop 2 benchmarking > -- > > Key: YARN-938 > URL: https://issues.apache.org/jira/browse/YARN-938 > Project: Hadoop YARN > Issue Type: Task >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > I am running the benchmarks on Hadoop 2 and will update the results soon. > Thanks, > Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712712#comment-13712712 ] Vinod Kumar Vavilapalli commented on YARN-938: -- Thanks for doing this Mayank! > Hadoop 2 benchmarking > -- > > Key: YARN-938 > URL: https://issues.apache.org/jira/browse/YARN-938 > Project: Hadoop YARN > Issue Type: Task >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > I am running the benchmarks on Hadoop 2 and will update the results soon. > Thanks, > Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712698#comment-13712698 ] Hudson commented on YARN-701: - SUCCESS: Integrated in Hadoop-trunk-Commit #4110 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4110/]) YARN-701. Use application tokens irrespective of secure or non-secure mode. Contributed by Vinod K V. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504604) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAMAuthorization.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java > ApplicationTokens should be used irrespective of kerberos > - > > Key: YARN-701 > URL: https://issues.apache.org/jira/browse/YARN-701 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Fix 
For: 2.1.0-beta > > Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, > YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, > yarn-ojoshi-resourcemanager-HW10351.local.log > > > - Single code path for secure and non-secure cases is useful for testing, > coverage. > - Having this in non-secure mode will help us avoid accidental bugs in AMs > DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-864. -- Resolution: Duplicate Given Jian's update, I'm closing this as duplicate of YARN-688. > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini >Assignee: Jian He > Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,314 WARN ContainersMonitorImpl:463 - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
> java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. > java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodema
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712688#comment-13712688 ] Vinod Kumar Vavilapalli commented on YARN-658: -- David, can you give us more information? RM, AM and NM logs will help a lot. > Command to kill a YARN application does not work with newer Ubuntu versions > --- > > Key: YARN-658 > URL: https://issues.apache.org/jira/browse/YARN-658 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.3-alpha, 2.0.4-alpha >Reporter: David Yan > > After issuing a KillApplicationRequest, the application keeps running on the > system even though the state is changed to KILLED. It happens on both Ubuntu > 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-455. Resolution: Duplicate > NM warns about stopping an unknown container under normal circumstances > --- > > Key: YARN-455 > URL: https://issues.apache.org/jira/browse/YARN-455 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Omkar Vinit Joshi > > During normal operations the NM can log warnings to its audit log about > unknown containers. For example: > {noformat} > 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser > IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl > RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! > APPID=application_1359150825713_3947178 > CONTAINERID=container_1359150825713_3947178_01_001266 > {noformat} > Looking closer at the audit log and the NM log shows that the container > completed successfully and was forgotten by the NM before the stop request > arrived. The NM should avoid warning in these situations since this is a > "normal" race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712677#comment-13712677 ] Omkar Vinit Joshi commented on YARN-455: Closing this as a duplicate. I am fixing it in YARN-903. > NM warns about stopping an unknown container under normal circumstances > --- > > Key: YARN-455 > URL: https://issues.apache.org/jira/browse/YARN-455 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Omkar Vinit Joshi > > During normal operations the NM can log warnings to its audit log about > unknown containers. For example: > {noformat} > 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser > IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl > RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! > APPID=application_1359150825713_3947178 > CONTAINERID=container_1359150825713_3947178_01_001266 > {noformat} > Looking closer at the audit log and the NM log shows that the container > completed successfully and was forgotten by the NM before the stop request > arrived. The NM should avoid warning in these situations since this is a > "normal" race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-208) Yarn overrides diagnostic message set by AM
[ https://issues.apache.org/jira/browse/YARN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-208. -- Resolution: Duplicate Thomas, closing this as a duplicate; please reopen if you see it again. Tx. > Yarn overrides diagnostic message set by AM > --- > > Key: YARN-208 > URL: https://issues.apache.org/jira/browse/YARN-208 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Thomas Weise > > Diagnostics set in the AM just before exit are overridden by YARN: in the case of > state FAILED they are replaced with a different message, and for SUCCESS the field > will be left blank. The application-provided info should be retained. Per > FinishApplicationMasterRequest this can be managed by the ApplicationMaster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
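For reference, the hook the description points at: FinishApplicationMasterRequest carries an AM-supplied diagnostics string. A minimal, hedged sketch of an AM populating it; the message text is invented for illustration:

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;

public class AmDiagnosticsSketch {
  public static void main(String[] args) {
    // The diagnostics field below is exactly what YARN-208 reports as being
    // overridden; the message content here is made up for illustration.
    FinishApplicationMasterRequest finishRequest =
        FinishApplicationMasterRequest.newInstance(
            FinalApplicationStatus.SUCCEEDED,
            "All 42 tasks completed; 3 were retried", // AM-supplied diagnostics
            null); // no tracking URL in this sketch
    System.out.println(finishRequest.getDiagnostics());
    // A real AM would send finishRequest to the RM via
    // ApplicationMasterProtocol#finishApplicationMaster when unregistering.
  }
}
{code}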
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712665#comment-13712665 ] Arun C Murthy commented on YARN-701: I'm committing this to unblock the rest. > ApplicationTokens should be used irrespective of kerberos > - > > Key: YARN-701 > URL: https://issues.apache.org/jira/browse/YARN-701 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, > YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, > yarn-ojoshi-resourcemanager-HW10351.local.log > > > - Single code path for secure and non-secure cases is useful for testing, > coverage. > - Having this in non-secure mode will help us avoid accidental bugs in AMs > DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712661#comment-13712661 ] Omkar Vinit Joshi commented on YARN-713: [~maysamyabandeh] are you working on a patch? If not, I will take over; this is critical and needs to be fixed. > ResourceManager can exit unexpectedly if DNS is unavailable > --- > > Key: YARN-713 > URL: https://issues.apache.org/jira/browse/YARN-713 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe >Assignee: Maysam Yabandeh >Priority: Critical > Fix For: 2.1.0-beta > > Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, > YARN-713.patch > > > As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could > lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and > that ultimately would cause the RM to exit. The RM should not exit during > DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
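The failure mode here is an exception escaping the dispatcher's event loop and killing the thread. A hedged sketch of the general defensive pattern, in plain JDK types rather than the actual AsyncDispatcher code, assuming a failed event can simply be logged and skipped:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of a dispatch loop that survives handler failures. */
public class GuardedDispatcher implements Runnable {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();
  private volatile boolean stopped;

  public void dispatch(Runnable event) { queue.add(event); }
  public void stop() { stopped = true; }

  @Override
  public void run() {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      Runnable event;
      try {
        event = queue.take();
      } catch (InterruptedException ie) {
        break; // normal shutdown path
      }
      try {
        event.run();
      } catch (Throwable t) {
        // Log and keep dispatching instead of letting the thread (and,
        // transitively, the RM) die on a transient failure such as a DNS
        // hiccup surfacing as an unchecked exception.
        System.err.println("Error dispatching event: " + t);
      }
    }
  }
}
{code}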
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712607#comment-13712607 ] Hadoop QA commented on YARN-814: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593018/YARN-814.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1518//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1518//console This message is automatically generated. > Difficult to diagnose a failed container launch when error due to invalid > environment variable > -- > > Key: YARN-814 > URL: https://issues.apache.org/jira/browse/YARN-814 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jian He > Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, > YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, > YARN-814.patch > > > The container's launch script sets up environment variables, symlinks etc. > If there is any failure when setting up the basic context ( before the actual > user's process is launched ), nothing is captured by the NM. This makes it > impossible to diagnose the reason for the failure. > To reproduce, set an env var where the value contains characters that throw > syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Summary: Node Manager can not handle duplicate responses (was: Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED) > Node Manager can not handle duplicate responses > --- > > Key: YARN-245 > URL: https://issues.apache.org/jira/browse/YARN-245 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Devaraj K >Assignee: Mayank Bansal > Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch > > > {code:xml} > 2012-11-25 12:56:11,795 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > FINISH_APPLICATION at FINISHED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > 2012-11-25 12:56:11,796 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1353818859056_0004 transitioned from FINISHED to null > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
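The stack trace above shows the generic failure mode: a duplicate event arriving in a terminal state has no registered transition, so the state machine throws. A hedged sketch of the usual remedy, absorbing the duplicate with a self-transition; the states and events below are invented stand-ins, not the NM's real state machine:

{code:java}
import org.apache.hadoop.yarn.event.AbstractEvent;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class DuplicateFinishSketch {
  enum AppState { RUNNING, FINISHED }
  enum AppEventType { FINISH_APPLICATION }

  static class AppEvent extends AbstractEvent<AppEventType> {
    AppEvent(AppEventType type) { super(type); }
  }

  private static final
      StateMachineFactory<DuplicateFinishSketch, AppState, AppEventType, AppEvent>
      factory =
        new StateMachineFactory<DuplicateFinishSketch, AppState, AppEventType, AppEvent>(
            AppState.RUNNING)
          .addTransition(AppState.RUNNING, AppState.FINISHED,
              AppEventType.FINISH_APPLICATION)
          // The guard: a repeated FINISH_APPLICATION at FINISHED is absorbed
          // by a self-transition instead of raising
          // InvalidStateTransitonException.
          .addTransition(AppState.FINISHED, AppState.FINISHED,
              AppEventType.FINISH_APPLICATION)
          .installTopology();

  private final StateMachine<AppState, AppEventType, AppEvent> stateMachine =
      factory.make(this);

  public static void main(String[] args) {
    DuplicateFinishSketch app = new DuplicateFinishSketch();
    app.stateMachine.doTransition(AppEventType.FINISH_APPLICATION,
        new AppEvent(AppEventType.FINISH_APPLICATION));
    // The duplicate is now a no-op rather than an exception:
    app.stateMachine.doTransition(AppEventType.FINISH_APPLICATION,
        new AppEvent(AppEventType.FINISH_APPLICATION));
    System.out.println(app.stateMachine.getCurrentState()); // FINISHED
  }
}
{code}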
[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-814: - Attachment: YARN-814.7.patch The new patch fixes the warnings and adds a test case for stdout/stderr diagnostics. > Difficult to diagnose a failed container launch when error due to invalid > environment variable > -- > > Key: YARN-814 > URL: https://issues.apache.org/jira/browse/YARN-814 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jian He > Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, > YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, > YARN-814.patch > > > The container's launch script sets up environment variables, symlinks etc. > If there is any failure when setting up the basic context ( before the actual > user's process is launched ), nothing is captured by the NM. This makes it > impossible to diagnose the reason for the failure. > To reproduce, set an env var where the value contains characters that throw > syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
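For readers following the patch, the general idea of stdout/stderr diagnostics can be sketched as reading back the tail of the launch script's stderr when the launch fails; the path and helper below are invented for illustration, not the patch's actual code:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class LaunchDiagnosticsSketch {
  /** Return the last maxLines lines of a container log for a diagnostics message. */
  static String tail(Path logFile, int maxLines) throws IOException {
    List<String> lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
    int from = Math.max(0, lines.size() - maxLines);
    return String.join("\n", lines.subList(from, lines.size()));
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical location of a failed container's stderr.
    Path stderr = Paths.get("/tmp/container_0001/stderr");
    System.out.println("Container launch failed. Last stderr lines:\n"
        + tail(stderr, 4));
  }
}
{code}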
[jira] [Created] (YARN-938) Hadoop 2 Benchmarking
Mayank Bansal created YARN-938: -- Summary: Hadoop 2 Benchmarking Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712546#comment-13712546 ] Karthik Kambatla commented on YARN-321: --- bq. Folks, it would be great if we have a consolidated document that describes the design and some details. +1 > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712533#comment-13712533 ] Karthik Kambatla commented on YARN-353: --- bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. bq. fixed bq. Why should NUM_RETRIES not be there? I was just noting that the latest patch still has the non-configurable NUM_RETRIES; it should exist but be configurable. If it is made configurable, we should probably rename the variable. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch > > > Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
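For context, making such a constant configurable conventionally means reading it from Configuration with a default of 3. A hedged sketch; the property name below is invented for illustration and is not necessarily the one the patch settles on:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RetryConfigSketch {
  // Hypothetical property name, for illustration only.
  static final String ZK_NUM_RETRIES_KEY = "yarn.resourcemanager.zk.num-retries";
  static final int DEFAULT_ZK_NUM_RETRIES = 3;

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Falls back to 3 unless the deployment overrides the property.
    int numRetries = conf.getInt(ZK_NUM_RETRIES_KEY, DEFAULT_ZK_NUM_RETRIES);
    System.out.println("ZK state-store retries: " + numRetries);
  }
}
{code}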
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712528#comment-13712528 ] Bikas Saha commented on YARN-873: - Having a return statement in a catch/finally block is normally not recommended. We could print the message and re-throw the exception, or simply not catch it. Either way, the command line would then exit with a non-zero exit code. {code} +} catch (ApplicationNotFoundException ex) { + sysout.println("Application with id '" + + applicationId + "' doesn't exist in RM."); + return; +} {code} > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
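A hedged sketch of the shape Bikas describes, printing the message and re-throwing so the command can exit non-zero; the surrounding class and names are illustrative, not the committed code:

{code:java}
import java.io.IOException;
import java.io.PrintStream;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class StatusCommandSketch {
  // Assume the client was created, init'ed, and started elsewhere.
  private final YarnClient client = YarnClient.createYarnClient();
  private final PrintStream sysout = System.out;

  ApplicationReport printStatus(ApplicationId applicationId)
      throws YarnException, IOException {
    try {
      return client.getApplicationReport(applicationId);
    } catch (ApplicationNotFoundException ex) {
      sysout.println("Application with id '" + applicationId
          + "' doesn't exist in RM.");
      throw ex; // propagate so the command exits with a non-zero status
    }
  }
}
{code}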
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712523#comment-13712523 ] Bikas Saha commented on YARN-353: - bq. ZKRMStateStore#getNewZooKeeper need not be synchronized bq. fixed The code is derived from the ActiveStandbyElector code in Hadoop Common. It was synchronized there because of a race condition that showed up in testing. I would like to keep the synchronization as it was in the original patch. bq. the patch still seems to have NUM_RETRIES Why should NUM_RETRIES not be there? > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch > > > Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712514#comment-13712514 ] Hadoop QA commented on YARN-873: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592998/YARN-873.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1517//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1517//console This message is automatically generated. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712507#comment-13712507 ] Hitesh Shah commented on YARN-873: -- [~xgong] Requiring clients to parse a string message to determine whether an application exists is more work than checking $?, which can be used to indicate various errors such as a connection issue, an invalid application id, or the app not existing in the RM. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
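To make that concrete, a hedged sketch of a caller relying on the exit status rather than parsing output; the command line and application id below are placeholders:

{code:java}
import java.io.IOException;

public class ExitCodeCheckSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Placeholder command and app id; assumes the CLI returns non-zero on failure.
    Process p = new ProcessBuilder(
        "yarn", "application", "-status", "application_0000000000000_0000")
        .inheritIO()
        .start();
    int exitCode = p.waitFor(); // the value a shell script would see in $?
    if (exitCode != 0) {
      System.err.println("Lookup failed (connection issue, invalid id, or "
          + "unknown application); exit code " + exitCode);
    }
  }
}
{code}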
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712482#comment-13712482 ] Karthik Kambatla commented on YARN-353: --- [~jianhe], the patch still seems to have NUM_RETRIES. Also, can you take a look at the test failure and the findbugs warnings? Thanks. > Add Zookeeper-based store implementation for RMStateStore > - > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch > > > Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-873: --- Attachment: YARN-873.3.patch > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712462#comment-13712462 ] Xuan Gong commented on YARN-873: bq. 1. The application -status command returns exit code 0 when the application doesn't exist. Can we return a non-zero exit code when the application doesn't exist? Well, returning exit code 0 is fine. If a user gives an unknown appId, they will get the "Application doesn't exist in RM." response, and this response is expected. I think when we get the expected output we set the exit code to 0, and otherwise set it to non-zero. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query based on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712358#comment-13712358 ] Hudson commented on YARN-865: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1491 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1491/]) YARN-865. RM webservices can't query based on application Types. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > RM webservices can't query based on application Types > - > > Key: YARN-865 > URL: https://issues.apache.org/jira/browse/YARN-865 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.1.0-beta > > Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, > YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch > > > The resource manager web service api to get the list of apps doesn't have a > query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
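For anyone trying the new filter, a hedged sketch of a client calling the RM REST API with the applicationTypes query parameter this change adds; the RM host and type value below are placeholders:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmAppsByTypeSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address; applicationTypes is the parameter under discussion.
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/apps?applicationTypes=MAPREDUCE");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON list of apps of the requested type
      }
    }
  }
}
{code}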
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712357#comment-13712357 ] Hudson commented on YARN-922: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1491 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1491/]) YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java > Change FileSystemRMStateStore to use directories > > > Key: YARN-922 > URL: https://issues.apache.org/jira/browse/YARN-922 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.1.0-beta > > Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, > YARN-922.patch > > > Store each app and its attempts in the same directory so that removing > application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
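The description's point, one directory per application so that cleanup is a single recursive delete, can be sketched against the FileSystem API; the paths and ids below are placeholders:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppStateLayoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf); // local FS just for the demo
    Path appDir = new Path("/tmp/rmstore/application_0000000000000_0001");
    fs.mkdirs(appDir);
    // Attempts live under the application's own directory ...
    fs.createNewFile(new Path(appDir, "appattempt_0000000000000_0001_000001"));
    fs.createNewFile(new Path(appDir, "appattempt_0000000000000_0001_000002"));
    // ... so removing the application's state is one operation:
    fs.delete(appDir, true);
  }
}
{code}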
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712349#comment-13712349 ] Arun C Murthy commented on YARN-701: bq. Sure I can help. Thanks, I've opened YARN-937 and marked it a blocker. I'll commit YARN-701 later today to unblock both YARN-937 & YARN-918. > ApplicationTokens should be used irrespective of kerberos > - > > Key: YARN-701 > URL: https://issues.apache.org/jira/browse/YARN-701 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, > YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, > yarn-ojoshi-resourcemanager-HW10351.local.log > > > - Single code path for secure and non-secure cases is useful for testing, > coverage. > - Having this in non-secure mode will help us avoid accidental bugs in AMs > DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701
[ https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-937: --- Target Version/s: 2.1.0-beta > Fix unmanaged AM in non-secure/secure setup post YARN-701 > - > > Key: YARN-937 > URL: https://issues.apache.org/jira/browse/YARN-937 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arun C Murthy >Assignee: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.1.0-beta > > > Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens > will be used in both scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701
Arun C Murthy created YARN-937: -- Summary: Fix unmanaged AM in non-secure/secure setup post YARN-701 Key: YARN-937 URL: https://issues.apache.org/jira/browse/YARN-937 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arun C Murthy Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712305#comment-13712305 ] Hudson commented on YARN-922: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1464/]) YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java > Change FileSystemRMStateStore to use directories > > > Key: YARN-922 > URL: https://issues.apache.org/jira/browse/YARN-922 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.1.0-beta > > Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, > YARN-922.patch > > > Store each app and its attempts in the same directory so that removing > application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query based on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712306#comment-13712306 ] Hudson commented on YARN-865: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1464/]) YARN-865. RM webservices can't query based on application Types. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > RM webservices can't query based on application Types > - > > Key: YARN-865 > URL: https://issues.apache.org/jira/browse/YARN-865 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.1.0-beta > > Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, > YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch > > > The resource manager web service api to get the list of apps doesn't have a > query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712227#comment-13712227 ] Devaraj K commented on YARN-873: Sorry, I missed this in the above comment. 3. {code:java} // If the RM doesn't have the application, provide the response with // application report as null and let the clients to handle. {code} Can you also update this in-line comment in ClientRMService.java? > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712215#comment-13712215 ] Hudson commented on YARN-922: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java > Change FileSystemRMStateStore to use directories > > > Key: YARN-922 > URL: https://issues.apache.org/jira/browse/YARN-922 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.1.0-beta > > Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, > YARN-922.patch > > > Store each app and its attempts in the same directory so that removing > application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query based on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712216#comment-13712216 ] Hudson commented on YARN-865: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-865. RM webservices can't query based on application Types. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > RM webservices can't query based on application Types > - > > Key: YARN-865 > URL: https://issues.apache.org/jira/browse/YARN-865 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.1.0-beta > > Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, > YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch > > > The resource manager web service api to get the list of apps doesn't have a > query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712197#comment-13712197 ] Devaraj K commented on YARN-873: The latest patch overall looks good to me. Here are a few things I feel we can take care of: 1. The application -status command returns exit code 0 when the application doesn't exist. Can we return a non-zero exit code when the application doesn't exist? 2. In TestClientRMService.java, {code:java} +try { + GetApplicationReportResponse applicationReport = rmService + .getApplicationReport(request); +} catch (ApplicationNotFoundException ex) { + getExpectedException = true; + Assert.assertEquals(ex.getMessage(), + "Application with id '" + request.getApplicationId() + + "' doesn't exist in RM."); +} +Assert.assertTrue(getExpectedException); {code} Can we fail after getApplicationReport using Assert.fail() instead of setting and checking a boolean flag? Also, the applicationReport variable is never used. 3. {code:java} // If the RM doesn't have the application, provide the response with // application report as null and let the clients to handle. {code} Do we have another JIRA to fix the same for kill application? If not, can we file one? > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-873.1.patch, YARN-873.2.patch > > > How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
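On point 2, the Assert.fail() pattern Devaraj suggests would look roughly like the fragment below (hedged; it reworks the quoted test and also puts the expected value first in assertEquals):

{code:java}
try {
  rmService.getApplicationReport(request);
  Assert.fail("Expected ApplicationNotFoundException for an unknown app id");
} catch (ApplicationNotFoundException ex) {
  Assert.assertEquals("Application with id '" + request.getApplicationId()
      + "' doesn't exist in RM.", ex.getMessage());
}
{code}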
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: AM max retries is configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Make ping from the RM machine to the NM1 machine succeed using the IP, but fail using the hostname. Step 3: Execute a job. Step 4: After AM [ AppAttempt_1 ] allocation to the NM1 machine is done, a connection loss happens. Observation: == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful, and a new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], even though the configured number of app attempts is 3 and the remaining attempts are all spawned and running. RM logs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. 
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:R
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: Hostname is enabled. AM max retries is configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Make ping from the RM machine to the NM1 machine succeed using the IP, but fail using the hostname. Step 3: Execute a job. Step 4: After AM [ AppAttempt_1 ] allocation to the NM1 machine is done, a connection loss happens. Observation: == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful, and a new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], even though the configured number of app attempts is 3 and the remaining attempts are all spawned and running. RM logs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. 
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedAc
[jira] [Commented] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
[ https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712089#comment-13712089 ] Hadoop QA commented on YARN-935: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592927/YARN-935.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1516//console This message is automatically generated. > Correcting pom.xml to build applicationhistoryserver sub-project successfully > - > > Key: YARN-935 > URL: https://issues.apache.org/jira/browse/YARN-935 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-935.1.patch > > > The branch was created from branch-2, so > hadoop-yarn-server-applicationhistoryserver/pom.xml should use > 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be > built correctly because of the wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira