[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable

2013-07-18 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713322#comment-13713322
 ] 

Maysam Yabandeh commented on YARN-713:
--

[~ojoshi], but the patch has been ready since June 14! Anyway, feel free to take 
over.

> ResourceManager can exit unexpectedly if DNS is unavailable
> ---
>
> Key: YARN-713
> URL: https://issues.apache.org/jira/browse/YARN-713
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Maysam Yabandeh
>Priority: Critical
> Fix For: 2.1.0-beta
>
> Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, 
> YARN-713.patch
>
>
> As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could 
> lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and 
> that ultimately would cause the RM to exit.  The RM should not exit during 
> DNS hiccups.
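
A minimal sketch of the kind of guard that avoids the crash (class and method names here are illustrative, not the actual patch): resolve the host inside a try/catch so a transient DNS failure is logged and retried rather than propagating out of the dispatcher thread.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: resolve a node's host defensively so a DNS hiccup does
// not surface as an unhandled exception that kills the event dispatcher.
public final class SafeResolver {
  public static InetAddress tryResolve(String host) {
    try {
      return InetAddress.getByName(host);   // may throw during a DNS outage
    } catch (UnknownHostException e) {
      // Log and return null; the caller can retry on the next event instead
      // of letting the exception terminate the AsyncDispatcher thread.
      System.err.println("DNS lookup failed for " + host + ": " + e);
      return null;
    }
  }
}
{code}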

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713315#comment-13713315
 ] 

Hadoop QA commented on YARN-875:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593101/YARN-875.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1529//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1529//console

This message is automatically generated.

> Application can hang if AMRMClientAsync callback thread has exception
> -
>
> Key: YARN-875
> URL: https://issues.apache.org/jira/browse/YARN-875
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, 
> YARN-875.3.patch
>
>
> Currently that thread will die and then never callback. App can hang. 
> Possible solution could be to catch Throwable in the callback and then call 
> client.onError().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713311#comment-13713311
 ] 

Hadoop QA commented on YARN-853:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593117/YARN-853-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1528//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1528//console

This message is automatically generated.

> maximum-am-resource-percent doesn't work after refreshQueues command
> 
>
> Key: YARN-853
> URL: https://issues.apache.org/jira/browse/YARN-853
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, 
> YARN-853.patch
>
>
> If we update the yarn.scheduler.capacity.maximum-am-resource-percent / 
> yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent 
> configuration and then do the refreshNodes, it uses the new config value to 
> calculate Max Active Applications and Max Active Applications Per User. If we 
> add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old 
> maximum-am-resource-percent config value to calculate Max Active Applications 
> and Max Active Applications Per User. 
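
For context, a simplified sketch of how this limit is derived (names and the exact formula are illustrative; the real CapacityScheduler computation also factors in queue capacities). The point is that whichever maximum-am-resource-percent value is read at computation time determines the limit, so recomputing with a stale value gives stale limits.
{code}
// Simplified sketch: maximum-am-resource-percent bounds how many applications
// can be active at once. If the percent used here is the old value, the limit
// stays stale even though the configuration was refreshed.
public final class MaxActiveAppsSketch {
  static int maxActiveApplications(long clusterMemoryMb,
                                   long minAllocationMb,
                                   double maxAmResourcePercent) {
    long slots = clusterMemoryMb / minAllocationMb;   // schedulable containers
    return Math.max(1, (int) Math.round(slots * maxAmResourcePercent));
  }

  public static void main(String[] args) {
    // 100 GB cluster, 1 GB minimum allocation, 10% for AMs -> 10 active apps.
    System.out.println(maxActiveApplications(102400, 1024, 0.1));
  }
}
{code}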

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command

2013-07-18 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-853:
---

Attachment: YARN-853-3.patch

> maximum-am-resource-percent doesn't work after refreshQueues command
> 
>
> Key: YARN-853
> URL: https://issues.apache.org/jira/browse/YARN-853
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, 
> YARN-853.patch
>
>
> If we update the yarn.scheduler.capacity.maximum-am-resource-percent / 
> yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent 
> configuration and then do the refreshNodes, it uses the new config value to 
> calculate Max Active Applications and Max Active Applications Per User. If we 
> add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old 
> maximum-am-resource-percent config value to calculate Max Active Applications 
> and Max Active Applications Per User. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713285#comment-13713285
 ] 

Omkar Vinit Joshi commented on YARN-685:


[~raviprak] can you please tell me what is  and ?? I think 
first one is nodes..what is second?
also what do you mean here?
{code}
For 23, Reduce: 
2 1
32 2
1 4
{code}

> Capacity Scheduler is not distributing the reducers tasks across the cluster
> 
>
> Key: YARN-685
> URL: https://issues.apache.org/jira/browse/YARN-685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Devaraj K
>
> If we have reducers whose total memory required to complete is less than the 
> total cluster memory, the reducers are not assigned to all the nodes 
> uniformly (even approximately). Also, at that time there are no other jobs or 
> job tasks running in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713282#comment-13713282
 ] 

Omkar Vinit Joshi commented on YARN-245:


ignore the Thread.sleep comment ...realized it is an L, not a 1 :D


> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, 
> YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713280#comment-13713280
 ] 

Omkar Vinit Joshi commented on YARN-245:


Thanks [~mayank_bansal] for the patch. I agree that checking heartbeat ids 
will test this issue... a few comments:
{code}
+  conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);
{code}

why are we doing this?

{code}
+  NodeStatus nodeStatus = request.getNodeStatus();
+  nodeStatus.setResponseId(heartBeatID++);
{code}
required? can be removed?

* There is one issue at present with NodeStatusUpdaterImpl.java: imagine we 
get such a heartbeat; then we will not wait but try again... check the finally 
{} block, which won't get executed, and we will keep pinging the RM until we 
get a correct response with the right response-id. Should we wait or request 
immediately? Thoughts?

{code}
+Thread.sleep(1000l);
{code}
can we make it 1000? .. 

* The test will need a timeout. However, I see there are certain tests without 
a timeout... if adding a timeout, then use a slightly larger value... :) 

{code}
+  if (nodeStatus.getKeepAliveApplications() != null
+      && nodeStatus.getKeepAliveApplications().size() > 0) {
+    for (ApplicationId appId : nodeStatus.getKeepAliveApplications()) {
+      List<Long> list = keepAliveRequests.get(appId);
+      if (list == null) {
+        list = new LinkedList<Long>();
+        keepAliveRequests.put(appId, list);
+      }
+      list.add(System.currentTimeMillis());
+    }
+  }
{code}
{code}
+  if (heartBeatID == 2) {
+LOG.info("Sending FINISH_APP for application: [" + appId + "]");
+this.context.getApplications().put(appId, mock(Application.class));
+
nhResponse.addAllApplicationsToCleanup(Collections.singletonList(appId));
+  }
{code}

{code}
+  rt.context.getApplications().remove(rt.appId);
{code} 

{code}
+  private Map<ApplicationId, List<Long>> keepAliveRequests =
+      new HashMap<ApplicationId, List<Long>>();
+  private ApplicationId appId = BuilderUtils.newApplicationId(1, 1);
{code}

do we need this? Can we remove all the application-related stuff? As we are now 
checking only heartbeat ids, we can remove this... thoughts?
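
A minimal sketch of the duplicate-response detection being discussed (hypothetical names, not the actual patch): the NM remembers the id of the last heartbeat it sent and treats a response carrying an older id as a duplicate, so application cleanup events are not replayed.
{code}
// Hypothetical sketch of duplicate-heartbeat handling on the NM side.
// lastHeartBeatId is the id of the most recent heartbeat the NM sent; a
// response carrying an older id is a duplicate and should be ignored rather
// than re-dispatching FINISH_APPLICATION events.
final class HeartbeatDedupSketch {
  private int lastHeartBeatId = 0;

  int nextHeartbeatId() {
    return ++lastHeartBeatId;          // id attached to the outgoing NodeStatus
  }

  boolean isDuplicate(int responseId) {
    return responseId < lastHeartBeatId;
  }
}
{code}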

> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, 
> YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-857) Errors when localizing end up with the localization failure not being seen by the NM

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-857:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-522

> Errors when localizing end up with the localization failure not being seen by 
> the NM
> 
>
> Key: YARN-857
> URL: https://issues.apache.org/jira/browse/YARN-857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: YARN-857.1.patch, YARN-857.2.patch
>
>
> at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978)
> Traced this down to DefaultExecutor which does not look at the exit code for 
> the localizer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713243#comment-13713243
 ] 

Arun C Murthy commented on YARN-937:


Hey [~tucu00], is there a chance you can get to this one quickly? Much 
appreciated, thanks!

> Fix unmanaged AM in non-secure/secure setup post YARN-701
> -
>
> Key: YARN-937
> URL: https://issues.apache.org/jira/browse/YARN-937
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Arun C Murthy
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.1.0-beta
>
>
> Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
> will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713244#comment-13713244
 ] 

Hadoop QA commented on YARN-919:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593107/YARN.919.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1526//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1526//console

This message is automatically generated.

> Setting default heap sizes in yarn env
> --
>
> Key: YARN-919
> URL: https://issues.apache.org/jira/browse/YARN-919
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, 
> YARN-919-trunk-2.patch, YARN-919-trunk-3.patch
>
>
> Right now there are no defaults in the yarn env scripts for the resource manager 
> and node manager, and if a user wants to override that, then the user has to go to 
> the documentation, find the variables, and change the script.
> There is no straightforward way to change it in the script. Just updating the 
> variables with defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Djellel Eddine Difallah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713241#comment-13713241
 ] 

Djellel Eddine Difallah commented on YARN-897:
--

That is what I tried before, but because the parentQueue doesn't know which 
queue made the call, we don't know what to reinsert. Basically all we can do is 
go down the tree to figure this out. Alternatively, we can modify the 
signature of completedContainer to include the Queue...
That's why we figured it's more efficient to have the leaf queue 
explicitly trigger the call.

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Djellel Eddine Difallah
>Priority: Blocker
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.
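
The invariant can be shown with a small sketch (illustrative types, not the scheduler code): a TreeSet ordered by a mutable usedCapacity only stays consistent if the element is removed before the key changes and re-inserted afterwards, on container completion just as on assignment.
{code}
import java.util.TreeSet;

// Illustrative sketch: childQueues ordered by a mutable usedCapacity.
// Mutating usedCapacity while the element is still in the set corrupts the
// ordering; remove, update, then re-add on every capacity change.
class QueueSketch {
  final String name;
  double usedCapacity;
  QueueSketch(String name, double used) { this.name = name; this.usedCapacity = used; }
}

class ResortSketch {
  private final TreeSet<QueueSketch> childQueues = new TreeSet<QueueSketch>((a, b) -> {
    int byCapacity = Double.compare(a.usedCapacity, b.usedCapacity);
    return byCapacity != 0 ? byCapacity : a.name.compareTo(b.name);
  });

  void add(QueueSketch q) { childQueues.add(q); }

  void onContainerCompleted(QueueSketch q, double newUsedCapacity) {
    childQueues.remove(q);               // remove while the old key is still in effect
    q.usedCapacity = newUsedCapacity;
    childQueues.add(q);                  // re-insert so the sort order stays valid
  }
}
{code}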

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713235#comment-13713235
 ] 

Vinod Kumar Vavilapalli commented on YARN-927:
--

Seems like this is already committed. Close this?

> Change ContainerRequest to not have more than 1 container count and remove 
> StoreContainerRequest
> 
>
> Key: YARN-927
> URL: https://issues.apache.org/jira/browse/YARN-927
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: YARN-927.1.patch, YARN-927.2.patch, YARN-927.3.patch, 
> YARN-927.4.patch
>
>
> The downside is having to use more than 1 container request when requesting 
> more than 1 container at * priority. For most other use cases that have 
> specific locations we need to make multiple container requests anyway. This 
> will also remove the unnecessary duplication caused by StoredContainerRequest. It 
> will make getMatchingRequest() always available and make 
> removeContainerRequest() easy to use.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed

2013-07-18 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713234#comment-13713234
 ] 

Devaraj K commented on YARN-403:


Sorry Omkar, I don't have logs for this now. I faced it once during long 
runs when the load was high on the cluster. During that time the fetch request 
from the Reducer failed due to this error.

We could wait for some time. If I get any further info I will update, or if 
others face this they could also help to check this further. 

> Node Manager throws java.io.IOException: Verification of the hashReply failed
> -
>
> Key: YARN-403
> URL: https://issues.apache.org/jira/browse/YARN-403
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.2-alpha, 0.23.6
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
>
> {code:xml}
> 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle 
> failure 
> java.io.IOException: Verification of the hashReply failed
>   at 
> org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
>   at 
> org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713233#comment-13713233
 ] 

Arun C Murthy commented on YARN-897:


On second thoughts - we should get ParentQueue.completedContainer to resort, 
rather than relying on LeafQueue to make an explicit 'resort' call... 

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Djellel Eddine Difallah
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-897:
---

Target Version/s: 2.1.0-beta

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Djellel Eddine Difallah
>Priority: Blocker
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-897:
---

Priority: Blocker  (was: Major)

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Djellel Eddine Difallah
>Priority: Blocker
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-897:
---

Affects Version/s: 2.0.4-alpha

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Djellel Eddine Difallah
>Priority: Blocker
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env

2013-07-18 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-919:
-

Attachment: YARN.919.4.patch

[~mayank_bansal] Thanks for addressing the review comments. Looking at bin/yarn 
in a bit more detail, I noticed a couple of other gotchas that can affect users 
when setting the heapsize. With that in mind, I have attached a patch with 
somewhat more verbose documentation. 

Let me know what you think. 

> Setting default heap sizes in yarn env
> --
>
> Key: YARN-919
> URL: https://issues.apache.org/jira/browse/YARN-919
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, 
> YARN-919-trunk-2.patch, YARN-919-trunk-3.patch
>
>
> Right now there are no defaults in the yarn env scripts for the resource manager 
> and node manager, and if a user wants to override that, then the user has to go to 
> the documentation, find the variables, and change the script.
> There is no straightforward way to change it in the script. Just updating the 
> variables with defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713231#comment-13713231
 ] 

Arun C Murthy commented on YARN-897:


Patch looks good. Can you please provide a patch which includes the test as 
well (rather than 2 files)? Tx!

> CapacityScheduler wrongly sorted queues
> ---
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Djellel Eddine Difallah
> Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
> YARN-897-2.patch
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures the queue with least UsedCapacity to 
> receive resources next. On containerAssignment we correctly update the order, 
> but we miss to do so on container completions. This corrupts the TreeSet 
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Hudson (JIRA)
/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, 
> YARN-918-20130718.txt
>
>
> Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC pay load. This is an API change, so doing it 
> as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713206#comment-13713206
 ] 

Omkar Vinit Joshi commented on YARN-744:


[~bikassaha] yes... there is a similar but different bug though, so 
[~mayank_bansal] is fixing it. There we are computing the response and then 
updating RMNodeImpl asynchronously. If this approach is correct then we can do 
a similar thing after YARN-245 is in.

> Race condition in ApplicationMasterService.allocate .. It might process same 
> allocate request twice resulting in additional containers getting allocated.
> -
>
> Key: YARN-744
> URL: https://issues.apache.org/jira/browse/YARN-744
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>Priority: Minor
> Attachments: MAPREDUCE-3899-branch-0.23.patch, 
> YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse 
> object and then puts a new lastResponse object into the map. At this point a 
> new thread entering this function will get a new lastResponse object and will 
> be able to take its lock and enter the critical section. Presumably we want 
> to limit one response per app attempt. So the lock could be taken on the 
> ApplicationAttemptId key of the response map object.
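
A hypothetical sketch of the fix direction the description suggests (names and types are illustrative): synchronize on a stable per-attempt lock object keyed by the attempt id rather than on the lastResponse value that is replaced inside the critical section.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: lock per application attempt instead of locking the
// lastResponse object, which is swapped out while its lock is held and so
// lets a second thread enter the critical section concurrently.
final class AllocateLockSketch {
  private final ConcurrentMap<String, Object> attemptLocks = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Object> lastResponse = new ConcurrentHashMap<>();

  Object allocate(String appAttemptId, Object request) {
    Object lock = attemptLocks.computeIfAbsent(appAttemptId, k -> new Object());
    synchronized (lock) {                         // stable lock: one allocate at a time
      Object response = buildResponse(request);
      lastResponse.put(appAttemptId, response);   // replacing the value is now safe
      return response;
    }
  }

  private Object buildResponse(Object request) { return new Object(); }
}
{code}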

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713204#comment-13713204
 ] 

Bikas Saha commented on YARN-321:
-

Sounds right.

> Generic application history service
> ---
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) trusted servers 
> (where T is the number of application types and V is the number of versions 
> per application) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. Specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713203#comment-13713203
 ] 

Bikas Saha commented on YARN-744:
-

Does the same thing apply for ResourceTrackerService too?

> Race condition in ApplicationMasterService.allocate .. It might process same 
> allocate request twice resulting in additional containers getting allocated.
> -
>
> Key: YARN-744
> URL: https://issues.apache.org/jira/browse/YARN-744
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>Priority: Minor
> Attachments: MAPREDUCE-3899-branch-0.23.patch, 
> YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse 
> object and then puts a new lastResponse object into the map. At this point a 
> new thread entering this function will get a new lastResponse object and will 
> be able to take its lock and enter the critical section. Presumably we want 
> to limit one response per app attempt. So the lock could be taken on the 
> ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713181#comment-13713181
 ] 

Hudson commented on YARN-814:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4115 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4115/])
YARN-814. Improving diagnostics when containers fail during launch due to 
various reasons like invalid env etc. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504732)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


> Difficult to diagnose a failed container launch when error due to invalid 
> environment variable
> --
>
> Key: YARN-814
> URL: https://issues.apache.org/jira/browse/YARN-814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jian He
> Fix For: 2.1.1-beta
>
> Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
> YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, 
> YARN-814.patch
>
>
> The container's launch script sets up environment variables, symlinks etc. 
> If there is any failure when setting up the basic context ( before the actual 
> user's process is launched ), nothing is captured by the NM. This makes it 
> impossible to diagnose the reason for the failure. 
> To reproduce, set an env var where the value contains characters that throw 
> syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713180#comment-13713180
 ] 

Vinod Kumar Vavilapalli commented on YARN-919:
--

[~hitesh], can you please review/commit this? Tx.

> Setting default heap sizes in yarn env
> --
>
> Key: YARN-919
> URL: https://issues.apache.org/jira/browse/YARN-919
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, 
> YARN-919-trunk-3.patch
>
>
> Right now there are no defaults in the yarn env scripts for the resource manager 
> and node manager, and if a user wants to override that, then the user has to go to 
> the documentation, find the variables, and change the script.
> There is no straightforward way to change it in the script. Just updating the 
> variables with defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713158#comment-13713158
 ] 

Vinod Kumar Vavilapalli commented on YARN-814:
--

+1. This looks good. Hopefully the tests run fine on Windows too.

Checking this in.

> Difficult to diagnose a failed container launch when error due to invalid 
> environment variable
> --
>
> Key: YARN-814
> URL: https://issues.apache.org/jira/browse/YARN-814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
> YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, 
> YARN-814.patch
>
>
> The container's launch script sets up environment variables, symlinks etc. 
> If there is any failure when setting up the basic context ( before the actual 
> user's process is launched ), nothing is captured by the NM. This makes it 
> impossible to diagnose the reason for the failure. 
> To reproduce, set an env var where the value contains characters that throw 
> syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception

2013-07-18 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-875:
---

Attachment: YARN-875.3.patch

> Application can hang if AMRMClientAsync callback thread has exception
> -
>
> Key: YARN-875
> URL: https://issues.apache.org/jira/browse/YARN-875
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, 
> YARN-875.3.patch
>
>
> Currently that thread will die and then never callback. App can hang. 
> Possible solution could be to catch Throwable in the callback and then call 
> client.onError().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception

2013-07-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713159#comment-13713159
 ] 

Xuan Gong commented on YARN-875:


Currently, if the callback handler has an exception, this thread will stop, but 
the ApplicationMaster will keep running. We try to stop the ApplicationMaster if 
the callback handler has an exception; that is why we add a try..catch block, and 
in the catch block we call handler.onError().

Calling stop() inside onError() is not required, but it is the recommended 
action. If it calls stop() inside onError(), that is fine, too. Eventually, 
AMRMClientAsync will call unregisterApplicationMaster and set the keepRunning 
flag to false, which will stop the heartbeat thread. But it is good to let the 
heartbeat thread stop earlier.
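
A minimal sketch of that behavior (the handler interface and loop here are illustrative, not the AMRMClientAsync source): every callback invocation is wrapped so an exception reaches onError() instead of silently killing the callback thread and leaving the application hanging.
{code}
// Illustrative callback loop: without the catch, an exception thrown by the
// handler kills this thread and no further callbacks ever arrive; with it,
// onError() gives the application a chance to stop itself cleanly.
interface CallbackHandlerSketch {
  void onContainersAllocated();
  void onError(Throwable t);
}

final class CallbackThreadSketch implements Runnable {
  private final CallbackHandlerSketch handler;
  private volatile boolean keepRunning = true;

  CallbackThreadSketch(CallbackHandlerSketch handler) { this.handler = handler; }

  @Override
  public void run() {
    while (keepRunning) {
      try {
        // (in the real client, a heartbeat response triggers each callback)
        handler.onContainersAllocated();   // user code: may throw anything
      } catch (Throwable t) {
        handler.onError(t);                // surface the failure to the application
        keepRunning = false;               // stop the loop instead of hanging
      }
    }
  }
}
{code}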

> Application can hang if AMRMClientAsync callback thread has exception
> -
>
> Key: YARN-875
> URL: https://issues.apache.org/jira/browse/YARN-875
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, 
> YARN-875.3.patch
>
>
> Currently that thread will die and then never callback. App can hang. 
> Possible solution could be to catch Throwable in the callback and then call 
> client.onError().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713154#comment-13713154
 ] 

Hadoop QA commented on YARN-245:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593097/YARN-245-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1525//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1525//console

This message is automatically generated.

> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, 
> YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713142#comment-13713142
 ] 

Mayank Bansal commented on YARN-245:


Thanks [~ojoshi] for the review.

I had an offline discussion with Omkar.

I removed 
+ private int lastHeartBeatId;

I changed the duplicate response id behavior as well.

For the tests, we agreed to test heartbeat duplication.

Updating the patch.

Thanks,
Mayank

> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, 
> YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-245:
---

Attachment: YARN-245-trunk-3.patch

Attaching patch

Thanks,
Mayank

> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, 
> YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713099#comment-13713099
 ] 

Hadoop QA commented on YARN-918:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593085/YARN-918-20130718.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1524//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1524//console

This message is automatically generated.

> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, 
> YARN-918-20130718.txt
>
>
> Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC payload. This is an API change, so it is being 
> done as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713087#comment-13713087
 ] 

Hadoop QA commented on YARN-918:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593085/YARN-918-20130718.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1523//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1523//console

This message is automatically generated.

> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, 
> YARN-918-20130718.txt
>
>
> Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC payload. This is an API change, so it is being 
> done as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713068#comment-13713068
 ] 

Omkar Vinit Joshi commented on YARN-403:


[~devaraj.k] Can you give some more information? RM / NM / application logs 
would help a lot. Are you able to reproduce this? If so, what are the steps? 
What I can see is that the hash provided is not what the shuffle service was 
expecting. Did it fail for all of them?
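
For context on what this failure means, the sketch below shows the general shape of an HMAC-style reply verification like the one that fails in SecureShuffleUtils.verifyReply above: the fetcher sends a hash computed from the request with the job's shuffle secret, and the ShuffleHandler recomputes it and compares. This is a generic, self-contained illustration with made-up inputs and method names; it is not Hadoop's actual SecureShuffleUtils implementation.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ShuffleHashCheck {

  /** Computes an HMAC-SHA1 over the request, keyed with the job's shuffle secret. */
  static byte[] hash(String msg, byte[] secret) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret, "HmacSHA1"));
    return mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
  }

  /** True iff the hash supplied by the fetcher matches the recomputed one. */
  static boolean verifyReply(byte[] suppliedHash, String msg, byte[] secret)
      throws Exception {
    // Constant-time comparison of the expected and supplied hashes.
    return MessageDigest.isEqual(hash(msg, secret), suppliedHash);
  }

  public static void main(String[] args) throws Exception {
    byte[] jobSecret = "job-shuffle-secret".getBytes(StandardCharsets.UTF_8);
    String request = "/mapOutput?job=job_1&map=attempt_1&reduce=0";

    // Fetcher and shuffle service share the secret: verification passes.
    System.out.println(verifyReply(hash(request, jobSecret), request, jobSecret));

    // A stale or wrong secret (or a tampered request) fails verification,
    // which is when "Verification of the hashReply failed" gets logged.
    byte[] wrongSecret = "stale-or-wrong-secret".getBytes(StandardCharsets.UTF_8);
    System.out.println(verifyReply(hash(request, wrongSecret), request, jobSecret));
  }
}
{code}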

> Node Manager throws java.io.IOException: Verification of the hashReply failed
> -
>
> Key: YARN-403
> URL: https://issues.apache.org/jira/browse/YARN-403
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.2-alpha, 0.23.6
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
>
> {code:xml}
> 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle 
> failure 
> java.io.IOException: Verification of the hashReply failed
>   at 
> org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
>   at 
> org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713059#comment-13713059
 ] 

Xuan Gong commented on YARN-873:


bq. When you say different errors, are you implying different error messages or 
different exit codes? For anyone building a script-based tool on this api, the 
latter would be preferred.

Now I get it. Yes, you are right; we'd better set different exit codes.
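
A minimal sketch of what that could look like in the CLI: catch the specific exception and map it to a distinct non-zero exit code, so a script can branch on $? instead of parsing messages. The exception class, exit-code values, and report call below are stand-ins, not the actual YARNClient API or the final patch.

{code:java}
// Hypothetical CLI sketch; not the actual ApplicationCLI / YARNClient code.
public class GetAppReportCli {

  // Hypothetical exit codes; the real values would be decided in the patch.
  static final int EXIT_OK = 0;
  static final int EXIT_OTHER_ERROR = 1;
  static final int EXIT_APP_NOT_FOUND = 2;

  /** Stand-in for the "application not found" error reported by the RM. */
  static class ApplicationNotFoundException extends Exception {
    ApplicationNotFoundException(String msg) { super(msg); }
  }

  /** Stand-in for YARNClient.getApplicationReport(appId); always fails here. */
  static String getApplicationReport(String appId) throws ApplicationNotFoundException {
    throw new ApplicationNotFoundException(
        "Application with id '" + appId + "' doesn't exist in RM.");
  }

  static int run(String appId) {
    try {
      System.out.println(getApplicationReport(appId));
      return EXIT_OK;
    } catch (ApplicationNotFoundException e) {
      // Print a readable message AND exit with a code specific to this error,
      // so script-based tools can check $? without parsing the message.
      System.err.println(e.getMessage());
      return EXIT_APP_NOT_FOUND;
    } catch (Exception e) {
      System.err.println("Failed to get application report: " + e.getMessage());
      return EXIT_OTHER_ERROR;
    }
  }

  public static void main(String[] args) {
    String appId = args.length > 0 ? args[0] : "application_1373184544832_9999";
    System.exit(run(appId));
  }
}
{code}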

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned YARN-403:
--

Assignee: Omkar Vinit Joshi

> Node Manager throws java.io.IOException: Verification of the hashReply failed
> -
>
> Key: YARN-403
> URL: https://issues.apache.org/jira/browse/YARN-403
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.2-alpha, 0.23.6
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
>
> {code:xml}
> 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle 
> failure 
> java.io.IOException: Verification of the hashReply failed
>   at 
> org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436)
>   at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
>   at 
> org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-918:
-

Attachment: YARN-918-20130718.txt

Found one potential test issue that could be causing this. Fixing it.

> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt, 
> YARN-918-20130718.txt
>
>
> Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC payload. This is an API change, so it is being 
> done as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713006#comment-13713006
 ] 

Omkar Vinit Joshi commented on YARN-880:


Can you please provide the information below to help debug the issue?
* RM / NM / AM logs (please enable debug logging).
* The yarn-site.xml and mapred-site.xml files used.
* Which scheduler are you using?

> Configuring map/reduce memory equal to nodemanager's memory, hangs the job 
> execution
> 
>
> Key: YARN-880
> URL: https://issues.apache.org/jira/browse/YARN-880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Nishan Shetty
>Assignee: Omkar Vinit Joshi
>Priority: Critical
>
> Scenario:
> =
> The cluster is installed with two NodeManagers.
> Configuration:
> NM memory (yarn.nodemanager.resource.memory-mb): 8 GB
> map and reduce memory: 8 GB
> AppMaster memory: 2 GB
> If a map task is reserved on the same NodeManager where the AppMaster of the 
> same job is running, then the job execution hangs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712994#comment-13712994
 ] 

Hitesh Shah commented on YARN-873:
--

[~xgong] When you say different errors, are you implying different error 
messages or different exit codes? For anyone building a script-based tool on 
this api, the latter would be preferred.

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712985#comment-13712985
 ] 

Xuan Gong commented on YARN-873:


bq.Requiring to parse a string message to determine whether an application 
exists or not is more work as compared to checking $? which can be used to 
indicate various errors such as connection issue/invalid application id/app 
does not exist in RM.

Yes, but here we indicate different errors based on the different exceptions 
that we catch, such as ApplicationNotFoundException.

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712969#comment-13712969
 ] 

Vinod Kumar Vavilapalli commented on YARN-918:
--

Checking... This passes on my local machine. Jenkins is complaining about port 
issues. I will retrigger it and, at the same time, run all the tests locally.

> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt
>
>
> Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC payload. This is an API change, so it is being 
> done as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712968#comment-13712968
 ] 

Xuan Gong commented on YARN-873:


bq.Having a return statement in a catch/finally block is not recommended 
normally. We could print the message and re-throw the exception or simply not 
catch the exception. Also, this way the cmd line would exit with non-zero exit 
code.

I still prefer "print out the message, then exit" to "print out the message, 
then re-throw the exception (or not catch it at all)" in this scenario. If we 
re-throw the exception or do not catch it, it is no different from throwing a 
YarnException from YARNClient.getApplicationReport(unknownAppId).
If users get an exception, that means they need to check and debug whether 
something is wrong. In this case, if they pass an unknown application id, they 
will get the message, and that is the expected behavior. 


> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712932#comment-13712932
 ] 

Hadoop QA commented on YARN-903:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593065/YARN-903-20130718.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1522//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1522//console

This message is automatically generated.

> DistributedShell throwing Errors in logs after successfull completion
> -
>
> Key: YARN-903
> URL: https://issues.apache.org/jira/browse/YARN-903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.0.4-alpha
> Environment: Ubuntu 11.10
>Reporter: Abhishek Kapoor
>Assignee: Omkar Vinit Joshi
> Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, 
> YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log
>
>
> I have tried running DistributedShell and also used its ApplicationMaster for 
> my test.
> The application runs successfully, but it logs some errors which would be 
> useful to fix.
> Below are the logs from the NodeManager and the ApplicationMaster node.
> Log Snippet for NodeManager
> =
> 2013-07-07 13:39:18,787 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting 
> to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
> 2013-07-07 13:39:19,050 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Rolling master-key for container-tokens, got key with id -325382586
> 2013-07-07 13:39:19,052 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: 
> Rolling master-key for nm-tokens, got key with id :1005046570
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
> with ResourceManager as sunny-Inspiron:9993 with total resource of 
> 
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying 
> ContainerManager to unblock new container-requests
> 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE)
> 2013-07-07 13:39:35,492 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1373184544832_0001_01_01 by user sunny
> 2013-07-07 13:39:35,507 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1373184544832_0001
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny  
> IP=127.0.0.1OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1373184544832_0001
> CONTAINERID=container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1373184544832_0001 transitioned from NEW to INITING
> 2013-07-07 13:39:35,512 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1373184544832_0001_01_01 to application 
> application_1373184544832_0001
> 2013-07-07 13:39:35,518 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_13

[jira] [Assigned] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned YARN-880:
--

Assignee: Omkar Vinit Joshi

> Configuring map/reduce memory equal to nodemanager's memory, hangs the job 
> execution
> 
>
> Key: YARN-880
> URL: https://issues.apache.org/jira/browse/YARN-880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Nishan Shetty
>Assignee: Omkar Vinit Joshi
>Priority: Critical
>
> Scenario:
> =
> The cluster is installed with two NodeManagers.
> Configuration:
> NM memory (yarn.nodemanager.resource.memory-mb): 8 GB
> map and reduce memory: 8 GB
> AppMaster memory: 2 GB
> If a map task is reserved on the same NodeManager where the AppMaster of the 
> same job is running, then the job execution hangs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712931#comment-13712931
 ] 

Karthik Kambatla commented on YARN-353:
---

For the findbugs warning around NUM_RETRIES, we should probably make it 
non-static numRetries.

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, 
> YARN-353.8.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712902#comment-13712902
 ] 

Hadoop QA commented on YARN-353:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console

This message is automatically generated.

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, 
> YARN-353.8.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712889#comment-13712889
 ] 

Omkar Vinit Joshi commented on YARN-903:


Attaching a simple test to verify this.

> DistributedShell throwing Errors in logs after successfull completion
> -
>
> Key: YARN-903
> URL: https://issues.apache.org/jira/browse/YARN-903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.0.4-alpha
> Environment: Ubuntu 11.10
>Reporter: Abhishek Kapoor
>Assignee: Omkar Vinit Joshi
> Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, 
> YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log
>
>
> I have tried running DistributedShell and also used its ApplicationMaster for 
> my test.
> The application runs successfully, but it logs some errors which would be 
> useful to fix.
> Below are the logs from the NodeManager and the ApplicationMaster node.
> Log Snippet for NodeManager
> =
> 2013-07-07 13:39:18,787 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting 
> to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
> 2013-07-07 13:39:19,050 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Rolling master-key for container-tokens, got key with id -325382586
> 2013-07-07 13:39:19,052 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: 
> Rolling master-key for nm-tokens, got key with id :1005046570
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
> with ResourceManager as sunny-Inspiron:9993 with total resource of 
> 
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying 
> ContainerManager to unblock new container-requests
> 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE)
> 2013-07-07 13:39:35,492 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1373184544832_0001_01_01 by user sunny
> 2013-07-07 13:39:35,507 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1373184544832_0001
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny  
> IP=127.0.0.1OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1373184544832_0001
> CONTAINERID=container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1373184544832_0001 transitioned from NEW to INITING
> 2013-07-07 13:39:35,512 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1373184544832_0001_01_01 to application 
> application_1373184544832_0001
> 2013-07-07 13:39:35,518 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1373184544832_0001 transitioned from INITING to 
> RUNNING
> 2013-07-07 13:39:35,528 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1373184544832_0001_01_01 transitioned from NEW to 
> LOCALIZING
> 2013-07-07 13:39:35,540 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource hdfs://localhost:9000/application/test.jar transitioned from INIT 
> to DOWNLOADING
> 2013-07-07 13:39:35,540 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,675 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens.
>  Credentials list: 
> 2013-07-07 13:39:35,694 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user sunny
> 2013-07-07 13:39:35,803 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
> from 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens
>  to 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/u

[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successfull completion

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-903:
---

Attachment: YARN-903-20130718.1.patch

> DistributedShell throwing Errors in logs after successfull completion
> -
>
> Key: YARN-903
> URL: https://issues.apache.org/jira/browse/YARN-903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.0.4-alpha
> Environment: Ubuntu 11.10
>Reporter: Abhishek Kapoor
>Assignee: Omkar Vinit Joshi
> Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, 
> YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log
>
>
> I have tried running DistributedShell and also used its ApplicationMaster for 
> my test.
> The application runs successfully, but it logs some errors which would be 
> useful to fix.
> Below are the logs from the NodeManager and the ApplicationMaster node.
> Log Snippet for NodeManager
> =
> 2013-07-07 13:39:18,787 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting 
> to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
> 2013-07-07 13:39:19,050 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
>  Rolling master-key for container-tokens, got key with id -325382586
> 2013-07-07 13:39:19,052 INFO 
> org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: 
> Rolling master-key for nm-tokens, got key with id :1005046570
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
> with ResourceManager as sunny-Inspiron:9993 with total resource of 
> 
> 2013-07-07 13:39:19,053 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying 
> ContainerManager to unblock new container-requests
> 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE)
> 2013-07-07 13:39:35,492 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1373184544832_0001_01_01 by user sunny
> 2013-07-07 13:39:35,507 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1373184544832_0001
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny  
> IP=127.0.0.1OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1373184544832_0001
> CONTAINERID=container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,511 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1373184544832_0001 transitioned from NEW to INITING
> 2013-07-07 13:39:35,512 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1373184544832_0001_01_01 to application 
> application_1373184544832_0001
> 2013-07-07 13:39:35,518 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1373184544832_0001 transitioned from INITING to 
> RUNNING
> 2013-07-07 13:39:35,528 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1373184544832_0001_01_01 transitioned from NEW to 
> LOCALIZING
> 2013-07-07 13:39:35,540 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource hdfs://localhost:9000/application/test.jar transitioned from INIT 
> to DOWNLOADING
> 2013-07-07 13:39:35,540 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,675 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens.
>  Credentials list: 
> 2013-07-07 13:39:35,694 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user sunny
> 2013-07-07 13:39:35,803 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
> from 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens
>  to 
> /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_13

[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712882#comment-13712882
 ] 

Hadoop QA commented on YARN-918:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12592818/YARN-918-20130717.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1521//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1521//console

This message is automatically generated.

> ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload 
> after YARN-701
> -
>
> Key: YARN-918
> URL: https://issues.apache.org/jira/browse/YARN-918
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt
>
>
> Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need 
> ApplicationAttemptId in the RPC payload. This is an API change, so it is being 
> done as a blocker for 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712839#comment-13712839
 ] 

Hadoop QA commented on YARN-919:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593050/YARN-919-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1519//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1519//console

This message is automatically generated.

> Setting default heap sizes in yarn env
> --
>
> Key: YARN-919
> URL: https://issues.apache.org/jira/browse/YARN-919
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, 
> YARN-919-trunk-3.patch
>
>
> Right now there are no defaults in the yarn-env script for the ResourceManager 
> and NodeManager heap sizes; if users want to override them, they have to go to 
> the documentation, find the variables, and change the script.
> There is no straightforward way to change them in the script. This patch just 
> updates the variables with defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712796#comment-13712796
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Running as service: By default, ApplicationHistoryService will be embedded 
inside ResourceManager but will be independent enough to run as a separate 
service for scaling purposes.

IIUC, to be independent, ApplicationHistoryService should have its own event 
dispatcher, shouldn't it?
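
As a rough illustration of what "its own event dispatcher" could mean, the sketch below gives the history service a private queue and a dedicated dispatch thread, so persisting history events neither blocks nor is blocked by the RM's central dispatcher. All types and names here are hypothetical stand-ins, not YARN's AsyncDispatcher or the proposed ApplicationHistoryService.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HistoryEventDispatcherSketch {

  /** Hypothetical history event. */
  public static final class HistoryEvent {
    final String applicationId;
    final String type;
    HistoryEvent(String applicationId, String type) {
      this.applicationId = applicationId;
      this.type = type;
    }
  }

  /** Hypothetical handler, e.g. one that writes to the history store. */
  public interface HistoryEventHandler { void handle(HistoryEvent event); }

  private final BlockingQueue<HistoryEvent> queue = new LinkedBlockingQueue<HistoryEvent>();
  private final Thread dispatchThread;
  private volatile boolean stopped = false;

  public HistoryEventDispatcherSketch(final HistoryEventHandler handler) {
    this.dispatchThread = new Thread(new Runnable() {
      @Override
      public void run() {
        while (!stopped) {
          try {
            // Events are handled on this dedicated thread, independently of
            // the RM's central dispatcher thread.
            handler.handle(queue.take());
          } catch (InterruptedException e) {
            return;
          }
        }
      }
    }, "HistoryEventDispatcher");
  }

  public void start() { dispatchThread.start(); }

  public void stop() throws InterruptedException {
    stopped = true;
    dispatchThread.interrupt();
    dispatchThread.join();
  }

  /** Non-blocking for callers: the event is queued and handled asynchronously. */
  public void dispatch(HistoryEvent event) { queue.add(event); }

  public static void main(String[] args) throws InterruptedException {
    HistoryEventDispatcherSketch dispatcher = new HistoryEventDispatcherSketch(
        new HistoryEventHandler() {
          @Override
          public void handle(HistoryEvent event) {
            System.out.println("storing " + event.type + " for " + event.applicationId);
          }
        });
    dispatcher.start();
    dispatcher.dispatch(new HistoryEvent("application_1373184544832_0001", "APP_FINISHED"));
    Thread.sleep(100);
    dispatcher.stop();
  }
}
{code}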

> Generic application history service
> ---
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) trusted servers 
> (where T is the number of application types and V is the number of application 
> versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as JSON (or binary Avro). I propose that we create only one 
> trusted application history server, which can also have a generic UI (display 
> the JSON as a tree of strings). A specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the JSON for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env

2013-07-18 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-919:
---

Attachment: YARN-919-trunk-3.patch

Thanks [~hitesh] and [~vinodkv] for the review.

Updating the patch

Thanks,
Mayank

> Setting default heap sizes in yarn env
> --
>
> Key: YARN-919
> URL: https://issues.apache.org/jira/browse/YARN-919
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, 
> YARN-919-trunk-3.patch
>
>
> Right now there are no defaults in the yarn-env script for the ResourceManager 
> and NodeManager heap sizes; if users want to override them, they have to go to 
> the documentation, find the variables, and change the script.
> There is no straightforward way to change them in the script. This patch just 
> updates the variables with defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712793#comment-13712793
 ] 

Karthik Kambatla commented on YARN-353:
---

Looks good. +1 pending Jenkins.

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, 
> YARN-353.8.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-353:
-

Attachment: YARN-353.8.patch

New patch makes NUM_RETRIES configurable.
Changed removeApplicationState to use the multi API to remove both the app state and 
the attempts' state at the same time. Also fixed the warnings.
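
For readers not familiar with it, the ZooKeeper multi API batches several 
operations into one atomic call. A minimal sketch of the idea (illustrative only, 
not the actual patch; it assumes a connected org.apache.zookeeper.ZooKeeper handle 
named zk and an app znode path appPath whose children are the attempt znodes):

{code}
// Delete every attempt znode and then the app znode in a single atomic
// multi() call, so a crash cannot leave a partially removed application.
List<Op> ops = new ArrayList<Op>();
for (String attempt : zk.getChildren(appPath, false)) {
  ops.add(Op.delete(appPath + "/" + attempt, -1));  // -1 matches any version
}
ops.add(Op.delete(appPath, -1));
zk.multi(ops);  // either all deletes succeed or none are applied
{code}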

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, 
> YARN-353.8.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-938) Hadoop 2 benchmarking

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-938:
-

Summary: Hadoop 2 benchmarking   (was: Hadoop 2 Bench marking )

> Hadoop 2 benchmarking 
> --
>
> Key: YARN-938
> URL: https://issues.apache.org/jira/browse/YARN-938
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>
> I am running the benchmarks on Hadoop 2 and will update the results soon.
> Thanks,
> Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-938) Hadoop 2 benchmarking

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712712#comment-13712712
 ] 

Vinod Kumar Vavilapalli commented on YARN-938:
--

Thanks for doing this Mayank!

> Hadoop 2 benchmarking 
> --
>
> Key: YARN-938
> URL: https://issues.apache.org/jira/browse/YARN-938
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>
> I am running the benchmarks on Hadoop 2 and will update the results soon.
> Thanks,
> Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712698#comment-13712698
 ] 

Hudson commented on YARN-701:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4110 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4110/])
YARN-701. Use application tokens irrespective of secure or non-secure mode. 
Contributed by Vinod K V. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504604)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAMAuthorization.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


> ApplicationTokens should be used irrespective of kerberos
> -
>
> Key: YARN-701
> URL: https://issues.apache.org/jira/browse/YARN-701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
> YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, 
> yarn-ojoshi-resourcemanager-HW10351.local.log
>
>
>  - Single code path for secure and non-secure cases is useful for testing, 
> coverage.
>  - Having this in non-secure mode will help us avoid accidental bugs in AMs 
> DDos'ing and bringing down RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-864) YARN NM leaking containers with CGroups

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-864.
--

Resolution: Duplicate

Given Jian's update, I'm closing this as a duplicate of YARN-688.

> YARN NM leaking containers with CGroups
> ---
>
> Key: YARN-864
> URL: https://issues.apache.org/jira/browse/YARN-864
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.5-alpha
> Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and 
> YARN-600.
>Reporter: Chris Riccomini
>Assignee: Jian He
> Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch
>
>
> Hey Guys,
> I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm 
> seeing containers getting leaked by the NMs. I'm not quite sure what's going 
> on -- has anyone seen this before? I'm concerned that maybe it's a 
> misunderstanding on my part about how YARN's lifecycle works.
> When I look in my AM logs for my app (not an MR app master), I see:
> 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. 
> This means that container container_1371141151815_0008_03_02 was killed 
> by YARN, either due to being released by the application master or being 
> 'lost' due to node failures etc.
> 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container 
> container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a 
> new container for the task.
> The AM has been running steadily the whole time. Here's what the NM logs say:
> {noformat}
> 05:34:59,783  WARN AsyncDispatcher:109 - Interrupted Exception while stopping
> java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1143)
> at java.lang.Thread.join(Thread.java:1196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107)
> at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
> at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
> at java.lang.Thread.run(Thread.java:619)
> 05:35:00,314  WARN ContainersMonitorImpl:463 - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.
> 05:35:00,434  WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup 
> at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598
> 05:35:00,434  WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup 
> at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02
> 05:35:00,434  WARN ContainerLaunch:247 - Failed to launch container.
> java.io.IOException: java.lang.InterruptedException
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:205)
> at org.apache.hadoop.util.Shell.run(Shell.java:129)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> 05:35:00,434  WARN ContainerLaunch:247 - Failed to launch container.
> java.io.IOException: java.lang.InterruptedException
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:205)
> at org.apache.hadoop.util.Shell.run(Shell.java:129)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
> at 
> org.apache.hadoop.yarn.server.nodema

[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712688#comment-13712688
 ] 

Vinod Kumar Vavilapalli commented on YARN-658:
--

David, can you give us more information? RM, AM and NM logs will help a lot.

> Command to kill a YARN application does not work with newer Ubuntu versions
> ---
>
> Key: YARN-658
> URL: https://issues.apache.org/jira/browse/YARN-658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 2.0.4-alpha
>Reporter: David Yan
>
> After issuing a KillApplicationRequest, the application keeps running on the 
> system even though the state is changed to KILLED.  It happens on both Ubuntu 
> 12.10 and 13.04, but works fine on Ubuntu 12.04.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-455) NM warns about stopping an unknown container under normal circumstances

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi resolved YARN-455.


Resolution: Duplicate

> NM warns about stopping an unknown container under normal circumstances
> ---
>
> Key: YARN-455
> URL: https://issues.apache.org/jira/browse/YARN-455
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
>
> During normal operations the NM can log warnings to its audit log about 
> unknown containers.  For example:
> {noformat}
> 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser  
> IP=xx   OPERATION=Stop Container RequestTARGET=ContainerManagerImpl   
>   RESULT=FAILURE  DESCRIPTION=Trying to stop unknown container!   
> APPID=application_1359150825713_3947178 
> CONTAINERID=container_1359150825713_3947178_01_001266
> {noformat}
> Looking closer at the audit log and the NM log shows that the container 
> completed successfully and was forgotten by the NM before the stop request 
> arrived.  The NM should avoid warning in these situations since this is a 
> "normal" race condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-455) NM warns about stopping an unknown container under normal circumstances

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712677#comment-13712677
 ] 

Omkar Vinit Joshi commented on YARN-455:


Closing this as a duplicate. I am fixing it in YARN-903.

> NM warns about stopping an unknown container under normal circumstances
> ---
>
> Key: YARN-455
> URL: https://issues.apache.org/jira/browse/YARN-455
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
>
> During normal operations the NM can log warnings to its audit log about 
> unknown containers.  For example:
> {noformat}
> 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser  
> IP=xx   OPERATION=Stop Container RequestTARGET=ContainerManagerImpl   
>   RESULT=FAILURE  DESCRIPTION=Trying to stop unknown container!   
> APPID=application_1359150825713_3947178 
> CONTAINERID=container_1359150825713_3947178_01_001266
> {noformat}
> Looking closer at the audit log and the NM log shows that the container 
> completed successfully and was forgotten by the NM before the stop request 
> arrived.  The NM should avoid warning in these situations since this is a 
> "normal" race condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-208) Yarn overrides diagnostic message set by AM

2013-07-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-208.
--

Resolution: Duplicate

Thomas, closing this as a duplicate; please reopen if you see it again. Tx.

> Yarn overrides diagnostic message set by AM
> ---
>
> Key: YARN-208
> URL: https://issues.apache.org/jira/browse/YARN-208
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Thomas Weise
>
> Diagnostics set in the AM just before exit are overridden by YARN: in the case of 
> state FAILED they are replaced with a different message, and for SUCCESS the field 
> will be blank. The application's info should be retained. Per 
> FinishApplicationMasterRequest this can be managed by the ApplicationMaster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712665#comment-13712665
 ] 

Arun C Murthy commented on YARN-701:


I'm committing this to unblock the rest.

> ApplicationTokens should be used irrespective of kerberos
> -
>
> Key: YARN-701
> URL: https://issues.apache.org/jira/browse/YARN-701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
> YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, 
> yarn-ojoshi-resourcemanager-HW10351.local.log
>
>
>  - Single code path for secure and non-secure cases is useful for testing, 
> coverage.
>  - Having this in non-secure mode will help us avoid accidental bugs in AMs 
> DDos'ing and bringing down RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712661#comment-13712661
 ] 

Omkar Vinit Joshi commented on YARN-713:


[~maysamyabandeh] are you working on a patch? If not, I will take over... this is 
critical and needs to be fixed.

> ResourceManager can exit unexpectedly if DNS is unavailable
> ---
>
> Key: YARN-713
> URL: https://issues.apache.org/jira/browse/YARN-713
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Maysam Yabandeh
>Priority: Critical
> Fix For: 2.1.0-beta
>
> Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, 
> YARN-713.patch
>
>
> As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could 
> lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and 
> that ultimately would cause the RM to exit.  The RM should not exit during 
> DNS hiccups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712607#comment-13712607
 ] 

Hadoop QA commented on YARN-814:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12593018/YARN-814.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1518//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1518//console

This message is automatically generated.

> Difficult to diagnose a failed container launch when error due to invalid 
> environment variable
> --
>
> Key: YARN-814
> URL: https://issues.apache.org/jira/browse/YARN-814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
> YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, 
> YARN-814.patch
>
>
> The container's launch script sets up environment variables, symlinks etc. 
> If there is any failure when setting up the basic context ( before the actual 
> user's process is launched ), nothing is captured by the NM. This makes it 
> impossible to diagnose the reason for the failure. 
> To reproduce, set an env var where the value contains characters that throw 
> syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses

2013-07-18 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-245:
---

Summary: Node Manager can not handle duplicate responses  (was: Node 
Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED)

> Node Manager can not handle duplicate responses
> ---
>
> Key: YARN-245
> URL: https://issues.apache.org/jira/browse/YARN-245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Mayank Bansal
> Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> FINISH_APPLICATION at FINISHED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-814:
-

Attachment: YARN-814.7.patch

New patch fixed the warnings and added a test case for stdout/stderr diagnostics.

> Difficult to diagnose a failed container launch when error due to invalid 
> environment variable
> --
>
> Key: YARN-814
> URL: https://issues.apache.org/jira/browse/YARN-814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
> YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, 
> YARN-814.patch
>
>
> The container's launch script sets up environment variables, symlinks etc. 
> If there is any failure when setting up the basic context ( before the actual 
> user's process is launched ), nothing is captured by the NM. This makes it 
> impossible to diagnose the reason for the failure. 
> To reproduce, set an env var where the value contains characters that throw 
> syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-938) Hadoop 2 Bench marking

2013-07-18 Thread Mayank Bansal (JIRA)
Mayank Bansal created YARN-938:
--

 Summary: Hadoop 2 Bench marking 
 Key: YARN-938
 URL: https://issues.apache.org/jira/browse/YARN-938
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Mayank Bansal
Assignee: Mayank Bansal


I am running the benchmarks on Hadoop 2 and will update the results soon.

Thanks,
Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712546#comment-13712546
 ] 

Karthik Kambatla commented on YARN-321:
---

bq. Folks, it would be great if we have a consolidated document that describes 
the design and some details.
+1

> Generic application history service
> ---
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) (where T is the 
> number of application types and V is the number of application versions) trusted 
> servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. A specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712533#comment-13712533
 ] 

Karthik Kambatla commented on YARN-353:
---

bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.
bq. fixed

bq. Why should NUM_RETRIES not be there?
I was just noting that the latest patch still has the non-configurable NUM_RETRIES; 
it should exist but be configurable. If it is configurable, we should probably 
change the name of the variable.
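
For illustration, making it configurable could look roughly like this (a sketch 
only; the property name and default below are made up and are not the names used 
in the patch):

{code}
// Sketch: read the ZK retry count from the Configuration instead of using a
// hard-coded constant. Property name and default value are illustrative.
static final String ZK_NUM_RETRIES = "yarn.resourcemanager.zk-state-store.num-retries";
static final int DEFAULT_ZK_NUM_RETRIES = 3;

private int numRetries;

void initRetries(Configuration conf) {
  numRetries = conf.getInt(ZK_NUM_RETRIES, DEFAULT_ZK_NUM_RETRIES);
}
{code}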

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712528#comment-13712528
 ] 

Bikas Saha commented on YARN-873:
-

Having a return statement in a catch/finally block is normally not recommended. 
We could print the message and re-throw the exception, or simply not catch the 
exception. Also, this way the cmd line would exit with a non-zero exit code.
{code}
+} catch (ApplicationNotFoundException ex) {
+  sysout.println("Application with id '"
+  + applicationId + "' doesn't exist in RM.");
+  return;
+}
{code}
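
One possible shape of that suggestion, sketched against the same snippet (not the 
committed change): keep the user-facing message but re-throw, so the command 
ultimately exits with a non-zero code.

{code}
+} catch (ApplicationNotFoundException ex) {
+  // Print a readable message for the user, then re-throw so the caller
+  // (and eventually the shell) sees a non-zero exit code.
+  sysout.println("Application with id '"
+      + applicationId + "' doesn't exist in RM.");
+  throw ex;
+}
{code}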

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712523#comment-13712523
 ] 

Bikas Saha commented on YARN-353:
-

bq. ZKRMStateStore#getNewZooKeeper need not be synchronized
bq. fixed
The code is derived from ActiveStandyLeaderElector code in hadoop common. It 
was synchronized there for a race condition that showed up in testing. I would 
like to keep the synchronization as it was in the original patch.

bq. the patch still seems to have NUM_RETRIES
Why should NUM_RETRIES not be there?

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712514#comment-13712514
 ] 

Hadoop QA commented on YARN-873:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12592998/YARN-873.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1517//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1517//console

This message is automatically generated.

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712507#comment-13712507
 ] 

Hitesh Shah commented on YARN-873:
--

[~xgong] Requiring the user to parse a string message to determine whether an 
application exists is more work compared to checking $?, which can be 
used to indicate various errors such as a connection issue, an invalid application 
id, or the app not existing in the RM.
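
As a rough illustration of that point (a hypothetical sketch, not the actual CLI 
code; the method and field names below are made up), distinct failure modes can be 
mapped to distinct return values so that scripts only need to check $?:

{code}
// Hypothetical sketch: map each failure mode to its own exit code.
int printApplicationReport(ApplicationId appId) throws IOException {
  try {
    ApplicationReport report = yarnClient.getApplicationReport(appId);
    sysout.println(report);
    return 0;                        // success
  } catch (ApplicationNotFoundException e) {
    sysout.println(e.getMessage());
    return -1;                       // app does not exist in RM
  } catch (YarnException e) {
    syserr.println(e.getMessage());
    return -2;                       // connection or other RPC failure
  }
}
{code}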

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712482#comment-13712482
 ] 

Karthik Kambatla commented on YARN-353:
---

[~jianhe], the patch still seems to have NUM_RETRIES. Also, can you take a look 
at the test failure and findbugs warnings. Thanks.

> Add Zookeeper-based store implementation for RMStateStore
> -
>
> Key: YARN-353
> URL: https://issues.apache.org/jira/browse/YARN-353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
> YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
>
> Add a store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-873:
---

Attachment: YARN-873.3.patch

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712462#comment-13712462
 ] 

Xuan Gong commented on YARN-873:


bq. 1. The application -status command returns exit code 0 when the application 
doesn't exist. Can we return a non-zero exit code when the application 
doesn't exist?

Well, returning exit code 0 is fine. If the user gives an unknown appId, they 
will get the "Application doesn't exist in RM." response, and this response is 
expected. I think when we get the expected output, we set the exit code to 0; 
otherwise we set the exit code to non-zero.



> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-865) RM webservices can't query based on application Types

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712358#comment-13712358
 ] 

Hudson commented on YARN-865:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1491 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1491/])
YARN-865. RM webservices can't query based on application Types. Contributed by 
Xuan Gong. (hitesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


> RM webservices can't query based on application Types
> -
>
> Key: YARN-865
> URL: https://issues.apache.org/jira/browse/YARN-865
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.1.0-beta
>
> Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
> YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch
>
>
> The resource manager web service api to get the list of apps doesn't have a 
> query parameter for appTypes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712357#comment-13712357
 ] 

Hudson commented on YARN-922:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1491 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1491/])
YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java


> Change FileSystemRMStateStore to use directories
> 
>
> Key: YARN-922
> URL: https://issues.apache.org/jira/browse/YARN-922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.1.0-beta
>
> Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, 
> YARN-922.patch
>
>
> Store each app and its attempts in the same directory so that removing 
> application state is only one operation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712349#comment-13712349
 ] 

Arun C Murthy commented on YARN-701:


bq. Sure I can help.

Thanks, I've opened YARN-937 and marked it a blocker.

I'll commit YARN-701 later today to unblock both YARN-937 & YARN-918.

> ApplicationTokens should be used irrespective of kerberos
> -
>
> Key: YARN-701
> URL: https://issues.apache.org/jira/browse/YARN-701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
> YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, 
> yarn-ojoshi-resourcemanager-HW10351.local.log
>
>
>  - Single code path for secure and non-secure cases is useful for testing, 
> coverage.
>  - Having this in non-secure mode will help us avoid accidental bugs in AMs 
> DDos'ing and bringing down RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-18 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-937:
---

Target Version/s: 2.1.0-beta

> Fix unmanaged AM in non-secure/secure setup post YARN-701
> -
>
> Key: YARN-937
> URL: https://issues.apache.org/jira/browse/YARN-937
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Arun C Murthy
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.1.0-beta
>
>
> Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens 
> will be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701

2013-07-18 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-937:
--

 Summary: Fix unmanaged AM in non-secure/secure setup post YARN-701
 Key: YARN-937
 URL: https://issues.apache.org/jira/browse/YARN-937
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arun C Murthy
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta


Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will 
be used in both scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712305#comment-13712305
 ] 

Hudson commented on YARN-922:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1464 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1464/])
YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java


> Change FileSystemRMStateStore to use directories
> 
>
> Key: YARN-922
> URL: https://issues.apache.org/jira/browse/YARN-922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.1.0-beta
>
> Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, 
> YARN-922.patch
>
>
> Store each app and its attempts in the same directory so that removing 
> application state is only one operation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-865) RM webservices can't query based on application Types

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712306#comment-13712306
 ] 

Hudson commented on YARN-865:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1464 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1464/])
YARN-865. RM webservices can't query based on application Types. Contributed by 
Xuan Gong. (hitesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


> RM webservices can't query based on application Types
> -
>
> Key: YARN-865
> URL: https://issues.apache.org/jira/browse/YARN-865
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.1.0-beta
>
> Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
> YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch
>
>
> The resource manager web service api to get the list of apps doesn't have a 
> query parameter for appTypes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712227#comment-13712227
 ] 

Devaraj K commented on YARN-873:


Sorry, I missed this in the above comment.
3. {code:xml}
  // If the RM doesn't have the application, provide the response with
  // application report as null and let the clients to handle.
{code}

Can you also update this in-line comment in ClientRMService.java?


> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712215#comment-13712215
 ] 

Hudson commented on YARN-922:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/274/])
YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504261)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java


> Change FileSystemRMStateStore to use directories
> 
>
> Key: YARN-922
> URL: https://issues.apache.org/jira/browse/YARN-922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.1.0-beta
>
> Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, 
> YARN-922.patch
>
>
> Store each app and its attempts in the same directory so that removing 
> application state is only one operation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-865) RM webservices can't query based on application Types

2013-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712216#comment-13712216
 ] 

Hudson commented on YARN-865:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/274/])
YARN-865. RM webservices can't query based on application Types. Contributed by 
Xuan Gong. (hitesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1504288)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


> RM webservices can't query based on application Types
> -
>
> Key: YARN-865
> URL: https://issues.apache.org/jira/browse/YARN-865
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.1.0-beta
>
> Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
> YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch
>
>
> The resource manager web service api to get the list of apps doesn't have a 
> query parameter for appTypes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-18 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712197#comment-13712197
 ] 

Devaraj K commented on YARN-873:


The latest patch overall looks good to me. These are the things I feel we can 
take care of:

1. The application -status command returns exit code 0 when the application 
doesn't exist. Can we return a non-zero exit code when the application 
doesn't exist?

2. In TestClientRMService.java, 
{code:xml}
+try {
+  GetApplicationReportResponse applicationReport = rmService
+  .getApplicationReport(request);
+} catch (ApplicationNotFoundException ex) {
+  getExpectedException = true;
+  Assert.assertEquals(ex.getMessage(),
+  "Application with id '" + request.getApplicationId()
+  + "' doesn't exist in RM.");
+}
+Assert.assertTrue(getExpectedException);
{code}

Can we fail after getApplicationReport using Assert.fail() instead of having a 
boolean flag and checking it? Also, the applicationReport variable is never used.
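
Something along these lines, for example (a sketch of the suggested shape, not the 
final patch):

{code}
try {
  rmService.getApplicationReport(request);
  Assert.fail("Expected ApplicationNotFoundException for an unknown app id");
} catch (ApplicationNotFoundException ex) {
  // No boolean flag and no unused local variable; just verify the message.
  Assert.assertEquals("Application with id '" + request.getApplicationId()
      + "' doesn't exist in RM.", ex.getMessage());
}
{code}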

3. 
{code:xml}
  // If the RM doesn't have the application, provide the response with
  // application report as null and let the clients to handle.
{code}

Do we have another JIRA to fix the same for kill application? If not, can we file 
one?

> YARNClient.getApplicationReport(unknownAppId) returns a null report
> ---
>
> Key: YARN-873
> URL: https://issues.apache.org/jira/browse/YARN-873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Attachments: YARN-873.1.patch, YARN-873.2.patch
>
>
> How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl

2013-07-18 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated YARN-933:


Description: 
Hostname enabled.
AM max retries is configured as 3 on both the client and RM side.

Step 1: Install a cluster with an NM on each of 2 machines.
Step 2: Ensure that pinging the NM1 machine from the RM machine succeeds by IP 
but fails by hostname.
Step 3: Execute a job.
Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
connection loss happens.

Observation:
==
After AppAttempt_1 has moved to the FAILED state, the release of AppAttempt_1's 
container and the application removal are successful. A new AppAttempt_2 is 
spawned.

1. Then a retry of AppAttempt_1 happens again.
2. On the RM side it again tries to launch AppAttempt_1, which then fails with 
InvalidStateTransitonException.
3. The client exits after AppAttempt_1 has finished [but the job is actually 
still running], even though the configured number of app attempts is 3 and the 
remaining attempts are all spawned and running.


RMLogs:
==
2013-07-17 16:22:51,013 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
maxRetries=45
2013-07-17 16:36:07,091 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to 
EXPIRED

2013-07-17 16:36:07,093 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering appattempt_1373952096466_0056_02

2013-07-17 16:36:07,131 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Application removed - appId: application_1373952096466_0056 user: Rex 
leaf-queue of parent: root #applications: 35

2013-07-17 16:36:07,132 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application Submission: appattempt_1373952096466_0056_02, 
2013-07-17 16:36:07,138 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED

2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
maxRetries=45
2013-07-17 16:38:56,207 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1373952096466_0056_01. Got exception: 
java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
LAUNCH_FAILED at FAILED
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
 at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
 at java.lang.Thread.run(Thread.java:662)

Client Logs

Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout 
while waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending 
remote=host-10-18-40-15/10.18.40.59:8020]
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedAc

[jira] [Commented] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712089#comment-13712089
 ] 

Hadoop QA commented on YARN-935:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12592927/YARN-935.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1516//console

This message is automatically generated.

> Correcting pom.xml to build applicationhistoryserver sub-project successfully
> -
>
> Key: YARN-935
> URL: https://issues.apache.org/jira/browse/YARN-935
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-935.1.patch
>
>
> The branch was created from branch-2, so 
> hadoop-yarn-server-applicationhistoryserver/pom.xml should use 
> 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be 
> built correctly because of the wrong dependency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira