[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005674#comment-14005674
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

I'm rebasing a patch on YARN-2017. Please wait a moment.

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005667#comment-14005667
 ] 

Tsuyoshi OZAWA commented on YARN-2017:
--

Good job!

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.5.0
>
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch, YARN-2017.7.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005632#comment-14005632
 ] 

Vinod Kumar Vavilapalli commented on YARN-2017:
---

+1, looks good. Checking this in.

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch, YARN-2017.7.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-21 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005602#comment-14005602
 ] 

Hong Zhiguo commented on YARN-2088:
---

+1

I guess the failure before this patch was caused by 
builder.clearApplicationTags() not being called in setApplicationTags() or 
mergeLocalToBuilder().

The two lines below could be cleaned away too.
{code}
 public GetApplicationsRequestProto getProto() {
 mergeLocalToProto();
-proto = viaProto ? proto : builder.build();
-viaProto = true;
 return proto;
 }
{code}
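
For reference, a minimal sketch of the clear-before-add pattern mentioned above; 
the field and builder method names follow the usual PBImpl/protobuf conventions 
and are illustrative, not the exact patch:
{code}
private Set<String> applicationTags;

private void mergeLocalToBuilder() {
  if (applicationTags != null && !applicationTags.isEmpty()) {
    // Clear the repeated proto field first so that merging more than once
    // does not leave duplicated entries in the built proto.
    builder.clearApplicationTags();
    builder.addAllApplicationTags(applicationTags);
  }
}
{code}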



> Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
> 
>
> Key: YARN-2088
> URL: https://issues.apache.org/jira/browse/YARN-2088
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2088.v1.patch
>
>
> Some fields (set, list) are added to the proto builders multiple times; we need 
> to clear those fields before adding, otherwise the resulting proto contains 
> extra contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2094) how to enable job counters for mapreduce or applications

2014-05-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-2094.
---

Resolution: Invalid

Closing it as invalid. Please ask such questions on the user mailing lists. 
Thanks.

> how to enable job counters for mapreduce or applications
> 
>
> Key: YARN-2094
> URL: https://issues.apache.org/jira/browse/YARN-2094
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Nikhil Mulley
>
> Hi,
> I was looking at MapReduce jobs in my YARN setup and was wondering about the 
> jobcounters. I do not see the jobcounters for the mapreduce applications. 
> When I browse through the web page for job counters, there are no job 
> counters. Is there a specific setting to enable the application/job counters 
> in YARN? Please let me know.
> thanks,
> Nikhil



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005604#comment-14005604
 ] 

Hadoop QA commented on YARN-2017:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646173/YARN-2017.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3787//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3787//console

This message is automatically generated.

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch, YARN-2017.7.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2017:
--

Attachment: YARN-2017.7.patch

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch, YARN-2017.7.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005583#comment-14005583
 ] 

Hadoop QA commented on YARN-2017:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646169/YARN-2017.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3786//console

This message is automatically generated.

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2094) how to enable job counters for mapreduce or applications

2014-05-21 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005575#comment-14005575
 ] 

Rohith commented on YARN-2094:
--

Hi Nikhil, 
 Welcome to the Hadoop community.
 bq. When I browse through the web page for job counters, there are no job 
counters.
Which web page are you browsing? The Counters link is available on the 
HistoryServer web page, in the top-left Job dropdown menu. Make sure the history 
server is running. You can access the job counter page at 
*http:///jobhistory/jobcounters/*

For questions like this, please post to the Hadoop user mailing list. 
To subscribe to the Hadoop user mailing list, follow the link
http://hadoop.apache.org/mailing_lists.html#User

> how to enable job counters for mapreduce or applications
> 
>
> Key: YARN-2094
> URL: https://issues.apache.org/jira/browse/YARN-2094
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Nikhil Mulley
>
> Hi,
> I was looking at MapReduce jobs in my YARN setup and was wondering about the 
> jobcounters. I do not see the jobcounters for the mapreduce applications. 
> When I browse through the web page for job counters, there are no job 
> counters. Is there a specific setting to enable the application/job counters 
> in YARN? Please let me know.
> thanks,
> Nikhil



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005574#comment-14005574
 ] 

Junping Du commented on YARN-2017:
--

Seems Jenkins did not start automatically. Kicking off the test manually.

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2017:
--

Attachment: YARN-2017.6.patch

Same patch to kick jenkins

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch, 
> YARN-2017.6.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005564#comment-14005564
 ] 

Jian He commented on YARN-2074:
---

bq. Use this condition to decide whether this RMAppAttempt is isLastAttempt
Actually, the isLastAttempt boolean is not used to determine whether to 
restart the AM; the method getAttemptFailureCount is used for that. Will 
rename this boolean flag to avoid confusion.
bq. maybe we could use a more general way to check whether the AM is 
isPreempted, (check ContainerExitStatus instead
Thinking about this. To do it, we would also need to persist the 
ContainerExitStatus in the state store.

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005544#comment-14005544
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

[~kkambatl], thanks for your review.  

{quote}
And, let us handle the incompatible change to reinitialize in a separate JIRA.
{quote}

I agree with this point. Fixed the following points in the latest patch (a sketch 
of the resulting lifecycle is shown below):

1. Moved the part corresponding to if (!initialized) into {{serviceInit()}}, and 
moved the initialization code into {{initScheduler}} and {{startThreads}} to avoid 
code duplication.
2. Changed serviceInit and serviceStart to call {{initScheduler}} and 
{{startThreads}} instead of {{reinitialize()}}.
3. For the individual threads in the schedulers, initialize them in serviceInit() 
but call thread.start() in serviceStart().
4. Fixed serviceStop() for CS.
5. Fixed the tests based on your idea. 
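
For illustration, a minimal sketch of the serviceInit/serviceStart/serviceStop 
split described above, assuming a scheduler that extends AbstractService; the 
class name, thread, and method bodies are placeholders, not the actual YARN-1474 
patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class ExampleScheduler extends AbstractService {

  private Thread updateThread;   // hypothetical background thread

  public ExampleScheduler() {
    super(ExampleScheduler.class.getName());
  }

  private void initScheduler(Configuration conf) {
    // Former "if (!initialized)" block: create state and threads, but do not start them.
    updateThread = new Thread(new Runnable() {
      @Override
      public void run() {
        // periodic update loop would go here
      }
    });
    updateThread.setDaemon(true);
  }

  private void startThreads() {
    updateThread.start();
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    initScheduler(conf);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    startThreads();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (updateThread != null) {
      updateThread.interrupt();
    }
    super.serviceStop();
  }
}
{code}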


> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1368:
--

Attachment: YARN-1368.3.patch

Thanks Wangda for the review! The new patch also addresses the comments.
bq. Should we change Resource(1024, 1) to its actually resource?
Fixed.
bq. For recoverContainersOnNode, is it possible NODE_ADDED happened before 
APP_ADDED?
Not possible; APP_ADDED happens synchronously before ResourceTrackerService is 
started.
bq. It may better to use two parameter assertEquals, because delta is 0
They are two doubles, so the delta overload is needed; fixed the delta value to 
be 1e-8.
bq. Why use split AMContainerCrashedTransition to two transitions and set their 
states to RUNNING/LAUNCHED differently.
To capture completed containers in the RUNNING/LAUNCHED states and reuse the 
common code.
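
As a side note on the assertEquals point above, a minimal JUnit 4 illustration 
(not code from the patch; the variable names are placeholders):
{code}
// JUnit 4: two doubles must be compared with an explicit tolerance (delta).
org.junit.Assert.assertEquals(expectedFairShare, actualFairShare, 1e-8);
{code}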

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2049:
--

Attachment: YARN-2049.5.patch

The history daemon in MiniYarnCluster is also affected by the changes. Fixed it 
accordingly.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1368:
--

Attachment: (was: YARN-1368.3.patch)

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
> YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2017:
--

Attachment: YARN-2017.6.patch

Thanks Vinod for the review!

Addressed the comments.
bq. The new node classes have a lot of getReservedContainer() calls which can 
be replaced by a single call assigned to a local variable.
In FicaSchedulerNode#reserveResource the parameter reservedContainer is renamed to 
container, and similarly for FSSchedulerNode; a single getReservedContainer() call 
is now made upfront.

Suppressed the findbugs warnings.

> Merge some of the common lib code in schedulers
> ---
>
> Key: YARN-2017
> URL: https://issues.apache.org/jira/browse/YARN-2017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
> YARN-2017.4.patch, YARN-2017.4.patch, YARN-2017.5.patch, YARN-2017.6.patch
>
>
> A bunch of the same code is repeated among schedulers, e.g. between 
> FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
> a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005511#comment-14005511
 ] 

Jian He commented on YARN-1368:
---

The new patch is rebased on YARN-2017 and creates a new ContainerRecoveryReport 
record in the NM-RM protocol to include the container resource capability.

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1368:
--

Attachment: YARN-1368.3.patch

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005493#comment-14005493
 ] 

Junping Du commented on YARN-1338:
--

Thanks for addressing my comments, [~jlowe]! Some additional comments:
I think we currently use initStorage(conf) to create the DB items for storing 
NMState when the NM starts for the first time, and the same method to locate the 
DB items when the NM restarts. Do we have any code to destroy the NMState DB items 
when the NM is decommissioned (i.e. not expecting a short-term restart)? If not, 
when the NM is recommissioned - which should be recognized as a fresh node - it 
will still have stale NMState info if NM_RECOVERY_DIR and DB_NAME are unchanged. 
Am I missing anything here?

In LocalResourcesTrackerImpl#recoverResource()
{code}
+incrementFileCountForLocalCacheDirectory(localDir.getParent());
{code}
Given that localDir is already the parent of localPath, maybe we should just 
increment localDir rather than its parent? I didn't see a unit test that checks 
the file count for the resource directory after recovery. Maybe we should add 
some?

> Recover localized resource cache state upon nodemanager restart
> ---
>
> Key: YARN-1338
> URL: https://issues.apache.org/jira/browse/YARN-1338
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files 
> from disk. This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers 
> are using them
> * For even non work preserving restart this will be useful in the sense that 
> we don't have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-21 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

Initial patch for review.
It adds a queueMaxAMShare configuration for each queue and updates the code in 
MaxRunningAppsEnforcer.java to take the AM share into account. Instead of using 
the accurate AM resource usage, it uses a simpler estimate: max_app_limited_by_AM = 
(queue.queueMaxAMShare * queue.maxShare) / scheduler.minAllocation.
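
A minimal sketch of that estimate, assuming memory-based resources; the parameter 
names mirror the formula above and are illustrative, not the actual patch:
{code}
// Rough cap on runnable apps: assume each AM needs about one minimum allocation,
// and AMs may only use queueMaxAMShare of the queue's max share.
static int maxAppsLimitedByAmShare(double queueMaxAMShare,
                                   long queueMaxShareMb,
                                   long minAllocationMb) {
  return (int) ((queueMaxAMShare * queueMaxShareMb) / minAllocationMb);
}
{code}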

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
> Attachments: YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once, 
> and have all cluster resources taken up by AMs.
> One solution is for the scheduler to limit resources taken up by AMs, as a 
> percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-21 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned YARN-1913:
-

Assignee: Wei Yan  (was: Karthik Kambatla)

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Wei Yan
> Attachments: YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once, 
> and have all cluster resources taken up by AMs.
> One solution is for the scheduler to limit resources taken up by AMs, as a 
> percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2049:
--

Attachment: YARN-2049.4.patch

Updated the patch given YARN-1938 is committed

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2094) how to enable job counters for mapreduce or applications

2014-05-21 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created YARN-2094:
---

 Summary: how to enable job counters for mapreduce or applications
 Key: YARN-2094
 URL: https://issues.apache.org/jira/browse/YARN-2094
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley


Hi,

I was looking at MapReduce jobs in my YARN setup and was wondering about the 
jobcounters. I do not see the jobcounters for the mapreduce applications. When 
I browse through the web page for job counters, there are no job counters. Is 
there a specific setting to enable the application/job counters in YARN? Please 
let me know.

thanks,
Nikhil



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism

2014-05-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005427#comment-14005427
 ] 

Zhijie Shen commented on YARN-2082:
---

Just thinking out loud: instead of making another HBase-based store to host the 
aggregated logs, is it possible to reuse the timeline store for this? I think the 
event-stream data model should be suitable in this case, and there is pending work 
to scale out the timeline store with HBase as well (YARN-2032). The additional 
benefit is that the interfaces for publishing and querying the data are ready, and 
we would just need to change the hook or wrap them into a log aggregation plugin.

> Support for alternative log aggregation mechanism
> -
>
> Key: YARN-2082
> URL: https://issues.apache.org/jira/browse/YARN-2082
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ming Ma
>
> I will post a more detailed design later. Here is a brief summary; I would 
> like to get early feedback.
> Problem Statement:
> The current implementation of log aggregation creates one HDFS file per 
> {application, nodemanager}. These files are relatively small, in the range of 
> 1-2 MB. In a large cluster with lots of applications and many nodemanagers, this 
> ends up creating lots of small files in HDFS, which puts pressure on the HDFS 
> NN in the following ways.
> 1. It increases NN memory usage. This is mitigated by having the history server 
> delete old log files in HDFS.
> 2. Runtime RPC load on HDFS. Each log aggregation file introduces several NN 
> RPCs such as create, getAdditionalBlock, complete, and rename. When the cluster 
> is busy, this RPC load impacts NN performance.
> In addition, to support non-MR applications on YARN, we might need to support 
> aggregation for long-running applications.
> Design choices:
> 1. Don't aggregate all the logs, as in YARN-221.
> 2. Create a dedicated HDFS namespace used only for log aggregation.
> 3. Write logs to some key-value store like HBase. HBase's RPC load on the NN will 
> be much lower.
> 4. Decentralize the application-level log aggregation to NMs. All logs for a 
> given application are aggregated first by a dedicated NM before they are pushed 
> to HDFS.
> 5. Have NMs aggregate logs on a regular basis; each of these log files will 
> have data from different applications, and there needs to be some index for 
> quick lookup.
> Proposal:
> 1. Make YARN log aggregation pluggable for both the read and write paths. Note 
> that Hadoop FileSystem provides an abstraction and we could ask an alternative 
> log aggregator to implement a compatible FileSystem, but that seems to be overkill.
> 2. Provide a log aggregation plugin that writes to HBase. The schema design 
> needs to support efficient reads on a per-application as well as per 
> application+container basis; in addition, it shouldn't create hotspots in a 
> cluster where certain users might create more jobs than others. For example, 
> we can use hash($user+$applicationId) + containerid as the row key.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005409#comment-14005409
 ] 

Sandy Ryza commented on YARN-2012:
--

My thinking is that QueuePlacementRule.assignAppToQueue should return "" (pass) 
if the queue returned by the default rule is not configured and create is 
false.  I think this is a rare case that could only be a result of 
misconfiguration, so it's not worth adding any special handling that 
complicates the logic.
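
For illustration only, a hypothetical simplification of that behavior; the real 
QueuePlacementRule API differs and the names here are placeholders:
{code}
// Return "" to signal "pass", so the policy falls through to the next rule,
// when the configured default queue does not exist and create == false.
static String assignAppToQueue(String defaultQueue, boolean create,
                               java.util.Set<String> configuredQueues) {
  if (!create && !configuredQueues.contains(defaultQueue)) {
    return "";
  }
  return defaultQueue;
}
{code}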

> Fair Scheduler : Default rule in queue placement policy can take a queue as 
> an optional attribute
> -
>
> Key: YARN-2012
> URL: https://issues.apache.org/jira/browse/YARN-2012
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt
>
>
> Currently the 'default' rule in the queue placement policy, if applied, puts the 
> app in the root.default queue. It would be great if we could make the 'default' 
> rule optionally point to a different queue as the default queue. This queue 
> should be an existing queue; if not, we fall back to the root.default queue, 
> hence keeping this rule terminal.
> This default queue can be a leaf queue, or it can also be a parent queue if 
> the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005407#comment-14005407
 ] 

Jon Bringhurst commented on YARN-2093:
--

RM-HA is enabled. This only happened on the first start after upgrading from 
2.2.0. Starting the RM again after the first start works without error. I 
haven't tried to do an upgrade again, so I'm not sure if it's reproducible.

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ---
>
> Key: YARN-2093
> URL: https://issues.apache.org/jira/browse/YARN-2093
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Jon Bringhurst
>
> After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
> {noformat}
> 21:19:34,308  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,309  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_09 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_10 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,318  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_05 is done. finalState=FAILED
> 21:19:34,319  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_11 to scheduler from user: 
> samza-perf-playground
> 21:19:34,320  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_06 is done. finalState=FAILED
> 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,320  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
> APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
>  does not exist in queue [root.samza-perf-playground, demand= vCores:0>, running=, share=, 
> w=]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>   at java.lang.Thread.run(Thread.java:744)
> 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
> 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
> 21:19:34,437  INFO Server:2398 - Stopping server on 8033
> 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
> {noformat}
> Last commit message for this build is (branch-2.4 on 
> github.com/apache/hadoop-common):
> {noformat}
> commit 09e24d5519187c0db67aacc1992be5d43829aa1e
> Author: Arpit Agarwal 
> Date:   Tue May 20 20:18:46 2014 +
> HADOOP-10562. Fix CHANGES.txt entry again
> 
> git-svn-id: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005402#comment-14005402
 ] 

Hadoop QA commented on YARN-2054:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646115/yarn-2054-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3785//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3785//console

This message is automatically generated.

> Poor defaults for YARN ZK configs for retries and retry-inteval
> ---
>
> Key: YARN-2054
> URL: https://issues.apache.org/jira/browse/YARN-2054
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2054-1.patch, yarn-2054-2.patch
>
>
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative 1000 seconds before the RM gives up trying to 
> connect to ZK. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005397#comment-14005397
 ] 

Sandy Ryza commented on YARN-2093:
--

Thanks for reporting this Jon.

Did this occur in an RM-HA setup?

Is it reproducible?

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ---
>
> Key: YARN-2093
> URL: https://issues.apache.org/jira/browse/YARN-2093
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Jon Bringhurst
>
> After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
> {noformat}
> 21:19:34,308  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,309  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_09 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_10 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,318  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_05 is done. finalState=FAILED
> 21:19:34,319  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_11 to scheduler from user: 
> samza-perf-playground
> 21:19:34,320  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_06 is done. finalState=FAILED
> 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,320  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
> APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
>  does not exist in queue [root.samza-perf-playground, demand= vCores:0>, running=, share=, 
> w=]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>   at java.lang.Thread.run(Thread.java:744)
> 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
> 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
> 21:19:34,437  INFO Server:2398 - Stopping server on 8033
> 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
> {noformat}
> Last commit message for this build is (branch-2.4 on 
> github.com/apache/hadoop-common):
> {noformat}
> commit 09e24d5519187c0db67aacc1992be5d43829aa1e
> Author: Arpit Agarwal 
> Date:   Tue May 20 20:18:46 2014 +
> HADOOP-10562. Fix CHANGES.txt entry again
> 
> git-svn-id: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005398#comment-14005398
 ] 

Hadoop QA commented on YARN-2089:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646089/yarn-2089.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3784//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3784//console

This message is automatically generated.

> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1474) Make schedulers services

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1474:
-

Attachment: (was: YARN-1474.15.patch)

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1474) Make schedulers services

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1474:
-

Attachment: YARN-1474.15.patch

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-21 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005387#comment-14005387
 ] 

Mike Liddell commented on YARN-1868:


I didn't see this mentioned: a specific workaround in IE 11 is
 Settings | Compatibility View Settings | Display intranet sites in Compatibility 
View -> False.


> YARN status web ui does not show correctly in IE 11
> ---
>
> Key: YARN-1868
> URL: https://issues.apache.org/jira/browse/YARN-1868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-1868.1.patch, YARN-1868.2.patch, YARN-1868.patch, 
> YARN_status.png
>
>
> The YARN status web UI does not show correctly in IE 11. The drop-down menu 
> for app entries is not shown. Also, the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005369#comment-14005369
 ] 

Vinod Kumar Vavilapalli commented on YARN-1938:
---

Looks good. +1. Checking this in..

> Kerberos authentication for the timeline server
> ---
>
> Key: YARN-1938
> URL: https://issues.apache.org/jira/browse/YARN-1938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005370#comment-14005370
 ] 

Xuan Gong commented on YARN-2074:
-

Comments:
1. {code}
RMAppAttempt attempt =
new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
  submissionContext, conf, maxAppAttempts <= attempts.size());
{code}
Using this condition to decide whether this RMAppAttempt is the last attempt does 
not sound right to me. 
For example, if we set maxAppAttempts to 3 but the previous 2 AMs were preempted, 
then based on the condition here the next RMAppAttempt is the last attempt?? 
If that attempt fails, the whole application will be marked as a failure. 

2. {code}
  public boolean isPreempted() {
    return getDiagnostics().contains(SchedulerUtils.PREEMPTED_CONTAINER);
  }
{code}
It is fine to use this to check isPreempted. But see 
https://issues.apache.org/jira/browse/YARN-614: basically, that ticket says we 
should separate hardware failures and YARN issues from AM failures and not count 
them as AM failures. I think preemption of the AM is one of those cases. So maybe 
we could use a more general way to check whether the AM is 
isPreempted (check ContainerExitStatus instead?)
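
For illustration, a minimal sketch of the more general check being suggested; it 
assumes the AM container's exit status is kept on the attempt (and, per the 
follow-up discussion, persisted in the state store), and that ContainerExitStatus 
defines a PREEMPTED status - this is not part of the current patch:
{code}
public boolean isPreempted() {
  // Compare against the RM-assigned exit status instead of parsing diagnostics text.
  // amContainerExitStatus is a hypothetical field holding the AM container's exit status.
  return amContainerExitStatus == ContainerExitStatus.PREEMPTED;
}
{code}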

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1474) Make schedulers services

2014-05-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1474:
-

Attachment: YARN-1474.15.patch

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005340#comment-14005340
 ] 

Zhijie Shen commented on YARN-2070:
---

bq.  the Server itself is going to start injecting a user-name that is the sole 
authority. 

In YARN-1937, I try to keep users away from the system information (the entity 
owner here), and it will be removed before the entity/event is returned to the 
user.

> DistributedShell publishes unfriendly user information to the timeline server
> -
>
> Key: YARN-2070
> URL: https://issues.apache.org/jira/browse/YARN-2070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2070.patch
>
>
> Below is the code that uses the string form of the current user object as the 
> "user" value.
> {code}
> entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
> .toString());
> {code}
> When we use Kerberos authentication, it's going to output the full name, such 
> as "zjshen/localhost@LOCALHOST (auth.KERBEROS)", which is not user-friendly for 
> searching by the primary filters. It's better to use shortUserName instead.
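
A minimal sketch of the suggested change (not necessarily what the attached patch 
does), assuming the TimelineEntity and UserGroupInformation APIs as of 2.4:

{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public final class ShortUserNameExample {
  static void addUserFilter(TimelineEntity entity) throws IOException {
    // getShortUserName() drops the realm/host part, e.g. "zjshen" rather than
    // "zjshen/localhost@LOCALHOST (auth.KERBEROS)".
    entity.addPrimaryFilter("user",
        UserGroupInformation.getCurrentUser().getShortUserName());
  }
}
{code}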



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005335#comment-14005335
 ] 

Zhijie Shen commented on YARN-2092:
---

{code}
2014-05-19 20:09:07,933 FATAL [HistoryEventHandlingThread] 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[HistoryEventHandlingThread,5,main] threw an Error. Shutting down now...
java.lang.NoSuchMethodError: 
org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:54)
at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:488)
{code}

JacksonJsonProvider is in jackson-jaxrs while ObjectMapper is in 
jackson-mapper-asl. If I understand correctly, it looks like the two libs' 
versions don't match.

Hadoop uses the following 4 jackson libs.
{code}
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-core-asl</artifactId>
    <version>1.9.13</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-jaxrs</artifactId>
    <version>1.9.13</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-xc</artifactId>
    <version>1.9.13</version>
  </dependency>
{code}

If Tez includes all four of these jars at 1.8.8 in its classpath, whether it puts 
them before or after the Hadoop classpath, there shouldn't be a mismatch. On the 
other hand, if Tez includes only some of these four jars and puts them before the 
Hadoop libs, this problem will occur. For example:

{code}
cp=...:jackson-jaxrs-1.8.8.jar:jackson-xc-1.8.3.jar:jackson-jaxrs-1.9.13.jar:jackson-xc-1.9.13.jar:jackson-mapper-asl-1.9.13.jar:jackson-xc-1.9.13.jar:...
{code}

> Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
> 2.5.0-SNAPSHOT
> 
>
> Key: YARN-2092
> URL: https://issues.apache.org/jira/browse/YARN-2092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Came across this when trying to integrate with the timeline server. Using a 
> 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
> 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
> jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005320#comment-14005320
 ] 

Vinod Kumar Vavilapalli commented on YARN-2070:
---

With some of the tickets under YARN-1935, the Server itself is going to start 
injecting a user-name that is the sole authority. Given that, should we 
consider dropping this completely?

> DistributedShell publishes unfriendly user information to the timeline server
> -
>
> Key: YARN-2070
> URL: https://issues.apache.org/jira/browse/YARN-2070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2070.patch
>
>
> Below is the code that uses the string form of the current user object as the 
> "user" value.
> {code}
> entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
> .toString());
> {code}
> When we use Kerberos authentication, it's going to output the full name, such 
> as "zjshen/localhost@LOCALHOST (auth.KERBEROS)", which is not user-friendly for 
> searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-interval

2014-05-21 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2054:
---

Attachment: yarn-2054-2.patch

A patch that sets the retry interval based on the session timeout, number of 
retries and whether HA is enabled.
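
A hedged sketch of that idea, using the standard yarn-site configuration keys and 
the default values mentioned in the description; the actual yarn-2054-2.patch may 
compute this differently:

{code}
import org.apache.hadoop.conf.Configuration;

public final class ZkRetryIntervalExample {
  // Derive the ZK retry interval from the session timeout and the number of
  // retries when HA is enabled, instead of using the raw 2000 ms default.
  static long retryIntervalMs(Configuration conf) {
    long sessionTimeoutMs =
        conf.getLong("yarn.resourcemanager.zk-timeout-ms", 10000);
    int numRetries = conf.getInt("yarn.resourcemanager.zk-num-retries", 500);
    if (conf.getBoolean("yarn.resourcemanager.ha.enabled", false)) {
      // Spread the retries across the session timeout so the RM gives up
      // roughly when the ZK session would have expired anyway.
      return sessionTimeoutMs / numRetries;
    }
    return conf.getLong("yarn.resourcemanager.zk-retry-interval-ms", 2000);
  }
}
{code}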

> Poor defaults for YARN ZK configs for retries and retry-interval
> ---
>
> Key: YARN-2054
> URL: https://issues.apache.org/jira/browse/YARN-2054
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2054-1.patch, yarn-2054-2.patch
>
>
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative 1000 seconds before the RM gives up trying to 
> connect to ZK. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2093:
-

Description: 
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=, running=, share=, 
w=]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message for this build is (branch-2.4 on 
github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal 
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

  was:
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingIn

[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2093:
-

Description: 
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=, running=, share=, 
w=]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal 
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

  was:
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Applica

[jira] [Created] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-2093:


 Summary: Fair Scheduler IllegalStateException after upgrade from 
2.2.0 to 2.4.1-SNAP
 Key: YARN-2093
 URL: https://issues.apache.org/jira/browse/YARN-2093
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst


After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=, running=, share=, 
w=]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@eat1-app587.stg.linkedin.com:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal 
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-21 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005286#comment-14005286
 ] 

Hitesh Shah commented on YARN-2092:
---

See 
https://issues.apache.org/jira/browse/TEZ-1066?focusedCommentId=14002674&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14002674
 for the stack trace when trying to run against 2.5.0-SNAPSHOT with jackson 
1.8.8 jars.

> Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
> 2.5.0-SNAPSHOT
> 
>
> Key: YARN-2092
> URL: https://issues.apache.org/jira/browse/YARN-2092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Came across this when trying to integrate with the timeline server. Using a 
> 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
> 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
> jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-21 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-2092:
-

 Summary: Incompatible org.codehaus.jackson* dependencies when 
moving from 2.4.0 to 2.5.0-SNAPSHOT
 Key: YARN-2092
 URL: https://issues.apache.org/jira/browse/YARN-2092
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Came across this when trying to integrate with the timeline server. Using a 
1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars 
are first in the classpath.  





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-21 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2091:
-

Summary: Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app 
masters  (was: Add ContainerExitStatus.KILL_EXCEECED_MEMORY and pass it to app 
masters)

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2091) Add ContainerExitStatus.KILL_EXCEECED_MEMORY and pass it to app masters

2014-05-21 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-2091:


 Summary: Add ContainerExitStatus.KILL_EXCEECED_MEMORY and pass it 
to app masters
 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha


Currently, the AM cannot programmatically determine if the task was killed due 
to using excessive memory. The NM kills it without passing this information in 
the container status back to the RM. So the AM cannot take any action here. The 
jira tracks adding this exit status and passing it from the NM to the RM and 
then the AM. In general, there may be other such actions taken by YARN that are 
currently opaque to the AM. 
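
A hypothetical sketch of how an AM could react once such an exit status exists. 
The constant name and value below are assumptions standing in for whatever this 
JIRA ends up adding to ContainerExitStatus; they are not an existing API:

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

public final class MemoryKillHandlingExample {
  // Assumed placeholder for the proposed constant; the real name and value
  // would be defined in ContainerExitStatus by this JIRA.
  static final int KILLED_EXCEEDED_MEMORY = -104;

  // Would be called from the AM's onContainersCompleted callback.
  static void handleCompleted(List<ContainerStatus> statuses) {
    for (ContainerStatus status : statuses) {
      if (status.getExitStatus() == KILLED_EXCEEDED_MEMORY) {
        // The AM can now react, e.g. retry the task with a larger memory
        // request instead of treating it as an application-level failure.
        System.out.println("Container " + status.getContainerId()
            + " was killed for exceeding its memory limit");
      }
    }
  }
}
{code}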



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2090) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-05-21 Thread Victor Kim (JIRA)
Victor Kim created YARN-2090:


 Summary: If Kerberos Authentication is enabled, MapReduce job is 
failing on reducer phase
 Key: YARN-2090
 URL: https://issues.apache.org/jira/browse/YARN-2090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.4.0
 Environment: hadoop: 2.4.0.2.1.2.0
Reporter: Victor Kim
Priority: Critical


I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. 
Kerberos is enabled, and I have hdfs, yarn, and mapred principals/keytabs. The 
ResourceManager and NodeManagers are run under the yarn user, using the yarn 
Kerberos principal. 
Use case 1: WordCount, submitting the job using the yarn UGI (i.e. the superuser, 
the one having a Kerberos principal on all boxes). Result: the job completes 
successfully.
Use case 2: WordCount, submitting the job using LDAP user impersonation via the 
yarn UGI. Result: Map tasks complete successfully, but the Reduce task fails with 
a ShuffleError caused by java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES 
(see the stack trace below).
The use case with user impersonation used to work on earlier versions, without 
YARN (with JT&TT).

I found a similar issue involving Kerberos authentication here: 
https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
And https://issues.apache.org/jira/browse/MAPREDUCE-4030 is marked as resolved, 
which is not the case when Kerberos authentication is enabled.

The exception trace from YarnChild JVM:
2014-05-21 12:49:35,687 FATAL [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed 
with too many fetch failures and insufficient progress!
2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at 
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
at 
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005214#comment-14005214
 ] 

Karthik Kambatla commented on YARN-2089:


(actually, let us wait for Jenkins even though the changes are not really code)

> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005211#comment-14005211
 ] 

Karthik Kambatla commented on YARN-2089:


Looks good to me as well. +1. Checking this in.


> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005206#comment-14005206
 ] 

Sandy Ryza commented on YARN-2089:
--

+1

> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-21 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2070:


Attachment: YARN-2070.patch

> DistributedShell publishes unfriendly user information to the timeline server
> -
>
> Key: YARN-2070
> URL: https://issues.apache.org/jira/browse/YARN-2070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2070.patch
>
>
> Below is the code that uses the string form of the current user object as the 
> "user" value.
> {code}
> entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
> .toString());
> {code}
> When we use Kerberos authentication, it's going to output the full name, such 
> as "zjshen/localhost@LOCALHOST (auth.KERBEROS)", which is not user-friendly for 
> searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2089:


Attachment: yarn-2089.patch

> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-2089:
---

Assignee: zhihai xu

> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-21 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004968#comment-14004968
 ] 

Ashwin Shankar commented on YARN-2012:
--

[~sandyr] Thanks for looking into this.
bq. I think it's a little confusing for the rule to fall back to "default". Can 
we let this part be handled by the "create" logic in assignAppToQueue?
Sure, but just to clarify: are you saying that in 
QueuePlacementRule.assignAppToQueue we should return "root.default" if the 
queue returned by the Default rule is not configured and create is false? That 
is, in the "create" logic we won't cause a skip to occur if the rule is a 
Default rule; instead we return root.default (see the sketch below).
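
A rough sketch of that behaviour, with illustrative names only (the real 
QueuePlacementRule.assignAppToQueue has a different signature): the Default rule 
stays terminal by falling back to root.default when its configured queue doesn't 
exist and create is false.

{code}
import java.util.Set;

public final class DefaultRuleFallbackExample {
  // Illustrative only; not the actual QueuePlacementRule API.
  static String assignAppToQueue(String configuredDefaultQueue,
      Set<String> existingQueues, boolean create) {
    String target = configuredDefaultQueue != null
        ? configuredDefaultQueue : "root.default";
    if (!existingQueues.contains(target) && !create) {
      // Fall back rather than skip, so the Default rule remains terminal.
      return "root.default";
    }
    return target;
  }
}
{code}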


> Fair Scheduler : Default rule in queue placement policy can take a queue as 
> an optional attribute
> -
>
> Key: YARN-2012
> URL: https://issues.apache.org/jira/browse/YARN-2012
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt
>
>
> Currently the 'default' rule in the queue placement policy, if applied, puts the 
> app in the root.default queue. It would be great if we could make the 'default' 
> rule optionally point to a different queue as the default queue. This queue 
> should be an existing queue; if not, we fall back to the root.default queue, 
> hence keeping this rule terminal.
> This default queue can be a leaf queue, or it can also be a parent queue if the 
> 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004927#comment-14004927
 ] 

Bikas Saha commented on YARN-1366:
--

I mean, what will go wrong if we allow unregister without register? Is it 
fundamentally wrong?

> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> calling resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.
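
A hedged sketch of the AM side of that resync flow, using the public allocate 
types; this is illustrative and not the patch under review:

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.AMCommand;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public final class ResyncHandlingExample {
  // Returns a fresh allocate request when the RM asks for a resync,
  // otherwise null to signal "carry on as normal".
  static AllocateRequest onAllocateResponse(AllocateResponse response,
      List<ResourceRequest> outstandingAsks, float progress) {
    if (response.getAMCommand() == AMCommand.AM_RESYNC) {
      // Reset the allocate sequence number to 0 and re-send the entire
      // outstanding ask, as the description above requires.
      return AllocateRequest.newInstance(0, progress, outstandingAsks,
          null, null);
    }
    return null;
  }
}
{code}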



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-05-21 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-2070:
---

Assignee: Robert Kanter

> DistributedShell publishes unfriendly user information to the timeline server
> -
>
> Key: YARN-2070
> URL: https://issues.apache.org/jira/browse/YARN-2070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
>Priority: Minor
>  Labels: newbie
>
> Bellow is the code of using the string of current user object as the "user" 
> value.
> {code}
> entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
> .toString());
> {code}
> When we use kerberos authentication, it's going to output the full name, such 
> as "zjshen/localhost@LOCALHOST (auth.KERBEROS)". It is not user friendly for 
> searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-21 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004900#comment-14004900
 ] 

Anubhav Dhoot commented on YARN-1365:
-

The failed test has race conditions that I am fixing.

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---
>
> Key: YARN-1365
> URL: https://issues.apache.org/jira/browse/YARN-1365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-21 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2089:
---

 Summary: FairScheduler: QueuePlacementPolicy and 
QueuePlacementRule are missing audience annotations
 Key: YARN-2089
 URL: https://issues.apache.org/jira/browse/YARN-2089
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Anubhav Dhoot


We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
annotations @Private  @Unstable
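
A minimal sketch of the proposed change, using the standard Hadoop audience 
annotations; the class bodies themselves would be unchanged:

{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

@Private
@Unstable
public abstract class QueuePlacementRule {
  // existing FairScheduler implementation unchanged
}
{code}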



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-21 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004803#comment-14004803
 ] 

Binglin Chang commented on YARN-2088:
-

Based on recent bugs related to API records/PBImpls, I have some doubts about the 
general patterns used in PBImpls (Java fields mixed with proto objects, cached 
state). They cause a lot of redundant code and confusion; changing that code is a 
mental challenge and can easily introduce new bugs...


> Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
> 
>
> Key: YARN-2088
> URL: https://issues.apache.org/jira/browse/YARN-2088
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2088.v1.patch
>
>
> Some fields(set,list) are added to proto builders many times, we need to 
> clear those fields before add, otherwise the result proto contains more 
> contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-21 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2088:


Attachment: YARN-2088.v1.patch

Bugs and fixes (a sketch of 1 and 4 follows below):
1. Clear the builder before adding Sets/Lists.
2. Remove the unnecessary maybeInitBuilder call in mergeLocalToBuilder.
3. We don't need to construct an Iterable manually; just use the Guava library.
4. The limit property is not set properly in mergeLocalToBuilder, which may 
cause it to be reset to Long.MAX...
5. Add a test assertion in TestGetApplicationsRequest to verify the bug.

I ran the test on my local laptop; it failed before the patch and passed after 
the patch. 
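
A hedged sketch of fixes 1 and 4 above. The field and builder method names are 
assumptions for illustration, not copied from the real GetApplicationsRequestPBImpl:

{code}
// Illustrative fragment of a PBImpl-style merge: clear the repeated proto
// field before re-adding the cached Java-side values so repeated merges
// don't accumulate duplicates (fix 1), and always copy the cached limit
// into the builder so it isn't silently lost (fix 4).
private void mergeLocalToBuilder() {
  if (this.applicationTypes != null && !this.applicationTypes.isEmpty()) {
    builder.clearApplicationTypes();
    builder.addAllApplicationTypes(this.applicationTypes);
  }
  builder.setLimit(this.limit);
}
{code}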


> Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
> 
>
> Key: YARN-2088
> URL: https://issues.apache.org/jira/browse/YARN-2088
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2088.v1.patch
>
>
> Some fields(set,list) are added to proto builders many times, we need to 
> clear those fields before add, otherwise the result proto contains more 
> contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-21 Thread Binglin Chang (JIRA)
Binglin Chang created YARN-2088:
---

 Summary: Fix code bug in 
GetApplicationsRequestPBImpl#mergeLocalToBuilder
 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang


Some fields(set,list) are added to proto builders many times, we need to clear 
those fields before add, otherwise the result proto contains more contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2084) YARN to support REST APIs in AMs

2014-05-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004609#comment-14004609
 ] 

Steve Loughran commented on YARN-2084:
--

Note that for Slider we worked around the proxy problems with our own AM-side 
filter; we don't want to encourage that.

> YARN to support REST APIs in AMs
> 
>
> Key: YARN-2084
> URL: https://issues.apache.org/jira/browse/YARN-2084
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>
> Having built a REST API in a YARN app, we've had to work around a few 
> quirks/issues that could be addressed centrally
> # proxy & filter not allowing PUT/POST/DELETE operations
> # NotFound exceptions incompatible with text/plain responses
> This JIRA exists to cover them and any other issues. It'll probably need some 
> tests too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2087) YARN proxy doesn't relay verbs other than GET

2014-05-21 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2087:


 Summary: YARN proxy doesn't relay verbs other than GET
 Key: YARN-2087
 URL: https://issues.apache.org/jira/browse/YARN-2087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Steve Loughran


The {{WebAppProxy}} class only proxies GET requests; the REST verbs PUT, DELETE, 
and POST aren't handled. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2086) AmIpFilter to support REST APIs

2014-05-21 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2086:


 Summary: AmIpFilter to support REST APIs
 Key: YARN-2086
 URL: https://issues.apache.org/jira/browse/YARN-2086
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran


The {{AmIpFilter}} doesn't like REST APIs, as all operations are redirected to 
the proxy as a 302. Even if the proxy did relay all verbs, the filter would 
need to return a 307 and hope the client was able to re-issue the verb.

The alternative is to have a dedicated part of the webapp be unproxied, which we 
did with a custom filter that does not relay "/ws/*" (see the sketch below), or 
even to allow apps to register a REST endpoint directly, either in the AppReport 
data or via the YARN-913 registry
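
A hedged sketch of that kind of filter (illustrative only; this is not the actual 
Slider filter or the existing AmIpFilter), assuming the servlet API is on the 
classpath and the proxy base URI is passed as an init parameter:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RestAwareProxyFilter implements Filter {
  private String proxyUriBase; // e.g. http://rm-host:8088/proxy/<appid>

  @Override
  public void init(FilterConfig conf) {
    proxyUriBase = conf.getInitParameter("proxyUriBase");
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    // Leave REST paths unproxied so PUT/POST/DELETE reach the AM webapp.
    if (httpReq.getRequestURI().startsWith("/ws/")) {
      chain.doFilter(req, res);
      return;
    }
    // Everything else is redirected to the RM proxy (a 302, like today).
    ((HttpServletResponse) res).sendRedirect(
        proxyUriBase + httpReq.getRequestURI());
  }

  @Override
  public void destroy() {
  }
}
{code}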



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2085) GenericExceptionHandler can't report into TEXT/PLAIN responses

2014-05-21 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2085:


 Summary: GenericExceptionHandler can't report into TEXT/PLAIN 
responses
 Key: YARN-2085
 URL: https://issues.apache.org/jira/browse/YARN-2085
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.4.0
Reporter: Steve Loughran


As seen in SLIDER-51, exceptions (like NotFound) can't be mapped into text/plain 
responses.

It may be that the {{Response.status(s).entity(exception).build()}} logic just 
doesn't work for plain text, in which case the handler should detect an 
unsupported MIME type and just return the error code with an empty body. That 
might be the best approach for other binary types too.

Or: simply catch the marshalling exception and downgrade to an empty-body 
status code. This would be a more graceful fallback, as it would catch all 
marshalling issues and return the original error code to the user



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2084) YARN to support REST APIs in AMs

2014-05-21 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2084:


 Summary: YARN to support REST APIs in AMs
 Key: YARN-2084
 URL: https://issues.apache.org/jira/browse/YARN-2084
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Affects Versions: 2.4.0
Reporter: Steve Loughran


Having built a REST API in a YARN app, we've had to work around a few 
quirks/issues that could be addressed centrally

# proxy & filter not allowing PUT/POST/DELETE operations
# NotFound exceptions incompatible with text/plain responses

This JIRA exists to cover them and any other issues. It'll probably need some 
tests too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism

2014-05-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004478#comment-14004478
 ] 

Steve Loughran commented on YARN-2082:
--

We do have log issues in long-lived apps, but different ones; YARN-1104 and 
YARN- look at those.

For those services we don't want the logs aggregated at the end of the run, but 
rather streamed off while the app is running. I don't know if this plugin 
mechanism would help at that stage, unless the logs were being snapshotted and 
rolled out.


> Support for alternative log aggregation mechanism
> -
>
> Key: YARN-2082
> URL: https://issues.apache.org/jira/browse/YARN-2082
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ming Ma
>
> I will post a more detailed design later. Here is a brief summary; I would 
> like to get early feedback.
> Problem Statement:
> The current implementation of log aggregation creates one HDFS file per 
> {application, nodemanager} pair. These files are relatively small, in the range 
> of 1-2 MB. In a large cluster with lots of applications and many nodemanagers, 
> it ends up creating lots of small files in HDFS. This creates pressure on the 
> HDFS NN in the following ways:
> 1. It increases NN memory size. This is mitigated by having the history server 
> delete old log files in HDFS.
> 2. Runtime RPC load on HDFS. Each log aggregation file introduces several NN 
> RPCs such as create, getAdditionalBlock, complete, and rename. When the cluster 
> is busy, this RPC load has an impact on NN performance.
> In addition, to support non-MR applications on YARN, we might need to support 
> aggregation for long running applications.
> Design choices:
> 1. Don't aggregate all the logs, as in YARN-221.
> 2. Create a dedicated HDFS namespace used only for log aggregation.
> 3. Write logs to some key-value store like HBase. HBase's RPC load on the NN 
> will be much lower.
> 4. Decentralize the application-level log aggregation to NMs. All logs for a 
> given application are aggregated first by a dedicated NM before they are pushed 
> to HDFS.
> 5. Have NMs aggregate logs on a regular basis; each of these log files will 
> have data from different applications, and there needs to be some index for 
> quick lookup.
> Proposal:
> 1. Make YARN log aggregation pluggable for both the read and write paths. Note 
> that Hadoop FileSystem provides an abstraction, and we could ask an alternative 
> log aggregator to implement a compatible FileSystem, but that seems to be 
> overkill.
> 2. Provide a log aggregation plugin that writes to HBase. The schema design 
> needs to support efficient reads on a per-application as well as a 
> per-application+container basis; in addition, it shouldn't create hotspots in a 
> cluster where certain users might create more jobs than others. For example, we 
> can use hash($user + $applicationId) + containerId as the row key (a small 
> sketch follows below).
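
A minimal sketch of that row-key idea, making no claim about the eventual schema; 
plain JDK hashing is used here just for illustration:

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class LogAggregationRowKeyExample {
  // Hash user + applicationId so heavy users don't hot-spot one region, then
  // append the container id so per-container reads stay cheap.
  static byte[] rowKey(String user, String applicationId, String containerId)
      throws NoSuchAlgorithmException {
    byte[] prefix = MessageDigest.getInstance("MD5")
        .digest((user + applicationId).getBytes(StandardCharsets.UTF_8));
    byte[] suffix = containerId.getBytes(StandardCharsets.UTF_8);
    byte[] key = new byte[prefix.length + suffix.length];
    System.arraycopy(prefix, 0, key, 0, prefix.length);
    System.arraycopy(suffix, 0, key, prefix.length, suffix.length);
    return key;
  }
}
{code}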



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not be assigned more containers when its usedResource has reached the maxResource limit

2014-05-21 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: YARN-2083.patch

> In fair scheduler, Queue should not be assigned more containers when its 
> usedResource has reached the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
>  Labels: assignContainer, fair, scheduler
> Fix For: 2.3.0
>
> Attachments: YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when the 
> usedResource equals the maxResource.
> I think we should create a new function, "fitsInWithoutEqual", and use it 
> instead of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2083) In fair scheduler, Queue should not be assigned more containers when its usedResource has reached the maxResource limit

2014-05-21 Thread Yi Tian (JIRA)
Yi Tian created YARN-2083:
-

 Summary: In fair scheduler, Queue should not be assigned more 
containers when its usedResource has reached the maxResource limit
 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian


In the fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck 
to guarantee the queue is not over its limit.
But the fitsIn function in Resource.java does not return false when the 
usedResource equals the maxResource.

I think we should create a new function, "fitsInWithoutEqual", and use it instead 
of "fitsIn" in this case (see the sketch below).



--
This message was sent by Atlassian JIRA
(v6.2#6252)