[jira] [Commented] (YARN-5478) [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
[ https://issues.apache.org/jira/browse/YARN-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948454#comment-15948454 ] Konstantinos Karanasos commented on YARN-5478: -- Hi [~leftnoteasy], bq. I think now we generally agree that we should stop investing on the old ResourceRequest and we should move APIs for new features like Allocation tag, Affinity/Anti-affinity, Node attributes: YARN-4902, to the new ResourceRequest. My understanding was that allocation tags that are attached to containers could indeed be added either in the existing ResourceRequest, in the new ResourceRequest, or in the AllocateRequest object as a map between AllocateRequestID and tags. For the remaining features (affinity, node attributes), I am still not sure there is a need to add them to the (old or new) ResourceRequest object. It seems that adding constraint expressions in the ApplicationSubmissionContext and the AllocateRequest (for more targeted ones) is sufficient for all the use cases we have come across and those mentioned in YARN-4793. I just uploaded a design document in YARN-5468, where we give more details on our thoughts. We tried to address all the points we discussed in our last meeting. Please give it a look and let's continue the discussion. [~Naganarasimha], please also check the document. Based on our latest discussions with Wangda, we included a way to specify node attributes in the constraint expression (using namespaces to differentiate between different types of constraints). > [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
> > > Key: YARN-5478 > URL: https://issues.apache.org/jira/browse/YARN-5478 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5478.1.patch, YARN-5478.2.patch, > YARN-5478.preliminary-poc.1.patch, YARN-5478.preliminary-poc.2.patch > > > Define Java API for application to specify generic scheduling requirements > described in YARN-4902 design doc. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948462#comment-15948462 ] Varun Saxena commented on YARN-6342: Sorry for nitpicking. I would give spaces in between timelinev2client. And say draining "leftover entities" to indicate we are talking about entities. {code} + +The time period for which timelinev2client will wait for draining its queue +after stop. + {code} Let me invoke the build manually. Some failures look weird. Regarding this being an advanced configuration, I felt, upon offline discussion with Rohith, that it would be better to make this config public, as in their testing they found entities were taking up to 3 seconds to flush. So this would largely depend on the AM and the amount of entities it writes on stop. > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch, YARN-6342.01.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
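The shutdown-vs-shutdownNow point in the quoted description can be sketched in isolation. This is a minimal stand-alone example, not the TimelineV2Client code: the class name `DrainOnStop`, the constant `DRAIN_TIME_PERIOD_MS`, and the method `drainThenStop` are invented for illustration, and the drain period is exactly the value this JIRA proposes to make configurable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainOnStop {
    // Hypothetical drain period; in YARN this would come from the new config.
    static final long DRAIN_TIME_PERIOD_MS = 2000;

    static int drainThenStop(ExecutorService executor, AtomicInteger published) {
        // shutdown() (not shutdownNow()) lets already-queued publish tasks
        // finish, which is the behavior argued for in the issue description.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(DRAIN_TIME_PERIOD_MS,
                                           TimeUnit.MILLISECONDS)) {
                executor.shutdownNow(); // give up on whatever is still pending
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return published.get();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger published = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            // Each task stands in for publishing one timeline entity.
            executor.execute(published::incrementAndGet);
        }
        System.out.println("entities published before stop: "
            + drainThenStop(executor, published));
    }
}
```

With shutdownNow() in place of shutdown(), queued tasks would be discarded and the count could come up short, which is the leftover-entity loss being discussed.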
[jira] [Updated] (YARN-6411) Clean up the overwrite of createDispatcher() in subclass of MockRM
[ https://issues.apache.org/jira/browse/YARN-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6411: --- Attachment: YARN-6411.001.patch > Clean up the overwrite of createDispatcher() in subclass of MockRM > -- > > Key: YARN-6411 > URL: https://issues.apache.org/jira/browse/YARN-6411 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Minor > Attachments: YARN-6411.001.patch > > > MockRM creates a object of {{DrainDispatcher}} in YARN-3102. We don't need to > do the same thing in its subclasses. > {code} > @Override > protected Dispatcher createDispatcher() { > return new DrainDispatcher(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948445#comment-15948445 ] Hadoop QA commented on YARN-6395: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 24s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch 
passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core generated 6 new + 23 unchanged - 6 fixed = 29 total (was 29) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core: The patch generated 18 new + 256 unchanged - 1 fixed = 274 total (was 257) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6395 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861162/YARN-6395.yarn-native-services.0002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2e257e6386c3 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | yarn-native-services / c0055cd | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/15431/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15431/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15431/testReport/ | | modules | C: hadoop-yarn-project
[jira] [Commented] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948451#comment-15948451 ] Rohith Sharma K S commented on YARN-6395: - javadoc errors are not related to the patch. > Integrate service app master to write data into ATSv2. > -- > > Key: YARN-6395 > URL: https://issues.apache.org/jira/browse/YARN-6395 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-6395.yarn-native-services.0001.patch, > YARN-6395.yarn-native-services.0002.patch > > > Integration of ATSv2 with the native service app master. ATSv2 data is > used for UI rendering.
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948450#comment-15948450 ] Rohith Sharma K S commented on YARN-6342: - +1 for the patch. Would you look at the build failure details? > Make TimelineV2Client's drain timeout after stop configurable
[jira] [Updated] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5468: - Attachment: LRA-scheduling-design.v1.pdf Attaching the latest design document, reflecting the current design. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: LRA-scheduling-design.v1.pdf, YARN-5468.prototype.patch > > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers on nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once.
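As a rough illustration of the constraint kinds the description mentions (affinity, anti-affinity, cardinality), here is what such a constraint might look like as plain data. All names here (`ConstraintSketch`, `PlacementConstraint`, `allows`) are invented for this sketch and are not the API proposed in the design document.

```java
public class ConstraintSketch {
    enum Kind { AFFINITY, ANTI_AFFINITY, CARDINALITY }

    static final class PlacementConstraint {
        final Kind kind;
        final String targetTag; // allocation tag the constraint refers to
        final int min, max;     // cardinality bounds (unused for pure [anti-]affinity)

        PlacementConstraint(Kind kind, String targetTag, int min, int max) {
            this.kind = kind; this.targetTag = targetTag;
            this.min = min; this.max = max;
        }

        /** Would a node already running tagCountOnNode containers with the
         *  target tag satisfy this constraint? */
        boolean allows(int tagCountOnNode) {
            switch (kind) {
                case AFFINITY:      return tagCountOnNode >= 1; // co-locate
                case ANTI_AFFINITY: return tagCountOnNode == 0; // avoid
                default:            return tagCountOnNode >= min
                                        && tagCountOnNode <= max;
            }
        }
    }

    public static void main(String[] args) {
        // "At most 2 HBase region servers per node" as a cardinality constraint.
        PlacementConstraint c =
            new PlacementConstraint(Kind.CARDINALITY, "hbase-rs", 0, 2);
        System.out.println("node with 3 tagged containers allowed? " + c.allows(3));
    }
}
```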
[jira] [Updated] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6395: Attachment: YARN-6395.yarn-native-services.0002.patch Updated the patch, fixing some of the checkstyle and javadoc issues. Many of the javadoc errors are not from the patch. > Integrate service app master to write data into ATSv2.
[jira] [Issue Comment Deleted] (YARN-6276) Now container kill mechanism may lead process leak
[ https://issues.apache.org/jira/browse/YARN-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-6276: Comment: was deleted (was: Add this issue here.In this patch container signal operation will check current pid is or not own by containerId str. IMO,there is some process does not contain containerId itself. Such as subprocess create by user-code itself.) > Now container kill mechanism may lead process leak > -- > > Key: YARN-6276 > URL: https://issues.apache.org/jira/browse/YARN-6276 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > > When kill bash process, YarnChild may didn`t response because fullgc, -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6276) Now container kill mechanism may lead process leak
[ https://issues.apache.org/jira/browse/YARN-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948427#comment-15948427 ] Feng Yuan commented on YARN-6276: - Adding this issue here. In this patch, the container signal operation checks whether the current pid is owned by the containerId string. IMO, there are some processes that do not contain the containerId itself, such as subprocesses created by user code. > Now container kill mechanism may lead process leak > -- > > Key: YARN-6276 > URL: https://issues.apache.org/jira/browse/YARN-6276 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > > When killing the bash process, YarnChild may not respond because of full GC,
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
[ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948417#comment-15948417 ] Feng Yuan commented on YARN-6277: - I have attached a patch. This issue is due to an inconsistency between the ShuffleHandler and NodeManager configurations: if the NM changes the local-dir configs, the ShuffleHandler does not synchronize with them. > Nodemanager heap memory leak > > > Key: YARN-6277 > URL: https://issues.apache.org/jira/browse/YARN-6277 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > Attachments: YARN-6277.branch-2.8.001.patch > > > Because of the LocalDirHandlerService@LocalDirAllocator mechanism, they > create massive numbers of LocalFileSystem instances, leading to a heap leak.
[jira] [Updated] (YARN-6277) Nodemanager heap memory leak
[ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-6277: Attachment: YARN-6277.branch-2.8.001.patch > Nodemanager heap memory leak > > > Key: YARN-6277 > URL: https://issues.apache.org/jira/browse/YARN-6277 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > Attachments: YARN-6277.branch-2.8.001.patch > > > Because LocalDirHandlerService@LocalDirAllocator`s mechanism,they will create > massive LocalFileSystem.So lead to heap leak. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948390#comment-15948390 ] Sonia Garudi commented on YARN-6141: Any update ? > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi > Labels: ppc64le > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. > [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6403) Invalid local resource request can raise NPE and make NM exit
[ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-6403: --- Attachment: YARN-6403.002.patch [~jlowe] Thanks for correcting me. The last server-side change was not proper and I corrected it as you mentioned. For the client-side change, IIUC the generated protobuf code won't actually throw an NPE for this case. Unit tests for both the client and server changes are added. Attaching a new patch for review; please correct me if I missed something. > Invalid local resource request can raise NPE and make NM exit > - > > Key: YARN-6403 > URL: https://issues.apache.org/jira/browse/YARN-6403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Tao Yang > Attachments: YARN-6403.001.patch, YARN-6403.002.patch > > > Recently we found this problem in our testing environment. The app that > caused this problem added an invalid local resource request (with no location) > into the ContainerLaunchContext like this: > {code} > localResources.put("test", LocalResource.newInstance(location, > LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100, > System.currentTimeMillis())); > ContainerLaunchContext amContainer = > ContainerLaunchContext.newInstance(localResources, environment, > vargsFinal, null, securityTokens, acls); > {code} > The actual value of location was null, although the app didn't expect that. This > mistake caused several NMs to exit with the NPE below; they couldn't restart until > the nm recovery dirs were deleted.
> {code} > FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.(LocalResourceRequest.java:46) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > NPE occured when created LocalResourceRequest instance for invalid resource > request. 
> {code} > public LocalResourceRequest(LocalResource resource) > throws URISyntaxException { > this(resource.getResource().toPath(), //NPE occurred here > resource.getTimestamp(), > resource.getType(), > resource.getVisibility(), > resource.getPattern()); > } > {code} > We can't guarantee the validity of the local resource request now, but we could > avoid damaging the cluster. Perhaps we can verify the resource both in > ContainerLaunchContext and LocalResourceRequest? Please feel free to give > your suggestions.
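The fix direction discussed above (validate the resource up front instead of letting the NPE propagate and kill the NM dispatcher) can be sketched without the YARN classes. `LocalResourceCheck` and its nested `LocalResource` are stand-ins invented for this example; the actual patch validates the real `LocalResource` record and its error type may differ.

```java
/** Minimal sketch of rejecting a local resource whose location is missing,
 *  so one bad request fails instead of the whole NodeManager. */
public class LocalResourceCheck {
    // Stand-in for org.apache.hadoop.yarn.api.records.LocalResource.
    static final class LocalResource {
        private final String resource; // null models the "no location" request
        LocalResource(String resource) { this.resource = resource; }
        String getResource() { return resource; }
    }

    static String toValidatedPath(LocalResource r) {
        if (r.getResource() == null) {
            // Fail just this container's request, not the dispatcher thread.
            throw new IllegalArgumentException("resource location is not set");
        }
        return r.getResource();
    }

    public static void main(String[] args) {
        try {
            toValidatedPath(new LocalResource(null));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid request: " + e.getMessage());
        }
    }
}
```

The same check could live in both places the comment suggests: at ContainerLaunchContext submission time (client-visible error) and again defensively where the request object is built.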
[jira] [Commented] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948259#comment-15948259 ] Hadoop QA commented on YARN-6204: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | 
{color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 52 unchanged - 0 fixed = 53 total (was 52) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 95m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6204 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861126/YARN-6204.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 30463f3ad681 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15430/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15430/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-
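The idea in the YARN-6204 title — set an UncaughtExceptionHandler on the event-handling thread so an escaped RuntimeException is recorded rather than silently killing the thread — can be shown in a self-contained way. This is not the AsyncDispatcher code; the class, thread name, and method here are invented for the demo.

```java
public class DispatcherThreadDemo {
    /** Runs a task on a named thread with an UncaughtExceptionHandler
     *  installed, and returns whatever the handler recorded. */
    static String runWithHandler(Runnable task) {
        StringBuilder log = new StringBuilder();
        Thread eventThread = new Thread(task, "event-handler");
        eventThread.setUncaughtExceptionHandler((t, e) ->
            log.append(t.getName()).append(" died: ").append(e.getMessage()));
        eventThread.start();
        try {
            eventThread.join(); // handler runs before the thread terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runWithHandler(() -> {
            throw new RuntimeException("boom in event handler");
        }));
    }
}
```

Without the handler, the exception would only reach the default handler's stderr stack trace; with it, the dispatcher owner can log and apply its exit-on-error policy.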
[jira] [Created] (YARN-6413) Decouple Yarn Registry API from ZK
Ellen Hui created YARN-6413: --- Summary: Decouple Yarn Registry API from ZK Key: YARN-6413 URL: https://issues.apache.org/jira/browse/YARN-6413 Project: Hadoop YARN Issue Type: Improvement Components: amrmproxy, api, resourcemanager Reporter: Ellen Hui Assignee: Jian He Right now the Yarn Registry API (defined in the RegistryOperations interface) is a very thin layer over Zookeeper. This jira proposes changing the interface to abstract away the implementation details so that we can write a FS-based implementation of the registry service, which will be used to support AMRMProxy HA. The new interface will use register/delete/resolve APIs instead of Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
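A sketch of what a backend-neutral interface along the proposed register/delete/resolve lines might look like, with an in-memory implementation standing in for the future FS-based one. All names here are hypothetical, not the actual RegistryOperations replacement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistrySketch {
    /** Backend-neutral operations: no ZK notions like mknode leak through. */
    interface ServiceRegistry {
        void register(String key, String record);
        String resolve(String key); // null if nothing is registered
        void delete(String key);
    }

    /** In-memory stand-in; a ZK- or FS-backed class would implement the
     *  same interface without callers changing. */
    static class InMemoryRegistry implements ServiceRegistry {
        private final Map<String, String> records = new ConcurrentHashMap<>();
        public void register(String key, String record) { records.put(key, record); }
        public String resolve(String key) { return records.get(key); }
        public void delete(String key) { records.remove(key); }
    }

    public static void main(String[] args) {
        ServiceRegistry reg = new InMemoryRegistry();
        reg.register("/users/app-1/am", "host:port");
        System.out.println(reg.resolve("/users/app-1/am"));
        reg.delete("/users/app-1/am");
    }
}
```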
[jira] [Created] (YARN-6412) aux-services classpath not documented
Miklos Szegedi created YARN-6412: Summary: aux-services classpath not documented Key: YARN-6412 URL: https://issues.apache.org/jira/browse/YARN-6412 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Priority: Minor YARN-4577 introduced two new configuration entries yarn.nodemanager.aux-services.%s.classpath and yarn.nodemanager.aux-services.%s.system-classes. These are not documented in hadoop-yarn-common/.../yarn-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
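For reference, the two undocumented entries from YARN-4577 would be set in yarn-site.xml roughly as below. The values and the aux-service name (mapreduce_shuffle, substituted for the %s) are illustrative placeholders, not the missing yarn-default.xml text.

```xml
<!-- Hypothetical example; %s in the property names is replaced by the
     aux-service name. -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.classpath</name>
  <value>/path/to/aux/service/jars/*</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.system-classes</name>
  <value>java.,javax.</value>
</property>
```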
[jira] [Commented] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948153#comment-15948153 ] Hadoop QA commented on YARN-6202: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 2m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 1s{color} | {color:orange} root: The patch generated 2 new + 313 unchanged - 5 fixed = 315 total (was 318) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 39s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 45s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 6s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 16s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6202 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861109/YARN-6202.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5886c5f6133c 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4966a6e | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948151#comment-15948151 ] Hadoop QA commented on YARN-6342: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} 
| {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 25s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 30s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6342 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861115/YARN-6342.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 32955eff96c3 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/15428/artifact/patchprocess/patch-mvnin
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948147#comment-15948147 ] Hadoop QA commented on YARN-6363: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 9 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 25s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 3s{color} | {color:orange} root: The patch generated 155 new + 270 unchanged - 21 fixed = 425 total (was 291) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 20s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 24s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s{color} | {color:red} hadoop-tools/hadoop-sls generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-rumen in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 37s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-tools/hadoop-sls | | | org.apache.hadoop.yarn.sls.SLSRunner.run(String[]) invokes System.exit(...), which shuts down the entire virtual machine At SLSRunner.java:down the entire virtual machine At SLSRunner.java:[line 744] | | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.reservation.TestReservationSystem | | | hadoop.yarn.server.
[jira] [Created] (YARN-6411) Clean up the overwrite of createDispatcher() in subclass of MockRM
Yufei Gu created YARN-6411: -- Summary: Clean up the overwrite of createDispatcher() in subclass of MockRM Key: YARN-6411 URL: https://issues.apache.org/jira/browse/YARN-6411 Project: Hadoop YARN Issue Type: Task Components: resourcemanager Affects Versions: 3.0.0-alpha2, 2.9.0 Reporter: Yufei Gu Assignee: Yufei Gu Priority: Minor MockRM creates an object of {{DrainDispatcher}} in YARN-3102. We don't need to do the same thing in its subclasses. {code} @Override protected Dispatcher createDispatcher() { return new DrainDispatcher(); } {code}
[jira] [Commented] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948139#comment-15948139 ] Yufei Gu commented on YARN-6204: Uploaded patch 001. Make {{RMCriticalThreadUncaughtExceptionHandler}} inherit from {{YarnUncaughtExceptionHandler}}, so that {{AsyncDispatcher}} doesn't need to deal with {{RMCriticalThreadUncaughtExceptionHandler}} directly. > Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher > - > > Key: YARN-6204 > URL: https://issues.apache.org/jira/browse/YARN-6204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6204.001.patch > > > The event handling thread in AsyncDispatcher is a critical thread in RM. We > should set UncaughtExceptionHandler introduced in YARN-6061 for it.
[jira] [Updated] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6204: --- Attachment: YARN-6204.001.patch > Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher > - > > Key: YARN-6204 > URL: https://issues.apache.org/jira/browse/YARN-6204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6204.001.patch > > > The event handling thread in AsyncDispatcher is a critical thread in RM. We > should set UncaughtExceptionHandler introduced in YARN-6061 for it.
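The handler wiring discussed in YARN-6204 can be sketched as follows. The class names mirror those mentioned in the comments, but the bodies are assumptions for illustration, not the actual Hadoop code: a base handler that logs, a "critical thread" subclass that additionally escalates, and installation on the event-handling thread so the RM notices if that thread dies.

```java
// Illustrative sketch only: a base uncaught-exception handler and a
// critical-thread subclass, installed on an event-handling thread.
public class HandlerSketch {
    static class BaseUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {
        public void uncaughtException(Thread t, Throwable e) {
            // The real YarnUncaughtExceptionHandler logs; we do the same here.
            System.err.println("Thread " + t.getName() + " died: " + e);
        }
    }

    static volatile boolean criticalFailure = false;

    static class CriticalThreadHandler extends BaseUncaughtExceptionHandler {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            super.uncaughtException(t, e);
            // Real RM code would fail fast / transition to standby here.
            criticalFailure = true;
        }
    }

    // Start a thread that throws, wait for it, report whether the handler ran.
    static boolean simulateCriticalFailure() {
        criticalFailure = false;
        Thread eventHandler = new Thread(
                () -> { throw new RuntimeException("boom"); }, "event-handler");
        eventHandler.setUncaughtExceptionHandler(new CriticalThreadHandler());
        eventHandler.start();
        try {
            eventHandler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return criticalFailure;
    }

    public static void main(String[] args) {
        System.out.println("criticalFailure = " + simulateCriticalFailure());
        // prints "criticalFailure = true"
    }
}
```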
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948131#comment-15948131 ] Hadoop QA commented on YARN-5797: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | 
{color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 18s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 289 unchanged - 7 fixed = 290 total (was 296) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5797 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843130/YARN-5797-trunk.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 94aa661c8faf 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15429/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15429/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15429/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > >
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948112#comment-15948112 ] Yufei Gu commented on YARN-5654: Great! Thanks [~rkanter] for the review and commit! > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Fix For: 3.0.0-alpha3 > > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948110#comment-15948110 ] Hudson commented on YARN-5654: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11493 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11493/]) YARN-5654. Not be able to run SLS with FairScheduler (yufeigu via (rkanter: rev 6a5516c2381f107d96b8326939514de3c6e53d3d) * (edit) hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java * (add) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/Tracker.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java * (edit) hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * (add) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java * (delete) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Fix For: 3.0.0-alpha3 > > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > 
With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl
[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948109#comment-15948109 ] Hadoop QA commented on YARN-6004: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 88 unchanged - 7 fixed = 88 total (was 95) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 3s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861114/YARN-6004-trunk.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux cc8cc14f8ee8 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4966a6e | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15427/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15427/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN
[jira] [Assigned] (YARN-6356) Allow different values of yarn.log-aggregation.retain-seconds for succeeded and failed jobs
[ https://issues.apache.org/jira/browse/YARN-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-6356: Assignee: Haibo Chen > Allow different values of yarn.log-aggregation.retain-seconds for succeeded > and failed jobs > --- > > Key: YARN-6356 > URL: https://issues.apache.org/jira/browse/YARN-6356 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Reporter: Robert Kanter >Assignee: Haibo Chen > > It would be useful to have a value of {{yarn.log-aggregation.retain-seconds}} > for succeeded jobs and a different value for failed/killed jobs. For jobs > that succeeded, you typically don't care about the logs, so a shorter > retention time is fine (and saves space/blocks in HDFS). For jobs that > failed or were killed, the logs are much more important, and it's likely to > want to keep them around for longer so you have time to look at them. > For instance, you could set it to keep logs for succeeded jobs for 1 day and > logs for failed/killed jobs for 1 week.
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948070#comment-15948070 ] Robert Kanter commented on YARN-5654: - +1 > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948062#comment-15948062 ] Chris Trezzo commented on YARN-5797: Patch is now available for YARN-6004 as well. Thanks! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular bases. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Attachment: YARN-6342.01.patch > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch, YARN-6342.01.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
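The shutdown-vs-shutdownNow concern quoted above can be illustrated with a minimal, self-contained sketch using plain java.util.concurrent (not the actual TimelineV2Client code): shutdown() stops accepting new work but drains queued tasks up to a timeout, whereas shutdownNow() would discard them. The drain timeout here stands in for the configurable drain period this issue proposes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the drain-on-stop pattern: let queued tasks finish within a
// bounded drain period, then force-stop whatever remains.
public class DrainOnStop {
    public static int runAndDrain(int tasks, long drainMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.execute(done::incrementAndGet);
        }
        executor.shutdown(); // no new work, but queued tasks still run
        try {
            if (!executor.awaitTermination(drainMillis, TimeUnit.MILLISECONDS)) {
                executor.shutdownNow(); // drain period expired; give up
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // All 100 trivial tasks complete within the 5-second drain period.
        System.out.println(runAndDrain(100, 5000)); // 100
    }
}
```

Had the sketch called shutdownNow() first, as the quoted stop() does, queued entities could be dropped before the awaitTermination call ever ran.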
[jira] [Commented] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948050#comment-15948050 ] Hadoop QA commented on YARN-6354: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 14m 31s | trunk passed |
| +1 | compile | 0m 32s | trunk passed |
| +1 | checkstyle | 0m 26s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 3s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
| -1 | unit | 39m 8s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 62m 16s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6354 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861105/YARN-6354.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 4fb5e828bdd0 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4966a6e |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/15424/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15424/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15424/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://iss
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Summary: Make TimelineV2Client's drain timeout after stop configurable (was: Make TimelineV2Client's drain period after stop configurable) > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6004: --- Attachment: YARN-6004-trunk.002.patch Attached is v2 for trunk to address checkstyle issues. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-trunk.001.patch, YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948024#comment-15948024 ] Haibo Chen commented on YARN-6342: -- Thanks for your review, [~varun_saxena]! bq. We need to add this configuration in yarn-default.xml I was under the impression that this would be an advanced configuration that people rarely change, so I did not add it to yarn-default.xml to advertise it. I'll address this along with the rest of your comments. > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948016#comment-15948016 ] Subru Krishnan commented on YARN-6363: -- Thanks [~curino] for working on this really useful extension to SLS. Can you kindly move the documentation to the SLS md to ensure it gets published to the apache hadoop wiki. > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-5106: -- Assignee: (was: Yufei Gu) > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla > Labels: newbie++ > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6202: --- Description: Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) is always true, no matter what value is set in the configuration files. This misleads users. Two solutions: # Remove the configuration item and provide a method that allows {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable related unit tests. There is no need for a false value in a real daemon, since a daemon should crash if its dispatcher quits. # Make it default to true instead of false, so that we don't need to hard-code it to true in the RM and NM; it remains configurable, and we also provide a method to enable related unit tests. Other than that, the code around it needs refactoring. {{public static final}} on an interface field isn't necessary, and YARN-related configuration items should live in the YarnConfiguration class. was: Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) is always true, no matter what value is set in the configuration files. This misleads users. Two solutions: # Remove the configuration item and provide a method that allows {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable related unit tests; the assumption is that there is no need for a false value in a real daemon, which is valid according to MAPREDUCE-3634, which introduced the configuration item. # Make it default to true instead of false, so that we don't need to hard-code it to true in the RM and NM; it remains configurable, and we also provide a method to enable related unit tests. Other than that, the code around it needs refactoring. {{public static final}} on an interface field isn't necessary, and YARN-related configuration items should live in the YarnConfiguration class. 
> Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > always be true no matter what value in configuration files. This misleads > users. Two solutions: > # Remove the configuration item and provide a method to allow > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests. There is no need for false value in a real daemon since > daemons should crash if its dispatcher quit. > # Make it default true instead of false, so that we don't need to hard code > it to be true in RM and NM, it is still configurable, and also provide method > to enable related unit tests. > Other than that, the code around it needs to refactor. {{public static > final}} for a variable of interface isn't necessary, and YARN related > configure item should be in class YarnConfiguration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948001#comment-15948001 ] Yufei Gu commented on YARN-6202: Uploaded patch v1. It removes the configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY and adds methods to {{AsyncDispatcher}} and {{EventDispatcher}} to disable exit on exception/error, which are invoked by unit tests. > Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > is always true, no matter what value is set in the configuration files. This > misleads users. Two solutions: > # Remove the configuration item and provide a method that allows > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests; the assumption is that there is no need for a false value > in a real daemon, which is valid according to MAPREDUCE-3634, which > introduced the configuration item. > # Make it default to true instead of false, so that we don't need to > hard-code it to true in the RM and NM; it remains configurable, and we also > provide a method to enable related unit tests. > Other than that, the code around it needs refactoring. {{public static > final}} on an interface field isn't necessary, and YARN-related > configuration items should live in the YarnConfiguration class.
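Solution 1 from the description can be sketched with a toy class rather than YARN's real AsyncDispatcher (the class and method names below are illustrative): the exit-on-error flag is hard-wired to true, and only a test-visible method can disable it, so no configuration key is needed:

```java
// Hypothetical sketch of "remove the config key, add a test-only hook".
// MiniDispatcher is a stand-in, not YARN's actual AsyncDispatcher.
public class MiniDispatcher {
    private boolean exitOnError = true; // always true in a real daemon

    // Visible for testing only: unit tests call this so a bad event
    // doesn't terminate the JVM running the test.
    public void disableExitOnError() {
        exitOnError = false;
    }

    // Returns "exit" or "log" instead of actually calling System.exit,
    // so the decision is observable in this sketch.
    public String onDispatchError(Throwable t) {
        return exitOnError ? "exit" : "log";
    }

    public static void main(String[] args) {
        MiniDispatcher d = new MiniDispatcher();
        System.out.println(d.onDispatchError(new RuntimeException())); // exit
        d.disableExitOnError();
        System.out.println(d.onDispatchError(new RuntimeException())); // log
    }
}
```

The design point is that the daemon-facing behavior stays non-configurable (crash on dispatcher error), while tests get an explicit escape hatch instead of a misleading configuration key.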
[jira] [Updated] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6202: --- Attachment: YARN-6202.001.patch > Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > always be true no matter what value in configuration files. This misleads > users. Two solutions: > # Remove the configuration item and provide a method to allow > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests, the assumption is there is no need for false value in a > real daemon, this is valid according to MAPREDUCE-3634 which introduce the > configuration item. > # Make it default true instead of false, so that we don't need to hard code > it to be true in RM and NM, it is still configurable, and also provide method > to enable related unit tests. > Other than that, the code around it needs to refactor. {{public static > final}} for a variable of interface isn't necessary, and YARN related > configure item should be in class YarnConfiguration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947980#comment-15947980 ] Carlo Curino commented on YARN-6363: v4 of the patch contains corrected command-line scripts to run the SLS, plus a few fixes. During a long run, we noticed the NPE mentioned in YARN-6408 (not sure whether it is due to SLS or occurs in normal operation as well; [~sunilg] is investigating). > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS.
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: YARN-6363.v4.patch > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: YARN-6363 overview.pdf Minor fix in doc (commandline param) > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: (was: YARN-6363 overview.pdf) > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-6354: - Attachment: YARN-6354.001.patch Patch that adds a termination check for the reservation key traversal loop and a unit test. > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://issues.apache.org/jira/browse/YARN-6354 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe > Attachments: YARN-6354.001.patch > > > When trying to upgrade an RM to 2.8 it fails with a > StringIndexOutOfBoundsException trying to load reservation state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
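The termination check described in the patch summary can be sketched as follows, with a sorted TreeMap standing in for the leveldb key space (the key layout here is illustrative, not the store's actual schema). Because keys are sorted, the reservation keys are contiguous, so iteration must stop at the first key outside the prefix; otherwise the loop would try to parse unrelated keys, which is how a malformed key can trigger a StringIndexOutOfBoundsException:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of prefix-bounded iteration over a sorted key space.
public class PrefixScanSketch {
    public static int countKeysWithPrefix(TreeMap<String, String> store, String prefix) {
        int count = 0;
        // tailMap starts at the first key >= prefix; sorted order means all
        // matching keys are contiguous from there.
        for (Map.Entry<String, String> e : store.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // termination check: we left the reservation key range
            }
            count++; // only keys under the prefix are ever parsed
        }
        return count;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("ReservationSystemRoot/plan1", "a");
        store.put("ReservationSystemRoot/plan2", "b");
        store.put("ZKRoot/epoch", "c"); // different subtree; must not be parsed
        System.out.println(countKeysWithPrefix(store, "ReservationSystemRoot/")); // 2
    }
}
```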
[jira] [Assigned] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-6354: Assignee: Jason Lowe > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://issues.apache.org/jira/browse/YARN-6354 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-6354.001.patch > > > When trying to upgrade an RM to 2.8 it fails with a > StringIndexOutOfBoundsException trying to load reservation state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947957#comment-15947957 ] Miklos Szegedi commented on YARN-5301: -- [~bibinchundatt], do you have time to take a look? > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch, YARN-5301.001.patch > > > on ubuntu with linux kernel 3.19, , NM start failed if enable auto mount > cgroup. try command: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpufail > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > succ -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947955#comment-15947955 ] Karthik Kambatla commented on YARN-6407: On large clusters, we have been recommending that our customers turn off continuous scheduling. Are you sure you need continuous scheduling? > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster we have 5k nodes, and each server has 1 Gigabit Ethernet, so > the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we > must set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases assignment throughput, so > we start the ContinuousSchedulingThread. > With more applications running in our cluster, and with the > ContinuousSchedulingThread, lock contention has become more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is > occasionally high, and we worry that more problems will occur as more > applications run. > Here is our causal chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch changes the object > lock to a read-write lock, which is still coarse-grained. I think we should > lock individual resources rather than large sections of code. 
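For context on the read-write-lock approach the reporter mentions from YARN-3091, here is a minimal sketch of the trade-off being debated; the class, field, and method names are illustrative, not YARN's actual scheduler code. A ReentrantReadWriteLock lets many readers (e.g. state lookups during heartbeats) proceed concurrently while allocations still take an exclusive write lock, but as the reporter notes, it is still one coarse lock over the whole scheduler state:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: one read-write lock guarding all scheduler state.
public class SchedulerLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private int availableVcores = 100;

    public int readAvailable() {
        lock.readLock().lock(); // shared: many readers at once
        try {
            return availableVcores;
        } finally {
            lock.readLock().unlock();
        }
    }

    public boolean tryAllocate(int vcores) {
        lock.writeLock().lock(); // exclusive: one allocator at a time
        try {
            if (availableVcores < vcores) {
                return false;
            }
            availableVcores -= vcores;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerLockSketch s = new SchedulerLockSketch();
        System.out.println(s.tryAllocate(30)); // true
        System.out.println(s.readAvailable()); // 70
    }
}
```

The finer-grained alternative the reporter argues for would instead lock individual resource objects, shrinking each critical section at the cost of more complex lock ordering.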
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947892#comment-15947892 ] Varun Saxena commented on YARN-6342: Thanks [~haibochen] for the patch. It looks good to me in general. I have a few comments on the config though. # I think the configuration should start with {{yarn.timeline-service.client.}}. You can use TIMELINE_SERVICE_CLIENT_PREFIX instead of TIMELINE_SERVICE_PREFIX in YarnConfiguration. # How about naming the configuration {{yarn.timeline-service.client.drain-entities.timeout.ms}}? I think the configuration description can tell that it's done on stop. # We need to add this configuration in yarn-default.xml > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947891#comment-15947891 ] Hadoop QA commented on YARN-5301:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 15m 1s | trunk passed |
| +1 | compile | 0m 27s | trunk passed |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvnsite | 0m 26s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 40s | trunk passed |
| +1 | javadoc | 0m 17s | trunk passed |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 24s | the patch passed |
| +1 | cc | 0m 24s | the patch passed |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 15s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 4 unchanged - 4 fixed = 4 total (was 8) |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 45s | the patch passed |
| +1 | javadoc | 0m 15s | the patch passed |
| +1 | unit | 12m 45s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 34m 38s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861093/YARN-5301.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux d852e53ce52f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 640ba1d |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15423/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15423/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> NM mount cpu cgroups failed on some systems
> ---
>
> Key: YARN-5301
> URL: https://issues.apache.org/jira/browse/YARN-5301
> Project: Hadoop YARN
> Issue Type: Bug
[jira] [Commented] (YARN-6109) Add an ability to convert ChildQueue to ParentQueue
[ https://issues.apache.org/jira/browse/YARN-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947886#comment-15947886 ] Xuan Gong commented on YARN-6109: - Thanks for the comments, [~subru]. After careful investigation, I still prefer to create a separate ticket to make sure the reservation system works for the stop/delete/convert queue scenarios. And we should do the following: * To delete or convert a queue, we will make sure the current queue is in the STOPPED state, which means we should stop the queue first. For the reservation system/plan queue, we should check whether the current leaf queue (or, if it is a parent queue, its children) has any future reservations. If it does not, we can stop the queue; otherwise, we should throw an exception and ask the user to remove the reservations first. * To submit a reservation creation request, we should always make sure the current leaf queue is in the RUNNING state; otherwise, we should reject the request. [~leftnoteasy] [~subru] If you are fine with the plan, I will create a separate ticket. > Add an ability to convert ChildQueue to ParentQueue > --- > > Key: YARN-6109 > URL: https://issues.apache.org/jira/browse/YARN-6109 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6109.1.patch, YARN-6109.2.patch, > YARN-6109.rebase.patch > >
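The two checks proposed above can be sketched as follows. The class and method names here are illustrative only, not the CapacityScheduler API: a queue may be stopped only when it (or, for a parent, any of its children) holds no future reservations, and reservation submission requires a RUNNING leaf queue:

```java
import java.util.ArrayList;
import java.util.List;

public class QueueStateCheck {
  enum State { RUNNING, STOPPED }

  // Hypothetical stand-in for a scheduler queue.
  static class Queue {
    State state = State.RUNNING;
    int futureReservations = 0;
    List<Queue> children = new ArrayList<>();
  }

  // For a parent queue, recurse into the children; for a leaf,
  // look at its own reservations.
  static boolean hasFutureReservations(Queue q) {
    if (!q.children.isEmpty()) {
      return q.children.stream().anyMatch(QueueStateCheck::hasFutureReservations);
    }
    return q.futureReservations > 0;
  }

  // Stopping is a precondition for delete/convert in the plan above.
  static void stopQueue(Queue q) {
    if (hasFutureReservations(q)) {
      throw new IllegalStateException("remove reservations first");
    }
    q.state = State.STOPPED;
  }

  // Reservation creation requests are rejected on non-RUNNING queues.
  static boolean canAcceptReservation(Queue leaf) {
    return leaf.state == State.RUNNING;
  }

  public static void main(String[] args) {
    Queue leaf = new Queue();
    stopQueue(leaf); // no reservations, so the stop succeeds
    System.out.println(leaf.state + " " + canAcceptReservation(leaf));
  }
}
```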
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947841#comment-15947841 ] Daniel Templeton commented on YARN-6302: LGTM. Can you please post a patch to the JIRA so that Jenkins has something to chew on? > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM Blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. > {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code}
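The improvement suggested in the description could look roughly like this. The constant and method names are assumptions for illustration, not the actual NodeManager code; exit code 24 ("configuration not found") is the one cited in the issue:

```java
public class LceExitCheck {
  // Hypothetical subset of container-executor exit codes; 24 is the
  // "configuration not found" error mentioned in the issue description.
  static final int INVALID_CONFIG_FILE = 24;

  // Config errors do not fix themselves, so rather than failing every
  // app scheduled onto the node, treat the error as fatal for the node
  // (stop allocating to it, or shut it down entirely).
  static boolean isFatalNodeError(int exitCode) {
    return exitCode == INVALID_CONFIG_FILE;
  }

  public static void main(String[] args) {
    System.out.println(isFatalNodeError(24) + " " + isFatalNodeError(0));
  }
}
```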
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947838#comment-15947838 ] ASF GitHub Bot commented on YARN-6302: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108779597 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -580,19 +579,19 @@ public int launchContainer(ContainerStartContext ctx) logOutput(diagnostics); container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics)); -if (exitCode == LinuxContainerExecutorExitCode. +if (exitCode == ExitCode. --- End diff -- I did not run my last git push. It should be fixed now. > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM Blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code}
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Attachment: YARN-5301.001.patch Fixing checkstyle issues. > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch, YARN-5301.001.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto-mounting cgroups is enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu fails > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > succeeds
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947816#comment-15947816 ] Hadoop QA commented on YARN-5301:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 13m 50s | trunk passed |
| +1 | compile | 0m 29s | trunk passed |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 29s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 45s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 27s | the patch passed |
| +1 | cc | 0m 27s | the patch passed |
| +1 | javac | 0m 27s | the patch passed |
| -0 | checkstyle | 0m 17s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 9 new + 8 unchanged - 0 fixed = 17 total (was 8) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 55s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
| +1 | unit | 13m 24s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 34m 51s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861086/YARN-5301.000.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux 3f21c1c856ad 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 640ba1d |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15422/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15422/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15422/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> NM mount cpu cgroups failed on some systems
[jira] [Commented] (YARN-6246) Identifying starved apps does not need the scheduler writelock
[ https://issues.apache.org/jira/browse/YARN-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947815#comment-15947815 ] Daniel Templeton commented on YARN-6246: Thanks, [~kasha]. Looking at the patch, I'm concerned about two elements of state that are no longer protected by a lock: {{FSAppAttempt.demand}} and {{FSLeafQueue.lastTimeAtFairShare}}. It looks to me like both are used from different threads with no locking. {{FSLeafQueue.dumpStateInternal()}} appears to be the primary problem. > Identifying starved apps does not need the scheduler writelock > -- > > Key: YARN-6246 > URL: https://issues.apache.org/jira/browse/YARN-6246 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: YARN-6246.001.patch, YARN-6246.002.patch, > YARN-6246.003.patch > > > Currently, the starvation checks are done holding the scheduler writelock. We > are probably better off doing this outside.
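One common way to make such fields safe to read without holding the scheduler lock is to publish them through an atomic holder (or a volatile field). This is a minimal sketch under that assumption, not the actual FSAppAttempt code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DemandSnapshot {
  // Cross-thread state: written by the scheduler update thread,
  // read by dumping/starvation-check threads without a lock.
  private final AtomicLong demand = new AtomicLong(0L);

  // Called by the update thread while recomputing demand.
  void updateDemand(long newDemand) {
    demand.set(newDemand);
  }

  // Safe to call from any thread (e.g. a state-dumping thread):
  // the atomic read guarantees visibility of the latest write.
  long getDemand() {
    return demand.get();
  }

  public static void main(String[] args) {
    DemandSnapshot s = new DemandSnapshot();
    s.updateDemand(4096);
    System.out.println(s.getDemand());
  }
}
```

A plain non-volatile field, by contrast, gives the reader thread no visibility guarantee, which is the kind of hazard the comment above raises.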
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947809#comment-15947809 ] Hadoop QA commented on YARN-6342:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 14m 32s | trunk passed |
| +1 | compile | 11m 19s | trunk passed |
| +1 | checkstyle | 0m 56s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | mvneclipse | 0m 39s | trunk passed |
| +1 | findbugs | 2m 22s | trunk passed |
| +1 | javadoc | 1m 4s | trunk passed |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 9m 26s | the patch passed |
| +1 | javac | 9m 26s | the patch passed |
| +1 | checkstyle | 0m 52s | the patch passed |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | mvneclipse | 0m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 33s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
| +1 | unit | 0m 33s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 2m 26s | hadoop-yarn-common in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 61m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6342 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861083/YARN-6342.00.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux d778fa99cbee 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 13c766b |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15421/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15421/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-5685) RM configuration allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947797#comment-15947797 ] Hudson commented on YARN-5685: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11491 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11491/]) YARN-5685. RM configuration allows all failover methods to disabled when (templedf: rev 640ba1d23fe8b8105bae6d342ddc1c839302f8e5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > RM configuration allows all failover methods to disabled when automatic > failover is enabled > --- > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag. 
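The kind of validation this fix introduces can be sketched as follows. The method and mechanism names are illustrative assumptions, not the exact HAUtil API: when automatic failover is enabled, at least one failover mechanism (such as the embedded elector) must also be enabled, and configuration loading should fail fast otherwise, instead of silently leaving every RM in standby:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HaConfigCheck {
  // Reject the configuration up front rather than letting the cluster
  // start with no way to elect an active RM.
  static void verifyFailoverConfig(boolean automaticFailover,
                                   List<String> enabledMechanisms) {
    if (automaticFailover && enabledMechanisms.isEmpty()) {
      throw new IllegalArgumentException(
          "automatic failover is enabled but no failover method is enabled");
    }
  }

  public static void main(String[] args) {
    // A valid configuration passes silently.
    verifyFailoverConfig(true, Arrays.asList("embedded-elector"));
    try {
      // The broken configuration described in the issue is rejected.
      verifyFailoverConfig(true, Collections.emptyList());
      System.out.println("no error");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected");
    }
  }
}
```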
[jira] [Updated] (YARN-6200) Revert YARN-5068
[ https://issues.apache.org/jira/browse/YARN-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-6200: - Fix Version/s: (was: 2.8.1) > Revert YARN-5068 > > > Key: YARN-6200 > URL: https://issues.apache.org/jira/browse/YARN-6200 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha3 > > > As Vinod > [commented|https://issues.apache.org/jira/browse/YARN-5068?focusedCommentId=15867356&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15867356], > the functionality of YARN-5068 is achieved by YARN-1623. So, YARN-5068 needs > to be reverted.
[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947789#comment-15947789 ] Daniel Templeton commented on YARN-3427: The methods were made public in 2.7 specifically for Tez, but they were exposed as deprecated with the very explicit plan of removing them in 3.0. > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-3427.000.patch, YARN-3427.001.patch > > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove them in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases in which the class is marked Public.
[jira] [Created] (YARN-6410) FSContext.scheduler should be final
Daniel Templeton created YARN-6410: -- Summary: FSContext.scheduler should be final Key: YARN-6410 URL: https://issues.apache.org/jira/browse/YARN-6410 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.8.0 Reporter: Daniel Templeton Priority: Minor {code} private FairScheduler scheduler; {code}
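The requested change can be sketched as follows (a simplified stand-in, not the real FSContext or FairScheduler classes): making the field final forces it to be assigned exactly once in the constructor, which also documents that the reference never changes after construction:

```java
public class FSContext {
  // was: private FairScheduler scheduler;
  private final FairScheduler scheduler;

  FSContext(FairScheduler scheduler) {
    // A final field must be set here; the compiler rejects any later
    // reassignment, ruling out accidental mutation.
    this.scheduler = scheduler;
  }

  FairScheduler getScheduler() {
    return scheduler;
  }

  // Minimal stand-in so the sketch compiles on its own.
  static class FairScheduler { }

  public static void main(String[] args) {
    FairScheduler fs = new FairScheduler();
    System.out.println(new FSContext(fs).getScheduler() == fs);
  }
}
```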
[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947786#comment-15947786 ] Siddharth Seth commented on YARN-3427: -- From a Tez perspective, we would prefer if the methods were left in place. If this was something that was fixed in 2.6, that would have been easier to work with. Since the new methods were in 2.7, Tez will need to introduce a shim for this. > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-3427.000.patch, YARN-3427.001.patch > > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove them in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases in which the class is marked Public.
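The kind of shim mentioned above is typically done with reflection, so the same Tez jar works whether the old methods exist (2.x) or have been removed (3.0). This is only an illustrative sketch: the method names probed here reflect my understanding of ResourceCalculatorProcessTree's deprecated/new accessors and should be verified, and the stand-in class below exists only so the example runs on its own:

```java
import java.lang.reflect.Method;

public class ProcessTreeShim {
  // Stand-in that, like a 3.0 process tree, has only the "new" method.
  static class FakeTree {
    public long getRssMemorySize() { return 1024; }
  }

  // Probe for the old accessor first and fall back to the new one,
  // so the caller compiles and runs against either Hadoop line.
  static long getMemory(Object tree) throws Exception {
    Method m;
    try {
      // Assumed name of the deprecated pre-3.0 accessor.
      m = tree.getClass().getMethod("getCumulativeRssmem");
    } catch (NoSuchMethodException e) {
      m = tree.getClass().getMethod("getRssMemorySize");
    }
    return (Long) m.invoke(tree);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(getMemory(new FakeTree()));
  }
}
```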
[jira] [Moved] (YARN-6409) RM does not blacklist node for AM launch failures
[ https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen moved MAPREDUCE-6872 to YARN-6409: - Affects Version/s: (was: 3.0.0-alpha2) 3.0.0-alpha2 Target Version/s: 3.0.0-alpha3 (was: 3.0.0-alpha3) Component/s: (was: resourcemanager) resourcemanager Key: YARN-6409 (was: MAPREDUCE-6872) Project: Hadoop YARN (was: Hadoop Map/Reduce) > RM does not blacklist node for AM launch failures > - > > Key: YARN-6409 > URL: https://issues.apache.org/jira/browse/YARN-6409 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >
[jira] [Updated] (YARN-5685) RM configuration allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5685: --- Summary: RM configuration allows all failover methods to disabled when automatic failover is enabled (was: RM allows all failover methods to disabled when automatic failover is enabled) > RM configuration allows all failover methods to disabled when automatic > failover is enabled > --- > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag.
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Summary: NM mount cpu cgroups failed on some systems (was: NM mount cpu cgroups failed on some system) > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto mount cgroup is > enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu (fails) > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > (succeeds) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5685) RM allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5685: --- Summary: RM allows all failover methods to disabled when automatic failover is enabled (was: Non-embedded HA failover is broken) > RM allows all failover methods to disabled when automatic failover is enabled > - > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some system
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Attachment: YARN-5301.000.patch The attached patch fixes the issue by identifying any premounted cgroups and the subsystem set for each mount point. If the user specifies the mount option, that means the existing mount point might not be usable (access, design, etc.), so we remount it with the existing subsystem set (cpu, cpuacct in most cases). I tested it on CentOS 6 and Ubuntu 16 with kernel 4.4, and I am testing it on RHEL 7 now. I also updated the unit tests and fixed a few compiler warnings in those files. > NM mount cpu cgroups failed on some system > -- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto mount cgroup is > enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu (fails) > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > (succeeds) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
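The premount-detection step described in the patch comment above (find each existing cgroup mount point and the subsystem set attached to it) can be sketched roughly as follows. The class and method names here are illustrative only; the actual NM work happens in the native container-executor and the cgroups handler, not in a standalone scanner like this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: map each pre-mounted cgroup mount point to its subsystems. */
public class CgroupMountScanner {
  static Map<String, Set<String>> scan(BufferedReader procMounts)
      throws IOException {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    String line;
    while ((line = procMounts.readLine()) != null) {
      // /proc/mounts format: device mountpoint fstype options dump pass
      String[] f = line.split("\\s+");
      if (f.length >= 4 && "cgroup".equals(f[2])) {
        Set<String> subsystems = new TreeSet<>();
        for (String opt : f[3].split(",")) {
          // keep only controller names, drop rw/relatime and friends
          if (opt.equals("cpu") || opt.equals("cpuacct")
              || opt.equals("memory") || opt.equals("blkio")) {
            subsystems.add(opt);
          }
        }
        result.put(f[1], subsystems);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    String sample =
        "sysfs /sys sysfs rw,relatime 0 0\n"
      + "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,relatime,cpu,cpuacct 0 0\n";
    System.out.println(scan(new BufferedReader(new StringReader(sample))));
  }
}
```

Knowing the full subsystem set per mount point is what lets the remount preserve the combined controllers (cpu,cpuacct) instead of trying to mount cpu alone, which is what failed on the newer kernels.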
[jira] [Commented] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947763#comment-15947763 ] Hadoop QA commented on YARN-5952: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | 
{color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 46s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5952 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861074/YARN-5952-YARN-5734.008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 96af0bf6102c 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-5734 / 25d2028 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15419/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15419/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15419/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-
[jira] [Commented] (YARN-6195) Export UsedCapacity and AbsoluteUsedCapacity to JMX
[ https://issues.apache.org/jira/browse/YARN-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947730#comment-15947730 ] Jason Lowe commented on YARN-6195: -- Latest patch lgtm, with the caveat that I don't think we can really support used capacity and absolute used capacity in the queue metrics without having per-partition queue metrics. Using a max across partitions seems like a reasonable value to report given we're trying to squash multiple values into a single field. [~leftnoteasy] do you have any concerns about using a max-across-partitions approach here? If not then I think this is ready to go. > Export UsedCapacity and AbsoluteUsedCapacity to JMX > --- > > Key: YARN-6195 > URL: https://issues.apache.org/jira/browse/YARN-6195 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, metrics, yarn >Affects Versions: 3.0.0-alpha3 >Reporter: Benson Qiu >Assignee: Benson Qiu > Attachments: YARN-6195.001.patch, YARN-6195.002.patch, > YARN-6195.003.patch > > > `usedCapacity` and `absoluteUsedCapacity` are currently not available as JMX. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
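The max-across-partitions approach discussed above amounts to collapsing the per-partition values into the single number a JMX gauge can report. A minimal sketch, with hypothetical names (the real change would live in the capacity-scheduler queue metrics classes):

```java
import java.util.Map;

/** Sketch: squash per-partition used-capacity values into one gauge
 *  value by taking the max across partitions. */
public class PartitionGaugeSketch {
  static float maxAcrossPartitions(Map<String, Float> usedCapacityByPartition) {
    float max = 0f; // capacities are non-negative fractions
    for (float v : usedCapacityByPartition.values()) {
      max = Math.max(max, v);
    }
    return max;
  }
}
```

The max is a conservative summary: it surfaces the most-loaded partition, at the cost of hiding which partition that is, which is why per-partition queue metrics would still be the complete fix.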
[jira] [Commented] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947723#comment-15947723 ] Hadoop QA commented on YARN-6376: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 39s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 19m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6376 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861075/YARN-6376.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f2fbd7eac1bc 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 13c766b | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15420/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15420/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947719#comment-15947719 ] Haibo Chen commented on YARN-6342: -- Uploaded a simple patch to make the drain period configurable. We could take up [~jrottinghuis]'s suggestion of a global drain period for all clients in a follow-up jira. > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Attachment: YARN-6342.00.patch > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Summary: Make TimelineV2Client's drain period after stop configurable (was: Issues in async API of TimelineClient) > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
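The shutdown-vs-shutdownNow question in the quoted description is the standard graceful-drain pattern: stop accepting new work, wait a bounded period for queued tasks, then force-kill. A hedged sketch of that pattern follows; the method shape is illustrative, not the actual TimelineV2ClientImpl code, and the drain period is the value this jira makes configurable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch: drain queued publish tasks for a bounded period on stop,
 *  instead of dropping them immediately with shutdownNow(). */
public class DrainOnStopSketch {
  static boolean stop(ExecutorService executor, long drainPeriodMs)
      throws InterruptedException {
    executor.shutdown(); // stop accepting work, but let queued tasks run
    boolean drained =
        executor.awaitTermination(drainPeriodMs, TimeUnit.MILLISECONDS);
    if (!drained) {
      executor.shutdownNow(); // give up after the configurable drain period
    }
    return drained;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService ex = Executors.newSingleThreadExecutor();
    ex.submit(() -> { }); // stands in for a pending entity publish
    System.out.println(stop(ex, 2000));
  }
}
```

The original code calls awaitTermination after shutdownNow, which waits only for the already-running task; switching to shutdown first is what turns the wait into an actual drain.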
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947706#comment-15947706 ] ASF GitHub Bot commented on YARN-6302: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108759529 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -551,16 +550,16 @@ public int launchContainer(ContainerStartContext ctx) } else { LOG.info( "Container was marked as inactive. Returning terminated error"); -return ExitCode.TERMINATED.getExitCode(); +return ContainerExecutor.ExitCode.TERMINATED.getExitCode(); --- End diff -- Ah, missed that. > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947687#comment-15947687 ] ASF GitHub Bot commented on YARN-6302: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108757816 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -580,19 +579,19 @@ public int launchContainer(ContainerStartContext ctx) logOutput(diagnostics); container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics)); -if (exitCode == LinuxContainerExecutorExitCode. +if (exitCode == ExitCode. --- End diff -- What am I missing? It doesn't look like anything changed... > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
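The improvement proposed in this issue — classify certain container-executor exit codes, like the 24 ("configuration not found") quoted in the description, as node-level rather than per-application failures — could look like the minimal sketch below. The class, method, and constant names are hypothetical, not the eventual patch.

```java
import java.util.Set;

/** Sketch: decide whether an LCE launch failure indicates a broken node. */
public class LceNodeHealthSketch {
  // 24 is the "configuration not found" exit code from the issue text;
  // treating it as node-fatal is the proposal, not existing behavior.
  private static final Set<Integer> NODE_FATAL_EXIT_CODES = Set.of(24);

  static boolean shouldFailNode(int lceExitCode) {
    return NODE_FATAL_EXIT_CODES.contains(lceExitCode);
  }
}
```

On a node-fatal code, the NM could mark itself unhealthy (so the RM stops scheduling there) instead of letting the node keep accepting and failing containers.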
[jira] [Commented] (YARN-5478) [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
[ https://issues.apache.org/jira/browse/YARN-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947674#comment-15947674 ] Naganarasimha G R commented on YARN-5478: - Had an offline discussion about the above two points with [~leftnoteasy]; as per it: # If the node attribute expression is placed in {{affinityTargets}}, nodes satisfying the expression will be selected; if placed in {{antiAffinityTargets}}, nodes *not* satisfying it will be selected. The advantage is that we can have combinations of node label expressions, like delayed-or(placement-strategy-1, placement-strategy-2), where different placement strategies have different node attribute expressions. # If a *PlacementStrategy* has scope *RACK* and a node attribute affinity target, we can throw an exception and not accept the RR. > [YARN-4902] Define Java API for generalized & unified scheduling-strategies. > > > Key: YARN-5478 > URL: https://issues.apache.org/jira/browse/YARN-5478 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5478.1.patch, YARN-5478.2.patch, > YARN-5478.preliminary-poc.1.patch, YARN-5478.preliminary-poc.2.patch > > > Define Java API for application to specify generic scheduling requirements > described in YARN-4902 design doc. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947664#comment-15947664 ] Hadoop QA commented on YARN-3760: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-3760 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861071/YARN-3760.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b2f1d744a65e 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 15e3873 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15418/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15418/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical
[jira] [Updated] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6361: --- Description: FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of the time is spent in FairShareComparator.compare. We could improve this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting by a precomputed value inside instead {{O(n*log\(n\))}}. This could be a performance issue when there is a huge number of applications in a single queue. The attachments show the performance impact when there are 10k applications in one queue. (was: FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of the time is spent in FairShareComparator.compare. We could improve this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting by a precomputed value inside instead {{O(n*log\(n\))}}.) > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: Yufei Gu >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by a precomputed value inside instead {{O(n*log\(n\))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
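The optimization in the description above — move the expensive per-app computation out of the comparator so it runs O(n) times instead of O(n log n) times — is the classic decorate-then-sort pattern. A sketch with stand-in types follows; App and the ratio computation are illustrative placeholders for FSAppAttempt and the FairShareComparator logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: compute each app's sort key once, then sort on the cached key. */
public class FetchAppsSortSketch {
  // Illustrative stand-in for FSAppAttempt.
  static class App {
    final String id; final long usage; final long fairShare;
    App(String id, long usage, long fairShare) {
      this.id = id; this.usage = usage; this.fairShare = fairShare;
    }
  }

  static List<App> sortByUsageToFairShare(List<App> apps) {
    // O(n): the "expensive" part of the comparison, done once per app.
    Map<App, Double> key = new IdentityHashMap<>();
    for (App a : apps) {
      key.put(a, (double) a.usage / Math.max(1, a.fairShare));
    }
    // Still O(n log n) comparisons, but each is now a cheap double compare.
    List<App> sorted = new ArrayList<>(apps);
    sorted.sort(Comparator.comparingDouble(key::get));
    return sorted;
  }
}
```

With 10k apps in one queue this shrinks the dominant cost from roughly 10k * log(10k) key computations to 10k, which matches the CPU profile described in the attachments.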
[jira] [Commented] (YARN-6401) terminating signal should be able to specify per application to support graceful-stop
[ https://issues.apache.org/jira/browse/YARN-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947656#comment-15947656 ] Jason Lowe commented on YARN-6401: -- Ah, sorry. I was thinking it was ignoring SIGTERM and thus not cleaning up because it would get killed by the subsequent SIGKILL. Instead it sounds like it _is_ responding to SIGTERM but not cleaning up. Isn't that a bit odd? The whole point of SIGTERM is to request a shutdown of the process rather than forcing one. I'm not an httpd expert, so I started digging into the docs to try to understand why it wouldn't do something sane with TERM but does with a non-standard signal like WINCH. Turns out it does handle TERM, but it's aggressive such that in-progress requests may be interrupted/canceled. WINCH only advises things to exit, which sounds like active requests could continue to be processed but the listen port is no longer monitored so no new requests will be processed. What worries me here is that we can still end up with a disorderly shutdown even if YARN sent WINCH instead of TERM. The default delay between the TERM and KILL signals is relatively short, which is why the processing httpd does for TERM seems more appropriate here. If a request could take hundreds of milliseconds to process then the KILL is going to arrive too soon after the WINCH signal unless the delay between the two signals is widened. However that delay is not a per-app setting, and making it a per-app setting would cause a DoS problem. Containers are often killed because YARN needs the container to leave in a timely manner (e.g.: container running beyond limits, preemption, etc.). So I still think this is something better handled by the application framework (in this case Slider) rather than YARN. MapReduce has a similar example. MapReduce jobs can be killed via YARN, but it's harsh and things are often lost when this occurs. 
That's why the {{mapred job -kill}} command first tries to kill the job by contacting the AM and requesting it to do an orderly shutdown outside of YARN, and only falls back on YARN to terminate the containers if the job is unresponsive to the kill request. I think the same thing applies here. If we really want an orderly shutdown of httpd so we won't kill outstanding requests (even if they can take a while) then Slider (or some layer on top of Slider) should support sending the WINCH signals to the containers for the app, and then the app can terminate when all containers have completed their shutdown. Then the application can implement an arbitrary, application-specific shutdown sequence and timing. If YARN needs to do the killing directly then we cannot wait an arbitrary amount of time for the app to clean up and shut down gracefully. I think YARN will still need some support to send the WINCH signal in either case. Currently containers can be sent signals after YARN-1897, but it's only a restricted subset that can be translated cross-platform. That would need to be extended to support more arbitrary signals like WINCH. > terminating signal should be able to specify per application to support > graceful-stop > - > > Key: YARN-6401 > URL: https://issues.apache.org/jira/browse/YARN-6401 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: kyungwan nam > > When stopping a container, YARN first sends SIGTERM to the process. > After a while, it sends SIGKILL if the process is still alive. > This sequence is always the same for every application. > However, for a graceful stop, it sometimes needs to send another signal instead of > SIGTERM. > For instance, if apache httpd on slider is running, SIGWINCH should be sent > to stop it gracefully. > The way to stop gracefully depends on the application. > It would be good if we could define a terminating signal per application. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
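The escalation sequence discussed above (send a per-app "graceful" signal, wait a grace period, then force-kill) can be sketched outside of YARN like this. This is a hypothetical illustration, not YARN's SignalContainer API; it assumes a POSIX {{kill}} binary on the PATH and Java 9+ for ProcessHandle.

```java
import java.util.concurrent.TimeUnit;

// Sketch: app-specific graceful stop with escalation to SIGKILL.
// The graceful signal name (e.g. "TERM" or "WINCH") is the per-app
// knob the JIRA asks for; the grace period is the TERM-to-KILL delay
// discussed in the comments.
public class GracefulStop {

    // Send a named POSIX signal via /bin/kill (assumption: POSIX host).
    public static void signal(long pid, String name) throws Exception {
        new ProcessBuilder("kill", "-" + name, Long.toString(pid))
                .start().waitFor();
    }

    public static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Send the graceful signal, poll for exit during the grace period,
    // then escalate to SIGKILL only if the process is still alive.
    public static void stop(long pid, String gracefulSignal, long graceMillis)
            throws Exception {
        signal(pid, gracefulSignal);
        long deadline = System.currentTimeMillis() + graceMillis;
        while (isAlive(pid) && System.currentTimeMillis() < deadline) {
            TimeUnit.MILLISECONDS.sleep(50);
        }
        if (isAlive(pid)) {
            signal(pid, "KILL");  // hard kill after the grace period
        }
    }
}
```

As Jason notes, the hard part is not sending the signal but choosing the grace period: a per-app delay inside YARN invites abuse, which is why this loop arguably belongs in the application framework (Slider) rather than in the NM.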
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947652#comment-15947652 ] Carlo Curino commented on YARN-6408: It looks like the userName was probably removed (last app for this user) and then we tried to update a missing entry: {color:red} getUser(userName).setUserResourceLimit(userLimitResource); return userLimitResource; {color} I put a simple null-check, though I would like someone to validate it, as I am not sure that is enough: [~wangda], [~jianhe], [~asuresh], thoughts? > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
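The guard Carlo describes can be sketched as follows. This is a hypothetical simplification with made-up names, not the actual UsersManager code: the user entry may be removed concurrently (last app for that user finished), so the limit update must tolerate a missing entry instead of dereferencing a null.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the null-check: updating a per-user cached limit must
// survive the user having been removed between lookup sites, which is
// the situation behind the NPE in UsersManager.computeUserLimit().
public class UserLimits {

    public static class User {
        public volatile long userResourceLimit;
        public void setUserResourceLimit(long limit) {
            userResourceLimit = limit;
        }
    }

    public final ConcurrentHashMap<String, User> users = new ConcurrentHashMap<>();

    // Returns the computed limit; skips the cache update if the user
    // vanished (or was never added) concurrently.
    public long computeUserLimit(String userName, long userLimitResource) {
        User user = users.get(userName);
        if (user != null) {  // the guard the patch adds
            user.setUserResourceLimit(userLimitResource);
        }
        return userLimitResource;
    }
}
```

As Carlo says, a null-check alone may only mask the underlying ordering problem (why is the user missing while its containers are still being processed?), so it is a stop-gap pending validation.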
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947650#comment-15947650 ] Sunil G commented on YARN-6408: --- I will help to take a look. It seems we got the NPE from here: {{getUser(userName).setUserResourceLimit(userLimitResource);}}. It seems the user was either removed or was never added while computeUserLimit was executing from SLS. Since {{computeUserLimit}} and {{removeUser}} both hold the write lock, I suspect that the user was never added. I'll check the SLS part a little more and will share my feedback. > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6376: - Attachment: YARN-6376.00.patch > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: YARN-6376.00.patch > > > TimelineCollector.putEntities() is currently implemented by calling > TimelineWriter.write() followed by TimelineWriter.flush(). Given that > HBaseTimelineWriter.write() is an asynchronous operation, it is possible that > TimelineClient sends a synchronous putEntities() request for critical data > but never gets back an exception, even though the HBase write request to store > the entities may have failed. > This is due to a race condition between the WriterFlushThread in > TimelineCollectorManager and web threads handling synchronous putEntities() > requests. Entities are first put into the buffer by the web thread; it is > possible that before the web thread invokes writer.flush(), WriterFlushThread > is fired up to flush the writer. If the entities were not successfully > written to the backend during flush, the WriterFlushThread would simply > log an error, whereas the web thread would never get an exception out of > its writer.flush() invocation. This is bad because the reason TimelineClient > sends putEntities() synchronously is to be able to retry upon any > exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
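One possible direction for the race described above can be sketched as follows. This is a hypothetical illustration, not the attached YARN-6376 patch: if a background flush fails, remember the error and rethrow it to the next caller of flush(), so the web thread handling a synchronous putEntities() observes a failure that the WriterFlushThread would otherwise only log.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: surface asynchronous flush failures to synchronous callers
// instead of swallowing them in the background flusher thread.
public class ErrorRememberingWriter {

    private final AtomicReference<IOException> lastError = new AtomicReference<>();

    // Called where the real WriterFlushThread currently just logs.
    public void backgroundFlushFailed(String message) {
        lastError.set(new IOException(message));
    }

    // Called by the web thread handling a synchronous putEntities().
    // Rethrows the most recent background failure exactly once.
    public void flush() throws IOException {
        IOException e = lastError.getAndSet(null);
        if (e != null) {
            throw e;  // let the client see the failure and retry
        }
        // ... perform the actual flush of buffered entities here ...
    }
}
```

A real fix also has to decide which caller "owns" a given batch of entities (the stashed error may belong to another client's data), which is part of what makes the TimelineCollector race subtle.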
[jira] [Assigned] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush
[ https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-6382: Assignee: Haibo Chen > Address race condition on TimelineWriter.flush() caused by buffer-sized flush > - > > Key: YARN-6382 > URL: https://issues.apache.org/jira/browse/YARN-6382 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: yarn-5355-merge-blocker > > YARN-6376 fixes the race condition between putEntities() and periodical > flush() by WriterFlushThread in TimelineCollectorManager, or between > putEntities() in different threads. > However, BufferedMutator can have internal size-based flush as well. We need > to address the resulting race condition. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6408: --- Attachment: YARN-6408.v0.patch > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6376: - Summary: Exceptions caused by synchronous putEntities requests can be swallowed (was: Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector) > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Labels: yarn-5355-merge-blocker > > TimelineCollector.putEntities() is currently implemented by calling > TimelineWriter.write() followed by TimelineWriter.flush(). Given that > HBaseTimelineWriter.write() is an asynchronous operation, it is possible that > TimelineClient sends a synchronous putEntities() request for critical data > but never gets back an exception, even though the HBase write request to store > the entities may have failed. > This is due to a race condition between the WriterFlushThread in > TimelineCollectorManager and web threads handling synchronous putEntities() > requests. Entities are first put into the buffer by the web thread; it is > possible that before the web thread invokes writer.flush(), WriterFlushThread > is fired up to flush the writer. If the entities were not successfully > written to the backend during flush, the WriterFlushThread would simply > log an error, whereas the web thread would never get an exception out of > its writer.flush() invocation. This is bad because the reason TimelineClient > sends putEntities() synchronously is to be able to retry upon any > exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947638#comment-15947638 ] Carlo Curino commented on YARN-6408: After about 2 hours of running SLS (with the YARN-6363 synth generator), I got an NPE in handling NODE_UPDATE. [~asuresh] suspects this to be related to the processing of a NODE_UPDATE after the application is completed. Below is the stack trace:
{code}
2017-03-28 20:07:53,059 FATAL [SchedulerEventDispatcher:Event Processor] event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.computeUserLimit(UsersManager.java:732)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.reComputeUserLimits(UsersManager.java:611)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:463)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1370)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getHeadroom(LeafQueue.java:1243)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getHeadroom(LeafQueue.java:1235)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityHeadroomProvider.getHeadroom(CapacityHeadroomProvider.java:57)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getHeadroom(FiCaSchedulerApp.java:757)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.releaseResource(LeafQueue.java:1618)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1528)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1600)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:602)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateCompletedContainers(AbstractYarnScheduler.java:965)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1038)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:971)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1368)
	at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:254)
	at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:84)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
	at java.lang.Thread.run(Thread.java:745)
{code}
> NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947633#comment-15947633 ] Jonathan Hung commented on YARN-5952: - 008 patch preserves the capacity-scheduler.xml in the target/test-classes directory so that resourcemanager tests which run after this one don't fail. > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-5952 > URL: https://issues.apache.org/jira/browse/YARN-5952 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-5952.001.patch, YARN-5952.002.patch, > YARN-5952-YARN-5734.003.patch, YARN-5952-YARN-5734.004.patch, > YARN-5952-YARN-5734.005.patch, YARN-5952-YARN-5734.006.patch, > YARN-5952-YARN-5734.007.patch, YARN-5952-YARN-5734.008.patch > > > Based on the design in YARN-5734. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
Carlo Curino created YARN-6408: -- Summary: NPE in handling NODE_UPDATE (while running SLS) Key: YARN-6408 URL: https://issues.apache.org/jira/browse/YARN-6408 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-5952: Attachment: YARN-5952-YARN-5734.008.patch > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-5952 > URL: https://issues.apache.org/jira/browse/YARN-5952 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-5952.001.patch, YARN-5952.002.patch, > YARN-5952-YARN-5734.003.patch, YARN-5952-YARN-5734.004.patch, > YARN-5952-YARN-5734.005.patch, YARN-5952-YARN-5734.006.patch, > YARN-5952-YARN-5734.007.patch, YARN-5952-YARN-5734.008.patch > > > Based on the design in YARN-5734. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Attachment: YARN-3760.00.patch > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > Attachments: YARN-3760.00.patch > > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Target Version/s: 2.8.1, 3.0.0-alpha3 (was: 2.8.0) > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > Attachments: YARN-3760.00.patch > > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Summary: FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() (was: Log aggregation failures ) > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947608#comment-15947608 ] Haibo Chen commented on YARN-3760: -- bq. the ctor creates the fs data stream then a TFile.Writer w/o a try/catch. If the TFile.Writer ctor throws an exception, it's impossible to close the stream. YARN-6288 is in flight to fix this issue. Will upload a patch to address the other issue, where an ISE causes an FSDataOutputStream leak. > Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-3760: Assignee: Haibo Chen > Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947594#comment-15947594 ] Yufei Gu commented on YARN-6407: Hi [~zhengchenyu], thanks for filing this jira. IIUC, you reduced the frequency of NM node updates to avoid flooding the network in a 5k-node cluster, but Continuous Scheduling is not necessary when there are still enough node update events in the cluster. Besides improving the locking in FS, we can always balance the time interval of continuous scheduling and the frequency of NM node updates to get better scheduling latency. > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster, we have 5k nodes, and the servers are configured with 1 Gigabit > Ethernet, so the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we must > set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases the assignment throughput, so we started the > ContinuousSchedulingThread. 
> With more applications running in our cluster, and with the > ContinuousSchedulingThread, the problem of lock contention became more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is occasionally > high. We worry that more problems will occur in the future as more applications run. > Here is our logical chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch only changed the > object lock to a read/write lock. This change is still coarse-grained. So I > think we should lock the resources themselves rather than locking large sections of code. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947567#comment-15947567 ] Daniel Templeton commented on YARN-6407: YARN-3091 just replaced {{synchronized}} with a read/write lock. There's still a lot that can be done to improve the locking. The trick is to map the state that's being protected and figure out how not to lock unnecessarily for each operation. > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster, we have 5k nodes, and the servers are configured with 1 Gigabit > Ethernet, so the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we must > set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases the assignment throughput, so we started the > ContinuousSchedulingThread. > With more applications running in our cluster, and with the > ContinuousSchedulingThread, the problem of lock contention became more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is occasionally > high. We worry that more problems will occur in the future as more applications run. > Here is our logical chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch only changed the > object lock to a read/write lock. This change is still coarse-grained. So I > think we should lock the resources themselves rather than locking large sections of code. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
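The YARN-3091-style change Daniel refers to, replacing a single {{synchronized}} monitor with a read/write lock, can be sketched as follows (hypothetical class, not the FairScheduler code). It lets read-mostly operations run concurrently, but it is still coarse-grained: all state shares one lock, which is the reporter's complaint.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: read/write lock instead of a synchronized monitor. Readers
// no longer serialize behind each other; writers remain exclusive.
public class QueueState {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long demand;  // stand-in for the protected scheduler state

    public long readDemand() {
        lock.readLock().lock();
        try {
            return demand;  // many readers may hold this concurrently
        } finally {
            lock.readLock().unlock();
        }
    }

    public void updateDemand(long d) {
        lock.writeLock().lock();
        try {
            demand = d;     // writers are exclusive
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Finer-grained approaches, per-resource locks or lock striping, reduce contention further but require mapping exactly which state each operation touches, as Daniel says.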
[jira] [Reopened] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reopened YARN-3760: --- Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
	at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{code}
_AggregatedLogFormat.LogWriter_
{code}
public void close() {
  try {
    this.writer.close();
  } catch (IOException e) {
    LOG.warn("Exception closing writer", e);
  }
  IOUtils.closeStream(fsDataOStream);
}
{code}
The TFile writer's close may throw {{IllegalStateException}} if the underlying fs data stream failed. Unfortunately this close only catches IOE, so the ISE rips out w/o closing the fsdata stream. Additionally, the ctor creates the fs data stream then a TFile.Writer w/o a try/catch. If the TFile.Writer ctor throws an exception, it's impossible to close the stream. I haven't checked if there are further issues with closing the writer higher in the stack. 
> Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
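The leak Daryn describes comes from the unchecked exception bypassing the stream close. One leak-proof shape can be sketched like this (hypothetical stand-in classes, not the actual AggregatedLogFormat fix): close the underlying stream in a {{finally}} block, so even an {{IllegalStateException}} from the wrapped writer cannot skip it.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: the wrapped writer's close() may throw an unchecked
// IllegalStateException, so the underlying stream must be closed in a
// finally block rather than after a catch that only handles IOException.
public class SafeLogWriter {

    private final Closeable writer;        // stands in for TFile.Writer
    private final Closeable fsDataOStream; // stands in for FSDataOutputStream

    public SafeLogWriter(Closeable writer, Closeable fsDataOStream) {
        this.writer = writer;
        this.fsDataOStream = fsDataOStream;
    }

    public void close() {
        try {
            writer.close();
        } catch (IOException | RuntimeException e) {
            // In the original code an IllegalStateException here
            // propagated out and skipped the stream close entirely.
            System.err.println("Exception closing writer: " + e);
        } finally {
            try {
                fsDataOStream.close();  // always release the lease holder
            } catch (IOException e) {
                System.err.println("Exception closing stream: " + e);
            }
        }
    }
}
```

The constructor problem Daryn also flags (a TFile.Writer ctor throwing after the stream is open) needs the same discipline at creation time: close the stream in a catch block if the wrapping writer cannot be constructed.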
[jira] [Commented] (YARN-6168) Restarted RM may not inform AM about all existing containers
[ https://issues.apache.org/jira/browse/YARN-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947326#comment-15947326 ] Jason Lowe commented on YARN-6168:
--

This sounds like the RM isn't waiting long enough for all the live NMs to report in before reporting the live containers to the app. Technically it would have to wait up to the full NM expiry interval before it could know for sure no more containers are going to be reported by late-heartbeating NMs, so one fix would be to hold off AM restarts of container-preserving apps after an RM restart until the NM expiry interval has passed since restart. However, I don't know if apps are willing to wait that long before their AM recovers. If not, then there is always going to be the possibility that not all live containers are reported when the AM restarts and registers, if an NM ends up heartbeating late.

> Restarted RM may not inform AM about all existing containers
> ------------------------------------------------------------
>
>                 Key: YARN-6168
>                 URL: https://issues.apache.org/jira/browse/YARN-6168
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Billie Rinaldi
>
> There appears to be a race condition when an RM is restarted. I had a
> situation where the RMs and AM were down, but NMs and app containers were
> still running. When I restarted the RM, the AM restarted, registered with the
> RM, and received its list of existing containers before the NMs had reported
> all of their containers to the RM. The AM was only told about some of the
> app's existing containers.
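The hold-off Jason describes could be gated on elapsed time since RM restart, roughly as sketched below. This is a hypothetical illustration, not RM code; the class, its fields, and the config property named in the comment are assumptions for the sketch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical gate: only allow AM restarts of container-preserving apps
// once a full NM expiry interval has elapsed since the RM came back up.
class AmRestartGate {
    private final Instant rmStartTime;
    private final Duration nmExpiryInterval; // e.g. from the NM liveness-monitor expiry setting

    AmRestartGate(Instant rmStartTime, Duration nmExpiryInterval) {
        this.rmStartTime = rmStartTime;
        this.nmExpiryInterval = nmExpiryInterval;
    }

    /** Only after the full interval can the RM be sure no late NM will report more containers. */
    boolean safeToRestartAm(Instant now) {
        return !now.isBefore(rmStartTime.plus(nmExpiryInterval));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2017-03-29T00:00:00Z");
        AmRestartGate gate = new AmRestartGate(start, Duration.ofMinutes(10));
        System.out.println(gate.safeToRestartAm(start.plusSeconds(60)));  // false: NMs may still report
        System.out.println(gate.safeToRestartAm(start.plusSeconds(601))); // true: interval elapsed
    }
}
```

The trade-off Jason raises is visible here: a long expiry interval means every container-preserving app waits that long for its AM after an RM restart.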
[jira] [Commented] (YARN-6168) Restarted RM may not inform AM about all existing containers
[ https://issues.apache.org/jira/browse/YARN-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947305#comment-15947305 ] Billie Rinaldi commented on YARN-6168:
--

Yes. In my case, the AM requested new containers immediately and got them allocated, so when it was informed later about the old containers, it just released them.
[jira] [Commented] (YARN-6403) Invalid local resource request can raise NPE and make NM exit
[ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947283#comment-15947283 ] Jason Lowe commented on YARN-6403:
--

Sorry, I completely missed the server-side change in ContainerImpl. I'm not sure that's the correct place to make the server-side change, because it's happening so late in the container lifecycle. It would be better if we simply failed the container launch request _immediately_ rather than wait until the container transitions all the way to the localizing state. That way the client gets immediate feedback that their request was malformed, rather than wondering why their container launch mysteriously failed sometime later. I think it's more appropriate to have ContainerManagerImpl#startContainerInternal sanity check the request (which it already does to some degree, just not for the local resources) and throw a YarnException if the request is malformed. That way the client will receive a failed container start response to their start request, so they will immediately know their request was bad. It would be good to add unit tests for both the client and server changes.

> Invalid local resource request can raise NPE and make NM exit
> -------------------------------------------------------------
>
>                 Key: YARN-6403
>                 URL: https://issues.apache.org/jira/browse/YARN-6403
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Tao Yang
>         Attachments: YARN-6403.001.patch
>
> Recently we found this problem in our testing environment.
> The app that caused this problem added an invalid local resource request (with
> no location) into the ContainerLaunchContext like this:
> {code}
> localResources.put("test", LocalResource.newInstance(location,
>     LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
>     System.currentTimeMillis()));
> ContainerLaunchContext amContainer =
>     ContainerLaunchContext.newInstance(localResources, environment,
>         vargsFinal, null, securityTokens, acls);
> {code}
> The actual value of location was null, although the app didn't expect that. This
> mistake caused several NMs to exit with the NPE below; they couldn't restart until
> the NM recovery dirs were deleted.
> {code}
> FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The NPE occurred when creating a LocalResourceRequest instance for the invalid
> resource request:
> {code}
>   public LocalResourceRequest(LocalResource resource)
>       throws URISyntaxException {
>     this(resource.getResource().toPath(), // NPE occurred here
>         resource.getTimestamp(),
>         resource.getType(),
>         resource.getVisibility(),
>         resource.getPattern());
>   }
> {code}
> We can't guarantee the validity of local resource requests, but we could
> avoid damaging the cluster. Perhaps we can verify the resource both in
> ContainerLaunchContext and LocalResourceRequest? Please feel free to give
> your suggestions.
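The early sanity check Jason suggests could look roughly like the sketch below: reject a start-container request with a null resource location before it ever reaches the container state machine. LocalResourceStub and BadRequestException are hypothetical stand-ins for LocalResource and YarnException; the real change would live in ContainerManagerImpl#startContainerInternal.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for LocalResource: only the field relevant to this bug.
class LocalResourceStub {
    final String location; // null models the bad request from this issue
    LocalResourceStub(String location) { this.location = location; }
}

// Stand-in for YarnException: a checked failure returned to the client.
class BadRequestException extends Exception {
    BadRequestException(String msg) { super(msg); }
}

class StartContainerSketch {
    /** Fail the start request immediately instead of letting an NPE kill the NM dispatcher. */
    static void validateLocalResources(Map<String, LocalResourceStub> resources)
            throws BadRequestException {
        for (Map.Entry<String, LocalResourceStub> e : resources.entrySet()) {
            if (e.getValue().location == null) {
                throw new BadRequestException(
                    "Null resource location for local resource " + e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, LocalResourceStub> res = new HashMap<>();
        res.put("test", new LocalResourceStub(null)); // the request from the report
        try {
            validateLocalResources(res);
            System.out.println("accepted");
        } catch (BadRequestException ex) {
            // Client gets a failed start-container response with the reason,
            // rather than a mysterious container failure later.
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

With a check like this, the malformed request from the report above never creates a LocalResourceRequest, so the dispatcher-thread NPE cannot occur.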