[jira] [Commented] (YARN-9755) RM fails to start with FileSystemBasedConfigurationProvider

2019-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916333#comment-16916333
 ] 

Prabhu Joseph commented on YARN-9755:
-

Thanks [~eyang] for the review. Could you commit this when you get time? Thanks.

> RM fails to start with FileSystemBasedConfigurationProvider
> ---
>
> Key: YARN-9755
> URL: https://issues.apache.org/jira/browse/YARN-9755
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9755-001.patch, YARN-9755-002.patch, 
> YARN-9755-003.patch, YARN-9755-004.patch
>
>
> RM fails to start with below exception when 
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 14 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
> at 
> org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56)
> {code}
> FileSystemBasedConfigurationProvider uses the cached FileSystem, which 
> causes the issue.
> *Configs:*
> {code}
> <property>
>   <name>yarn.resourcemanager.configuration.provider-class</name>
>   <value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.configuration.file-system-based-store</name>
>   <value>/yarn/conf</value>
> </property>
> [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf
> -rw-r--r--   3 yarn supergroup   4138 
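
A minimal sketch of the idea behind such a fix, assuming the provider should 
load the configuration through a private, uncached FileSystem instance; the 
class and method names here are illustrative, not the actual patch:

{code:java}
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UncachedConfLoader {
  // FileSystem.newInstance() bypasses the JVM-wide FileSystem cache, so a
  // close() of the cached instance elsewhere cannot invalidate this handle.
  static Configuration load(Configuration conf, Path confPath) throws Exception {
    FileSystem fs = FileSystem.newInstance(confPath.toUri(), conf);
    try {
      if (fs.exists(confPath)) {
        try (InputStream in = fs.open(confPath)) {
          conf.addResource(in);
          conf.size(); // force parsing while the stream is still open
        }
      }
      return conf;
    } finally {
      fs.close(); // closes only this private instance, not the shared cache
    }
  }
}
{code}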

[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call

2019-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916330#comment-16916330
 ] 

Prabhu Joseph commented on YARN-9780:
-

[~jhung] Can you review this Jira when you get time? It allows the 
SchedulerConf Mutation API to stop and remove a queue in a single REST call. 
Thanks.

> SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single 
> call
> 
>
> Key: YARN-9780
> URL: https://issues.apache.org/jira/browse/YARN-9780
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9780-001.patch, YARN-9780-002.patch, 
> YARN-9780-003.patch
>
>
> The SchedulerConf Mutation API does not allow stopping and removing a queue 
> in a single call. A queue has to be stopped before it can be removed, so it 
> is useful to allow both stop and remove in a single call.
> *Repro:*
> {code:java}
> Capacity-Scheduler.xml:
> yarn.scheduler.capacity.root.queues = new, default, dummy
> yarn.scheduler.capacity.root.default.capacity = 60
> yarn.scheduler.capacity.root.dummy.capacity = 30
> yarn.scheduler.capacity.root.new.capacity = 10   
> curl -v -X PUT -d @abc.xml -H "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf'
> abc.xml
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>70</value>
>       </entry>
>     </params>
>   </update-queue>
>   <update-queue>
>     <queue-name>root.new</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.new</remove-queue>
> </sched-conf>
>  {code}
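
A minimal sketch of the behavior this asks for, with hypothetical types and 
method names (not the actual CapacityScheduler code): apply queue updates such 
as state = STOPPED before processing removals, so a stop plus remove submitted 
in one mutation succeeds.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchedConfMutationSketch {
  // queues: queue name -> its current properties, e.g. {"state": "RUNNING"}
  private final Map<String, Map<String, String>> queues = new LinkedHashMap<>();

  void apply(Map<String, Map<String, String>> updates, List<String> removals) {
    // 1. apply updates first, e.g. set root.new's state to STOPPED
    updates.forEach((q, props) -> queues.get(q).putAll(props));
    // 2. then process removals, which now see the stopped state
    for (String q : removals) {
      if (!"STOPPED".equals(queues.get(q).get("state"))) {
        throw new IllegalStateException(q + " must be stopped before removal");
      }
      queues.remove(q);
    }
  }
}
{code}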



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations for ZKConfigurationStore

2019-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916328#comment-16916328
 ] 

Prabhu Joseph commented on YARN-9775:
-

Thanks [~jhung].

> RMWebServices /scheduler-conf GET returns all hadoop configurations for 
> ZKConfigurationStore
> 
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs and leveldb 
> stores work fine.
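
A minimal sketch of the idea behind the fix, assuming the store should return 
only the key/value pairs it actually persisted; the map parameter is 
hypothetical, not the real ZKConfigurationStore API:

{code:java}
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

class StoredConfSketch {
  static Configuration retrieve(Map<String, String> storedSchedConf) {
    // 'false' skips loading core-default.xml, core-site.xml, etc., so the
    // returned Configuration holds only the stored scheduler entries.
    Configuration conf = new Configuration(false);
    storedSchedConf.forEach(conf::set);
    return conf;
  }
}
{code}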



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916312#comment-16916312
 ] 

Hudson commented on YARN-9640:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17189 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17189/])
YARN-9640. Slow event processing could cause too many attempt unregister 
(rohithsharmaks: rev d70f5231a794804d9baf0bdb7e6833533fe60db5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterServiceTestBase.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java


> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Fix For: 3.3.0
>
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification in one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient checks the finish status every 100ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; 
> later requests should be ignored and a not-unregistered response sent back 
> to the AM, so the event queue is not overloaded.
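
A minimal sketch of the proposed guard, with illustrative names (not the 
actual patch): remember that an unregister event was already dispatched for an 
attempt, and answer repeat finishApplicationMaster calls without enqueueing 
another event.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class UnregisterOnceSketch {
  private final Set<String> unregisterSent = ConcurrentHashMap.newKeySet();

  // True only for the first call per attempt; only that caller dispatches
  // the unregister event. Later calls skip the event queue and simply
  // report the current (not yet unregistered) status back to the AM.
  boolean shouldDispatchUnregister(String attemptId) {
    return unregisterSent.add(attemptId);
  }
}
{code}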



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916300#comment-16916300
 ] 

Rohith Sharma K S commented on YARN-9640:
-

Committed to trunk. The branch-3.2 patch doesn't apply. [~bibinchundatt], can 
you provide a branch-3.2 patch?

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification in one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient checks the finish status every 100ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; 
> later requests should be ignored and a not-unregistered response sent back 
> to the AM, so the event queue is not overloaded.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-26 Thread Rohith Sharma K S (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-9640:

Fix Version/s: 3.3.0

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Fix For: 3.3.0
>
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification in one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient checks the finish status every 100ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; 
> later requests should be ignored and a not-unregistered response sent back 
> to the AM, so the event queue is not overloaded.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916296#comment-16916296
 ] 

Rohith Sharma K S commented on YARN-9640:
-

+1, committing shortly.

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification in one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient checks the finish status every 100ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; 
> later requests should be ignored and a not-unregistered response sent back 
> to the AM, so the event queue is not overloaded.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9714:
---
Attachment: YARN-9714.004.patch

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch, YARN-9714.004.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack output, I found two places in the RM that may 
> cause memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer when 
> the services are stopping.
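
A minimal sketch of that cleanup, assuming the owning services release both 
resources in serviceStop(); the class and field names are illustrative, not 
the actual patch:

{code:java}
import java.util.Timer;

import org.apache.curator.framework.CuratorFramework;
import org.apache.hadoop.service.AbstractService;

class LeakFreeServiceSketch extends AbstractService {
  private Timer releaseCacheTimer;   // illustrative stand-ins for the leaked
  private CuratorFramework zkClient; // timer and ZooKeeper connection

  LeakFreeServiceSketch() {
    super("LeakFreeServiceSketch");
  }

  @Override
  protected void serviceStop() throws Exception {
    if (releaseCacheTimer != null) {
      releaseCacheTimer.cancel(); // drops the TimerTask's service references
      releaseCacheTimer = null;
    }
    if (zkClient != null) {
      zkClient.close();           // closes the ZooKeeper connection
      zkClient = null;
    }
    super.serviceStop();
  }
}
{code}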



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916291#comment-16916291
 ] 

Tao Yang commented on YARN-8917:


Thanks [~rohithsharma], [~leftnoteasy], [~sunilg] for the review and commit!

> Absolute (maximum) capacity of level3+ queues is wrongly calculated for 
> absolute resource
> -
>
> Key: YARN-8917
> URL: https://issues.apache.org/jira/browse/YARN-8917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8917.001.patch, YARN-8917.002.patch
>
>
> Absolute capacity should be calculated by multiplying capacity by the 
> parent queue's absolute capacity, but currently it is calculated by 
> dividing capacity by the parent queue's absolute capacity.
> Calculation for absolute-maximum-capacity has the same problem.
> For example: 
> root.a   capacity=0.4   maximum-capacity=0.8
> root.a.a1   capacity=0.5  maximum-capacity=0.6
> Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25
> Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly 
> calculated as 0.75
> Moreover:
> {{childQueue.getQueueCapacities().getCapacity()}}  should be changed to 
> {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting wrong 
> capacity from default partition when calculating for a non-default partition.
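
A quick check of the numbers in the example above (plain arithmetic, not 
scheduler code):

{code:java}
public class AbsCapacityCheck {
  public static void main(String[] args) {
    // From the example: root.a has absolute capacity 0.4 and absolute
    // maximum capacity 0.8; root.a.a1 has capacity 0.5 and maximum 0.6.
    float parentAbsCap = 0.4f, childCap = 0.5f;
    float parentAbsMax = 0.8f, childMaxCap = 0.6f;

    System.out.println(childCap * parentAbsCap);    // 0.2  correct (multiply)
    System.out.println(childCap / parentAbsCap);    // 1.25 buggy   (divide)
    System.out.println(childMaxCap * parentAbsMax); // 0.48 correct
    System.out.println(childMaxCap / parentAbsMax); // 0.75 buggy
  }
}
{code}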



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916290#comment-16916290
 ] 

Tao Yang commented on YARN-9714:


The TestZKRMStateStore#testZKRootPathAcls UT failure is caused by the test 
itself: the stateStore (ZKRMStateStore instance) used for verification is not 
updated after the RM HA transition. Will attach a v4 patch to fix this UT 
problem.

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack output, I found two places in the RM that may 
> cause memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer when 
> the services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-26 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916269#comment-16916269
 ] 

Zhankun Tang commented on YARN-9785:


[~BilwaST], thanks for reporting this. We're going to have the branch cut for 
3.1.3 and 3.2.1. Do you have a patch for this blocker issue? We can get it 
reviewed and merged soon.

> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>  <name>yarn.resource-types</name>
>  <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. 
> Applications get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations for ZKConfigurationStore

2019-08-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916246#comment-16916246
 ] 

Hudson commented on YARN-9775:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17188 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17188/])
YARN-9775. RMWebServices /scheduler-conf GET returns all hadoop (jhung: rev 
8660e48ca15098e891c560beb3181c22ef3f80ff)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/ZKConfigurationStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java


> RMWebServices /scheduler-conf GET returns all hadoop configurations for 
> ZKConfigurationStore
> 
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations for ZKConfigurationStore

2019-08-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9775:

Fix Version/s: 3.1.3
   3.2.1
   3.3.0
   3.0.4
   2.10.0

> RMWebServices /scheduler-conf GET returns all hadoop configurations for 
> ZKConfigurationStore
> 
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations for ZKConfigurationStore

2019-08-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9775:

Summary: RMWebServices /scheduler-conf GET returns all hadoop 
configurations for ZKConfigurationStore  (was: RMWebServices /scheduler-conf 
GET returns all hadoop configurations)

> RMWebServices /scheduler-conf GET returns all hadoop configurations for 
> ZKConfigurationStore
> 
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2019-08-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916128#comment-16916128
 ] 

Jonathan Hung commented on YARN-9615:
-

[~bibinchundatt] that approach looks better. Feel free to take over this JIRA.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.poc.patch, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.
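
A minimal sketch of the idea (not the YARN-9615 patch): count events and 
accumulate processing time per event type around the dispatcher's handler.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

class DispatcherMetricsSketch {
  private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, LongAdder> nanos = new ConcurrentHashMap<>();

  void record(String eventType, Runnable handle) {
    long start = System.nanoTime();
    try {
      handle.run(); // the actual event handling
    } finally {
      counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
      nanos.computeIfAbsent(eventType, k -> new LongAdder())
           .add(System.nanoTime() - start);
    }
  }
}
{code}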



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-08-26 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916104#comment-16916104
 ] 

CR Hota commented on YARN-9768:
---

[~maniraj...@gmail.com] Thanks for pointing this out.

Sure, feel free to put a patch here. We can close YARN-9478 as duplicate.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has exactly 
> the same APIs as the HDFS NN). If one of the nodes is bad and the renew call 
> gets stuck, the thread remains stuck indefinitely. The thread should ideally 
> time out the renewToken call and retry from the client's perspective.
>  
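
A minimal sketch of such a client-side timeout, assuming the renewer wraps the 
blocking renew() in a bounded task; the class and names are illustrative:

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

class RenewWithTimeoutSketch {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  // Bound the blocking renew() so a stuck NN/Router cannot hang the
  // renewer thread forever; the caller can then retry or fail the renewal.
  long renew(Token<?> token, Configuration conf, long timeoutSec)
      throws Exception {
    Callable<Long> renewTask = () -> token.renew(conf);
    Future<Long> f = pool.submit(renewTask);
    try {
      return f.get(timeoutSec, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      f.cancel(true); // abandon the stuck attempt and let the caller retry
      throw e;
    }
  }
}
{code}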



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-08-26 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916104#comment-16916104
 ] 

CR Hota edited comment on YARN-9768 at 8/26/19 7:45 PM:


[~maniraj...@gmail.com] Thanks for pointing this out.

Sure feel free to put a patch here. We can close YARN-9478 as duplicate.


was (Author: crh):
[~maniraj...@gmail.com] Thanks for pointing this out.

Sure feel free to put a patch here. We are close YARN-9478 as duplicate.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has exactly 
> the same APIs as the HDFS NN). If one of the nodes is bad and the renew call 
> gets stuck, the thread remains stuck indefinitely. The thread should ideally 
> time out the renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916090#comment-16916090
 ] 

Hadoop QA commented on YARN-9775:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 
38s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9775 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978607/YARN-9775-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9d96b486bf6e 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6d7f01c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24638/testReport/ |
| Max. process+thread count | 895 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24638/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RMWebServices /scheduler-conf GET returns all hadoop 

[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916078#comment-16916078
 ] 

Hadoop QA commented on YARN-9781:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  6s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 5 new + 22 unchanged - 3 fixed = 27 total (was 25) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
34s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m  
1s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}199m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9781 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978600/YARN-9781-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 02298932594a 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 

[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-26 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916040#comment-16916040
 ] 

Eric Payne commented on YARN-9756:
--

Also, in 3.2 and earlier, {{QueueMetrics#updatePreemptedResources}} should be 
called from {{CapacityScheduler#updateQueuePreemptionMetrics}}.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.002.patch, 
> YARN-9756.WIP.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-26 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916038#comment-16916038
 ] 

Bibin A Chundatt commented on YARN-9738:


[~BilwaST] 

As discussed offline, we need to handle gets to the nodes map with a null key 
(ConcurrentHashMap does not allow null keys).

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application 
> submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch
>
>
> *Env :*
> Server OS :- UBUNTU
> No. of Cluster Node:- 9120 NMs
> Env Mode:- [Secure / Non secure]Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in running state 
> 35K applications were in pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client with 2 threads, across a 
> total of 10 queues
> 2. As application submission increases (each distributed shell application 
> will call getClusterNodes)
> *ClientRMService#getClusterNodes tries to get 
> ClusterNodeTracker#getNodeReport, where the nodes map is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 
> tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x7f759f6d8858> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead we can make nodes a ConcurrentHashMap and remove the read lock*
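
A minimal sketch of that proposal with generic types (not the actual 
ClusterNodeTracker code):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LockFreeNodeTrackerSketch<K, V> {
  private final ConcurrentMap<K, V> nodes = new ConcurrentHashMap<>();

  // Reads no longer take a ReentrantReadWriteLock, so getClusterNodes
  // calls cannot block application submission on the read lock.
  V getNodeReport(K nodeId) {
    // ConcurrentHashMap rejects null keys, hence the explicit check
    return nodeId == null ? null : nodes.get(nodeId);
  }

  void addNode(K nodeId, V node) {
    nodes.put(nodeId, node); // writes are thread-safe without extra locking
  }
}
{code}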



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9642) Fix Memory Leak in AbstractYarnScheduler caused by timer

2019-08-26 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916014#comment-16916014
 ] 

Bibin A Chundatt edited comment on YARN-9642 at 8/26/19 5:58 PM:
-

Committed to trunk, branch-3.2 and branch-3.1.

 

Thank you [~sunilg], [~rohithsharma], [~Tao Yang], [~tangzhankun] for the 
reviews.


was (Author: bibinchundatt):
Committed to trunk , branch-3.2 and branch-3.

 

Thank you [~sunilg] , [~rohithsharma] , [~Tao Yang] , [~tangzhankun]  for 
reviews

> Fix Memory Leak in AbstractYarnScheduler caused by timer
> 
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switch-over too.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak too:
> !image-2019-06-22-16-05-24-114.png!
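
A minimal sketch of the idea, assuming the scheduler cancels its timer during 
service stop; the names are illustrative, not the actual patch:

{code:java}
import java.util.Timer;

class SchedulerTimerSketch {
  private Timer releaseCache = new Timer(true); // daemon cleanup timer

  void stop() {
    if (releaseCache != null) {
      releaseCache.cancel(); // discards scheduled tasks, releasing their
      releaseCache = null;   // references to the scheduler instance
    }
  }
}
{code}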



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9642) Fix Memory Leak in AbstractYarnScheduler caused by timer

2019-08-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916010#comment-16916010
 ] 

Hudson commented on YARN-9642:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17186 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17186/])
YARN-9642. Fix Memory Leak in AbstractYarnScheduler caused by timer. 
(bibinchundatt: rev d3ce53e5073e35e162f1725836282e4268cd26a5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java


> Fix Memory Leak in AbstractYarnScheduler caused by timer
> 
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switch-over too.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak too:
> !image-2019-06-22-16-05-24-114.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9642) Fix Memory Leak in AbstractYarnScheduler caused by timer

2019-08-26 Thread Bibin A Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9642:
---
Summary: Fix Memory Leak in AbstractYarnScheduler caused by timer  (was: 
AbstractYarnScheduler#clearPendingContainerCache cause memory leak)

> Fix Memory Leak in AbstractYarnScheduler caused by timer
> 
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switch-over too.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak too:
> !image-2019-06-22-16-05-24-114.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916001#comment-16916001
 ] 

Hadoop QA commented on YARN-9640:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 
31s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9640 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972602/YARN-9640.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 35e3b649b51f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 23e532d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24635/testReport/ |
| Max. process+thread count | 917 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24635/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Slow event processing could cause too many attempt 

[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916002#comment-16916002
 ] 

Hadoop QA commented on YARN-8193:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 
38s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da675796017 |
| JIRA Issue | YARN-8193 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12930548/YARN-8193-branch-2-001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3e4eb2e5ef9f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 97f207d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| Multi-JDK versions |  

[jira] [Updated] (YARN-9642) AbstractYarnScheduler#clearPendingContainerCache causes memory leak

2019-08-26 Thread Bibin A Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9642:
---
Summary: AbstractYarnScheduler#clearPendingContainerCache causes memory leak 
 (was: AbstractYarnScheduler#clearPendingContainerCache could run even after 
transitionToStandby)

> AbstractYarnScheduler#clearPendingContainerCache causes memory leak
> --
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switchover as well.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak as well.
> !image-2019-06-22-16-05-24-114.png!
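
For reference, a minimal sketch of the fix direction described above; the class
and field names are hypothetical, not the actual patch:

{code:java}
import java.util.Timer;
import org.apache.hadoop.service.AbstractService;

// Hedged sketch only: cancelling the Timer on serviceStop drops the
// TimerTask's reference to the scheduler, so it can be garbage collected.
abstract class SchedulerSketch extends AbstractService {
  private Timer clearCacheTimer; // schedules clearPendingContainerCache()

  SchedulerSketch(String name) {
    super(name);
  }

  @Override
  protected void serviceStop() throws Exception {
    if (clearCacheTimer != null) {
      clearCacheTimer.cancel(); // no further TimerTask runs after stop
      clearCacheTimer = null;
    }
    super.serviceStop();
  }
}
{code}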



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-26 Thread Bibin A Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9785:
---
Target Version/s: 3.2.1, 3.1.3

> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>  <name>yarn.resource-types</name>
>  <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. 
> Applications get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915970#comment-16915970
 ] 

Jonathan Hung commented on YARN-9775:
-

Thanks, +1! I'll commit it by EOD if no objections.

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.
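
For illustration, a hedged repro call; <rm-host> is a placeholder for the RM
address:

{code}
# Query the stored scheduler configuration from the RM (host is a placeholder).
curl "http://<rm-host>:8088/ws/v1/cluster/scheduler-conf"
# Expected: only the scheduler configuration. With the zk store, the response
# wrongly includes every Hadoop property on the RM classpath.
{code}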



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915969#comment-16915969
 ] 

Hadoop QA commented on YARN-8453:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 44m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 11 new + 3 unchanged - 0 fixed = 14 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 50s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-8453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929467/YARN-8453.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c64aa62a4592 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 23e532d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24634/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2

2019-08-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8200:

Target Version/s: 2.10.0
  Labels: release-blocker  (was: )

> Backport resource types/GPU features to branch-3.0/branch-2
> ---
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-8200-branch-2.001.patch, 
> YARN-8200-branch-2.002.patch, YARN-8200-branch-2.003.patch, 
> YARN-8200-branch-3.0.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9642) AbstractYarnScheduler#clearPendingContainerCache could run even after transitionToStandby

2019-08-26 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915955#comment-16915955
 ] 

Bibin A Chundatt commented on YARN-9642:


[~tangzhankun] and [~rohithsharma]

Tried running it locally:

{code}
[INFO] 
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
hadoop-yarn-server-resourcemanager ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.808 s 
- in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
{code}

The failure seems random.

> AbstractYarnScheduler#clearPendingContainerCache could run even after 
> transitionToStandby
> -
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switchover as well.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak as well.
> !image-2019-06-22-16-05-24-114.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-26 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915951#comment-16915951
 ] 

Eric Payne commented on YARN-9756:
--

Hi [~maniraj...@gmail.com], thanks for the updated patch!

We will also need a separate patch for branch-3.2 and earlier because 
{{queueMetricsForCustomResources}} does not exist there.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.002.patch, 
> YARN-9756.WIP.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915952#comment-16915952
 ] 

Íñigo Goiri commented on YARN-9754:
---

To make it easier to read, it would be nice to do {{super.amtype = "dag";}}, 
or even to have a setter.
In general, the relation between AMSimulator and DAGAMSimulator is a little 
halfway; for example, {{totalContainers}} could just be 
{{allContainers.size()}} all the time, but then the parent would not understand 
it.
Can we do a better object abstraction here?

Other comments:
* {{ugi.doAs()}} could just use a lambda (see the sketch below).
* Can we have a test that actually tests delays?
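
A minimal sketch of the lambda form suggested above; {{submitApp()}} is a
hypothetical stand-in for the simulator's real submission logic:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch of the lambda form of ugi.doAs().
class DoAsSketch {
  void runAsUser(UserGroupInformation ugi) throws Exception {
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      submitApp();
      return null;
    });
  }

  void submitApp() {
    // application-submission logic would go here
  }
}
{code}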

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915950#comment-16915950
 ] 

Prabhu Joseph commented on YARN-9775:
-

Thanks [~jhung] for reviewing. I have fixed the above changes in 
[^YARN-9775-002.patch].

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9642) AbstractYarnScheduler#clearPendingContainerCache could run even after transitionToStandby

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915949#comment-16915949
 ] 

Hadoop QA commented on YARN-9642:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9642 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976240/YARN-9642.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c4b512e4a03b 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 23e532d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24633/testReport/ |
| Max. process+thread count | 794 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24633/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Updated] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9775:

Attachment: YARN-9775-002.patch

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch, YARN-9775-002.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9775:

Attachment: (was: YARN-9775-002.patch)

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9775:

Attachment: YARN-9775-002.patch

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915931#comment-16915931
 ] 

Jonathan Hung edited comment on YARN-9775 at 8/26/19 4:44 PM:
--

Thanks for the fix [~Prabhu Joseph], two small nits:
* can we use the RM_HOSTNAME keyword in YarnConfiguration for the test case?
* can we change assertEquals to assertNull?

LGTM otherwise.
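
A hedged sketch of what the two suggestions could look like in the test; the
helper fetching the response is hypothetical, not the actual patch:

{code:java}
import static org.junit.Assert.assertNull;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Test;

public class SchedulerConfLeakTestSketch {
  @Test
  public void testSchedulerConfOnlyReturnsSchedulerKeys() throws Exception {
    Configuration schedConf = getSchedulerConfResponse();
    // Unrelated RM configuration must not leak into /scheduler-conf.
    assertNull(schedConf.get(YarnConfiguration.RM_HOSTNAME));
  }

  private Configuration getSchedulerConfResponse() {
    // Hypothetical stub: in the real test this would parse the
    // /scheduler-conf response from RMWebServices.
    return new Configuration(false);
  }
}
{code}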


was (Author: jhung):
Thanks for the fix [~Prabhu Joseph], +1 LGTM

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9775) RMWebServices /scheduler-conf GET returns all hadoop configurations

2019-08-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915931#comment-16915931
 ] 

Jonathan Hung commented on YARN-9775:
-

Thanks for the fix [~Prabhu Joseph], +1 LGTM

> RMWebServices /scheduler-conf GET returns all hadoop configurations
> ---
>
> Key: YARN-9775
> URL: https://issues.apache.org/jira/browse/YARN-9775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: restapi
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9775-001.patch
>
>
> RMWebServices /scheduler-conf GET returns all hadoop configurations.
> Repro:
> {code}
> yarn.scheduler.configuration.store.class = zk
> http://<rm-host>:8088/ws/v1/cluster/scheduler-conf
> returns all hadoop configurations present in RM Classpath
> {code}
> The issue occurs only with ZKConfigurationStore; the memory, fs, and leveldb 
> stores work fine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9771) Add GPU in the container-executor.cfg example

2019-08-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915928#comment-16915928
 ] 

Hudson commented on YARN-9771:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17185 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17185/])
YARN-9771. Add GPU in the container-executor.cfg example. Contributed by 
(ebadger: rev 6d7f01c92dcbf22356bb813359081944f684a9c4)
* (edit) hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg


> Add GPU in the container-executor.cfg example
> -
>
> Key: YARN-9771
> URL: https://issues.apache.org/jira/browse/YARN-9771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Julia Kinga Marton
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-9771.001.patch
>
>
> Currently the {{hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg}} 
> example config file does not contain a gpu section. According to the [apache 
> Hadoop wiki 
> page|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]
>  this is how you enable GPUs:
> {code:java}
> [gpu]
>   module.enabled=true
> {code}
> I didn't see any other related configs, unlike in the case of fpga or 
> pluggable devices, but I think we can add that short section to show that the 
> gpu feature works.
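
For context, a hedged sketch of how the example file could look with the gpu
section in place; the non-gpu keys are assumed from the stock example and are
not part of this patch:

{code}
yarn.nodemanager.linux-container-executor.group=hadoop
#banned.users=hdfs,yarn,mapred,bin
min.user.id=1000

[gpu]
  module.enabled=true
{code}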



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9771) Add GPU in the container-executor.cfg example

2019-08-26 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9771:
--
Fix Version/s: 3.3.0

+1 lgtm. I committed this to trunk. Thanks [~kmarton] for the patch and 
[~adam.antal] for the additional review.

> Add GPU in the container-executor.cfg example
> -
>
> Key: YARN-9771
> URL: https://issues.apache.org/jira/browse/YARN-9771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Julia Kinga Marton
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-9771.001.patch
>
>
> Currently the {{hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg}} 
> example config file does not contain a gpu section. According to the [apache 
> Hadoop wiki 
> page|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]
>  this is how you enable GPUs:
> {code:java}
> [gpu]
>   module.enabled=true
> {code}
> I didn't see any other related configs, unlike in the case of fpga or 
> pluggable devices, but I think we can add that short section to show that the 
> gpu feature works.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9786) testCancelledDelegationToken fails intermittently

2019-08-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915916#comment-16915916
 ] 

Peter Bacsko commented on YARN-9786:


[~adam.antal] this is a duplicate of YARN-9461. It's been solved already.

> testCancelledDelegationToken fails intermittently
> -
>
> Key: YARN-9786
> URL: https://issues.apache.org/jira/browse/YARN-9786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
> Attachments: testCancelledDelegationToken.txt
>
>
> testCancelledDelegationToken[0] fails intermittently with the following 
> error/stack trace:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 400 for URL: 
> http://localhost:8088/ws/v1/cluster/delegation-token
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:446)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:267)
> {noformat}
> I'll attach the stdout to this issue as well.
> It seems that the test helper infrastructure does not come up correctly. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9771) Add GPU in the container-executor.cfg example

2019-08-26 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915901#comment-16915901
 ] 

Adam Antal commented on YARN-9771:
--

+1 (non-binding) from me.

> Add GPU in the container-executor.cfg example
> -
>
> Key: YARN-9771
> URL: https://issues.apache.org/jira/browse/YARN-9771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Julia Kinga Marton
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-9771.001.patch
>
>
> Currently the {{hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg}} 
> example config file does not contain a gpu section. According to the [apache 
> Hadoop wiki 
> page|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]
>  this is how you enable GPUs:
> {code:java}
> [gpu]
>   module.enabled=true
> {code}
> I didn't see any other related configs, unlike in the case of fpga or 
> pluggable devices, but I think we can add that short section to show that the 
> gpu feature works.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9786) testCancelledDelegationToken fails intermittently

2019-08-26 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9786:
-
Attachment: testCancelledDelegationToken.txt

> testCancelledDelegationToken fails intermittently
> -
>
> Key: YARN-9786
> URL: https://issues.apache.org/jira/browse/YARN-9786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
> Attachments: testCancelledDelegationToken.txt
>
>
> testCancelledDelegationToken[0] fails intermittently with the following 
> error/stack trace:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 400 for URL: 
> http://localhost:8088/ws/v1/cluster/delegation-token
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:446)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:267)
> {noformat}
> I'll attach the stdout to this issue as well.
> It seems that the test helper infrastructure does not come up correctly. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-08-26 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9781:

Attachment: YARN-9781-002.patch

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch
>
>
> SchedConfCLI currently allows adding, updating, and removing queues. It does 
> not support getting the configuration, which RMWebServices provides as part of 
> YARN-8559.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915896#comment-16915896
 ] 

Hudson commented on YARN-8917:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17184 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17184/])
YARN-8917. Absolute (maximum) capacity of level3+ queues is wrongly 
(rohithsharmaks: rev 689d2e61058b5f719c6cbe9897a72b19b44a29a3)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestAbsoluteResourceConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java


> Absolute (maximum) capacity of level3+ queues is wrongly calculated for 
> absolute resource
> -
>
> Key: YARN-8917
> URL: https://issues.apache.org/jira/browse/YARN-8917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8917.001.patch, YARN-8917.002.patch
>
>
> Absolute capacity should be calculated by multiplying capacity by the 
> parent queue's absolute-capacity, but it is currently calculated by dividing 
> capacity by the parent queue's absolute-capacity.
> Calculation for absolute-maximum-capacity has the same problem.
> For example: 
> root.a   capacity=0.4   maximum-capacity=0.8
> root.a.a1   capacity=0.5  maximum-capacity=0.6
> Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25
> Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly 
> calculated as 0.75
> Moreover:
> {{childQueue.getQueueCapacities().getCapacity()}}  should be changed to 
> {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting wrong 
> capacity from default partition when calculating for a non-default partition.
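
A quick check of the example above, as illustrative arithmetic only:

{code:java}
// root.a:    capacity=0.4, maximum-capacity=0.8
// root.a.a1: capacity=0.5, maximum-capacity=0.6
class AbsoluteCapacityCheck {
  public static void main(String[] args) {
    // Correct: multiply by the parent queue's absolute values.
    double absCapacity    = 0.5 * 0.4;  // 0.20 for root.a.a1
    double absMaxCapacity = 0.6 * 0.8;  // 0.48
    // Buggy: dividing instead of multiplying.
    double wrongAbsCapacity    = 0.5 / 0.4;  // 1.25
    double wrongAbsMaxCapacity = 0.6 / 0.8;  // 0.75
    System.out.printf("correct: %.2f / %.2f, buggy: %.2f / %.2f%n",
        absCapacity, absMaxCapacity, wrongAbsCapacity, wrongAbsMaxCapacity);
  }
}
{code}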



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9786) testCancelledDelegationToken fails intermittently

2019-08-26 Thread Adam Antal (Jira)
Adam Antal created YARN-9786:


 Summary: testCancelledDelegationToken fails intermittently
 Key: YARN-9786
 URL: https://issues.apache.org/jira/browse/YARN-9786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.2.0
Reporter: Adam Antal


testCancelledDelegationToken[0] fails intermittently with the following 
error/stack trace:
{noformat}
java.io.IOException: Server returned HTTP response code: 400 for URL: 
http://localhost:8088/ws/v1/cluster/delegation-token
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:446)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:267)

{noformat}

I'll attach the stdout to this issue as well.

It seems that the test helper infrastructure does not come up correctly. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915890#comment-16915890
 ] 

Rohith Sharma K S commented on YARN-8917:
-

Committing shortly.

> Absolute (maximum) capacity of level3+ queues is wrongly calculated for 
> absolute resource
> -
>
> Key: YARN-8917
> URL: https://issues.apache.org/jira/browse/YARN-8917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8917.001.patch, YARN-8917.002.patch
>
>
> Absolute capacity should be calculated by multiplying capacity by the 
> parent queue's absolute-capacity, but it is currently calculated by dividing 
> capacity by the parent queue's absolute-capacity.
> Calculation for absolute-maximum-capacity has the same problem.
> For example: 
> root.a   capacity=0.4   maximum-capacity=0.8
> root.a.a1   capacity=0.5  maximum-capacity=0.6
> Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25
> Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly 
> calculated as 0.75
> Moreover:
> {{childQueue.getQueueCapacities().getCapacity()}}  should be changed to 
> {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting wrong 
> capacity from default partition when calculating for a non-default partition.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name

2019-08-26 Thread Tarun Parimi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915884#comment-16915884
 ] 

Tarun Parimi commented on YARN-9766:


The test case failures don't seem to be related.

> YARN CapacityScheduler QueueMetrics has missing metrics for parent queues 
> having same name
> --
>
> Key: YARN-9766
> URL: https://issues.apache.org/jira/browse/YARN-9766
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9766.001.patch
>
>
> In Capacity Scheduler, we enforce Leaf Queues to have unique names, but this 
> is not the case for Parent Queues. For example, we can have the below queue 
> hierarchy, where "b" is the queue name for two different queue paths root.a.b 
> and root.a.d.b. Since it is not a leaf queue, this configuration works and 
> apps run fine in the leaf queues 'c' and 'e'.
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  ***** e
> But the jmx metrics do not show the metrics for the parent queue 
> "root.a.d.b". We can see metrics only for the "root.a.b" queue.
>  
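
A hedged sketch of a capacity-scheduler.xml layout reproducing the hierarchy
above; capacities are omitted, and the property paths follow the standard
yarn.scheduler.capacity.<queue-path>.queues convention:

{code}
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>b,d</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.b.queues</name>
  <value>c</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.d.queues</name>
  <value>b</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.d.b.queues</name>
  <value>e</value>
</property>
{code}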



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-26 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9785:

Description: 
Configure the below property in resource-types.xml:
{quote}
<property>
 <name>yarn.resource-types</name>
 <value>yarn.io/gpu</value>
</property>
{quote}
Submit applications even after the AM limit for a queue is reached. 
Applications get activated even after the limit is reached.

!queue.png!

  was:
Configure the below property in resource-types.xml:
{quote}
<property>
 <name>yarn.resource-types</name>
 <value>yarn.io/gpu</value>
</property>
{quote}
Submit applications even after the AM limit for a queue is reached. 
Applications get activated even after the limit is reached.

!issue.png!


> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>  <name>yarn.resource-types</name>
>  <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. 
> Applications get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-26 Thread Bilwa S T (Jira)
Bilwa S T created YARN-9785:
---

 Summary: Application gets activated even when AM memory has reached
 Key: YARN-9785
 URL: https://issues.apache.org/jira/browse/YARN-9785
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


Configure the below property in resource-types.xml:
{quote}
<property>
 <name>yarn.resource-types</name>
 <value>yarn.io/gpu</value>
</property>
{quote}
Submit applications even after the AM limit for a queue is reached. 
Applications get activated even after the limit is reached.

!issue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate reassigned YARN-9783:
--

Assignee: Szalay-Beko Mate

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Assignee: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward-incompatible changes in the API, so applications using 
> ZooKeeper clients should build against the new ZooKeeper without any problem, 
> and the new ZooKeeper client should work with the older (3.4) servers without 
> any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning for including this test in the YARN code was, I propose to remove 
> it; if there is still a missing test case in ZooKeeper, let's file a 
> ZooKeeper ticket to test that scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9607) Auto-configuring rollover-size of IFile format for non-appendable filesystems

2019-08-26 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915831#comment-16915831
 ] 

Zhankun Tang commented on YARN-9607:


Bulk update: preparing for the 3.1.3 release. Moved all 3.1.3 non-blocker 
issues to 3.1.4; please move yours back if it is a blocker for you.

> Auto-configuring rollover-size of IFile format for non-appendable filesystems
> -
>
> Key: YARN-9607
> URL: https://issues.apache.org/jira/browse/YARN-9607
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9607.001.patch, YARN-9607.002.patch
>
>
> In YARN-9525, we made the IFile format compatible with remote folders with the 
> s3a scheme. In rolling-fashion log aggregation, IFile still fails with the 
> "append is not supported" error message, which is a known limitation of the 
> format by design.
> There is a workaround though: by setting the rollover size in the 
> configuration of the IFile format, a new aggregated log file is created in 
> each rolling cycle, thus eliminating the append from the process. Setting this 
> config globally would cause performance problems in regular log aggregation, 
> so I suggest enforcing this config to zero if the scheme of the URI is s3a (or 
> any other non-appendable filesystem).
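
A hedged sketch of the workaround as a config snippet. The property name is an
assumption based on the indexed (IFile) controller's rollover setting; verify
it against your Hadoop version before use:

{code}
<!-- Assumed property name; a value of 0 forces a new aggregated file in each
     rolling cycle, avoiding append on non-appendable filesystems like s3a. -->
<property>
  <name>yarn.log-aggregation.indexed.roll-over.max-file-size-gb</name>
  <value>0</value>
</property>
{code}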



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9718) Yarn REST API, services endpoint remote command injection

2019-08-26 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915826#comment-16915826
 ] 

Zhankun Tang commented on YARN-9718:


Bulk update: preparing for the 3.1.3 release. Moved all 3.1.3 non-blocker 
issues to 3.1.4; please move yours back if it is a blocker for you.

> Yarn REST API, services endpoint remote command injection
> 
>
> Key: YARN-9718
> URL: https://issues.apache.org/jira/browse/YARN-9718
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9718.001.patch, YARN-9718.002.patch, 
> YARN-9718.003.patch, YARN-9718.004.patch
>
>
> Email from Oskars Vegeris:
>  
> During internal infrastructure testing it was discovered that the Hadoop Yarn 
> REST endpoint /app/v1/services contains a command injection vulnerability.
>  
> The services endpoint's normal use-case is for launching containers (e.g. 
> Docker images/apps); however, by providing an argument with special shell 
> characters, it is possible to execute arbitrary commands on the host server. 
> This would allow escalating privileges and access.
>  
> The command injection is possible in the parameter for JVM options - 
> "yarn.service.am.java.opts". It's possible to enter arbitrary shell commands 
> by using sub-shell syntax `cmd` or $(cmd). No shell character filtering is 
> performed. 
>  
> The "launch_command" which needs to be provided is meant for the container 
> and if it's not being run in privileged mode or with special options, host OS 
> should not be accessible.
>  
> I've attached a minimal request sample with an injected 'ping' command. The 
> endpoint can also be found via UI @ 
> [http://yarn-resource-manager:8088/ui2/#/yarn-services]
>  
> If no auth, or "simple auth" (username) is enabled, commands can be executed 
> on the host OS. I know commands can also be run by the "new-application" 
> feature; however, this is clearly not meant to be a way to touch the host OS.
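
For illustration, a hedged sketch of a service spec carrying such an injected
option; the field layout follows the YARN services API, and all values here
are made up:

{code}
{
  "name": "example-service",
  "version": "1.0",
  "components": [
    {
      "name": "c1",
      "number_of_containers": 1,
      "launch_command": "sleep 3600",
      "resource": { "cpus": 1, "memory": "256" }
    }
  ],
  "configuration": {
    "properties": {
      "yarn.service.am.java.opts": "-Xmx256m $(ping -c 4 203.0.113.1)"
    }
  }
}
{code}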



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9718) Yarn REST API, services endpoint remote command injection

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9718:
---
Target Version/s: 3.3.0, 3.2.1, 3.1.4  (was: 3.3.0, 3.2.1, 3.1.3)

> Yarn REST API, services endpoint remote command injection
> 
>
> Key: YARN-9718
> URL: https://issues.apache.org/jira/browse/YARN-9718
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9718.001.patch, YARN-9718.002.patch, 
> YARN-9718.003.patch, YARN-9718.004.patch
>
>
> Email from Oskars Vegeris:
>  
> During internal infrastructure testing it was discovered that the Hadoop Yarn 
> REST endpoint /app/v1/services contains a command injection vulnerability.
>  
> The services endpoint's normal use-case is launching containers (e.g. 
> Docker images/apps); however, by providing an argument with special shell 
> characters it is possible to execute arbitrary commands on the host server - 
> this would allow an attacker to escalate privileges and access. 
>  
> The command injection is possible in the parameter for JVM options - 
> "yarn.service.am.java.opts". It's possible to enter arbitrary shell commands 
> by using sub-shell syntax `cmd` or $(cmd). No shell character filtering is 
> performed. 
>  
> The "launch_command" which needs to be provided is meant for the container 
> and if it's not being run in privileged mode or with special options, host OS 
> should not be accessible.
>  
> I've attached a minimal request sample with an injected 'ping' command. The 
> endpoint can also be found via UI @ 
> [http://yarn-resource-manager:8088/ui2/#/yarn-services]
>  
> If no auth, or "simple auth" (username only), is enabled, commands can be 
> executed on the host OS. I know commands can also be run via the 
> "new-application" feature; however, this is clearly not meant to be a way to 
> touch the host OS.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9607) Auto-configuring rollover-size of IFile format for non-appendable filesystems

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9607:
---
Target Version/s: 3.3.0, 3.2.1, 3.1.4  (was: 3.3.0, 3.2.1, 3.1.3)

> Auto-configuring rollover-size of IFile format for non-appendable filesystems
> -
>
> Key: YARN-9607
> URL: https://issues.apache.org/jira/browse/YARN-9607
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9607.001.patch, YARN-9607.002.patch
>
>
> In YARN-9525, we made the IFile format compatible with remote folders with 
> the s3a scheme. In rolling-fashion log-aggregation, IFile still fails with 
> the "append is not supported" error message, which is a known limitation of 
> the format by design. 
> There is a workaround though: by setting the rollover size in the 
> configuration of the IFile format, a new aggregated log file is created in 
> each rolling cycle, which eliminates the append from the process. Setting 
> this config globally would cause performance problems in regular 
> log-aggregation, so I suggest enforcing this config to zero if the scheme of 
> the URI is s3a (or any other non-appendable filesystem).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8453:
---
Target Version/s: 3.0.4, 3.1.4  (was: 3.0.4, 3.1.3)

> Additional Unit tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8453.001.patch
>
>
> After adding support for resource types other than CPU and memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as memory / CPU are still available beyond 
> the guaranteed limit (under the max-limit). Adding more unit tests to ensure 
> we are not starving such allocation requests.
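
As an illustration, a minimal, plain-Java sketch (not CapacityScheduler 
internals; all names and numbers are hypothetical) of the scenario those 
tests should cover: one extended resource at its quota while memory and 
vcores still have headroom:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the scenario: a request must fail only for the
// resource that is actually exhausted, and a request that does not need the
// exhausted resource must not be starved.
public class MultiResourceQuotaDemo {
  public static void main(String[] args) {
    Map<String, Long> quota = new HashMap<>();
    quota.put("memory-mb", 8192L);
    quota.put("vcores", 8L);
    quota.put("yarn.io/gpu", 2L);   // extended resource type

    Map<String, Long> used = new HashMap<>();
    used.put("memory-mb", 2048L);
    used.put("vcores", 2L);
    used.put("yarn.io/gpu", 2L);    // GPU quota fully consumed

    Map<String, Long> ask = new HashMap<>();
    ask.put("memory-mb", 1024L);
    ask.put("vcores", 1L);
    ask.put("yarn.io/gpu", 0L);     // asks for no GPU at all

    // Fits only if every dimension fits; the GPU-free ask above must still
    // be allocatable even though the GPU quota is exhausted.
    boolean fits = ask.entrySet().stream().allMatch(
        e -> used.get(e.getKey()) + e.getValue() <= quota.get(e.getKey()));
    System.out.println("ask fits: " + fits); // true
  }
}
{code}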



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-08-26 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915818#comment-16915818
 ] 

Zhankun Tang commented on YARN-8453:


Bulk update: preparing for the 3.1.3 release. Moved all 3.1.3 non-blocker 
issues to 3.1.4; please move an issue back if it is a blocker for you.

> Additional Unit tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8453.001.patch
>
>
> After adding support for resource types other than CPU and memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as memory / CPU are still available beyond 
> the guaranteed limit (under the max-limit). Adding more unit tests to ensure 
> we are not starving such allocation requests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915812#comment-16915812
 ] 

Steve Loughran commented on YARN-9783:
--

cut it

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning for including this test in the YARN code was, I propose to remove 
> it, and if there is still any missing test case in ZooKeeper, then let's 
> issue a ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9642) AbstractYarnScheduler#clearPendingContainerCache could run even after transitiontostandby

2019-08-26 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915799#comment-16915799
 ] 

Zhankun Tang commented on YARN-9642:


Triggered a rebuild just now. Let's see the result, if it finishes before the 
Jenkins shutdown.

> AbstractYarnScheduler#clearPendingContainerCache could run even after 
> transitiontostandby
> -
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
>  Labels: memory-leak
> Attachments: YARN-9642.001.patch, YARN-9642.002.patch, 
> YARN-9642.003.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could hold a reference to the Scheduler in the case of a fast 
> switch-over too.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This causes a memory leak too.
> !image-2019-06-22-16-05-24-114.png!
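
As an illustration, a minimal sketch of the proposed cancellation idea; the 
class and field names here are hypothetical stand-ins, not the actual 
AbstractYarnScheduler members:

{code:java}
import java.util.Timer;
import java.util.TimerTask;

// Sketch of the fix idea: cancel the timer in serviceStop() so a standby RM
// cannot keep running the task (which would keep the scheduler reachable
// from the Timer thread and leak memory after a fast switch-over).
public abstract class SchedulerSketch {
  private Timer releaseCache; // hypothetical stand-in for the real field

  protected void serviceStart() {
    releaseCache = new Timer("release-cache", true);
    releaseCache.schedule(new TimerTask() {
      @Override
      public void run() {
        clearPendingContainerCache(); // holds a reference to the scheduler
      }
    }, 10_000L);
  }

  protected void serviceStop() {
    if (releaseCache != null) {
      releaseCache.cancel();  // drop the task and its scheduler reference
      releaseCache = null;
    }
  }

  abstract void clearPendingContainerCache();
}
{code}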



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9784) TestLeafQueue is flaky

2019-08-26 Thread Julia Kinga Marton (Jira)
Julia Kinga Marton created YARN-9784:


 Summary: TestLeafQueue is flaky
 Key: YARN-9784
 URL: https://issues.apache.org/jira/browse/YARN-9784
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.3.0
Reporter: Julia Kinga Marton
Assignee: Julia Kinga Marton


There are some test cases in TestLeafQueue which are failing intermittently.

From 100 runs, there were 16 failures. 

Some example failures are the following:

{code:java}
2019-08-26 13:18:13 [ERROR] Errors: 
2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
WrongTypeOfReturnValue 
2019-08-26 13:18:13 YarnConfigu...
2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
WrongTypeOfReturnValue 
2019-08-26 13:18:13 YarnConfigu...
2019-08-26 13:18:13 [INFO] 
2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
{code}

{code:java}
2019-08-26 13:18:09 [ERROR] Failures: 
2019-08-26 13:18:09 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1373 
expected:<2048> but was:<0>
2019-08-26 13:18:09 [INFO] 
2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
{code}
{code:java}
2019-08-26 13:18:18 [ERROR] Errors: 
2019-08-26 13:18:18 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
WrongTypeOfReturnValue 
2019-08-26 13:18:18 YarnConfigu...
2019-08-26 13:18:18 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1307 ? 
ClassCast org.apache.hadoop.yarn.c...
2019-08-26 13:18:18 [INFO] 
2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
{code}
{code:java}
2019-08-26 13:18:10 [ERROR] Failures: 
2019-08-26 13:18:10 [ERROR]   TestLeafQueue.testDRFUserLimits:847 Verify user_0 
got resources 
2019-08-26 13:18:10 [INFO] 
2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
{code}
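
For background, a generic Mockito sketch (hypothetical interface, not 
TestLeafQueue code; the actual root cause of these flakes is not established 
here) of how a stubbing race can surface as WrongTypeOfReturnValue or a 
ClassCastException like the errors above:

{code:java}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Mockito is not thread-safe while a stub is open: if another thread calls
// the mock between queueName() being recorded and thenReturn(...) finishing,
// "root.a" can get attached to numContainers(), and Mockito fails with
// WrongTypeOfReturnValue (a String cannot be returned by an int method).
public class StubbingRaceDemo {
  interface Sched {
    String queueName();
    int numContainers();
  }

  public static void main(String[] args) throws InterruptedException {
    Sched s = mock(Sched.class);
    Thread racer = new Thread(() -> s.numContainers()); // racing caller
    racer.start();
    when(s.queueName()).thenReturn("root.a");           // racing stubbing
    racer.join();
  }
}
{code}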



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915788#comment-16915788
 ] 

Hadoop QA commented on YARN-9714:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 33s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9714 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978576/YARN-9714.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d5a6aa414cff 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 23e532d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24631/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24631/testReport/ |
| Max. process+thread count | 893 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915758#comment-16915758
 ] 

Hadoop QA commented on YARN-8917:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 25 unchanged - 0 fixed = 26 total (was 25) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m  
3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-8917 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944836/YARN-8917.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d5fba3c5cae2 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d2225c8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24630/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24630/testReport/ |
| Max. process+thread count | 862 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9771) Add GPU in the container-executor.cfg example

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915712#comment-16915712
 ] 

Hadoop QA commented on YARN-9771:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9771 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978577/YARN-9771.001.patch |
| Optional Tests |  dupname  asflicense  |
| uname | Linux f0067b518f07 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 23e532d |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 335 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24632/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add GPU in the container-executor.cfg example
> -
>
> Key: YARN-9771
> URL: https://issues.apache.org/jira/browse/YARN-9771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Julia Kinga Marton
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-9771.001.patch
>
>
> Currently the {{hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg}} 
> example config file does not contain a gpu section. According to the [apache 
> Hadoop wiki 
> page|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]
>  this is how you enable GPUs:
> {code:java}
> [gpu]
>   module.enabled=true
> {code}
> I didn't see any other related configs, just as in the case of fpga or 
> pluggable devices, but I think we can add that short section to show that 
> the gpu feature works.
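
As an illustration, a sketch of how the example file might look with the new 
section appended; the surrounding keys are indicative only and should be 
checked against the shipped example file:

{code:java}
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000

[docker]
  module.enabled=true

[gpu]
  module.enabled=true
{code}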



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915704#comment-16915704
 ] 

Szalay-Beko Mate commented on YARN-9783:


[~ste...@apache.org], what do you think?

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning for including this test in the YARN code was, I propose to remove 
> it, and if there is still any missing test case in ZooKeeper, then let's 
> issue a ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate updated YARN-9783:
---
Description: 
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test 
case: 
[TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
 This test case seems to use low-level ZooKeeper internal code, which changed 
in the new ZooKeeper version. Although I am not sure what the original 
reasoning for including this test in the YARN code was, I propose to remove 
it, and if there is still any missing test case in ZooKeeper, then let's issue 
a ZooKeeper ticket to test this scenario there.

  was:
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test 
case: `TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to 
use low-level ZooKeeper internal code, which changed in the new ZooKeeper 
version. Although I am not sure what the original reasoning for including 
this test in the YARN code was, I propose to remove it, and if there 
is still any missing test case in ZooKeeper, then let's issue a ZooKeeper 
ticket to test this scenario there.


> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning for including this test in the YARN code was, I propose to remove 
> it, and if there is still any missing test case in ZooKeeper, then let's 
> issue a ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: 

[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate updated YARN-9783:
---
Description: 
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test 
case: `TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to 
use low-level ZooKeeper internal code, which changed in the new ZooKeeper 
version. Although I am not sure what the original reasoning for including 
this test in the YARN code was, I propose to remove it, and if there 
is still any missing test case in ZooKeeper, then let's issue a ZooKeeper 
ticket to test this scenario there.

  was:
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build fails because of a YARN test case: 
`TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to use 
low-level ZooKeeper internal code, which changed in the new ZooKeeper version. 
Although I am not sure what the original reasoning for including this test in 
the YARN code was, I propose to remove it, and if there is still any 
missing test case in ZooKeeper, then let's issue a ZooKeeper ticket to test 
this scenario there.


> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: `TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case 
> seems to use low-level ZooKeeper internal code, which changed in the new 
> ZooKeeper version. Although I am not sure what the original reasoning for 
> including this test in the YARN code was, I propose to remove it, and if 
> there is still any missing test case in ZooKeeper, then let's issue a 
> ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate updated YARN-9783:
---
Description: 
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build fails because of a YARN test case: 
`TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to use 
low-level ZooKeeper internal code, which changed in the new ZooKeeper version. 
Although I am not sure what the original reasoning for including this test in 
the YARN code was, I propose to remove it, and if there is still any 
missing test case in ZooKeeper, then let's issue a ZooKeeper ticket to test 
this scenario there.

  was:
ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build fails only because of a single YARN test case: 
`TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to use 
low-level ZooKeeper internal code, which changed in the new ZooKeeper version. 
Although I am not sure what the original reasoning for including this test in 
the YARN code was, I propose to remove it, and if there is still any 
missing test case in ZooKeeper, then let's issue a ZooKeeper ticket to test 
this scenario there.


> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build fails because of a YARN test case: 
> `TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to use 
> low-level ZooKeeper internal code, which changed in the new ZooKeeper 
> version. Although I am not sure what the original reasoning for including 
> this test in the YARN code was, I propose to remove it, and if there 
> is still any missing test case in ZooKeeper, then let's issue a ZooKeeper 
> ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915693#comment-16915693
 ] 

Hadoop QA commented on YARN-9766:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 102 unchanged - 0 fixed = 108 total (was 102) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 25s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestSchedulingRequestContainerAllocationAsync
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerOvercommit
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerOvercommit |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | 
hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9766 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978555/YARN-9766.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6b68ef091439 4.15.0-48-generic #51-Ubuntu SMP 

[jira] [Created] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Szalay-Beko Mate (Jira)
Szalay-Beko Mate created YARN-9783:
--

 Summary: Remove low-level zookeeper test to be able to build 
Hadoop against zookeeper 3.5.5
 Key: YARN-9783
 URL: https://issues.apache.org/jira/browse/YARN-9783
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szalay-Beko Mate


ZooKeeper 3.5.5 release is the latest stable one. It contains many new features 
(including SSL related improvements which are very important for production 
use; see [the release 
notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
should be no backward incompatible changes on the API, so the applications 
using ZooKeeper clients should be built against the new zookeeper without any 
problem and the new ZooKeeper client should work with the older (3.4) servers 
without any issue, at least until someone starts to use new functionality.

The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
YARN yet, but to enable people to rebuild and test Hadoop with the new 
ZooKeeper version.

Currently the Hadoop build fails only because of a single YARN test case: 
`TestSecureRegistry.testLowlevelZKSaslLogin()`. This test case seems to use 
low-level ZooKeeper internal code, which changed in the new ZooKeeper version. 
Although I am not sure what the original reasoning for including this test in 
the YARN code was, I propose to remove it, and if there is still any 
missing test case in ZooKeeper, then let's issue a ZooKeeper ticket to test 
this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9780) SchedulerConf Mutation Api does not Allow Stop and Remove Queue in a single call

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915686#comment-16915686
 ] 

Hadoop QA commented on YARN-9780:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m  
9s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9780 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978556/YARN-9780-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b9018a146963 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d2225c8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24628/testReport/ |
| Max. process+thread count | 868 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24628/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> SchedulerConf Mutation Api does not Allow Stop and 

[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915685#comment-16915685
 ] 

Tao Yang commented on YARN-8193:


Hi, [~sunilg], [~leftnoteasy]. 
Any updates or plans for this fix on branch-2.x? YARN-9779 seems to be the 
same issue.

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2-001.patch, 
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM just hangs 
> and stops allocating resources. At the point the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node 
> health issue, RM didn't report any node being unhealthy). There is no fixed 
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is 
> required. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9779) NPE while allocating a container

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915684#comment-16915684
 ] 

Tao Yang commented on YARN-9779:


Sorry for the late reply. I think this issue is a duplicate of YARN-8193.

> NPE while allocating a container
> 
>
> Key: YARN-9779
> URL: https://issues.apache.org/jira/browse/YARN-9779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Critical
>
> Getting the following exception while allocating a container 
>  
> 2019-08-22 23:59:20,180 FATAL event.EventDispatcher (?:?(?)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:469)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:250)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:819)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:857)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:868)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1121)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1346)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1341)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1430)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1205)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1067)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1472)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:151)
>  at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-08-22 23:59:20,180 INFO  rmcontainer.RMContainerImpl (?:?(?)) - 
> container_e2364_1565770624228_198773_01_000946 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2019-08-22 23:59:20,180 INFO  event.EventDispatcher (?:?(?)) - Exiting, bbye..



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-9779) NPE while allocating a container

2019-08-26 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9779:
---
Comment: was deleted

(was: Sorry for the late reply. I think this issue is a duplicate of YARN-8193.)

> NPE while allocating a container
> 
>
> Key: YARN-9779
> URL: https://issues.apache.org/jira/browse/YARN-9779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Critical
>
> Getting the following exception while allocating a container 
>  
> 2019-08-22 23:59:20,180 FATAL event.EventDispatcher (?:?(?)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:469)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:250)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:819)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:857)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:868)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1121)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1346)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1341)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1430)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1205)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1067)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1472)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:151)
>  at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-08-22 23:59:20,180 INFO  rmcontainer.RMContainerImpl (?:?(?)) - 
> container_e2364_1565770624228_198773_01_000946 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2019-08-22 23:59:20,180 INFO  event.EventDispatcher (?:?(?)) - Exiting, bbye..



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8678) Queue Management API - rephrase error messages

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915656#comment-16915656
 ] 

Hadoop QA commented on YARN-8678:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 
32s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
54s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-8678 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978551/YARN-8678-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3e4f21da3329 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d2225c8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24626/testReport/ |
| Max. process+thread count | 874 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24626/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Queue Management API - rephrase error messages
> 

[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch

2019-08-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915651#comment-16915651
 ] 

Hadoop QA commented on YARN-8234:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 16m 
30s{color} | {color:red} Docker failed to build yetus/hadoop:c2d96dd50e7. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8234 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939409/YARN-8234-branch-2.8.3.004.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24629/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Improve RM system metrics publisher's performance by pushing events to 
> timeline server in batch
> ---
>
> Key: YARN-8234
> URL: https://issues.apache.org/jira/browse/YARN-8234
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Critical
> Attachments: YARN-8234-branch-2.8.3.001.patch, 
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, 
> YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, 
> YARN-8234.003.patch, YARN-8234.004.patch
>
>
> When the system metrics publisher is enabled, the RM pushes events to the 
> timeline server via a RESTful API. If the cluster load is heavy, many events 
> are sent to the timeline server and the timeline server's event handler 
> thread gets locked. YARN-7266 discussed the details of this problem. Because 
> of the lock, the timeline server can't receive events as fast as the RM 
> generates them, and lots of timeline events stay in the RM's memory. 
> Finally, those events consume all of the RM's memory and the RM starts a 
> full GC (which causes a JVM stop-the-world pause and a timeout from the RM 
> to ZooKeeper) or even hits an OOM. 
> The main problem here is that the timeline server can't receive events as 
> fast as the RM generates them. Now, the RM system metrics publisher puts 
> only one event in a request, and most of the time is spent handling the HTTP 
> header or the network connection on the timeline side. Only a little time is 
> spent on the timeline event itself, which is the truly valuable part.
> In this issue, we add a buffer to the system metrics publisher and let the 
> publisher send events to the timeline server in batches via one request. 
> With the batch size set to 1000, in our experiment the speed at which the 
> timeline server receives events improved 100x. We have implemented this 
> function in our production environment, which accepts 2 apps in one hour, 
> and it works fine.
> We add the following configuration:
>  * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of 
> events the system metrics publisher sends in one request. The default value 
> is 1000.
>  * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of 
> the event buffer in the system metrics publisher.
>  * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when 
> batch publishing is enabled, we must avoid the publisher waiting for a batch 
> to fill up and holding events in the buffer for a long time. So we add 
> another thread which sends the events in the buffer periodically. This 
> config sets the interval of that periodic sending thread. The default value 
> is 60s.
>  
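As a rough illustration of the buffering scheme described above, here is a 
minimal sketch (the class and method names are illustrative assumptions, not 
the patch's actual code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical sketch: buffer events and flush either when a batch fills up
// or when the periodic timer fires, whichever comes first.
class BatchingPublisher<E> {
  private final BlockingQueue<E> buffer;
  private final int batchSize;

  BatchingPublisher(int bufferSize, int batchSize, long intervalSeconds) {
    this.buffer = new LinkedBlockingQueue<>(bufferSize);
    this.batchSize = batchSize;
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
        this::flush, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  void publish(E event) throws InterruptedException {
    buffer.put(event);         // blocks when the buffer is full
    if (buffer.size() >= batchSize) {
      flush();                 // a batch is ready, send immediately
    }
  }

  private synchronized void flush() {
    List<E> batch = new ArrayList<>(batchSize);
    buffer.drainTo(batch, batchSize);
    if (!batch.isEmpty()) {
      sendInOneRequest(batch); // one REST call carries the whole batch
    }
  }

  private void sendInOneRequest(List<E> batch) {
    // placeholder: PUT the whole batch to the timeline server in one request
  }
}
{code}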



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9771) Add GPU in the container-executor.cfg example

2019-08-26 Thread Julia Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julia Kinga Marton updated YARN-9771:
-
Attachment: YARN-9771.001.patch

> Add GPU in the container-executor.cfg example
> -
>
> Key: YARN-9771
> URL: https://issues.apache.org/jira/browse/YARN-9771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Julia Kinga Marton
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-9771.001.patch
>
>
> Currently the {{hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg}} 
> example config file does not contain a gpu section. According to the [apache 
> Hadoop wiki 
> page|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]
>  this is how you enable GPUs:
> {code:java}
> [gpu]
>   module.enabled=true
> {code}
> I didn't see any other related configs, unlike in the case of FPGA or 
> pluggable devices, but I think we can add that short section to show that 
> the GPU feature works.
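For illustration, the example file would then contain a section like this 
(the group key shown here is a typical entry, not necessarily the file's 
exact current contents):

{code}
yarn.nodemanager.linux-container-executor.group=hadoop

[gpu]
  module.enabled=true
{code}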



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9714:
---
Attachment: YARN-9714.003.patch

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915632#comment-16915632
 ] 

Rohith Sharma K S commented on YARN-8917:
-

Thanks [~Tao Yang]. I will re-trigger Jenkins again to check whether the 
failure is intermittent or consistent. 

> Absolute (maximum) capacity of level3+ queues is wrongly calculated for 
> absolute resource
> -
>
> Key: YARN-8917
> URL: https://issues.apache.org/jira/browse/YARN-8917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8917.001.patch, YARN-8917.002.patch
>
>
> Absolute capacity should be equal to the queue's capacity multiplied by the 
> parent queue's absolute capacity, but currently it's calculated by dividing 
> the capacity by the parent queue's absolute capacity.
> Calculation for absolute-maximum-capacity has the same problem.
> For example: 
> root.a   capacity=0.4   maximum-capacity=0.8
> root.a.a1   capacity=0.5  maximum-capacity=0.6
> Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25
> Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly 
> calculated as 0.75
> Moreover:
> {{childQueue.getQueueCapacities().getCapacity()}}  should be changed to 
> {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting wrong 
> capacity from default partition when calculating for a non-default partition.
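A quick check of the numbers in the example above (plain arithmetic, 
independent of the YARN code):

{code}
// absolute-capacity     = capacity * parent's absolute-capacity
//   root.a.a1: 0.5 * 0.4 = 0.2    (the buggy division gives 0.5 / 0.4 = 1.25)
// absolute-max-capacity = maximum-capacity * parent's absolute-max-capacity
//   root.a.a1: 0.6 * 0.8 = 0.48   (the buggy division gives 0.6 / 0.8 = 0.75)
{code}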



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915630#comment-16915630
 ] 

Rohith Sharma K S commented on YARN-9714:
-

OK, sure. Go ahead.

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915628#comment-16915628
 ] 

Tao Yang commented on YARN-8917:


Hi, [~rohithsharma], I can't reproduce this failure on the trunk branch in my 
local environment. According to the standard output, the assignment that 
should allocate a container was broken for an unknown reason after checking 
the node, and the debug info is not enough to locate the cause. Moreover, 
this patch only affects the capacity shown in the WebUI or REST results and 
won't affect the scheduling process. So I think they are not related.

> Absolute (maximum) capacity of level3+ queues is wrongly calculated for 
> absolute resource
> -
>
> Key: YARN-8917
> URL: https://issues.apache.org/jira/browse/YARN-8917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8917.001.patch, YARN-8917.002.patch
>
>
> Absolute capacity should be equal to the queue's capacity multiplied by the 
> parent queue's absolute capacity, but currently it's calculated by dividing 
> the capacity by the parent queue's absolute capacity.
> Calculation for absolute-maximum-capacity has the same problem.
> For example: 
> root.a   capacity=0.4   maximum-capacity=0.8
> root.a.a1   capacity=0.5  maximum-capacity=0.6
> Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25
> Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly 
> calculated as 0.75
> Moreover:
> {{childQueue.getQueueCapacities().getCapacity()}}  should be changed to 
> {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting wrong 
> capacity from default partition when calculating for a non-default partition.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915625#comment-16915625
 ] 

Tao Yang commented on YARN-9714:


We can close the connection after checking {{zkManager != 
resourceManager.getZKManager()}}: 
when resourceManager.getZKManager() is null, the check ensures zkManager is 
not null before closing it; 
when resourceManager.getZKManager() is not null and the two are not the same 
instance, it closes as well.
I think it's a more direct way; does that make sense?
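For illustration, a minimal sketch of that check in the store's stop path, 
assuming the {{zkManager}} field and {{resourceManager.getZKManager()}} 
accessor referenced above (the method shape is an assumption, not the actual 
patch):

{code:java}
// Hypothetical sketch: on stop, close the store's ZooKeeper connection only
// when it is not the manager instance shared with the RM's curator elector.
@Override
protected synchronized void closeInternal() throws Exception {
  if (zkManager != null && zkManager != resourceManager.getZKManager()) {
    // Either the RM has no shared manager (curator elector not used) or the
    // store created its own connection, so it is safe to close it here.
    zkManager.close();
    zkManager = null;
  }
}
{code}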

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915613#comment-16915613
 ] 

Rohith Sharma K S edited comment on YARN-9714 at 8/26/19 8:59 AM:
--

I see. Thanks. Maybe the fix could be to check whether curator is disabled 
and close the connection. Does that make sense?


was (Author: rohithsharma):
I see. Thanks. Maybe the fix could be to check whether curator is enabled 
and close the connection. Does that make sense?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915613#comment-16915613
 ] 

Rohith Sharma K S commented on YARN-9714:
-

I see. Thanks. Maybe the fix could be to check whether curator is enabled 
and close the connection. Does that make sense?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915598#comment-16915598
 ] 

Tao Yang edited comment on YARN-9714 at 8/26/19 8:45 AM:
-

Hi, [~rohithsharma].
I commented on this earlier (over 
[here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]): 
"As for zkManager in ZKStateStore, it will reuse zkManager for HA 
when RM uses the Curator-based elector for leader election, otherwise it will 
be created for ZKRMStateStore".  Please refer to 
{{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} 
for details.


was (Author: tao yang):
Hi, [~rohithsharma].

I commented on this earlier (over here): "As for zkManager in ZKStateStore, it 
will reuse zkManager for HA when RM uses the Curator-based elector for leader 
election, otherwise it will be created for ZKRMStateStore".  Please refer to 
{{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} 
for details.

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915598#comment-16915598
 ] 

Tao Yang edited comment on YARN-9714 at 8/26/19 8:43 AM:
-

Hi, [~rohithsharma].

I commented on this earlier (over here): "As for zkManager in ZKStateStore, it 
will reuse zkManager for HA when RM uses the Curator-based elector for leader 
election, otherwise it will be created for ZKRMStateStore".  Please refer to 
{{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} 
for details.


was (Author: tao yang):
Hi, [~rohithsharma].

I commented on this earlier (over 
[here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]): 
"As for zkManager in ZKStateStore, it will reuse zkManager for HA 
when RM uses the Curator-based elector for leader election, otherwise it will 
be created for ZKRMStateStore". 

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915598#comment-16915598
 ] 

Tao Yang commented on YARN-9714:


Hi, [~rohithsharma].

I commented on this earlier (over 
[here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]): 
"As for zkManager in ZKStateStore, it will reuse zkManager for HA 
when RM uses the Curator-based elector for leader election, otherwise it will 
be created for ZKRMStateStore". 

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915586#comment-16915586
 ] 

Rohith Sharma K S commented on YARN-9714:
-

In HA mode, we should NOT close the connection because the same ZK instance 
is used for leader election. Am I missing anything?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer 
> when services are stopping.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9779) NPE while allocating a container

2019-08-26 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915579#comment-16915579
 ] 

Bilwa S T commented on YARN-9779:
-

Hi [~Amithsha], I think YARN-8193 takes care of this issue. Please check.

> NPE while allocating a container
> 
>
> Key: YARN-9779
> URL: https://issues.apache.org/jira/browse/YARN-9779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Critical
>
> Getting the following exception while allocating a container 
>  
> 2019-08-22 23:59:20,180 FATAL event.EventDispatcher (?:?(?)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:469)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:250)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:819)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:857)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:868)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1121)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1346)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1341)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1430)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1205)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1067)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1472)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:151)
>  at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-08-22 23:59:20,180 INFO  rmcontainer.RMContainerImpl (?:?(?)) - 
> container_e2364_1565770624228_198773_01_000946 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2019-08-22 23:59:20,180 INFO  event.EventDispatcher (?:?(?)) - Exiting, bbye..



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8257) Native service should automatically adding escapes for environment/launch cmd before sending to YARN

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8257:
---
Target Version/s: 3.1.4  (was: 3.1.3)

Bulk update: Preparing for 3.1.3 release. Moved all 3.1.3 non-blocker issues to 
3.1.4, please move back if it is a blocker.

> Native service should automatically adding escapes for environment/launch cmd 
> before sending to YARN
> 
>
> Key: YARN-8257
> URL: https://issues.apache.org/jira/browse/YARN-8257
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Gour Saha
>Priority: Critical
>
> Noticed this issue while using native service: 
> Basically, when a string for an environment variable / launch command 
> contains chars like ", /, `, it needs to be escaped twice.
> The first escape comes from the JSON spec: because JSON accepts only double 
> quotes, they need an escape.
> The second comes from the container launch; what we do for the command line 
> is (ContainerLaunch.java):
> {code:java}
> line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");{code}
> And for environment:
> {code:java}
> line("export ", key, "=\"", value, "\"");{code}
> An example of launch_command: 
> {code:java}
> "launch_command": "export CLASSPATH=\\`\\$HADOOP_HDFS_HOME/bin/hadoop 
> classpath --glob\\`"{code}
> And example of environment:
> {code:java}
> "TF_CONFIG" : "{\\\"cluster\\\": {\\\"master\\\": 
> [\\\"master-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"ps\\\": 
> [\\\"ps-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"worker\\\": 
> [\\\"worker-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"]}, 
> \\\"task\\\": {\\\"type\\\":\\\"${COMPONENT_NAME}\\\", 
> \\\"index\\\":${COMPONENT_ID}}, \\\"environment\\\":\\\"cloud\\\"}",{code}
> To improve usability, I think we should auto-escape the input string once. 
> (For example, if the user specified 
> {code}
> "TF_CONFIG": "\"key\""
> {code}
> We will automatically escape it to:
> {code}
> "TF_CONFIG": \\\"key\\\"
> {code}
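To make the proposal concrete, here is a minimal sketch of a single-pass 
escape for the bash double-quoted context shown above (the method name and 
the exact character set are assumptions for illustration, not the project's 
API):

{code:java}
// Hypothetical helper: escape the characters that stay special inside a
// bash double-quoted string, so a user-supplied value survives
// `exec /bin/bash -c "..."` and `export KEY="..."` unchanged.
public static String escapeForDoubleQuotes(String value) {
  StringBuilder sb = new StringBuilder(value.length());
  for (char c : value.toCharArray()) {
    if (c == '"' || c == '`' || c == '\\' || c == '$') {
      sb.append('\\');
    }
    sb.append(c);
  }
  return sb.toString();
}
{code}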



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8417) Should skip passing HDFS_HOME, HADOOP_CONF_DIR, JAVA_HOME, etc. to Docker container.

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8417:
---
Target Version/s: 3.1.4  (was: 3.1.3)

Bulk update: Preparing for 3.1.3 release. Moved all 3.1.3 non-blocker issues to 
3.1.4, please move back if it is a blocker.

> Should skip passing HDFS_HOME, HADOOP_CONF_DIR, JAVA_HOME, etc. to Docker 
> container.
> 
>
> Key: YARN-8417
> URL: https://issues.apache.org/jira/browse/YARN-8417
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the YARN NM passes the JAVA_HOME, HDFS_HOME, and CLASSPATH 
> environment variables before launching a Docker container, no matter whether 
> ENTRY_POINT is used or not. This will overwrite environment variables 
> defined inside the Dockerfile (using {{ENV}}). For a Docker container, it 
> actually doesn't make sense to pass JAVA_HOME, HDFS_HOME, etc., because 
> inside the Docker image we have a separate Java/Hadoop installed, or one 
> mounted to exactly the same directory as the host machine.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8052) Move overwriting of service definition during flex to service master

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8052:
---
Target Version/s: 3.1.4  (was: 3.1.3)

Bulk update: Preparing for 3.1.3 release. Moved all 3.1.3 non-blocker issues to 
3.1.4, please move back if it is a blocker.

> Move overwriting of service definition during flex to service master
> 
>
> Key: YARN-8052
> URL: https://issues.apache.org/jira/browse/YARN-8052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> The overwriting of the service definition during flex is done from the 
> ServiceClient. 
> During auto-finalization of an upgrade, the current service definition gets 
> overwritten as well by the service master. This creates a potential 
> conflict. 
> We need to move the overwriting of the service definition during flex to 
> the service master. 
> Discussed on YARN-8018.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8552) [DS] Container report fails for distributed containers

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8552:
---
Target Version/s: 3.1.4  (was: 3.1.3)

Bulk update: Preparing for 3.1.3 release. Moved all 3.1.3 non-blocker issues to 
3.1.4, please move back if it is a blocker.

> [DS]  Container report fails for distributed containers
> ---
>
> Key: YARN-8552
> URL: https://issues.apache.org/jira/browse/YARN-8552
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> 2018-07-19 19:15:02,281 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1531994217928_0003_01_1099511627753 Container Transitioned from 
> ACQUIRED to RUNNING
> 2018-07-19 19:15:02,384 ERROR 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock: Failed to read the 
> container container_1531994217928_0003_01_1099511627773.
> Container reports fail for Distributed Scheduler containers. Currently all 
> containers are fetched from the central RM, so we need to find an 
> alternative for them.
> {code}
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.yarn.exceptions.ContainerNotFoundException: 
> Container with id 'container_1531994217928_0003_01_1099511627773' doesn't 
> exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:499)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMContainerBlock.getContainerReport(RMContainerBlock.java:44)
> at 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:82)
> at 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:79)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> ... 70 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8234:
---
Target Version/s: 3.1.4  (was: 3.1.3)

Bulk update: Preparing for 3.1.3 release. Moved all 3.1.3 non-blocker issues to 
3.1.4, please move back if it is a blocker.

> Improve RM system metrics publisher's performance by pushing events to 
> timeline server in batch
> ---
>
> Key: YARN-8234
> URL: https://issues.apache.org/jira/browse/YARN-8234
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Critical
> Attachments: YARN-8234-branch-2.8.3.001.patch, 
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, 
> YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, 
> YARN-8234.003.patch, YARN-8234.004.patch
>
>
> When the system metrics publisher is enabled, the RM pushes events to the 
> timeline server via a RESTful API. If the cluster load is heavy, many events 
> are sent to the timeline server and the timeline server's event handler 
> thread gets locked. YARN-7266 discussed the details of this problem. Because 
> of the lock, the timeline server can't receive events as fast as the RM 
> generates them, and lots of timeline events stay in the RM's memory. 
> Finally, those events consume all of the RM's memory and the RM starts a 
> full GC (which causes a JVM stop-the-world pause and a timeout from the RM 
> to ZooKeeper) or even hits an OOM. 
> The main problem here is that the timeline server can't receive events as 
> fast as the RM generates them. Now, the RM system metrics publisher puts 
> only one event in a request, and most of the time is spent handling the HTTP 
> header or the network connection on the timeline side. Only a little time is 
> spent on the timeline event itself, which is the truly valuable part.
> In this issue, we add a buffer to the system metrics publisher and let the 
> publisher send events to the timeline server in batches via one request. 
> With the batch size set to 1000, in our experiment the speed at which the 
> timeline server receives events improved 100x. We have implemented this 
> function in our production environment, which accepts 2 apps in one hour, 
> and it works fine.
> We add the following configuration:
>  * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of 
> events the system metrics publisher sends in one request. The default value 
> is 1000.
>  * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of 
> the event buffer in the system metrics publisher.
>  * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when 
> batch publishing is enabled, we must avoid the publisher waiting for a batch 
> to fill up and holding events in the buffer for a long time. So we add 
> another thread which sends the events in the buffer periodically. This 
> config sets the interval of that periodic sending thread. The default value 
> is 60s.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9776) yarn logs throws an error "Not a valid BCFile"

2019-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915556#comment-16915556
 ] 

Prabhu Joseph commented on YARN-9776:
-

[~jordanagoodboy] Thanks for the update. Are we good to close this Jira?

> yarn logs throws an error "Not a valid BCFile"
> --
>
> Key: YARN-9776
> URL: https://issues.apache.org/jira/browse/YARN-9776
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
> Environment: HDP 3.1.0.78
>  
>Reporter: agoodboy
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Env: hdp 3.1.0.0-78
> Command: yarn logs -applicationId xxx throws the error "Not a valid BCFile." 
> and then exits.
> After enabling debug logging with "export YARN_ROOT_LOGGER=DEBUG,console" 
> and rerunning the command, it shows that 
> "fileName=/data1/app-logs/hadoop/logs/application_1566555356033_0032/meta" is 
> not a valid BCFile. After I removed the file from HDFS and reran the 
> command, it succeeded. 
> So, how is this meta file generated? 
> I guess it is because of this in yarn-site.xml:
> {code:xml}
> <property>
>   <name>yarn.timeline-service.generic-application-history.save-non-am-container-meta-info</name>
>   <value>true</value>
> </property>
> {code}
> I set this value to true because I want to see all container logs in the 
> timeline web page; otherwise I can only see the AM container log.
> So, it seems that yarn logs can't properly handle this situation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9376) too many ContainerIdComparator instances are not necessary

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9376:
---

Bulk update: Preparing for 3.1.3 release. Moved the incorrect "3.1.2" 
non-blocker issues to 3.1.4, please move back to 3.1.3 if it is a blocker.

> too many ContainerIdComparator instances are not necessary
> --
>
> Key: YARN-9376
> URL: https://issues.apache.org/jira/browse/YARN-9376
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.1, 3.1.2
>Reporter: lindongdong
>Assignee: lindongdong
>Priority: Minor
> Attachments: YARN-9376.000.patch
>
>
>  Each RMNodeImpl creates a new ContainerIdComparator instance, but that is 
> not necessary.
> We may keep a single static ContainerIdComparator instance; that is enough.
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl#containersToClean
> {code:java}
> /* set of containers that need to be cleaned */
> private final Set<ContainerId> containersToClean = new TreeSet<ContainerId>(
>     new ContainerIdComparator());
> {code}
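A minimal sketch of the proposed change (the constant name is an illustrative 
assumption):

{code:java}
// Hypothetical sketch: one shared, stateless comparator instead of a new
// instance per RMNodeImpl.
private static final ContainerIdComparator CONTAINER_ID_COMPARATOR =
    new ContainerIdComparator();

/* set of containers that need to be cleaned */
private final Set<ContainerId> containersToClean =
    new TreeSet<ContainerId>(CONTAINER_ID_COMPARATOR);
{code}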



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9330) Add support to query scheduler endpoint filtered via queue (/scheduler/queue=abc)

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9330:
---

Bulk update: Preparing for 3.1.3 release. Moved the incorrect "3.1.2" 
non-blocker issues to 3.1.4, please move back to 3.1.3 if it is a blocker.

> Add support to query scheduler endpoint filtered via queue 
> (/scheduler/queue=abc)
> -
>
> Key: YARN-9330
> URL: https://issues.apache.org/jira/browse/YARN-9330
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 3.1.2
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Minor
>  Labels: newbie, patch
> Attachments: YARN-9330.001.patch, YARN-9330.002.patch, 
> YARN-9330.003.patch, YARN-9330.004.patch
>
>
> Currently, the endpoint */ws/v1/cluster/scheduler* returns all the 
> queues as part of the REST contract.
> The intention of this JIRA is to be able to pass an additional queue 
> PathParam to return just that queue, e.g.:
> */ws/v1/cluster/scheduler/queue=testParentQueue*
> */ws/v1/cluster/scheduler/queue=testChildQueue*
> This will make it easy for REST clients to query just the desired queue 
> and parse the response.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9674) Max AM Resource calculation is wrong

2019-08-26 Thread Zhankun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9674:
---

Bulk update: Preparing for 3.1.3 release. Moved the incorrect "3.1.2" 
non-blocker issues to 3.1.4, please move back to 3.1.3 if it is a blocker.

> Max AM Resource calculation is wrong
> 
>
> Key: YARN-9674
> URL: https://issues.apache.org/jira/browse/YARN-9674
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: ANANDA G B
>Priority: Major
> Attachments: RM_Issue.png
>
>
> 'Max AM Resource' is calculated for the default partition using 'Effective 
> Max Capacity', while for other partitions it is calculated using 'Effective 
> Capacity'.
> Which one is the correct implementation?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


