[jira] [Comment Edited] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060372#comment-16060372 ] Tao Yang edited comment on YARN-6678 at 6/23/17 4:11 AM:
-
Thanks [~sunilg] for your comments.
{quote} 1. In FiCaSchedulerApp#accept, its better to use RMContainer#equals instead of using != {quote}
As [~leftnoteasy] mentioned, it should be enough to use == to compare the two instances. Are there any other concerns about this?
I noticed that this patch caused several test failures, but they all pass when I run them locally. What might be the problem?

was (Author: tao yang):
Thanks [~sunilg] for your comments.
{quote} 1. In FiCaSchedulerApp#accept, its better to use RMContainer#equals instead of using != {quote}
As [~leftnoteasy] mentioned, it should be enough to use == to compare the two instances. Are there any other concerns about this?

> Committer thread crashes with IllegalStateException in async-scheduling mode
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.9.0, 3.0.0-alpha3
> Reporter: Tao Yang
> Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch,
> YARN-6678.003.patch
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container
> container_e10_1495599791406_7129_01_001453 for application
> appattempt_1495599791406_7129_01 when currently reserved container
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454
> #containers=40 available=... used=...
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> To reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resources for app-1, un-reserved app-1/container-X1 and
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw IllegalStateException when applying
> Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is as
> follows:
> {code}
> // Container reserved first time will be NEW, after the container
> // accepted & confirmed, it will become RESERVED state
> if (schedulerContainer.getRmContainer().getState()
>     == RMContainerState.RESERVED) {
>   // Set reReservation == true
>   reReservation = true;
> } else {
>   // When reserve a resource (state == NEW is for new container,
>   // state == RUNNING is for increase container).
>   // Just check if the node is not already reserved by someone
>   if (schedulerContainer.getSchedulerNode().getReservedContainer()
>       != null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Try to reserve a container, but the node is "
>           + "already reserved by another container="
>           + schedulerContainer.getSchedulerNode()
>               .getReservedContainer().getContainerId());
>     }
>     return false;
>   }
> }
> {code}
> The reserved container on the node of a reserve proposal is currently checked only
> for a first-reserve container.
> We should confirm that the reserved container on this node is the same as the re-reserved
> container.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
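To make the race concrete, here is a minimal self-contained sketch (plain Java; the class and field names are illustrative stand-ins, not the actual YARN types) of the strengthened check described in the description: a re-reserve proposal is accepted only if the node still holds the very same reserved container, so the stale proposal-1 from the repro steps would be rejected instead of crashing the committer thread.

```java
// Simplified model of the proposed fix for YARN-6678: when accepting a
// re-reserve proposal, verify the container already reserved on the node
// is the same instance the proposal wants to re-reserve.
public class ReserveProposalCheck {

    enum State { NEW, RESERVED }

    static final class Container {
        final String id;
        final State state;
        Container(String id, State state) { this.id = id; this.state = state; }
    }

    static final class Node {
        Container reservedContainer;   // container currently reserved on this node, if any
    }

    /** Returns true if the reserve proposal may be applied to the node. */
    static boolean accept(Node node, Container proposal) {
        if (proposal.state == State.RESERVED) {
            // Re-reservation: the node must still hold this very container,
            // otherwise another proposal slipped in between (the race above).
            return node.reservedContainer == proposal;
        }
        // First-time reservation: the node must not be reserved by anyone else.
        return node.reservedContainer == null;
    }

    public static void main(String[] args) {
        Node node = new Node();
        Container x1 = new Container("container-X1", State.RESERVED);
        Container y  = new Container("container-Y",  State.NEW);

        // nm1 re-reserved X1, but meanwhile X1 was un-reserved and Y reserved.
        node.reservedContainer = y;
        System.out.println(accept(node, x1));  // false: stale re-reserve proposal rejected

        node.reservedContainer = x1;
        System.out.println(accept(node, x1));  // true: node still holds X1
    }
}
```

Comparing with == (reference identity) matches the comment above: two proposals for the "same" container id are still different reservation attempts, so identity, not equals, is the right test here.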
[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk
[ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060355#comment-16060355 ] Naganarasimha G R commented on YARN-5006:
-
[~bibinchundatt] there seems to be a compilation problem for branch-2 on the cherry-pick. Can you please check and upload a patch for branch-2?

> ResourceManager quit due to ApplicationStateData exceed the limit size of
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0, 2.7.2
> Reporter: dongtingting
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5006.001.patch, YARN-5006.002.patch,
> YARN-5006.003.patch, YARN-5006.004.patch, YARN-5006.005.patch
>
>
> A client submits a job, and this job adds 1 file into the DistributedCache. When the
> job is submitted, the ResourceManager stores ApplicationStateData into zk.
> The ApplicationStateData exceeds the limit size of the znode, and the RM exits with code 1.
> The related code in RMStateStore.java:
> {code}
> private static class StoreAppTransition
>     implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>   @Override
>   public void transition(RMStateStore store, RMStateStoreEvent event) {
>     if (!(event instanceof RMStateStoreAppEvent)) {
>       // should never happen
>       LOG.error("Illegal event type: " + event.getClass());
>       return;
>     }
>     ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
>     ApplicationId appId = appState.getAppId();
>     ApplicationStateData appStateData = ApplicationStateData
>         .newInstance(appState);
>     LOG.info("Storing info for app: " + appId);
>     try {
>       store.storeApplicationStateInternal(appId, appStateData); // store the appStateData
>       store.notifyApplication(new RMAppEvent(appId,
>           RMAppEventType.APP_NEW_SAVED));
>     } catch (Exception e) {
>       LOG.error("Error storing app: " + appId, e);
>       store.notifyStoreOperationFailed(e); // handle fail event, system exit
>     }
>   };
> }
> {code}
> The Exception log:
> {code}
> ...
> 2016-04-20 11:26:35,732 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
> AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > AsyncDispatcher event handler: Error storing app: > application_1461061795989_17671 > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at >
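The committed fix (visible in the Hudson integration message below: a new StoreLimitException plus a configurable limit in YarnConfiguration) bounds the application state before it ever reaches ZooKeeper. Here is a rough, self-contained sketch of that idea; the names, method shape, and limit value are illustrative, not the actual patch (ZooKeeper's default jute.maxbuffer is roughly 1 MB, hence the placeholder limit):

```java
// Simplified sketch: measure the serialized app state before writing to
// ZooKeeper and fail just that application with a dedicated exception,
// instead of retrying against ZK until the RM gives up and exits.
public class StoreLimitSketch {

    static final int DEFAULT_ZNODE_LIMIT_BYTES = 1024 * 1024;  // illustrative; ZK jute.maxbuffer default is ~1 MB

    static class StoreLimitException extends Exception {
        StoreLimitException(String msg) { super(msg); }
    }

    static void storeApplicationState(String appId, byte[] appStateData,
                                      int limitBytes) throws StoreLimitException {
        if (appStateData.length > limitBytes) {
            // Reject only this application; the RM itself keeps running.
            throw new StoreLimitException("App state for " + appId + " is "
                + appStateData.length + " bytes, exceeds limit " + limitBytes);
        }
        // ... write to ZooKeeper here ...
    }

    public static void main(String[] args) {
        try {
            // An app that dragged a huge DistributedCache manifest into its state.
            storeApplicationState("application_1461061795989_17671",
                new byte[2 * 1024 * 1024], DEFAULT_ZNODE_LIMIT_BYTES);
            System.out.println("stored");
        } catch (StoreLimitException e) {
            System.out.println("rejected");
        }
    }
}
```

The key design point is turning an infrastructure failure (ZK rejecting the write after retries) into an application-level failure that the RM can report and survive.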
[jira] [Commented] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)
[ https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060350#comment-16060350 ] Subru Krishnan commented on YARN-3659:
--
Thanks [~giovanni.fumarola] for working on this. I looked at your patch, please find my comments below:
* A general note on Javadocs - missing code/link annotations throughout.
* Can you add some documentation for ROUTER_CLIENTRM_SUBMIT_RETRY. You will also need to exclude it in {{TestYarnConfigurationFields}}.
* Is it possible to move the check of the subcluster list being empty to {{AbstractRouterPolicy}} to prevent duplication?
* A note on the *isRunning* field in {{MockResourceManagerFacade}} will be useful.
* Rename NO_SUBCLUSTER_MESSAGE to NO_ACTIVE_SUBCLUSTER_AVAILABLE and update the value accordingly.
* Why is the visibility of {{RouterUtil}} public? I feel you can also rename it to RouterServerUtil.
* Resolution of the UGI can be moved to {{AbstractClientRequestInterceptor}} as again all child classes are duplicating the same code.
* Would it be better to use *LRUCacheHashMap* for _clientRMProxies_?
* Rename *getApplicationClientProtocolFromSubClusterId* --> *getClientRMProxyForSubCluster*.
* Distinguish the log messages for different exceptions for clarity.
* Use the {{UniformRandomRouterPolicy}} to determine a random subcluster as that's the purpose of the policy.
* {code} Client: behavior as YARN. should be Client: identical behavior as {@code ClientRMService}. {code}
* Documentation for exception handling in *submitApplication* is unclear, please rephrase it.
* Rename _aclTemp_ to _clientRMProxy_.
* {code}"Unable to create a new application to SubCluster " should be " No response when attempting to submit the application " + applicationId + "to SubCluster " + subClusterId.getId() {code}.
* The log statement for app submission can be moved to post successful submission as it's redundant now.
* We should forward any exceptions we get on *forceKillApplication* and *getApplicationReport* to the client.
* Have a note upfront saying we have only implemented the core functionality and the rest will be done in a follow-up JIRA (create and link).

> Federation Router (hiding multiple RMs for ApplicationClientProtocol)
> -
>
> Key: YARN-3659
> URL: https://issues.apache.org/jira/browse/YARN-3659
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3659.pdf, YARN-3659-YARN-2915.1.patch,
> YARN-3659-YARN-2915.draft.patch
>
>
> This JIRA tracks the design/implementation of the layer for routing
> ApplicationClientProtocol requests to the appropriate
> RM(s) in a federated YARN cluster.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk
[ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060341#comment-16060341 ] Hudson commented on YARN-5006:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11912 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11912/])
YARN-5006. ResourceManager quit due to ApplicationStateData exceed the (naganarasimha_gr: rev 740204b2926f49ea70596c6059582ce409fbdd90)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/StoreLimitException.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java

> ResourceManager quit due to ApplicationStateData exceed the limit size of
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0, 2.7.2
> Reporter: dongtingting
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5006.001.patch, YARN-5006.002.patch,
> YARN-5006.003.patch, YARN-5006.004.patch, YARN-5006.005.patch
>
>
> A client submits a job, and this job adds 1 file into the DistributedCache. When the
> job is submitted, the ResourceManager stores ApplicationStateData into zk.
> The ApplicationStateData exceeds the limit size of the znode, and the RM exits with code 1.
> The related code in RMStateStore.java:
> {code}
> private static class StoreAppTransition
>     implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>   @Override
>   public void transition(RMStateStore store, RMStateStoreEvent event) {
>     if (!(event instanceof RMStateStoreAppEvent)) {
>       // should never happen
>       LOG.error("Illegal event type: " + event.getClass());
>       return;
>     }
>     ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
>     ApplicationId appId = appState.getAppId();
>     ApplicationStateData appStateData = ApplicationStateData
>         .newInstance(appState);
>     LOG.info("Storing info for app: " + appId);
>     try {
>       store.storeApplicationStateInternal(appId, appStateData); // store the appStateData
>       store.notifyApplication(new RMAppEvent(appId,
>           RMAppEventType.APP_NEW_SAVED));
>     } catch (Exception e) {
>       LOG.error("Error storing app: " + appId, e);
>       store.notifyStoreOperationFailed(e); // handle fail event, system exit
>     }
>   };
> }
> {code}
> The Exception log:
> {code}
> ...
> 2016-04-20 11:26:35,732 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
> AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > AsyncDispatcher event handler: Error storing app: > application_1461061795989_17671 > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075) > at >
[jira] [Updated] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)
[ https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-3659:
---
Attachment: YARN-3659-YARN-2915.1.patch

> Federation Router (hiding multiple RMs for ApplicationClientProtocol)
> -
>
> Key: YARN-3659
> URL: https://issues.apache.org/jira/browse/YARN-3659
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3659.pdf, YARN-3659-YARN-2915.1.patch,
> YARN-3659-YARN-2915.draft.patch
>
>
> This JIRA tracks the design/implementation of the layer for routing
> ApplicationClientProtocol requests to the appropriate
> RM(s) in a federated YARN cluster.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service
[ https://issues.apache.org/jira/browse/YARN-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-6734:
---
Assignee: Rohith Sharma K S

> Ensure sub-application user is extracted & sent to timeline service
> ---
>
> Key: YARN-6734
> URL: https://issues.apache.org/jira/browse/YARN-6734
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Rohith Sharma K S
>
> After a discussion with Tez folks, we have been thinking over introducing a
> table to store sub-application information. YARN-6733
> For example, a Tez session may run for a certain period as user X and run a
> few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs
> with a doAs user. ATSv2 should store this information in a new table, perhaps
> called the "sub_application" table.
> YARN-6733 tracks the code changes needed for table schema creation.
> This jira tracks writing to that table, updating the user name fields to
> include the sub-application user etc. This would mean adding a field to Flow
> Context which can store an additional user.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6736) Consider writing to both ats v1 & v2 from RM for smoother upgrades
[ https://issues.apache.org/jira/browse/YARN-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-6736: --- Assignee: Rohith Sharma K S > Consider writing to both ats v1 & v2 from RM for smoother upgrades > -- > > Key: YARN-6736 > URL: https://issues.apache.org/jira/browse/YARN-6736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Rohith Sharma K S > > When the cluster is being upgraded from atsv1 to v2, it may be good to have a > brief time period during which RM writes to both atsv1 and v2. This will help > frameworks like Tez migrate more smoothly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6735) Have a way to turn off system metrics from NMs
[ https://issues.apache.org/jira/browse/YARN-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-6735: --- Assignee: Rohith Sharma K S > Have a way to turn off system metrics from NMs > -- > > Key: YARN-6735 > URL: https://issues.apache.org/jira/browse/YARN-6735 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Rohith Sharma K S > > Have a way to turn off emitting system metrics from NMs -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6736) Consider writing to both ats v1 & v2 from RM for smoother upgrades
Vrushali C created YARN-6736: Summary: Consider writing to both ats v1 & v2 from RM for smoother upgrades Key: YARN-6736 URL: https://issues.apache.org/jira/browse/YARN-6736 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C When the cluster is being upgraded from atsv1 to v2, it may be good to have a brief time period during which RM writes to both atsv1 and v2. This will help frameworks like Tez migrate more smoothly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6735) Have a way to turn off system metrics from NMs
Vrushali C created YARN-6735: Summary: Have a way to turn off system metrics from NMs Key: YARN-6735 URL: https://issues.apache.org/jira/browse/YARN-6735 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C Have a way to turn off emitting system metrics from NMs -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service
Vrushali C created YARN-6734:
-
Summary: Ensure sub-application user is extracted & sent to timeline service
Key: YARN-6734
URL: https://issues.apache.org/jira/browse/YARN-6734
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vrushali C

After a discussion with Tez folks, we have been thinking over introducing a table to store sub-application information. YARN-6733
For example, a Tez session may run for a certain period as user X and run a few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs with a doAs user. ATSv2 should store this information in a new table, perhaps called the "sub_application" table.
YARN-6733 tracks the code changes needed for table schema creation.
This jira tracks writing to that table, updating the user name fields to include the sub-application user etc. This would mean adding a field to Flow Context which can store an additional user.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6733) Add table for storing sub-application entities
[ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-6733:
-
Summary: Add table for storing sub-application entities (was: Support storing sub-application entities)

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
>
> After a discussion with Tez folks, we have been thinking over introducing a
> table to store sub-application information.
> For example, a Tez session may run for a certain period as user X and run a
> few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs
> with a doAs user. ATSv2 should store this information in a new table, perhaps
> called the "sub_application" table.
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name
> fields to include the sub-application user etc.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6733) Support storing sub-application entities
Vrushali C created YARN-6733:
-
Summary: Support storing sub-application entities
Key: YARN-6733
URL: https://issues.apache.org/jira/browse/YARN-6733
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vrushali C
Assignee: Vrushali C

After a discussion with Tez folks, we have been thinking over introducing a table to store sub-application information.
For example, a Tez session may run for a certain period as user X and run a few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs with a doAs user. ATSv2 should store this information in a new table, perhaps called the "sub_application" table.
This jira tracks the code changes needed for table schema creation.
I will file other jiras for writing to that table, updating the user name fields to include the sub-application user etc.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment
[ https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060140#comment-16060140 ] Yufei Gu edited comment on YARN-6625 at 6/22/17 11:52 PM:
--
Thanks [~rkanter] for the review. I uploaded patch v2 per our off-line discussion. First, we don't use the solution in patch v1, since it is not a good idea to let the AM access the Admin service of the RM. Second, the AM cannot get the RM HA status through the scheduler service, since there is no scheduler service in a standby RM. Patch v2 provides a solution that tries the RMs one by one until we find one live RM, based on the assumption that it doesn't matter whether the RM is active or standby in this situation: if the AM redirects to a standby RM, the standby RM will redirect to the active RM anyway. The only concern is that we should make sure the RM is alive.

was (Author: yufeigu):
Thanks [~rkanter] for the review. I uploaded patch v2 per our off-line discussion. First, we don't use the solution in patch v1. It seems it is not a good idea to let the AM access the Admin service of the RM. Meanwhile, the AM cannot get the RM HA status through the scheduler service, since there is no scheduler service in a standby RM. Patch v2 provides a solution that tries the RMs one by one until we find one live RM, based on the assumption that it doesn't matter whether the RM is active or standby in this situation: if the AM redirects to a standby RM, the standby RM will redirect to the active RM anyway. The only concern is that we should make sure the RM is alive.

> yarn application -list returns a tracking URL for AM that doesn't work in
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
> Issue Type: Bug
> Components: amrmproxy
> Affects Versions: 3.0.0-alpha2
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch
>
>
> The tracking URL given at the command line should work whether the cluster is secured or not.
> The tracking URLs are like http://node-2.abc.com:47014, and the AM web server is supposed
> to redirect them to an RM address like
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it
> fails to do that because the connection is rejected when the AM talks to the RM
> admin service to get the HA status.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
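The try-each-RM idea from the comment above can be modeled with a minimal self-contained sketch. The names are illustrative and the liveness probe is stubbed out with a predicate; in the real patch the AM would attempt an actual connection to each configured RM web address:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of the patch-v2 approach: try the configured RMs one
// by one and pick the first one that responds. Whether it is active or
// standby does not matter, since a standby RM redirects to the active one.
public class RmProbeSketch {

    /** Returns the first RM address the probe reports as alive, or null if none. */
    static String pickAliveRm(List<String> rmAddresses, Predicate<String> isAlive) {
        for (String rm : rmAddresses) {
            if (isAlive.test(rm)) {
                return rm;  // active or standby, either works for redirection
            }
        }
        return null;  // no RM reachable
    }

    public static void main(String[] args) {
        List<String> rms = Arrays.asList("rm1:8088", "rm2:8088");
        // Pretend rm1 is down and rm2 is up.
        String chosen = pickAliveRm(rms, rm -> rm.startsWith("rm2"));
        System.out.println(chosen);  // rm2:8088
    }
}
```

This sidesteps both rejected approaches in the comment: the AM never touches the RM admin service and never needs to query HA state through the scheduler service.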
[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment
[ https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060146#comment-16060146 ] Hadoop QA commented on YARN-6625:
-
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 30s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 9s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 31s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6625 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874148/YARN-6625.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux ac8dd4d8289f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6d116ff |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16225/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16225/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16225/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.

> yarn application -list returns a tracking URL for AM that doesn't work in
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
>
[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment
[ https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060140#comment-16060140 ] Yufei Gu commented on YARN-6625: Thanks [~rkanter] for the review. I uploaded patch v2 per our offline discussion. First, we dropped the solution in patch v1: it seems like a bad idea to let the AM access the Admin service of the RM, and the AM cannot get the RM HA status through the scheduler service since there is no scheduler service in a standby RM. Patch v2 instead tries the RMs one by one until it finds a live one, on the assumption that it doesn't matter whether that RM is active or standby in this situation: if the AM redirects to a standby RM, the standby RM will redirect to the active RM anyway. The only concern is that we need to make sure the RM is alive. > yarn application -list returns a tracking URL for AM that doesn't work in > secured and HA environment > > > Key: YARN-6625 > URL: https://issues.apache.org/jira/browse/YARN-6625 > Project: Hadoop YARN > Issue Type: Bug > Components: amrmproxy >Affects Versions: 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6625.001.patch, YARN-6625.002.patch > > > The tracking URL given at the command line should work secured or not. The > tracking URLs are like http://node-2.abc.com:47014 and AM web server supposed > to redirect it to a RM address like this > http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it > fails to do that because the connection is rejected when AM is talking to RM > admin service to get HA status. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
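The "try RMs one by one" approach described in the comment can be sketched roughly as below. This is an illustrative outline only, not the actual patch: `firstAliveRm` and the injected liveness probe are hypothetical names, and the real change works against the RM web services rather than a plain string list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustrative sketch of trying RM addresses in order until one responds.
// The address list and the liveness check are placeholders, not YARN APIs.
public class RmFinder {

    // Probe each candidate RM in order and return the first live one.
    // Active vs. standby does not matter here: a standby RM redirects
    // requests to the active RM anyway.
    static Optional<String> firstAliveRm(List<String> rmAddresses,
                                         Predicate<String> isAlive) {
        return rmAddresses.stream().filter(isAlive).findFirst();
    }

    public static void main(String[] args) {
        List<String> rms = Arrays.asList("rm1:8088", "rm2:8088");
        // Pretend only rm2 answers our probe.
        System.out.println(firstAliveRm(rms, a -> a.startsWith("rm2")));
    }
}
```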
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060139#comment-16060139 ] Benyi Wang commented on YARN-1492: -- Hi [~ctrezzo], bq. The shared cache leverages checksuming and the node manager local cache to ensure applications can reuse resources that are already localized on node managers. I have a few questions about the cache on the Node Manager: * Could you explain in detail how the shared cache leverages the node manager local cache? * Are those shared jars marked as PUBLIC? * Could you point me to the source code that handles this? > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin Lee >Assignee: Chris Trezzo > Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, > YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, > YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, > YARN-1492-all-trunk-v5.patch > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. > On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. 
[jira] [Updated] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment
[ https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6625: --- Attachment: YARN-6625.002.patch > yarn application -list returns a tracking URL for AM that doesn't work in > secured and HA environment > > > Key: YARN-6625 > URL: https://issues.apache.org/jira/browse/YARN-6625 > Project: Hadoop YARN > Issue Type: Bug > Components: amrmproxy >Affects Versions: 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6625.001.patch, YARN-6625.002.patch > > > The tracking URL given at the command line should work secured or not. The > tracking URLs are like http://node-2.abc.com:47014 and AM web server supposed > to redirect it to a RM address like this > http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it > fails to do that because the connection is rejected when AM is talking to RM > admin service to get HA status. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-6127: -- Target Version/s: 2.9.0, 3.0.0-beta1 (was: 3.0.0-beta1) > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, > YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060094#comment-16060094 ] Arun Suresh commented on YARN-6127: --- Not sure why Jenkins gave a compilation error; it works fine locally. Committed to branch-2 as well. Thanks for the branch-2 patch [~botong]. > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, > YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Created] (YARN-6732) Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar
Miklos Szegedi created YARN-6732: Summary: Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar Key: YARN-6732 URL: https://issues.apache.org/jira/browse/YARN-6732 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi I get the following build error when resolving dependencies: {code} [INFO] BUILD FAILURE [INFO] [INFO] Total time: 04:03 min [INFO] Finished at: 2017-06-22T18:34:55+00:00 [INFO] Final Memory: 69M/167M [INFO] [ERROR] Failed to execute goal on project hadoop-tools-dist: Could not resolve dependencies for project org.apache.hadoop:hadoop-tools-dist:jar:2.9.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar:2.9.0-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-tools-dist The command '/bin/sh -c mvn dependency:resolve' returned a non-zero code: 1 {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060003#comment-16060003 ] Hadoop QA commented on YARN-6127: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 30s{color} | {color:red} root in branch-2 failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 19s{color} | {color:red} hadoop-yarn-server in branch-2 failed with JDK v1.8.0_131. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 14s{color} | {color:red} hadoop-yarn-server in branch-2 failed with JDK v1.7.0_131. 
{color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} branch-2 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 46s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK v1.8.0_131. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 46s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK v1.8.0_131. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 58s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK v1.7.0_131. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 58s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK v1.7.0_131. 
{color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 0 new + 198 unchanged - 4 fixed = 198 total (was 202) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_131 with JDK v1.8.0_131 generated 0 new + 197 unchanged - 26 fixed = 197 total (was 223) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s{color} | {color:green} hadoop-yarn-server-tests in the patch passed with JDK v1.8.0_131. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 17s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_131. {color} | |
[jira] [Created] (YARN-6731) Add ability to export scheduler configuration XML
Jonathan Hung created YARN-6731: --- Summary: Add ability to export scheduler configuration XML Key: YARN-6731 URL: https://issues.apache.org/jira/browse/YARN-6731 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Hung Assignee: Jonathan Hung This is useful for debugging/cluster migration/peace of mind. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reopened YARN-6127: --- Re-opening to test the branch-2 patch. > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, > YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Updated] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6127: --- Attachment: YARN-6127-branch-2.v1.patch > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, > YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059883#comment-16059883 ] ASF GitHub Bot commented on YARN-6673: -- Github user haibchen commented on a diff in the pull request: https://github.com/apache/hadoop/pull/232#discussion_r123601141 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java --- @@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path) @Override public List preStart(Container container) throws ResourceHandlerException { - String cgroupId = container.getContainerId().toString(); Resource containerResource = container.getResource(); cGroupsHandler.createCGroup(CPU, cgroupId); try { int containerVCores = containerResource.getVirtualCores(); - int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; - cGroupsHandler - .updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, - String.valueOf(cpuShares)); + ContainerTokenIdentifier id = container.getContainerTokenIdentifier(); + if (id != null && id.getExecutionType() == + ExecutionType.OPPORTUNISTIC) { +cGroupsHandler +.updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, +String.valueOf(CPU_DEFAULT_WEIGHT_OPPORTUNISTIC)); + } else { +int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; +cGroupsHandler +.updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, +String.valueOf(cpuShares)); + } if (strictResourceUsageMode) { --- End diff -- I see. Thanks for the clarification. 
> Add cpu cgroup configurations for opportunistic containers > -- > > Key: YARN-6673 > URL: https://issues.apache.org/jira/browse/YARN-6673 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > In addition to setting cpu.cfs_period_us on a per-container basis, we could > also set cpu.shares to 2 for opportunistic containers so they are run on a > best-effort basis -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
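The behavior being reviewed in the diff above reduces to a small piece of selection logic. The sketch below is illustrative only: the constant values (1024 per vcore, 2 for opportunistic containers) are assumptions taken from the diff and the issue description, not the committed code.

```java
// Minimal sketch of the cpu.shares selection discussed in the diff above.
// Constant values are assumptions from the issue, not the committed code.
public class CpuShares {
    static final int CPU_DEFAULT_WEIGHT = 1024;            // assumed per-vcore weight
    static final int CPU_DEFAULT_WEIGHT_OPPORTUNISTIC = 2; // best-effort weight

    // Opportunistic containers get a fixed, tiny share so the kernel runs
    // them on a best-effort basis; guaranteed containers scale with vcores.
    static int sharesFor(boolean opportunistic, int vcores) {
        return opportunistic
            ? CPU_DEFAULT_WEIGHT_OPPORTUNISTIC
            : CPU_DEFAULT_WEIGHT * vcores;
    }

    public static void main(String[] args) {
        System.out.println(sharesFor(true, 8));  // opportunistic: fixed share
        System.out.println(sharesFor(false, 2)); // guaranteed: per-vcore share
    }
}
```

Since cpu.shares is a relative weight, a value of 2 next to the default 1024-per-vcore weight means opportunistic containers only get CPU time that guaranteed containers are not using.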
[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059881#comment-16059881 ] ASF GitHub Bot commented on YARN-6673: -- Github user haibchen commented on a diff in the pull request: https://github.com/apache/hadoop/pull/232#discussion_r123601076 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java --- @@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path) @Override public List preStart(Container container) throws ResourceHandlerException { - String cgroupId = container.getContainerId().toString(); Resource containerResource = container.getResource(); cGroupsHandler.createCGroup(CPU, cgroupId); try { int containerVCores = containerResource.getVirtualCores(); - int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; - cGroupsHandler - .updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, - String.valueOf(cpuShares)); + ContainerTokenIdentifier id = container.getContainerTokenIdentifier(); + if (id != null && id.getExecutionType() == + ExecutionType.OPPORTUNISTIC) { --- End diff -- yeah. In some sense, it is to be more descriptive than code reuse. Maybe we could have an inline boolean variable with such name? > Add cpu cgroup configurations for opportunistic containers > -- > > Key: YARN-6673 > URL: https://issues.apache.org/jira/browse/YARN-6673 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > In addition to setting cpu.cfs_period_us on a per-container basis, we could > also set cpu.shares to 2 for opportunistic containers so they are run on a > best-effort basis -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059858#comment-16059858 ] Hudson commented on YARN-6127: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11908 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11908/]) YARN-6127. Add support for work preserving NM restart when AMRMProxy is (arun suresh: rev 49aa60e50d20f8c18ed6f00fa8966244536fe7da) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyTokenSecretManager.java > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, > YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without loosing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart. 
[jira] [Commented] (YARN-6674) Add memory cgroup settings for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059829#comment-16059829 ] ASF GitHub Bot commented on YARN-6674: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/240#discussion_r123589398 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java --- @@ -122,12 +128,23 @@ int getSwappiness() { cGroupsHandler.updateCGroupParam(MEMORY, cgroupId, CGroupsHandler.CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES, String.valueOf(containerHardLimit) + "M"); - cGroupsHandler.updateCGroupParam(MEMORY, cgroupId, - CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES, - String.valueOf(containerSoftLimit) + "M"); - cGroupsHandler.updateCGroupParam(MEMORY, cgroupId, - CGroupsHandler.CGROUP_PARAM_MEMORY_SWAPPINESS, - String.valueOf(swappiness)); + ContainerTokenIdentifier id = container.getContainerTokenIdentifier(); + if (id != null && id.getExecutionType() == + ExecutionType.OPPORTUNISTIC) { +cGroupsHandler.updateCGroupParam(MEMORY, cgroupId, +CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES, +String.valueOf(OPPORTUNISTIC_SOFT_LIMIT) + "M"); +cGroupsHandler.updateCGroupParam(MEMORY, cgroupId, --- End diff -- Swapping may cause issues, yes, however we cannot tell the user to turn it off. > Add memory cgroup settings for opportunistic containers > --- > > Key: YARN-6674 > URL: https://issues.apache.org/jira/browse/YARN-6674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Miklos Szegedi > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
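For reference, the parameter selection being reviewed in this hunk comes down to something like the sketch below. It is a hedged illustration only: the opportunistic soft limit value and the helper names are assumptions based on the visible diff, not the committed implementation.

```java
// Illustrative sketch of the memory soft-limit selection in the diff
// above; the value and names are assumptions from the visible hunk.
public class MemoryParams {
    // Assumed: a near-zero soft limit so the kernel reclaims pages from
    // opportunistic containers first under memory pressure.
    static final int OPPORTUNISTIC_SOFT_LIMIT_MB = 0;

    // Opportunistic containers get the fixed low soft limit; guaranteed
    // containers keep the limit computed from their resource request.
    static int softLimitMb(boolean opportunistic, int computedSoftLimitMb) {
        return opportunistic ? OPPORTUNISTIC_SOFT_LIMIT_MB : computedSoftLimitMb;
    }

    public static void main(String[] args) {
        System.out.println(softLimitMb(true, 512));
        System.out.println(softLimitMb(false, 512));
    }
}
```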
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059826#comment-16059826 ] Arun Suresh commented on YARN-6127: --- Committed this to trunk. Thanks [~botong]. > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, > YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059820#comment-16059820 ] Varun Saxena commented on YARN-5648: Findbugs warnings are existing issues and unrelated. Test failures are outstanding issues on trunk. > [ATSv2 Security] Client side changes for authentication > --- > > Key: YARN-5648 > URL: https://issues.apache.org/jira/browse/YARN-5648 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-5648-YARN-5355.02.patch, > YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.04.patch, > YARN-5648-YARN-5355.wip.01.patch > >
[jira] [Commented] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java
[ https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059817#comment-16059817 ] Zhe Zhang commented on YARN-6719: - Removed release-blocker label since it's already committed to branch-2.7 > Fix findbugs warnings in SLSCapacityScheduler.java > -- > > Key: YARN-6719 > URL: https://issues.apache.org/jira/browse/YARN-6719 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Fix For: 2.9.0, 2.7.4, 2.8.2 > > Attachments: YARN-6719-branch-2.01.patch, > YARN-6719-branch-2.8-01.patch > > > There are 2 findbugs warnings in branch-2. > https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html > {noformat} > Dm: Found reliance on default encoding in > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new > java.io.FileWriter(String) > Bug type DM_DEFAULT_ENCODING (click for details) > In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler > In method > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics() > Called method new java.io.FileWriter(String) > At SLSCapacityScheduler.java:[line 464] > Dm: Found reliance on default encoding in new > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler): > new java.io.FileWriter(String) > Bug type DM_DEFAULT_ENCODING (click for details) > In class > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable > In method new > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler) > Called method new java.io.FileWriter(String) > At SLSCapacityScheduler.java:[line 669] > {noformat}
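The DM_DEFAULT_ENCODING warning above flags `new FileWriter(String)`, which writes with the platform default charset. A minimal sketch of the conventional fix (an `OutputStreamWriter` with an explicit charset); the file name and method name here are illustrative, not the actual patch:

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class MetricsWriterExample {
  // Instead of: new FileWriter(path) -- which relies on the default encoding
  // and triggers DM_DEFAULT_ENCODING -- wrap a FileOutputStream with an
  // OutputStreamWriter that names the charset explicitly.
  static Writer openMetricsWriter(String path) throws IOException {
    return new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(path),
            StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    try (Writer w = openMetricsWriter("metrics.csv")) {
      w.write("time,allocatedMB\n");
    }
    System.out.println("wrote metrics.csv");
  }
}
```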
[jira] [Updated] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java
[ https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated YARN-6719: Labels: (was: release-blocker) > Fix findbugs warnings in SLSCapacityScheduler.java > -- > > Key: YARN-6719 > URL: https://issues.apache.org/jira/browse/YARN-6719 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Fix For: 2.9.0, 2.7.4, 2.8.2 > > Attachments: YARN-6719-branch-2.01.patch, > YARN-6719-branch-2.8-01.patch > > > There are 2 findbugs warnings in branch-2. > https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html > {noformat} > Dm: Found reliance on default encoding in > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new > java.io.FileWriter(String) > Bug type DM_DEFAULT_ENCODING (click for details) > In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler > In method > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics() > Called method new java.io.FileWriter(String) > At SLSCapacityScheduler.java:[line 464] > Dm: Found reliance on default encoding in new > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler): > new java.io.FileWriter(String) > Bug type DM_DEFAULT_ENCODING (click for details) > In class > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable > In method new > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler) > Called method new java.io.FileWriter(String) > At SLSCapacityScheduler.java:[line 669] > {noformat}
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059809#comment-16059809 ] Botong Huang commented on YARN-6127: Thanks [~asuresh]! YARN-6730 created to follow up. > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, > YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Created] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used
Yufei Gu created YARN-6729: -- Summary: NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used Key: YARN-6729 URL: https://issues.apache.org/jira/browse/YARN-6729 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0-alpha3 Reporter: Yufei Gu NM percentage-physical-cpu-limit is not honored in DefaultLCEResourcesHandler, which may cause container cpu usage calculation issues, e.g. container vcore usage is potentially more than 100% if percentage-physical-cpu-limit is set to a value less than 100.
[jira] [Created] (YARN-6730) Make sure NM state store is not null consistently
Botong Huang created YARN-6730: -- Summary: Make sure NM state store is not null consistently Key: YARN-6730 URL: https://issues.apache.org/jira/browse/YARN-6730 Project: Hadoop YARN Issue Type: Task Reporter: Botong Huang Assignee: Botong Huang Priority: Minor In the NM statestore for NM restart, there are a lot of places where we check if the stateStore != null. This is true in the existing codebase too. Ideally, the stateStore should never be null because we have the NullStateStore implementation and we should not have to perform so many defensive checks.
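The `NullStateStore` mentioned above is an instance of the Null Object pattern: a no-op implementation replaces `null`, so callers never need `if (stateStore != null)` guards. A small standalone sketch of the pattern (the interface and method names here are illustrative, not the actual NM state-store API):

```java
// Hypothetical trimmed-down state-store interface for illustration only.
interface StateStore {
  void storeContainer(String containerId);
  boolean isRecoveryEnabled();
}

// The Null Object: every operation is a safe no-op, so a caller holding this
// instance needs no null checks.
class NullStateStore implements StateStore {
  @Override public void storeContainer(String containerId) { /* no-op */ }
  @Override public boolean isRecoveryEnabled() { return false; }
}

public class NullStateStoreExample {
  public static void main(String[] args) {
    StateStore store = new NullStateStore(); // instead of assigning null
    store.storeContainer("container_01_01"); // safe without a null check
    System.out.println(store.isRecoveryEnabled()); // false
  }
}
```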
[jira] [Comment Edited] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059800#comment-16059800 ] Arun Suresh edited comment on YARN-6127 at 6/22/17 6:18 PM: Will commit this shortly. One suggestion though - there are a lot of places where we check if the stateStore != null. This is true in the existing codebase too. Ideally, the stateStore should never be null and we should not have to perform so many defensive checks. [~botong], can you open a followup JIRA to fix this ? was (Author: asuresh): Will commit this shortly. One suggestion though - there are a lot of places where we check if the stateStore != null. This is true in the existing codebase too. Ideally, the stateStore should never be null and we should have to perform so many defensive checks. [~botong], can you open a followup JIRA to fix this ? > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, > YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled
[ https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059800#comment-16059800 ] Arun Suresh commented on YARN-6127: --- Will commit this shortly. One suggestion though - there are a lot of places where we check if the stateStore != null. This is true in the existing codebase too. Ideally, the stateStore should never be null and we should not have to perform so many defensive checks. [~botong], can you open a followup JIRA to fix this ? > Add support for work preserving NM restart when AMRMProxy is enabled > > > Key: YARN-6127 > URL: https://issues.apache.org/jira/browse/YARN-6127 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, > YARN-6127.v3.patch, YARN-6127.v4.patch > > > YARN-1336 added the ability to restart NM without losing any running > containers. In a Federated YARN environment, there's additional state in the > {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need > to enhance {{AMRMProxy}} to support work-preserving restart.
[jira] [Resolved] (YARN-5950) Create StoreConfigurationProvider to construct a Configuration from the backing store
[ https://issues.apache.org/jira/browse/YARN-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-5950. - Resolution: Duplicate This was handled in YARN-5948. > Create StoreConfigurationProvider to construct a Configuration from the > backing store > - > > Key: YARN-5950 > URL: https://issues.apache.org/jira/browse/YARN-5950 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung > > The StoreConfigurationProvider will query the YarnConfigurationStore for > various configuration keys, and construct a Configuration object out of it > (to be passed to the scheduler, and possibly other YARN components).
[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059785#comment-16059785 ] ASF GitHub Bot commented on YARN-6673: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/232#discussion_r123583254 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java --- @@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path) @Override public List preStart(Container container) throws ResourceHandlerException { - String cgroupId = container.getContainerId().toString(); Resource containerResource = container.getResource(); cGroupsHandler.createCGroup(CPU, cgroupId); try { int containerVCores = containerResource.getVirtualCores(); - int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; - cGroupsHandler - .updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, - String.valueOf(cpuShares)); + ContainerTokenIdentifier id = container.getContainerTokenIdentifier(); + if (id != null && id.getExecutionType() == + ExecutionType.OPPORTUNISTIC) { +cGroupsHandler +.updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, +String.valueOf(CPU_DEFAULT_WEIGHT_OPPORTUNISTIC)); + } else { +int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; +cGroupsHandler +.updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, +String.valueOf(cpuShares)); + } if (strictResourceUsageMode) { --- End diff -- Yes, I think so. If the admin chooses strict cpu limits, all containers should get strict cpu limits based on vcores. Opportunistic ones still will be throttled by cpu.shares, if guaranteed are running. This is just a cap, for opportunistic containers with different thread counts not to affect each other negatively. 
> Add cpu cgroup configurations for opportunistic containers > -- > > Key: YARN-6673 > URL: https://issues.apache.org/jira/browse/YARN-6673 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > In addition to setting cpu.cfs_period_us on a per-container basis, we could > also set cpu.shares to 2 for opportunistic containers so they are run on a > best-effort basis
[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059787#comment-16059787 ] ASF GitHub Bot commented on YARN-6673: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/232#discussion_r123583294 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsCpuResourceHandlerImpl.java --- @@ -294,4 +296,26 @@ public void testTeardown() throws Exception { public void testStrictResourceUsage() throws Exception { Assert.assertNull(cGroupsCpuResourceHandler.teardown()); } + + @Test + public void testOpportunistic() throws Exception { +Configuration conf = new YarnConfiguration(); + +String id = "container_01_01"; +ContainerId mockContainerId = mock(ContainerId.class); +when(mockContainerId.toString()).thenReturn(id); + +cGroupsCpuResourceHandler.bootstrap(plugin, conf); --- End diff -- Okay. > Add cpu cgroup configurations for opportunistic containers > -- > > Key: YARN-6673 > URL: https://issues.apache.org/jira/browse/YARN-6673 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > In addition to setting cpu.cfs_period_us on a per-container basis, we could > also set cpu.shares to 2 for opportunistic containers so they are run on a > best-effort basis
[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059769#comment-16059769 ] ASF GitHub Bot commented on YARN-6673: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/232#discussion_r123582374 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java --- @@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path) @Override public List preStart(Container container) throws ResourceHandlerException { - String cgroupId = container.getContainerId().toString(); Resource containerResource = container.getResource(); cGroupsHandler.createCGroup(CPU, cgroupId); try { int containerVCores = containerResource.getVirtualCores(); - int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores; - cGroupsHandler - .updateCGroupParam(CPU, cgroupId, CGroupsHandler.CGROUP_CPU_SHARES, - String.valueOf(cpuShares)); + ContainerTokenIdentifier id = container.getContainerTokenIdentifier(); + if (id != null && id.getExecutionType() == + ExecutionType.OPPORTUNISTIC) { --- End diff -- Hmm this is just 2 lines of code. Adding a function would be at least 5 more lines. Ideally the container has this function but that might be too much churn for the interface. 
> Add cpu cgroup configurations for opportunistic containers > -- > > Key: YARN-6673 > URL: https://issues.apache.org/jira/browse/YARN-6673 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > In addition to setting cpu.cfs_period_us on a per-container basis, we could > also set cpu.shares to 2 for opportunistic containers so they are run on a > best-effort basis
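The branch discussed in the diff above selects `cpu.shares` by execution type: guaranteed containers get the default weight per vcore, opportunistic ones get the minimal share of 2 from the JIRA description. A tiny pure-function sketch of that choice; the value 1024 for `CPU_DEFAULT_WEIGHT` is an assumption (the usual cgroups default share), not confirmed by this excerpt:

```java
public class CpuSharesExample {
  // Assumption: default per-vcore weight matches the cgroups default share.
  static final int CPU_DEFAULT_WEIGHT = 1024;
  // From the JIRA description: opportunistic containers run best-effort.
  static final int CPU_DEFAULT_WEIGHT_OPPORTUNISTIC = 2;

  // Compute the cpu.shares value for a container.
  static int cpuShares(int vcores, boolean opportunistic) {
    return opportunistic
        ? CPU_DEFAULT_WEIGHT_OPPORTUNISTIC
        : CPU_DEFAULT_WEIGHT * vcores;
  }

  public static void main(String[] args) {
    System.out.println(cpuShares(4, false)); // guaranteed, 4 vcores: 4096
    System.out.println(cpuShares(4, true));  // opportunistic: 2
  }
}
```

Because `cpu.shares` is a relative weight, a share of 2 only throttles opportunistic containers while guaranteed containers are actually competing for CPU; when the node is idle they can still use spare cycles.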
[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization
[ https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059755#comment-16059755 ] ASF GitHub Bot commented on YARN-6668: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/241#discussion_r123580271 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsResourceCalculator.java --- @@ -0,0 +1,346 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.hadoop.util.CpuTimeTracker; +import org.apache.hadoop.util.Shell; +import org.apache.hadoop.util.SysInfoLinux; +import org.apache.hadoop.yarn.exceptions.YarnException; +import org.apache.hadoop.yarn.util.Clock; +import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree; +import org.apache.hadoop.yarn.util.SystemClock; + +import java.io.BufferedReader; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.InputStreamReader; +import java.math.BigInteger; +import java.nio.charset.Charset; +import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * A cgroups file-system based Resource calculator without the process tree + * features. + */ +public class CGroupsResourceCalculator extends ResourceCalculatorProcessTree { + enum Result { +Continue, +Exit + } + private static final String PROCFS = "/proc"; + static final String CGROUP = "cgroup"; + static final String CPU_STAT = "cpuacct.stat"; + static final String MEM_STAT = "memory.usage_in_bytes"; + static final String MEMSW_STAT = "memory.memsw.usage_in_bytes"; + private static final String USER = "user "; + private static final String SYSTEM = "system "; + + private static final Pattern CGROUP_FILE_FORMAT = Pattern.compile( + "^(\\d+):([^:]+):/(.*)$"); + private final String procfsDir; + private CGroupsHandler cGroupsHandler; + + private String pid; + private File cpuStat; + private File memStat; + private File memswStat; + + private final long jiffyLengthMs; + private BigInteger processTotalJiffies = BigInteger.ZERO; + private final CpuTimeTracker cpuTimeTracker; + private Clock clock; + + private final static Object LOCK = new Object(); + private static boolean firstError = true; + + /** + * Create resource 
calculator for all Yarn containers. + * @throws YarnException Could not access cgroups + */ + public CGroupsResourceCalculator() throws YarnException { +this(null, PROCFS, ResourceHandlerModule.getCGroupsHandler(), +SystemClock.getInstance()); + } + + /** + * Create resource calculator for the container that has the specified pid. + * @param pid A pid from the cgroup or null for all containers + * @throws YarnException Could not access cgroups + */ + public CGroupsResourceCalculator(String pid) throws YarnException { +this(pid, PROCFS, ResourceHandlerModule.getCGroupsHandler(), +SystemClock.getInstance()); + } + + /** + * Create resource calculator for testing. + * @param pid A pid from the cgroup or null for all containers + * @param procfsDir Path to /proc or a mock /proc directory + * @param cGroupsHandler Initialized cgroups handler object + * @param clock A clock object + * @throws YarnException YarnException Could not access cgroups + */ + @VisibleForTesting + CGroupsResourceCalculator(String pid, String procfsDir, +
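The `CGROUP_FILE_FORMAT` pattern in the code above parses lines of `/proc/<pid>/cgroup`, which have the shape `hierarchy-id:controller-list:/path`. A small standalone sketch of that parse (the example path is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CgroupLineParse {
  // Same pattern as CGROUP_FILE_FORMAT in CGroupsResourceCalculator:
  // group 1 = hierarchy id, group 2 = controller list, group 3 = cgroup path
  // relative to the cgroup mount root (the leading "/" is consumed).
  private static final Pattern CGROUP_FILE_FORMAT =
      Pattern.compile("^(\\d+):([^:]+):/(.*)$");

  // Returns {hierarchyId, controllers, path} or null if the line does not match.
  static String[] parse(String line) {
    Matcher m = CGROUP_FILE_FORMAT.matcher(line);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3)};
  }

  public static void main(String[] args) {
    String[] parts = parse("4:memory:/hadoop-yarn/container_e10_01");
    System.out.println(parts[1] + " -> " + parts[2]);
    // prints: memory -> hadoop-yarn/container_e10_01
  }
}
```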
[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization
[ https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059742#comment-16059742 ] ASF GitHub Bot commented on YARN-6668: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/241#discussion_r123577788 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java --- @@ -588,6 +589,31 @@ private void initializeProcessTrees( } /** + * Get the best process tree calculator. + * @param pId container process id + * @return process tree calculator + */ +private ResourceCalculatorProcessTree +getResourceCalculatorProcessTree(String pId) { + ResourceCalculatorProcessTree pt = null; + + // CGroups is best in performance, so try to use it, if it is enabled + if (processTreeClass == null && --- End diff -- Impossible. CGroupsResourceCalculator relies on cgroups that are present only in the node manager. ResourceCalculatorProcessTree is in the hadoop-yarn-common package. > Use cgroup to get container resource utilization > > > Key: YARN-6668 > URL: https://issues.apache.org/jira/browse/YARN-6668 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > Container Monitor relies on proc file system to get container resource > utilization, which is not as efficient as reading cgroup accounting. We > should, when cgroups are enabled, have the NM read cgroup stats instead.
[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization
[ https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059737#comment-16059737 ] ASF GitHub Bot commented on YARN-6668: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/241#discussion_r123577119 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsResourceCalculator.java --- @@ -0,0 +1,292 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources; + +import org.apache.hadoop.util.CpuTimeTracker; +import org.apache.hadoop.util.Shell; +import org.apache.hadoop.util.SysInfoLinux; +import org.apache.hadoop.yarn.exceptions.YarnException; +import org.apache.hadoop.yarn.util.Clock; +import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree; +import org.apache.hadoop.yarn.util.SystemClock; + +import java.io.BufferedReader; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.InputStreamReader; +import java.math.BigInteger; +import java.nio.charset.Charset; +import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * A cgroups file-system based Resource calculator without the process tree + * features. + */ + +public class CGroupsResourceCalculator extends ResourceCalculatorProcessTree { --- End diff -- It supports aggregated container utilization by passing a null pid. > Use cgroup to get container resource utilization > > > Key: YARN-6668 > URL: https://issues.apache.org/jira/browse/YARN-6668 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Miklos Szegedi > > Container Monitor relies on proc file system to get container resource > utilization, which is not as efficient as reading cgroup accounting. We > should, when cgroups are enabled, have the NM read cgroup stats instead.
[jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059711#comment-16059711 ] Hadoop QA commented on YARN-6708: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 47s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 49s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 135 unchanged - 5 fixed = 145 total (was 140) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 226 unchanged - 1 fixed = 226 total (was 227) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 34s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 47s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6708 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874105/YARN-6708.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 7558ca6f45cf 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8dbd53e | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16223/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html | | checkstyle |
[jira] [Commented] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059710#comment-16059710 ] Sunil G commented on YARN-6714: --- [~Tao Yang] I think your fix is correct. However, could you add some more comments in the code for future reference? It would help to know in which cases the running attemptId may mismatch with app.getApplicationAttemptId(). > RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED > event when async-scheduling enabled in CapacityScheduler > - > > Key: YARN-6714 > URL: https://issues.apache.org/jira/browse/YARN-6714 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6714.001.patch, YARN-6714.002.patch > > > Currently, in the async-scheduling mode of CapacityScheduler, after AM failover > and unreserving all reserved containers, there is still a chance to get and commit > an outdated reserve proposal of the failed app attempt. This problem > happened to an app in our cluster: when this app stopped, it unreserved all > reserved containers and compared their appAttemptId with the current > appAttemptId; if they did not match, it threw an IllegalStateException and made the RM > crash. > Error log: > {noformat} > 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.IllegalStateException: Trying to unreserve for application > appattempt_1495188831758_0121_02 when currently reserved for application > application_1495188831758_0121 on node host: node1:45454 #containers=2 > available=... used=... 
> at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822) > at java.lang.Thread.run(Thread.java:834) > {noformat} > When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt and > CapacityScheduler#tryCommit both need to acquire the write lock before executing, so > we can check the app attempt state in the commit process to avoid committing > outdated proposals. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
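The guard described in the last paragraph above, re-checking the app attempt under the scheduler write lock before a proposal is committed, can be sketched as follows. This is a simplified, hypothetical model; `CommitGuard` and `acceptProposal` are illustrative names, not the actual CapacityScheduler code:

```java
import java.util.Objects;

// Simplified, hypothetical model of the proposed check: before the committer
// applies a queued proposal, it verifies the proposal still belongs to the
// application's current attempt. Since doneApplicationAttempt and tryCommit
// both hold the scheduler write lock, this comparison cannot race with
// attempt removal.
public class CommitGuard {

    /** Returns true only if the proposal was produced by the current attempt. */
    public static boolean acceptProposal(String currentAttemptId,
            String proposalAttemptId) {
        // An outdated proposal (from a failed attempt) names a stale attempt id
        // and is dropped instead of throwing IllegalStateException later.
        return Objects.equals(currentAttemptId, proposalAttemptId);
    }
}
```

With this shape, an outdated proposal is rejected quietly in the commit phase rather than crashing the RM while unreserving.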
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059706#comment-16059706 ] Sunil G commented on YARN-6678: --- Thanks [~Tao Yang]. Nice catch. A few comments: # In {{FiCaSchedulerApp#accept}}, it's better to use {{RMContainer#equals}} instead of using *!=* # In the same method's debug log, please add the nodeId to the log as well. # In the testcase named {{testCommitOutdatedReservedProposal}}, you can call {{rm.stop()}} at the end # In the same test case {code} while (true) { if (sn1.getReservedContainer() != null) { break; } Thread.sleep(100); } {code} I prefer to reduce the timeout and do a bounded number of retries. Though the test case has a timeout at the top level, it's better to make the sleep 10 ms and do 20 or 50 retries. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6678.001.patch, YARN-6678.002.patch, > YARN-6678.003.patch > > > Error log: > {noformat} > java.lang.IllegalStateException: Trying to reserve container > container_e10_1495599791406_7129_01_001453 for application > appattempt_1495599791406_7129_01 when currently reserved container > container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 > #containers=40 available=... used=... 
> at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546) > {noformat} > Reproduce this problem: > 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1 > 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and > allocated app-1/container-X2 > 3. nm1 reserved app-2/container-Y > 4. proposal-1 was accepted but throw IllegalStateException when applying > Currently the check code for reserve proposal in FiCaSchedulerApp#accept as > follows: > {code} > // Container reserved first time will be NEW, after the container > // accepted & confirmed, it will become RESERVED state > if (schedulerContainer.getRmContainer().getState() > == RMContainerState.RESERVED) { > // Set reReservation == true > reReservation = true; > } else { > // When reserve a resource (state == NEW is for new container, > // state == RUNNING is for increase container). > // Just check if the node is not already reserved by someone > if (schedulerContainer.getSchedulerNode().getReservedContainer() > != null) { > if (LOG.isDebugEnabled()) { > LOG.debug("Try to reserve a container, but the node is " > + "already reserved by another container=" > + schedulerContainer.getSchedulerNode() > .getReservedContainer().getContainerId()); > } > return false; > } > } > {code} > The reserved container on the node of reserve proposal will be checked only > for first-reserve container. 
> We should confirm that the reserved container on this node is equal to the re-reserved > container.
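The review comment above about the test's wait loop, replacing an unbounded {{while (true)}} with a 100 ms sleep by a short sleep and a bounded retry count, can be sketched like this. `BoundedWait` is a hypothetical helper for illustration, not part of the actual patch:

```java
// Hypothetical helper illustrating the suggestion: poll a condition with a
// short sleep (e.g. 10 ms) and give up after a fixed number of retries
// (e.g. 20 or 50), instead of looping forever and relying only on the
// test-level timeout.
public class BoundedWait {

    /** The condition being polled, e.g. "node has a reserved container". */
    public interface Condition {
        boolean holds();
    }

    /** Polls up to maxRetries times, sleeping sleepMs between attempts. */
    public static boolean waitFor(Condition cond, long sleepMs, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (cond.holds()) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return cond.holds();
            }
        }
        // One last check after the final sleep.
        return cond.holds();
    }
}
```

In the test this would read roughly as `assertTrue(BoundedWait.waitFor(() -> sn1.getReservedContainer() != null, 10, 50));`, failing fast instead of hanging until the outer timeout.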
[jira] [Commented] (YARN-5876) TestResourceTrackerService#testGracefulDecommissionWithApp fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059668#comment-16059668 ] Robert Kanter commented on YARN-5876: - The test failures are unrelated - they fail without the patch too. > TestResourceTrackerService#testGracefulDecommissionWithApp fails > intermittently on trunk > > > Key: YARN-5876 > URL: https://issues.apache.org/jira/browse/YARN-5876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Robert Kanter > Attachments: YARN-5876.001.patch > > > {noformat} > java.lang.AssertionError: node shouldn't be null > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertNotNull(Assert.java:621) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:750) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionWithApp(TestResourceTrackerService.java:318) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13884/testReport/
[jira] [Commented] (YARN-6716) Native services support for specifying component start order
[ https://issues.apache.org/jira/browse/YARN-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059659#comment-16059659 ] Jian He commented on YARN-6716: --- When we assign priority keys to components, I think we may also consider the dependency order here; the dependent-upon component should have a higher priority so that its containers get a higher chance to be allocated. Basically, we may sort the components in dependency order in the code below? {code} for (Component component : app.getComponents()) { priority = getNewPriority(priority); String name = component.getName(); if (roles.containsKey(name)) { continue; } log.info("Adding component: " + name); createComponent(name, component, priority++); } {code} > Native services support for specifying component start order > > > Key: YARN-6716 > URL: https://issues.apache.org/jira/browse/YARN-6716 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-6716-yarn-native-services.001.patch, > YARN-6716-yarn-native-services.002.patch, > YARN-6716-yarn-native-services.003.patch > > > Some native services apps have components that should be started after other > components. The readiness_check and dependencies features of the native > services API are currently unimplemented, and we could use these to implement > a basic start order feature. When component B has a dependency on component > A, the AM could delay making a container request for component B until > component A's readiness check has passed (for all instances of component A).
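The sorting suggested above amounts to a topological sort of the component dependency graph, so that each dependent-upon component is assigned its priority before the components that depend on it. A self-contained sketch follows; the names are hypothetical, and a real implementation would operate on the native-services Component objects rather than plain strings:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of ordering component names so that every component
// appears after the components it depends on (depth-first topological sort).
// Assigning priorities in this order gives dependent-upon components the
// earlier (higher-preference) priority values.
public class ComponentOrder {

    /**
     * @param deps map from component name to the names it depends on
     * @return component names in dependency-first order
     */
    public static List<String> dependencyOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String name : deps.keySet()) {
            visit(name, deps, visited, new HashSet<>(), order);
        }
        return order;
    }

    private static void visit(String name, Map<String, List<String>> deps,
            Set<String> visited, Set<String> inProgress, List<String> order) {
        if (visited.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            // A dependency cycle means no valid start order exists.
            throw new IllegalStateException("dependency cycle at " + name);
        }
        for (String dep : deps.getOrDefault(name, Collections.emptyList())) {
            visit(dep, deps, visited, inProgress, order);
        }
        inProgress.remove(name);
        visited.add(name);
        order.add(name);
    }
}
```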
[jira] [Updated] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-6708: --- Attachment: YARN-6708.003.patch Attaching a patch with the following modifications: # Testcase addition # findbugs correction # whitespace correction Please review the attached patch. > Nodemanager container crash after ext3 folder limit > --- > > Key: YARN-6708 > URL: https://issues.apache.org/jira/browse/YARN-6708 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Priority: Critical > Attachments: YARN-6708.001.patch, YARN-6708.002.patch, > YARN-6708.003.patch > > > Configure the umask as *027* for the nodemanager service user > and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After > 4 *private* dir localizations the next directory will be *0/14* > Local Directory cache manager > {code} > vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l > total 28 > drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./ > drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../ > drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/ > {code} > *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ has mode *750* only > The nodemanager user will not be able to check whether the localization path exists or not. 
> {{LocalResourcesTrackerImpl}} > {code} > case REQUEST: > if (rsrc != null && (!isResourcePresent(rsrc))) { > LOG.info("Resource " + rsrc.getLocalPath() > + " is missing, localizing it again"); > removeResource(req); > rsrc = null; > } > if (null == rsrc) { > rsrc = new LocalizedResource(req, dispatcher); > localrsrc.put(req, rsrc); > } > break; > {code} > *isResourcePresent* will always return false, and the same resource will be > localized again under {{0}} and each subsequent unique number
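The root cause above, directory modes inherited from a restrictive umask, can be avoided by setting the intended permissions explicitly after creation instead of relying on the process umask. Below is a hypothetical sketch using plain java.nio (these are illustrative names, not the actual NodeManager code): under umask 027 a plain mkdir yields mode 750, which locks out users outside the owning group.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical sketch: create a local-cache directory and force its POSIX
// mode explicitly, so the process umask no longer decides the final mode.
public class LocalCacheDirs {

    /** Creates dir (and parents) and applies the given mode, e.g. "rwxr-xr-x". */
    public static Path createWithMode(Path dir, String mode) {
        try {
            Files.createDirectories(dir);
            // Explicit chmod after creation, independent of the umask.
            Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString(mode));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if "others" can traverse the directory (the check that fails above). */
    public static boolean othersCanTraverse(Path dir) {
        try {
            return Files.getPosixFilePermissions(dir)
                .contains(PosixFilePermission.OTHERS_EXECUTE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: a fresh temp dir with a "0" subdir, mirroring the filecache layout. */
    public static Path demoDir(String mode) {
        try {
            return createWithMode(
                Files.createTempDirectory("nmlocal").resolve("0"), mode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the mode set explicitly, {{isResourcePresent}} can stat paths below the cache directory regardless of the service user's umask.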
[jira] [Assigned] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-6708: -- Assignee: Bibin A Chundatt > Nodemanager container crash after ext3 folder limit > --- > > Key: YARN-6708 > URL: https://issues.apache.org/jira/browse/YARN-6708 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-6708.001.patch, YARN-6708.002.patch, > YARN-6708.003.patch > > > Configure the umask as *027* for the nodemanager service user > and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After > 4 *private* dir localizations the next directory will be *0/14* > Local Directory cache manager > {code} > vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l > total 28 > drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./ > drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../ > drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/ > {code} > *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ has mode *750* only > The nodemanager user will not be able to check whether the localization path exists or not. 
> {{LocalResourcesTrackerImpl}} > {code} > case REQUEST: > if (rsrc != null && (!isResourcePresent(rsrc))) { > LOG.info("Resource " + rsrc.getLocalPath() > + " is missing, localizing it again"); > removeResource(req); > rsrc = null; > } > if (null == rsrc) { > rsrc = new LocalizedResource(req, dispatcher); > localrsrc.put(req, rsrc); > } > break; > {code} > *isResourcePresent* will always return false, and the same resource will be > localized again under {{0}} and each subsequent unique number
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059298#comment-16059298 ] Hadoop QA commented on YARN-6678: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 30s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAsyncScheduling | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6678 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874072/YARN-6678.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux bbaa151c606c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b649519 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16222/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16222/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16222/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 >
[jira] [Updated] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-6678: --- Attachment: YARN-6678.003.patch Updated the patch without adding a new method to CapacityScheduler. Thanks [~leftnoteasy] for your suggestion; it's fine to only change the spy target for the test case. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6678.001.patch, YARN-6678.002.patch, > YARN-6678.003.patch > > > Error log: > {noformat} > java.lang.IllegalStateException: Trying to reserve container > container_e10_1495599791406_7129_01_001453 for application > appattempt_1495599791406_7129_01 when currently reserved container > container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 > #containers=40 available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546) > {noformat} > Reproduce this problem: > 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1 > 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and > allocated app-1/container-X2 > 3. 
nm1 reserved app-2/container-Y > 4. proposal-1 was accepted but threw an IllegalStateException when applying > Currently the check code for the reserve proposal in FiCaSchedulerApp#accept is as > follows: > {code} > // Container reserved first time will be NEW, after the container > // accepted & confirmed, it will become RESERVED state > if (schedulerContainer.getRmContainer().getState() > == RMContainerState.RESERVED) { > // Set reReservation == true > reReservation = true; > } else { > // When reserve a resource (state == NEW is for new container, > // state == RUNNING is for increase container). > // Just check if the node is not already reserved by someone > if (schedulerContainer.getSchedulerNode().getReservedContainer() > != null) { > if (LOG.isDebugEnabled()) { > LOG.debug("Try to reserve a container, but the node is " > + "already reserved by another container=" > + schedulerContainer.getSchedulerNode() > .getReservedContainer().getContainerId()); > } > return false; > } > } > {code} > The reserved container on the node of the reserve proposal is checked only > for first-reserve containers. > We should confirm that the reserved container on this node is equal to the re-reserved > container.
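The missing check described above can be modeled in a simplified form: a re-reserve proposal must name the exact container currently reserved on the node, while a first-time reserve requires a node with no reservation at all. The names below are illustrative stand-ins, not the actual FiCaSchedulerApp code:

```java
// Simplified, hypothetical model of the proposed accept logic for reserve
// proposals, covering both the existing first-reserve check and the missing
// re-reservation check.
public class ReserveCheck {

    /**
     * @param nodeReservedContainerId container currently reserved on the node,
     *        or null if the node holds no reservation
     * @param proposalContainerId container the proposal wants to (re-)reserve
     * @param reReservation true if the container is already in RESERVED state
     */
    public static boolean accept(String nodeReservedContainerId,
            String proposalContainerId, boolean reReservation) {
        if (reReservation) {
            // The node must still hold the very container being re-reserved;
            // otherwise the proposal is outdated (another container, e.g.
            // app-2/container-Y in the reproduction above, took the node).
            return proposalContainerId.equals(nodeReservedContainerId);
        }
        // First-time reserve: the node must not be reserved by anyone.
        return nodeReservedContainerId == null;
    }
}
```

In the reproduction steps above, the stale proposal-1 re-reserving container-X1 would fail this check once the node's reservation had moved to container-Y, so it would be rejected instead of crashing the committer thread.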
[jira] [Updated] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-6714: --- Attachment: YARN-6714.002.patch Updated the patch, moving the test case to TestCapacitySchedulerAsyncScheduling. > RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED > event when async-scheduling enabled in CapacityScheduler > - > > Key: YARN-6714 > URL: https://issues.apache.org/jira/browse/YARN-6714 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6714.001.patch, YARN-6714.002.patch > > > Currently, in the async-scheduling mode of CapacityScheduler, after AM failover > and unreserving all reserved containers, there is still a chance to get and commit > an outdated reserve proposal of the failed app attempt. This problem > happened to an app in our cluster: when this app stopped, it unreserved all > reserved containers and compared their appAttemptId with the current > appAttemptId; if they did not match, it threw an IllegalStateException and made the RM > crash. > Error log: > {noformat} > 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.IllegalStateException: Trying to unreserve for application > appattempt_1495188831758_0121_02 when currently reserved for application > application_1495188831758_0121 on node host: node1:45454 #containers=2 > available=... used=... 
> at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822) > at java.lang.Thread.run(Thread.java:834) > {noformat} > When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt and > CapacityScheduler#tryCommit both need to acquire the write lock before executing, so > we can check the app attempt state in the commit process to avoid committing > outdated proposals.
[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059206#comment-16059206 ] Hadoop QA commented on YARN-5648: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 7s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 1s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 34s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} YARN-5355 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in YARN-5355 has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} YARN-5355 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 25s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 41s{color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization | | | hadoop.yarn.server.TestContainerManagerSecurity | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ac17dc | | JIRA Issue | YARN-5648 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874050/YARN-5648-YARN-5355.04.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 136823d8d1d0 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-5355 / 0763450 | |
[jira] [Updated] (YARN-5076) YARN web interfaces lack XFS protection
[ https://issues.apache.org/jira/browse/YARN-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-5076: Labels: security (was: ) > YARN web interfaces lack XFS protection > --- > > Key: YARN-5076 > URL: https://issues.apache.org/jira/browse/YARN-5076 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, timelineserver >Reporter: Jonathan Maron >Assignee: Jonathan Maron > Labels: security > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-5076.002.patch, YARN-5076.003.patch, > YARN-5076.004.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > There are web interfaces in YARN that do not provide protection against cross > frame scripting > (https://www.owasp.org/index.php/Clickjacking_Defense_Cheat_Sheet). > HADOOP-13008 provides a common filter for addressing this vulnerability, so > this filter should be integrated into the YARN web interfaces. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
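The XFS defense described above boils down to attaching an {{X-Frame-Options}} header to every HTTP response; HADOOP-13008's common filter does this for Hadoop's web apps. As a minimal, self-contained illustration of the mechanism only (the class and method names below are hypothetical, not Hadoop's actual filter API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class XfsHeaderDemo {
    // The clickjacking defense is a single response header; a servlet filter
    // adds it to every response. Valid modes: DENY, SAMEORIGIN, ALLOW-FROM <uri>.
    static Map<String, String> withXfsProtection(Map<String, String> headers, String mode) {
        Map<String, String> out = new LinkedHashMap<>(headers);
        out.put("X-Frame-Options", mode);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> resp = withXfsProtection(new LinkedHashMap<>(), "SAMEORIGIN");
        System.out.println(resp.get("X-Frame-Options")); // prints SAMEORIGIN
    }
}
```

In the real filter the header is written onto the servlet response object rather than a map; the point is that one shared filter, installed across the RM, NM, and timeline server web apps, closes the gap uniformly.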
[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5648: --- Attachment: YARN-5648-YARN-5355.04.patch > [ATSv2 Security] Client side changes for authentication > --- > > Key: YARN-5648 > URL: https://issues.apache.org/jira/browse/YARN-5648 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-5648-YARN-5355.02.patch, > YARN-5648-YARN-5355.03.patch, > YARN-5648-YARN-5355.04.patch, > YARN-5648-YARN-5355.wip.01.patch > >
[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5648: --- Attachment: (was: YARN-5648-YARN-5355.04.patch) > [ATSv2 Security] Client side changes for authentication > --- > > Key: YARN-5648 > URL: https://issues.apache.org/jira/browse/YARN-5648 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-5648-YARN-5355.02.patch, > YARN-5648-YARN-5355.03.patch, > YARN-5648-YARN-5355.04.patch, > YARN-5648-YARN-5355.wip.01.patch > >
[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5648: --- Attachment: YARN-5648-YARN-5355.04.patch > [ATSv2 Security] Client side changes for authentication > --- > > Key: YARN-5648 > URL: https://issues.apache.org/jira/browse/YARN-5648 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-5648-YARN-5355.02.patch, > YARN-5648-YARN-5355.03.patch, > YARN-5648-YARN-5355.04.patch, > YARN-5648-YARN-5355.wip.01.patch > >
[jira] [Commented] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059056#comment-16059056 ] zhengchenyu commented on YARN-6728: --- Our colleague [~maobaolong] suggested moving verifyAndCreateRemoteLogDir and the remote-log mkdir into a daemon thread. This way containers will not be blocked by the defaultFs. > Job will run slow when the performance of defaultFs degrades and the > log-aggregation is enable. > > > Key: YARN-6728 > URL: https://issues.apache.org/jira/browse/YARN-6728 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 2.7.1 > Environment: CentOS 7.1 hadoop-2.7.1 >Reporter: zhengchenyu > Fix For: 2.9.0, 2.7.4 > > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, I found many map keep "NEW" state for several minutes. Here > I got the container log: > {code} > [2017-06-13T18:21:23.068+08:00] [INFO] > containermanager.application.ApplicationImpl.transition(ApplicationImpl.java > 304) [AsyncDispatcher event handler] : Adding > container_1495632926847_2459604_01_11 to application > application_1495632926847_2459604 > [2017-06-13T18:23:08.715+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) > [AsyncDispatcher event handler] : Container > container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING > {code} > Then I search the log from 18:21:23.068 to 18:23:08.715. I found some > dispatch of AsyncDispather run slow, because they visit the defaultFs. Our > cluster increase to 4k node, the pressure of defaultFs increase. (Note: > log-aggregation is enable. ) > Container runs in nodemanager will invoke initApp(), then invoke > verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit > the defaultFs. So the container will be stuck here. Then application will run > slow. 
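The fix proposed above — taking the blocking verifyAndCreateRemoteLogDir call off the event-dispatch path — can be sketched with a single-threaded daemon executor. This is a minimal illustration of the idea, not YARN's actual code: RemoteLogDirInitializer and initAppAsync are hypothetical names, and the sleep stands in for a slow defaultFs round trip.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RemoteLogDirInitializer {
    // A single daemon worker: slow defaultFs calls queue up here instead of
    // stalling the AsyncDispatcher thread that calls initApp().
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "remote-log-dir-init");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for the blocking filesystem call against the defaultFs.
    private void verifyAndCreateRemoteLogDir(String appId) throws InterruptedException {
        Thread.sleep(100); // simulate a slow remote mkdir
    }

    // Returns immediately; the remote mkdir proceeds in the background.
    public Future<?> initAppAsync(String appId) {
        return executor.submit(() -> {
            try {
                verifyAndCreateRemoteLogDir(appId);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        RemoteLogDirInitializer init = new RemoteLogDirInitializer();
        Future<?> f = init.initAppAsync("application_1495632926847_2459604");
        System.out.println("dispatcher free to handle the next event");
        f.get(); // only the demo waits; the dispatcher never would
        System.out.println("remote log dir ready");
    }
}
```

The trade-off is that log aggregation for an app can now start before its remote directory is confirmed, so the aggregation path must tolerate (or wait on) a still-pending mkdir.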
[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated YARN-6728: -- Description: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation is enable. ) Container runs in nodemanager will invoke initApp(), then invoke verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit the defaultFs. So the container will be stuck here. Then application will run slow. was: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. 
Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation is enable. ) Container runs in nodemanager will invoke initApp(), then invoke verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be stuck here. Then application will run slow. > Job will run slow when the performance of defaultFs degrades and the > log-aggregation is enable. > > > Key: YARN-6728 > URL: https://issues.apache.org/jira/browse/YARN-6728 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 2.7.1 > Environment: CentOS 7.1 hadoop-2.7.1 >Reporter: zhengchenyu > Fix For: 2.9.0, 2.7.4 > > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, I found many map keep "NEW" state for several minutes. Here > I got the container log: > {code} > [2017-06-13T18:21:23.068+08:00] [INFO] > containermanager.application.ApplicationImpl.transition(ApplicationImpl.java > 304) [AsyncDispatcher event handler] : Adding > container_1495632926847_2459604_01_11 to application > application_1495632926847_2459604 > [2017-06-13T18:23:08.715+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) > [AsyncDispatcher event handler] : Container > container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING > {code} > Then I search the log from 18:21:23.068 to 18:23:08.715. I found some > dispatch of AsyncDispather run slow, because they visit the defaultFs. Our > cluster increase to 4k node, the pressure of defaultFs increase. (Note: > log-aggregation is enable. ) > Container runs in nodemanager will invoke initApp(), then invoke > verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit > the defaultFs. So the container will be stuck here. Then application will run > slow. 
[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated YARN-6728: -- Description: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation is enable. ) Container runs in nodemanager will invoke initApp(), then invoke verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be stuck here. Then application will run slow. was: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. 
(Note: log-aggregation is enable. ) > Job will run slow when the performance of defaultFs degrades and the > log-aggregation is enable. > > > Key: YARN-6728 > URL: https://issues.apache.org/jira/browse/YARN-6728 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 2.7.1 > Environment: CentOS 7.1 hadoop-2.7.1 >Reporter: zhengchenyu > Fix For: 2.9.0, 2.7.4 > > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, I found many map keep "NEW" state for several minutes. Here > I got the container log: > {code} > [2017-06-13T18:21:23.068+08:00] [INFO] > containermanager.application.ApplicationImpl.transition(ApplicationImpl.java > 304) [AsyncDispatcher event handler] : Adding > container_1495632926847_2459604_01_11 to application > application_1495632926847_2459604 > [2017-06-13T18:23:08.715+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) > [AsyncDispatcher event handler] : Container > container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING > {code} > Then I search the log from 18:21:23.068 to 18:23:08.715. I found some > dispatch of AsyncDispather run slow, because they visit the defaultFs. Our > cluster increase to 4k node, the pressure of defaultFs increase. (Note: > log-aggregation is enable. ) > Container runs in nodemanager will invoke initApp(), then invoke > verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be > stuck here. Then application will run slow. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated YARN-6728: -- Description: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation is enable. ) was: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: we ) > Job will run slow when the performance of defaultFs degrades and the > log-aggregation is enable. 
> > > Key: YARN-6728 > URL: https://issues.apache.org/jira/browse/YARN-6728 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 2.7.1 > Environment: CentOS 7.1 hadoop-2.7.1 >Reporter: zhengchenyu > Fix For: 2.9.0, 2.7.4 > > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, I found many map keep "NEW" state for several minutes. Here > I got the container log: > {code} > [2017-06-13T18:21:23.068+08:00] [INFO] > containermanager.application.ApplicationImpl.transition(ApplicationImpl.java > 304) [AsyncDispatcher event handler] : Adding > container_1495632926847_2459604_01_11 to application > application_1495632926847_2459604 > [2017-06-13T18:23:08.715+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) > [AsyncDispatcher event handler] : Container > container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING > {code} > Then I search the log from 18:21:23.068 to 18:23:08.715. I found some > dispatch of AsyncDispather run slow, because they visit the defaultFs. Our > cluster increase to 4k node, the pressure of defaultFs increase. (Note: > log-aggregation is enable. ) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
[ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated YARN-6728: -- Description: In our cluster, I found many map keep "NEW" state for several minutes. Here I got the container log: {code} [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_11 to application application_1495632926847_2459604 [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING {code} Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch of AsyncDispather run slow, because they visit the defaultFs. Our cluster increase to 4k node, the pressure of defaultFs increase. (Note: we ) was:Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable. > Job will run slow when the performance of defaultFs degrades and the > log-aggregation is enable. > > > Key: YARN-6728 > URL: https://issues.apache.org/jira/browse/YARN-6728 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 2.7.1 > Environment: CentOS 7.1 hadoop-2.7.1 >Reporter: zhengchenyu > Fix For: 2.9.0, 2.7.4 > > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, I found many map keep "NEW" state for several minutes. 
Here > I got the container log: > {code} > [2017-06-13T18:21:23.068+08:00] [INFO] > containermanager.application.ApplicationImpl.transition(ApplicationImpl.java > 304) [AsyncDispatcher event handler] : Adding > container_1495632926847_2459604_01_11 to application > application_1495632926847_2459604 > [2017-06-13T18:23:08.715+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) > [AsyncDispatcher event handler] : Container > container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING > {code} > Then I search the log from 18:21:23.068 to 18:23:08.715. I found some > dispatch of AsyncDispather run slow, because they visit the defaultFs. Our > cluster increase to 4k node, the pressure of defaultFs increase. (Note: we ) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
zhengchenyu created YARN-6728: - Summary: Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable. Key: YARN-6728 URL: https://issues.apache.org/jira/browse/YARN-6728 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, yarn Affects Versions: 2.7.1 Environment: CentOS 7.1 hadoop-2.7.1 Reporter: zhengchenyu Fix For: 2.9.0, 2.7.4 Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer
[ https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058972#comment-16058972 ] Hadoop QA commented on YARN-2919: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 40s{color} | {color:red} hadoop-common-project/hadoop-common in trunk has 17 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 5s{color} | {color:orange} root: The patch generated 9 new + 101 unchanged - 0 fixed = 110 total (was 101) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 7s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 49s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ha.TestActiveStandbyElectorRealZK | | | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-2919 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874013/YARN-2919.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 80a3a2a3ab73 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9ae9467 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16220/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16220/artifact/patchprocess/diff-checkstyle-root.txt | | unit |
[jira] [Commented] (YARN-6727) Improve getQueueUserAcls API to query for specific queue and user
[ https://issues.apache.org/jira/browse/YARN-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058945#comment-16058945 ] Naganarasimha G R commented on YARN-6727: - Thanks [~bibinchundatt] for raising this issue. Agree that it can be simplified on the server side. > Improve getQueueUserAcls API to query for specific queue and user > -- > > Key: YARN-6727 > URL: https://issues.apache.org/jira/browse/YARN-6727 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Currently {{ApplicationClientProtocol#getQueueUserAcls}} return data for all > the queues available in scheduler for user. > User wants to know whether he has rights of a particular queue only. For > systems with 5K queues returning all queues list is not efficient. > Suggested change: support additional parameters *userName and queueName* as > optional. Admin user should be able to query other users ACL for a particular > queueName.
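Until getQueueUserAcls grows the optional userName/queueName parameters suggested above, a client interested in one queue must fetch ACLs for every queue and filter locally — exactly the inefficiency the proposal removes for 5K-queue clusters. A minimal sketch of that client-side filtering (QueueUserACLInfo here is a hypothetical stand-in for the YARN record type, not the real class):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class QueueAclFilter {
    // Hypothetical stand-in for YARN's per-queue ACL record.
    static class QueueUserACLInfo {
        final String queueName;
        final List<String> userAcls;
        QueueUserACLInfo(String queueName, List<String> userAcls) {
            this.queueName = queueName;
            this.userAcls = userAcls;
        }
    }

    // Today the RPC returns the full list; the client filters for one queue.
    // A server-side queueName parameter would make this loop unnecessary.
    static Optional<QueueUserACLInfo> aclsForQueue(List<QueueUserACLInfo> all, String queue) {
        return all.stream().filter(q -> q.queueName.equals(queue)).findFirst();
    }

    public static void main(String[] args) {
        List<QueueUserACLInfo> all = Arrays.asList(
            new QueueUserACLInfo("root.default", Arrays.asList("SUBMIT_APPLICATIONS")),
            new QueueUserACLInfo("root.prod", Arrays.asList("ADMINISTER_QUEUE")));
        System.out.println(aclsForQueue(all, "root.prod").get().userAcls);
        // prints [ADMINISTER_QUEUE]
    }
}
```

With the proposed API the filtering (and the serialization of thousands of unneeded entries) moves to the ResourceManager, and an admin-only path could additionally accept a userName to inspect another user's ACLs.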
[jira] [Updated] (YARN-6727) Improve getQueueUserAcls API to query for specific queue and user
[ https://issues.apache.org/jira/browse/YARN-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-6727: --- Summary: Improve getQueueUserAcls API to query for specific queue and user (was: Improve getQueueUserAcls API to queery for specific queue and user) > Improve getQueueUserAcls API to query for specific queue and user > -- > > Key: YARN-6727 > URL: https://issues.apache.org/jira/browse/YARN-6727 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Currently {{ApplicationClientProtocol#getQueueUserAcls}} return data for all > the queues available in scheduler for user. > User wants to know whether he has rights of a particular queue only. For > systems with 5K queues returning all queues list is not efficient. > Suggested change: support additional parameters *userName and queueName* as > optional. Admin user should be able to query other users ACL for a particular > queueName. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6727) Improve getQueueUserAcls API to queery for specific queue and user
Bibin A Chundatt created YARN-6727: -- Summary: Improve getQueueUserAcls API to queery for specific queue and user Key: YARN-6727 URL: https://issues.apache.org/jira/browse/YARN-6727 Project: Hadoop YARN Issue Type: Improvement Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Currently {{ApplicationClientProtocol#getQueueUserAcls}} return data for all the queues available in scheduler for user. User wants to know whether he has rights of a particular queue only. For systems with 5K queues returning all queues list is not efficient. Suggested change: support additional parameters *userName and queueName* as optional. Admin user should be able to query other users ACL for a particular queueName. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication
[ https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058839#comment-16058839 ] Varun Saxena commented on YARN-5648: Checkstyle failure is related. Will update a patch. > [ATSv2 Security] Client side changes for authentication > --- > > Key: YARN-5648 > URL: https://issues.apache.org/jira/browse/YARN-5648 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-5648-YARN-5355.02.patch, > YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.wip.01.patch > >