-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/
-----------------------------------------------------------

(Updated March 31, 2017, 9:16 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Bugs: AMBARI-20646
    https://issues.apache.org/jira/browse/AMBARI-20646


Repository: ambari


Description
-------

When creating a massive request (for example, a rolling upgrade on a cluster with 1000 nodes), the size of the request seems to slow down the {{ActionScheduler}}. Each command was taking between 1 and 2 minutes to run (even server-side tasks). The cause can be seen in the following two stack traces:

{code:title=ActionSchedulerImpl}
	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
	at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
	at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
	at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
	at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
	at java.lang.Thread.run(Thread.java:745)
{code}

{code:title=Server Action Executor}
	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
	at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
	at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
	at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
	at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
	at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
	at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
	at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
	at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
	at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
	at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
	- locked <0x00007ff0a14083c8> (a java.util.HashMap)
	at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
	at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
	at java.lang.Thread.run(Thread.java:745)
{code}

It's clear from these stacks that every {{PENDING}} stage (roughly 15,000), along with its accompanying tasks, was being loaded into memory every second. This is unnecessary: these methods don't need all stages, just the _next_ stage of each request, since all stages within a single request are synchronous.
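
As a rough illustration of the "next stage only" idea (not the code in this patch), the sketch below picks the lowest in-progress stage ID for each request from an in-memory list of commands. Everything in it is a hypothetical stand-in: {{NextStageSketch}}, {{Command}}, and {{nextStagePerRequest}} are not Ambari classes or methods, and the status strings are used only as plain labels.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class NextStageSketch {

    /** Hypothetical, simplified stand-in for one host role command row. */
    public static final class Command {
        final long requestId;
        final long stageId;
        final String status;

        public Command(long requestId, long stageId, String status) {
            this.requestId = requestId;
            this.stageId = stageId;
            this.status = status;
        }
    }

    /**
     * For every request that has at least one command in the given statuses,
     * returns the smallest stage ID among those commands. Because stages in a
     * request are processed in order, that stage is the only one a scheduler
     * would need to load.
     */
    public static Map<Long, Long> nextStagePerRequest(List<Command> commands,
                                                      Set<String> statuses) {
        Map<Long, Long> nextStage = new TreeMap<>();
        for (Command command : commands) {
            if (statuses.contains(command.status)) {
                // Keep the minimum stage ID seen so far for this request.
                nextStage.merge(command.requestId, command.stageId, Long::min);
            }
        }
        return nextStage;
    }

    public static void main(String[] args) {
        List<Command> commands = Arrays.asList(
                new Command(1L, 1L, "COMPLETED"),
                new Command(1L, 2L, "QUEUED"),
                new Command(1L, 3L, "PENDING"),
                new Command(2L, 5L, "PENDING"));
        Set<String> inProgress = new HashSet<>(Arrays.asList("PENDING", "QUEUED"));

        // One entry per request instead of one entry per pending stage.
        System.out.println(nextStagePerRequest(commands, inProgress));
    }
}
```

With roughly 15,000 pending stages spread over a handful of requests, a lookup of this shape would return a handful of entries rather than 15,000, which is the effect the query change is aiming for.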

The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} call so it doesn't return every stage:

{code}
SELECT stage.requestid, MIN(stage.stageid)
FROM stageentity stage, hostrolecommandentity hrc
WHERE hrc.status IN :statuses
  AND hrc.stageid = stage.stageid
  AND hrc.requestid = stage.requestid
GROUP BY stage.requestid
{code}


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f
  ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 81eef3b
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 2b5d2f3
  ambari-server/src/test/java/org/apache/ambari/server/orm/dao/RequestDAOTest.java 9b62671
  ambari-server/src/test/java/org/apache/ambari/server/serveraction/ServerActionExecutorTest.java 44d5b63
  ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java e2ce6e7


Diff: https://reviews.apache.org/r/58109/diff/2/


Testing
-------

Tests run: 4976, Failures: 0, Errors: 0, Skipped: 39

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:49 min
[INFO] Finished at: 2017-03-31T12:58:22-04:00
[INFO] Final Memory: 59M/664M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley