On Dec. 15, 2016, 4:49 p.m., Jaimin Jetly wrote: > > One thing I don't quite see here (and it could be due to the size of the > > patch) is what happens in these two cases: > > - Something goes wrong when trying to store a task's status. How does the > > system recover and mark it completed? > > - What about waiting until a request is HOLDING and then restarting Ambari > > - will the relevent maps get re-populated? > > Jaimin Jetly wrote: > >> - Something goes wrong when trying to store a task's status. How does > the system recover and mark it completed? > > This work only adds logic to add/update stage and request status. The way > task status is being updated or the logic for system to recover from anything > that goes wrong when storing task status is not changed. > This work ensures that task status update, respective stage status update > and respective request status update happens inside same transactional > boundary. Thus all three entities remains consistent in the status they show. > This work does not add any recovery logic and piggybacks on existing > failure recovery mechanism for updating task status. Thus if something goes > wrong storing task status then stage/request status will also be not store > and vice-versa. Next time when ambari-agent sends command reports again then > task update and respective stage/request status update should also also get > updated successfully. > > >> - What about waiting until a request is HOLDING and then restarting > Ambari - will the relevent maps get re-populated? > > Yes, everytime ambari-server starts, we check for all stages in > HostRoleStatus.IN_PROGRESS_STATUSES and publish an event with the tasks. > these will repopulate the maps. The patch adds that logic with > "publishInProgressTasks(stages)" method in ActionScheduler.java > > I have tested that scenario of restarting ambari-server when a request is > ongoing with in progress tasks and validating that stages and requests status > is correctly updated
The agent re-sending command reports are we generally fault tolerant in that area? Sicne you are probably mostly up-to-date on that part of the code can you shed some light on loss of task status scenario. Do we recover the correct status? - Sid ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53686/#review159315 ----------------------------------------------------------- On Jan. 18, 2017, 10:33 p.m., Jaimin Jetly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53686/ > ----------------------------------------------------------- > > (Updated Jan. 18, 2017, 10:33 p.m.) > > > Review request for Ambari, Jonathan Hurley, Nate Cole, Sumit Mohanty, and Sid > Wagle. > > > Bugs: AMBARI-18868 > https://issues.apache.org/jira/browse/AMBARI-18868 > > > Repository: ambari > > > Description > ------- > > Stage and Request status should be persisted in the database. > > upgrading to ambari-3.0.0 should add status for all present stages and > request for the cluster. > > > Diffs > ----- > > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java > 7837a7b > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java > dabcb98 > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleStatus.java > 3656bfe > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Request.java > 31e11c1 > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java > 4a05b32 > > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java > 3c415df > > ambari-server/src/main/java/org/apache/ambari/server/events/TaskCreateEvent.java > PRE-CREATION > ambari-server/src/main/java/org/apache/ambari/server/events/TaskEvent.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/TaskUpdateEvent.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListener.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/publishers/TaskEventPublisher.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java > 02c4091 > > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/RequestDAO.java > 1c4d0a3 > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java > d2f899f > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity.java > 74271b9 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/RequestEntity.java > 7944d21 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java > f9c8810 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntityPK.java > 9ca0470 > > ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog300.java > 4f90ef3 > ambari-server/src/main/resources/Ambari-DDL-Derby-CREATE.sql b79c945 > ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1c502bc > ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql c6d4ad0 > ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 1be87bb > ambari-server/src/main/resources/Ambari-DDL-SQLAnywhere-CREATE.sql abe48e8 > ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 169a464 > > ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java > 1ca777d > > ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java > 6cc511e > > ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java > a702e6f > > ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeSummaryResourceProviderTest.java > e398a54 > > ambari-server/src/test/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListenerTest.java > PRE-CREATION > > ambari-server/src/test/java/org/apache/ambari/server/orm/dao/UpgradeDAOTest.java > ae85241 > > ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java > 7dd9932 > > ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog300Test.java > d7979e8 > > Diff: https://reviews.apache.org/r/53686/diff/ > > > Testing > ------- > > Verified manually on a cluster by making api requests and upgrading ambari. > Add unit tests. > Verified that the patch does not break any existing unit tests on dev box. > Jenkins job overall unit test result pending.. > Verified on 1000 node cluster that the patch does not regress big operations. > Executed Stop Services and Start Services API call which gernerated around > 9000 tasks and compared request completion time for these operations. There > was a minor performance gain with the patch. As part of > https://issues.apache.org/jira/browse/AMBARI-18889, I will look if we can use > the request status and stage status to further enhance performance. > > > Thanks, > > Jaimin Jetly > >