On Dec. 15, 2016, 4:49 p.m., Jaimin Jetly wrote:
> > One thing I don't quite see here (and it could be due to the size of the 
> > patch) is what happens in these two cases:
> > - Something goes wrong when trying to store a task's status. How does the 
> > system recover and mark it completed?
> > - What about waiting until a request is HOLDING and then restarting Ambari 
> > - will the relevant maps get re-populated?
> 
> Jaimin Jetly wrote:
>     >> - Something goes wrong when trying to store a task's status. How does 
> the system recover and mark it completed?
>     
>     This work only adds logic to add/update stage and request status. It does 
> not change how task status is updated, or how the system recovers when 
> storing a task status fails.
>     This work ensures that the task status update and the corresponding stage 
> and request status updates happen inside the same transactional boundary, so 
> all three entities remain consistent in the status they show. 
>     This work does not add any recovery logic; it piggybacks on the existing 
> failure recovery mechanism for updating task status. If something goes wrong 
> while storing the task status, the stage/request status will not be stored 
> either, and vice versa. The next time ambari-agent sends its command reports, 
> the task update and the corresponding stage/request status update should both 
> succeed. 
>     
>     >> -  What about waiting until a request is HOLDING and then restarting 
> Ambari - will the relevant maps get re-populated?
>     
>     Yes, every time ambari-server starts, we look up all stages whose status 
> is in HostRoleStatus.IN_PROGRESS_STATUSES and publish an event with their 
> tasks. This event repopulates the maps. The patch adds that logic with the 
> "publishInProgressTasks(stages)" method in ActionScheduler.java.
>     
>     I have tested the scenario of restarting ambari-server while a request is 
> ongoing with in-progress tasks, and verified that the stage and request 
> statuses are updated correctly.
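
To make the all-or-nothing behaviour described in the quoted reply concrete, here is a minimal Java sketch. It is not the patch's code: StatusStore, reportTask and failNextWrite are hypothetical names, plain maps stand in for the database, and each stage and request is assumed to hold a single task so that its status simply mirrors the task's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the database; not Ambari's real classes.
class StatusStore {
    Map<Long, String> taskStatus = new HashMap<>();
    Map<Long, String> stageStatus = new HashMap<>();
    Map<Long, String> requestStatus = new HashMap<>();

    // Hook used only to simulate a storage failure on the next write.
    boolean failNextWrite = false;

    // Applies the task, stage and request updates as one unit:
    // either all three maps change or none of them does.
    void reportTask(long taskId, long stageId, long requestId, String status) {
        // Work on copies so a failure leaves the visible state untouched.
        Map<Long, String> tasks = new HashMap<>(taskStatus);
        Map<Long, String> stages = new HashMap<>(stageStatus);
        Map<Long, String> requests = new HashMap<>(requestStatus);

        tasks.put(taskId, status);
        // Simplifying assumption: one task per stage and per request,
        // so the stage and request status mirror the task status.
        stages.put(stageId, status);
        requests.put(requestId, status);

        if (failNextWrite) {
            failNextWrite = false;
            // "Rollback": the copies are discarded, nothing was published.
            throw new IllegalStateException("simulated storage failure");
        }

        // "Commit": publish all three updates together.
        taskStatus = tasks;
        stageStatus = stages;
        requestStatus = requests;
    }
}
```

If the first report for a task fails, none of the three maps changes; when the same report is sent again it is applied in full, which is the retry path the quoted reply relies on.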

Regarding the agent re-sending command reports: are we generally fault 
tolerant in that area? Since you are probably mostly up to date on that part of 
the code, can you shed some light on the scenario where a task status is lost? 
Do we recover the correct status?


- Sid


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53686/#review159315
-----------------------------------------------------------


On Jan. 18, 2017, 10:33 p.m., Jaimin Jetly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53686/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2017, 10:33 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley, Nate Cole, Sumit Mohanty, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-18868
>     https://issues.apache.org/jira/browse/AMBARI-18868
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Stage and Request status should be persisted in the database.
> 
> Upgrading to ambari-3.0.0 should add a status for all existing stages and 
> requests in the cluster.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java 7837a7b
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java dabcb98
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleStatus.java 3656bfe
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Request.java 31e11c1
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java 4a05b32
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java 3c415df
>   ambari-server/src/main/java/org/apache/ambari/server/events/TaskCreateEvent.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/events/TaskEvent.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/events/TaskUpdateEvent.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListener.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/events/publishers/TaskEventPublisher.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java 02c4091
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/RequestDAO.java 1c4d0a3
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java d2f899f
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity.java 74271b9
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/RequestEntity.java 7944d21
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f9c8810
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntityPK.java 9ca0470
>   ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog300.java 4f90ef3
>   ambari-server/src/main/resources/Ambari-DDL-Derby-CREATE.sql b79c945
>   ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1c502bc
>   ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql c6d4ad0
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 1be87bb
>   ambari-server/src/main/resources/Ambari-DDL-SQLAnywhere-CREATE.sql abe48e8
>   ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 169a464
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 1ca777d
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 6cc511e
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java a702e6f
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeSummaryResourceProviderTest.java e398a54
>   ambari-server/src/test/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListenerTest.java PRE-CREATION
>   ambari-server/src/test/java/org/apache/ambari/server/orm/dao/UpgradeDAOTest.java ae85241
>   ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java 7dd9932
>   ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog300Test.java d7979e8
> 
> Diff: https://reviews.apache.org/r/53686/diff/
> 
> 
> Testing
> -------
> 
> Verified manually on a cluster by making API requests and upgrading Ambari.
> Added unit tests.
> Verified that the patch does not break any existing unit tests on dev box. 
> Jenkins job overall unit test result pending.
> Verified on a 1000-node cluster that the patch does not regress big 
> operations. Executed Stop Services and Start Services API calls, which 
> generated around 9000 tasks, and compared request completion time for these 
> operations. There was a minor performance gain with the patch. As part of 
> https://issues.apache.org/jira/browse/AMBARI-18889, I will look at whether we 
> can use the request status and stage status to further enhance performance.
> 
> 
> Thanks,
> 
> Jaimin Jetly
> 
>
