Thanks for the explanation, Raj! It's clearer now.

On Thu, Feb 12, 2015 at 6:56 AM, Rajkumar Rajaratnam <rajkum...@wso2.com> wrote:
> Hi Sajith,
>
> Please find my comments inline.
>
> On Thu, Feb 12, 2015 at 1:06 AM, Sajith Kariyawasam <saj...@wso2.com>
> wrote:
>
>> Hi Devs,
>>
>> While testing group scaling, I noticed that scaling down takes 30
>> minutes from the moment the scaling rule decides to terminate an instance.
>>
>> An active member selected by the rule first moves to a
>> "termination pending member map", and after a certain period
>> (terminationPendingMemberExpiryTime) that member
>> moves to an "obsolete member map". Then, by the obsolete check rule, that
>> member is terminated via the cloud controller.
>>
>> It seems that because of the property terminationPendingMemberExpiryTime,
>> whose default value is 30 minutes, the member takes that long to get
>> terminated.
>>
>> Sorry for asking, I might have missed some past discussions regarding
>> this. Could someone explain the purpose of moving the member to an
>> intermediary "termination pending member map", rather than moving it
>> directly to the "obsolete member map"?
>>
>
> The reason is to avoid event loss and to allow graceful termination. Let me
> explain the logic.
>
> - When scaling down, AS moves the member from the "active member list"
>   to the "termination pending member map".
> - A Drools rule, "Cleanup Instances which are pending termination", runs
>   periodically, takes all the members in the "termination pending member
>   map", and publishes an instance clean up event for each of them.
> - When CA receives the instance clean up event, it publishes instance
>   ready to shutdown.
> - When CC receives the instance ready to shutdown event, it publishes
>   member ready to shutdown.
> - When AS receives the member ready to shutdown event, it moves the
>   member from the "termination pending member map" to the "obsolete member
>   map".
> - Hence, until AS receives the member ready to shutdown event, it keeps
>   publishing the instance clean up event in every cluster monitor interval
>   (each time the Drools rule runs).
> - If AS does not receive the member ready to shutdown event for a member
>   in the "termination pending member map" within 30 min (the upper limit),
>   the member is moved to the obsolete list without waiting for the member
>   ready to shutdown event.
>
> The reason for this complete cycle is graceful termination. If we put the
> member directly into the "obsolete member map", it will not be terminated
> gracefully.
>
> The reason we move the member from the "active member list" to the
> "termination pending member map" is to avoid event loss. We have had
> situations where an event was lost in the above cycle. These events are
> published only once, so if we lose one event in the cycle, that member will
> never be terminated. That is why we put the member in the map. In every
> cluster monitor interval, we take all the members in the "termination
> pending member map" and send the instance clean up event again. This
> overcomes event loss.
>
> 30 min is the upper limit, the maximum time a member can reside in the
> "termination pending member map". You have hit the edge case where AS
> didn't receive the member ready to shutdown event, so AS took 30 min to
> move the member to the obsolete list.
>
>>
>> Also, is the terminationPendingMemberExpiryTime parameter configurable?
>> (It seems not.) And is there any reason for it to be set to 30 minutes?
>>
>
> This is not configurable yet. But the other member list/map expiry times
> are configurable, AFAIR.
>
>>
>> Further, we should make the sleep times of PendingMemberWatcher,
>> ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
>> WDYT?
>>
>
> Yes, we have to.
>
>>
>> We need to document those configurable parameters as well. @Mari, please
>> note.
>>
>>
>> Thanks,
>> Sajith
>>
>
> --
> Rajkumar Rajaratnam
> Committer & PMC Member, Apache Stratos
> Software Engineer, WSO2
>
> Mobile : +94777568639
> Blog : rajkumarr.com
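
For anyone reading this thread later, here is a rough sketch of the cycle Raj describes in the quoted reply. It is a toy Python model, not Stratos code. ClusterMembers, scale_down, monitor_tick, on_member_ready_to_shutdown and publish_cleanup are hypothetical names made up for illustration. Only the three member collections and the 30 minute upper limit come from the thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# 30-minute upper limit mentioned in the thread (terminationPendingMemberExpiryTime).
DEFAULT_EXPIRY_SECONDS = 30 * 60


@dataclass
class ClusterMembers:
    """Hypothetical stand-in for the autoscaler's per-cluster member collections."""

    # Assumed hook that publishes the instance clean up event for one member.
    publish_cleanup: Callable[[str], None]
    expiry_seconds: float = DEFAULT_EXPIRY_SECONDS
    active: Set[str] = field(default_factory=set)
    # member id -> time at which it entered the termination pending map
    termination_pending: Dict[str, float] = field(default_factory=dict)
    obsolete: Set[str] = field(default_factory=set)

    def scale_down(self, member_id: str, now: float) -> None:
        """Scaling rule picked this member: active -> termination pending."""
        self.active.discard(member_id)
        self.termination_pending[member_id] = now

    def on_member_ready_to_shutdown(self, member_id: str) -> None:
        """Graceful path: termination pending -> obsolete."""
        if self.termination_pending.pop(member_id, None) is not None:
            self.obsolete.add(member_id)

    def monitor_tick(self, now: float) -> None:
        """One cluster monitor interval: re-publish clean up, or expire."""
        for member_id, since in list(self.termination_pending.items()):
            if now - since >= self.expiry_seconds:
                # Upper limit reached: stop waiting for the shutdown event.
                del self.termination_pending[member_id]
                self.obsolete.add(member_id)
            else:
                # Re-sending on every tick is what tolerates a lost event.
                self.publish_cleanup(member_id)


sent = []  # records every clean up event that would have been published
members = ClusterMembers(publish_cleanup=sent.append)
members.active.update({"m1", "m2"})
members.scale_down("m1", now=0)
members.scale_down("m2", now=0)
members.monitor_tick(now=90)               # clean up sent for m1 and m2
members.on_member_ready_to_shutdown("m1")  # m1 answers, m2's event is lost
members.monitor_tick(now=180)              # clean up re-sent for m2 only
members.monitor_tick(now=1800)             # m2 hits the 30 min limit
```

In this model m1 takes the graceful path and m2 is the edge case Sajith ran into: it only reaches the obsolete set once the upper limit expires. Passing expiry_seconds as a constructor argument is just one way the limit could be made configurable.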