Hi,

I made terminationPendingMemberExpiryTime configurable via autoscaler.xml, like the other expiry timeouts.
Thanks.

On Thu, Feb 12, 2015 at 12:58 PM, Sajith Kariyawasam <[email protected]> wrote:

> Thanks for the explanation, Raj! It's clearer now.
>
> On Thu, Feb 12, 2015 at 6:56 AM, Rajkumar Rajaratnam <[email protected]>
> wrote:
>
>> Hi Sajith,
>>
>> Please find my comments inline.
>>
>> On Thu, Feb 12, 2015 at 1:06 AM, Sajith Kariyawasam <[email protected]>
>> wrote:
>>
>>> Hi Devs,
>>>
>>> While testing group scaling, I noticed that when scaling down it takes 30
>>> minutes from the moment the scaling rule decides to terminate an instance.
>>>
>>> An active member, which was selected by the rule, first moves to a
>>> "termination pending member map", and after a certain period
>>> (terminationPendingMemberExpiryTime) that member
>>> moves to an "obsolete member map". Then, by the obsolete check rule, that
>>> member will be terminated via the cloud controller.
>>>
>>> It seems that because of the property terminationPendingMemberExpiryTime,
>>> whose default value is 30 minutes, it takes that amount of time for the
>>> member to get terminated.
>>>
>>> Sorry for asking, I might have missed some past discussions regarding
>>> this. Could someone explain the purpose of moving the member to an
>>> intermediary map, the "termination pending member map", rather than moving
>>> it directly to the "obsolete member map"?
>>>
>>
>> The reason is to avoid event loss and to allow graceful termination. Let me
>> explain the logic.
>>
>>    - When scaling down, AS will move the member from the "active member
>>    list" to the "termination pending member map".
>>    - There is a Drools rule, "Cleanup Instances which are pending
>>    termination", which runs periodically, takes all the members which are
>>    in the "termination pending member map" and publishes the instance
>>    clean up event.
>>    - When CA receives the instance clean up event, it will publish instance
>>    ready to shutdown.
>>    - When CC receives the instance ready to shutdown event, it will publish
>>    member ready to shutdown.
>>    - When AS receives the member ready to shutdown event, it will move the
>>    member from the "termination pending member map" to the "obsolete
>>    member map".
>>    - Hence, until AS receives the member ready to shutdown event, it will
>>    keep publishing the instance clean up event in every cluster monitor
>>    interval (each time the Drools rule runs).
>>    - If AS does not receive the member ready to shutdown event for a member
>>    in the "termination pending member map" within 30 min (upper limit),
>>    this member will be moved to the obsolete list without waiting for the
>>    member ready to shutdown event.
>>
>> The reason for this complete cycle is graceful termination. If we put the
>> member directly into the "obsolete member map", it will not be terminated
>> gracefully.
>>
>> The reason why we are moving the member from the "active member list" to
>> the "termination pending member map" is to avoid event loss. We have had
>> situations where some event was lost in the above cycle. These events are
>> published only once. If we lose one event in this cycle, that member will
>> never be terminated. That is why we are putting the member in the
>> map. In every cluster monitor interval, we take all the members in
>> the "termination pending member map" and send the instance clean up event.
>> This overcomes event loss.
>>
>> 30 min is the upper limit, the maximum time a member can reside in the
>> "termination pending member map". You have hit the edge case, where
>> AS didn't receive the member ready to shutdown event. So AS took 30 min to
>> move the member to the obsolete list.
>>
>>>
>>> Also, is the terminationPendingMemberExpiryTime parameter configurable?
>>> (It seems not.) And is there any reason for it to be set to 30 minutes?
>>>
>>
>> This is not configurable yet. But other member list/map expiry times are
>> configurable AFAIR.
>>
>>>
>>> Further, we should make the sleep times of PendingMemberWatcher,
>>> ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
>>> WDYT?
>>>
>>
>> Yes, we have to.
>>
>>>
>>> We need to document those configurable parameters as well. @Mari, please
>>> note.
>>>
>>> Thanks,
>>> Sajith
>>>
>>
>> --
>> Rajkumar Rajaratnam
>> Committer & PMC Member, Apache Stratos
>> Software Engineer, WSO2
>>
>> Mobile : +94777568639
>> Blog : rajkumarr.com
>>
>

--
Rajkumar Rajaratnam
Committer & PMC Member, Apache Stratos
Software Engineer, WSO2

Mobile : +94777568639
Blog : rajkumarr.com
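
P.S. For anyone skimming the thread, here is a rough Java sketch of the termination pending cycle described in the quoted explanation. It is only an illustration and not the actual autoscaler code: the class TerminationPendingTracker and all of its method names are made up for this mail, and time is passed in explicitly so the expiry behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are NOT the Stratos autoscaler classes.
final class TerminationPendingTracker {
    // Upper limit a member may stay in the termination pending map (assumed to be configurable).
    private final long expiryMillis;
    // memberId -> time (ms) at which the member entered the termination pending map
    private final Map<String, Long> terminationPending = new LinkedHashMap<>();
    private final Set<String> obsolete = new LinkedHashSet<>();

    TerminationPendingTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Scale-down decision: the member leaves the active list and waits here.
    void markTerminationPending(String memberId, long nowMillis) {
        terminationPending.put(memberId, nowMillis);
    }

    // "Member ready to shutdown" arrived: graceful path to the obsolete map.
    void onMemberReadyToShutdown(String memberId) {
        if (terminationPending.remove(memberId) != null) {
            obsolete.add(memberId);
        }
    }

    // One cluster monitor interval. Returns the members for which the
    // "instance clean up" event should be published again; members that
    // exceeded the expiry time are moved to the obsolete map instead.
    List<String> runMonitorCycle(long nowMillis) {
        List<String> toNotify = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = terminationPending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            String memberId = entry.getKey();
            long pendingSince = entry.getValue();
            if (nowMillis - pendingSince >= expiryMillis) {
                it.remove();
                obsolete.add(memberId);
            } else {
                toNotify.add(memberId);
            }
        }
        return toNotify;
    }

    boolean isTerminationPending(String memberId) {
        return terminationPending.containsKey(memberId);
    }

    boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }
}
```

With a 30 minute expiry, a member whose ready to shutdown event never arrives is re-notified on every cycle and only becomes obsolete once 30 minutes have passed, which matches the delay Sajith observed. A smaller expiry value shortens that worst case, at the cost of giving the member less time to shut down gracefully.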
