Re: Member termination took 30 minutes

Sajith Kariyawasam Wed, 11 Feb 2015 23:31:00 -0800

Thanks for the explanation, Raj! It's clearer now.

On Thu, Feb 12, 2015 at 6:56 AM, Rajkumar Rajaratnam <[email protected]>
wrote:


> Hi Sajith,
>
> Please find my comments inline.
>
> On Thu, Feb 12, 2015 at 1:06 AM, Sajith Kariyawasam <[email protected]>
> wrote:
>
>> Hi Devs,
>>
>> While testing group scaling, I noticed that when scaling down, it takes 30
>> minutes from the moment the scaling rule decides to terminate an instance
>> until the instance is actually terminated.
>>
>> An active member, which was selected by the rule, first moves to a
>> "termination pending member map", and after a certain period
>> (terminationPendingMemberExpiryTime) that member
>> moves to an "obsolete member map". Then by the obsolete check rule, that
>> member will be terminated via cloud controller.
>>
>> It seems this is because of the terminationPendingMemberExpiryTime
>> property, whose default value is 30 minutes, so the member takes that long
>> to get terminated.
>>
>> Sorry for asking, I might have missed some past discussions regarding
>> this, could someone explain the purpose of moving the member to an
>> intermediary map "termination pending member map", rather than moving
>> directly to "obsolete member map"?
>>
>
> The reason is to avoid event loss and to allow graceful termination. Let me
> explain the logic.
>
>    - When scaling down, AS will move the member from the "active member
>    list" to the "termination pending member map".
>    - There is a Drools rule, "Cleanup Instances which are pending
>    termination", which runs periodically, takes all the members that are in
>    the "termination pending member map" and publishes the instance clean up
>    event for them.
>    - When CA receives the instance clean up event, it will publish an
>    instance ready to shutdown event.
>    - When CC receives the instance ready to shutdown event, it will publish
>    a member ready to shutdown event.
>    - When AS receives the member ready to shutdown event, it will move the
>    member from the "termination pending member map" to the "obsolete member
>    map".
>    - Hence, until AS receives the member ready to shutdown event, it will
>    keep publishing the instance clean up event at every cluster monitor
>    interval (each time the Drools rule runs).
>    - If AS does not receive the member ready to shutdown event for a member
>    in the "termination pending member map" within 30 min (the upper limit),
>    the member will be moved to the obsolete list without waiting for the
>    member ready to shutdown event.
>
> The reason for this complete cycle is graceful termination. If we put the
> member directly into the "obsolete member map", it will not be terminated
> gracefully.
>
> The reason why we move the member from the "active member list" to the
> "termination pending member map" is to avoid event loss. We have had
> situations where an event was lost in the above cycle. These events are
> published only once, so if we lose one event in this cycle, that member
> will never be terminated. That is why we put the member in the map. At
> every cluster monitor interval, we take all the members in the
> "termination pending member map" and send the instance clean up event
> again. This overcomes event loss.
>
> 30 min is the upper limit, i.e. the maximum time a member can reside in
> the "termination pending member map". You have hit the edge case where AS
> didn't receive the member ready to shutdown event, so AS took 30 min to
> move the member to the obsolete list.
>
>>
>> Also, is the terminationPendingMemberExpiryTime parameter configurable?
>> (It seems not.) And is there any reason for it to be set to 30 minutes?
>>
>
> This is not configurable yet. But other member list/map expiry times are
> configurable AFAIR.
>
>>
>> Further, we should make the sleep times of PendingMemberWatcher,
>> ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
>> WDYT?
>>
>
> Yes, we have to.
>
>
>>
>> We need to document those configurable parameters as well, @Mari please
>> note.
>>
>>
>> Thanks,
>> Sajith
>>
>>
>>
>
>
> --
> Rajkumar Rajaratnam
> Committer & PMC Member, Apache Stratos
> Software Engineer, WSO2
>
> Mobile : +94777568639
> Blog : rajkumarr.com
>
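
For reference, the termination cycle Raj describes above can be sketched
roughly as below. This is only a toy Python model under stated assumptions,
not the actual Stratos code: the class name, method names and the "published"
list are all hypothetical, and the CA/CC round trip is collapsed into a single
on_member_ready_to_shutdown() call. It does show why a lost event delays
termination by the full expiry time, and how the expiry could be a parameter
rather than a fixed 30 minutes.

```python
import time


class TerminationTracker:
    """Toy model of the autoscaler's member maps (all names are illustrative)."""

    def __init__(self, expiry_seconds=30 * 60, clock=time.monotonic):
        # Hypothetical stand-in for terminationPendingMemberExpiryTime.
        self.expiry_seconds = expiry_seconds
        self.clock = clock
        self.active = set()
        self.termination_pending = {}  # member id -> time it entered the map
        self.obsolete = set()
        self.published = []  # record of clean up events that were "sent"

    def scale_down(self, member_id):
        # The scaling rule picked this member: active -> termination pending.
        self.active.discard(member_id)
        self.termination_pending[member_id] = self.clock()

    def monitor_tick(self):
        # Runs once per cluster monitor interval.
        now = self.clock()
        for member_id, since in list(self.termination_pending.items()):
            if now - since >= self.expiry_seconds:
                # Upper limit reached: stop waiting for the shutdown event.
                del self.termination_pending[member_id]
                self.obsolete.add(member_id)
            else:
                # Re-publish so that a single lost event cannot block the cycle.
                self.published.append(("instance_cleanup", member_id))

    def on_member_ready_to_shutdown(self, member_id):
        # Graceful path: the member confirmed it is ready to be terminated.
        if self.termination_pending.pop(member_id, None) is not None:
            self.obsolete.add(member_id)


# Usage with a fake clock so that no real waiting is needed.
fake_now = [0.0]
tracker = TerminationTracker(expiry_seconds=1800, clock=lambda: fake_now[0])
tracker.active.update({"m1", "m2"})
tracker.scale_down("m1")
tracker.scale_down("m2")
tracker.monitor_tick()                      # clean up event sent for m1 and m2
tracker.on_member_ready_to_shutdown("m1")   # m1 takes the graceful path
fake_now[0] = 1800.0                        # m2's shutdown event never arrives
tracker.monitor_tick()                      # m2 expires after the upper limit
```

In this model m1 reaches the obsolete set as soon as its ready to shutdown
event arrives, while m2 only gets there once the expiry time has passed, which
matches the 30 minute delay seen in the test. Passing expiry_seconds in the
constructor is just one way the value could be made configurable.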
