[tickets] [opensaf:tickets] #3029 amfd: improve failover behaviour if consensus service is enabled

Gary Lee via Opensaf-tickets Tue, 09 Jul 2019 21:53:41 -0700

commit 520607e2d6882a11fdcba269cae771e1cdb8834b
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed Jul 10 14:49:32 2019 +1000


    osaf: make wait time configurable [#3029]
    
    If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is enabled,
    make the time that we wait for MDS node events configurable.

commit 16dd227ea678355a9e911af387f3106352cf6038
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed Jul 10 14:49:32 2019 +1000

    amfd: improve controller failover behavior [#3029]
    
    If consensus service is enabled, only perform node failover
    after peer controller has self-fenced
    (after 2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds).
    
    This also means if node failover delay is set to a large value,
    we do not unnecessarily wait too long before failing over assignments
    previously assigned to the peer controller.
    
    Remove unused fmd_conf_file variable.
    
    Change some LOG_ER calls to LOG_WA.

commit 1b110e9537729324e025ca411006d29f1552e0a9
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed Jul 10 14:49:32 2019 +1000

    fmd: add active promotion supervision timer [#3029]
    
    Add supervision timer so controller will reboot if it cannot obtain
    consensus lock within the allocation period
    (2 * FMS_TAKEOVER_REQUEST_VALID_TIME).
    
    The peer controller can then safely perform a node failover
    after this period of time.

commit aeb58a5b76dfb7a1f1608e62947d6b74d9112bd7
Author: Gary Lee <gary....@dektech.com.au>
Date:   Wed Jul 10 14:49:32 2019 +1000

    osaf: add function to return takeover request expiry time [#3029]



---

**[tickets:#3029] amfd: improve failover behaviour if consensus service is enabled**

**Status:** review
**Milestone:** 5.19.07
**Created:** Wed Apr 03, 2019 02:04 AM UTC by Gary Lee
**Last Updated:** Wed Jul 03, 2019 06:28 AM UTC
**Owner:** Gary Lee


Imagine a cluster consisting only of SC-1 and SC-2. A 2N app is active on SC-2 
/ standby on SC-1. SC-1 is the active SC.

Currently, if the node failover delay is set to 0 and SC-2 is isolated, SC-1 
will fail over the 2N app to itself immediately.

If split brain prevention is enabled, amfd should wait at least the takeover 
timeout before failing over the 2N app.

Conversely, if split brain prevention is enabled and the node failover delay is 
set to a large value, amfd does not have to wait the full delay for a peer SC. 
It can fail over after the takeover timeout.
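
The rule described above can be sketched as a small decision function. This is 
only an illustration, not the amfd implementation: the function and parameter 
names are hypothetical, and it assumes the "takeover timeout" is the 
2 * FMS_TAKEOVER_REQUEST_VALID_TIME period mentioned in the commit messages.

```python
def effective_failover_delay(consensus_enabled: bool,
                             lost_node_is_peer_controller: bool,
                             node_failover_delay_s: int,
                             takeover_request_valid_time_s: int) -> int:
    """Return how many seconds to wait before failing over a lost node.

    Hypothetical sketch: names are illustrative and do not come from amfd.
    """
    if consensus_enabled and lost_node_is_peer_controller:
        # Assumption: the peer controller has self-fenced after
        # 2 * FMS_TAKEOVER_REQUEST_VALID_TIME, so that period replaces
        # the configured node failover delay, whether it is smaller or larger.
        return 2 * takeover_request_valid_time_s
    # Otherwise fall back to the configured node failover delay.
    return node_failover_delay_s
```

With a hypothetical valid time of 20 seconds, a configured delay of 0 would 
become a 40 second wait for the peer SC, and a configured delay of 600 would 
be shortened to the same 40 seconds.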


