On 07-Aug-15 4:20 PM, Girish Nagaraj wrote:
> Hi,
>
>
>
> We have a few applications (all under a single service unit) modeled as
> OpenSAF components.
>
>
>
> In the 2N model, the failure recovery action is configured as suFailover.
>
>
>
> Behavior in 4.3.1
>
>    In the 2N model, when one of the components fails, the failed component
> is restarted and assigned the STANDBY HA state, and the SIs of all other
> components are assigned the STANDBY HA state.
>
>
>
> Behavior in 4.6.0
>
>    In the 2N model, when one of the components fails, the failed component
> is not restarted and all other components are terminated abruptly.
>
>
>
> Why is there this difference in behavior? Has anything changed in 4.6? We
> are using the same model files in both.
>
>
From the 4.4 release onwards, the SU failover feature is supported. In any
application configuration in which saAmfSUFailover or
saAmfSutDefSUFailover is enabled, suFailover is performed for the faulted
SU instead of component failover.

I think that in the present case one of these flags was already enabled in
the original configuration. To observe the old behavior, dynamically
disable the saAmfSUFailover or saAmfSutDefSUFailover flag using the immcfg
command:
immcfg -a saAmfSUFailover=0 <name of su>
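
As a rough sketch of how that could look in practice, the script below checks
the flag with immlist, disables it with immcfg, and checks it again. The SU DN
is copied from the logs quoted in this thread and is only an assumption about
your model. The run helper and the DRY_RUN variable are hypothetical
conveniences, not part of OpenSAF: by default the script only prints the
commands so they can be reviewed first.

```shell
#!/bin/sh
# DN of the SU as it appears in the quoted logs; adjust it for your model.
SU_DN='safSu=SU1,safSg=app2N,safApp=app'

# Hypothetical helper: print the command by default, and execute it only
# when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# 1. Show the current value of the flag on the SU.
run immlist -a saAmfSUFailover "$SU_DN"

# 2. Disable SU failover on that SU.
run immcfg -a saAmfSUFailover=0 "$SU_DN"

# 3. Read the attribute back to confirm the change.
run immlist -a saAmfSUFailover "$SU_DN"
```

Run it once as-is to review the commands, then with DRY_RUN=0 on a node where
the IMM tools are available. If the flag comes from saAmfSutDefSUFailover
instead, the same immcfg form would apply with that attribute and the DN of
the SU type object, which is not shown in this thread.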


Thanks
Praveen
>
> // 4.3.1 log
>
> Aug  7 16:08:30 localhost abrt[10373]: Saved core dump of pid 10080
> (/usr/local/sbin/chkptapp) to
> /var/spool/abrt/ccpp-2015-08-07-16:08:30-10080 (3244032 bytes)
>
> Aug  7 16:08:30 localhost abrtd: Directory 'ccpp-2015-08-07-16:08:30-10080'
> creation detected
>
> Aug  7 16:08:31 localhost avahi-daemon[523]: Withdrawing workstation
> service for svlan0.1.
>
> Aug  7 16:08:31 localhost osafamfnd[9852]: NO
> 'safComp=chkptapp,safSu=SU1,safSg=app2N,safApp=app' faulted due to
> 'avaDown' : Recovery is 'suFailover'
>
> Aug  7 16:08:31 localhost osafamfnd[9852]: NO
> 'safSu=SU1,safSg=app2N,safApp=app' Presence State INSTANTIATED =>
> TERMINATING
>
> Aug  7 16:08:31 localhost appn_amf_script: Stopping chkptapp
>
> Aug  7 16:08:31 localhost osafamfnd[9852]: NO Assigning
> 'safSi=appSI,safApp=app' QUIESCED to 'safSu=SU1,safSg=app2N,safApp=app'
>
>
>
> // 4.6 log
>
> Aug  7 06:56:52 ha-node-1 abrt[10462]: Saved core dump of pid 10191
> (/usr/local/sbin/chkptapp) to
> /var/spool/abrt/ccpp-2015-08-07-06:56:52-10191 (3309568 bytes)
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO saAmfSUFailover is true for
> 'safSu=SU1,safSg=zebos2N,safApp=app'
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO SU failover probation timer
> started (timeout: 1200000000000 ns)
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO Performing failover of
> 'safSu=SU1,safSg=app2N,safApp=app' (SU failover count: 1)
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO
> 'safComp=chkptapp,safSu=SU1,safSg=app2N,safApp=app' recovery action
> escalated from 'componentFailover' to 'suFailover'
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO
> 'safComp=chkptapp,safSu=SU1,safSg=app2N,safApp=app' faulted due to
> 'avaDown' : Recovery is 'suFailover'
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO Terminating components of
> 'safSu=SU1,safSg=app2N,safApp=app'(abruptly & unordered)
>
> Aug  7 06:56:52 ha-node-1 osafamfnd[9897]: NO
> 'safSu=SU1,safSg=app2N,safApp=app' Presence State INSTANTIATED =>
> TERMINATING
>
>
>
> Regards,
>
> Girish
>

------------------------------------------------------------------------------
_______________________________________________
Opensaf-users mailing list
https://lists.sourceforge.net/lists/listinfo/opensaf-users