Hi Hans,

 

As I said earlier, this patch is for the case where remote fencing is not 
enabled. Stonith is valid only when remote fencing is enabled.

 

Also, the ideal solution in this scenario is for CLM to take complete 
responsibility for fencing the node, and for AMF to depend on the CLM 
notification before doing the role failover.

In that case we won't see two active SUs at the same time.

The patch is only a temporary solution, in which we try to isolate the 
faulted node immediately.
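
To make the isolation step easier to discuss, here is a minimal sketch of the decision the patch makes. The function name isolation_command is hypothetical and not part of opensaf_reboot; it only prints the command that would be chosen rather than running it, and it leaves out the $icmd prefix used in the script. The interface name eth0 is just an example value for TIPC_ETH_IF.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of the patch.
# Prints the isolation command that would be selected, without running it.
#   $1 = MDS transport (MDS_TRANSPORT, e.g. "TIPC" or "TCP")
#   $2 = bearer interface (TIPC_ETH_IF), only used for TIPC
isolation_command() {
    transport=$1
    iface=$2
    if [ "$transport" = "TIPC" ]; then
        # Same command as in the patch: disable the TIPC bearer.
        echo "tipc-config -bd eth:$iface"
    else
        # Same fallback as in the patch: stop osafdtmd.
        echo "pkill -STOP osafdtmd"
    fi
}

isolation_command "TIPC" "eth0"   # prints: tipc-config -bd eth:eth0
isolation_command "TCP" ""        # prints: pkill -STOP osafdtmd
```

If the application uses a different interface for cluster communication, its own isolation command would have to be added as a further branch.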

 

Thanks,

Ravi

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 5:08 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot 
[#2833]

 

Hi Ravi,

 

stonith is not only valid for virtualized environments; I assume stonith 
also supports other mechanisms, e.g. IPMI, in a legacy environment. The 
probability of "flickering" may be higher in a virtualized environment, 
but for redundancy there should be two interfaces configured, which is the 
normal configuration in legacy. If the problem in this ticket is solved by 
using stonith, I don't see a need for adding this patch.

BTW, does this patch work when stonith is enabled?

/Regards HansN

 

On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote:

Hi Hans,

 

The use case that we are addressing here is link flickering when remote 
fencing is not enabled. Also, remote fencing using Stonith is valid only in 
virtualized environments. I have not tested with Stonith enabled, as the use 
case is the one where remote fencing is disabled.

 

Thanks,

Ravi 

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 1:10 AM
To: ravi-sekhar <ravisekhar.ko...@oracle.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

Hi Ravi,

 

I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one of the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform 
the remote fencing, similar to using a backplane.

Have you tested with stonith enabled? 

 

/Regards HansN 

  _____  

From: ravi-sekhar <ravisekhar.ko...@oracle.com>
Sent: 12 April 2018 15:29:13
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar
Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

---
 scripts/opensaf_reboot | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
 
 NODE_ID_FILE=$pkglocalstatedir/node_id
 
@@ -118,7 +121,17 @@ else
                 # uncomment the following line if debugging errors that keep restarting the node
                 # exit 0
 
+                # If the application is using different interface for cluster communication, please
+                # add your application specific isolation commands here
+
                 logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
+  
+                # Isolate the node
+                if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+                   tipc-config -bd eth:$TIPC_ETH_IF
+                else
+                   $icmd pkill -STOP osafdtmd
+                fi
 
                 # Start a reboot supervision background process. Note that a similar
                 # supervision is also done in the opensaf_reboot() function in LEAP.
@@ -128,12 +141,6 @@ else
                         (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
                 fi
 
-               # Stop some important opensaf processes to prevent bad things from happening
-               $icmd pkill -STOP osafamfwd
-               $icmd pkill -STOP osafamfnd
-               $icmd pkill -STOP osafamfd
-               $icmd pkill -STOP osaffmd
-
                 # Flush OpenSAF internal log server messages to disk.
                 $bindir/osaflog --flush
 
-- 
1.9.1

 
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel