I finally got hold of the traces! It turned out to be a simpler case than
the initial analysis suggested. Thanks, Surender, for re-running and sharing the traces.

The sequence is simply:

-          Issue a CLM lock on a node (PL-5)

-          Make the PL-5 node a non-member

-          The lock callback times out, the node entry is not found (which is
fine), and the abort gets hit.

 

While the root cause is an incorrectly placed abort, the fix is to look up the
node by name rather than by id, because the node with that id has gone down and
is no longer relevant.
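
To illustrate the idea, here is a minimal sketch in C. It is not the clmd code: the struct, the table and every demo_* name are hypothetical, and it assumes only that a node's configured name stays valid after the node leaves the cluster while its id mapping does not. The timer-expiry handler looks the node up by name and returns an error if nothing is found, instead of asserting.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node entry; not the real clmd structure. */
struct demo_node {
    const char *name;   /* configured name, valid while the node is configured */
    uint32_t node_id;   /* runtime id, cleared to 0 once the node is a non-member */
    int lock_pending;   /* 1 while an admin lock is waiting on its timer */
};

static struct demo_node demo_db[] = {
    {"safNode=PL-5,safCluster=myClmCluster", 132367, 1},
};
#define DEMO_DB_LEN (sizeof(demo_db) / sizeof(demo_db[0]))

/* Lookup by runtime id; finds nothing for nodes that have left the cluster. */
struct demo_node *demo_get_by_id(uint32_t id)
{
    if (id == 0)
        return NULL;
    for (size_t i = 0; i < DEMO_DB_LEN; i++)
        if (demo_db[i].node_id == id)
            return &demo_db[i];
    return NULL;
}

/* Lookup by configured name; independent of membership state. */
struct demo_node *demo_get_by_name(const char *name)
{
    for (size_t i = 0; i < DEMO_DB_LEN; i++)
        if (strcmp(demo_db[i].name, name) == 0)
            return &demo_db[i];
    return NULL;
}

/* Simulates the node becoming a non-member: the id mapping is dropped. */
void demo_node_leave(struct demo_node *node)
{
    node->node_id = 0;
}

/*
 * Lock-timer expiry handler. Looks the node up by name and treats a missing
 * entry as a recoverable condition. Returns 0 on success, -1 if not found.
 */
int demo_lock_timer_expired(const char *node_name)
{
    struct demo_node *node = demo_get_by_name(node_name);

    if (node == NULL) {
        fprintf(stderr, "lock timer expired for unknown node '%s'\n", node_name);
        return -1;
    }
    node->lock_pending = 0;
    return 0;
}
```

In this sketch, once demo_node_leave() has been called, demo_get_by_id(132367) finds nothing, which mirrors the op_node = 0x0 seen in the backtrace, while demo_lock_timer_expired() still finds the entry by name and clears the pending lock.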

 

Cheers,

Mathi.

 

From: surender khetavath [mailto:[email protected]] 
Sent: Friday, July 05, 2013 4:36 PM
To: [opensaf:tickets] 
Subject: [opensaf:tickets] #227 clmd asserts on active controller during node 
lock timeout

 

The issue is always reproducible. 
Test:

A campaign is modeled to include PL-5 and an SU on this node, using the
script '/usr/share/opensaf/immxml/immxml-modify-config'. During rollback, a clm
crash is observed. The campaign performs a lock/lock-in operation on PL-5 while
the immxml-modify-config script simultaneously attempts an admin lock. If the
lines below are commented out in immxml-modify-config, the rollback goes fine;
if they are enabled, clm crashes.

 PLMNODE=`cat $CURRENT_NODECFG | grep ".. $node " | awk '{ print $ 3 }'`
 trace "PLMNODE: $PLMNODE"
 cmd="amf-adm lock safNode=$PLMNODE,safCluster=myClmCluster"

The scripts and configuration are attached.

Attachment: scripts.tgz (4.9 kB; application/x-compressed-tar)

  _____  

[tickets:#227] (http://sourceforge.net/p/opensaf/tickets/227/) clmd
asserts on active controller during node lock timeout

Status: unassigned
Created: Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
Last Updated: Fri Jun 28, 2013 10:45 AM UTC
Owner: Mathi Naickan

I have asked for traces from the submitter.

changeset : 4007 with patch 2865
scenario:
========
Trying to do lock/lock-in of PL-5.
amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store amf-adm lock safNode=PL-5,safCluster=myClmCluster 
failed. Aborting script! exitCode: 1


#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6

(gdb) bt
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
   func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455


(gdb) bt full
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
   func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
No locals.
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
rc = 1
node_id = 132367
op_node = 0x0
__FUNCTION__ = "proc_node_lock_tmr_exp_msg"
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
msg = 0x655290
__FUNCTION__ = "clms_process_mbx"
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
ret = 1
mbx_fd = {raise_obj = 11, rmv_obj = 12}
error = SA_AIS_OK
rc = 1
__FUNCTION__ = "main"

syslog on sc-1:
==============
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390: proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node
