Hello,

I'm still stress-testing Corosync/Openais (trunk) on FreeBSD, and I've found
a small bug:

On peer A, my test program calls:
- saClmInitialize()
- saClmClusterTrack(SA_TRACK_CURRENT | SA_TRACK_CHANGES)
- saClmFinalize()
- (then runs various tests with CPG ...)
Next, when I shut down/kill Corosync on peer B, Corosync on peer A segfaults.
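For reference, the call sequence above boils down to something like the following minimal reproducer. This is my own sketch, assuming SAF CLM B.01.01 headers (saClm.h, link with -lSaClm); the file name and error handling are mine, not from my actual test program:

```c
/* clm_crash.c -- hypothetical minimal reproducer sketch. */
#include <stdio.h>
#include <saClm.h>

static void track_cb(const SaClmClusterNotificationBufferT *buf,
                     SaUint32T member_count, SaAisErrorT error)
{
        /* We never dispatch, so this callback is not expected to fire. */
        (void)buf; (void)member_count; (void)error;
}

int main(void)
{
        SaClmHandleT handle;
        SaClmCallbacksT callbacks = { .saClmClusterTrackCallback = track_cb };
        SaVersionT version = { 'B', 1, 1 };
        SaAisErrorT rc;

        rc = saClmInitialize(&handle, &callbacks, &version);
        if (rc != SA_AIS_OK) { fprintf(stderr, "init: %d\n", rc); return 1; }

        rc = saClmClusterTrack(handle, SA_TRACK_CURRENT | SA_TRACK_CHANGES,
                               NULL);
        if (rc != SA_AIS_OK) { fprintf(stderr, "track: %d\n", rc); return 1; }

        /* No saClmClusterTrackStop() here -- uncommenting the next line
         * makes the crash on peer A go away: */
        /* saClmClusterTrackStop(handle); */

        rc = saClmFinalize(handle);
        if (rc != SA_AIS_OK) { fprintf(stderr, "fini: %d\n", rc); return 1; }

        /* ... CPG tests would follow here ... */
        return 0;
}
```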

When my test program calls saClmClusterTrackStop() before saClmFinalize(),
Corosync on peer A doesn't crash. From that and the stacktrace (included below),
I guess it tries to signal the cluster change to a program that is no longer
connected (-> missing disconnection notification to CLM?). I also guess this
means Corosync will segfault if the client itself crashes.

By the way, it's *not* due to a BSD-ism ;) (I've also tested on a small Debian 
cluster).

The core dump from the crashed Corosync gives the following stacktrace:

-----
(gdb) bt
#0  0x28501120 in library_notification_send 
(cluster_notification_entries=0x3fbd36a0, notify_count=2) at clm.c:429
#1  0x285014ca in lib_notification_leave (nodes=0x3fbf76f0, nodes_entries=2) at 
clm.c:524
#2  0x2850189c in clm_confchg_fn 
(configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14, 
member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2, 
    joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at clm.c:584
#3  0x0804ba7b in confchg_fn 
(configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14, 
member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2, 
joined_list=0x0, 
    joined_list_entries=0, ring_id=0x2833765c) at main.c:324
#4  0x280a2a2f in app_confchg_fn 
(configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14, 
member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2, 
    joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at totempg.c:350
#5  0x280a2932 in totempg_confchg_fn 
(configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14, 
member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2, 
    joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at totempg.c:524
#6  0x280a2343 in totemmrp_confchg_fn 
(configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14, 
member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2, 
    joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at 
totemmrp.c:109
#7  0x2809ad3f in memb_state_operational_enter (instance=0x28316000) at 
totemsrp.c:1678
#8  0x2809f890 in message_handler_orf_token (instance=0x28316000, 
msg=0x28381638, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3484
#9  0x280a20bb in main_deliver_fn (context=0x28316000, msg=0x28381638, 
msg_len=70) at totemsrp.c:4212
#10 0x28095bb2 in none_token_recv (rrp_instance=0x282fe400, iface_no=0, 
context=0x28316000, msg=0x28381638, msg_len=70, token_seq=3) at totemrrp.c:536
#11 0x28097849 in rrp_deliver_fn (context=0x28206190, msg=0x28381638, 
msg_len=70) at totemrrp.c:1393
#12 0x28093cf5 in net_deliver_fn (handle=7749363892505018368, fd=7, revents=1, 
data=0x28381000) at totemudp.c:1223
#13 0x28091d44 in poll_run (handle=7749363892505018368) at coropoll.c:394
#14 0x0804d432 in main (argc=2, argv=0x3fbfece4) at main.c:1069
(gdb) print *cluster_notification_entries 
$1 = {cluster_node = {node_id = 2, node_address = {length = 11, family = 
MAR_CLM_AF_INET, value = "172.16.10.2", '\0' <repeats 52 times>}, node_name = 
{length = 11, 
      value = "172.16.10.2", '\0' <repeats 244 times>}, member = 0, 
boot_timestamp = 1254723307000000000, initial_view_number = 617}, 
cluster_change = MAR_NODE_LEFT}
(gdb) print clm_pd
$2 = (struct clm_pd *) 0xc3fbbe18
(gdb) print *clm_pd
Cannot access memory at address 0xc3fbbe18
(gdb) info threads 
* 3 Thread 0x28201040 (LWP 100188)  0x28501120 in library_notification_send 
(cluster_notification_entries=0x3fbd36a0, notify_count=2) at clm.c:429
  2 Thread 0x28201150 (LWP 100284)  0x2815fe3f in poll () at poll.S:2
  1 Thread 0x282019d0 (LWP 100286)  0x2811f0fb in semop () at semop.S:2
-----

Hope it helps.

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais