On Mon, Oct 05, 2009 at 03:49:42PM +0200, Jerome Flesch wrote:
> Hello,
Hi, Jerome.
> I'm still stress-testing Corosync/OpenAIS (trunk) on FreeBSD, and I've found
> a small bug:
>
> On peer A, my test program calls:
> - saClmInitialize()
> - saClmClusterTrack(SA_TRACK_CURRENT | SA_TRACK_CHANGES)
> - saClmFinalize()
> - (does various tests with CPG ..)
> Next, when I shut down/kill Corosync on peer B, Corosync on peer A segfaults.
Can you provide the exact test program? I'd like to see all the
details of each API call. Are you using a test program from the
openais tree, or did you write your own?
> When my test program calls saClmClusterTrackStop() before saClmFinalize(),
> Corosync on peer A doesn't crash. From that and the stack trace (attached
> below), I guess it tries to signal the cluster change to a program that is
> no longer connected (-> a missing disconnection notification to CLM?). I
> also guess this means Corosync will segfault if the client itself crashes.
I'm guessing that a callback is sent to node A. If I understand
correctly, you are enabling tracking on node A, correct? If CLM is
anything like the MSG service (and I think it is with respect to how
tracking works), enabling tracking will generate callbacks on
membership changes *to the node that enabled tracking*.
> By the way, it's *not* due to a BSD-ism ;) (I've also tested on a small
> Debian cluster).
>
> The core dump of the crashed Corosync gives me the following stack trace:
Thanks. I'll take a look. But like I said, if you can provide the test
program, that would be great.
Ryan
> -----
> (gdb) bt
> #0 0x28501120 in library_notification_send
> (cluster_notification_entries=0x3fbd36a0, notify_count=2) at clm.c:429
> #1 0x285014ca in lib_notification_leave (nodes=0x3fbf76f0, nodes_entries=2)
> at clm.c:524
> #2 0x2850189c in clm_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14,
> member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2,
> joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at clm.c:584
> #3 0x0804ba7b in confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14,
> member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2,
> joined_list=0x0,
> joined_list_entries=0, ring_id=0x2833765c) at main.c:324
> #4 0x280a2a2f in app_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14,
> member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2,
> joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at
> totempg.c:350
> #5 0x280a2932 in totempg_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14,
> member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2,
> joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at
> totempg.c:524
> #6 0x280a2343 in totemmrp_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL, member_list=0x3fbf8a14,
> member_list_entries=1, left_list=0x3fbf7e14, left_list_entries=2,
> joined_list=0x0, joined_list_entries=0, ring_id=0x2833765c) at
> totemmrp.c:109
> #7 0x2809ad3f in memb_state_operational_enter (instance=0x28316000) at
> totemsrp.c:1678
> #8 0x2809f890 in message_handler_orf_token (instance=0x28316000,
> msg=0x28381638, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3484
> #9 0x280a20bb in main_deliver_fn (context=0x28316000, msg=0x28381638,
> msg_len=70) at totemsrp.c:4212
> #10 0x28095bb2 in none_token_recv (rrp_instance=0x282fe400, iface_no=0,
> context=0x28316000, msg=0x28381638, msg_len=70, token_seq=3) at totemrrp.c:536
> #11 0x28097849 in rrp_deliver_fn (context=0x28206190, msg=0x28381638,
> msg_len=70) at totemrrp.c:1393
> #12 0x28093cf5 in net_deliver_fn (handle=7749363892505018368, fd=7,
> revents=1, data=0x28381000) at totemudp.c:1223
> #13 0x28091d44 in poll_run (handle=7749363892505018368) at coropoll.c:394
> #14 0x0804d432 in main (argc=2, argv=0x3fbfece4) at main.c:1069
> (gdb) print *cluster_notification_entries
> $1 = {cluster_node = {node_id = 2, node_address = {length = 11, family =
> MAR_CLM_AF_INET, value = "172.16.10.2", '\0' <repeats 52 times>}, node_name =
> {length = 11,
> value = "172.16.10.2", '\0' <repeats 244 times>}, member = 0,
> boot_timestamp = 1254723307000000000, initial_view_number = 617},
> cluster_change = MAR_NODE_LEFT}
> (gdb) print clm_pd
> $2 = (struct clm_pd *) 0xc3fbbe18
> (gdb) print *clm_pd
> Cannot access memory at address 0xc3fbbe18
> (gdb) info threads
> * 3 Thread 0x28201040 (LWP 100188) 0x28501120 in library_notification_send
> (cluster_notification_entries=0x3fbd36a0, notify_count=2) at clm.c:429
> 2 Thread 0x28201150 (LWP 100284) 0x2815fe3f in poll () at poll.S:2
> 1 Thread 0x282019d0 (LWP 100286) 0x2811f0fb in semop () at semop.S:2
> -----
>
> Hope it helps.
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais