"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
(snip)
>> Here's my observation:
>>
>>  - An element of pending_ops is removed at lrm.c:L947
>>  - It is called inside from g_hash_table_foreach() at L1475
>>  - This is violating the usage of g_hash_table_foreach() according
>>    to the glib manual.
>>  - Therefore the iteration can not proceed correctly and would
>>    try to refer to a removed element.
>
> Turns out that the Stateful resource in CTS was never getting promoted.
> Once I fixed this, I was able to trigger the bug too (in the last few
> minutes).

A weird thing is that it is not reproducible in every environment.
As far as we've tested:
 - it _always_ happens on a RedHat 4 environment.
 - it has _never_ happened on a RedHat 5 environment.

I'm not sure if that is the only difference, but the difference in
glib versions between the two may be related to the behavior.

>
> Thanks for your diagnosis and the patch, you've certainly saved me some time
> :-)
>
>>
>> http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
>> (...)
>>  946  /* not doing this will block the node from shutting down */
>>  947  g_hash_table_remove(pending_ops, key);
>> (...)
>> 1475  g_hash_table_foreach(pending_ops,
>>                            stop_recurring_action_by_rsc, rsc);
>>
>>
>> http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
>> (...)
>> The hash table may not be modified while iterating over it (you can't
>> add/remove items).
>>
>>
>> I also attached my suggested patch, although I can not guarantee
>> the correctness but just to show you the idea.
>>
>> Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation