[Bug 571612] Re: dlm_controld.pcmk segfault
Launchpad has imported 4 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=586752.

If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

On 2010-04-28T10:12:15+00:00 Oliver wrote:

Created attachment 409748
Andrew Beekhof's patch to fix this issue

Description of problem:
dlm_controld.pcmk segfaults on startup if the network uses vlan, bonding or bridging and corosync/pacemaker is invoked too early.

Version-Release number of selected component (if applicable):
Bug and patch tested on the 3.0.7 Ubuntu lucid packages.

How reproducible:
Configure any of the above on top of the raw interface and start corosync before the network settles.

Additional info:
The issue is discussed here:
http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
Andrew Beekhof posted the attached patch that fixes this issue.

gdb output is:

Core was generated by `dlm_controld.pcmk -q 0'.
Program terminated with signal 11, Segmentation fault.
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
   in ../sysdeps/x86_64/multiarch/../strlen.S
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1 0x7f499565cd46 in *__GI___strdup (s=0x0) at strdup.c:42
#2 0x00403f0c in dlm_process_node (key=, value=0x1864a30, user_data=0x62a4f8) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:136
#3 0x7f4995cdbd73 in IA__g_hash_table_foreach (hash_table=0x1866050, func=0x403e40 , user_data=0x62a4f8) at /build/buildd/glib2.0-2.24.0/glib/ghash.c:1325
#4 0x00403c9e in update_cluster () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:82
#5 0x00415a4a in loop () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:986
#6 0x0041659c in main (argc=, argv=) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:1295

hth,
Oliver

Reply at: https://bugs.launchpad.net/ubuntu/+source/redhat-cluster/+bug/571612/comments/0

On 2010-04-28T12:08:13+00:00 Andrew wrote:

Patch fa24b46 resolving this issue has been committed in cluster.git:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=fa24b460c51aa0c47d0842703feea8bca0ed66b7

Essentially, the dlm was trying to create a configfs entry for a node with no address. This led to a NULL pointer being dereferenced and the dlm crashing. The above-mentioned patch now checks for a valid address before continuing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/redhat-cluster/+bug/571612/comments/1

On 2010-04-29T13:22:44+00:00 Andrew wrote:

Sorry, set the wrong status.

Reply at: https://bugs.launchpad.net/ubuntu/+source/redhat-cluster/+bug/571612/comments/3

On 2010-07-30T11:29:34+00:00 Bug wrote:

This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/ubuntu/+source/redhat-cluster/+bug/571612/comments/4

** Changed in: redhatcluster
Status: Unknown => Fix Released

** Changed in: redhatcluster
Importance: Unknown => Medium

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/571612

Title: dlm_controld.pcmk segfault

To manage notifications about this bug go to:
https://bugs.launchpad.net/redhatcluster/+bug/571612/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
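Editorial illustration: the imported comments describe the crash as strdup() being called on a NULL address (frame #1 shows s=0x0) and the fix as checking for a valid address before continuing. The following is a minimal C sketch of that kind of guard, not the code from commit fa24b46. The names struct node, copy_node_addr and configure_nodes are hypothetical and exist only for this example, and the configfs step is reduced to a comment.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict C modes */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cluster node record (not the dlm_controld type). */
struct node {
    unsigned int id;
    const char *addr;  /* NULL while the node has no address yet */
};

/*
 * Return a heap copy of the node's address, or NULL if there is none.
 * Passing NULL to strdup() is undefined behaviour, so check first.
 * The caller owns the returned string and must free() it.
 */
char *copy_node_addr(const struct node *n)
{
    if (n == NULL || n->addr == NULL)
        return NULL;
    return strdup(n->addr);
}

/* Walk all nodes, skip those without an address, and count the rest. */
size_t configure_nodes(const struct node *nodes, size_t count)
{
    size_t configured = 0;

    for (size_t i = 0; i < count; i++) {
        char *addr = copy_node_addr(&nodes[i]);
        if (addr == NULL)
            continue;  /* no address yet: skip instead of crashing */
        /* A real daemon would create the node's configfs entry here. */
        free(addr);
        configured++;
    }
    return configured;
}
```

With three nodes of which one has a NULL addr, configure_nodes() skips that node and reports two configured nodes, where an unguarded strdup() would have crashed as in the backtrace.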
[Bug 571612] Re: dlm_controld.pcmk segfault
Thank you for reporting bugs and trying to make Ubuntu better.

I'm marking this bug report as invalid given that the above bug refers to a feature that is not enabled in lucid nor maverick, and has already been fixed upstream for Natty.

Regards

** Changed in: redhat-cluster (Ubuntu)
Status: In Progress => Invalid

** Changed in: redhat-cluster (Ubuntu)
Assignee: Andres Rodriguez (andreserl) => (unassigned)
[Bug 571612] Re: dlm_controld.pcmk segfault
Jacob,

Well, if the affected package is in a PPA, that's a totally different thing, given that it does not directly affect the distribution and, of course, it is easier to patch (there's no hassle of a review to see if it affects the Ubuntu Archive). Because of this, I'll review the above package and provide a fix for it in the PPA you listed above.

Also note that this bug doesn't apply to Ubuntu Maverick either, so I'm marking this bug as invalid.
[Bug 571612] Re: dlm_controld.pcmk segfault
Andres,

Good point - I actually don't even have redhat-cluster installed... What led me to this bug report was following the Pacemaker, DRBD8, and OCFS2 test case in the wiki from Ubuntu-HA and other info online. I encountered the segfault problem and did a little digging to come across this bug.

However, I'm not sure the fact that I don't have redhat-cluster installed makes this bug invalid. I believe I encountered the bug because the libdlm3-pacemaker package I installed (from ppa:ubuntu-ha/lucid-cluster) for DLM to work in pacemaker has this problem. Correct me if I'm wrong in that assumption.

Here's what I think are the relevant packages I have installed:

libdlm3-pacemaker  3.0.7-0ubuntu0ppa2.2    RHCS compatibility package -- dlm_controld f
ocfs2-tools        1.4.3-1ubuntu0ppa4      tools for managing OCFS2 cluster
pacemaker          1.0.8+hg15494-2ubuntu2  HA cluster resource manager

I may be way off, but is it possible to apply the patch against the libdlm3-pacemaker package instead of the redhat-cluster package? I am just in the testing phases right now with pacemaker/drbd/ocfs2 on a couple of servers, so I can definitely test a fix for you.

Thanks!
[Bug 571612] Re: dlm_controld.pcmk segfault
Jacob,

I have uploaded a test package to a PPA. However, I'm assuming that you are using redhat-cluster - 3.0.2-2 in Ubuntu Lucid. Is this correct? If so, I believe that redhat-cluster does not have pacemaker support enabled in Lucid, which would make this bug report invalid for that release. Or are you using any other version of RHCS? If so, where was it obtained from?

Otherwise, you can test if the segfaulting issue is still present:

  sudo apt-get install python-software-properties
  sudo add-apt-repository ppa:andreserl/ha
  sudo apt-get update
  sudo apt-get install redhat-cluster-suite

Best regards,
[Bug 571612] Re: dlm_controld.pcmk segfault
** Description changed:

- Upstream bugreport and patch: https://bugzilla.redhat.com/show_bug.cgi?id=586752
- Please include the patch until a fixed upstream version is packaged.
+ Anyone who uses link aggregation (me), bridging, and vlans is affected
+ due to the time required to bring up the network after reboot. Corosync
+ comes up and dlm segfaults. This has been fixed upstream, and the fix is
+ included in Maverick+.
+
+ Upstream bugreport and patch [1]. Patch committed upstream [2].
+ Discussion about the issue [3].
+
+ [1]: https://bugzilla.redhat.com/show_bug.cgi?id=586752
+ [2]: http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=fa24b460c51aa0c47d0842703feea8bca0ed66b7
+ [3]: http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
[Bug 571612] Re: dlm_controld.pcmk segfault
** Changed in: redhat-cluster (Ubuntu)
Status: Triaged => In Progress

** Changed in: redhat-cluster (Ubuntu)
Importance: Medium => High
[Bug 571612] Re: dlm_controld.pcmk segfault
Great!
[Bug 571612] Re: dlm_controld.pcmk segfault
Hi Jacob,

I'll take a look at this in the next few days!

Thank you

** Changed in: redhat-cluster (Ubuntu)
Importance: Undecided => Medium

** Changed in: redhat-cluster (Ubuntu)
Status: New => Triaged

** Changed in: redhat-cluster (Ubuntu)
Assignee: (unassigned) => Andres Rodriguez (andreserl)
[Bug 571612] Re: dlm_controld.pcmk segfault
Any update on this? It's been 8 months since it was found and reported. Any info would be great, as it is a real pain!

Anyone who uses link aggregation (me), bridging, and vlans is affected due to the time required to bring up the network after reboot. Corosync comes up and dlm segfaults (something to do with the network not actually being completely started?). I want a node to be able to completely recover after being fenced or the like, and with this problem it won't start all its resources again without restarting corosync, due to the segfault.

I believe it is fixed in 3.0.12, which is in Maverick - maybe this could be backported? Or the above referenced patch... I can figure out how to apply the patch myself, but I would rather stay in sync with the HA packages...
[Bug 571612] Re: dlm_controld.pcmk segfault
** Bug watch added: Red Hat Bugzilla #586752
https://bugzilla.redhat.com/show_bug.cgi?id=586752

** Also affects: redhatcluster via
https://bugzilla.redhat.com/show_bug.cgi?id=586752
Importance: Unknown
Status: Unknown