[Pacemaker] Occasional nonsensical resource agent errors since Debian 3.2.57-3+deb7u1 kernel update

Ken Gaillot Sat, 12 Jul 2014 06:50:17 -0700

Hi,

We run multiple deployments of corosync+pacemaker on Debian "wheezy" for high availability of various resources. The configurations are unchanged and ran without any issues for many months. However, since we applied the Debian 3.2.57-3+deb7u1 kernel update in May, we have been getting resource agent errors on rare occasions, with error messages that are clearly incorrect.


The incidents have happened four times on two unrelated clusters:

* Our cluster hosts "talos" and "pomona" use pacemaker to manage a few virtual IP addresses using the ocf:heartbeat:IPaddr2 resource agent. This cluster has had two incidents. The first incident began with this error:

Jun 2 17:30:16 pomona lrmd: [2145]: info: RA output: (ldap-ip:monitor:stderr) /usr/lib/ocf/resource.d//heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d//heartbeat/IPaddr2: : Permission denied


The second incident began with this error:

Jul 12 08:36:15 talos IPaddr2[21294]: ERROR: Setup problem: couldn't find command: ip

I can confidently say that neither the permissions of IPaddr2 nor the location of the "ip" command changed at any point!

* Our cluster hosts "aries" and "taurus" use pacemaker in a more complicated setup, managing Xen virtual machines on shared storage utilizing DRBD and CLVM, using the resource agents ocf:pacemaker:controld, ocf:gleim:clvmd (which is the stock clvmd resource agent from a later pacemaker version than is included in wheezy), ocf:heartbeat:LVM, ocf:linbit:drbd, and ocf:gleim:Xen (which is the stock Xen resource agent with a trivial one-line change for a local workaround).


This cluster has also had two incidents:

* The first began with:

Jun 16 10:38:15 aries lrmd: [3646]: info: RA output: (jabber:monitor:stderr) /usr/lib/ocf/resource.d//gleim/Xen: 71: local: en-list: bad variable name

There is no variable "en-list" in the resource agent; the closest string in the file is "xen-list", which is a binary, not a variable, and is used like this:


  ...
  if have_binary xen-list; then
     xen-list $1 2>/dev/null | grep -qs "State.*[-r][-b][-p]--" 2>/dev/null
     ...

* The second began with:

Jun 21 11:58:58 taurus Xen[9052]: ERROR: Setup problem: couldn't find command: awk


Again, the location of "awk" has not changed.
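
For anyone who wants to rule out the obvious on their own nodes, here is a minimal sketch of the check described above: whether a command still resolves on the PATH and is executable. The helper name check_cmd is made up for illustration and is not part of any resource agent or of ocf-shellfuncs.

```shell
#!/bin/sh
# Hypothetical helper (not part of any resource agent): report where each
# named command resolves and whether that path is executable.
check_cmd() {
    status=0
    for name in "$@"; do
        # command -v prints the resolved path for an external command.
        # For shell builtins it prints only the name, so the -x test is
        # only meaningful for external binaries such as ip or awk.
        if cmd_path=$(command -v "$name" 2>/dev/null) && [ -x "$cmd_path" ]; then
            # ls -ldL follows symlinks; the first field is the mode string.
            echo "$name: $cmd_path ($(ls -ldL "$cmd_path" | cut -d' ' -f1))"
        else
            echo "$name: not found or not executable (PATH=$PATH)"
            status=1
        fi
    done
    return $status
}
```

Running something like "check_cmd ip awk" periodically and keeping the output would give a record to compare against the timestamps of any later incident. It only shows the state at the moment it runs, so it cannot by itself explain a transient failure.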

We have no reason to suspect the kernel update other than timing, and the fact that the incidents occur on unrelated clusters. We have since upgraded to Debian's next update, 3.2.57-3+deb7u2, but the most recent incident occurred after that. The original update included fixes for these issues:


CVE-2014-0196

    Jiri Slaby discovered a race condition in the pty layer, which could
    lead to denial of service or privilege escalation.

CVE-2014-1737 / CVE-2014-1738

    Matthew Daley discovered that missing input sanitising in the
    FDRAWCMD ioctl and an information leak could result in privilege
    escalation.

CVE-2014-2851

    Incorrect reference counting in the ping_init_sock() function allows
    denial of service or privilege escalation.

CVE-2014-3122

    Incorrect locking of memory can result in local denial of service.

Given the odd error messages from the resource agents, I suspect it's a memory corruption error of some sort. We've been unable to find anything else useful in the logs, and we'll probably end up reverting to the prior kernel version. But given the rarity of the issue, it would be a long while before we could be confident that the revert fixed it.
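
Since incidents are this rare, one way to keep track before and after a revert is to scan the logs for the same signatures. The following is only a sketch: the function name scan_ra_errors is hypothetical, the patterns are taken from the messages quoted in this report, and the log file paths are assumptions about the local syslog setup.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that resemble the incidents above.
# Matches lrmd lines that relay resource agent stderr, and the agents'
# own "Setup problem" errors. grep -h suppresses file name prefixes.
scan_ra_errors() {
    grep -hE \
        -e 'lrmd: .*RA output: ?\([^)]*:stderr\)' \
        -e "ERROR: Setup problem: couldn't find command" \
        "$@"
}
```

For example, "scan_ra_errors /var/log/syslog /var/log/syslog.1" (paths assumed) lists candidate lines. Resource agents can write to stderr for legitimate reasons too, so the output still needs to be reviewed by hand.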

Is anyone else running pacemaker on Debian with the 3.2.57-3+deb7u1 kernel or later? Has anyone had any similar issues?


-- Ken Gaillot <kjgai...@gleim.com>
   Gleim NOC

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
