Re: [Linux-HA] pingd removed transient attr from node attributes after short network outage, and did not recreated it

Dejan Muhamedagic Mon, 12 Nov 2007 11:44:00 -0800

Hi,

On Mon, Nov 12, 2007 at 05:03:35PM +0100, Sebastian Reitenbach wrote:
> Hi,
> 
> I did some more tests with my two node cluster, regarding pingd.
> 
> I started the two-node cluster. Both nodes came up, and the resources
> were distributed as the location constraints define. The location of
> the Xen resources depends on the pingd attributes.
> Then, on the only ping node, I flushed the state tables and only
> allowed pings from the host ppsdb101. I saw the
> Xen resources moving, everything great. I changed the firewall
> on the ping node to only allow pings from the ppsnfs101 host. Well,
> all four Xen resources moved over to the ppsnfs101 host.
> At 16:17 I disabled both ports of the switch where the nodes
> are connected. A real-life use case would be, e.g.:
> 1. non-redundant network layout
> 2. no stonith, or stonith over network (e.g. ilo or ssh)
> 3. someone removes power from the switch where both nodes are connected
> 
> Then I waited about 10 seconds, and enabled both ports again.
> The RSTP took some more seconds to restructure.
> 
> After that, both nodes could communicate with each other again, and
> the pings were reaching the ping node again, but the lines that pingd
> produces as transient attributes of the nodes were both gone.
> 
> Before I removed the cable, I issued
> cibadmin -Q -o status | grep ping
> and the two lines, one for each host, showed up. After disconnecting
> both hosts and reconnecting them, rerunning the cibadmin command
> showed me that both attr lines were gone. I waited for about 5-10
> minutes, but they did not come back. I did that several times, with
> one or the other node or both being able to ping the ping node before
> disabling the switch ports.
> 
> I expected the transient pingd attributes that the nodes had
> A) not to disappear, but only to get reset to 0

Right.

> B) In case it is ok that they disappeared, I expected them to come back, 
> when they are receiving echo replies from the ping node again.

Also.

> But maybe I am still missing something or have misunderstood. Who is
> right, me or the cluster?

This is what happened: Due to a membership issue which has only
recently been resolved, the crmd/cib combo would jointly leave
the cluster. Other cib clients were supposed to follow in order
to be started again by the master process and then connect to the
new cib instance. But attrd doesn't have such a feature. There's
now a bugzilla for that:

http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1776

In the meantime, you could try with a newer heartbeat version.
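
Until then, one way to notice the condition is to check whether every
node still carries a pingd attribute in the status section. Below is a
minimal sketch in Python, not a tested tool. It assumes that the output
of "cibadmin -Q -o status" is well-formed XML, that each node appears as
a node_state element with a uname, and that the attribute is named
"pingd". The function names are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET


def nodes_missing_attr(status_xml, attr_name="pingd"):
    """Return the unames of node_state entries that have no transient
    nvpair named attr_name (assumed layout, see the note above)."""
    root = ET.fromstring(status_xml)
    missing = []
    for node in root.iter("node_state"):
        # Collect nvpair names found anywhere below transient_attributes,
        # so an intermediate wrapper element does not matter.
        names = {
            nv.get("name")
            for ta in node.iter("transient_attributes")
            for nv in ta.iter("nvpair")
        }
        if attr_name not in names:
            missing.append(node.get("uname"))
    return missing


def query_status():
    """Run the same query as in the report: cibadmin -Q -o status."""
    result = subprocess.run(
        ["cibadmin", "-Q", "-o", "status"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a cluster node, nodes_missing_attr(query_status()) should return an
empty list while pingd is reporting for every node, and the affected
node names once the attributes have vanished.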

Thanks for the report.

Dejan

> output of hb_report is attached. 
> 
> kind regards
> Sebastian
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
