On Fri, Feb 22, 2013 at 4:11 PM, Lars Kellogg-Stedman <l...@oddbit.com> wrote:
>> > You'd think that would help, but
>> >
>> > https://bugzilla.redhat.com/show_bug.cgi?id=880035 suggests otherwise.
>> > I have one remaining fedora machine where KVM clusters still work, I
>> > don't think I'll ever update it now.
>
> Well, that was a fascinating read.
>
> Using the udpu transport seems to have stabilized corosync. If I understand
> that bug report correctly I should also see better multicast behavior if I
> enable the multicast_querier, but I'm happy with udpu for now. This lets me
> focus on the other things that are acting oddly:
>
> Trying to add a monitor to a systemd: resource, like this:
>
>     pcs resource create httpd systemd:httpd op monitor interval=30s
>
> Which generates this in the cib:
>
> -- <cib admin_epoch="0" epoch="7" num_updates="25" />
> ++ <primitive class="systemd" id="httpd" type="httpd" >
> ++   <instance_attributes id="httpd-instance_attributes" />
> ++   <operations >
> ++     <op id="httpd-monitor-interval-30s" interval="30s" name="monitor" />
> ++   </operations>
> ++ </primitive>
>
> Results in the service never successfully starting:

Logs from the lrmd? selinux enabled?

> notice: process_lrm_event: LRM operation httpd_monitor_0 (call=10, rc=7, cib-update=30, confirmed=true) not running
> notice: process_lrm_event: LRM operation httpd_start_0 (call=13, rc=0, cib-update=31, confirmed=true) ok
> notice: process_lrm_event: LRM operation httpd_monitor_30000 (call=16, rc=7, cib-update=32, confirmed=false) not running
> warning: status_from_rc: Action 11 (httpd_monitor_30000) on puppet0 failed (target: 0 vs. rc: 7): Error
> warning: update_failcount: Updating failcount for httpd on puppet0 after failed monitor: rc=7 (update=value++, time=1361503742)
> notice: run_graph: Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2278.bz2): Complete
> notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-httpd (1)
> notice: attrd_perform_update: Sent update 11: fail-count-httpd=1
> notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-httpd (1361503742)
> notice: attrd_perform_update: Sent update 14: last-failure-httpd=1361503742
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
> notice: LogActions: Recover httpd#011(Started puppet0)
> notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-2279.bz2
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
> notice: LogActions: Recover httpd#011(Started puppet0)
> notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-2280.bz2
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
>
> This will continue until pacemaker declares the service FAILED, even though
> httpd (in this example) starts up manually (with "systemctl start httpd")
> without a problem.
> For what it's worth, the dbus method call to get the
> ActiveState property appears to work:
>
> # systemctl start httpd
> # gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1/unit/httpd_2eservice -m org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Unit ActiveState
> (<'active'>,)

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
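
For anyone who wants to repeat the quoted ActiveState check from a script, here is a rough Python sketch. It only wraps the same gdbus call shown in the quoted message. The function names are made up for illustration, and the escaping rule (every byte that is not an ASCII letter or digit, plus a leading digit, becomes "_" followed by two lowercase hex digits) is my understanding of how systemd derives object paths such as httpd_2eservice, so treat it as an assumption rather than a reference implementation.

```python
import re
import subprocess

SYSTEMD_DEST = "org.freedesktop.systemd1"
UNIT_PATH_PREFIX = "/org/freedesktop/systemd1/unit/"


def escape_unit_name(unit):
    """Escape a unit name for use in a systemd D-Bus object path.

    Assumed rule: ASCII letters pass through, digits pass through unless
    they are the first character, everything else becomes _XX (hex).
    """
    if not unit:
        return "_"
    out = []
    for i, byte in enumerate(unit.encode("utf-8")):
        ch = chr(byte)
        is_letter = byte < 128 and ch.isalpha()
        is_digit = byte < 128 and ch.isdigit()
        if is_letter or (is_digit and i > 0):
            out.append(ch)
        else:
            out.append("_%02x" % byte)
    return "".join(out)


def unit_object_path(unit):
    """Return the object path for a unit, e.g. 'httpd.service'."""
    return UNIT_PATH_PREFIX + escape_unit_name(unit)


def active_state_command(unit):
    """Build the gdbus command line used in the quoted message."""
    return [
        "gdbus", "call", "--system",
        "--dest", SYSTEMD_DEST,
        "--object-path", unit_object_path(unit),
        "-m", "org.freedesktop.DBus.Properties.Get",
        "org.freedesktop.systemd1.Unit", "ActiveState",
    ]


def parse_active_state(output):
    """Extract the state from gdbus output such as (<'active'>,)."""
    match = re.search(r"<'([^']*)'>", output)
    return match.group(1) if match else None


def query_active_state(unit):
    """Run gdbus and return the ActiveState string (needs a systemd host)."""
    result = subprocess.run(
        active_state_command(unit),
        capture_output=True, text=True, check=True,
    )
    return parse_active_state(result.stdout)
```

On a machine running systemd, calling query_active_state("httpd.service") right after the start operation and again around the time of the first recurring monitor would show whether the unit is really dropping out of the active state or whether the monitor is misreading it.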