Hi Dejan Just a quick question. I cannot see your new log messages being logged to syslog
ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ... Do you know where I can set my logging to see warn level messages? I expected to see them in my testing by default but that does not seem to be true. Thanks Tom On 10/08/2013 05:04 PM, Dejan Muhamedagic wrote: > Hi, > > On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote: >> Hi! >> >> I thought, I'll never be bitten by this bug, but I actually was! Now I'm >> wondering whether the Xen RA sees the guest if you use pygrub, and pygrub is >> still counting down for actual boot... >> >> But the reason why I'm writing is that I think I've discovered another bug in >> the RA: >> >> CRM decided to "recover" the guest VM "v02": >> [...] >> lrmd: [14903]: info: operation monitor[28] on prm_xen_v02 for client 14906: >> pid 19516 exited with return code 7 >> [...] >> pengine: [14905]: notice: LogActions: Recover prm_xen_v02 (Started h05) >> [...] >> crmd: [14906]: info: te_rsc_command: Initiating action 5: stop >> prm_xen_v02_stop_0 on h05 (local) >> [...] >> Xen(prm_xen_v02)[19552]: INFO: Xen domain v02 already stopped. >> [...] >> lrmd: [14903]: info: operation stop[31] on prm_xen_v02 for client 14906: pid >> 19552 exited with return code 0 >> [...] >> crmd: [14906]: info: te_rsc_command: Initiating action 78: start >> prm_xen_v02_start_0 on h05 (local) >> lrmd: [14903]: info: rsc:prm_xen_v02 start[32] (pid 19686) >> [...] >> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stderr) Error: Domain >> 'v02' >> already exists with ID '3' >> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stdout) Using config file >> "/etc/xen/vm/v02". >> [...] >> lrmd: [14903]: info: operation start[32] on prm_xen_v02 for client 14906: pid >> 19686 exited with return code 1 >> [...] >> crmd: [14906]: info: process_lrm_event: LRM operation prm_xen_v02_start_0 >> (call=32, rc=1, cib-update=5271, confirmed=true) unknown error >> crmd: [14906]: WARN: status_from_rc: Action 78 (prm_xen_v02_start_0) on h05 >> failed (target: 0 vs. rc: 1): Error >> [...] >> >> As you can clearly see "start" failed, because the guest was found up >> already! >> IMHO this is a bug in the RA (SLES11 SP2: resource-agents-3.9.4-0.26.84). > Yes, I've seen that. It's basically the same issue, i.e. the > domain being gone for a while and then reappearing. > >> I guess the following test is problematic: >> --- >> xm create ${OCF_RESKEY_xmfile} name=$DOMAIN_NAME >> rc=$? >> if [ $rc -ne 0 ]; then >> return $OCF_ERR_GENERIC >> --- >> Here "xm create" probably fails if the guest is already created... > It should fail too. Note that this is a race, but the race is > anyway caused by the strange behaviour of xen. With the recent > fix (or workaround) in the RA, this shouldn't be happening. > > Thanks, > > Dejan > >> Regards, >> Ulrich >> >> >>>>> Dejan Muhamedagic <deja...@fastmail.fm> schrieb am 01.10.2013 um 12:24 in >> Nachricht <20131001102430.GA4687@walrus.homenet>: >>> Hi, >>> >>> On Tue, Oct 01, 2013 at 12:13:02PM +0200, Lars Marowsky-Bree wrote: >>>> On 2013-10-01T00:53:15, Tom Parker <tpar...@cbnco.com> wrote: >>>> >>>>> Thanks for paying attention to this issue (not really a bug) as I am >>>>> sure I am not the only one with this issue. For now I have set all my >>>>> VMs to destroy so that the cluster is the only thing managing them but >>>>> this is not super clean as I get failures in my logs that are not really >>>>> failures. >>>> It is very much a severe bug. >>>> >>>> The Xen RA has gained a workaround for this now, but we're also pushing >>> Take a look here: >>> >>> https://github.com/ClusterLabs/resource-agents/pull/314 >>> >>> Thanks, >>> >>> Dejan >>> >>>> the Xen team (where the real problem is) to investigate and fix. >>>> >>>> >>>> Regards, >>>> Lars >>>> >>>> -- >>>> Architect Storage/HA >>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, >>> HRB 21284 (AG Nürnberg) >>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde >>>> >>>> _______________________________________________ >>>> Linux-HA mailing list >>>> Linux-HA@lists.linux-ha.org >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>>> See also: http://linux-ha.org/ReportingProblems >>> _______________________________________________ >>> Linux-HA mailing list >>> Linux-HA@lists.linux-ha.org >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>> See also: http://linux-ha.org/ReportingProblems >> >> _______________________________________________ >> Linux-HA mailing list >> Linux-HA@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems > _______________________________________________ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems