Hi Dejan,

Just a quick question: I cannot see your new log messages being logged
to syslog:

ocf_log warn "domain $1 reported as not running, but it is expected to
be running! Retrying for $cnt seconds ..."

Do you know where I can set my logging to see warn-level messages?  I
expected to see them by default in my testing, but that does not seem
to be the case.
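
As a quick sanity check I used something along these lines (assuming the
RA logs through syslog's daemon facility, which I believe is the default
here; adjust the facility and log locations to your setup):

---
# Emit a warning-level test message the way ocf_log warn would, then
# see where the syslog daemon routes it.
logger -p daemon.warning -t Xen "test: warn-level message visibility"
grep -rs "warn-level message visibility" /var/log/
---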

Thanks

Tom


On 10/08/2013 05:04 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote:
>> Hi!
>>
>> I thought I'd never be bitten by this bug, but I actually was! Now I'm
>> wondering whether the Xen RA sees the guest if you use pygrub and pygrub is
>> still counting down before the actual boot...
>>
>> But the reason why I'm writing is that I think I've discovered another bug in
>> the RA:
>>
>> CRM decided to "recover" the guest VM "v02":
>> [...]
>> lrmd: [14903]: info: operation monitor[28] on prm_xen_v02 for client 14906:
>> pid 19516 exited with return code 7
>> [...]
>>  pengine: [14905]: notice: LogActions: Recover prm_xen_v02 (Started h05)
>> [...]
>>  crmd: [14906]: info: te_rsc_command: Initiating action 5: stop
>> prm_xen_v02_stop_0 on h05 (local)
>> [...]
>> Xen(prm_xen_v02)[19552]: INFO: Xen domain v02 already stopped.
>> [...]
>> lrmd: [14903]: info: operation stop[31] on prm_xen_v02 for client 14906: pid
>> 19552 exited with return code 0
>> [...]
>> crmd: [14906]: info: te_rsc_command: Initiating action 78: start
>> prm_xen_v02_start_0 on h05 (local)
>> lrmd: [14903]: info: rsc:prm_xen_v02 start[32] (pid 19686)
>> [...]
>> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stderr) Error: Domain
>> 'v02' already exists with ID '3'
>> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stdout) Using config file
>> "/etc/xen/vm/v02".
>> [...]
>> lrmd: [14903]: info: operation start[32] on prm_xen_v02 for client 14906: pid
>> 19686 exited with return code 1
>> [...]
>> crmd: [14906]: info: process_lrm_event: LRM operation prm_xen_v02_start_0
>> (call=32, rc=1, cib-update=5271, confirmed=true) unknown error
>> crmd: [14906]: WARN: status_from_rc: Action 78 (prm_xen_v02_start_0) on h05
>> failed (target: 0 vs. rc: 1): Error
>> [...]
>>
>> As you can clearly see, "start" failed because the guest was found to be
>> up already!
>> IMHO this is a bug in the RA (SLES11 SP2: resource-agents-3.9.4-0.26.84).
> Yes, I've seen that. It's basically the same issue, i.e. the
> domain being gone for a while and then reappearing.
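
The shape of that workaround, as I understand it from the new warn
message, is roughly the following. This is only a sketch, not the actual
RA code (see the pull request linked further down in the thread):

---
# Re-check a domain that is expected to be running before declaring it
# stopped, since xm can transiently report it as gone.
# The 10-second retry window here is a guess, not the RA's actual value.
cnt=0
while [ $cnt -lt 10 ]; do
    if xm list "$DOMAIN_NAME" >/dev/null 2>&1; then
        return $OCF_SUCCESS
    fi
    ocf_log warn "domain $DOMAIN_NAME reported as not running, but it is expected to be running! Retrying for $cnt seconds ..."
    sleep 1
    cnt=$((cnt + 1))
done
return $OCF_NOT_RUNNING
---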
>
>> I guess the following test is problematic:
>> ---
>>   xm create ${OCF_RESKEY_xmfile} name=$DOMAIN_NAME
>>   rc=$?
>>   if [ $rc -ne 0 ]; then
>>     return $OCF_ERR_GENERIC
>> ---
>> Here "xm create" probably fails if the guest is already created...
> It should fail there too. Note that this is a race, but the race is
> in any case caused by the strange behaviour of xen. With the recent
> fix (or workaround) in the RA, this shouldn't be happening.
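
A more defensive start sequence, sketched here only to illustrate the
race (again, not the actual fix that went into the RA), would treat an
already-running domain as success instead of handing the name to xm
create a second time:

---
# If the domain reappeared between the monitor and the start, report
# success rather than attempting a second create.
if xm list "$DOMAIN_NAME" >/dev/null 2>&1; then
    ocf_log info "Xen domain $DOMAIN_NAME already running."
    return $OCF_SUCCESS
fi
xm create ${OCF_RESKEY_xmfile} name=$DOMAIN_NAME
rc=$?
if [ $rc -ne 0 ]; then
    return $OCF_ERR_GENERIC
fi
---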
>
> Thanks,
>
> Dejan
>
>> Regards,
>> Ulrich
>>
>>
>>>>> Dejan Muhamedagic <deja...@fastmail.fm> wrote on 01.10.2013 at 12:24 in
>> message <20131001102430.GA4687@walrus.homenet>:
>>> Hi,
>>>
>>> On Tue, Oct 01, 2013 at 12:13:02PM +0200, Lars Marowsky-Bree wrote:
>>>> On 2013-10-01T00:53:15, Tom Parker <tpar...@cbnco.com> wrote:
>>>>
>>>>> Thanks for paying attention to this issue (not really a bug), as I am
>>>>> sure I am not the only one affected by it.  For now I have set all my
>>>>> VMs to destroy, so that the cluster is the only thing managing them, but
>>>>> this is not super clean, as I get failures in my logs that are not really
>>>>> failures.
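
"Set to destroy" refers to the guest lifecycle options in the domain
configuration. Assuming the classic xm config file format, the relevant
settings look roughly like this; with all three set, the hypervisor never
restarts the guest on its own, leaving restarts entirely to the cluster:

---
on_poweroff = 'destroy'
on_reboot   = 'destroy'
on_crash    = 'destroy'
---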
>>>> It is very much a severe bug.
>>>>
>>>> The Xen RA has gained a workaround for this now, but we're also pushing
>>> Take a look here:
>>>
>>> https://github.com/ClusterLabs/resource-agents/pull/314 
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>>> the Xen team (where the real problem is) to investigate and fix.
>>>>
>>>>
>>>> Regards,
>>>>     Lars
>>>>
>>>> -- 
>>>> Architect Storage/HA
>>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
>>>> HRB 21284 (AG Nürnberg)
>>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>>>

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
