On 08/07/2012 08:18 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Jul 30, 2012 at 10:14:27AM -0600, Alan Robertson wrote:
>> The LRM treats operation timeouts as ERROR:s - not just failed
>> operations that give warnings.  This violates the meaning of ERROR:
>> messages in the code.
>>
>> We reserved ERROR: messages for things that the software did not expect
>> - and therefore possibly could not be properly recovered from.  In this
>> case, the behavior is perfectly expected and the condition will be
>> properly recovered from.  It just means the operation in question failed.
>>
>> A sample message:
>>       ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 (47) Timed Out (timeout=60000ms)
>>
>> Because of this one message, you can't tell customers "If you ever have
>> an ERROR: message, the HA software has failed".
>>
>> This ought to just be a warning, like any other failed action...
> I guess that ERROR is used because resource agents use the same
> severity when reporting failures they cannot recover from. In
> this case, the RA won't log anything, so the lrmd does that on
> its behalf. That seems OK to me. The other option would be to
> remove the ERROR severity log messages in all RAs, because a
> resource problem should normally always be recoverable.
ERROR: messages should be reserved for things like "The CRM gave me a
command I didn't understand, or referenced a resource that I don't know
about" -- and similar things that really shouldn't happen.

Or that's how it seems to me anyway...
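
To make the proposed policy concrete, here is a minimal sketch of it. It is
written in Python purely for illustration and is not the lrmd code; the names
OpOutcome and severity_for, and the set of outcomes, are hypothetical and only
mirror the cases mentioned in this thread.

```python
import logging
from enum import Enum


class OpOutcome(Enum):
    """Hypothetical outcomes of an LRM operation (illustrative names only)."""
    OK = "ok"
    FAILED = "failed"                      # the operation ran and reported failure
    TIMED_OUT = "timed out"                # the operation exceeded its timeout
    UNKNOWN_COMMAND = "unknown command"    # the CRM sent a command we don't understand
    UNKNOWN_RESOURCE = "unknown resource"  # the CRM referenced a resource we don't know


# Proposed policy: expected, recoverable conditions are warnings;
# ERROR is kept for things the software did not expect.
_SEVERITY = {
    OpOutcome.OK: logging.INFO,
    OpOutcome.FAILED: logging.WARNING,
    OpOutcome.TIMED_OUT: logging.WARNING,
    OpOutcome.UNKNOWN_COMMAND: logging.ERROR,
    OpOutcome.UNKNOWN_RESOURCE: logging.ERROR,
}


def severity_for(outcome: OpOutcome) -> int:
    """Return the logging level this sketch assigns to an operation outcome."""
    return _SEVERITY[outcome]


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO)
    log = logging.getLogger("lrm-sketch")
    # Under this policy a monitor timeout is logged as a WARNING, not an ERROR.
    log.log(severity_for(OpOutcome.TIMED_OUT),
            "LRM operation %s (%d) Timed Out (timeout=%dms)",
            "agent-da:3_monitor_5000", 47, 60000)
```

With a mapping like this, a timeout is treated the same as any other failed
action, and an ERROR: in the logs would mean something really went wrong in
the HA software itself.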


-- 
     Alan Robertson <al...@unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
