Hi Alan,

On Mon, Jul 30, 2012 at 10:14:27AM -0600, Alan Robertson wrote:
> The LRM treats operation timeouts as ERROR:s - not just failed 
> operations that give warnings.  This violates the meaning of ERROR: 
> messages in the code.
> 
> We reserved ERROR: messages for things that the software did not expect 
> - and therefore possibly could not be properly recovered from.  In this 
> case, the behavior is perfectly expected and the condition will be 
> properly recovered from.  It just means the operation in question failed.
> 
> A sample message:
>      ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 (47) Timed Out (timeout=60000ms)
> 
> Because of this one message, you can't tell customers "If you ever have 
> an ERROR: message, the HA software has failed".
> 
> This ought to just be a warning, like any other failed action...

I guess ERROR is used because resource agents log at the same
severity when they report failures they cannot recover from. In
this case the RA won't log anything itself, so the lrmd does it
on the RA's behalf. That seems OK to me. The other option would
be to remove the ERROR severity log messages from all RAs, since
a resource problem should normally be recoverable.
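
To make the two options concrete, here is a rough sketch of the
severity choice. It is illustrative Python, not the actual lrmd code:
OpStatus, severity_for, format_op_event and the timeout_is_error
switch are made-up names, and the "WARN"/"info" prefixes are
assumptions. Only the message layout is taken from the sample above.

```python
from enum import Enum


class OpStatus(Enum):
    """Hypothetical operation outcomes; not the real LRM status codes."""
    DONE = "Done"
    FAILED = "Failed"
    TIMED_OUT = "Timed Out"


def severity_for(status, timeout_is_error=True):
    """Pick a log severity for a finished operation.

    timeout_is_error=True mirrors the behaviour discussed in this thread
    (timeouts logged as ERROR). False is the proposed alternative, where
    a timeout is treated like any other failed action.
    """
    if status is OpStatus.DONE:
        return "info"
    if status is OpStatus.TIMED_OUT and timeout_is_error:
        return "ERROR"
    return "WARN"


def format_op_event(op_id, call_id, status, timeout_ms, timeout_is_error=True):
    """Build a log line laid out like the sample message in this thread."""
    level = severity_for(status, timeout_is_error)
    return (f"{level}: process_lrm_event: LRM operation {op_id} "
            f"({call_id}) {status.value} (timeout={timeout_ms}ms)")


if __name__ == "__main__":
    # Current behaviour, then the proposed one.
    print(format_op_event("agent-da:3_monitor_5000", 47,
                          OpStatus.TIMED_OUT, 60000))
    print(format_op_event("agent-da:3_monitor_5000", 47,
                          OpStatus.TIMED_OUT, 60000, timeout_is_error=False))
```

With the switch on, a timeout is the only operation failure that
reaches ERROR. With it off, ERROR stays reserved for conditions the
software did not expect.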

Cheers,

Dejan

> -- 
>      Alan Robertson <al...@unix.sh> - @OSSAlanR
> 
> "Openness is the foundation and preservative of friendship...  Let me claim 
> from you at all times your undisguised opinions." - William Wilberforce
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
