The LRM treats operation timeouts as ERROR:s, rather than as ordinary 
failed operations, which only produce warnings.  This violates the 
meaning of ERROR: messages in the code.

We reserved ERROR: messages for things that the software did not expect 
- and therefore possibly could not be properly recovered from.  In this 
case, the behavior is perfectly expected and the condition will be 
properly recovered from.  It just means the operation in question failed.

A sample message:
     ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 (47) Timed Out (timeout=60000ms)

Because of this one message, you can't tell customers "If you ever have 
an ERROR: message, the HA software has failed".

This ought to just be a warning, like any other failed action...
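
To make the proposed policy concrete, here is a minimal sketch in Python 
(not the actual LRM/crmd code, which is C).  The names OpStatus, 
severity_for and format_event are hypothetical, and the "WARNING:" prefix 
comes from Python's logging module rather than from the HA logging code.  
The idea is simply that timeouts land in the same bucket as any other 
failed operation, and ERROR is kept for conditions the software did not 
expect.

```python
import logging
from enum import Enum


class OpStatus(Enum):
    """Hypothetical operation outcomes (illustrative, not the LRM's own)."""
    DONE = "Done"
    FAILED = "Failed"        # the operation ran and reported failure
    TIMED_OUT = "Timed Out"  # the operation did not finish in time
    INTERNAL = "Internal"    # the software hit a state it did not expect


def severity_for(status: OpStatus) -> int:
    """Map an operation outcome to a log level under the proposed policy."""
    if status is OpStatus.DONE:
        return logging.INFO
    if status in (OpStatus.FAILED, OpStatus.TIMED_OUT):
        # Expected and recoverable: the operation failed, the software did not.
        return logging.WARNING
    # Anything the software did not anticipate stays an ERROR.
    return logging.ERROR


def format_event(op: str, call_id: int, status: OpStatus, timeout_ms: int) -> str:
    """Build a log line shaped like the sample message above."""
    level = logging.getLevelName(severity_for(status))
    return (f"{level}: process_lrm_event: LRM operation {op} "
            f"({call_id}) {status.value} (timeout={timeout_ms}ms)")


if __name__ == "__main__":
    print(format_event("agent-da:3_monitor_5000", 47, OpStatus.TIMED_OUT, 60000))
```

With a mapping like this, an ERROR: line would again mean "the HA 
software itself ran into something unexpected".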

-- 
     Alan Robertson <al...@unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
