On 08/07/2012 08:18 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Jul 30, 2012 at 10:14:27AM -0600, Alan Robertson wrote:
>> The LRM treats operation timeouts as ERROR:s - not just failed
>> operations that give warnings. This violates the meaning of ERROR:
>> messages in the code.
>>
>> We reserved ERROR: messages for things that the software did not expect
>> - and therefore possibly could not be properly recovered from. In this
>> case, the behavior is perfectly expected and the condition will be
>> properly recovered from. It just means the operation in question failed.
>>
>> A sample message:
>> ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000
>> (47) Timed Out (timeout=60000ms)
>>
>> Because of this one message, you can't tell customers "If you ever have
>> an ERROR: message, the HA software has failed".
>>
>> This ought to just be a warning, like any other failed action...
> I guess that ERROR is used because resource agents use the same
> severity when reporting failures they cannot recover from. In
> this case, the RA won't log anything, so the lrmd does that on
> its behalf. That seems OK to me. The other option would be to
> remove the ERROR severity log messages in all RA, because a
> resource problem should normally always be recoverable.

The exceptions that print ERROR: should be relegated to things like
"The CRM gave me a command I didn't understand, or referenced a
resource that I don't know about" -- and similar things that really
shouldn't happen.
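
A rough sketch of the policy I have in mind, in Python for brevity. The
outcome names and helper functions here are hypothetical illustrations,
not the actual lrmd status codes or logging API: expected operation
failures (including timeouts) map to WARNING, and ERROR is kept for
conditions the software did not anticipate.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "WARNING"
    ERROR = "ERROR"


# Hypothetical outcome names, for illustration only; these are not the
# real LRM operation status codes.
# Outcomes the software expects and recovers from as a matter of course.
EXPECTED_FAILURES = {"timed_out", "failed", "cancelled"}


def severity_for(outcome: str) -> Severity:
    """Pick a log severity for an operation outcome.

    Expected, recoverable failures get WARNING. Anything else (for
    example an unknown command or an unknown resource) is treated as
    unexpected and gets ERROR.
    """
    if outcome in EXPECTED_FAILURES:
        return Severity.WARNING
    return Severity.ERROR


def format_event(outcome: str, detail: str) -> str:
    """Build a log line with the severity prefix chosen by severity_for."""
    return f"{severity_for(outcome).value}: {detail}"


if __name__ == "__main__":
    print(format_event(
        "timed_out",
        "process_lrm_event: LRM operation agent-da:3_monitor_5000 "
        "(47) Timed Out (timeout=60000ms)",
    ))
    print(format_event("unknown_resource", "CRM referenced an unknown resource"))
```

With a mapping like this, the timeout from the sample message would be
logged with a WARNING: prefix, and only the "shouldn't happen" cases
would still produce ERROR:.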
Or that's how it seems to me anyway...

-- 
Alan Robertson <al...@unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions."
    - William Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/