Hi Lon,

Thank you for your reply.

What I gathered from your response is that I should remove manual
fencing right away. This will cause the fence daemon to retry
fence_bladecenter until the node is fenced. Most likely fenced will then
succeed in fencing the failed node (provided the IP address, user name
and password for the BladeCenter management module are right), even if
it times out the first time. Am I right?

I will try removing manual fencing and see how things go.

>> If fencing is failing (permanently), you can still run:
>>
>>    fence_ack_manual -e -n <nodename>

By the way, as per my understanding, fence_ack_manual -n <node name> can
be executed to acknowledge only a manually fenced node (and not a
bladecenter-fenced node). Correct me if this understanding is wrong.

So, God forbid, if fence_bladecenter fails for some reason, we still
have the option to run fence_manual and then fence_ack_manual, so that
the cluster is back to working.

Thanks again, and have a great weekend ahead.

Yours truly,
Parvez

On Fri, Mar 4, 2011 at 10:45 PM, Lon Hohberger <[email protected]> wrote:

> On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> > Hi Ryan,
> >
> > Thank you for response. Does it mean there is no way to intimate
> > administrator about failure of fencing as of now?
> >
> > Let me give more information about my cluster -
> >
> > I have set of nodes in cluster with only IP resource being
> > protected. I have two levels of fencing, first bladecenter fencing
> > and second one is manual fencing.
>
> If the problem you have with fence_bladecenter is intermittent - for
> example, if it fails 1/2 the time, fence_manual is going to *detract*
> from your cluster's ability to recover automatically.
>
> Ordinarily, if a fencing action fails, fenced will automatically retry
> the operation.
>
> When you configure fence_manual as a backup, this retry will *never*
> occur, meaning your cluster hangs.
>
> > At times if machine is already down (either power failure or turned
> > off abruptly); blade center fencing times out and manual fencing
> > happens. At this time, administrator is expected to run
> > fence_ack_manual.
> >
> > Clearly this is not something which is desirable, as downtime of
> > services is as long as administrator runs fence_ack_manual.
> >
> > What is recommended method to deal with blade center fencing failure
> > in this situation? Do I have to add another level of fencing
> > (between blade center and manual) which can fence automatically (not
> > requiring manual interference)?
>
> Start with removing fence_manual.
>
> If fencing is failing (permanently), you can still run:
>
>    fence_ack_manual -e -n <nodename>
>
> > > > my bladecenter fencing agent, I sometimes get message saying
> > > > bladecenter fencing failed because of timeout or fence device IP
> > > > address/user credentials are incorrect.
>
> ^^ This is why I think fence_manual is, in your specific case, very
> likely hurting your availability.
>
> --
> Lon Hohberger - Red Hat, Inc.
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
