[Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

Phil Frost Mon, 16 Jul 2012 09:14:31 -0700

I'm designing a cluster to run both iSCSI targets and initiators to ultimately provide block devices to virtual machines. I'm considering the case of a target failure, and how to handle that as gracefully as possible. Ideally, IO may be paused until the target recovers, but VMs do not restart or see IO errors.

I've observed that the iscsi RA will configure the initiator to retry connections indefinitely if the target should fail. This is mostly good, except that if the initiator is in the retrying state, the monitor action will return an error.

The Right Thing to do in this case, I would think, would be to just wait. Of course the initiators can't work if the target is down, but the initiators will recover automatically when the target recovers. Ideally the cluster would wait for the target (which it also manages) to recover, then try again to monitor the initiators. For good measure, it might try monitoring the initiators a couple of times, since it can take them a moment to reconnect.

Unfortunately, what actually happens is the monitor action on the initiator fails. Pacemaker then attempts to stop the initiator, and that also fails, because the target is still unavailable. Then the initiator node gets STONITHed, taking out all the hosted VMs with it.

I added a mandatory, asymmetrical order constraint of target -> initiator, so at least Pacemaker will not attempt to restart the initiator after a target failure. I made it asymmetrical so that restarts of the target do not force restarts of the initiator. However, it doesn't do much to help the failed-target case.
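
For reference, a constraint of this kind might look roughly like the following in crm shell syntax. This is only a sketch: the resource names p_target and p_initiator and the constraint ID are placeholders, not the names from the actual configuration.

```
# Placeholder names: p_target, p_initiator, o_target_before_initiator.
# "inf:" makes the ordering mandatory; symmetrical=false drops the
# implied reverse (stop) ordering between the two resources.
order o_target_before_initiator inf: p_target p_initiator symmetrical=false
```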

What's a good solution? Is there some way to suspend monitoring of the initiators if Pacemaker knows the target has failed? I suppose I could modify the iscsi RA to return success for monitor in the case that the initiator is attempting to reconnect to the target, but then what if the initiator has actually failed, and the target is operational? And what about race conditions that might exist in cases where the target has failed, but Pacemaker has not yet detected the target failure through a monitor operation?
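
To make the RA modification idea concrete, here is a minimal sketch of the decision a patched monitor action might make. It is not code from the existing iscsi RA. The state labels (logged_in, retrying, absent) and the target_reachable argument are hypothetical inputs that the agent would have to derive somehow; only the numeric OCF return codes are standard.

```shell
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# classify_monitor SESSION_STATE TARGET_REACHABLE
#   SESSION_STATE    hypothetical label: logged_in | retrying | absent
#   TARGET_REACHABLE hypothetical flag:  yes | no
# Returns the code a monitor action would exit with.
classify_monitor() {
    session_state=$1
    target_reachable=$2

    case "$session_state" in
        logged_in)
            return $OCF_SUCCESS
            ;;
        retrying)
            # Target is down: the initiator is doing the right thing
            # by retrying, so report success and let it wait.
            if [ "$target_reachable" = "no" ]; then
                return $OCF_SUCCESS
            fi
            # Target looks fine but the session is not up: treat this
            # as a genuine initiator failure.
            return $OCF_ERR_GENERIC
            ;;
        absent)
            return $OCF_NOT_RUNNING
            ;;
        *)
            return $OCF_ERR_GENERIC
            ;;
    esac
}
```

This does not remove the race: if the reachability input is stale, a retrying initiator is still reported as failed during the window before the target failure is noticed, and an initiator that has only just begun reconnecting after the target came back would be flagged too.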



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
