On Thu, Sep 25, 2008 at 04:36:05PM +0900, Satomi TANIGUCHI wrote:
> Hi,
>
>
> Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Wed, Sep 24, 2008 at 05:30:35PM +0900, Satomi TANIGUCHI wrote:
>>
>>>>>> - fencing operation timeouts per stonith resource (stonithd)
>>>>> ack
>>>> http://hg.clusterlabs.org/pacemaker/dev/rev/0f17d8472570
>>>> http://hg.clusterlabs.org/pacemaker/dev/rev/785fb0d9d821
>>>>
>>>> The timeouts are taken from the "start" operation. Even though it
>>>> may not be obvious that this timeout is used for the fencing
>>>> operations as well, I think that it still makes more sense than
>>>> making an extra instance attribute. Any objections?
>>> Maybe, users are at a loss what to do when they want to set fence op's
>>> timeout, I think.
>>> Adding "stonith-timeout" in <instance_attributes> seems to be a better
>>> way...
>>
>> It would be very easy to implement that. But I'm still not sure
>> if that's really a better way. Currently, the timeout is picked
>> from the start operation and, if that's not set, from the fencing
>> request which comes either from the crmd or stonithd itself.
>> Well, if you insist, we could have the instance attribute
>> override all other timeouts.
> I still consider that to add an attribute in <instance_attributes> is
> a better one.
> Start and reset are different operations.
> Start op is to check whether stonith device's setting is enable or not,
> but reset op needs to wait for the target node to die.
> It's curious that both ops' timeout values are the same.

The start operation implies a monitor (or status) operation. This
one is supposed to access the device and verify that it is
operational. With most devices this takes as much time as a power
management command. So, knowing the way this works, I didn't find
it curious.

Anyway, let's do it the way you suggest, i.e. assign an attribute
which will hold an explicit timeout for fencing operations. I would
name it 'fence-timeout' -- the "stonith" term is overloaded,
meaning both the stonith resource and the fencing method.

> If the attribute is not set, to use cluster_delay is a natural way, I think.
> (Or use default-action-timeout? Which is natural?)

We should keep the existing cluster_delay (which is halved for
this purpose) as the fallback, to avoid breaking existing setups.
Users should still be advised to update their configurations with
the new timeout setting.

Thanks,

Dejan

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
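
For illustration only, here is a minimal sketch of the timeout lookup
order as I read it from this thread. It is not the stonithd code. The
function and parameter names are hypothetical, and the exact position
of the "start" operation timeout once 'fence-timeout' exists is an
assumption; the thread only says that the explicit attribute overrides
everything else and that cluster_delay (halved) stays as the fallback.

```python
from typing import Optional


def resolve_fence_timeout(
    fence_timeout: Optional[int],    # proposed 'fence-timeout' instance attribute (ms)
    start_timeout: Optional[int],    # timeout of the resource's "start" operation (ms)
    request_timeout: Optional[int],  # timeout carried by the fencing request (ms)
    cluster_delay: int,              # cluster_delay setting (ms)
) -> int:
    """Return the timeout (ms) to use for a fencing operation.

    Assumed precedence (hypothetical, see the lead-in):
      1. explicit fence-timeout attribute
      2. timeout of the "start" operation
      3. timeout from the fencing request (crmd or stonithd)
      4. half of cluster_delay
    """
    for candidate in (fence_timeout, start_timeout, request_timeout):
        if candidate is not None:
            return candidate
    # Nothing was configured explicitly: fall back to cluster_delay, halved.
    return cluster_delay // 2


if __name__ == "__main__":
    # The explicit attribute wins over every other source.
    print(resolve_fence_timeout(60000, 20000, 30000, 60000))
    # With nothing set, half of cluster_delay is used.
    print(resolve_fence_timeout(None, None, None, 60000))
```

The point of the sketch is that a setup which never sets the new
attribute keeps resolving to the same value as before, so existing
configurations are not affected.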
