Dejan Muhamedagic wrote:
On Thu, Sep 25, 2008 at 04:36:05PM +0900, Satomi TANIGUCHI wrote:
Hi,
Dejan Muhamedagic wrote:
Hi,
On Wed, Sep 24, 2008 at 05:30:35PM +0900, Satomi TANIGUCHI wrote:
- fencing operation timeouts per stonith resource (stonithd)
ack
http://hg.clusterlabs.org/pacemaker/dev/rev/0f17d8472570
http://hg.clusterlabs.org/pacemaker/dev/rev/785fb0d9d821
The timeouts are taken from the "start" operation. Even though it
may not be obvious that this timeout is used for the fencing
operations as well, I think that it still makes more sense than
making an extra instance attribute. Any objections?
I think users may be at a loss as to what to do when they want to set the
fence op's timeout.
Adding "stonith-timeout" in <instance_attributes> seems to be a better way...
It would be very easy to implement that. But I'm still not sure
if that's really a better way. Currently, the timeout is picked
from the start operation and, if that's not set, from the fencing
request which comes either from the crmd or stonithd itself.
Well, if you insist, we could have the instance attribute
override all other timeouts.
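
The lookup order described above could be sketched roughly as follows. This is only an illustration in Python, not the actual stonithd code: the function name, the parameter names, and the assumption that all timeouts are integers in the same unit are hypothetical.

```python
from typing import Optional


def pick_fencing_timeout(
    start_op_timeout: Optional[int],
    request_timeout: int,
    attr_timeout: Optional[int] = None,
) -> int:
    """Return the timeout to use for a fencing operation.

    Hypothetical helper: it only mirrors the order discussed in this
    thread and assumes all values share one unit.
    """
    # Proposed addition: an explicit instance attribute, if present,
    # overrides all other timeouts.
    if attr_timeout is not None:
        return attr_timeout
    # Current behaviour: use the timeout of the "start" operation if set.
    if start_op_timeout is not None:
        return start_op_timeout
    # Otherwise use the timeout carried by the fencing request, which
    # comes either from the crmd or from stonithd itself.
    return request_timeout
```

With no instance attribute given, the function reduces to the current behaviour: the start operation's timeout if set, else the timeout of the fencing request.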
I still think that adding an attribute in <instance_attributes> is
the better approach.
Start and reset are different operations.
The start op checks whether the stonith device's settings are usable,
but reset op needs to wait for the target node to die.
It's curious that both ops' timeout values are the same.
The start operation implies a monitor (or status) operation. This
one is supposed to access the device and verify that it is
operational. With most devices this takes as much time as a power
management command. So, knowing the way this works, I didn't find
it curious.
Anyway, let's do it the way you suggest, i.e. assign an
attribute which will hold an explicit timeout for fencing
operations. I would name it 'fence-timeout' -- the "stonith" term
is overloaded meaning both the stonith resource and the fencing
method.
Thanks a lot for your magnanimity.
I'll test these changes.
Thanks again!
Best Regards,
Satomi TANIGUCHI
If the attribute is not set, using cluster_delay is the natural choice, I think.
(Or should default-action-timeout be used? Which is more natural?)
We should keep the existing cluster_delay (which is halved for
this purpose) to avoid breaking the existing setups. Though users
should be advised to update their configurations with the new
timeout setting.
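
A minimal sketch of that fallback, in Python, might look like the following. It is not taken from the pacemaker sources: the function name, the use of a plain dict for the instance attributes, integer values, and rounding down when halving are all assumptions made for illustration. Only the 'fence-timeout' attribute name and the halved cluster_delay come from this thread.

```python
from typing import Mapping


def resolve_fence_timeout(instance_attrs: Mapping[str, str],
                          cluster_delay: int) -> int:
    """Hypothetical helper: pick the timeout for a fencing operation."""
    value = instance_attrs.get("fence-timeout")
    if value is not None:
        # An explicit 'fence-timeout' attribute wins.
        return int(value)
    # Attribute not set: keep the existing behaviour, half of
    # cluster_delay (integer division is an assumption).
    return cluster_delay // 2
```

An existing setup that does not define 'fence-timeout' would keep getting half of cluster_delay, so it would not break.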
Thanks,
Dejan
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/