On Thu, Sep 25, 2008 at 04:36:05PM +0900, Satomi TANIGUCHI wrote:
> Hi,
>
>
> Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Wed, Sep 24, 2008 at 05:30:35PM +0900, Satomi TANIGUCHI wrote:
>>
>>>>>> - fencing operation timeouts per stonith resource (stonithd)
>>>>> ack
>>>> http://hg.clusterlabs.org/pacemaker/dev/rev/0f17d8472570
>>>> http://hg.clusterlabs.org/pacemaker/dev/rev/785fb0d9d821
>>>>
>>>> The timeouts are taken from the "start" operation. Even though it
>>>> may not be obvious that this timeout is used for the fencing
>>>> operations as well, I think that it still makes more sense than
>>>> making an extra instance attribute. Any objections?
>>> I think users may be at a loss as to what to do when they want to set
>>> the fence op's timeout.
>>> Adding "stonith-timeout" to <instance_attributes> seems to be a better
>>> way...
>>
>> It would be very easy to implement that. But I'm still not sure
>> if that's really a better way. Currently, the timeout is picked
>> from the start operation and, if that's not set, from the fencing
>> request which comes either from the crmd or stonithd itself.
>> Well, if you insist, we could have the instance attribute
>> override all other timeouts.
> I still think that adding an attribute to <instance_attributes> is
> the better approach.
> Start and reset are different operations.
> The start op checks whether the stonith device's setting is enabled or not,
> but the reset op needs to wait for the target node to die.
> It's curious that both ops' timeout values are the same.

The start operation implies a monitor (or status) operation. This
one is supposed to access the device and verify that it is
operational. With most devices this takes as much time as a power
management command. So, knowing the way this works, I didn't find
it curious.

Anyway, let's do it the way you suggest, i.e. add an instance
attribute which holds an explicit timeout for fencing
operations. I would name it 'fence-timeout' -- the term "stonith"
is overloaded, meaning both the stonith resource and the fencing
method.

> If the attribute is not set, using cluster_delay seems natural, I think.
> (Or use default-action-timeout? Which is more natural?)

We should keep the existing cluster_delay (which is halved for
this purpose) as the fallback, to avoid breaking existing setups.
Users should nevertheless be advised to update their
configurations with the new timeout setting.
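
Roughly, the lookup order would be something like the sketch below.
This is Python purely for illustration, not the stonithd code: the
function and parameter names are made up, the values are assumed to
be whole seconds, and I've assumed the "start" operation timeout
stays as a middle fallback between the new attribute and
cluster_delay.

```python
from typing import Optional


def resolve_fence_timeout(
    fence_timeout: Optional[int],
    start_op_timeout: Optional[int],
    cluster_delay: int,
) -> int:
    """Pick the timeout (in seconds) for a fencing operation.

    All names here are hypothetical; only the precedence is the point.
    """
    # 1. An explicit 'fence-timeout' instance attribute overrides
    #    all other timeouts.
    if fence_timeout is not None:
        return fence_timeout
    # 2. Assumed middle fallback: the timeout of the "start" operation,
    #    which is what is used today.
    if start_op_timeout is not None:
        return start_op_timeout
    # 3. Existing default: cluster_delay, halved for this purpose.
    return cluster_delay // 2
```

So a resource with fence-timeout set always uses that value, and a
configuration that sets nothing keeps behaving as it does now.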

Thanks,

Dejan
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
