Hi, Serge.
Sorry for not replying to you sooner.
I see three problems with the STONITH operation in a Xen environment,
and I would like to hear your opinion on them.
[PROBLEM 1: About fence operation timeout]
Consider the case where two or more xm commands are executed in parallel,
for example when STONITH fails locally and stonithd then tries to execute
the STONITH operation remotely.
First of all, when two or more xm commands run at the same time,
the later one waits until the former is done.
This seems to be by design in the xm command.
(Even "xm list", if executed while a dump-core is in progress, has to
wait until the dump is completed.)
Here are the times from the start of the first "xm dump-core" to the end
of the last one, all dumping the same domain-U, which has 1GB of memory.
The number on the left is the number of "xm dump-core" commands running
in parallel.
1 -> 6.557s
2 -> 11.955s
3 -> 17.573s
4 -> 23.249s
So, for example, when one domain-U is STONITH'ed by two or more other
domain-Us, the operation is very slow to finish.
This is a common situation when several domain-Us exist in a cluster.
In addition, when the load on the server is high, getting the dump takes
even longer, and the STONITH operation may time out.
I know the timeout can be set with the "stonith-timeout" parameter,
but I think it is too difficult for users to decide on a value,
because there are too many factors to consider:
the size of the domain-U's memory, the power of each domain-0's CPU,
the load at the moment STONITH is executed, the number of domain-Us that
might execute STONITH against one domain-U, the number of domain-Us that
might execute STONITH at the same time against two or more domain-Us on
one domain-0, and so on.
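As a toy illustration of why the value is hard to decide, a safe timeout
would have to scale at least linearly with the worst-case number of
concurrent dumps. The sketch below uses the rough ~6 seconds per parallel
1GB dump from my measurements above; "estimate_timeout", PER_DUMP and
MARGIN are hypothetical names of mine, not anything in the plugin:

```shell
#!/bin/sh
# Illustrative only: a lower-bound estimate for stonith-timeout based on
# the worst-case number of parallel "xm dump-core" commands.

PER_DUMP=6   # seconds per parallel 1GB dump, from the measurements above
MARGIN=2     # extra safety factor for high load

estimate_timeout() {
    n=$1     # worst-case number of concurrent "xm dump-core" commands
    echo $((n * PER_DUMP * MARGIN))
}
```

Even this toy formula gives 48 seconds for four concurrent dumps, and
every factor listed above can push the real requirement further up.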
Of course, getting a domain-U's dump is an important function for
failure analysis, so it is necessary; but if xen0 is designed to wait
until the "xm dump-core" command completes, failover takes a long time
to finish, which is not good for users.
So I am considering a design in which xen0 does not wait until the dump
is over. How about executing "xm dump-core" and "xm create" in the
background? For example:
[ex.1]
$SSH_COMMAND $dom0 "(xm dump-core -C ${kill_node}; xm create ${kill_node})" &
Then check whether STONITH succeeded or failed from SSH_COMMAND's
return code: when SSH_COMMAND succeeds, xen0 considers the STONITH
completed, and when it fails, xen0 reports that the STONITH failed.
With this modification, xen0 does not wait for the dump to complete or
for the domain-U to restart, so the STONITH operation finishes earlier.
The drawback is that xen0 can then no longer verify with the ping
command or "xm list" that the domain-U is really dead...
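A minimal sketch of how [ex.1] plus the return-code check might look.
Here "fence_domu" is a hypothetical helper of mine (SSH_COMMAND, dom0 and
kill_node follow the names already used in the plugin), and I detach the
remote pipeline on domain-0 with nohup so that the local ssh returns as
soon as the command has been dispatched:

```shell
#!/bin/sh
# Sketch only, not the actual plugin code.  The dump/restart pipeline is
# detached on domain-0, so ssh's exit status reflects only whether the
# command could be dispatched, not whether the dump itself succeeded.

fence_domu() {
    dom0=$1
    kill_node=$2
    # -C makes dump-core destroy the domain-U after dumping;
    # "nohup ... &" keeps the pipeline running after ssh disconnects.
    $SSH_COMMAND "$dom0" \
        "nohup sh -c 'xm dump-core -C ${kill_node}; xm create ${kill_node}' >/dev/null 2>&1 &"
    if [ $? -eq 0 ]; then
        echo "STONITH completed"
        return 0
    else
        echo "STONITH failed"
        return 1
    fi
}
```

Note that this only tells us the command was started; as said above, we
give up confirming with ping or "xm list" that the domain-U is down.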
[PROBLEM 2: interruption during fencing]
The internal process of "xm dump-core" is:
pause domain-U -> get dump -> unpause domain-U
The fencing process in xen0 consists of "xm dump-core" followed by
"xm destroy", doesn't it?
Since xm commands are serialized rather than run in parallel, another xm
command issued while the fencing process is still in "xm dump-core"
(i.e. before its "xm destroy" has run) slips in between the two steps.
If that other xm command takes a long time, as dump-core does, the
domain-U runs again (because it has been unpaused) between "xm dump-core"
and "xm destroy", and some resources on that node become active again!
So I think using the "xm dump-core -C" command is a better way.
With the -C option, the internal process of "xm dump-core" becomes:
pause domain-U -> get dump -> destroy domain-U
so there is no window in which another xm command can slip in.
Of course, when the "run_dump" parameter is not set, just "xm destroy"
should be used to stop the domain-U.
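The resulting selection logic could be sketched as follows
("build_fence_cmd" and its argument convention are my own, for
illustration only):

```shell
#!/bin/sh
# Illustrative helper: pick the fence command depending on whether a
# dump is wanted.  With -C there is no unpaused window between the dump
# and the destroy, which closes the race described above.

build_fence_cmd() {
    run_dump=$1      # "true" if the run_dump parameter is set
    kill_node=$2
    if [ "$run_dump" = "true" ]; then
        echo "xm dump-core -C ${kill_node}"   # pause -> dump -> destroy
    else
        echo "xm destroy ${kill_node}"        # no dump wanted
    fi
}
```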
[PROBLEM 3: getting dump redundantly]
When two or more domain-Us try to STONITH one domain-U at the same
time, for example when remote fencing is executed, domain-0 ends up
with two or more dump-core files.
Each of them may be over 1GB, depending on the domain-U's memory size,
so they consume domain-0's disk space heavily and unnecessarily.
So, how about the following: check whether a dump-core of the target
domain-U is already in progress.
If it is, xen0 considers the STONITH completed; in other words, the
later STONITH operation exits normally without doing anything.
If not, it proceeds with the fencing process.
To perform this check, I intend to use the ps command, something like:
ps ax | grep "xm dump-core -C domain-U"
When an administrator runs "xm dump-core" (-C) manually, all resources
on the domain-U are paused while it runs; in the worst case the dump
takes longer than the deadtime and the domain-U is STONITH'ed anyway.
In addition, with the -C option, the domain-U is destroyed after
dumping.
So running "xm dump-core" (-C) manually is not a normal operation in
cluster management, I think, and it should therefore be safe to use the
existence of the process as the criterion.
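The check could be sketched like this ("dump_in_progress" is a
hypothetical helper of mine; the PS_CMD override exists only so the
function can be exercised without a running Xen host, and real ps output
formats may vary):

```shell
#!/bin/sh
# Sketch of the duplicate-dump check.  If a dump-core of the target
# domain-U is already running, a later STONITH exits successfully
# without starting a second dump.

dump_in_progress() {
    kill_node=$1
    # grep -v grep drops our own grep from the process listing
    ${PS_CMD:-ps ax} | grep "xm dump-core -C ${kill_node}" \
        | grep -v grep >/dev/null
}
```

xen0 would then run something like
'if dump_in_progress "$kill_node"; then exit 0; fi'
before entering the fencing process.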
To summarize, how about adding the following functions to xen0?
(1) execute "xm dump-core" (or "xm destroy") and "xm create" in a
    single SSH invocation, as in [ex.1];
(2) execute (1) as a background process, so as not to wait for the dump
    to complete;
(3) check the return code of SSH to judge whether STONITH succeeded;
(4) use the -C option of "xm dump-core" instead of "xm dump-core" +
    "xm destroy";
(5) check for an existing "xm dump-core -C" process to avoid dumping
    redundantly.
Your comments and suggestions are really appreciated.
> Hi, Serge.
>
> Sorry for not replying to you sooner.
> I have tested your patch.
> It is unquestionable.
> Thanks.
>
> Incidentally, in the case that xen config file has space before and
> behind "=" (for example, "name = domain-U"),
> It always passes through the check processing.
> If the TRIM processing is added, it becomes better.
>
> By the way, I think of other some problems.
> (run in parallel and timeout, etc...)
>
> Therefore, please wait a little more.
>
> Regards,
> Yoshihiko SATO.
>
>> Did it work for you? Shall we ask Dejan to commit this patch?
>>
>> On Wed, Apr 15, 2009 at 12:13 AM, Yoshihiko SATO
>> <[email protected]> wrote:
>>> Hello Serge,
>>>
>>> Thank you so much for your quick action!
>>> I'll test the patch.
>>>
>>>
>>> Regards,
>>> Yoshihiko SATO.
>>>
>>>> Attached is a patch that checks that DomU disappears from the "xm
>>>> list" on Dom0 after running destroy.
>>>>
>>>> On Mon, Apr 13, 2009 at 10:03 PM, Serge Dubrouski <[email protected]>
>>>> wrote:
>>>>> Hello -
>>>>>
>>>>> This makes sense and I'll think how to implement that. Thanks for the
>>>>> suggestion.
>>>>>
>>>>> 2009/4/13 Yoshihiko SATO <[email protected]>:
>>>>>> Hi Serge,
>>>>>>
>>>>>> I consider about the case that two or more plugins are set in cib.xml.
>>>>>> For example, xen0(STONITH plugin for DomU) and ibmrsa-telnet(the one for
>>>>>> Dom0) or something.
>>>>>> The setting's purpose is to STONITH Dom0 when xen0 failed to STONITH
>>>>>> DomU.
>>>>>> Then, I found the following problem about xen0's fence(off|reset)
>>>>>> action.
>>>>>>
>>>>>> xen0 doesn't check the return code of xm destroy.
>>>>>> Instead, it check the target DomU is dead or alive with ping command in
>>>>>> CheckIfDead(), right?
>>>>>> However, ping does not receive any reply packets at all
>>>>>> not only when DomU is normally STONITH'ed but when kernel panic or
>>>>>> kernel hang occurs on Dom0.
>>>>>> In the case that failure occurs on Dom0, xen0 judges "the fence action
>>>>>> succeeded", by mistake.
>>>>>> Then, STONITH plugin which is able to STONITH Dom0 (like ibmrsa-telnet
>>>>>> etc.) is not executed.
>>>>>> So, I consider that it should confirm whether xm destroy via ssh
>>>>>> succeeded or not.
>>>>>> And it is better to check whether the target is dead with ping only when
>>>>>> the command succeeded.
>>>>>> If xm destroy is failed, xen0 should return "fence action is failed", I
>>>>>> think.
>>>>>> What do you think about this?
>>>>>> I would like to hear any opinion.
>>>>>>
>>>>>> Best regards,
>>>>>> Yoshihiko SATO
>>>>>>
>>>>> --
>>>>> Serge Dubrouski.
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/