Re: [PATCH V4 0/5] Introduce Advanced Watch Dog module

Zhang, Chen Mon, 09 Mar 2020 02:33:14 -0700


On 3/4/2020 9:37 PM, Paolo Bonzini wrote:

On 04/03/20 09:06, Zhang, Chen wrote:

Hi Eric and Paolo, Can you give some comments about this series?

No news for a while...
We already have some users (Cloud Service Providers) trying to use this module in
their products.
But they also need to follow the QEMU upstream code.

My main comment about this series is that it's not clear why it is
needed and how to use it.  The documentation includes a demo, but no
description of what is an awd_node, a notification_node and an
opt_script.  I can more or less understand the notification_node and
opt_script role from the documentation, but not entirely because, for
example, the two-host demo has hardcoded IP addresses without saying
which host is which IP address.


Hi Paolo,

Sorry for the slow reply, and thank you for your comments.

Let me summarize your main points and my responses:

1. Why AWD is needed.

Advanced Watch Dog is a universal monitoring module on the VMM side. It can be used to detect a network outage (VMM to guest, VMM to VMM, VMM to another remote server) and then perform a previously configured operation. The current AWD patch just accepts any input as the signal to refresh the watchdog timer; we could also define an interactive protocol here. For the outputs, the user can pre-write commands or messages in the AWD opt-script. We noticed that there is no way for VMMs to communicate directly. Maybe some people think we don't need such a thing (upper-layer software like OpenStack can handle it), but when we engaged with real customers we found that they need a lightweight and efficient mechanism to solve some practical problems.

For example, in Edge Computing cases they think high-level software is too heavy to use at the edge, or that it is hard to manage and combine with VM instances. AWD gives users basic VM/host network monitoring tools and a basic fault tolerance and recovery solution.


For the COLO FT/HA solution, we already have some CSPs trying to use AWD with COLO.

2. Documentation issues, including how to use it.

I will address all your comments and fill in the missing details in the documentation.

3. Communication protocol issue.

The current AWD has no protocol; any data it receives is considered a heartbeat signal.


Using the QMP format is fine with me.
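
To make the current behaviour in point 3 concrete, here is a minimal Python sketch of a watchdog where any received data counts as a heartbeat. It is only an illustration and is not taken from the patch series: the class name `HeartbeatWatchdog`, its methods, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time


class HeartbeatWatchdog:
    """Toy model (hypothetical, not the AWD code): any input refreshes the timer."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire      # callback standing in for the opt-script
        self.clock = clock              # injectable so the logic can be tested
        self.deadline = clock() + timeout_s
        self.fired = False

    def feed(self, data: bytes) -> None:
        # No protocol: the payload is ignored, only its arrival matters.
        if data:
            self.deadline = self.clock() + self.timeout_s
            self.fired = False

    def poll(self) -> bool:
        # Run the expiry action once when the deadline has passed.
        if not self.fired and self.clock() >= self.deadline:
            self.fired = True
            self.on_expire()
        return self.fired
```

Because `feed()` never looks at the payload, stray traffic on the socket would also keep the timer alive, which is one reason a defined message format may be preferable.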

4. Implementation issue.

Making the AWD script an optional feature is OK with me.

And reporting the triggering of the watchdog via QMP events is enough for current usage.

But QMP events seem to be limited when it comes to notifying parties outside QEMU. I don't know which is the better choice.


If the QMP events solution is better, I will switch to it in the next version.


I don't know if I have understood your meaning correctly.

Please give me more guidance on this series.  :-)

Thanks

Zhang Chen


The documentation does not describe the protocol, which is absolutely
necessary, and does not describe _why_ the protocol was designed like
that.  Without such documentation it's not clear if, for example, the
watchdog protocol could be implemented as QMP commands (e.g.
start-watchdog, stop-watchdog, notify-watchdog).  Another possibility
could be to use the systemd watchdog protocol, which consists of
essentially three commands (WATCHDOG=1, WATCHDOG=trigger,
WATCHDOG_USEC=...) which are transmitted as datagrams.  Documentation is
important for reviewers to judge the merits of the protocol without (or
before) diving into the code.
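
For readers unfamiliar with the systemd-style messages mentioned above, the following Python sketch shows how a receiver might interpret the three watchdog fields carried in a datagram. It is an illustration only and not part of the AWD series: the function names `parse_notify` and `watchdog_action` and the returned action labels are hypothetical, and it assumes each datagram holds newline-separated `KEY=VALUE` assignments.

```python
def parse_notify(datagram: bytes) -> dict:
    """Split one notify-style datagram into KEY=VALUE pairs."""
    fields = {}
    for line in datagram.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no assignment
            fields[key] = value
    return fields


def watchdog_action(datagram: bytes):
    """Return a hypothetical (action, argument) tuple, or None if unrelated."""
    fields = parse_notify(datagram)
    usec = fields.get("WATCHDOG_USEC")
    if usec is not None and usec.isdigit():
        return ("set_timeout_usec", int(usec))
    if fields.get("WATCHDOG") == "1":
        return ("refresh", None)      # keep-alive ping
    if fields.get("WATCHDOG") == "trigger":
        return ("trigger", None)      # sender asks for the watchdog to fire
    return None
```

Unlike the "any data is a heartbeat" approach, a receiver written this way can ignore unrelated messages and distinguish a keep-alive from an explicit failure report.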

In the demo, the opt_script mechanism is currently using the "human"
monitor as opposed to QMP.  The human monitor interface is not stable
and not meant for consumption by management interface.  It is not clear
if this is just a sample usage, and in practice the notification_node
would be outside of QEMU, or not.  In general I would prefer to have the
script as an optional feature, and report the triggering of the watchdog
via QMP events.
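
As a sketch of what consuming such a notification could look like on the management side, the Python snippet below filters a stream of QMP messages (one JSON object per line) for a watchdog event. QMP events are JSON objects with an "event" name, an optional "data" payload and a "timestamp"; the event name `AWD_TRIGGERED` and the contents of its payload are hypothetical and do not appear in the series.

```python
import json

AWD_EVENT = "AWD_TRIGGERED"  # hypothetical event name, not defined by the patches


def awd_events(lines):
    """Yield the 'data' payload of each matching event in a stream of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON message
        if isinstance(msg, dict) and msg.get("event") == AWD_EVENT:
            yield msg.get("data", {})
```

A management tool could run whatever recovery step it needs for each yielded payload, which keeps that policy outside QEMU instead of in an opt_script.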

Paolo
