On 3/4/2020 9:37 PM, Paolo Bonzini wrote:
On 04/03/20 09:06, Zhang, Chen wrote:
Hi Eric and Paolo, Can you give some comments about this series?
No news for a while...
We already have some users (cloud service providers) trying to use this
module in their products.
But they also need to follow the QEMU upstream code.
My main comment about this series is that it's not clear why it is
needed and how to use it. The documentation includes a demo, but no
description of what is an awd_node, a notification_node and an
opt_script. I can more or less understand the notification_node and
opt_script role from the documentation, but not entirely because, for
example, the two-host demo has hardcoded IP addresses without saying
which host is which IP address.
Hi Paolo,
Sorry for the slow reply, and thank you for your comments.
Let me summarize your main points and my responses:
1. Why AWD is needed.
Advanced Watch Dog is a universal monitoring module on the VMM side. It can
be used to detect a network failure (VMM to guest, VMM to VMM, VMM to another
remote server) and perform a previously configured operation. The current
AWD patch accepts any input as the signal to refresh the watchdog timer, and
we could also define a specific interactive protocol here. For the outputs,
the user can pre-write commands or messages in the AWD opt-script. We
noticed that there is no way for VMMs to communicate directly. Some people
may think we don't need such a thing (upper-layer software like OpenStack
can handle it), but when we engaged with real customers we found that they
need a lightweight and efficient mechanism to solve some practical problems,
for example in Edge Computing cases (they think high-level software is too
heavy to use at the edge, or too hard to manage and combine with VM
instances). AWD gives users basic VM/host network monitoring tools and a
basic fault tolerance and recovery solution.
For the COLO FT/HA solution, we already have some CSPs trying to use AWD
with COLO.
2. Documentation issues, including how to use it.
I will address all your comments and add complete details to the documentation.
3. Communication protocol issue.
The current AWD has no protocol; any data it receives is treated as a
heartbeat signal.
Using the QMP format is fine with me.
4. Implementation issue.
Making the AWD script an optional feature is OK with me.
Reporting the triggering of the watchdog via QMP events is enough for
current usage.
But it seems to have limitations for notifying parties outside QEMU. I
don't know which is the better choice.
If the QMP events solution is better, I will fix it in the next version.
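
To make items 1 and 3 concrete, here is a minimal Python sketch of the
behaviour described above: any received data refreshes the timer, and a
preset action runs once when the timer expires. It is not the code from the
AWD patches; the class name HeartbeatWatchdog, its methods, and the
injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Toy model: any input refreshes the timer; expiry runs a preset action."""

    def __init__(self, timeout_s: float, on_expire: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_expire = on_expire  # stands in for the opt-script (assumption)
        self.clock = clock          # injectable so the model can be tested
        self.deadline = clock() + timeout_s
        self.fired = False

    def feed(self, data: bytes) -> None:
        """Treat any non-empty input as a heartbeat; no protocol is parsed."""
        if data and not self.fired:
            self.deadline = self.clock() + self.timeout_s

    def poll(self) -> bool:
        """Run the preset action once if the deadline has passed."""
        if not self.fired and self.clock() >= self.deadline:
            self.fired = True
            self.on_expire()
        return self.fired
```

Because feed() never inspects the payload, a stray or malformed packet keeps
the watchdog alive just as well as a real heartbeat does, which is one
reason a defined message format may be worth having.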
I am not sure I have understood your meaning correctly.
Please give me more guidance on this series. :-)
Thanks
Zhang Chen
The documentation does not describe the protocol, which is absolutely
necessary, and does not describe _why_ the protocol was designed like
that. Without such documentation it's not clear if, for example, the
watchdog protocol could be implemented as QMP commands (e.g.
start-watchdog, stop-watchdog, notify-watchdog). Another possibility
could be to use the systemd watchdog protocol, which consists of
essentially three commands (WATCHDOG=1, WATCHDOG=trigger,
WATCHDOG_USEC=...) which are transmitted as datagrams. Documentation is
important for reviewers to judge the merits of the protocol without (or
before) diving into the code.
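
For reference, the three systemd-style messages mentioned above could be
handled roughly as in the following Python sketch. It assumes each datagram
carries newline-separated KEY=VALUE assignments; the function names and the
action labels ("keepalive", "trigger", "set_timeout_usec") are made up for
this example and come from neither systemd nor the AWD series.

```python
import socket
from typing import List, Tuple


def parse_watchdog_datagram(payload: bytes) -> List[Tuple[str, int]]:
    """Map one datagram of newline-separated KEY=VALUE lines to actions."""
    actions = []
    for line in payload.decode("utf-8").splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        if key == "WATCHDOG" and value == "1":
            actions.append(("keepalive", 0))
        elif key == "WATCHDOG" and value == "trigger":
            actions.append(("trigger", 0))
        elif key == "WATCHDOG_USEC" and value.isascii() and value.isdigit():
            actions.append(("set_timeout_usec", int(value)))
        # Unrelated assignments (for example READY=1) are skipped.
    return actions


def demo() -> List[Tuple[str, int]]:
    """Send three messages over a local datagram socket pair and parse them."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        results = []
        for msg in (b"WATCHDOG_USEC=5000000", b"WATCHDOG=1", b"WATCHDOG=trigger"):
            sender.send(msg)
            results.extend(parse_watchdog_datagram(receiver.recv(4096)))
        return results
    finally:
        sender.close()
        receiver.close()
```

The socket pair only stands in for the real transport; how an AWD endpoint
would be addressed is left open here.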
In the demo, the opt_script mechanism is currently using the "human"
monitor as opposed to QMP. The human monitor interface is not stable
and not meant for consumption by management interface. It is not clear
if this is just a sample usage, and in practice the notification_node
would be outside of QEMU, or not. In general I would prefer to have the
script as an optional feature, and report the triggering of the watchdog
via QMP events.
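
As a rough illustration of the QMP-event approach, a management tool could
filter the monitor stream as in the Python sketch below. It assumes one JSON
object per line, and the event name ADVANCED_WATCHDOG and the contents of
its "data" member are hypothetical: no such event is defined by the series
or by QEMU today. Only the general shape of an event message (an "event"
key plus optional "data" and "timestamp") follows QMP.

```python
import json
from typing import Iterable, Iterator

AWD_EVENT = "ADVANCED_WATCHDOG"  # hypothetical event name, not defined by QEMU


def watchdog_events(lines: Iterable[str], name: str = AWD_EVENT) -> Iterator[dict]:
    """Yield the data payload of each matching QMP event in a message stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        # Command replies carry "return" or "error"; events carry "event".
        if msg.get("event") == name:
            yield msg.get("data", {})
```

A consumer built this way needs no script inside QEMU; what to do when the
watchdog fires is decided by whoever reads the events.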
Paolo