Package: qemu-guest-agent
Version: 1:2.1+dfsg-12+deb8u6
Severity: important

On Jessie I noticed that several of my Proxmox backups was hanging forever
on Jessie VMs with
qemu guest agent enabled, while my Stretch VMs are OK.

Upon further inspection, backups (or actually fs freeze), do not work until
after restarting
qemu-guest-agent.

This is from the systemd journal after boot. Note the "transport endpoint
not found" warning.
####
Apr 12 16:26:32 cns-2 systemd[1]: Starting LSB: QEMU Guest Agent startup
script...
Apr 12 16:26:32 cns-2 qemu-guest-agent[1126]: qemu-ga: transport endpoint
not found, not starting ... (warning).
Apr 12 16:26:32 cns-2 systemd[1]: Started LSB: QEMU Guest Agent startup
script.


Apr 12 16:26:32 cns-2 systemd[1]: Starting LSB: QEMU Guest Agent startup
script...
-- Subject: Unit qemu-guest-agent.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit qemu-guest-agent.service has begun starting up.
Apr 12 16:26:32 cns-2 qemu-guest-agent[1126]: qemu-ga: transport endpoint
not found, not starting ... (warning).
Apr 12 16:26:32 cns-2 systemd[1]: Started LSB: QEMU Guest Agent startup
script.
-- Subject: Unit qemu-guest-agent.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit qemu-guest-agent.service has finished starting up.
--
-- The start-up result is done.
####

During boot what happens seems to be that
/dev/virtio-ports/org.qemu.guest_agent.0 is not available although it
always is when I check after boot, which of course makes the error
disappear when I restart qemu-guest-agent later.

A simple "sleep 1" in /etc/default/qemu-guest-agent as a ugly hack solves
the issue, so there seems to be some kind of race condition during boot,
with possible a dependency of virtio-serial device that should be
added somewhere
during the boot process.

Although Stretch seems to have no issues I do not know if this is by design
or just that the race condition happens to end differently most of the time.

I have tried with kernel and qemu-guest-agent (1:2.8+dfsg-3~bpo8+1) from
jessie-backports, but the issue persists.

I marked the issue important as a non-functioning qemu guest agent when
enabled for the VM actually in an unreliable way not only prevents, but
also in some cases locks Jessie VMs, preventing further operations on
platforms like Proxmox in common scenarios like doing backups or snapshots,
and the VMs also do not shutdown when guest agent enabled VMs are told to
shutdown which again causes the operation to hang or time out often ending
with a hard stop with possible corrupt data to follow.

Reply via email to