On 12/13/18 7:12 AM, De Backer, Fred (Nokia - BE/Antwerp) wrote:
Hi,
We're using OpenStack Ironic to deploy baremetal servers. During the deployment
process an agent (ironic-python-agent) running on Fedora Linux uses qemu-img to
write a qcow2 file to a block device.
Recently we saw a change in the behavior of qemu-img. Previously we were using
Fedora 27, which ships a Fedora-packaged qemu-img v2.10.2
(qemu-img-2.10.2-1.fc27.x86_64.rpm); now we use Fedora 29, which ships a
Fedora-packaged qemu-img v3.0.0 (qemu-img-3.0.0-2.fc29.x86_64.rpm).
The command run by ironic-python-agent (the same on both FC27 and FC29) is:
qemu-img convert -t directsync -O host_device /tmp/image.qcow2 /dev/sda
We observe that on Fedora 29, qemu-img fully zeroes the disk before imaging it.
Given the disk size, the whole process now takes 35 minutes instead of 50
seconds, which causes the ironic-python-agent operation to time out. The
Fedora 27 qemu-img doesn't do that.
Known issue: Nir and Rich started an earlier thread on the topic,
and the conclusion is that we need to make qemu-img smarter about NOT
requesting pre-zeroing of devices where that is more expensive than just
zeroing as we go.
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg01182.html
Scanning through the qemu-img source code, we found that adding -S 0 to the
command makes the Fedora 29 qemu-img behave as we observed with the Fedora 27
qemu-img.
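A minimal Python sketch of how an agent might assemble that command with and without the workaround, assuming it shells out via `subprocess`. `build_convert_cmd`, `run_convert` and the `min_sparse` parameter are hypothetical names for illustration, not part of ironic-python-agent; passing `min_sparse=0` appends `-S 0`.

```python
import subprocess
from typing import List, Optional


def build_convert_cmd(src: str, dst: str, min_sparse: Optional[int] = None) -> List[str]:
    """Build the qemu-img convert argv from the report (hypothetical helper).

    min_sparse=None keeps qemu-img's default sparse detection;
    min_sparse=0 adds "-S 0".
    """
    cmd = ["qemu-img", "convert", "-t", "directsync", "-O", "host_device"]
    if min_sparse is not None:
        # -S sets how many consecutive zero bytes qemu-img needs
        # before it treats a region as sparse.
        cmd += ["-S", str(min_sparse)]
    cmd += [src, dst]
    return cmd


def run_convert(src: str, dst: str, min_sparse: Optional[int] = None) -> None:
    # check=True raises CalledProcessError if qemu-img exits non-zero.
    subprocess.run(build_convert_cmd(src, dst, min_sparse), check=True)
```

Calling `run_convert("/tmp/image.qcow2", "/dev/sda", min_sparse=0)` would run the modified command; it needs qemu-img installed and write access to the device.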
Looking through the changelogs of qemu I couldn't find this behavior change
documented.
Now the questions:
* Is it expected/required behavior that qemu-img first zeroes the complete
target disk before writing the image? In other words: is this a qemu-img bug?
It's a performance bug. qemu-img convert has to ensure that the
destination reads as zeros (rather than being left uninitialized), but the
way in which it does so needs to be more careful about destinations that do
not have efficient block status or bulk zeroing capabilities.
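A toy Python cost model of that trade-off, not qemu's actual code: it counts bytes physically written under two strategies. `Target`, `fast_zero`, `bytes_written` and `should_pre_zero` are illustrative names, and the model assumes that zeroing without a fast path costs as much as writing data and that the image's virtual size is smaller than the device.

```python
from dataclasses import dataclass

GiB = 1 << 30


@dataclass
class Target:
    size: int        # bytes on the destination block device
    fast_zero: bool  # assumed: device can zero a range without writing buffers


def bytes_written(target: Target, virtual_size: int, data_size: int, pre_zero: bool) -> int:
    """Toy estimate of bytes physically written while converting an image.

    virtual_size: guest-visible size of the source image (assumed <= target.size).
    data_size: bytes of allocated, non-zero data in the source image.
    """
    holes = virtual_size - data_size
    if pre_zero:
        # Zero the entire device first; without a fast zero operation this
        # falls back to writing zeros over every byte of the device.
        zero_cost = 0 if target.fast_zero else target.size
        # After pre-zeroing, only the allocated data still needs writing.
        return zero_cost + data_size
    # Zero as we go: write the data and zero only the image's holes;
    # device bytes past virtual_size are never touched.
    return data_size + (0 if target.fast_zero else holes)


def should_pre_zero(target: Target) -> bool:
    # In this model, pre-zeroing never beats zeroing as we go unless
    # the device can zero cheaply.
    return target.fast_zero
```

With a made-up 1 TiB disk and a 10 GiB image holding 2 GiB of data, pre-zeroing a device without fast zeroing writes 1026 GiB versus 10 GiB when zeroing as we go; with fast zeroing both come to 2 GiB.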
* Is applying the -S 0 parameter a safe/sound/sensible way to revert to the
old behavior? In other words: can I file a bug against ironic-python-agent to
start using this parameter?
Using -S 0 disables sparse detection, so the destination ends up fully
allocated, which may introduce its own set of problems if you were expecting
the destination to be sparse.
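A simplified Python model of what -S controls, following the qemu-img documentation's description of -S as the number of consecutive zero bytes (4k by default) needed before qemu-img leaves a region sparse, with 0 meaning the destination is always fully allocated. `plan_writes` and `bytes_to_write` are hypothetical helpers; real qemu-img works at sector/cluster granularity and also consults block status rather than scanning byte by byte.

```python
from typing import List, Tuple

Op = Tuple[int, int, str]  # (offset, length, "write" or "skip")


def plan_writes(buf: bytes, min_sparse: int) -> List[Op]:
    """Split buf into regions to write and zero runs to leave sparse.

    Zero runs shorter than min_sparse are written out like data;
    min_sparse == 0 disables sparse detection entirely.
    """
    if min_sparse == 0:
        return [(0, len(buf), "write")] if buf else []
    ops: List[Op] = []
    i, n = 0, len(buf)
    while i < n:
        # Find the end of the current run of zero or non-zero bytes.
        is_zero = buf[i] == 0
        j = i
        while j < n and (buf[j] == 0) == is_zero:
            j += 1
        kind = "skip" if is_zero and j - i >= min_sparse else "write"
        # Runs are processed in order, so merging with the previous op of
        # the same kind always yields a contiguous region.
        if ops and ops[-1][2] == kind:
            start, length, _ = ops[-1]
            ops[-1] = (start, length + (j - i), kind)
        else:
            ops.append((i, j - i, kind))
        i = j
    return ops


def bytes_to_write(ops: List[Op]) -> int:
    # Only "write" regions touch the destination; "skip" regions stay holes.
    return sum(length for _, length, kind in ops if kind == "write")
```

For a 16-byte buffer of 4 data bytes, 8 zeros, 2 data bytes and 2 zeros, `min_sparse=4` skips the 8-byte zero run and writes 8 bytes, while `min_sparse=0` writes all 16.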
* If the behavior is expected: is there some pointer to
documentation/changelogs I can read about this?
Reading the mentioned thread will give some more insight, and hopefully
qemu 4.0 will either improve the behavior by default or at least add
knobs so that you can tweak the behavior based on your needs.
This message (including any attachments) contains confidential information
Such disclaimers are unenforceable on publicly-archived lists. Still,
you may want to consider using a different email address that doesn't
spam list readers with your employer's legalese gobbledygook.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org