Re: Which qemu change corresponds to RedHat bug 1655408

Max Reitz Fri, 09 Oct 2020 01:50:14 -0700

On 08.10.20 18:49, Jakob Bohm wrote:
> (Top posting because previous reply did so):
> 
> If the bug was closed as "can't reproduce", why was a very similar bug
> listed as fixed in RHSA-2019:2553-01?

Hi,

Which very similar bug do you mean?  I can only guess that perhaps you
mean 1603104 or 1551486.

Bug 1603104 was about qemu not ignoring errors when releasing file locks
fails (we should ignore errors then, because they're not fatal, and we
often cannot return errors, so they ended up as aborts).  (To give more
context, this error generally appeared only when the storage the image
is on somehow disappeared while qemu is running.  E.g. when the
connection to an NFS server was lost.)

Bug 1551486 entailed a bit of a rewrite of the whole locking code, which
may have resulted in the bug 1655408 no longer appearing for our QE
team.  But it was a different bug, as it wasn’t about any error, but
just about the fact that qemu used more FDs than necessary.

(Although I see 1655408 was reported for RHEL 8, whereas 1603104 and
1551486 (as part of RHSA-2019:2553) were reported for RHEL 7.  The
corresponding RHEL 8 bug for those two is 1694148.)

Either way, both of those bugs are fixed in 5.0.

1655408, in contrast, reports an error at startup: locking itself failed.
I couldn’t reproduce it, and I still can’t, neither with the image
mounted concurrently nor with an RO NFS mount.

(For example:

exports:
[...]/test-nfs-ro 127.0.0.1(ro,sync,no_subtree_check,fsid=0,insecure,crossmnt)

$ for i in $(seq 100); do \
    echo -e '\033[1m---\033[0m'; \
    x86_64-softmmu/qemu-system-x86_64 \
      -drive \
        if=none,id=drv0,readonly=on,file=/mnt/tmp/arch.iso,format=raw \
      -device ide-cd,drive=drv0 \
      -enable-kvm -m 2048 -display none & \
    pid=$!; \
    sleep 1; \
    kill $pid; \
  done

(Where x86_64-softmmu/qemu-system-x86_64 is upstream 5.0.1.)

All I see is something like:

---
qemu-system-x86_64: terminating on signal 15 from pid 7278 (/bin/zsh)
[2] 34103
[3]  - 34095 terminated  x86_64-softmmu/qemu-system-x86_64 -drive
-device ide-cd,drive=drv0  -m 2048

So no file locking errors.)
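A variant of the loop above can count how many runs actually print the reported error, which may be handier if the failure turns out to be intermittent. This is only a sketch, assuming a POSIX shell and the same qemu options as the loop; `count_lock_failures` and the `QEMU` override variable are hypothetical names:

```shell
#!/bin/sh
# count_lock_failures IMAGE [RUNS]
# Hypothetical helper: start qemu RUNS times (default 100) on a read-only raw
# image, stop each instance after one second, and print how many runs wrote
# "Failed to lock byte" to stderr. Set QEMU to override the binary.
count_lock_failures() {
    img=$1
    runs=${2:-100}
    qemu=${QEMU:-qemu-system-x86_64}
    log=$(mktemp) || return 1
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Redirecting stderr truncates the log, so it only holds this run.
        "$qemu" -drive "if=none,id=drv0,readonly=on,file=$img,format=raw" \
            -device ide-cd,drive=drv0 -m 2048 -display none 2>"$log" &
        pid=$!
        sleep 1
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null
        if grep -q 'Failed to lock byte' "$log"; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "$fails"
}
```

For example, `count_lock_failures /mnt/tmp/arch.iso 100` prints 0 when no run hits the locking error.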

> On 2020-10-08 18:41, Philippe Mathieu-Daudé wrote:
>> Hi Jakob,
>>
>> On 10/8/20 6:32 PM, Jakob Bohm wrote:
>>> Red Hat bugzilla bug 1655408 against qemu is listed by Red Hat as
>>> fixed in
>>> April 2019, but I cannot find the corresponding change on qemu.org (the
>>> Changelog in the wiki is not a traditional changelog and doesn't cover
>>> bugfix releases such as 5.0.1, the git commit log is too detailed to
>>> search, and the Red Hat bugzilla and security advisory pages do not link
>>> Red Hat bugs back to upstream (Launchpad) bugs or git changes).
>>>
>>> Here is the bug title (which also affects my Debian packaged qemu 5.0):
>>>
>>> VM can not boot up due to "Failed to lock byte 100" if cdrom has been
>>> mounted on the host
>>>
>>> Further observation:
>>>
>>> The basic problem is that qemu-system refuses to start with the error
>>> message "Failed to lock byte 100" when -drive points to a read-only
>>> ISO file.  For the reporter of the Red Hat bug, that was a mount-induced
>>> read-only condition; in my case it is an NFS mount of a read-only
>>> directory.
>>>
>>> The error message itself seems meaningless, as there is no particular
>>> reason to request file locks on a read-only raw disk image.

Yes, there is.  We must prevent a concurrent instance from writing to
the image[1], and so we have to signal that somehow, which we do through
file locks.

I suppose it can be argued that if the image file itself is read-only
(outside of qemu), there is no need for locks, because nothing could
ever modify the image anyway.  But wouldn’t it be possible to change the
permissions after qemu has opened the image, or to remount some RO
filesystem R/W?

Perhaps we could automatically switch off file locks for a given image
file when taking the first lock fails and the image is read-only.  But
first I’d rather know what exactly is causing the error you see to appear.
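For comparison, locking can already be disabled by hand through the `locking` option of qemu's file protocol driver (`file.locking=off` on a `-drive` line). That gives up exactly the protection described above, so it is only sensible when nothing can modify the image while qemu runs. A sketch, where `boot_iso_nolock` and the `QEMU` override variable are hypothetical names:

```shell
#!/bin/sh
# boot_iso_nolock IMAGE
# Hypothetical helper: boot a read-only raw image as an IDE CD-ROM with
# image locking disabled via the file driver's locking=off option.
# WARNING: with locking off, qemu neither takes nor checks the image locks,
# so conflicting users of the same image go undetected.
boot_iso_nolock() {
    # qemu's option parser uses ',' as a separator; a literal comma in
    # the file name has to be written as ',,'.
    img=$(printf '%s' "$1" | sed 's/,/,,/g')
    "${QEMU:-qemu-system-x86_64}" \
        -drive "if=none,id=drv0,readonly=on,file=$img,format=raw,file.locking=off" \
        -device ide-cd,drive=drv0 \
        -m 2048 -display none
}
```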

[1] Technically, byte 100 is about being able to read valid data from
the image, which is a constraint that’s only very rarely broken.  But
still, it’s a constraint that must be signaled.  (You only see the
failure on this byte, because the later bytes (like the one not
preventing concurrent R/W access, 201) are not even attempted to be
locked after the first lock fails.)
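On Linux, the locks qemu holds can be inspected in /proc/locks while it has the image open. qemu uses open file description locks where the kernel supports them (shown as OFDLCK, otherwise as POSIX), and the last two fields of each entry are the first and last locked byte, so the byte offsets discussed here can be read off directly. A sketch, assuming GNU (or busybox) stat; `show_locks` is a hypothetical helper, and since inode numbers are only unique per device, it is only a rough filter:

```shell
#!/bin/sh
# show_locks FILE [LOCKS_FILE]
# Hypothetical helper: print the entries of /proc/locks (or LOCKS_FILE)
# that refer to FILE's inode. Each entry looks roughly like
#   ID: CLASS MODE TYPE PID MAJOR:MINOR:INODE START END
# where CLASS is e.g. OFDLCK, POSIX or FLOCK, and END may be "EOF".
show_locks() {
    ino=$(stat -c '%i' "$1") || return 1
    # The inode is printed in decimal after the last ':' of the device
    # field and is followed by a space.
    grep -E ":${ino} " "${2:-/proc/locks}"
}
```

For example, running `show_locks /mnt/tmp/arch.iso` in a second terminal while one of the qemu instances from the loop above is up lists the entries for that image, if any.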

(As for other instances writing to the image, you can allow that by
setting the share-rw=on option on the guest device.  This tells qemu
that the guest will accept modifications from the outside.  But that
still won’t prevent qemu from having to take a shared lock on byte 100.)
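On the command line, share-rw is a property of the guest device rather than of the -drive. A minimal sketch based on the loop earlier in this mail; `boot_iso_shared` and the `QEMU` override variable are hypothetical names:

```shell
#!/bin/sh
# boot_iso_shared IMAGE
# Hypothetical helper: attach a read-only raw image as an IDE CD-ROM whose
# guest device tolerates outside modifications (share-rw=on). qemu still
# takes its shared lock on byte 100 in this configuration.
boot_iso_shared() {
    # A literal ',' in the path must be doubled for qemu's option parser.
    img=$(printf '%s' "$1" | sed 's/,/,,/g')
    "${QEMU:-qemu-system-x86_64}" \
        -drive "if=none,id=drv0,readonly=on,file=$img,format=raw" \
        -device ide-cd,drive=drv0,share-rw=on \
        -m 2048 -display none
}
```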

Max

>>> My qemu-system-x86_64 invocation contains the option (on one line):
>>>
>>> -drive if=none,id=drive-ide0-1-0,readonly=on,
>>> file=/mnt/someshare/path/gparted-live-1.1.0-5-amd64.iso,format=raw
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1655408 has been
>> closed due to lack of a reproducer. Can you add your information
>> to the BZ? It will likely be re-opened. Thanks!
>>
>>>
>>> Enjoy
>>>
>>> Jakob
>>> -- 
>>> Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
>>> Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
>>> This public discussion message is non-binding and may contain errors.
>>> WiseMo - Remote Service Management for PCs, Phones and Embedded
>>>
>>
> 
> 
> Enjoy
> 
> Jakob
