>On Wed, Mar 24, 2021 at 4:52 PM Max Reitz <mre...@redhat.com> wrote:
>On 22.03.21 10:25, ChangLimin wrote:
>> For Linux 5.10/5.11, qemu write zeros to a multipath device using
>> ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
>> permanently.
>
>So as far as I can track back the discussion, Kevin asked on v1 why we’d
>set has_write_zeroes to false, i.e. whether the EBUSY might not go away
>at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then.
>You haven’t explicitly replied to that question (as far as I can see),
>so it kind of still stands.
>
>Implicitly, there are two conflicting answers in this patch: On one
>hand, the commit message says “permanently”, and this is what you told
>Nir as a realistic case where this can occur.

On Linux 5.10/5.11 the EBUSY is permanent; the steps to reproduce it are
below. On other Linux versions the EBUSY may be temporary. Because Linux
5.10/5.11 is not widely used, the patch does not set has_write_zeroes to
false.

>I'm afraid ChangLimin did not answer my question. I'm looking for real
>world used case when qemu cannot write zeros to multipath device, when
>nobody else is using the device.
>
>I tried to reproduce this on Fedora (kernel 5.10) with qemu-img convert,
>once with a multipath device, and once with logical volume on a vg created
>on the multipath device, and I could not reproduce this issue.

The following steps reproduce the issue on Fedora 34.

# uname -a
Linux fedora-34 5.11.3-300.fc34.x86_64 #1 SMP Thu Mar 4 19:03:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# qemu-img -V
qemu-img version 5.2.0 (qemu-5.2.0-5.fc34.1)

1. Log in to an iSCSI LUN created using targetcli on Ubuntu 20.04

# iscsiadm -m discovery -t st -p 192.169.1.109
192.169.1.109:3260,1 iqn.2003-01.org.linux-iscsi:lio-lv100

# iscsiadm -m node -l -T iqn.2003-01.org.linux-iscsi:lio-lv100
# iscsiadm -m session
tcp: [1] 192.169.1.109:3260,1 iqn.2003-01.org.linux-iscsi:lio-lv100 (non-flash)

2. Start the multipathd service

# mpathconf --enable
# systemctl start multipathd

3. Add the multipath path

# multipath -a `/lib/udev/scsi_id -g /dev/sdb` # sdb means the iSCSI LUN
wwid '36001405b76856e4816b48b99c6a77de3' added

# multipathd add path /dev/sdb
ok

# multipath -ll # /dev/dm-1 is the multipath device based on /dev/sdb
mpatha (36001405bebfc3a0522541cda30220db9) dm-1 LIO-ORG,lv102
size=1.0G features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 5:0:0:0 sdd 8:48 active ready running

4. qemu-img returns EBUSY for both dm-1 and sdb

# wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
# qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/dm-1
qemu-img: error while writing at byte 0: Device or resource busy

# qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/sdb
qemu-img: error while writing at byte 0: Device or resource busy

5. blkdiscard also returns EBUSY for both dm-1 and sdb

# blkdiscard -o 0 -l 4096 /dev/dm-1
blkdiscard: cannot open /dev/dm-1: Device or resource busy

# blkdiscard -o 0 -l 4096 /dev/sdb
blkdiscard: cannot open /dev/sdb: No such file or directory

6. Writing zeroes with dd works, because it does not use blkdiscard

# dd if=/dev/zero of=/dev/dm-1 bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.33623 s, 44.9 MB/s

7. The LUN must support the blkdiscard feature; otherwise zeroes will not
   be written with ioctl(fd, BLKZEROOUT, range)

>If I understand the kernel change correctly, this can happen when there is
>a mounted file system on top of the multipath device. I don't think we have
>a use case when qemu accesses a multipath device when the device is used
>by a file system, but maybe I missed something.
>
>So that to me implies
>that we actually should not retry BLKZEROOUT, because the EBUSY will
>remain, and that condition won’t change while the block device is in use
>by qemu.
>
>On the other hand, in the code, you have decided not to reset
>has_write_zeroes to false, so the implementation will retry.
>
>EBUSY is usually a temporary error, so retrying makes sense. The question
>is if we really can write zeroes manually in this case?
>
>So I don’t quite understand. Should we keep trying BLKZEROOUT or is
>there no chance of it working after it has at one point failed with
>EBUSY? (Are there other cases besides what’s described in this commit
>message where EBUSY might be returned and it is only temporary?)
>
>> Fallback to pwritev instead of exit for -EBUSY error.
>>
>> The issue was introduced in Linux 5.10:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
>>
>> Fixed in Linux 5.12:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
>>
>> Signed-off-by: ChangLimin <chan...@chinatelecom.cn>
>> ---
>> block/file-posix.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 20e14f8e96..d4054ac9cb 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1624,8 +1624,12 @@ static ssize_t
>> handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
>> } while (errno == EINTR);
>>
>> ret = translate_err(-errno);
>> - if (ret == -ENOTSUP) {
>> - s->has_write_zeroes = false;
>> + switch (ret) {
>> + case -ENOTSUP:
>> + s->has_write_zeroes = false; /* fall through */
>> + case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath
>> devices */
>> + return -ENOTSUP;
>> + break;
>
>(Not sure why this break is here.)
>
>Max
>
>> }
>> }
>> #endif
>> --
>> 2.27.0
>>
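
To make the intended behaviour concrete, here is a minimal standalone
sketch of the fallback under discussion: try ioctl(fd, BLKZEROOUT, range)
first, and if it fails with EBUSY (or with an "unsupported" style error),
write the zeroes by hand instead of failing the request. This is not the
QEMU code. The helper names should_fall_back(), write_zeroes_manually()
and zero_range() are hypothetical, the set of fallback errors is an
assumption, and the sketch ignores the buffer alignment that O_DIRECT
(cache=none) would require.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for pwrite() under strict -std= modes */
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>        /* BLKZEROOUT */
#endif

/* Hypothetical helper: errors after which this sketch stops using
 * BLKZEROOUT for the request and writes zeroes manually instead.
 * EBUSY is the Linux 5.10/5.11 multipath case from this thread. */
static int should_fall_back(int err)
{
    return err == EOPNOTSUPP || err == ENOTTY || err == EBUSY;
}

/* Hypothetical helper: write len zero bytes at offset using pwrite().
 * Returns 0 on success or a negative errno value. */
static int write_zeroes_manually(int fd, off_t offset, size_t len)
{
    static const char zeroes[4096];   /* zero-initialized (static storage) */

    while (len > 0) {
        size_t chunk = len < sizeof(zeroes) ? len : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, offset);

        if (n < 0) {
            if (errno == EINTR) {
                continue;             /* interrupted, retry the same chunk */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;              /* no progress, avoid looping forever */
        }
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Zero [offset, offset + len): BLKZEROOUT first, manual write as fallback. */
static int zero_range(int fd, off_t offset, size_t len)
{
#ifdef BLKZEROOUT
    uint64_t range[2] = { (uint64_t)offset, (uint64_t)len };
    int ret;

    do {
        ret = ioctl(fd, BLKZEROOUT, range);
    } while (ret != 0 && errno == EINTR);

    if (ret == 0) {
        return 0;
    }
    if (!should_fall_back(errno)) {
        return -errno;                /* a real error, report it */
    }
#endif
    return write_zeroes_manually(fd, offset, len);
}
```

Note that the sketch attempts BLKZEROOUT again on every call, which
matches not resetting has_write_zeroes on EBUSY. Whether retrying is
worthwhile when the EBUSY is permanent is exactly the open question above.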