On 16.06.2020 at 22:01, Nir Soffer wrote:
> On Tue, Jun 16, 2020 at 8:39 PM Nir Soffer <nsof...@redhat.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 6:32 PM Kevin Wolf <kw...@redhat.com> wrote:
> > >
> > > On 15.06.2020 at 21:32, Nir Soffer wrote:
> > > > We can zero 2.3 GiB/s:
> > > >
> > > > # time blkdiscard -z test-lv
> > > >
> > > > real    0m43.902s
> > > > user    0m0.002s
> > > > sys     0m0.130s
> > > >
> > > > We can write 445 MB/s:
> > > >
> > > > # dd if=/dev/zero bs=2M count=51200 of=test-lv oflag=direct conv=fsync
> > > > 107374182400 bytes (107 GB, 100 GiB) copied, 241.257 s, 445 MB/s
> > >
> > > So using FALLOC_FL_PUNCH_HOLE _is_ faster after all. What might not be
> > > faster is zeroing out the whole device and then overwriting a
> > > considerable part of it again.
> > >
> > > I think this means that we shouldn't fail write_zeroes at the
> > > file-posix level even if BDRV_REQ_NO_FALLBACK is given. Instead,
> > > qemu-img convert is where I see a fix.
> > >
> > > Certainly qemu-img could be cleverer and zero out more selectively.
> > > The idea of doing a blk_make_zero() first seems to have caused some
> > > problems, though of course its introduction was also justified with
> > > performance, so improving one case might hurt another if we're not
> > > careful.
> > >
> > > However, when Peter Lieven introduced this (commit 5a37b60a61c), we
> > > didn't use write_zeroes yet during the regular copy loop (we do
> > > since commit 690c7301600). So chances are that blk_make_zero()
> > > doesn't actually help any more now.
> > >
> > > Can you run another test with the patch below?
> >
> > Sure, I can try this.
>
> Tried it, and it performs the same, as expected.

Thanks.

> > > I think it should perform the same as yours. Eric, Peter, do you
> > > think this would have a negative effect for NBD and/or iscsi?
> > >
> > > The other option would be providing an option and making it Someone
> > > Else's Problem.
> > >
> > > Kevin
> > >
> > >
> > > diff --git a/qemu-img.c b/qemu-img.c
> > > index d7e846e607..bdb9f6aa46 100644
> > > --- a/qemu-img.c
> > > +++ b/qemu-img.c
> > > @@ -2084,15 +2084,6 @@ static int convert_do_copy(ImgConvertState *s)
> > >          s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
> > >      }
> > >
> > > -    if (!s->has_zero_init && !s->target_has_backing &&
> > > -        bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
> > > -    {
> > > -        ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK);
> > > -        if (ret == 0) {
> > > -            s->has_zero_init = true;
> > > -        }
> > > -    }
> >
> > This will work of course, but now we will not do bulk zero for any
> > target.
>
> I would like to have a minimal change to increase the chance that we
> can consume this very soon in oVirt.

I think this one would be pretty minimal.

Maybe we can later bring this code back, but with an implementation of
blk_make_zero() that doesn't use the generic write_zeroes operation, but
with a specific callback like Eric suggested.
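To sketch what I mean (the callback name and wiring below are invented
for illustration; nothing like this exists in the tree yet):

    /* Hypothetical BlockDriver callback: zero the whole device, but
     * only if the driver can do it without writing every byte, e.g. by
     * unmapping the complete range. Drivers that can't simply leave it
     * unset. */
    int (*bdrv_make_zero)(BlockDriverState *bs);

    int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
    {
        BlockDriverState *bs = blk_bs(blk);

        if (!bs->drv->bdrv_make_zero) {
            /* No efficient bulk zero available; qemu-img convert then
             * just zeroes selectively in its copy loop, as with this
             * patch. */
            return -ENOTSUP;
        }
        return bs->drv->bdrv_make_zero(bs);
    }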
> > I think we never do this for regular files anyway since we truncate
> > files, and there is nothing to zero, but maybe there is some case
> > when this is useful?

Yes, regular files have s->has_zero_init == true anyway.

> > BTW, do we use BDRV_REQ_NO_FALLBACK elsewhere? Maybe we can remove
> > it.

qcow2 uses it when zeroing out parts of a newly allocated cluster on
partial writes. Though that code is questionable, too, and XFS people
suggest that we should stop using it.
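For reference, the contract that the flag gives callers looks roughly
like this at the syscall level (a simplified sketch, not the actual
file-posix code; zero_range() is made up for this example):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdbool.h>
    #include <unistd.h>

    /* Zero [off, off+len) of fd. With no_fallback set, succeed only if
     * the kernel can do it efficiently (here: by punching a hole);
     * otherwise return -ENOTSUP so the caller can decide to do the
     * slow writes itself. */
    static int zero_range(int fd, off_t off, off_t len, bool no_fallback)
    {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      off, len) == 0) {
            return 0;           /* deallocated; reads back as zeros */
        }
        if (no_fallback) {
            return -ENOTSUP;    /* the BDRV_REQ_NO_FALLBACK case */
        }

        /* Slow path: emulate by writing zeroed buffers */
        static const char zeros[64 * 1024];
        while (len > 0) {
            size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len
                                                      : sizeof(zeros);
            ssize_t n = pwrite(fd, zeros, chunk, off);
            if (n < 0) {
                return -errno;
            }
            off += n;
            len -= n;
        }
        return 0;
    }

Kevin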