Re: [Qemu-block] [PATCH] block: gluster: Probe alignment limits

Nir Soffer Thu, 22 Aug 2019 12:06:32 -0700

On Thu, Aug 22, 2019 at 10:03 AM Niels de Vos <nde...@redhat.com> wrote:


> On Wed, Aug 21, 2019 at 07:04:17PM +0200, Max Reitz wrote:
> > On 17.08.19 23:21, Nir Soffer wrote:
> > > Implement alignment probing similar to file-posix, by reading from the
> > > first 4k of the image.
> > >
> > > Before this change, provisioning a VM on storage with a sector size of
> > > 4096 bytes would fail when the installer tries to create filesystems. Here
> > > is an example command that reproduces this issue:
> > >
> > >     $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
> > >         -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
> > >         -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso
> > >
> > > The installer fails within a few seconds when trying to create a filesystem
> > > on /dev/mapper/fedora-root. The error report shows that it failed with
> > > EINVAL (I could not extract the error from the guest).
> > >
> > > Copying disk fails with EINVAL:
> > >
> > >     $ qemu-img convert -p -f raw -O raw -t none -T none \
> > >         gluster://gluster1/gv0/fedora29.raw \
> > >         gluster://gluster1/gv0/fedora29-clone.raw
> > >     qemu-img: error while writing sector 4190208: Invalid argument
> > >
> > > This is a fix for the same issue fixed in commit a6b257a08e3d (file-posix:
> > > Handle undetectable alignment), applied to gluster:// images.
> > >
> > > This fix has the same limitation: the first block of the image must be
> > > allocated, otherwise we cannot detect the alignment and fall back to a
> > > safe value (4096) even when using storage with a sector size of 512 bytes.
> > >
> > > Signed-off-by: Nir Soffer <nsof...@redhat.com>
> > > ---
> > >  block/gluster.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 47 insertions(+)
> > >
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index f64dc5b01e..d936240b72 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -52,6 +52,9 @@
> > >
> > >  #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
> > >
> > > +/* The value is known only on the server side. */
> > > +#define MAX_ALIGN 4096
> > > +
> > >  typedef struct GlusterAIOCB {
> > >      int64_t size;
> > >      int ret;
> > > @@ -902,8 +905,52 @@ out:
> > >      return ret;
> > >  }
> > >
> > > +/*
> > > + * Check if read is allowed with given memory buffer and length.
> > > + *
> > > + * This function is used to check O_DIRECT request alignment.
> > > + */
> > > +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
> > > +{
> > > +    ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
> > > +    return ret >= 0 || errno != EINVAL;
> >
> > Is glfs_pread() guaranteed to return EINVAL on invalid alignment?
> > file-posix says this is only the case on Linux (for normal files).  Now
> > I also don’t know whether the gluster driver works on anything but Linux
> > anyway.
>
> The behaviour depends on the filesystem used by the Gluster backend. XFS
> is the recommendation, but in the end it is up to the users. The Gluster
> server is known to work on Linux, NetBSD and FreeBSD, the vast majority
> of users runs it on Linux.
>
> I do not think there is a strong guarantee EINVAL is always correct. How
> about only checking 'ret > 0'?
>

Looks like we don't have a choice.
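
To make the idea concrete, here is a minimal sketch of a probe that looks
only at the return value and ignores errno. It is not the v2 patch: the
probe_read_fn callback and fake_pread() are hypothetical stand-ins for
glfs_pread() on a real fd, so that the logic can be compiled and tried
without a gluster volume. The fallback limit is taken from the last entry
of the table, so the two values cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical read callback standing in for glfs_pread() at offset 0. */
typedef ssize_t (*probe_read_fn)(void *opaque, void *buf, size_t len);

/* Candidate request sizes, smallest first; the last one is the fallback. */
static const size_t alignments[] = {1, 512, 1024, 2048, 4096};
#define N_ALIGNMENTS (sizeof(alignments) / sizeof(alignments[0]))

/*
 * Return the detected request alignment, or 0 if no candidate worked.
 * Only the return value of the read is checked; errno is not consulted.
 */
static size_t probe_request_alignment(probe_read_fn read_fn, void *opaque)
{
    const size_t max_align = alignments[N_ALIGNMENTS - 1];
    size_t found = 0;
    void *buf;

    assert(N_ALIGNMENTS > 0);

    /* The buffer is aligned to the largest candidate, so only len varies. */
    buf = aligned_alloc(max_align, max_align);
    if (!buf) {
        return 0;
    }

    for (size_t i = 0; i < N_ALIGNMENTS; i++) {
        if (read_fn(opaque, buf, alignments[i]) > 0) {
            /* A 1-byte read that works tells us nothing: use the fallback. */
            found = (alignments[i] != 1) ? alignments[i] : max_align;
            break;
        }
    }

    free(buf);
    return found;
}

/* Simulated backend: a read works only if len is a multiple of the sector. */
static ssize_t fake_pread(void *opaque, void *buf, size_t len)
{
    size_t sector = *(size_t *)opaque;

    (void)buf;
    return (len % sector == 0) ? (ssize_t)len : -1;
}
```

With 'ret > 0', a read that returns 0 counts as a failed probe, so in this
sketch a backend that never returns data ends up with no alignment at all
(the function returns 0) instead of the fallback value.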

>
> > > +}
> > > +
> > > +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
> > > +                                    Error **errp)
> > > +{
> > > +    char *buf;
> > > +    size_t alignments[] = {1, 512, 1024, 2048, 4096};
> > > +    size_t align;
> > > +    int i;
> > > +
> > > +    buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
> > > +
> > > +    for (i = 0; i < ARRAY_SIZE(alignments); i++) {
> > > +        align = alignments[i];
> > > +        if (gluster_is_io_aligned(fd, buf, align)) {
> > > +            /* Fallback to safe value. */
> > > +            bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
> > > +            break;
> > > +        }
> > > +    }
> >
> > I don’t like the fact that the last element of alignments[] should be
> > the same as MAX_ALIGN, without that ever having been made explicit anywhere.
> >
> > It’s a bit worse in the file-posix patch, because if getpagesize() is
> > greater than 4k, max_align will be greater than 4k.  But MAX_BLOCKSIZE
> > is 4k, too, so I suppose we wouldn’t support any block size beyond that
> > anyway, which makes the error message appropriate still.
> >
> > > +
> > > +    qemu_vfree(buf);
> > > +
> > > +    if (!bs->bl.request_alignment) {
> > > +        error_setg(errp, "Could not find working O_DIRECT alignment");
> > > +        error_append_hint(errp, "Try cache.direct=off\n");
> > > +    }
> > > +}
> > > +
> > >  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
> > >  {
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +
> > > +    gluster_probe_alignment(bs, s->fd, errp);
> > > +
> > > +    bs->bl.min_mem_alignment = bs->bl.request_alignment;
> >
> > Well, I’ll just trust you that there is no weird system where the memory
> > alignment is greater than the request alignment.
> >
> > > +    bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);
> >
> > Isn’t request_alignment guaranteed to not exceed MAX_ALIGN, i.e. isn’t
> > this always MAX_ALIGN?
> >
> > >      bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
> > >  }
> >
> > file-posix has a check in raw_reopen_prepare() whether we can find a
> > working alignment for the new FD.  Shouldn’t we do the same in
> > qemu_gluster_reopen_prepare()?
>
> Yes, I think that makes sense too.
>

I'll add it in v2.

Thanks for reviewing.

Nir
