On Thu, Aug 22, 2019 at 10:03 AM Niels de Vos <nde...@redhat.com> wrote:
> On Wed, Aug 21, 2019 at 07:04:17PM +0200, Max Reitz wrote:
> > On 17.08.19 23:21, Nir Soffer wrote:
> > > Implement alignment probing similar to file-posix, by reading from the
> > > first 4k of the image.
> > >
> > > Before this change, provisioning a VM on storage with sector size of
> > > 4096 bytes would fail when the installer try to create filesystems. Here
> > > is an example command that reproduces this issue:
> > >
> > >     $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
> > >         -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
> > >         -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso
> > >
> > > The installer fails in few seconds when trying to create filesystem on
> > > /dev/mapper/fedora-root. In error report we can see that it failed with
> > > EINVAL (I could not extract the error from guest).
> > >
> > > Copying disk fails with EINVAL:
> > >
> > >     $ qemu-img convert -p -f raw -O raw -t none -T none \
> > >         gluster://gluster1/gv0/fedora29.raw \
> > >         gluster://gluster1/gv0/fedora29-clone.raw
> > >     qemu-img: error while writing sector 4190208: Invalid argument
> > >
> > > This is a fix to same issue fixed in commit a6b257a08e3d (file-posix:
> > > Handle undetectable alignment) for gluster:// images.
> > >
> > > This fix has the same limit, that the first block of the image should be
> > > allocated, otherwise we cannot detect the alignment and fallback to a
> > > safe value (4096) even when using storage with sector size of 512 bytes.
> > >
> > > Signed-off-by: Nir Soffer <nsof...@redhat.com>
> > > ---
> > >  block/gluster.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 47 insertions(+)
> > >
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index f64dc5b01e..d936240b72 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -52,6 +52,9 @@
> > >
> > >  #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
> > >
> > > +/* The value is known only on the server side. */
> > > +#define MAX_ALIGN 4096
> > > +
> > >  typedef struct GlusterAIOCB {
> > >      int64_t size;
> > >      int ret;
> > > @@ -902,8 +905,52 @@ out:
> > >      return ret;
> > >  }
> > >
> > > +/*
> > > + * Check if read is allowed with given memory buffer and length.
> > > + *
> > > + * This function is used to check O_DIRECT request alignment.
> > > + */
> > > +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
> > > +{
> > > +    ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
> > > +    return ret >= 0 || errno != EINVAL;
> >
> > Is glfs_pread() guaranteed to return EINVAL on invalid alignment?
> > file-posix says this is only the case on Linux (for normal files). Now
> > I also don’t know whether the gluster driver works on anything but Linux
> > anyway.
>
> The behaviour depends on the filesystem used by the Gluster backend. XFS
> is the recommendation, but in the end it is up to the users. The Gluster
> server is known to work on Linux, NetBSD and FreeBSD, the vast majority
> of users runs it on Linux.
>
> I do not think there is a strong guarantee EINVAL is always correct. How
> about only checking 'ret > 0'?
>

Looks like we don't have a choice.

>
> > > +}
> > > +
> > > +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
> > > +                                    Error **errp)
> > > +{
> > > +    char *buf;
> > > +    size_t alignments[] = {1, 512, 1024, 2048, 4096};
> > > +    size_t align;
> > > +    int i;
> > > +
> > > +    buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
> > > +
> > > +    for (i = 0; i < ARRAY_SIZE(alignments); i++) {
> > > +        align = alignments[i];
> > > +        if (gluster_is_io_aligned(fd, buf, align)) {
> > > +            /* Fallback to safe value. */
> > > +            bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
> > > +            break;
> > > +        }
> > > +    }
> >
> > I don’t like the fact that the last element of alignments[] should be
> > the same as MAX_ALIGN, without that ever having been made explicit anywhere.
> >
> > It’s a bit worse in the file-posix patch, because if getpagesize() is
> > greater than 4k, max_align will be greater than 4k. But MAX_BLOCKSIZE
> > is 4k, too, so I suppose we wouldn’t support any block size beyond that
> > anyway, which makes the error message appropriate still.
> >
> > > +
> > > +    qemu_vfree(buf);
> > > +
> > > +    if (!bs->bl.request_alignment) {
> > > +        error_setg(errp, "Could not find working O_DIRECT alignment");
> > > +        error_append_hint(errp, "Try cache.direct=off\n");
> > > +    }
> > > +}
> > > +
> > >  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
> > >  {
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +
> > > +    gluster_probe_alignment(bs, s->fd, errp);
> > > +
> > > +    bs->bl.min_mem_alignment = bs->bl.request_alignment;
> >
> > Well, I’ll just trust you that there is no weird system where the memory
> > alignment is greater than the request alignment.
> >
> > > +    bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);
> >
> > Isn’t request_alignment guaranteed to not exceed MAX_ALIGN, i.e. isn’t
> > this always MAX_ALIGN?
> >
> > >      bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
> > >  }
> >
> > file-posix has a check in raw_reopen_prepare() whether we can find a
> > working alignment for the new FD. Shouldn’t we do the same in
> > qemu_gluster_reopen_prepare()?
>
> Yes, I think that makes sense too.
>

I'll add it in v2.

Thanks for reviewing.

Nir
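
For reference, the probing loop discussed in this thread can be sketched outside QEMU. This is only an illustration and not the v2 patch. It treats a probe read as aligned only when it returns data (ret > 0), as suggested above, so it does not rely on any particular errno. It also writes MAX_ALIGN into the candidate table so that the "last element equals MAX_ALIGN" invariant is explicit. The names read_fn, probe_request_alignment, struct fake_backend and fake_backend_read are hypothetical; the fake backend merely stands in for glfs_pread() on an O_DIRECT file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define MAX_ALIGN 4096

/* Hypothetical stand-in for glfs_pread(): read len bytes at offset 0. */
typedef ssize_t (*read_fn)(void *opaque, void *buf, size_t len);

/* Candidate alignments, smallest first; the last entry is MAX_ALIGN. */
static const size_t alignments[] = {1, 512, 1024, 2048, MAX_ALIGN};
#define N_ALIGNMENTS (sizeof(alignments) / sizeof(alignments[0]))

/*
 * Return the smallest candidate alignment the backend accepts, or 0 if
 * none works. A read counts as aligned only if it returned data.
 */
static size_t probe_request_alignment(read_fn do_read, void *opaque, void *buf)
{
    /* Make the implicit invariant from the review explicit. */
    assert(alignments[N_ALIGNMENTS - 1] == MAX_ALIGN);

    for (size_t i = 0; i < N_ALIGNMENTS; i++) {
        size_t align = alignments[i];
        if (do_read(opaque, buf, align) > 0) {
            /*
             * If even a 1-byte read works, the real alignment cannot be
             * detected, so fall back to the safe value.
             */
            return (align != 1) ? align : MAX_ALIGN;
        }
    }
    return 0;
}

/* Fake backend (assumption): rejects lengths not a multiple of sector_size. */
struct fake_backend {
    size_t sector_size;
};

static ssize_t fake_backend_read(void *opaque, void *buf, size_t len)
{
    const struct fake_backend *b = opaque;

    (void)buf; /* the fake backend never touches the buffer */
    if (len % b->sector_size != 0) {
        errno = EINVAL;
        return -1;
    }
    return (ssize_t)len;
}
```

With this fake backend, a sector size of 512 or 4096 is reported as-is, a backend that accepts 1-byte reads falls back to MAX_ALIGN, and a sector size above MAX_ALIGN yields 0, which corresponds to the "Could not find working O_DIRECT alignment" error path in the patch.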