On Wed, Apr 06, 2016 at 04:47:32PM +0200, Niels de Vos wrote: > On Wed, Apr 06, 2016 at 08:44:18AM -0400, Jeff Cody wrote: > > On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote: > > > [ Adding some CCs ] > > > > > > Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben: > > > > Upon receiving an I/O error after an fsync, by default gluster will > > > > dump its cache. However, QEMU will retry the fsync, which is especially > > > > useful when encountering errors such as ENOSPC when using the > > > > werror=stop > > > > option. When using caching with gluster, however, the last written data > > > > will be lost upon encountering ENOSPC. Using the cache xlator option of > > Use "write-behind xlator" instead of "cache xlator". There are different > caches in Gluster. >
Thanks, I'll update the commit message. > > > > 'resync-failed-syncs-after-fsync' should cause gluster to retain the > > > > cached data after a failed fsync, so that ENOSPC and other transient > > > > errors are recoverable. > > > > > > > > Signed-off-by: Jeff Cody <jc...@redhat.com> > > > > --- > > > > block/gluster.c | 27 +++++++++++++++++++++++++++ > > > > configure | 8 ++++++++ > > > > 2 files changed, 35 insertions(+) > > > > > > > > diff --git a/block/gluster.c b/block/gluster.c > > > > index 30a827e..b1cf71b 100644 > > > > --- a/block/gluster.c > > > > +++ b/block/gluster.c > > > > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs, > > > > QDict *options, > > > > goto out; > > > > } > > > > > > > > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT > > > > + /* Without this, if fsync fails for a recoverable reason (for > > > > instance, > > > > + * ENOSPC), gluster will dump its cache, preventing retries. This > > > > means > > > > + * almost certain data loss. Not all gluster versions support the > > > > + * 'resync-failed-syncs-after-fsync' key value, but there is no > > > > way to > > > > + * discover during runtime if it is supported (this api returns > > > > success for > > > > + * unknown key/value pairs) */ > > > > > > Honestly, this sucks. There is apparently no way to operate gluster so > > > we can safely recover after a failed fsync. "We hope everything is fine, > > > but depending on your gluster version, we may now corrupt your image" > > > isn't very good. > > > > > > We need to consider very carefully if this is good enough to go on after > > > an error. I'm currently leaning towards "no". That is, we should only > > > enable this after Gluster provides us a way to make sure that the option > > > is really set. > > Unfortunately it is also possible to disable the write-behind xlator as > well. This would cause setting the option to fail too :-/ At the moment > there is no real useful way to detect if write-behind is disabled (it is > enabled by default). > > > > > + ret = glfs_set_xlator_option (s->glfs, "*-write-behind", > > > > + > > > > "resync-failed-syncs-after-fsync", > > > > + "on"); > > > > + if (ret < 0) { > > > > + error_setg_errno(errp, errno, "Unable to set xlator key/value > > > > pair"); > > > > + ret = -errno; > > > > + goto out; > > > > + } > > > > +#endif > > > > > > We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT. > > > In this case (as well as theoretically in the case that the option > > > didn't take effect - if only we could know about it), a failed > > > glfs_fsync_async() is fatal and we need to stop operating on the image, > > > i.e. set bs->drv = NULL like when we detect corruption in qcow2 images. > > > The guest will see a broken disk that fails all I/O requests, but that's > > > better than corrupting data. > > > > > > > Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will > > also not have the gluster patch that removes the file descriptor > > invalidation upon error (unless that was a relatively new > > bug/feature). But if that is the case, every operation on the file > > descriptor in those versions will return error. But it is also rather > > old versions that don't support glfs_set_xlator_option() I believe. > > Indeed, glfs_set_xlator_option() was introduced with glusterfs-3.4.0. We > are currently on glusterfs-3.7, with the oldest supported version of > 3.5. In ~2 months we hopefully have a 3.8 release and that will cause Is it possible for 3.8 to be able to validate key/value pairs in glfs_set_xlator_option()? Or is it something that by design is decoupled enough that it isn't feasible? This is the second instance I know of that a lack of error information from the gluster api has made it difficult to determine if a feature is supported (the other instance being SEEK_DATA/SEEK_HOLE). It'd be nice if error returns were more consistent in 3.8 in the API, is possible. > the end-of-life of 3.5. 3.4 has been EOL for about a year now, hopefully > all our users have upgraded, but we know that some users will stay on > unsupported versions for a long time... > > However, the "resync-failed-syncs-after-fsync" option was only > introduced recently, with glusterfs-3.7.9. You could detect this with > pkg-config glusterfs-api >= 7.3.7.9 if need to be. > Unfortunately, compile-time detection of the specific key option isn't a great option, since we are using gluster as a shared library. Backports for various distributions may or may not have the "resync-failed-syncs-after-fsync" key in it regardless of versioning, and it isn't a new symbol that is detectable by the linker or at runtime. Additionally, QEMU on a system with gluster version 3.7.8, for instance, would have no way of knowing that "resync-failed-syncs-after-fsync" does nothing. The reason for the compile-time check for glfs_set_xlator_option() is because it is a symbol that will be looked up when QEMU is linked (and executed). In contrast, if the user downgrades gluster on their system to a version that doesn't have glfs_set_xlator_option(), QEMU won't run, so the breakage will be visible. > More details about the problem the option addresses are in the commit > message on http://review.gluster.org/13057 . > Thanks, that is useful! Jeff