On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> [ Adding some CCs ]
> 
> Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > Upon receiving an I/O error after an fsync, by default gluster will
> > dump its cache.  However, QEMU will retry the fsync, which is especially
> > useful when encountering errors such as ENOSPC when using the werror=stop
> > option.  When using caching with gluster, however, the last written data
> > will be lost upon encountering ENOSPC.  Using the cache xlator option of
> > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > cached data after a failed fsync, so that ENOSPC and other transient
> > errors are recoverable.
> > 
> > Signed-off-by: Jeff Cody <jc...@redhat.com>
> > ---
> >  block/gluster.c | 27 +++++++++++++++++++++++++++
> >  configure       |  8 ++++++++
> >  2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 30a827e..b1cf71b 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >          goto out;
> >      }
> > 
> > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > +    /* Without this, if fsync fails for a recoverable reason (for instance,
> > +     * ENOSPC), gluster will dump its cache, preventing retries.  This means
> > +     * almost certain data loss.  Not all gluster versions support the
> > +     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > +     * discover during runtime if it is supported (this api returns success for
> > +     * unknown key/value pairs) */
> 
> Honestly, this sucks. There is apparently no way to operate gluster so
> we can safely recover after a failed fsync. "We hope everything is fine,
> but depending on your gluster version, we may now corrupt your image"
> isn't very good.
> 
> We need to consider very carefully if this is good enough to go on after
> an error. I'm currently leaning towards "no".
> That is, we should only
> enable this after Gluster provides us a way to make sure that the option
> is really set.
Thinking from the viewpoint of what is achievable for 2.6:

* This patch is an improvement over the current state.  It will at least
  provide protection on more recent versions of Gluster.

The question is, are there Gluster versions that will actually result in
inconsistent data if this resync fsync option is not supported?  There
was another change in Gluster around v3.7.5 (can someone from Gluster
confirm?) that removed Gluster's file descriptor invalidation upon any
write I/O error.  If that version is correct, that means Gluster
versions 3.7.5 - 3.7.8 are problematic on fsync failure with
write-behind caching.

Given that, here are some alternatives that are feasible for 2.6:

A) We could decide that the lack of discoverability of this option means
   we treat all Gluster versions as not having the option.  On an fsync
   failure, we always switch over to read-only.  There will be data loss
   (perhaps even on some Gluster version where it could have otherwise
   been avoided), but we will have consistency.

B) Another option--sub-optimal from a performance perspective--is to
   just always open Gluster with O_DIRECT and the 'strict-o-direct'
   performance key value set to true.

Beyond just 2.6:

There is a larger question for QEMU, per Ric's point on other filesystem
behavior on fsync failure (different thread), and given that the POSIX
spec leaves cache validity after an fsync I/O failure unspecified [1].
Is QEMU overall currently doing the correct thing on a bdrv_flush()
failure?  That is, how certain are we that the other underlying protocol
drivers (including raw-posix) can safely retry an fsync?  Given the
wording of the POSIX spec, should we move all images to read-only upon a
bdrv_flush() failure (unless we know explicitly otherwise from the
protocol driver)?  And if _POSIX_SYNCHRONIZED_IO is not set, are we even
safe simply setting an image to r/o?
[1] From another thread, courtesy of Eric Blake:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html

Jeff

> > +    ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > +                                  "resync-failed-syncs-after-fsync",
> > +                                  "on");
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +#endif
> 
> We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> In this case (as well as theoretically in the case that the option
> didn't take effect - if only we could know about it), a failed
> glfs_fsync_async() is fatal and we need to stop operating on the image,
> i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> The guest will see a broken disk that fails all I/O requests, but that's
> better than corrupting data.
> 
> Kevin