Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-06-02 Thread Jeff Layton
On Thu, 2017-06-01 at 23:25 -0600, Ross Zwisler wrote:
> On Wed, May 31, 2017 at 08:45:23AM -0400, Jeff Layton wrote:
> > v5: don't retrofit old API over the new infrastructure
> > add fstype flag to indicate how wb errors are tracked within that fs
> > add more function variants that take an errseq_t "since" value
> > add second errseq_t to struct file to track metadata wb errors
> > convert ext4 and ext2 to use the new APIs
> > 
> > v4: several more cleanup patches
> > documentation and kerneldoc comment updates
> > fix bugs in gfs2 patches
> > make sync_file_range use same error reporting semantics
> > bugfixes in buffer.c
> > convert nfs to new scheme (maybe bogus, can be dropped)
> > 
> > v3: wb_err_t -> errseq_t conversion
> > clean up places that re-set errors after calling filemap_* functions
> > 
> > v2: introduce wb_err_t, use atomics
> > 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback. The main difference in
> > this set from the last one is that I've stopped trying to retrofit the
> > old error tracking API on top of the new one. This is more work since
> > we'll have to touch each fs individually, but should be safer as the
> > "since" values used for checking errors will be more deliberate.
> > 
> > There are several situations where the kernel can "lose" errors that
> > occur during writeback, such that fsync will return success even
> > though it failed to write back some data previously. The basic idea
> > here is to have the kernel be more deliberate about the point from
> > which errors are checked to ensure that that doesn't happen.
> > 
> > An additional aim of this set is to change the behavior of fsync in
> > Linux to report writeback errors on all fds instead of just the first
> > one. This allows writers to reliably tell whether their data made it to
> > the backing device without having to coordinate fsync calls with other
> > writers.
> > 
> > To do this, we add a new typedef: errseq_t. This is a 32-bit value
> > that can store an error code, and a sequence number so we can tell
> > whether it has changed since we last sampled it. This allows us to
> > record errors in the address_space and then report those errors only
> > once per file description.
> > 
> > This set just alters block device files, ext4 and the legacy ext2
> > driver. If this general approach seems acceptable, then I'll start
> > converting other filesystems in follow-on patchsets. I'd also like
> > to get this into linux-next as soon as possible to ensure that we're
> > banging out any bugs that might be lurking here.
> > 
> > I also have a couple of xfstests for this that I'll re-post
> > soon.
> 
> Can you tell me a baseline that this applies cleanly to, or give me a link to
> a tree with these patches already applied?  I've tried applying it to v4.11,
> linux/master and mmots/master, and so far nothing has worked.

It's basically on top of v4.12-rc3, but it may not apply cleanly
without the pile of individual patches that I sent recently.

It may be best to just pull down the "wberr" branch from my tree here:

git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git

I was originally sending the prep patches as part of this series, but
maintainers weren't picking them up, so I moved to sending them
individually and then sending this pile as its own set.

Many thanks for giving this a look and testing it!
-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-06-01 Thread Ross Zwisler
On Wed, May 31, 2017 at 08:45:23AM -0400, Jeff Layton wrote:
> v5: don't retrofit old API over the new infrastructure
> add fstype flag to indicate how wb errors are tracked within that fs
> add more function variants that take an errseq_t "since" value
> add second errseq_t to struct file to track metadata wb errors
> convert ext4 and ext2 to use the new APIs
> 
> v4: several more cleanup patches
> documentation and kerneldoc comment updates
> fix bugs in gfs2 patches
> make sync_file_range use same error reporting semantics
> bugfixes in buffer.c
> convert nfs to new scheme (maybe bogus, can be dropped)
> 
> v3: wb_err_t -> errseq_t conversion
> clean up places that re-set errors after calling filemap_* functions
> 
> v2: introduce wb_err_t, use atomics
> 
> This is v5 of the patchset to improve how we're tracking and reporting
> errors that occur during pagecache writeback. The main difference in
> this set from the last one is that I've stopped trying to retrofit the
> old error tracking API on top of the new one. This is more work since
> we'll have to touch each fs individually, but should be safer as the
> "since" values used for checking errors will be more deliberate.
> 
> There are several situations where the kernel can "lose" errors that
> occur during writeback, such that fsync will return success even
> though it failed to write back some data previously. The basic idea
> here is to have the kernel be more deliberate about the point from
> which errors are checked to ensure that that doesn't happen.
> 
> An additional aim of this set is to change the behavior of fsync in
> Linux to report writeback errors on all fds instead of just the first
> one. This allows writers to reliably tell whether their data made it to
> the backing device without having to coordinate fsync calls with other
> writers.
> 
> To do this, we add a new typedef: errseq_t. This is a 32-bit value
> that can store an error code, and a sequence number so we can tell
> whether it has changed since we last sampled it. This allows us to
> record errors in the address_space and then report those errors only
> once per file description.
> 
> This set just alters block device files, ext4 and the legacy ext2
> driver. If this general approach seems acceptable, then I'll start
> converting other filesystems in follow-on patchsets. I'd also like
> to get this into linux-next as soon as possible to ensure that we're
> banging out any bugs that might be lurking here.
> 
> I also have a couple of xfstests for this that I'll re-post
> soon.

Can you tell me a baseline that this applies cleanly to, or give me a link to
a tree with these patches already applied?  I've tried applying it to v4.11,
linux/master and mmots/master, and so far nothing has worked.


Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-05-31 Thread Jeff Layton
On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton  wrote:
> 
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton  wrote:
> > > 
> > > > This is v5 of the patchset to improve how we're tracking and reporting
> > > > errors that occur during pagecache writeback.
> > > 
> > > I'm curious to know how you've been testing this?  Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> > > 
> > 
> > That's a tall order. This is a difficult thing to test as these sorts of
> > errors are pretty rare by nature.
> > 
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of this
> > set, so I think we can fix this in them as well without too much
> > difficulty.
> > 
> > I'm happy to run other tests if someone wants to suggest them.
> > 
> > Now, all that said, I don't think this will make things any worse than
> > they are today as far as reporting errors properly to userland goes.
> > It's rather easy for an incidental synchronous writeback request from an
> > internal caller to clear the AS_* flags today. This will at least ensure
> > that we're reporting errors since a well-defined point in time when you
> > call fsync.
> 
> Were you using error injection of some form?  If so, how was that all
> set up?
> 

Yes, it uses dm-error for fault injection.

The test basically does:

1) set up a dm-error device in a working configuration

2) build a scratch filesystem on it, with the log on a different device
in some fashion so metadata writeback will still succeed.

3) open the same file several times

4) flip dm-error device to non-working mode

5) write to each fd

6) fsync each fd

...do you get back an error on each fsync?

It then does a bit more to make sure they're cleared afterward as you'd
expect. That works for most block device based filesystems. I also have
a second xfstest that opens a block device and does the same basic
thing. That also works correctly with this patch series.
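
For anyone who wants to poke at this by hand, the userspace half of the
steps above (3, 5 and 6) could be sketched roughly like this. It is not
the actual xfstest: count_fsync_errors(), WBERR_MAX_FDS and the
break_device callback are names made up for illustration, and flipping
the dm-error table (step 4) is assumed to happen inside that callback or
out of band.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define WBERR_MAX_FDS 16	/* hypothetical limit for this sketch */

/*
 * Open `path` nfds times, write through each fd, then fsync each fd.
 * `break_device` is a hypothetical hook called between open and write
 * (where the test would flip dm-error); it may be NULL.
 * Returns the number of fsync() calls that failed, or -1 if the
 * open/write setup itself failed.
 */
int count_fsync_errors(const char *path, int nfds, void (*break_device)(void))
{
	static const char buf[] = "wberr test data\n";
	int fds[WBERR_MAX_FDS];
	int opened = 0, failures = 0, i;

	assert(nfds > 0 && nfds <= WBERR_MAX_FDS);

	/* step 3: open the same file several times */
	for (i = 0; i < nfds; i++) {
		fds[i] = open(path, O_WRONLY | O_CREAT, 0644);
		if (fds[i] < 0)
			goto fail;
		opened++;
	}

	/* step 4: flip the device to non-working mode, if a hook was given */
	if (break_device)
		break_device();

	/* step 5: write to each fd (buffered, so this normally succeeds) */
	for (i = 0; i < nfds; i++) {
		if (write(fds[i], buf, sizeof(buf) - 1) < 0)
			goto fail;
	}

	/* step 6: fsync each fd and count how many report an error */
	for (i = 0; i < nfds; i++) {
		if (fsync(fds[i]) < 0)
			failures++;
	}

	for (i = 0; i < opened; i++)
		close(fds[i]);
	return failures;

fail:
	for (i = 0; i < opened; i++)
		close(fds[i]);
	return -1;
}
```

On a healthy device this should return 0. With the device broken before
the writes, the behavior this series is aiming for is a return value
equal to nfds, i.e. an error on every fsync rather than only the first.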

I still need to come up with a way to simulate errors on other
filesystems though. We may need to plumb in some kernel-level fault
injection on some of them to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.

-- 
Jeff Layton 


Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-05-31 Thread Andrew Morton
On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton  wrote:

> On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton  wrote:
> > 
> > > This is v5 of the patchset to improve how we're tracking and reporting
> > > errors that occur during pagecache writeback.
> > 
> > I'm curious to know how you've been testing this?  Is that testing
> > strong enough for us to be confident that all nature of I/O errors
> > will be reported to userspace?
> > 
> 
> That's a tall order. This is a difficult thing to test as these sorts of
> errors are pretty rare by nature.
> 
> I have an xfstest that I posted just after this set that demonstrates
> that it works correctly, at least on ext2/3/4 when run by the ext4
> driver (ext2 legacy driver reports too many errors currently). I had
> btrfs and xfs working on that test too in an earlier incarnation of this
> set, so I think we can fix this in them as well without too much
> difficulty.
> 
> I'm happy to run other tests if someone wants to suggest them.
> 
> Now, all that said, I don't think this will make things any worse than
> they are today as far as reporting errors properly to userland goes.
> It's rather easy for an incidental synchronous writeback request from an
> internal caller to clear the AS_* flags today. This will at least ensure
> that we're reporting errors since a well-defined point in time when you
> call fsync.

Were you using error injection of some form?  If so, how was that all
set up?



Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-05-31 Thread Jeff Layton
On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton  wrote:
> 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback.
> 
> I'm curious to know how you've been testing this?  Is that testing
> strong enough for us to be confident that all nature of I/O errors
> will be reported to userspace?
> 

That's a tall order. This is a difficult thing to test as these sorts of
errors are pretty rare by nature.

I have an xfstest that I posted just after this set that demonstrates
that it works correctly, at least on ext2/3/4 when run by the ext4
driver (ext2 legacy driver reports too many errors currently). I had
btrfs and xfs working on that test too in an earlier incarnation of this
set, so I think we can fix this in them as well without too much
difficulty.

I'm happy to run other tests if someone wants to suggest them.

Now, all that said, I don't think this will make things any worse than
they are today as far as reporting errors properly to userland goes.
It's rather easy for an incidental synchronous writeback request from an
internal caller to clear the AS_* flags today. This will at least ensure
that we're reporting errors since a well-defined point in time when you
call fsync.
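
To make the "well-defined point in time" idea a bit more concrete, here
is a toy userspace model of an error-plus-sequence value with a per-file
"since" cursor. This is not the code from the patchset: the toy_errseq_*
names and the bit layout are assumptions for illustration, and unlike
the real thing it makes no attempt to be atomic.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model: low bits hold the errno, the remaining bits a counter. */
typedef uint32_t toy_errseq_t;

#define TOY_ERR_BITS	12u
#define TOY_ERR_MASK	((1u << TOY_ERR_BITS) - 1u)

/* Record a writeback error (positive errno) and bump the counter. */
void toy_errseq_set(toy_errseq_t *eseq, int err)
{
	uint32_t seq;

	assert(err > 0 && (uint32_t)err <= TOY_ERR_MASK);
	seq = (*eseq >> TOY_ERR_BITS) + 1u;
	*eseq = (seq << TOY_ERR_BITS) | (uint32_t)err;
}

/* Sample the current value, e.g. when a file is opened. */
toy_errseq_t toy_errseq_sample(const toy_errseq_t *eseq)
{
	return *eseq;
}

/*
 * Report an error recorded after `since` was last updated, then move
 * the cursor forward so the same error is reported once per cursor.
 * Returns 0 or a negative errno.
 */
int toy_errseq_check_and_advance(const toy_errseq_t *eseq, toy_errseq_t *since)
{
	if (*eseq == *since)
		return 0;
	*since = *eseq;
	return -(int)(*eseq & TOY_ERR_MASK);
}
```

In this model the mapping holds one toy_errseq_t and each open file
keeps its own cursor, so clearing a flag on behalf of one caller can't
hide the error from another. Two cursors sampled before an EIO is
recorded would each get -EIO exactly once and 0 after that.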
-- 
Jeff Layton 


Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

2017-05-31 Thread Andrew Morton
On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton  wrote:

> This is v5 of the patchset to improve how we're tracking and reporting
> errors that occur during pagecache writeback.

I'm curious to know how you've been testing this?  Is that testing
strong enough for us to be confident that all nature of I/O errors
will be reported to userspace?
