On Wed, Feb 27, 2019 at 06:56:10PM +0000, Filipe Manana wrote:
> > > What do you expect by falling back to writeback_inodes_sb()?
> > > It all ends up going through the same btrfs writeback path.
> > > And as I left it, if an error happens for one root, it still tries to
> > > flush writeback for all the remaining roots as well, so I don't get it
> > > why you fallback to writeback_inodes_sb().
> >
> > As I read the changelog, you say that the corruption does not happen
> > with FLUSHONCOMMIT, which does writeback_inodes_sb. Using that would be
> > too heavy, thus you only start the delalloc on snapshotted roots.
> 
> It does not happen with flushoncommit because delalloc is flushed for all 
> roots.
> If flushoncommit (writeback_inodes_sb()) would flush only for roots
> that are about to be snapshotted, the corruption wouldn't happen as
> well.
> 
> >
> > So the idea is to use the same logic of flushoncommit in the unlikely
> > case when btrfs_start_delalloc_snapshot would fail. Only at some
> > performance cost, unless I'm missing something.
> 
> Ok, so I hope the first paragraph above explains why there's no need
> to fallback to the "flush all delalloc for all roots" logic
> (writeback_inodes_sb()).

Yeah, I think I get it now.

> > As for the v2 as you implement it without any error handling, doesn't
> > this allow the corruption to happen? If start_delalloc_inodes has a lot
> > of inodes for which it needs to allocate delalloc_work, the failure is
> > possible. That the list_for_each continues does not affect that
> > particular root.
> 
> Yes, it allows for the corruption to happen, that's why I had the
> error returned in v1.
> 
> writeback_inodes_sb() isn't special in any way - flushing some
> delalloc range can fail the same way, it ends up calling the same
> btrfs writepages() callback. Yes, it doesn't return any error, because
> it's relying on callers either not caring about it, or if they do,
> checking the inode's mapping for an error, which btrfs sets through
> its writepages() callback if any an error happens (by calling
> mapping_set_error()),
> or any other fs specific way of storing/checking for writeback errors.

Ok, so the the error handling needs to stay and there's no simple way
around it.

> If we get an error when flushing the dealloc range from the example in
> the changelog, then the corruption happens, regardless of writeback
> being
> triggered by writeback_inodes_sb() or btrfs_start_delalloc_snapshot().

Thanks for the time discussing that. I'll use code from v1 and the
subject from v2 and add the patch to the queue.

Reply via email to