On Wed, Feb 27, 2019 at 06:56:10PM +0000, Filipe Manana wrote: > > > What do you expect by falling back to writeback_inodes_sb()? > > > It all ends up going through the same btrfs writeback path. > > > And as I left it, if an error happens for one root, it still tries to > > > flush writeback for all the remaining roots as well, so I don't get it > > > why you fallback to writeback_inodes_sb(). > > > > As I read the changelog, you say that the corruption does not happen > > with FLUSHONCOMMIT, which does writeback_inodes_sb. Using that would be > > too heavy, thus you only start the delalloc on snapshotted roots. > > It does not happen with flushoncommit because delalloc is flushed for all > roots. > If flushoncommit (writeback_inodes_sb()) would flush only for roots > that are about to be snapshotted, the corruption wouldn't happen as > well. > > > > > So the idea is to use the same logic of flushoncommit in the unlikely > > case when btrfs_start_delalloc_snapshot would fail. Only at some > > performance cost, unless I'm missing something. > > Ok, so I hope the first paragraph above explains why there's no need > to fallback to the "flush all delalloc for all roots" logic > (writeback_inodes_sb()).
Yeah, I think I get it now. > > As for the v2 as you implement it without any error handling, doesn't > > this allow the corruption to happen? If start_delalloc_inodes has a lot > > of inodes for which it needs to allocate delalloc_work, the failure is > > possible. That the list_for_each continues does not affect that > > particular root. > > Yes, it allows for the corruption to happen, that's why I had the > error returned in v1. > > writeback_inodes_sb() isn't special in any way - flushing some > delalloc range can fail the same way, it ends up calling the same > btrfs writepages() callback. Yes, it doesn't return any error, because > it's relying on callers either not caring about it, or if they do, > checking the inode's mapping for an error, which btrfs sets through > its writepages() callback if any an error happens (by calling > mapping_set_error()), > or any other fs specific way of storing/checking for writeback errors. Ok, so the the error handling needs to stay and there's no simple way around it. > If we get an error when flushing the dealloc range from the example in > the changelog, then the corruption happens, regardless of writeback > being > triggered by writeback_inodes_sb() or btrfs_start_delalloc_snapshot(). Thanks for the time discussing that. I'll use code from v1 and the subject from v2 and add the patch to the queue.