Re: [PATCH] copy: adjust fiemap handling of sparse files

Chris Mason Wed, 30 Mar 2011 04:31:57 -0700

Excerpts from Pádraig Brady's message of 2011-03-30 07:02:44 -0400:
> On 28/03/11 23:06, Pádraig Brady wrote:
> > On 18/03/11 16:04, Pádraig Brady wrote:
> >> On 18/03/11 13:48, Chris Mason wrote:
> >>> Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400:
> >>>> On 18/03/11 12:04, Chris Mason wrote:
> >>>>> Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400:
> >>>>>> Pádraig Brady wrote:
> >>>>>>>
> >>>>>>> So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 
> >>>>>>> 2.6.38.
> >>>>>>> I'm quite worried about implicit syncing in cp actually, where it 
> >>>>>>> might
> >>>>>>> introduce bad performance regressions.
> >>>>>>
> >>>>>> Good point.
> >>>>>>
> >>>>>>> Maybe we could disable this flag if XFS || kernel >= 2.6.38?
> >>>>>>> Or maybe we might do as you suggest above and disable extent_copy()
> >>>>>>> completely, for kernels before 2.6.38.
> >>>>>>> Not an ideal situation at all :(
> >>>>>>
> >>>>>> If there really is risk of data loss with ext4 on kernels < 2.6.38,
> >>>>>> even if it's only on unusual files, then we'll have to consider
> >>>>>> using a kernel version check to disable extent_copy.
> >>>>>>
> >>>>>> Is there a stand-alone ext4 reproducer?
> >>>>>
> >>>>> dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1
> >>>>>
> >>>>> Without a sync the buggy ext4/btrfs will not find the extent, and report
> >>>>> only holes.
> >>>>
> >>>> OK, FIEMAP_FLAG_SYNC is an effective workaround for that.
> >>>> So as mentioned above, we might consider not using this flag for
> >>>> XFS || kernel >= 2.6.38
> >>>
> >>> Btrfs before 2.6.38 may have real trouble though, even with the sync.
> >>> We were returning overlapping ranges to you, so the destination would
> >>> end up bigger than the original.  This could be fixed in cp by making
> >>> sure to never seek backwards based on the extents returned.
> >>>
> >>> Example bad results:
> >>>
> >>> logical: [ 0 ... 1MB ] -> physical [ foo .. foo ]
> >>> logical  [ 0 ... 2MB ] -> physical [ foo2 .. foo2 ]
> >>>
> >>> 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to
> >>> zero and do the 0 .. 2MB bit.  If cp had not seeked backwards you would
> >>> have covered for me.  2.6.38 final fixed this problem.
> >>
> >>> I think even XFS was fixed relatively recently, 2.6.36 and later.
> >>> Looking at the commit, the simple dd test above wouldn't have triggered
> >>> it.  Actually, looking at this commit even the sync wouldn't save xfs
> >>> before 2.6.36, Eric am I reading this right?
> >>>
> >>> So maybe just give in and look for 2.6.38 instead of trying to remember
> >>> all the places where individual filesystems didn't suck.  Give the user
> >>> a cp --sparse=fiemap-im-really-really-sure to override your assumptions
> >>> about our bugs.
> >>>
> >>> commit 9af25465081480a75824fd7a16a37a5cfebeede9
> >>> Author: Tao Ma <[email protected]>
> >>> Date:   Mon Aug 30 02:44:03 2010 +0000
> >>>
> >>>     xfs: Make fiemap work with sparse files
> >>>     
> >>>     In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
> >>>     to return fi_extent_max extents, but actually it won't work for
> >>>     a sparse file. The reason is that in xfs_getbmap we will
> >>>     calculate holes and set it in 'out', while out is malloced by
> >>>     bmv_count(fi_extent_max+1) which didn't consider holes. So in the
> >>>     worst case, if 'out' vector looks like
> >>>     [hole, extent, hole, extent, hole, ... hole, extent, hole],
> >>>     we will only return half of fi_extent_max extents.
> >>
> >> OK thanks for the info. So:
> >>
> >> XFS may miss some extents for sparse files before 2.6.36
> >> BTRFS and EXT4 need a sync before fiemap() before 2.6.38
> >> BTRFS can return overlapping extents before 2.6.38
> >>
> >> It seems like we should at least detect overlapping extents in our:
> >> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD
> >> and return false (don't use fiemap) in this case.
> >> We've already stopped doing the lseeks() but that assumes non overlapping 
> >> extents.
> > 
> > Attached is the proposed patch to address the overlapping extents case,
> > which we'll need if we don't bar kernels before 2.6.38,
> > but is good to have anyway in case it happens in future.
> > 
> > I think I'll do a follow up patch to get extent_scan_read()
> > to read all extents before returning, which will improve
> > the fix (and the previous merge extents change).
> 
> I did that change, but was worried about the possible large
> mem requirements of scanning all extents up front,
> or the inefficiencies of scanning twice.
> So instead I took the approach of adjusting extents so that:
> 
> |  extent 1 |
> |               extent 2  |
> 
> is changed to
> 
> |  extent 1 |
> |           |   extent 2  |
> 
> That allows copying the data even in this case.
> Chris, will that handle this specific BTRFS issue?
> 
> I now also detect and just fail for these cases:
> 
> |  extent 1     |
> |  extent 2 |
> 
> or
>               |  extent 1  |
> |  extent 2 |


Great, that should work just fine.  Thanks!

-chris

Re: [PATCH] copy: adjust fiemap handling of sparse files

Reply via email to