Excerpts from Pádraig Brady's message of 2011-03-30 07:02:44 -0400: > On 28/03/11 23:06, Pádraig Brady wrote: > > On 18/03/11 16:04, Pádraig Brady wrote: > >> On 18/03/11 13:48, Chris Mason wrote: > >>> Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: > >>>> On 18/03/11 12:04, Chris Mason wrote: > >>>>> Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: > >>>>>> Pádraig Brady wrote: > >>>>>>> > >>>>>>> So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with > >>>>>>> 2.6.38. > >>>>>>> I'm quite worried about implicit syncing in cp actually, where it > >>>>>>> might > >>>>>>> introduce bad performance regressions. > >>>>>> > >>>>>> Good point. > >>>>>> > >>>>>>> Maybe we could disable this flag if XFS || kernel >= 2.6.38? > >>>>>>> Or maybe we might do as you suggest above and disable extent_copy() > >>>>>>> completely, for kernels before 2.6.38. > >>>>>>> Not an ideal situation at all :( > >>>>>> > >>>>>> If there really is risk of data loss with ext4 on kernels < 2.6.38, > >>>>>> even if it's only on unusual files, then we'll have to consider > >>>>>> using a kernel version check to disable extent_copy. > >>>>>> > >>>>>> Is there a stand-alone ext4 reproducer? > >>>>> > >>>>> dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 > >>>>> > >>>>> Without a sync the buggy ext4/btrfs will not find the extent, and report > >>>>> only holes. > >>>> > >>>> OK, FIEMAP_FLAG_SYNC is an effective workaround for that. > >>>> So as mentioned above, we might consider not using this flag for > >>>> XFS || kernel >= 2.6.38 > >>> > >>> Btrfs before 2.6.38 may have real trouble though, even with the sync. > >>> We were returning overlapping ranges to you, so the destination would > >>> end up bigger than the original. This could be fixed in cp by making > >>> sure to never seek backwards based on the extents returned. > >>> > >>> Example bad results: > >>> > >>> logical: [ 0 ... 1MB ] -> physical [ foo .. foo ] > >>> logical [ 0 ... 2MB ] -> physical [ foo2 .. foo2 ] > >>> > >>> 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to > >>> zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would > >>> have covered for me. 2.6.38 final fixed this problem. > >> > >>> I think even XFS was fixed relatively recently, 2.6.36 and later. > >>> Looking at the commit, the simple dd test above wouldn't have triggered > >>> it. Actually, looking at this commit even the sync wouldn't save xfs > >>> before 2.6.36, Eric am I reading this right? > >>> > >>> So maybe just give in and look for 2.6.38 instead of trying to remember > >>> all the places where individual filesystems didn't suck. Give the user > >>> a cp --sparse=fiemap-im-really-really-sure to override your assumptions > >>> about our bugs. > >>> > >>> commit 9af25465081480a75824fd7a16a37a5cfebeede9 > >>> Author: Tao Ma <tao...@oracle.com> > >>> Date: Mon Aug 30 02:44:03 2010 +0000 > >>> > >>> xfs: Make fiemap work with sparse files > >>> > >>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want > >>> to return fi_extent_max extents, but actually it won't work for > >>> a sparse file. The reason is that in xfs_getbmap we will > >>> calculate holes and set it in 'out', while out is malloced by > >>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the > >>> worst case, if 'out' vector looks like > >>> [hole, extent, hole, extent, hole, ... hole, extent, hole], > >>> we will only return half of fi_extent_max extents. > >> > >> OK thanks for the info. So: > >> > >> XFS may miss some extents for sparse files before 2.6.36 > >> BTRFS and EXT4 need a sync before fiemap() before 2.6.38 > >> BTRFS can return overlapping extents before 2.6.38 > >> > >> It seems like we should at least detect overlapping extents in our: > >> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD > >> and return false (don't use fiemap) in this case. > >> We've already stopped doing the lseeks() but that assumes non overlapping > >> extents. > > > > Attached is the proposed patch to address the overlapping extents case, > > which we'll need if we don't bar kernels before 2.6.38, > > but is good to have anyway in case it happens in future. > > > > I think I'll do a follow up patch to get extent_scan_read() > > to read all extents before returning, which will improve > > the fix (and the previous merge extents change). > > I did that change, but was worried about the possible large > mem requirements of scanning all extents up front, > or the inefficiencies of scanning twice. > So instead I took the approach of adjusting extents so that: > > | extent 1 | > | extent 2 | > > is changed to > > | extent 1 | > | | extent 2 | > > That allows copying the data even in this case. > Chris, will that handle this specific BTRFS issue? > > I now also detect and just fail for these cases: > > | extent 1 | > | extent 2 | > > or > | extent 1 | > | extent 2 |
Great, that should work just fine. Thanks! -chris