On 28/03/11 23:06, Pádraig Brady wrote: > On 18/03/11 16:04, Pádraig Brady wrote: >> On 18/03/11 13:48, Chris Mason wrote: >>> Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: >>>> On 18/03/11 12:04, Chris Mason wrote: >>>>> Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: >>>>>> Pádraig Brady wrote: >>>>>>> >>>>>>> So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with >>>>>>> 2.6.38. >>>>>>> I'm quite worried about implicit syncing in cp actually, where it might >>>>>>> introduce bad performance regressions. >>>>>> >>>>>> Good point. >>>>>> >>>>>>> Maybe we could disable this flag if XFS || kernel >= 2.6.38? >>>>>>> Or maybe we might do as you suggest above and disable extent_copy() >>>>>>> completely, for kernels before 2.6.38. >>>>>>> Not an ideal situation at all :( >>>>>> >>>>>> If there really is risk of data loss with ext4 on kernels < 2.6.38, >>>>>> even if it's only on unusual files, then we'll have to consider >>>>>> using a kernel version check to disable extent_copy. >>>>>> >>>>>> Is there a stand-alone ext4 reproducer? >>>>> >>>>> dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 >>>>> >>>>> Without a sync the buggy ext4/btrfs will not find the extent, and report >>>>> only holes. >>>> >>>> OK, FIEMAP_FLAG_SYNC is an effective workaround for that. >>>> So as mentioned above, we might consider not using this flag for >>>> XFS || kernel >= 2.6.38 >>> >>> Btrfs before 2.6.38 may have real trouble though, even with the sync. >>> We were returning overlapping ranges to you, so the destination would >>> end up bigger than the original. This could be fixed in cp by making >>> sure to never seek backwards based on the extents returned. >>> >>> Example bad results: >>> >>> logical: [ 0 ... 1MB ] -> physical [ foo .. foo ] >>> logical [ 0 ... 2MB ] -> physical [ foo2 .. foo2 ] >>> >>> 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to >>> zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would >>> have covered for me. 2.6.38 final fixed this problem. >> >>> I think even XFS was fixed relatively recently, 2.6.36 and later. >>> Looking at the commit, the simple dd test above wouldn't have triggered >>> it. Actually, looking at this commit even the sync wouldn't save xfs >>> before 2.6.36, Eric am I reading this right? >>> >>> So maybe just give in and look for 2.6.38 instead of trying to remember >>> all the places where individual filesystems didn't suck. Give the user >>> a cp --sparse=fiemap-im-really-really-sure to override your assumptions >>> about our bugs. >>> >>> commit 9af25465081480a75824fd7a16a37a5cfebeede9 >>> Author: Tao Ma <tao...@oracle.com> >>> Date: Mon Aug 30 02:44:03 2010 +0000 >>> >>> xfs: Make fiemap work with sparse files >>> >>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want >>> to return fi_extent_max extents, but actually it won't work for >>> a sparse file. The reason is that in xfs_getbmap we will >>> calculate holes and set it in 'out', while out is malloced by >>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the >>> worst case, if 'out' vector looks like >>> [hole, extent, hole, extent, hole, ... hole, extent, hole], >>> we will only return half of fi_extent_max extents. >> >> OK thanks for the info. So: >> >> XFS may miss some extents for sparse files before 2.6.36 >> BTRFS and EXT4 need a sync before fiemap() before 2.6.38 >> BTRFS can return overlapping extents before 2.6.38 >> >> It seems like we should at least detect overlapping extents in our: >> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD >> and return false (don't use fiemap) in this case. >> We've already stopped doing the lseeks() but that assumes non overlapping >> extents. > > Attached is the proposed patch to address the overlapping extents case, > which we'll need if we don't bar kernels before 2.6.38, > but is good to have anyway in case it happens in future. > > I think I'll do a follow up patch to get extent_scan_read() > to read all extents before returning, which will improve > the fix (and the previous merge extents change).
I did that change, but was worried about the possible large mem requirements of scanning all extents up front, or the inefficiencies of scanning twice. So instead I took the approach of adjusting extents so that: | extent 1 | | extent 2 | is changed to | extent 1 | | | extent 2 | That allows copying the data even in this case. Chris, will that handle this specific BTRFS issue? I now also detect and just fail for these cases: | extent 1 | | extent 2 | or | extent 1 | | extent 2 | cheers, Pádraig.
>From e409ab13bc9ffb1e34f7e2fe41280ce373e8dfbd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com> Date: Mon, 28 Mar 2011 19:22:21 +0100 Subject: [PATCH] copy: protect against overlapping extents * src/extent-scan.c (extent_scan_read): Add a more stringent check for OFF_T overflow, to ensure subsequent code is immune. Detect overlapping extents and adjust, so as files always copied. Detection using a single scan with fallback to a standard copy was thought too expensive in memory or time. --- NEWS | 4 ++++ src/extent-scan.c | 34 +++++++++++++++++++++++++++++++++- 2 files changed, 37 insertions(+), 1 deletions(-) diff --git a/NEWS b/NEWS index 5b418bd..9c4a16f 100644 --- a/NEWS +++ b/NEWS @@ -20,6 +20,10 @@ GNU coreutils NEWS -*- outline -*- wc would dereference a NULL pointer upon an early out-of-memory error [bug introduced in coreutils-7.1] + cp now avoids FIEMAP issues with BTRFS before Linux 2.6.38, + which could result in corrupt copies of sparse files. + [bug introduced in coreutils-8.10] + ** Changes in behavior cp now copies empty extents efficiently on file systems with FIEMAP diff --git a/src/extent-scan.c b/src/extent-scan.c index b9520b7..24b56b8 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -90,7 +90,7 @@ extent_scan_read (struct extent_scan *scan) for (i = 0; i < scan->ei_count; i++) { - assert (fm_extents[i].fe_logical <= OFF_T_MAX); + assert (fm_extents[i].fe_logical <= OFF_T_MAX - fm_extents[i].fe_length); if (si && last_ei->ext_flags == (fm_extents[i].fe_flags & ~FIEMAP_EXTENT_LAST) @@ -102,6 +102,38 @@ extent_scan_read (struct extent_scan *scan) /* Copy flags in case different. */ last_ei->ext_flags = fm_extents[i].fe_flags; } + else if ((si == 0 && scan->scan_start > fm_extents[i].fe_logical) + || (si && last_ei->ext_logical + last_ei->ext_length > + fm_extents[i].fe_logical)) + { + /* BTRFS before 2.6.38 could return overlapping extents + for sparse files. We adjust the returned extents + rather than failing, as otherwise it would be inefficient + to detect this on the initial scan. */ + uint64_t new_logical; + uint64_t length_adjust; + if (si == 0) + new_logical = scan->scan_start; + else + { + /* We could return here if scan->scan_start == 0 + but don't so as to minimize special cases. */ + new_logical = last_ei->ext_logical + last_ei->ext_length; + } + length_adjust = new_logical - fm_extents[i].fe_logical; + /* If an extent is contained within the previous one, just fail. */ + if (length_adjust < fm_extents[i].fe_length) + { + if (scan->scan_start == 0) + scan->initial_scan_failed = true; + return false; + } + fm_extents[i].fe_logical = new_logical; + fm_extents[i].fe_length -= length_adjust; + /* Process the adjusted extent again. */ + i--; + continue; + } else { last_ei = scan->ext_info + si; -- 1.7.4