On 28/03/11 23:06, Pádraig Brady wrote:
> On 18/03/11 16:04, Pádraig Brady wrote:
>> On 18/03/11 13:48, Chris Mason wrote:
>>> Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400:
>>>> On 18/03/11 12:04, Chris Mason wrote:
>>>>> Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400:
>>>>>> Pádraig Brady wrote:
>>>>>>>
>>>>>>> So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 
>>>>>>> 2.6.38.
>>>>>>> I'm quite worried about implicit syncing in cp actually, where it might
>>>>>>> introduce bad performance regressions.
>>>>>>
>>>>>> Good point.
>>>>>>
>>>>>>> Maybe we could disable this flag if XFS || kernel >= 2.6.38?
>>>>>>> Or maybe we might do as you suggest above and disable extent_copy()
>>>>>>> completely, for kernels before 2.6.38.
>>>>>>> Not an ideal situation at all :(
>>>>>>
>>>>>> If there really is risk of data loss with ext4 on kernels < 2.6.38,
>>>>>> even if it's only on unusual files, then we'll have to consider
>>>>>> using a kernel version check to disable extent_copy.
>>>>>>
>>>>>> Is there a stand-alone ext4 reproducer?
>>>>>
>>>>> dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1
>>>>>
>>>>> Without a sync the buggy ext4/btrfs will not find the extent, and report
>>>>> only holes.
>>>>
>>>> OK, FIEMAP_FLAG_SYNC is an effective workaround for that.
>>>> So as mentioned above, we might consider not using this flag for
>>>> XFS || kernel >= 2.6.38
>>>
>>> Btrfs before 2.6.38 may have real trouble though, even with the sync.
>>> We were returning overlapping ranges to you, so the destination would
>>> end up bigger than the original.  This could be fixed in cp by making
>>> sure to never seek backwards based on the extents returned.
>>>
>>> Example bad results:
>>>
>>> logical: [ 0 ... 1MB ] -> physical [ foo .. foo ]
>>> logical  [ 0 ... 2MB ] -> physical [ foo2 .. foo2 ]
>>>
>>> 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to
>>> zero and do the 0 .. 2MB bit.  If cp had not seeked backwards you would
>>> have covered for me.  2.6.38 final fixed this problem.
>>
>>> I think even XFS was fixed relatively recently, 2.6.36 and later.
>>> Looking at the commit, the simple dd test above wouldn't have triggered
>>> it.  Actually, looking at this commit even the sync wouldn't save xfs
>>> before 2.6.36, Eric am I reading this right?
>>>
>>> So maybe just give in and look for 2.6.38 instead of trying to remember
>>> all the places where individual filesystems didn't suck.  Give the user
>>> a cp --sparse=fiemap-im-really-really-sure to override your assumptions
>>> about our bugs.
>>>
>>> commit 9af25465081480a75824fd7a16a37a5cfebeede9
>>> Author: Tao Ma <tao...@oracle.com>
>>> Date:   Mon Aug 30 02:44:03 2010 +0000
>>>
>>>     xfs: Make fiemap work with sparse files
>>>     
>>>     In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>>>     to return fi_extent_max extents, but actually it won't work for
>>>     a sparse file. The reason is that in xfs_getbmap we will
>>>     calculate holes and set it in 'out', while out is malloced by
>>>     bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>>>     worst case, if 'out' vector looks like
>>>     [hole, extent, hole, extent, hole, ... hole, extent, hole],
>>>     we will only return half of fi_extent_max extents.
>>
>> OK thanks for the info. So:
>>
>> XFS may miss some extents for sparse files before 2.6.36
>> BTRFS and EXT4 need a sync before fiemap() before 2.6.38
>> BTRFS can return overlapping extents before 2.6.38
>>
>> It seems like we should at least detect overlapping extents in our:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD
>> and return false (don't use fiemap) in this case.
>> We've already stopped doing the lseeks() but that assumes non overlapping 
>> extents.
> 
> Attached is the proposed patch to address the overlapping extents case,
> which we'll need if we don't bar kernels before 2.6.38,
> but is good to have anyway in case it happens in future.
> 
> I think I'll do a follow up patch to get extent_scan_read()
> to read all extents before returning, which will improve
> the fix (and the previous merge extents change).

I did that change, but was worried about the possible large
mem requirements of scanning all extents up front,
or the inefficiencies of scanning twice.
So instead I took the approach of adjusting extents so that:

|  extent 1 |
|               extent 2  |

is changed to

|  extent 1 |
|           |   extent 2  |

That allows copying the data even in this case.
Chris, will that handle this specific BTRFS issue?

I now also detect and just fail for these cases:

|  extent 1     |
|  extent 2 |

or
              |  extent 1  |
|  extent 2 |

cheers,
Pádraig.
>From e409ab13bc9ffb1e34f7e2fe41280ce373e8dfbd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Mon, 28 Mar 2011 19:22:21 +0100
Subject: [PATCH] copy: protect against overlapping extents

* src/extent-scan.c (extent_scan_read): Add a more stringent check
for OFF_T overflow, to ensure subsequent code is immune.
Detect overlapping extents and adjust, so as files always copied.
Detection using a single scan with fallback to a standard copy
was thought too expensive in memory or time.
---
 NEWS              |    4 ++++
 src/extent-scan.c |   34 +++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/NEWS b/NEWS
index 5b418bd..9c4a16f 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   wc would dereference a NULL pointer upon an early out-of-memory error
   [bug introduced in coreutils-7.1]
 
+  cp now avoids FIEMAP issues with BTRFS before Linux 2.6.38,
+  which could result in corrupt copies of sparse files.
+  [bug introduced in coreutils-8.10]
+
 ** Changes in behavior
 
   cp now copies empty extents efficiently on file systems with FIEMAP
diff --git a/src/extent-scan.c b/src/extent-scan.c
index b9520b7..24b56b8 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -90,7 +90,7 @@ extent_scan_read (struct extent_scan *scan)
 
   for (i = 0; i < scan->ei_count; i++)
     {
-      assert (fm_extents[i].fe_logical <= OFF_T_MAX);
+      assert (fm_extents[i].fe_logical <= OFF_T_MAX - fm_extents[i].fe_length);
 
       if (si && last_ei->ext_flags ==
           (fm_extents[i].fe_flags & ~FIEMAP_EXTENT_LAST)
@@ -102,6 +102,38 @@ extent_scan_read (struct extent_scan *scan)
           /* Copy flags in case different.  */
           last_ei->ext_flags = fm_extents[i].fe_flags;
         }
+      else if ((si == 0 && scan->scan_start > fm_extents[i].fe_logical)
+               || (si && last_ei->ext_logical + last_ei->ext_length >
+                   fm_extents[i].fe_logical))
+        {
+          /* BTRFS before 2.6.38 could return overlapping extents
+             for sparse files.  We adjust the returned extents
+             rather than failing, as otherwise it would be inefficient
+             to detect this on the initial scan.  */
+          uint64_t new_logical;
+          uint64_t length_adjust;
+          if (si == 0)
+            new_logical = scan->scan_start;
+          else
+            {
+              /* We could return here if scan->scan_start == 0
+                 but don't so as to minimize special cases.  */
+              new_logical = last_ei->ext_logical + last_ei->ext_length;
+            }
+          length_adjust = new_logical - fm_extents[i].fe_logical;
+          /* If an extent is contained within the previous one, just fail.  */
+          if (length_adjust < fm_extents[i].fe_length)
+            {
+              if (scan->scan_start == 0)
+                scan->initial_scan_failed = true;
+              return false;
+            }
+          fm_extents[i].fe_logical = new_logical;
+          fm_extents[i].fe_length -= length_adjust;
+          /* Process the adjusted extent again.  */
+          i--;
+          continue;
+        }
       else
         {
           last_ei = scan->ext_info + si;
-- 
1.7.4

Reply via email to