On 28/03/11 23:06, Pádraig Brady wrote:
> On 18/03/11 16:04, Pádraig Brady wrote:
>> On 18/03/11 13:48, Chris Mason wrote:
>>> Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400:
>>>> On 18/03/11 12:04, Chris Mason wrote:
>>>>> Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400:
>>>>>> Pádraig Brady wrote:
>>>>>>>
>>>>>>> So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with
>>>>>>> 2.6.38.
>>>>>>> I'm quite worried about implicit syncing in cp actually, where it might
>>>>>>> introduce bad performance regressions.
>>>>>>
>>>>>> Good point.
>>>>>>
>>>>>>> Maybe we could disable this flag if XFS || kernel >= 2.6.38?
>>>>>>> Or maybe we might do as you suggest above and disable extent_copy()
>>>>>>> completely, for kernels before 2.6.38.
>>>>>>> Not an ideal situation at all :(
>>>>>>
>>>>>> If there really is risk of data loss with ext4 on kernels < 2.6.38,
>>>>>> even if it's only on unusual files, then we'll have to consider
>>>>>> using a kernel version check to disable extent_copy.
>>>>>>
>>>>>> Is there a stand-alone ext4 reproducer?
>>>>>
>>>>> dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1
>>>>>
>>>>> Without a sync the buggy ext4/btrfs will not find the extent, and report
>>>>> only holes.
>>>>
>>>> OK, FIEMAP_FLAG_SYNC is an effective workaround for that.
>>>> So as mentioned above, we might consider not using this flag for
>>>> XFS || kernel >= 2.6.38
>>>
>>> Btrfs before 2.6.38 may have real trouble though, even with the sync.
>>> We were returning overlapping ranges to you, so the destination would
>>> end up bigger than the original. This could be fixed in cp by making
>>> sure to never seek backwards based on the extents returned.
>>>
>>> Example bad results:
>>>
>>> logical: [ 0 ... 1MB ] -> physical [ foo .. foo ]
>>> logical [ 0 ... 2MB ] -> physical [ foo2 .. foo2 ]
>>>
>>> 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to
>>> zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would
>>> have covered for me. 2.6.38 final fixed this problem.
>>
>>> I think even XFS was fixed relatively recently, 2.6.36 and later.
>>> Looking at the commit, the simple dd test above wouldn't have triggered
>>> it. Actually, looking at this commit even the sync wouldn't save xfs
>>> before 2.6.36, Eric am I reading this right?
>>>
>>> So maybe just give in and look for 2.6.38 instead of trying to remember
>>> all the places where individual filesystems didn't suck. Give the user
>>> a cp --sparse=fiemap-im-really-really-sure to override your assumptions
>>> about our bugs.
>>>
>>> commit 9af25465081480a75824fd7a16a37a5cfebeede9
>>> Author: Tao Ma <[email protected]>
>>> Date: Mon Aug 30 02:44:03 2010 +0000
>>>
>>> xfs: Make fiemap work with sparse files
>>>
>>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>>> to return fi_extent_max extents, but actually it won't work for
>>> a sparse file. The reason is that in xfs_getbmap we will
>>> calculate holes and set it in 'out', while out is malloced by
>>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>>> worst case, if 'out' vector looks like
>>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>>> we will only return half of fi_extent_max extents.
>>
>> OK thanks for the info. So:
>>
>> XFS may miss some extents for sparse files before 2.6.36
>> BTRFS and EXT4 need a sync before fiemap() before 2.6.38
>> BTRFS can return overlapping extents before 2.6.38
>>
>> It seems like we should at least detect overlapping extents in our:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD
>> and return false (don't use fiemap) in this case.
>> We've already stopped doing the lseeks() but that assumes non overlapping
>> extents.
>
> Attached is the proposed patch to address the overlapping extents case,
> which we'll need if we don't bar kernels before 2.6.38,
> but is good to have anyway in case it happens in future.
>
> I think I'll do a follow up patch to get extent_scan_read()
> to read all extents before returning, which will improve
> the fix (and the previous merge extents change).
I did that change, but was worried about the possible large
mem requirements of scanning all extents up front,
or the inefficiencies of scanning twice.
So instead I took the approach of adjusting extents so that:
| extent 1 |
| extent 2 |
is changed to
| extent 1 |
| | extent 2 |
That allows copying the data even in this case.
Chris, will that handle this specific BTRFS issue?
I now also detect and just fail for these cases:
| extent 1 |
| extent 2 |
or
| extent 1 |
| extent 2 |
cheers,
Pádraig.
>From e409ab13bc9ffb1e34f7e2fe41280ce373e8dfbd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Mon, 28 Mar 2011 19:22:21 +0100
Subject: [PATCH] copy: protect against overlapping extents
* src/extent-scan.c (extent_scan_read): Add a more stringent check
for OFF_T overflow, to ensure subsequent code is immune.
Detect overlapping extents and adjust, so as files always copied.
Detection using a single scan with fallback to a standard copy
was thought too expensive in memory or time.
---
NEWS | 4 ++++
src/extent-scan.c | 34 +++++++++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletions(-)
diff --git a/NEWS b/NEWS
index 5b418bd..9c4a16f 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,10 @@ GNU coreutils NEWS -*- outline -*-
wc would dereference a NULL pointer upon an early out-of-memory error
[bug introduced in coreutils-7.1]
+ cp now avoids FIEMAP issues with BTRFS before Linux 2.6.38,
+ which could result in corrupt copies of sparse files.
+ [bug introduced in coreutils-8.10]
+
** Changes in behavior
cp now copies empty extents efficiently on file systems with FIEMAP
diff --git a/src/extent-scan.c b/src/extent-scan.c
index b9520b7..24b56b8 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -90,7 +90,7 @@ extent_scan_read (struct extent_scan *scan)
for (i = 0; i < scan->ei_count; i++)
{
- assert (fm_extents[i].fe_logical <= OFF_T_MAX);
+ assert (fm_extents[i].fe_logical <= OFF_T_MAX - fm_extents[i].fe_length);
if (si && last_ei->ext_flags ==
(fm_extents[i].fe_flags & ~FIEMAP_EXTENT_LAST)
@@ -102,6 +102,38 @@ extent_scan_read (struct extent_scan *scan)
/* Copy flags in case different. */
last_ei->ext_flags = fm_extents[i].fe_flags;
}
+ else if ((si == 0 && scan->scan_start > fm_extents[i].fe_logical)
+ || (si && last_ei->ext_logical + last_ei->ext_length >
+ fm_extents[i].fe_logical))
+ {
+ /* BTRFS before 2.6.38 could return overlapping extents
+ for sparse files. We adjust the returned extents
+ rather than failing, as otherwise it would be inefficient
+ to detect this on the initial scan. */
+ uint64_t new_logical;
+ uint64_t length_adjust;
+ if (si == 0)
+ new_logical = scan->scan_start;
+ else
+ {
+ /* We could return here if scan->scan_start == 0
+ but don't so as to minimize special cases. */
+ new_logical = last_ei->ext_logical + last_ei->ext_length;
+ }
+ length_adjust = new_logical - fm_extents[i].fe_logical;
+ /* If an extent is contained within the previous one, just fail. */
+ if (length_adjust < fm_extents[i].fe_length)
+ {
+ if (scan->scan_start == 0)
+ scan->initial_scan_failed = true;
+ return false;
+ }
+ fm_extents[i].fe_logical = new_logical;
+ fm_extents[i].fe_length -= length_adjust;
+ /* Process the adjusted extent again. */
+ i--;
+ continue;
+ }
else
{
last_ei = scan->ext_info + si;
--
1.7.4