Re: [PATCH] copy: adjust fiemap handling of sparse files
Pádraig Brady wrote: ... I'm going to use this 2.6.38 check to only enable FIEMAP_FLAG_SYNC before Linux kernel 2.6.38. It's always worth avoiding sync if possible. Proposed patch attached. I'll submit my 3 outstanding fiemap patches tomorrow. ... Subject: [PATCH] copy: with fiemap copy, only sync when needed * src/extent-scan.h (struct extent_scan): Add the fm_flags member to pass to the fiemap scan. * src/extent-scan.c (extent_need_sync): A new function used to detect Linux kernels before 2.6.38. (extent_scan_init): Add FIEMAP_FLAG_SYNC when needed. * tests/cp/sparse-fiemap: Adjust comment. * NEWS: Mention the change in behavior. Indirectly suggested by Mike Frysinger --- NEWS |4 src/extent-scan.c | 32 +++- src/extent-scan.h |3 +++ tests/cp/sparse-fiemap |4 ++-- 4 files changed, 40 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 9c4a16f..d1020cb 100644 --- a/NEWS +++ b/NEWS @@ -31,6 +31,10 @@ GNU coreutils NEWS-*- outline -*- bytes from the input, and also can efficiently create a hole in the output file when --sparse=always is specified. + cp now avoids syncing files when possible, when doing a FIEMAP copy. + The sync in only needed on Linux kernels before 2.6.38. + [The sync was introduced in coreutils-8.10] Thanks. This looks good. Minor suggestions below: diff --git a/src/extent-scan.c b/src/extent-scan.c ... +/* Work around Linux kernel issues on BTRFS and EXT4 before 2.6.38. */ Please add a comment here, e.g., /* FIXME: remove in 2013, or whenever we're pretty confident that the offending, unpatched kernels are no longer in use. */ +static bool +extent_need_sync (void) +{ + static int need_sync = -1; + + if (need_sync == -1) +{ + struct utsname name; + need_sync = 0; /* No workaround by default. */ + +#ifdef __linux__ + if (uname (name) != -1 strncmp (name.release, 2.6., 4) == 0) +{ + unsigned long val; + if (xstrtoul (name.release + 4, NULL, 0, val, NULL) == LONGINT_OK) That 3rd argument is the conversion base. Leaving it as 0 lets us accept octal and hexadecimal. No big risk, obviously, but you can tighten it up by using 10 instead. + { + if (val 38) + need_sync = 1; + } +} +#endif +} + + return need_sync; +}
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 28/03/11 23:06, Pádraig Brady wrote: On 18/03/11 16:04, Pádraig Brady wrote: On 18/03/11 13:48, Chris Mason wrote: Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: On 18/03/11 12:04, Chris Mason wrote: Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: Pádraig Brady wrote: So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer? dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 Without a sync the buggy ext4/btrfs will not find the extent, and report only holes. OK, FIEMAP_FLAG_SYNC is an effective workaround for that. So as mentioned above, we might consider not using this flag for XFS || kernel = 2.6.38 Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? So maybe just give in and look for 2.6.38 instead of trying to remember all the places where individual filesystems didn't suck. Give the user a cp --sparse=fiemap-im-really-really-sure to override your assumptions about our bugs. commit 9af25465081480a75824fd7a16a37a5cfebeede9 Author: Tao Ma tao...@oracle.com Date: Mon Aug 30 02:44:03 2010 + xfs: Make fiemap work with sparse files In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. OK thanks for the info. So: XFS may miss some extents for sparse files before 2.6.36 BTRFS and EXT4 need a sync before fiemap() before 2.6.38 BTRFS can return overlapping extents before 2.6.38 It seems like we should at least detect overlapping extents in our: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD and return false (don't use fiemap) in this case. We've already stopped doing the lseeks() but that assumes non overlapping extents. Attached is the proposed patch to address the overlapping extents case, which we'll need if we don't bar kernels before 2.6.38, but is good to have anyway in case it happens in future. I think I'll do a follow up patch to get extent_scan_read() to read all extents before returning, which will improve the fix (and the previous merge extents change). I did that change, but was worried about the possible large mem requirements of scanning all extents up front, or the inefficiencies of scanning twice. So instead I took the approach of adjusting extents so that: | extent 1 | | extent 2 | is changed to | extent 1 | | | extent 2 | That allows copying the data even in this case. Chris, will that handle this specific BTRFS issue? I now also detect and just fail for these cases: | extent 1 | | extent 2 | or | extent 1 | | extent 2 | cheers, Pádraig. From e409ab13bc9ffb1e34f7e2fe41280ce373e8dfbd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Mon, 28 Mar 2011 19:22:21 +0100 Subject: [PATCH] copy: protect against overlapping extents * src/extent-scan.c (extent_scan_read): Add a more stringent check for OFF_T overflow, to ensure subsequent code is immune. Detect overlapping extents and adjust, so as files always copied. Detection using a single scan with fallback to a standard copy was thought too expensive in memory or time. --- NEWS |4
Re: [PATCH] copy: adjust fiemap handling of sparse files
Excerpts from Pádraig Brady's message of 2011-03-30 07:02:44 -0400: On 28/03/11 23:06, Pádraig Brady wrote: On 18/03/11 16:04, Pádraig Brady wrote: On 18/03/11 13:48, Chris Mason wrote: Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: On 18/03/11 12:04, Chris Mason wrote: Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: Pádraig Brady wrote: So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer? dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 Without a sync the buggy ext4/btrfs will not find the extent, and report only holes. OK, FIEMAP_FLAG_SYNC is an effective workaround for that. So as mentioned above, we might consider not using this flag for XFS || kernel = 2.6.38 Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? So maybe just give in and look for 2.6.38 instead of trying to remember all the places where individual filesystems didn't suck. Give the user a cp --sparse=fiemap-im-really-really-sure to override your assumptions about our bugs. commit 9af25465081480a75824fd7a16a37a5cfebeede9 Author: Tao Ma tao...@oracle.com Date: Mon Aug 30 02:44:03 2010 + xfs: Make fiemap work with sparse files In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. OK thanks for the info. So: XFS may miss some extents for sparse files before 2.6.36 BTRFS and EXT4 need a sync before fiemap() before 2.6.38 BTRFS can return overlapping extents before 2.6.38 It seems like we should at least detect overlapping extents in our: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD and return false (don't use fiemap) in this case. We've already stopped doing the lseeks() but that assumes non overlapping extents. Attached is the proposed patch to address the overlapping extents case, which we'll need if we don't bar kernels before 2.6.38, but is good to have anyway in case it happens in future. I think I'll do a follow up patch to get extent_scan_read() to read all extents before returning, which will improve the fix (and the previous merge extents change). I did that change, but was worried about the possible large mem requirements of scanning all extents up front, or the inefficiencies of scanning twice. So instead I took the approach of adjusting extents so that: | extent 1 | | extent 2 | is changed to | extent 1 | | | extent 2 | That allows copying the data even in this case. Chris, will that handle this specific BTRFS issue? I now also detect and just fail for these cases: | extent 1 | | extent 2 | or | extent 1 | | extent 2 | Great, that should work just fine. Thanks! -chris
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 17/03/11 07:24, Mike Frysinger wrote: On Wednesday, March 16, 2011 19:55:56 Pádraig Brady wrote: On 16/03/11 19:18, Mike Frysinger wrote: well, in the bug report i was working with, we were seeing data loss. i wonder if it'd be possible to detect the fs/kernel version and error out when versions are found that are known to be broken ? That was a different issue. It would be nice, but I don't think it would be practical to try and detect and work around such file system issues in cp. There are always going to be teething problems with a large body of new logic, so I think the best approach with file systems is to increase trust in the gradually over time. while this is true, i'd understand if the issue were bugs in coreutils' cp. they'd get fixed and a new release made, no problem. but we're talking about silent errors in the kernel, so anyone running a newer coreutils with a kernel from two days ago is going hit issues. if we were looking at basic read/write functionality, i'd understand not bothering, but we're talking purely about optimization paths in coreutils' cp. on the Gentoo side, i guess i'll run with a hack like so: --- a/src/copy.c +++ b/src/copy.c @@ -22,6 +22,7 @@ #include sys/ioctl.h #include sys/types.h #include selinux/selinux.h +#include sys/utsname.h #if HAVE_HURD_H # include hurd.h @@ -930,7 +931,32 @@ copy_reg (char const *src_name, char const *dst_name, the file is a hole. */ if (x-sparse_mode == SPARSE_AUTO S_ISREG (src_open_sb.st_mode) ST_NBLOCKS (src_open_sb) src_open_sb.st_size / ST_NBLOCKSIZE) -make_holes = true; +{ + make_holes = true; + +# ifdef __linux__ + static int safe_kernel = -1; + + if (safe_kernel == -1) +{ + struct utsname name; + + safe_kernel = 1; + + if (uname (name) != -1 strncmp (name.release, 2.6., 4) == 0) +{ + int kver = atoi (name.release + 4); + + /* ext4 btrfs are known to be broken */ + if (kver 38) +safe_kernel = 0; +} +} + + if (safe_kernel == 0) +make_holes = false; +# endif +} #endif } -mike I'm going to use this 2.6.38 check to only enable FIEMAP_FLAG_SYNC before Linux kernel 2.6.38. It's always worth avoiding sync if possible. Proposed patch attached. I'll submit my 3 outstanding fiemap patches tomorrow. cheers, Pádraig. From d2e62bb40017571baf12727c942fb9f131f8c951 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Wed, 30 Mar 2011 22:50:05 +0100 Subject: [PATCH] copy: with fiemap copy, only sync when needed * src/extent-scan.h (struct extent_scan): Add the fm_flags member to pass to the fiemap scan. * src/extent-scan.c (extent_need_sync): A new function used to detect Linux kernels before 2.6.38. (extent_scan_init): Add FIEMAP_FLAG_SYNC when needed. * tests/cp/sparse-fiemap: Adjust comment. * NEWS: Mention the change in behavior. Indirectly suggested by Mike Frysinger --- NEWS |4 src/extent-scan.c | 32 +++- src/extent-scan.h |3 +++ tests/cp/sparse-fiemap |4 ++-- 4 files changed, 40 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 9c4a16f..d1020cb 100644 --- a/NEWS +++ b/NEWS @@ -31,6 +31,10 @@ GNU coreutils NEWS-*- outline -*- bytes from the input, and also can efficiently create a hole in the output file when --sparse=always is specified. + cp now avoids syncing files when possible, when doing a FIEMAP copy. + The sync in only needed on Linux kernels before 2.6.38. + [The sync was introduced in coreutils-8.10] + ** New features dd now accepts the 'nocache' flag to the iflag and oflag options, diff --git a/src/extent-scan.c b/src/extent-scan.c index 24b56b8..b563363 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -20,15 +20,44 @@ #include stdio.h #include sys/types.h #include sys/ioctl.h +#include sys/utsname.h #include assert.h #include system.h #include extent-scan.h +#include xstrtol.h #ifndef HAVE_FIEMAP # include fiemap.h #endif +/* Work around Linux kernel issues on BTRFS and EXT4 before 2.6.38. */ +static bool +extent_need_sync (void) +{ + static int need_sync = -1; + + if (need_sync == -1) +{ + struct utsname name; + need_sync = 0; /* No workaround by default. */ + +#ifdef __linux__ + if (uname (name) != -1 strncmp (name.release, 2.6., 4) == 0) +{ + unsigned long val; + if (xstrtoul (name.release + 4, NULL, 0, val, NULL) == LONGINT_OK) + { + if (val 38) + need_sync = 1; + }
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 18/03/11 16:04, Pádraig Brady wrote: On 18/03/11 13:48, Chris Mason wrote: Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: On 18/03/11 12:04, Chris Mason wrote: Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: Pádraig Brady wrote: So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer? dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 Without a sync the buggy ext4/btrfs will not find the extent, and report only holes. OK, FIEMAP_FLAG_SYNC is an effective workaround for that. So as mentioned above, we might consider not using this flag for XFS || kernel = 2.6.38 Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? So maybe just give in and look for 2.6.38 instead of trying to remember all the places where individual filesystems didn't suck. Give the user a cp --sparse=fiemap-im-really-really-sure to override your assumptions about our bugs. commit 9af25465081480a75824fd7a16a37a5cfebeede9 Author: Tao Ma tao...@oracle.com Date: Mon Aug 30 02:44:03 2010 + xfs: Make fiemap work with sparse files In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. OK thanks for the info. So: XFS may miss some extents for sparse files before 2.6.36 BTRFS and EXT4 need a sync before fiemap() before 2.6.38 BTRFS can return overlapping extents before 2.6.38 It seems like we should at least detect overlapping extents in our: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD and return false (don't use fiemap) in this case. We've already stopped doing the lseeks() but that assumes non overlapping extents. Attached is the proposed patch to address the overlapping extents case, which we'll need if we don't bar kernels before 2.6.38, but is good to have anyway in case it happens in future. I think I'll do a follow up patch to get extent_scan_read() to read all extents before returning, which will improve the fix (and the previous merge extents change). cheers, Pádraig. From 602fa952feeeb0a2cd4a4bca92f4659529f01454 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Mon, 28 Mar 2011 19:22:21 +0100 Subject: [PATCH] copy: protect against overlapping extents * src/extent-scan.c (extent_scan_read): Add a more stringent check for OFF_T overflow, to ensure subsequent code is immune. Detect overlapping extents and fail. If this happens in the initial fiemap scan, we'll revert to a standard copy, or otherwise we'll fail with a diagnostic. * src/copy.c (extent_copy): Add an assert to protect against generating an invalid hole_size with overlapping extents. We can remove this when extent_scan_read() is modified to read all extents, thus encapsulating all extent checking and merging within that function. --- src/copy.c|3 +++ src/extent-scan.c | 11 ++- 2 files changed, 13 insertions(+), 1 deletions(-) diff --git a/src/copy.c b/src/copy.c index a672fd3..7c1b3c6 100644 --- a/src/copy.c +++ b/src/copy.c @@ -350,6 +350,9 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, ext_len = 0; } + /*
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 18/03/11 16:32, Eric Sandeen wrote: On 3/18/11 8:48 AM, Chris Mason wrote: ... Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? Actually I don't think sync helps that either :) But as long as coreutils is looking for the last extent flag, I think it works out ok. Without that fix, I -think- we just returned fewer extents than were requested, but without the last/EOF flag returned, so hopefully the application keeps asking for more. I recently found another similar behavior that still exists... and fixed up a tool in xfstests to handle it: http://git.kernel.org/?p=fs/xfs/xfstests-dev.git;a=commitdiff;h=e423b5c584300c738cee95badc13b01bf38c5dc8 OK I'll assume this is not a big issue for cp as it checks the final flag. But there is the other issue with BTRFS that I forgot to mention, where it returns unwritten extents for holes, that was also fixed recently. I wonder how bad sync will be, in practice. Thinking about my own usecases, I doubt I point cp at modified-but-not-written source (source vs. dest I mean) files very often... The testcase in coreutils should definitely remove the fdatasync hack, as that just makes the test pass without worrying about things in the real world. :) A better test suite might be to generate random holey / delalloc files and copy those around. We should write an xfstests test to do the same thing, if we can determine whether we have a cp that knows about fiemap? cp --version = 8.10 is probably the best way at present, or perhaps `timeout 2 cp holey dst` cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 19/03/11 12:07, Jim Meyering wrote: Pádraig Brady wrote: On 18/03/11 13:19, Pádraig Brady wrote: Bah humbug. Looks like there is no such issue. This actually seems like an issue in a coreutils test script, which made it seem like the SYNC done by `filefrag -vs` was ineffective. Proposed fix attached. Thanks! Here's a new version of your patch: I've adjusted your new function to modify the actual arrays, which lets us simplify the caller. I've also added two die message $ME: prefixes I'd forgotten. Cool. I'd considered splice but was worried about side effects, which I might not notice due to my Perl inexperience. Is filefrag's -s option portable enough for us to rely on it? I'll add a -s param to the first use of filefrag, to catch that. If FIEMAP_FLAG_SYNC is not implemented by the kernel then the test will fail anyway as CP won't be doing it's internal sync. thanks for the perl tutorial, much appreciated! Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On Thursday, March 17, 2011 19:00:26 Pádraig Brady wrote: On 17/03/11 19:29, Mike Frysinger wrote: On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) You might be safer to just bypass extent_copy for kernels 2.6.38 that's basically what i was going for, but i guess i need to tweak my patch a bit more ... -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 17/03/11 23:00, Pádraig Brady wrote: On 17/03/11 19:29, Mike Frysinger wrote: On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. Gentoo, being a source distribution, has no kernel requirement. people are free to run on pretty much anything. we dont have the luxury of saying upgrade your kernel package and reboot. especially since we also support running Gentoo inside of other distros (often times as non-root user) where people dont have the ability at all to upgrade. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? this is the ext4-specific issue i'm avoiding: http://www.spinics.net/lists/linux-ext4/msg23430.html Ah that issue. That's what the syncing required in this test was working around: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/cp/sparse-fiemap;h=a2460a08;hb=HEAD Note Fedora 15 about to be released is using the new fiemap code in cp, but is also based on 2.6.38 and so will have the fix soon, if it does not already have it. I also now remember this discussion: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6131 where FIEMAP_FLAG_SYNC was introduced in cp, I think to work around this same bug in BTRFS and EXT4. This flag in ineffective though on ext4 loopback and thus I needed to add the syncs to the test as ref'd above. Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike You might be safer to just bypass extent_copy for kernels 2.6.38 I'm 30:70 for doing that in upstream coreutils So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
Pádraig Brady wrote: On 17/03/11 23:00, Pádraig Brady wrote: On 17/03/11 19:29, Mike Frysinger wrote: On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. Gentoo, being a source distribution, has no kernel requirement. people are free to run on pretty much anything. we dont have the luxury of saying upgrade your kernel package and reboot. especially since we also support running Gentoo inside of other distros (often times as non-root user) where people dont have the ability at all to upgrade. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? this is the ext4-specific issue i'm avoiding: http://www.spinics.net/lists/linux-ext4/msg23430.html Ah that issue. That's what the syncing required in this test was working around: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/cp/sparse-fiemap;h=a2460a08;hb=HEAD Note Fedora 15 about to be released is using the new fiemap code in cp, but is also based on 2.6.38 and so will have the fix soon, if it does not already have it. I also now remember this discussion: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6131 where FIEMAP_FLAG_SYNC was introduced in cp, I think to work around this same bug in BTRFS and EXT4. This flag in ineffective though on ext4 loopback and thus I needed to add the syncs to the test as ref'd above. Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike You might be safer to just bypass extent_copy for kernels 2.6.38 I'm 30:70 for doing that in upstream coreutils So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer?
Re: [PATCH] copy: adjust fiemap handling of sparse files
Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: Pádraig Brady wrote: On 17/03/11 23:00, Pádraig Brady wrote: On 17/03/11 19:29, Mike Frysinger wrote: On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. Gentoo, being a source distribution, has no kernel requirement. people are free to run on pretty much anything. we dont have the luxury of saying upgrade your kernel package and reboot. especially since we also support running Gentoo inside of other distros (often times as non-root user) where people dont have the ability at all to upgrade. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? this is the ext4-specific issue i'm avoiding: http://www.spinics.net/lists/linux-ext4/msg23430.html Ah that issue. That's what the syncing required in this test was working around: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/cp/sparse-fiemap;h=a2460a08;hb=HEAD Note Fedora 15 about to be released is using the new fiemap code in cp, but is also based on 2.6.38 and so will have the fix soon, if it does not already have it. I also now remember this discussion: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6131 where FIEMAP_FLAG_SYNC was introduced in cp, I think to work around this same bug in BTRFS and EXT4. This flag in ineffective though on ext4 loopback and thus I needed to add the syncs to the test as ref'd above. Sorry, I'm not sure I follow the loopback problem? Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike You might be safer to just bypass extent_copy for kernels 2.6.38 I'm 30:70 for doing that in upstream coreutils So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer? dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 Without a sync the buggy ext4/btrfs will not find the extent, and report only holes. -chris
Re: [PATCH] copy: adjust fiemap handling of sparse files
Excerpts from Pádraig Brady's message of 2011-03-18 09:19:44 -0400: On 18/03/11 12:04, Chris Mason wrote: Excerpts from Jim Meyering's message of 2011-03-18 07:52:43 -0400: Pádraig Brady wrote: I also now remember this discussion: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6131 where FIEMAP_FLAG_SYNC was introduced in cp, I think to work around this same bug in BTRFS and EXT4. This flag in ineffective though on ext4 loopback and thus I needed to add the syncs to the test as ref'd above. Sorry, I'm not sure I follow the loopback problem? Bah humbug. Looks like there is no such issue. This actually seems like an issue in a coreutils test script, which make it seem like the SYNC done by `filefrag -vs` was ineffective. Great, that makes much more sense ;) Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike You might be safer to just bypass extent_copy for kernels 2.6.38 I'm 30:70 for doing that in upstream coreutils So am I right in thinking FIEMAP_FLAG_SYNC is no longer needed with 2.6.38. I'm quite worried about implicit syncing in cp actually, where it might introduce bad performance regressions. Good point. Maybe we could disable this flag if XFS || kernel = 2.6.38? Or maybe we might do as you suggest above and disable extent_copy() completely, for kernels before 2.6.38. Not an ideal situation at all :( If there really is risk of data loss with ext4 on kernels 2.6.38, even if it's only on unusual files, then we'll have to consider using a kernel version check to disable extent_copy. Is there a stand-alone ext4 reproducer? dd if=/dev/zero of=/mnt/foo bs=1M seek=1 count=1 Without a sync the buggy ext4/btrfs will not find the extent, and report only holes. OK, FIEMAP_FLAG_SYNC is an effective workaround for that. So as mentioned above, we might consider not using this flag for XFS || kernel = 2.6.38 Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? So maybe just give in and look for 2.6.38 instead of trying to remember all the places where individual filesystems didn't suck. Give the user a cp --sparse=fiemap-im-really-really-sure to override your assumptions about our bugs. commit 9af25465081480a75824fd7a16a37a5cfebeede9 Author: Tao Ma tao...@oracle.com Date: Mon Aug 30 02:44:03 2010 + xfs: Make fiemap work with sparse files In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags. So with this flags, we don't use our 'out' in xfs_getbmap for a hole. The solution is a bit ugly by just don't increasing index of 'out' vector. I felt that it is not easy to skip it at the very beginning since we have the complicated check and some function like xfs_getbmapx_fix_eof_hole to adjust 'out'. Cc: Dave Chinner da...@fromorbit.com Signed-off-by: Tao Ma tao...@oracle.com Signed-off-by: Alex Elder ael...@sgi.com -chris
Re: [PATCH] copy: adjust fiemap handling of sparse files
Pádraig Brady wrote: ... OK thanks for the info. So: XFS may miss some extents for sparse files before 2.6.36 BTRFS and EXT4 need a sync before fiemap() before 2.6.38 BTRFS can return overlapping extents before 2.6.38 It seems like we should at least detect overlapping extents in our: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extent-scan.c;hb=HEAD and return false (don't use fiemap) in this case. We've already stopped doing the lseeks() but that assumes non overlapping extents. So if we do the above we avoid corruption on BTRFS. We could also change to only enabling FIEMAP_FLAG_SYNC before 2.6.38? That still doesn't account for the XFS issue which was only fixed in a released kernel 5 months ago, and that leans towards us changing cp to just not use fiemap before 2.6.38 as you suggest. That allows us to remove the FIEMAP_FLAG_SYNC too. Good summary. That latter is simpler and gets my vote: don't use FIEMAP with a pre-2.6.38 kernel
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 3/18/11 8:48 AM, Chris Mason wrote: ... Btrfs before 2.6.38 may have real trouble though, even with the sync. We were returning overlapping ranges to you, so the destination would end up bigger than the original. This could be fixed in cp by making sure to never seek backwards based on the extents returned. Example bad results: logical: [ 0 ... 1MB ] - physical [ foo .. foo ] logical [ 0 ... 2MB ] - physical [ foo2 .. foo2 ] 100% a btrfs bug, but cp would do the 0 .. 1MB bit and then seek back to zero and do the 0 .. 2MB bit. If cp had not seeked backwards you would have covered for me. 2.6.38 final fixed this problem. I think even XFS was fixed relatively recently, 2.6.36 and later. Looking at the commit, the simple dd test above wouldn't have triggered it. Actually, looking at this commit even the sync wouldn't save xfs before 2.6.36, Eric am I reading this right? Actually I don't think sync helps that either :) But as long as coreutils is looking for the last extent flag, I think it works out ok. Without that fix, I -think- we just returned fewer extents than were requested, but without the last/EOF flag returned, so hopefully the application keeps asking for more. I recently found another similar behavior that still exists... and fixed up a tool in xfstests to handle it: http://git.kernel.org/?p=fs/xfs/xfstests-dev.git;a=commitdiff;h=e423b5c584300c738cee95badc13b01bf38c5dc8 I wonder how bad sync will be, in practice. Thinking about my own usecases, I doubt I point cp at modified-but-not-written source (source vs. dest I mean) files very often... The testcase in coreutils should definitely remove the fdatasync hack, as that just makes the test pass without worrying about things in the real world. :) A better test suite might be to generate random holey / delalloc files and copy those around. We should write an xfstests test to do the same thing, if we can determine whether we have a cp that knows about fiemap? -Eric So maybe just give in and look for 2.6.38 instead of trying to remember all the places where individual filesystems didn't suck. Give the user a cp --sparse=fiemap-im-really-really-sure to override your assumptions about our bugs. commit 9af25465081480a75824fd7a16a37a5cfebeede9 Author: Tao Ma tao...@oracle.com Date: Mon Aug 30 02:44:03 2010 + xfs: Make fiemap work with sparse files In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags. So with this flags, we don't use our 'out' in xfs_getbmap for a hole. The solution is a bit ugly by just don't increasing index of 'out' vector. I felt that it is not easy to skip it at the very beginning since we have the complicated check and some function like xfs_getbmapx_fix_eof_hole to adjust 'out'. Cc: Dave Chinner da...@fromorbit.com Signed-off-by: Tao Ma tao...@oracle.com Signed-off-by: Alex Elder ael...@sgi.com -chris
Re: [PATCH] copy: adjust fiemap handling of sparse files
Eric Sandeen wrote: ... The testcase in coreutils should definitely remove the fdatasync hack, as that just makes the test pass without worrying about things in the real world. :) A better test suite might be to generate random holey / delalloc files and copy those around. We should write an xfstests test to do the same thing, if we can determine whether we have a cp that knows about fiemap? On ext4, FIEMAP-enabled cp will preserve the hole in this file, but older versions will not: perl -e 'sysseek (*STDOUT, 4096, 1) print a or die $!' j cp j k; test $(stat --format %b j) = $(stat --format %b k) \ || { du -sh j k; echo FAIL; } With cp from coreutils-8.10, both k and j occupy 8 blocks on ext4. Using F14's /bin/cp (coreutils-8.5+), k occupies 16 blocks.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 18/03/11 13:19, Pádraig Brady wrote: Bah humbug. Looks like there is no such issue. This actually seems like an issue in a coreutils test script, which made it seem like the SYNC done by `filefrag -vs` was ineffective. Proposed fix attached. My perl is still weak, so I won't apply until someone has reviewed. thanks, Pádraig. From 7e27d28b799f0399bbf79074854fa9967ff7752e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Sat, 19 Mar 2011 01:22:37 + Subject: [PATCH] tests: fix the sparse-fiemap test * tests/filefrag-extent-compare: Merge adjacent extents in each list before processing, so we correctly account for split extents in either list. * tests/cp/sparse-fiemap: Remove the explicit syncing, which was only changing way extents were arranged, and thus working around the extent comparison issue that was seen on ext4 loop back. --- tests/cp/sparse-fiemap| 11 +-- tests/filefrag-extent-compare | 41 ++--- 2 files changed, 35 insertions(+), 17 deletions(-) diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap index a2460a0..5f0beb7 100755 --- a/tests/cp/sparse-fiemap +++ b/tests/cp/sparse-fiemap @@ -69,12 +69,11 @@ for i in $(seq 1 2 21); do -e 'for (1..'$j') { sysseek (*F, $n, 1)' \ -e ' syswrite (*F, chr($_)x$n) or die $!}' j1 || fail=1 -# Note the explicit fdatasync is used here as -# it was seen that `filefrag -s` (FIEMAP_FLAG_SYNC) was -# ineffective on ext4 loopback on Linux 2.6.35.10-72.fc14.i686 -dd if=/dev/null of=j1 conv=notrunc,fdatasync +# Note there is an implicit sync performed by cp to +# work arounds bugs in EXT4 and BTRFS before Linux 2.6.38 +# Note also the -s parameter to the second filefrag below +# for the same reasons. cp --sparse=always j1 j2 || fail=1 -dd if=/dev/null of=j2 conv=notrunc,fdatasync cmp j1 j2 || fail=1 if ! filefrag -v j1 | grep -F extent /dev/null; then @@ -98,7 +97,7 @@ for i in $(seq 1 2 21); do # exclude the physical block numbers; they always differ filefrag -v j1 ff1 || framework_failure - filefrag -v j2 ff2 || framework_failure + filefrag -vs j2 ff2 || framework_failure { f ff1; f ff2; } | $PERL $abs_top_srcdir/tests/filefrag-extent-compare || fail=1 fi diff --git a/tests/filefrag-extent-compare b/tests/filefrag-extent-compare index 3c095d5..fa04d9e 100644 --- a/tests/filefrag-extent-compare +++ b/tests/filefrag-extent-compare @@ -28,31 +28,50 @@ my @b; foreach my $i (0..@A/2-1) { $a[$i] = { L_BLK = $A[2*$i], LEN = $A[2*$i+1] } }; foreach my $i (0..@B/2-1) { $b[$i] = { L_BLK = $B[2*$i], LEN = $B[2*$i+1] } }; +# Merge adjacent extents in passed array +# Discounted extents have length set to 0 +sub merge_extents($) +{ + my @e = @{ $_[0] }; + + my $a = 1; + my $b = 0; + while (1) +{ + (!defined $e[$a] || !defined $e[$b]) +and last; + ($e[$b]-{L_BLK} + $e[$b]-{LEN} == $e[$a]-{L_BLK}) +and $e[$b]-{LEN} += $e[$a]-{LEN}, $e[$a]-{LEN} = 0, next; + $b=$a; +} + continue +{ + ++$a; +} +} + +merge_extents(\@a); +merge_extents(\@b); + my $i = 0; my $j = 0; while (1) { +# skip discounted extents +defined $a[$i] $a[$i]-{LEN} == 0 and ++$i, next; +defined $b[$j] $b[$j]-{LEN} == 0 and ++$j, next; + !defined $a[$i] !defined $b[$j] and exit 0; defined $a[$i] defined $b[$j] or die \@a and \@b have different lengths, even after adjustment\n; ($a[$i]-{L_BLK} == $b[$j]-{L_BLK} $a[$i]-{LEN} == $b[$j]-{LEN}) - and next; -($a[$i]-{LEN} $b[$j]-{LEN} - exists $a[$i+1] $a[$i]-{LEN} + $a[$i+1]-{LEN} == $b[$j]-{LEN}) - and ++$i, next; -exists $b[$j+1] $a[$i]-{LEN} == $b[$i]-{LEN} + $b[$i+1]-{LEN} - and ++$j, next; + and $i++, $j++, next; die differing extent:\n . [$i]=$a[$i]-{L_BLK} $a[$i]-{LEN}\n . [$j]=$b[$j]-{L_BLK} $b[$j]-{LEN}\n } -continue - { -++$i; -++$j; - } ### Setup GNU style for perl-mode and cperl-mode. ## Local Variables: -- 1.7.4
Re: [PATCH] copy: adjust fiemap handling of sparse files
On Wednesday, March 16, 2011 19:55:56 Pádraig Brady wrote: On 16/03/11 19:18, Mike Frysinger wrote: well, in the bug report i was working with, we were seeing data loss. i wonder if it'd be possible to detect the fs/kernel version and error out when versions are found that are known to be broken ? That was a different issue. It would be nice, but I don't think it would be practical to try and detect and work around such file system issues in cp. There are always going to be teething problems with a large body of new logic, so I think the best approach with file systems is to increase trust in the gradually over time. while this is true, i'd understand if the issue were bugs in coreutils' cp. they'd get fixed and a new release made, no problem. but we're talking about silent errors in the kernel, so anyone running a newer coreutils with a kernel from two days ago is going hit issues. if we were looking at basic read/write functionality, i'd understand not bothering, but we're talking purely about optimization paths in coreutils' cp. on the Gentoo side, i guess i'll run with a hack like so: --- a/src/copy.c +++ b/src/copy.c @@ -22,6 +22,7 @@ #include sys/ioctl.h #include sys/types.h #include selinux/selinux.h +#include sys/utsname.h #if HAVE_HURD_H # include hurd.h @@ -930,7 +931,32 @@ copy_reg (char const *src_name, char const *dst_name, the file is a hole. */ if (x-sparse_mode == SPARSE_AUTO S_ISREG (src_open_sb.st_mode) ST_NBLOCKS (src_open_sb) src_open_sb.st_size / ST_NBLOCKSIZE) -make_holes = true; +{ + make_holes = true; + +# ifdef __linux__ + static int safe_kernel = -1; + + if (safe_kernel == -1) +{ + struct utsname name; + + safe_kernel = 1; + + if (uname (name) != -1 strncmp (name.release, 2.6., 4) == 0) +{ + int kver = atoi (name.release + 4); + + /* ext4 btrfs are known to be broken */ + if (kver 38) +safe_kernel = 0; +} +} + + if (safe_kernel == 0) +make_holes = false; +# endif +} #endif } -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 17/03/11 07:24, Mike Frysinger wrote: On Wednesday, March 16, 2011 19:55:56 Pádraig Brady wrote: On 16/03/11 19:18, Mike Frysinger wrote: well, in the bug report i was working with, we were seeing data loss. i wonder if it'd be possible to detect the fs/kernel version and error out when versions are found that are known to be broken ? That was a different issue. It would be nice, but I don't think it would be practical to try and detect and work around such file system issues in cp. There are always going to be teething problems with a large body of new logic, so I think the best approach with file systems is to increase trust in the gradually over time. while this is true, i'd understand if the issue were bugs in coreutils' cp. they'd get fixed and a new release made, no problem. but we're talking about silent errors in the kernel, so anyone running a newer coreutils with a kernel from two days ago is going hit issues. if we were looking at basic read/write functionality, i'd understand not bothering, but we're talking purely about optimization paths in coreutils' cp. on the Gentoo side, i guess i'll run with a hack like so: --- a/src/copy.c +++ b/src/copy.c @@ -22,6 +22,7 @@ #include sys/ioctl.h #include sys/types.h #include selinux/selinux.h +#include sys/utsname.h #if HAVE_HURD_H # include hurd.h @@ -930,7 +931,32 @@ copy_reg (char const *src_name, char const *dst_name, the file is a hole. */ if (x-sparse_mode == SPARSE_AUTO S_ISREG (src_open_sb.st_mode) ST_NBLOCKS (src_open_sb) src_open_sb.st_size / ST_NBLOCKSIZE) -make_holes = true; +{ + make_holes = true; + +# ifdef __linux__ + static int safe_kernel = -1; + + if (safe_kernel == -1) +{ + struct utsname name; + + safe_kernel = 1; + + if (uname (name) != -1 strncmp (name.release, 2.6., 4) == 0) +{ + int kver = atoi (name.release + 4); + + /* ext4 btrfs are known to be broken */ + if (kver 38) +safe_kernel = 0; +} +} + + if (safe_kernel == 0) +make_holes = false; +# endif +} #endif } -mike This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a cheers, Pádraig
Re: [PATCH] copy: adjust fiemap handling of sparse files
On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. Gentoo, being a source distribution, has no kernel requirement. people are free to run on pretty much anything. we dont have the luxury of saying upgrade your kernel package and reboot. especially since we also support running Gentoo inside of other distros (often times as non-root user) where people dont have the ability at all to upgrade. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? this is the ext4-specific issue i'm avoiding: http://www.spinics.net/lists/linux-ext4/msg23430.html Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 17/03/11 19:29, Mike Frysinger wrote: On Thursday, March 17, 2011 05:32:33 Pádraig Brady wrote: This is the kind of patch appropriate for a distro, as they may or may not have a fix backported to their kernel. Gentoo, being a source distribution, has no kernel requirement. people are free to run on pretty much anything. we dont have the luxury of saying upgrade your kernel package and reboot. especially since we also support running Gentoo inside of other distros (often times as non-root user) where people dont have the ability at all to upgrade. I presume you're referring to coreutils bug 8001. I didn't realise ext4 was also affected. Is this specific to the compress flag? What was the specific fix to btrfs /or ext4 in 2.6.38? this is the ext4-specific issue i'm avoiding: http://www.spinics.net/lists/linux-ext4/msg23430.html Ah that issue. That's what the syncing required in this test was working around: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/cp/sparse-fiemap;h=a2460a08;hb=HEAD Note Fedora 15 about to be released is using the new fiemap code in cp, but is also based on 2.6.38 and so will have the fix soon, if it does not already have it. Also note the make_holes heuristic variable is no longer used in extent_copy due to this patch which came after the 8.10 http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=275c831a i'll worry about it once 8.11 is released ;) -mike You might be safer to just bypass extent_copy for kernels 2.6.38 I'm 30:70 for doing that in upstream coreutils cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 16/03/11 19:18, Mike Frysinger wrote: On Wednesday, March 16, 2011 11:26:40 Pádraig Brady wrote: On 14/02/11 06:05, Jim Meyering wrote: Pádraig Brady wrote: For my own reference, I can probably skip performance tests on (older) btrfs by checking `filefrag` output. Also in `cp`, if we see an unwritten extent we should probably create a hole rather than an empty allocation by default. It's better to decrease file allocation than increase it. Makes sense. Thinking a bit more about it, I'm not sure I should do the above, as one would be more surprised if cp by default introduced holes into a copy of a fully allocated file. The previously applied patch changes behavior on BTRFS on Fedora 14, in that it will convert a file with holes to a fully allocated one with zeros. But that is due to an already fixed bug in BTRFS where it reports holes as empty extents. I'm inclined to leave this as is, because this stuff is tricky enough, without introducing work arounds for buggy file systems. There is no data loss in this case, just bigger files (which one can avoid with cp --sparse=always). Also it will not be common to overlap future coreutils releases, with older BTRFS implementations. well, in the bug report i was working with, we were seeing data loss. i wonder if it'd be possible to detect the fs/kernel version and error out when versions are found that are known to be broken ? That was a different issue. It would be nice, but I don't think it would be practical to try and detect and work around such file system issues in cp. There are always going to be teething problems with a large body of new logic, so I think the best approach with file systems is to increase trust in the gradually over time. Personally I'd consider BTRFS for my backup drive at present, which it should be particularly good at given its snapshot capabilities. cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
Unfortunately, after checking BTRFS I see that fiemap behaves differently to EXT4. IMHO the EXT4 operation seems correct, and gives full info about the structure of a file, which cp for example can use to efficiently and accurately reproduce the structure at the destination. On EXT4 (on kernel-2.6.35.11-83.fc14.i686) there are no extents returned for holes in a file. However on btrfs it does return an extent for holes? So with btrfs there is no way to know an extent is allocated but unwritten (zero), so one must read and write all the data, rather than just fallocating the space in the destination. One can also see this with the following on btrfs: $ fallocate -l 1 unwritten $ truncate -s 1 sparse $ dd count=1000 bs=10 if=/dev/zero of=zero $ filefrag -vs * Filesystem type is: 9123683e File size of sparse is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 00 24415 unwritten,eof sparse: 1 extent found File size of unwritten is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 068160 12207 1 122079256080366 12208 eof unwritten: 2 extents found File size of zeros is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 019360 20678 1 206784376040037 3737 eof zeros: 2 extents found So is this expected? Has this already been changed to match ext4? For my own reference, I can probably skip performance tests on (older) btrfs by checking `filefrag` output. Also in `cp`, if we see an unwritten extent we should probably create a hole rather than an empty allocation by default. It's better to decrease file allocation than increase it. cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
Pádraig Brady wrote: Unfortunately, after checking BTRFS I see that fiemap behaves differently to EXT4. IMHO the EXT4 operation seems correct, and gives full info about the structure of a file, which cp for example can use to efficiently and accurately reproduce the structure at the destination. On EXT4 (on kernel-2.6.35.11-83.fc14.i686) there are no extents returned for holes in a file. However on btrfs it does return an extent for holes? So with btrfs there is no way to know an extent is allocated but unwritten (zero), so one must read and write all the data, rather than just fallocating the space in the destination. One can also see this with the following on btrfs: $ fallocate -l 1 unwritten $ truncate -s 1 sparse $ dd count=1000 bs=10 if=/dev/zero of=zero $ filefrag -vs * Filesystem type is: 9123683e File size of sparse is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 00 24415 unwritten,eof sparse: 1 extent found File size of unwritten is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 068160 12207 1 122079256080366 12208 eof unwritten: 2 extents found File size of zeros is 1 (24415 blocks, blocksize 4096) ext logical physical expected length flags 0 019360 20678 1 206784376040037 3737 eof zeros: 2 extents found So is this expected? I heard it's a bug in F14's btrfs. To test btrfs properly, I had to use rawhide/F15. Has this already been changed to match ext4? Yes, with a new-enough kernel. For my own reference, I can probably skip performance tests on (older) btrfs by checking `filefrag` output. Also in `cp`, if we see an unwritten extent we should probably create a hole rather than an empty allocation by default. It's better to decrease file allocation than increase it. Makes sense.
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 09/02/11 23:35, Pádraig Brady wrote: Subject: [PATCH] copy: adjust fiemap handling of sparse files Don't depend on heuristics to detect sparse files if fiemap is available. Also don't introduce new holes unless --sparse=always has been specified. * src/copy.c (extent_copy): Pass the user specified sparse mode, and handle as described above. Also a redundant lseek has been suppressed when there is no hole between two extents. Could that be done in two separate patches? Here is the first patch split out to suppress redundant lseek()s. It's expanded now to suppress lseek in the source as well as dest. $ fallocate -l $((129)) /tmp/t.f $ strace -e _llseek cp-before /tmp/t.f /dev/null 21 | wc -l 180 $ strace -e _llseek cp /tmp/t.f /dev/null 21 | wc -l 0 Hmm, the above suggests to me that we could use the FIEMAP_EXTENT_UNWRITTEN flag, so as to skip the read() for these allocated but unwritten (zero) extents. We could treat such extents like holes, except we would only generate holes in the dest with SPARSE_ALWAYS. I'll do patch for that this evening. Anyway the following small reorg will also simplify possible future changes to introduce fallocate(). Note the attached mostly changes indenting, with the significant change being just: $ git diff -w --no-ext-diff HEAD~ diff --git a/src/copy.c b/src/copy.c index 9182c16..5b6ffe3 100644 --- a/src/copy.c +++ b/src/copy.c @@ -334,7 +334,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { off_t ext_start = scan.ext_info[i].ext_logical; uint64_t ext_len = scan.ext_info[i].ext_length; + uint64_t hole_size = ext_start - last_ext_start - last_ext_len; + if (hole_size) +{ if (lseek (src_fd, ext_start, SEEK_SET) 0) { error (0, errno, _(cannot lseek %s), quote (src_name)); @@ -356,11 +359,6 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, /* When not inducing holes and when there is a hole between the end of the previous extent and the beginning of the current one, write zeros to the destination file. */ - if (last_ext_start + last_ext_len ext_start) -{ - uint64_t hole_size = (ext_start -- last_ext_start -- last_ext_len); if (! write_zeros (dest_fd, hole_size)) { error (0, errno, _(%s: write failed), quote (dst_name)); From b113a93950776fc308985941d8e4825f1f8453dd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 8 Feb 2011 19:16:55 + Subject: [PATCH] copy: suppress redundant lseeks when using fiemap * src/copy.c (extent_copy): Suppress redundant lseek()s in both the source and dest files, when there is no hole between extents. --- src/copy.c | 40 +++- 1 files changed, 19 insertions(+), 21 deletions(-) diff --git a/src/copy.c b/src/copy.c index 9182c16..5b6ffe3 100644 --- a/src/copy.c +++ b/src/copy.c @@ -334,33 +334,31 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { off_t ext_start = scan.ext_info[i].ext_logical; uint64_t ext_len = scan.ext_info[i].ext_length; + uint64_t hole_size = ext_start - last_ext_start - last_ext_len; - if (lseek (src_fd, ext_start, SEEK_SET) 0) + if (hole_size) { - error (0, errno, _(cannot lseek %s), quote (src_name)); -fail: - extent_scan_free (scan); - return false; -} + if (lseek (src_fd, ext_start, SEEK_SET) 0) +{ + error (0, errno, _(cannot lseek %s), quote (src_name)); +fail: + extent_scan_free (scan); + return false; +} - if (make_holes) -{ - if (lseek (dest_fd, ext_start, SEEK_SET) 0) + if (make_holes) { - error (0, errno, _(cannot lseek %s), quote (dst_name)); - goto fail; + if (lseek (dest_fd, ext_start, SEEK_SET) 0) +{ + error (0, errno, _(cannot lseek %s), quote (dst_name)); + goto fail; +} } -} - else -{ - /* When not inducing holes and when there is a hole between - the end of the previous extent and the beginning of the - current one, write zeros to the destination file. */ - if (last_ext_start + last_ext_len ext_start) + else { - uint64_t hole_size = (ext_start -- last_ext_start
Re: [PATCH] copy: adjust fiemap handling of sparse files
On 10/02/11 10:53, Jim Meyering wrote: While using --sparse=always is a lot less common, it looks like there's room for improvement there, too. I.e., we should be able to eliminate all of these lseek calls on the output descriptor there, too: $ strace -e lseek cp --sparse=always /tmp/t.f /tmp/t2 21|wc -l 16384 True. So merge the lseeks in the sparse detection in sparse_copy. I'll have a look at adding that this evening. Note we'd only get 1 lseek for the fallocated file above if the mentioned FIEMAP_EXTENT_UNWRITTEN idea works. cheers, Pádraig.
Re: [PATCH] copy: adjust fiemap handling of sparse files
Pádraig Brady wrote: I was looking at adding fallocate() to copy.c, now that the fiemap code has gone in and I noticed that if there was allocated space at the end of a file, not accounted for in st_size, then any holes would not be detected. Good point. In what other cases does the sparse detection heuristic fail BTW? There are probably a few, but none that I know of. Anwyay, we don't need the heuristic with fiemap, so I changed accordingly in the attached. ... Subject: [PATCH] copy: adjust fiemap handling of sparse files Don't depend on heuristics to detect sparse files if fiemap is available. Also don't introduce new holes unless --sparse=always has been specified. Good change, in principle. * src/copy.c (extent_copy): Pass the user specified sparse mode, and handle as described above. Also a redundant lseek has been suppressed when there is no hole between two extents. Could that be done in two separate patches? I haven't looked closely, but when I tested it on x86_64 (F14), it failed like this: FAIL: cp/sparse-to-pipe (exit: 1) = ... + truncate -s1M sparse + timeout 10 cat pipe + cp sparse pipe cp: failed to extend `pipe': Invalid argument + fail=1 + cmp sparse copy cmp: EOF on copy + fail=1 Sounds like the change provokes an lseek on the output FD, even though it's a pipe.
[PATCH] copy: adjust fiemap handling of sparse files
I was looking at adding fallocate() to copy.c, now that the fiemap code has gone in and I noticed that if there was allocated space at the end of a file, not accounted for in st_size, then any holes would not be detected. In what other cases does the sparse detection heuristic fail BTW? Anwyay, we don't need the heuristic with fiemap, so I changed accordingly in the attached. cheers, Pádraig. From a96d650ae634585ef46f96636d47c44297ce513a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 8 Feb 2011 19:16:55 + Subject: [PATCH] copy: adjust fiemap handling of sparse files Don't depend on heuristics to detect sparse files if fiemap is available. Also don't introduce new holes unless --sparse=always has been specified. * src/copy.c (extent_copy): Pass the user specified sparse mode, and handle as described above. Also a redundant lseek has been suppressed when there is no hole between two extents. --- src/copy.c | 25 ++--- 1 files changed, 10 insertions(+), 15 deletions(-) diff --git a/src/copy.c b/src/copy.c index 9182c16..63a7b1e 100644 --- a/src/copy.c +++ b/src/copy.c @@ -294,7 +294,7 @@ write_zeros (int fd, uint64_t n_bytes) return false. */ static bool extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, - off_t src_total_size, bool make_holes, + off_t src_total_size, enum Sparse_type sparse_mode, char const *src_name, char const *dst_name, bool *require_normal_copy) { @@ -343,7 +343,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, return false; } - if (make_holes) + uint64_t hole_size = ext_start - last_ext_start - last_ext_len; + if (hole_size sparse_mode != SPARSE_NEVER) { if (lseek (dest_fd, ext_start, SEEK_SET) 0) { @@ -351,21 +352,15 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, goto fail; } } - else + else if (hole_size) { /* When not inducing holes and when there is a hole between the end of the previous extent and the beginning of the current one, write zeros to the destination file. */ - if (last_ext_start + last_ext_len ext_start) + if (! write_zeros (dest_fd, hole_size)) { - uint64_t hole_size = (ext_start -- last_ext_start -- last_ext_len); - if (! write_zeros (dest_fd, hole_size)) -{ - error (0, errno, _(%s: write failed), quote (dst_name)); - goto fail; -} + error (0, errno, _(%s: write failed), quote (dst_name)); + goto fail; } } @@ -374,7 +369,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, off_t n_read; if ( ! sparse_copy (src_fd, dest_fd, buf, buf_size, - make_holes, src_name, dst_name, + sparse_mode == SPARSE_ALWAYS, src_name, dst_name, ext_len, n_read, wrote_hole_at_eof)) return false; @@ -397,7 +392,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, just converted them to a hole in the destination, we must call ftruncate here in order to record the proper length in the destination. */ if ((dest_pos src_total_size || wrote_hole_at_eof) - (make_holes + (sparse_mode != SPARSE_NEVER ? ftruncate (dest_fd, src_total_size) : ! write_zeros (dest_fd, src_total_size - dest_pos))) { @@ -968,7 +963,7 @@ copy_reg (char const *src_name, char const *dst_name, '--sparse=never' option is specified, write all data but use any extents to read more efficiently. */ if (extent_copy (source_desc, dest_desc, buf, buf_size, - src_open_sb.st_size, make_holes, + src_open_sb.st_size, x-sparse_mode, src_name, dst_name, normal_copy_required)) goto preserve_metadata; -- 1.7.3.4