[PATCH] btrfs-progs: allow device deletion using 'missing' keyword again
The device deletion procedure ensures that the device is a block device. This
patch reintroduces 'missing' as a keyword, correctly passing it on to the
kernel instead of complaining that 'missing' is not a block device.

Signed-off-by: Alexander Fougner
---
 cmds-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-device.c b/cmds-device.c
index c2f3a40..dca30d7 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -161,7 +161,7 @@ static int _cmd_device_remove(int argc, char **argv,
 		struct btrfs_ioctl_vol_args arg;
 		int res;
 
-		if (is_block_device(argv[i]) != 1) {
+		if (is_block_device(argv[i]) != 1 && strcmp(argv[i], "missing")) {
 			fprintf(stderr,
 				"ERROR: %s is not a block device\n", argv[i]);
 			ret++;
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
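For context, the usage this patch restores can be sketched roughly as below. This is only an illustration, not part of the patch: the device path and mount point are hypothetical placeholders, and the `run` helper (also hypothetical) just prints each command unless DRY_RUN=0 is set, since the real commands need root and a degraded filesystem.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# dev: a surviving member device (placeholder, e.g. /dev/sdb)
# mnt: the mount point (placeholder, e.g. /mnt)
remove_missing_device() {
    dev=$1
    mnt=$2
    # A filesystem with an absent member normally has to be mounted degraded.
    run mount -o degraded "$dev" "$mnt"
    # 'missing' is handed to the kernel as-is instead of a block device path.
    run btrfs device delete missing "$mnt"
}
```

Calling `remove_missing_device /dev/sdb /mnt` with the default dry-run setting only echoes the two commands, which makes it easy to review them before running anything for real.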
Re: Btrfs/RAID5 became unmountable after SATA cable fault
On 6 November 2015 at 10:03, Janos Toth F. wrote:
>
> Although I updated the firmware of the drives. (I found an IMPORTANT
> update when I went there to download SeaTools, although there was no
> change log to tell me why this was important). This might have changed
> the error handling behavior of the drive...?

I've had Seagate drives that did not report errors until I updated the
firmware; they tended to time out instead. I got a shitload of SMART errors
after updating, but the drives still didn't handle errors very well (they
became unresponsive).
Re: Btrfs/RAID5 became unmountable after SATA cable fault
I created a fresh RAID-5 mode Btrfs on the same 3 disks (including the
faulty one, which is still producing numerous random read errors) and Btrfs
now seems to work exactly as I would anticipate.

I copied some data and verified the checksum. The data is readable and
correct regardless of the constant warning messages in the kernel log about
the read errors on the single faulty HDD (the bad behavior is confirmed by
the SMART logs, and I tested it in a different PC as well...).

I also ran several scrubs, and now it always finishes with X corrected and
0 uncorrected errors. (The errors are supposedly corrected, but the faulty
HDD keeps randomly corrupting the data...) Last time, I saw uncorrected
errors during the scrub and not all of the data was readable. Rather
strange...

I ran the GIMPS/Prime95 Blend stress test for 24 hours without errors on
the problematic machine.

However, I did update the firmware of the drives. (I found an IMPORTANT
update when I went there to download SeaTools, although there was no change
log to tell me why this was important.) This might have changed the error
handling behavior of the drive...?
Re: Btrfs progs pre-release 4.3-rc1
On Tue, Nov 03, 2015 at 12:10:14AM +, Duncan wrote:
> David Sterba posted on Mon, 02 Nov 2015 16:14:53 +0100 as excerpted:
>
> > the kernel 4.3 was released yesterday, the btrfs-progs will follow at
> > the end of this week. I've tagged an rc1 from current devel branch.
> > There are lots of small invisible changes and one change in the
> > defaults:
> >
> > * mkfs: mixed mode is not forced anymore for devices smaller than 1 GiB
>
> It says one change in the /defaults/, but then it says mixed mode isn't
> /forced/ anymore under a GiB.

Well, it may be a loose definition of 'default'. I meant a change in the
current behaviour without further tuning.

> Which is it, a change in the /defaults/, under a gig now defaults to
> separate data/metadata, or same /defaults/, but now there's a way to
> overrule them and do separate data/metadata under a gig, so while mixed
> remains the default, it's no longer /forced/?
>
> If the /defaults/ changed, is mixed mode still /recommended/ for small
> filesystems?

Yes it is, where small remains < 1 GiB.
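Since mixed mode is no longer forced but is still recommended below 1 GiB, the choice now falls to whoever runs mkfs. A minimal sketch of making that choice explicit follows; the device name and the way the size is passed in are assumptions for illustration, and the hypothetical `run` helper only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# dev: target device (placeholder); size_bytes: its size in bytes,
# supplied by the caller (how it is obtained is left open here).
mkfs_small_aware() {
    dev=$1
    size_bytes=$2
    one_gib=$((1024 * 1024 * 1024))
    if [ "$size_bytes" -lt "$one_gib" ]; then
        # Below 1 GiB: request mixed block groups explicitly.
        run mkfs.btrfs --mixed "$dev"
    else
        run mkfs.btrfs "$dev"
    fi
}
```

For example, `mkfs_small_aware /dev/sdx1 536870912` (512 MiB) selects `--mixed`, while a 2 GiB size falls through to the plain invocation.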
btrfs locking in linux 4.2.5
Hi,

I believe I ran into some btrfs-related locking issues with linux-4.2.5. It
happened when I updated my Arch Linux OS, restarted the system, and tried
to copy (not move) 20GB+ of data within the btrfs /root partition. The
system's load shot up, with load averages peaking at up to 20.00 at some
point (the system has 8 cores). Programs were stuck while loading. After
about 20 minutes, the system returned to normal operation and sane load
averages.

The system log had the following warnings, hinting that this could be due
to btrfs locking:

> Nov 05 14:26:42 fulcrum kernel: INFO: task systemd-journal:330 blocked for more than 120 seconds.
> Nov 05 14:26:42 fulcrum kernel: Tainted: P IO 4.2.5-1-ARCH #1
> Nov 05 14:26:42 fulcrum kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Nov 05 14:26:42 fulcrum kernel: systemd-journal D 0005 0 330 1 0x0004
> Nov 05 14:26:42 fulcrum kernel: 8806027a7d28 0082 88060125b200 880602a2b200
> Nov 05 14:26:42 fulcrum kernel: 8806027a8000 8806017689f0 8806017689f0
> Nov 05 14:26:42 fulcrum kernel: 880487f1b170 0001 8806027a7d48 8157283e
> Nov 05 14:26:42 fulcrum kernel: Call Trace:
> Nov 05 14:26:42 fulcrum kernel: [] schedule+0x3e/0x90
> Nov 05 14:26:42 fulcrum kernel: [] wait_current_trans.isra.9+0xca/0x110 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] ? wake_atomic_t_function+0x60/0x60
> Nov 05 14:26:42 fulcrum kernel: [] start_transaction+0x420/0x580 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_start_transaction+0x1b/0x20 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_sync_file+0x204/0x380 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] vfs_fsync_range+0x4b/0xb0
> Nov 05 14:26:42 fulcrum kernel: [] ? SyS_timerfd_settime+0x53/0xa0
> Nov 05 14:26:42 fulcrum kernel: [] do_fsync+0x3d/0x70
> Nov 05 14:26:42 fulcrum kernel: [] SyS_fsync+0x10/0x20
> Nov 05 14:26:42 fulcrum kernel: [] entry_SYSCALL_64_fastpath+0x12/0x71
> Nov 05 14:26:42 fulcrum kernel: INFO: task postgres:894 blocked for more than 120 seconds.
> Nov 05 14:26:42 fulcrum kernel: Tainted: P IO 4.2.5-1-ARCH #1
> Nov 05 14:26:42 fulcrum kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Nov 05 14:26:42 fulcrum kernel: postgres D 88061fc95200 0 894 865 0x
> Nov 05 14:26:42 fulcrum kernel: 8805f435fb38 0086 880603ed1900 880603513200
> Nov 05 14:26:42 fulcrum kernel: 8805f435fbb8 8805f436 8806017689f0 8806017689f0
> Nov 05 14:26:42 fulcrum kernel: 88059e5acc38 0001 8805f435fb58 8157283e
> Nov 05 14:26:42 fulcrum kernel: Call Trace:
> Nov 05 14:26:42 fulcrum kernel: [] schedule+0x3e/0x90
> Nov 05 14:26:42 fulcrum kernel: [] wait_current_trans.isra.9+0xca/0x110 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] ? wake_atomic_t_function+0x60/0x60
> Nov 05 14:26:42 fulcrum kernel: [] start_transaction+0x420/0x580 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_start_transaction+0x1b/0x20 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_create+0x4a/0x210 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] vfs_create+0x9c/0xd0
> Nov 05 14:26:42 fulcrum kernel: [] path_openat+0xfb7/0x1110
> Nov 05 14:26:42 fulcrum kernel: [] do_filp_open+0x8a/0x100
> Nov 05 14:26:42 fulcrum kernel: [] ? __alloc_fd+0x88/0x110
> Nov 05 14:26:42 fulcrum kernel: [] do_sys_open+0x146/0x230
> Nov 05 14:26:42 fulcrum kernel: [] SyS_open+0x1e/0x20
> Nov 05 14:26:42 fulcrum kernel: [] entry_SYSCALL_64_fastpath+0x12/0x71
> Nov 05 14:26:42 fulcrum kernel: INFO: task zsh:1959 blocked for more than 120 seconds.
> Nov 05 14:26:42 fulcrum kernel: Tainted: P IO 4.2.5-1-ARCH #1
> Nov 05 14:26:42 fulcrum kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Nov 05 14:26:42 fulcrum kernel: zsh D 0001 0 1959 1323 0x
> Nov 05 14:26:42 fulcrum kernel: 880556813cd8 0082 8805ec457080 8805eac5b200
> Nov 05 14:26:42 fulcrum kernel: 880556813ce8 880556814000 8805fcde09f0 8805fcde09f0
> Nov 05 14:26:42 fulcrum kernel: 8800d4f05ac8 0001 880556813cf8 8157283e
> Nov 05 14:26:42 fulcrum kernel: Call Trace:
> Nov 05 14:26:42 fulcrum kernel: [] schedule+0x3e/0x90
> Nov 05 14:26:42 fulcrum kernel: [] wait_current_trans.isra.9+0xca/0x110 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] ? wake_atomic_t_function+0x60/0x60
> Nov 05 14:26:42 fulcrum kernel: [] start_transaction+0x420/0x580 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] ? __d_lookup+0xa1/0x160
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_start_transaction+0x1b/0x20 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] btrfs_symlink+0x80/0x3e0 [btrfs]
> Nov 05 14:26:42 fulcrum kernel: [] ? lookup_dcache+0x30/0xb0
> Nov 05 14:26:42 fulcrum kernel: [] vfs_symlink+0x8b/0xc0
>
Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs
On 11/5/15 11:03 PM, Jeff Mahoney wrote:
> On 11/5/15 10:18 PM, Al Viro wrote:
>> On Thu, Nov 05, 2015 at 09:57:35PM -0500, Jeff Mahoney wrote:
>>
>>> So now file_operations callbacks can't assume that
>>> file->f_path.dentry belongs to the same file system that
>>> implements the callback. More than that, any code that could
>>> ultimately get a dentry that comes from an open file can't
>>> trust that it's from the same file system.
>>
>> Use file_inode() for inode.
>>
>>> This crash is due to this issue. Unlike xfs and ext2/3/4, we
>>> use file->f_path.dentry->d_inode to resolve the inode. Using
>>> file_inode() is an easy enough fix here, but we run into
>>> trouble later. We have logic in the btrfs fsync() call path
>>> (check_parent_dirs_for_sync) that walks back up the dentry
>>> chain examining the inode's last transaction and last unlink
>>> transaction to determine whether a full transaction commit is
>>> required. This obviously doesn't work if we're walking the
>>> overlayfs path instead. Regardless of any argument over
>>> whether that's doing the right thing, it's a pretty common
>>> pattern to assume that file->f_path.dentry comes from the same
>>> file system when using a file_operation. Is it intended that
>>> that assumption is no longer valid?
>>
>> It's actually rare, and your example is a perfect demonstration
>> of the reasons why it is so rare. What's to protect
>> btrfs_log_dentry_safe() from racing with rename(2)? Sure, you do
>> dget_parent(). Which protects you from having one-time parent
>> dentry freed under you. What it doesn't do is making any
>> promises about its relationship with your file.
>
> I suppose the irony here is that, AFAIK, that code is to ensure a
> file doesn't get lost between transactions due to rename.
>
> Isn't the file->f_path.dentry relationship stable otherwise,
> though? The name might change and the parent might change but the
> dentry that the file points to won't.

And, taking it a bit further, it's impossible for a rename to end up
with a file pointing into a different file system. So this btrfs case
might misbehave, but it would never crash like we're seeing here.

-Jeff

> I did find a few other places where that assumption happens without
> any questionable traversals. Sure, all three are in file systems
> unlikely to be used with overlayfs.
>
> ocfs2_prepare_inode_for_write uses file->f_path.dentry for
> should_remove_suid (due to needing to do it early since cluster
> locking is unknown in setattr, according to the commit). Having
> should_remove_suid operate on an inode would solve that easily.
>
> fat_ioctl_set_attributes uses it to call fat_setattr, but that only
> uses the inode and could have the inode_operation use a wrapper.
>
> cifs_new_fileinfo keeps a reference to the dentry but it seems to
> be used mostly to access the inode except for the nasty-looking
> call to build_path_from_dentry in cifs_reopen_file, which I won't
> be touching. That does look like a questionable traversal,
> especially with the "we can't take the rename lock here" comment.
>
> -Jeff
>

-- 
Jeff Mahoney
SUSE Labs
Btrfs progs release 4.3
Hi,

btrfs-progs 4.3 has been released. There's a notable change in the
behaviour of mkfs on devices smaller than 1 GiB. The forced --mixed mode is
no more. The resulting filesystem will have separate data and metadata
block groups. This may lead to an earlier 'no-space' condition because some
of the space is reserved for metadata use.

* mkfs
  * mixed mode is not forced for filesystems smaller than 1GiB
  * mixed mode broken with mismatching sectorsize and nodesize, fixed
  * print version info earlier
  * print devices sorted by id
  * do not truncate target image with --rootsize
* fi usage:
  * don't print global block reserve
  * print device id
  * minor output tuning
  * other cleanups
* calc-size:
  * div-by-zero fix on an empty filesystem
  * fix crash
* bugfixes:
  * more superblock sanity checks
  * consistently round size of all devices down to sectorsize
  * misc leak fixes
  * convert: don't try to rollback with a half-deleted ext2_saved subvolume
* other:
  * check: add progress indicator
  * scrub: enhanced error message
  * show-super: read superblock from a given offset
  * add README
  * docs: update manual page for mkfs.btrfs, btrfstune, balance, convert
    and inspect-internal
  * build: optional build with more warnings (W=...)
  * build: better support for static checkers
  * build: html output of documentation
  * pretty-print: last_snapshot for root_item
  * pretty-print: stripe dev uuid
  * error reporting wrappers, introduced and example use
  * refactor open_file_or_dir
  * other docs and help updates
* testing:
  * test for nodes crossing stripes
  * test for broken 'subvolume sync'
  * basic tests for mkfs, raid option combinations
  * basic tests for fuzzed images (check)
  * command instrumentation (eg valgrind)
  * print commands if requested
  * add README for tests

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Anand Jain (4):
      btrfs-progs: move is_numerical() helper to utils and rename
      btrfs-progs: device add: cleanup argument handling
      btrfs-progs: fix uninitialized copy of btrfs_fs_devices list
      btrfs-progs: fix missing initialization of list head for dev_list

Chandan Rajendra (2):
      Btrfs-progs: Do not force mixed block group creation unless '-M' option is specified
      Btrfs-progs: Prevent creation of filesystem with 'mixed bgs' and having differing sectorsize and nodesize.

David Sterba (46):
      btrfs-progs: misc tests: add 009-subvolume-sync-must-wait
      btrfs-progs: tests: print commands on terminal if requested
      btrfs-progs: build: allow to build with various compiler warnings
      btrfs-progs: a bit of makefile documentation
      btrfs-progs: check: update help text
      btrfs-progs: build: make support for static checkers more generic
      btrfs-progs: docs: add html build target
      btrfs-progs: cleanup and comment parse_range
      btrfs-progs: do not modify the string in parse_range
      btrfs-progs: extend parse_range API to accept a relaxed range
      btrfs-progs: add helpers for parsing 32bit ranges
      btrfs-progs: add helpers to print ranges
      btrfs-progs: tests: add mkfs tests
      btrfs-progs: tests: add 001-basic-profiles mkfs tests
      btrfs-progs: tests: add 002-no-force-mixed-on-small-volume
      btrfs-progs: tests: add 010-convert-delete-ext2-subvol
      btrfs-progs: tests: set default test image size to 2G
      btrfs-progs: tests: do not run sudo helper tests if not necessary
      btrfs-progs: tests: add test driver for fuzzed images
      btrfs-progs: tests: 001-simple-unmounted: iterate over fuzzed images and run check
      btrfs-progs: tests: add support for command instrumentation
      btrfs-progs: tests: do not log output of run_mayfail to terminal
      btrfs-progs: tests: add 003-mixed-with-wrong-nodesize
      btrfs-progs: mkfs: remove stray message about forced mixed-bg
      btrfs-progs: add an initial README
      btrfs-progs: add initial tests/README
      btrfs-progs: image: fix bogus check after cpu on-line detection
      btrfs-progs: mkfs: print version info first
      btrfs-progs: docs: enhance manual page for mkfs
      btrfs-progs: docs: enhance manual page for btrfstune
      btrfs-progs: docs: enhance manual page for balance
      btrfs-progs: docs: enhance the manual page for convert
      btrfs-progs: docs: enhance manual page for inspect-internal
      Btrfs progs v4.3-rc1
      btrfs-progs: fi usage: do not print global block reserve
      btrfs-progs: fi usage: cleanup, print header in one go
      btrfs-progs: fi usage: print path header in the tabular mode
      btrfs-progs: fi usage: properly count real space infos
      btrfs-progs: fi usage: cleanup, replace header constant
      btrfs-progs: fi usage: cleanup, replace space info starting column constant
      btrfs-progs: fi usage: print device id column in the tabular output
      btrfs-progs: string
Re: Bad fs performance, IO freezes
I am getting a SATA dock for my laptop next week. Until then, is it
possible to perform an action in btrfs (like rm, which seems to trigger the
issue) and make it log exactly what it's doing?

On Thu, Oct 29, 2015 at 9:01 PM, Austin S Hemmelgarn wrote:
> On 2015-10-29 11:49, cheater00 . wrote:
>>
>> Hi Austin,
>> seek times are fine, but this literally freezes my computer for a
>> split second. I've had to re-type this email twice because the freezes
>> meant letters I typed would not arrive on the screen.
>> USB disks are so common they should not be having issues.
>
> That's debatable. USB is commonly used because it's almost impossible to
> find a system that doesn't have it, not because it's reliable. The original
> intent was for it to be used for stuff like mice and keyboards, so it was
> designed with low-latency and fair scheduling in mind, both of which really
> hurt performance of bulk data storage devices.
>>
>> I have 4.3.0-040300rc7-generic #201510260712 which is just three days old.
>
> That should be perfectly recent enough, although FWIW, the official version
> of 4.3 should be out this Sunday.
>>
>>
>> Please advise. Isn't it better to *not* use a vm to debug this?
>
> That depends. For something like this, it could go either way. I just use
> a VM because that's what I always use, because it's nice not crashing your
> system when trying to debug a kernel panic.
>>
>> BTW, if we are talking about slow speed making things worse, I could
>> try downgrading the cable to usb2.
>> Is there a standard virtualbox VM that I could use?
>
> In general, it's pretty easy to set something like Ubuntu up in VirtualBox,
> the install is essentially identical to regular hardware aside from the
> initial setup of the VM itself. The documentation for VirtualBox is really
> good, if you've never used virtualization before, it's definitely worth
> reading.
>>
>> I'll download Gentoo in the meantime. I have never used it. I'm
>> getting the "minimal installation cd" from 29th september.
>>
>> http://distfiles.gentoo.org/releases/x86/autobuilds/20150929/install-x86-minimal-20150929.iso
>
> I meant by no means that you needed to use Gentoo, I only mentioned it
> because it's what I use (which in turn is because that's what I use on just
> about everything except stuff like the Raspberry Pi or the BeagleBoard). If
> you just want to debug this and then be done with it, I would actually
> advise against using Gentoo, it takes a lot of effort to get a system up and
> running with it, and it's very involved to maintain compared to Ubuntu. On
> the other hand though, if you are willing to learn to use it, it's one of
> the most highly customizable Linux distros out there, and can have
> noticeably better performance than more generic distros (FWIW, it's also one
> of the last big distros that doesn't force systemd on its users by
> default).
[PATCH] btrfs: test quota disable during quota rescan
This test case tests if we are able to disable quotas on a filesystem
while a quota rescan is running. Up to now (4.3) this would result in a
kernel NULL pointer dereference.

Fixed by patch (btrfs: qgroup: fix quota disable during rescan).

Signed-off-by: Justin Maggard
---
 tests/btrfs/115     | 62 +
 tests/btrfs/115.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 65 insertions(+)
 create mode 100755 tests/btrfs/115
 create mode 100644 tests/btrfs/115.out

diff --git a/tests/btrfs/115 b/tests/btrfs/115
new file mode 100755
index 000..0d1cb3a
--- /dev/null
+++ b/tests/btrfs/115
@@ -0,0 +1,62 @@
+#! /bin/bash
+# FS QA Test No. btrfs/115
+#
+# btrfs quota scan/disable sanity test
+# Make sure that disabling quotas during a quota rescan doesn't crash
+#
+#---
+# Copyright (c) 2015 NETGEAR, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+for i in `seq 0 1 45`; do
+	echo -n > $SCRATCH_MNT/file.$i
+done
+echo 3 > /proc/sys/vm/drop_caches
+$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
+$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
+
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/115.out b/tests/btrfs/115.out
new file mode 100644
index 000..d9dd136
--- /dev/null
+++ b/tests/btrfs/115.out
@@ -0,0 +1,2 @@
+QA output created by 115
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 10ab26b..39b9aff 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -117,3 +117,4 @@
 112 auto quick clone
 113 auto quick compress clone
 114 auto qgroup
+115 auto qgroup
-- 
2.6.3
[PATCH] btrfs: qgroup: fix quota disable during rescan
There's a race condition that leads to a NULL pointer dereference if you
disable quotas while a quota rescan is running. To fix this, we just need
to wait for the quota rescan worker to actually exit before tearing down
the quota structures.

Signed-off-by: Justin Maggard
---
 fs/btrfs/qgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 75c0249..a7cf504 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -993,9 +993,10 @@ int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (!fs_info->quota_root)
 		goto out;
-	spin_lock(&fs_info->qgroup_lock);
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
+	btrfs_qgroup_wait_for_completion(fs_info);
+	spin_lock(&fs_info->qgroup_lock);
 	quota_root = fs_info->quota_root;
 	fs_info->quota_root = NULL;
 	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON;
-- 
2.6.3
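On kernels that do not yet carry this fix, the same ordering (wait for the rescan, then tear down) can be approximated from userspace. This is only a sketch under assumptions, not something the patch proposes: it assumes the installed btrfs-progs supports `btrfs quota rescan -w` and that it waits for a rescan already in progress (it may start a fresh one if none is running). The mount point is a placeholder, and the hypothetical `run` helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# mnt: mount point of the btrfs filesystem (placeholder)
quota_disable_after_rescan() {
    mnt=$1
    # Block until the rescan worker is done (assumed -w semantics)...
    run btrfs quota rescan -w "$mnt"
    # ...and only then disable quotas, mirroring the order in the patch.
    run btrfs quota disable "$mnt"
}
```

This does not close the race in the kernel; it just avoids issuing the disable while a rescan is known to be running.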
[GIT PULL] Btrfs
Hi Linus,

Please pull my for-linus-4.4 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.4

My branch was based on 4.3-rc5, and it ended up with a minor conflict
against the btrfs changes sent in for a later rc. I put a sample merge
resolution up here:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.4-merged

The only non-obvious part of all this is in fs/btrfs/volumes.h. Both sides
of the merge ended up creating the BTRFS_BALANCE_ARGS_MASK definition,
which was my fault (sorry) for letting a duplicate patch in the mix. When
you merge, please keep the longer of the two definitions, since we added
to it later in the for-linus-4.4 branch.

I'm happy to redo things merged with 4.3-final if you don't want to bother
with this, sorry for the hassle.

On to the fun part, we have a lot of subvolume quota improvements in here,
along with big piles of cleanups from Dave Sterba and Anand Jain and
others.

Josef pitched in a batch of allocator fixes based on production use here
at FB. We found that mount -o ssd_spread greatly improved our performance
on hardware raid5/6, but it exposed some CPU bottlenecks in the allocator.
These patches make a huge difference.

Qu Wenruo (24) commits (+1031/-375):
    btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space. (+79/-9)
    btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space (+27/-19)
    btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook (+22/-12)
    btrfs: delayed_ref: Add new function to record reserved space into delayed ref (+43/-0)
    btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space (+61/-0)
    btrfs: extent_io: Introduce needed structure for recoding set/clear bits (+12/-0)
    btrfs: qgroup: Introduce functions to release/free qgroup reserve data (+62/-0)
    btrfs: extent-tree: Switch to new delalloc space reserve and release (+38/-25)
    btrfs: delayed_ref: release and free qgroup reserved at proper timing (+34/-4)
    btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans (+32/-21)
    btrfs: extent_io: Introduce new function clear_record_extent_bits() (+42/-11)
    btrfs: qgroup: Fix a rebase bug which will cause qgroup double free (+0/-4)
    btrfs: extent_io: Introduce new function set_record_extent_bits (+56/-18)
    btrfs: qgroup: Introduce new functions to reserve/free metadata (+48/-0)
    btrfs: qgroup: Don't copy extent buffer to do qgroup rescan (+16/-10)
    btrfs: qgroup: Add new trace point for qgroup data reserve (+130/-2)
    btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function (+52/-0)
    btrfs: fallocate: Add support to accurate qgroup reserve (+117/-44)
    btrfs: qgroup: Check if qgroup reserved space leaked (+34/-0)
    btrfs: qgroup: Cleanup old inaccurate facilities (+60/-156)
    btrfs: qgroup: Add handler for NOCOW and inline (+21/-1)
    btrfs: qgroup: Use new metadata reservation. (+13/-36)
    btrfs: Fix a data space underflow warning (+8/-3)
    btrfs: Add handler for invalidate page (+24/-0)

David Sterba (17) commits (+393/-112):
    btrfs: introduce ratelimited _in_rcu variants of message printing functions (+38/-0)
    btrfs: remove waitqueue_active check from btrfs_rm_dev_replace_unblocked (+1/-2)
    btrfs: add balance filters limits, stripes and usage to supported mask (+4/-1)
    btrfs: comment the rest of implicit barriers before waitqueue_active (+22/-0)
    btrfs: introduce ratelimited variants of message printing functions (+21/-0)
    btrfs: switch message printers to ratelimited _in_rcu variants (+16/-16)
    btrfs: introduce _in_rcu variants of message printing functions (+29/-0)
    btrfs: extend balance filter usage to take minimum and maximum (+60/-4)
    btrfs: extend balance filter limit to take minimum and maximum (+67/-3)
    btrfs: add barrier for waitqueue_active in clear_btree_io_tree (+6/-0)
    btrfs: add comments to barriers before waitqueue_active (+16/-2)
    btrfs: switch message printers to ratelimited variants (+31/-33)
    btrfs: check unsupported filters in balance arguments (+13/-0)
    btrfs: switch message printers to _in_rcu variants (+27/-27)
    btrfs: remove extra barrier before waitqueue_active (+6/-2)
    btrfs: comment waitqueue_active implied by locks (+11/-1)
    btrfs: switch more printks to our helpers (+25/-21)

Anand Jain (14) commits (+205/-184):
    Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined (+5/-22)
    Btrfs: use BTRFS_ERROR_DEV_MISSING_NOT_FOUND when missing device is not found (+2/-4)
    Btrfs: kernel operation should come after user input has been verified (+13/-13)
    Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks (+27/-13)
    Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_link (+5/-5)
    Btrfs: rename btrfs_sysfs_remove_one to
Re: Unable to allocate for space usage in particular btrfs volume
On Thu, 2015-11-05 at 10:44 +, OmegaPhil wrote:
> On 05/11/15 04:18, Duncan wrote:
> > OmegaPhil posted on Wed, 04 Nov 2015 21:53:09 + as excerpted:
> > VM image files (and large database files, for the same reason) are a bit
> > of a problem on btrfs, and indeed, any COW-based filesystem, since the
> > random rewrite pattern matching that use-case is pretty much the absolute
> > worst-case match for a COW-based filesystem there is.
> > Since you're not doing snapshotting (which conflicts with this option,
> > with an imperfect workaround), setting nocow on those files may well
> > eliminate the problem, but be aware if you aren't already that (1) nocow
> > does turn off checksumming as well, in order to avoid a race that could
> > easily lead to data corruption, and (2) you can't just activate
> So a couple of gig still unaccountable but irrelevant. Thanks, problem
> solved! Although hopefully checksumming will be allowed on nocow files
> in the future as that's currently 17% of all data unprotected and will
> get worse...

There's actually an interesting workaround to this: although the VM disk
images aren't checksummed on the host filesystem, you can use btrfs
*inside* the VMs and enable checksumming there. The downside is that you
can only verify the VM data by booting the VM and running a scrub from
inside.

This of course doesn't help if your VMs are Windows or legacy versions of
Linux without btrfs support. On BSD you could try ZFS.

-- 
Calvin Walton
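As a concrete illustration of the nocow setup discussed in the quoted text, the usual approach is to mark an empty directory with the C attribute so that files created in it afterwards inherit it. A minimal sketch, with the directory path as a placeholder and a hypothetical `run` helper that only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# dir: directory that will hold VM images (placeholder path)
make_nocow_dir() {
    dir=$1
    run mkdir -p "$dir"
    # Set No_COW on the still-empty directory; new files inherit it.
    # Existing file contents are not converted by this.
    run chattr +C "$dir"
    # Show the attributes so the 'C' flag can be confirmed.
    run lsattr -d "$dir"
}
```

Images created or copied into the directory afterwards are written without COW and, as noted above, without btrfs checksums.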
Re: Unable to allocate for space usage in particular btrfs volume
On 2015-11-06 15:15, Calvin Walton wrote:
> On Thu, 2015-11-05 at 10:44 +, OmegaPhil wrote:
> > On 05/11/15 04:18, Duncan wrote:
> > > OmegaPhil posted on Wed, 04 Nov 2015 21:53:09 + as excerpted:
> > >
> > > VM image files (and large database files, for the same reason) are
> > > a bit of a problem on btrfs, and indeed, any COW-based filesystem,
> > > since the random rewrite pattern matching that use-case is pretty
> > > much the absolute worst-case match for a COW-based filesystem there
> > > is.
> > >
> > > Since you're not doing snapshotting (which conflicts with this
> > > option, with an imperfect workaround), setting nocow on those files
> > > may well eliminate the problem, but be aware if you aren't already
> > > that (1) nocow does turn off checksumming as well, in order to
> > > avoid a race that could easily lead to data corruption, and (2) you
> > > can't just activate
> >
> > So a couple of gig still unaccountable but irrelevant. Thanks,
> > problem solved! Although hopefully checksumming will be allowed on
> > nocow files in the future as that's currently 17% of all data
> > unprotected and will get worse...
>
> There's actually an interesting workaround to this: Although the VM
> disk images aren't checksummed on the host filesystem, you can use
> btrfs *inside* the VMs and enable checksumming there. The downside is
> that you can only verify the VM data by booting the VM and running a
> scrub from inside.

Actually, by using a combination of loop devices and kpartx, it's fully
possible to mount the FS and verify it without booting the VM. Of
course, doing this usually requires root access on the host system, but
for most people I know, that's usually not an issue. I do this on
occasion when I need to pull a file off of one of my VM disks on my
laptop and don't have the time to spin up the VM itself.

Another option if you're doing a direct boot of the kernel (for example,
when using a fully paravirtualized domain on Xen, or using some of the
QEMU ARM systems) is to just do the volume management (partitioning and
such) on the host, and expose each filesystem to the guest as a separate
disk. I do this with most of my Linux VMs on my Xen system, where I use
LVM as the back-end storage for the virtual disk images, as it allows me
to easily mount the VM's filesystems directly on the host if need be
(and lets you do all kinds of cool things like using a cluster-aware
filesystem for the VM's root so that you can mount it from the host
safely while the VM is still online).
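For anyone who wants to set the nocow attribute from a program rather than with chattr +C, here is a minimal sketch. It assumes the generic FS_IOC_GETFLAGS / FS_IOC_SETFLAGS ioctls and the FS_NOCOW_FL bit from <linux/fs.h>; the helper name set_nocow() is made up for this example. Keep in mind that on btrfs the flag is only reliably honoured on empty files (or when inherited from the parent directory), so it should be applied before any data is written.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <sys/ioctl.h>

/*
 * Hypothetical helper: set the NOCOW inode flag on an open descriptor.
 * Returns 0 on success (or if the flag was already set), -1 with errno
 * set on failure, e.g. when the filesystem does not support the ioctl.
 */
int set_nocow(int fd)
{
	int flags = 0;

	/* Read the current inode flags first so other bits are preserved. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -1;

	if (flags & FS_NOCOW_FL)
		return 0;	/* nothing to do */

	flags |= FS_NOCOW_FL;
	return ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0 ? -1 : 0;
}
```

Typical use would be to create the image file, call set_nocow() on the still-empty descriptor, and only then write or fallocate the contents.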
[PATCH v8 4/4] vfs: Add vfs_copy_file_range() support for pagecache copies
This allows us to have an in-kernel copy mechanism that avoids frequent
switches between kernel and user space. This is especially useful so
NFSD can support server-side copies.

The default (flags=0) means to first attempt copy acceleration, but use
the pagecache if that fails.

I moved the rw_verify_area() calls into the fallback code since some
filesystems can handle reflinking a large range.

Signed-off-by: Anna Schumaker
Reviewed-by: Darrick J. Wong
Reviewed-by: Padraig Brady
Reviewed-by: Christoph Hellwig
---
 fs/read_write.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 97c15ca..a093830 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1329,6 +1329,24 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 }
 #endif
 
+static ssize_t vfs_copy_fr_copy(struct file *file_in, loff_t pos_in,
+				struct file *file_out, loff_t pos_out,
+				size_t len)
+{
+	ssize_t ret = rw_verify_area(READ, file_in, &pos_in, len);
+
+	if (ret >= 0) {
+		len = ret;
+		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+		if (ret >= 0)
+			len = ret;
+	}
+	if (ret < 0)
+		return ret;
+
+	return do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, 0);
+}
+
 /*
  * copy_file_range() differs from regular file read and write in that it
  * specifically allows return partial success.  When it does so is up to
@@ -1345,17 +1363,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (flags != 0)
 		return -EINVAL;
 
-	/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT */
-	ret = rw_verify_area(READ, file_in, &pos_in, len);
-	if (ret >= 0)
-		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
-	if (ret < 0)
-		return ret;
-
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
-	    (file_out->f_flags & O_APPEND) ||
-	    !file_out->f_op->copy_file_range)
+	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
 	/* this could be relaxed once a method supports cross-fs copies */
@@ -1369,8 +1379,13 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (ret)
 		return ret;
 
-	ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
-					      len, flags);
+	ret = -EOPNOTSUPP;
+	if (file_out->f_op->copy_file_range)
+		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
+						      pos_out, len, flags);
+	if (ret == -EOPNOTSUPP)
+		ret = vfs_copy_fr_copy(file_in, pos_in, file_out, pos_out, len);
+
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
--
2.6.2
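Seen from userspace, the same "try the accelerated path, then fall back" idea might look roughly like the sketch below. This is only an illustration under assumptions: copy_with_fallback() and copy_by_hand() are made-up names, the call goes through syscall(SYS_copy_file_range, ...) and so needs headers that define that number, and the set of errno values treated as "not supported here" is my own guess, not something the patch specifies.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read()/write() copy of up to len bytes at the current file offsets. */
static ssize_t copy_by_hand(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(fd_in, buf, want);

		if (n < 0)
			return -1;
		if (n == 0)
			break;			/* source hit EOF */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0)
				return -1;
			off += w;		/* handle short writes */
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* One copy_file_range() attempt using the descriptors' own offsets. */
static long try_copy_file_range(int fd_in, int fd_out, size_t len)
{
#ifdef SYS_copy_file_range
	return syscall(SYS_copy_file_range, fd_in, NULL, fd_out, NULL, len, 0U);
#else
	errno = ENOSYS;				/* headers predate the syscall */
	return -1;
#endif
}

/*
 * Copy up to len bytes, preferring the in-kernel copy and falling back to
 * read()/write() when the kernel or filesystem refuses (assumed errno set).
 * Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t copy_with_fallback(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		long n = try_copy_file_range(fd_in, fd_out, len - done);

		if (n < 0) {
			if (errno == ENOSYS || errno == EOPNOTSUPP ||
			    errno == EXDEV || errno == EINVAL) {
				ssize_t rest = copy_by_hand(fd_in, fd_out, len - done);

				return rest < 0 ? -1 : (ssize_t)(done + (size_t)rest);
			}
			return -1;
		}
		if (n == 0)
			break;			/* nothing more to copy */
		done += (size_t)n;		/* partial success is allowed */
	}
	return (ssize_t)done;
}
```

With this patch applied the kernel does the pagecache fallback itself, so a loop like this is mostly useful on kernels that lack the syscall or reject the particular pair of files.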
[PATCH v8 3/4] btrfs: add .copy_file_range file operation
From: Zach BrownThis rearranges the existing COPY_RANGE ioctl implementation so that the .copy_file_range file operation can call the core loop that copies file data extent items. The extent copying loop is lifted up into its own function. It retains the core btrfs error checks that should be shared. Signed-off-by: Zach Brown [Anna Schumaker: Make flags an unsigned int, Check for COPY_FR_REFLINK] Signed-off-by: Anna Schumaker Reviewed-by: Josef Bacik Reviewed-by: David Sterba Reviewed-by: Christoph Hellwig --- fs/btrfs/ctree.h | 3 ++ fs/btrfs/file.c | 1 + fs/btrfs/ioctl.c | 91 3 files changed, 56 insertions(+), 39 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 938efe3..0046567 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3996,6 +3996,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode, loff_t pos, size_t write_bytes, struct extent_state **cached); int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + size_t len, unsigned int flags); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index b823fac..b05449c 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = { #ifdef CONFIG_COMPAT .compat_ioctl = btrfs_ioctl, #endif + .copy_file_range = btrfs_copy_file_range, }; void btrfs_auto_defrag_exit(void) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 3e3e613..ad75e48 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3727,17 +3727,16 @@ out: return ret; } -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, - u64 off, u64 olen, u64 destoff) +static noinline int btrfs_clone_files(struct file *file, struct file *file_src, + u64 off, u64 olen, u64 destoff) { struct inode *inode = file_inode(file); + struct inode 
*src = file_inode(file_src); struct btrfs_root *root = BTRFS_I(inode)->root; - struct fd src_file; - struct inode *src; int ret; u64 len = olen; u64 bs = root->fs_info->sb->s_blocksize; - int same_inode = 0; + int same_inode = src == inode; /* * TODO: @@ -3750,49 +3749,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, * be either compressed or non-compressed. */ - /* the destination must be opened for writing */ - if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND)) - return -EINVAL; - if (btrfs_root_readonly(root)) return -EROFS; - ret = mnt_want_write_file(file); - if (ret) - return ret; - - src_file = fdget(srcfd); - if (!src_file.file) { - ret = -EBADF; - goto out_drop_write; - } - - ret = -EXDEV; - if (src_file.file->f_path.mnt != file->f_path.mnt) - goto out_fput; - - src = file_inode(src_file.file); - - ret = -EINVAL; - if (src == inode) - same_inode = 1; - - /* the src must be open for reading */ - if (!(src_file.file->f_mode & FMODE_READ)) - goto out_fput; + if (file_src->f_path.mnt != file->f_path.mnt || + src->i_sb != inode->i_sb) + return -EXDEV; /* don't make the dst file partly checksummed */ if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) != (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) - goto out_fput; + return -EINVAL; - ret = -EISDIR; if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode)) - goto out_fput; - - ret = -EXDEV; - if (src->i_sb != inode->i_sb) - goto out_fput; + return -EISDIR; if (!same_inode) { btrfs_double_inode_lock(src, inode); @@ -3869,6 +3839,49 @@ out_unlock: btrfs_double_inode_unlock(src, inode); else mutex_unlock(>i_mutex); + return ret; +} + +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + size_t len, unsigned int flags) +{ + ssize_t ret; + + ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out); + if (ret == 0) + ret = len; + return ret; +} +
[PATCH v8 1/4] vfs: add copy_file_range syscall and vfs helper
From: Zach BrownAdd a copy_file_range() system call for offloading copies between regular files. This gives an interface to underlying layers of the storage stack which can copy without reading and writing all the data. There are a few candidates that should support copy offloading in the nearer term: - btrfs shares extent references with its clone ioctl - NFS has patches to add a COPY command which copies on the server - SCSI has a family of XCOPY commands which copy in the device This system call avoids the complexity of also accelerating the creation of the destination file by operating on an existing destination file descriptor, not a path. Currently the high level vfs entry point limits copy offloading to files on the same mount and super (and not in the same file). This can be relaxed if we get implementations which can copy between file systems safely. Signed-off-by: Zach Brown [Anna Schumaker: Change -EINVAL to -EBADF during file verification, Change flags parameter from int to unsigned int, Add function to include/linux/syscalls.h, Check copy len after file open mode, Don't forbid ranges inside the same file, Use rw_verify_area() to veriy ranges, Use file_out rather than file_in, Add COPY_FR_REFLINK flag] Signed-off-by: Anna Schumaker Reviewed-by: Christoph Hellwig --- -v8: - Remove redundant checks - Clear up fdget() / fdput() confusion --- fs/read_write.c | 120 ++ include/linux/fs.h| 3 + include/linux/syscalls.h | 3 + include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 1 + 5 files changed, 130 insertions(+), 1 deletion(-) diff --git a/fs/read_write.c b/fs/read_write.c index 819ef3f..97c15ca 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "internal.h" #include @@ -1327,3 +1328,122 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, return do_sendfile(out_fd, in_fd, NULL, count, 0); } #endif + +/* + * copy_file_range() differs from regular file read and write in that it + 
* specifically allows return partial success. When it does so is up to + * the copy_file_range method. + */ +ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + size_t len, unsigned int flags) +{ + struct inode *inode_in = file_inode(file_in); + struct inode *inode_out = file_inode(file_out); + ssize_t ret; + + if (flags != 0) + return -EINVAL; + + /* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT */ + ret = rw_verify_area(READ, file_in, _in, len); + if (ret >= 0) + ret = rw_verify_area(WRITE, file_out, _out, len); + if (ret < 0) + return ret; + + if (!(file_in->f_mode & FMODE_READ) || + !(file_out->f_mode & FMODE_WRITE) || + (file_out->f_flags & O_APPEND) || + !file_out->f_op->copy_file_range) + return -EBADF; + + /* this could be relaxed once a method supports cross-fs copies */ + if (inode_in->i_sb != inode_out->i_sb) + return -EXDEV; + + if (len == 0) + return 0; + + ret = mnt_want_write_file(file_out); + if (ret) + return ret; + + ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out, + len, flags); + if (ret > 0) { + fsnotify_access(file_in); + add_rchar(current, ret); + fsnotify_modify(file_out); + add_wchar(current, ret); + } + inc_syscr(current); + inc_syscw(current); + + mnt_drop_write_file(file_out); + + return ret; +} +EXPORT_SYMBOL(vfs_copy_file_range); + +SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in, + int, fd_out, loff_t __user *, off_out, + size_t, len, unsigned int, flags) +{ + loff_t pos_in; + loff_t pos_out; + struct fd f_in; + struct fd f_out; + ssize_t ret = -EBADF; + + f_in = fdget(fd_in); + if (!f_in.file) + goto out2; + + f_out = fdget(fd_out); + if (!f_out.file) + goto out1; + + ret = -EFAULT; + if (off_in) { + if (copy_from_user(_in, off_in, sizeof(loff_t))) + goto out; + } else { + pos_in = f_in.file->f_pos; + } + + if (off_out) { + if (copy_from_user(_out, off_out, sizeof(loff_t))) + goto out; + } else { + pos_out =
[PATCH v8 0/4] VFS: In-kernel copy system call
Copy system calls came up during Plumbers a while ago, mostly because
several filesystems (including NFS and XFS) are currently working on
copy acceleration implementations. We haven't heard from Zach Brown in
a while, so I volunteered to push his patches upstream so individual
filesystems don't need to keep writing their own ioctls.

This posting fixes up a few minor issues that came up on the mailing
list. I looked into the O_APPEND question, and do_splice_direct()
specifically disallows files that are open for appending. I've decided
to keep the no-O_APPEND requirement for now since I use this function
for pagecache copies.

Changes in v8:
- Remove redundant checks.
- Make the fdget() / fdput() calls more obvious.
- Document disallowing files open with O_APPEND.

Thanks,
Anna

Anna Schumaker (1):
  vfs: Add vfs_copy_file_range() support for pagecache copies

Zach Brown (3):
  vfs: add copy_file_range syscall and vfs helper
  x86: add sys_copy_file_range to syscall tables
  btrfs: add .copy_file_range file operation

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/btrfs/ctree.h                       |   3 +
 fs/btrfs/file.c                        |   1 +
 fs/btrfs/ioctl.c                       |  91 --
 fs/read_write.c                        | 135 +
 include/linux/fs.h                     |   3 +
 include/linux/syscalls.h               |   3 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/sys_ni.c                        |   1 +
 10 files changed, 203 insertions(+), 40 deletions(-)

--
2.6.2
[PATCH v8 2/4] x86: add sys_copy_file_range to syscall tables
From: Zach Brown

Add sys_copy_file_range to the x86 syscall tables.

Signed-off-by: Zach Brown
[Anna Schumaker: Update syscall number in syscall_32.tbl]
Signed-off-by: Anna Schumaker
Reviewed-by: Christoph Hellwig
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 7663c45..0531270 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -382,3 +382,4 @@
 373	i386	shutdown		sys_shutdown
 374	i386	userfaultfd		sys_userfaultfd
 375	i386	membarrier		sys_membarrier
+376	i386	copy_file_range		sys_copy_file_range
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 278842f..03a9396 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -331,6 +331,7 @@
 322	64	execveat		stub_execveat
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
+325	common	copy_file_range		sys_copy_file_range
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
--
2.6.2
[PATCH v8 5/4] copy_file_range.2: New page documenting copy_file_range()
copy_file_range() is a new system call for copying ranges of data completely in the kernel. This gives filesystems an opportunity to implement some kind of "copy acceleration", such as reflinks or server-side-copy (in the case of NFS). Signed-off-by: Anna SchumakerReviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- v8: - Document that files can not be open with O_APPEND. --- man2/copy_file_range.2 | 201 + man2/splice.2 | 1 + 2 files changed, 202 insertions(+) create mode 100644 man2/copy_file_range.2 diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2 new file mode 100644 index 000..d9f76d1 --- /dev/null +++ b/man2/copy_file_range.2 @@ -0,0 +1,201 @@ +.\"This manpage is Copyright (C) 2015 Anna Schumaker +.\" +.\" %%%LICENSE_START(VERBATIM) +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of +.\" this manual under the conditions for verbatim copying, provided that +.\" the entire resulting derived work is distributed under the terms of +.\" a permission notice identical to this one. +.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume +.\" no responsibility for errors or omissions, or for damages resulting +.\" from the use of the information contained herein. The author(s) may +.\" not have taken the same level of care in the production of this +.\" manual, which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. 
+.\" %%%LICENSE_END +.\" +.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual" +.SH NAME +copy_file_range \- Copy a range of data from one file to another +.SH SYNOPSIS +.nf +.B #include +.B #include + +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ", +.BI "loff_t *" off_out ", size_t " len \ +", unsigned int " flags ); +.fi +.SH DESCRIPTION +The +.BR copy_file_range () +system call performs an in-kernel copy between two file descriptors +without the additional cost of transferring data from the kernel to userspace +and then back into the kernel. +It copies up to +.I len +bytes of data from file descriptor +.I fd_in +to file descriptor +.IR fd_out , +overwriting any data that exists within the requested range of the target file. + +The following semantics apply for +.IR off_in , +and similar statements apply to +.IR off_out : +.IP * 3 +If +.I off_in +is NULL, then bytes are read from +.I fd_in +starting from the current file offset, and the offset is +adjusted by the number of bytes copied. +.IP * +If +.I off_in +is not NULL, then +.I off_in +must point to a buffer that specifies the starting +offset where bytes from +.I fd_in +will be read. The current file offset of +.I fd_in +is not changed, but +.I off_in +is adjusted appropriately. +.PP + +The +.I flags +argument must be set to 0. +.SH RETURN VALUE +Upon successful completion, +.BR copy_file_range () +will return the number of bytes copied between files. +This could be less than the length originally requested. + +On error, +.BR copy_file_range () +returns \-1 and +.I errno +is set to indicate the error. +.SH ERRORS +.TP +.B EBADF +One or more file descriptors are not valid; or +.I fd_in +is not open for reading; or +.I fd_out +is not open for writing; or +.I fd_out +is open for appending. +.TP +.B EINVAL +Requested range extends beyond the end of the source file; or the +.I flags +argument is not 0. +.TP +.B EIO +A low level I/O error occurred while copying. 
+.TP +.B ENOMEM +Out of memory. +.TP +.B ENOSPC +There is not enough space on the target filesystem to complete the copy. +.TP +.B EXDEV +.IR file_in " and " file_out +are not on the same mounted filesystem. +.SH VERSIONS +The +.BR copy_file_range () +system call first appeared in Linux 4.4. +.SH CONFORMING TO +The +.BR copy_file_range () +system call is a nonstandard Linux extension. +.SH NOTES +If +.I file_in +is a sparse file, then +.BR copy_file_range () +may expand any holes existing in the requested range. +Users may benefit from calling +.BR copy_file_range () +in a loop, and using +.BR lseek (2) +to find the locations of data segments. +.SH EXAMPLE +.nf +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include + +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out, + loff_t *off_out, size_t len, unsigned int flags) +{ +return syscall(__NR_copy_file_range, fd_in, off_in, fd_out, +
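The offset rules from the DESCRIPTION section can be tried out with a small wrapper. What follows is a sketch and not the page's own example: copy_range_at() is a made-up name, and the call is made through syscall(SYS_copy_file_range, ...), which assumes kernel headers new enough to define that number. Because explicit offsets are passed, the file offsets of both descriptors should stay where they were, while the local off_in and off_out copies are advanced by the kernel after each partial copy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes from fd_in starting at off_in to fd_out starting at
 * off_out, looping because copy_file_range() may copy fewer bytes than
 * requested.  Returns the number of bytes copied (short if the source
 * ends early), or -1 with errno set.
 */
ssize_t copy_range_at(int fd_in, loff_t off_in,
		      int fd_out, loff_t off_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
#ifdef SYS_copy_file_range
		/* The kernel updates off_in/off_out by the amount copied. */
		long n = syscall(SYS_copy_file_range, fd_in, &off_in,
				 fd_out, &off_out, len - done, 0U);
#else
		long n = -1;

		errno = ENOSYS;		/* headers predate the syscall */
#endif
		if (n < 0)
			return -1;
		if (n == 0)
			break;		/* no more data in the source range */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Passing NULL for both offsets instead would make the call use and advance the descriptors' own file offsets, as described above.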
Re: btrfs_sync_file alignment trap on arm (kernel 4.2.5)
On Wed, Nov 4, 2015 at 5:55 PM, Cody P Schafer wrote:
> Ideas as to what could cause this would be appreciated.
>
> This consistently is triggered shortly after boot (I presume due to
> connmand calling fsync on a file).
>
> Note that I'm not quite running 4.2.5, but none of the changes I have
> additionally applied are to btrfs or atomics.
>
> Let me know if there is a way for me to get you more info.
>
> It looks like the line is:
>
>         mutex_lock(&inode->i_mutex);
>         atomic_inc(&root->log_batch);
>         full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
>                              &BTRFS_I(inode)->runtime_flags);
>
> addr2line of the trapping instruction:
> addr2line -e ./work/beaglebone-poky-linux-gnueabi/linux-yocto-ikabit/4.2.5+gitAUTOINC+c29ac1-r1/linux-beaglebone-standard-build/arch/arm/boot/vmlinux c0217a34 -i
>
> /home/cody/obj/y/tmp/work-shared/beaglebone/kernel-source/arch/arm/include/asm/atomic.h:194
> /home/cody/obj/y/tmp/work-shared/beaglebone/kernel-source/fs/btrfs/file.c:1886
>
> fault log:
>
> [ 11.488382] Alignment trap: not handling instruction e1993f9f at []
> [ 11.560558] Unhandled fault: alignment exception (0x001) at 0x026d
> [ 11.607301] pgd = dc09
> [ 11.610166] [026d] *pgd=9c063831, *pte=, *ppte=
> [ 11.665548] Internal error: : 1 [#1] PREEMPT ARM
> [ 11.670388] Modules linked in: omaplfb(O) bufferclass_ti(O) pvrsrvkm(O)
> [ 11.677341] CPU: 0 PID: 248 Comm: connmand Tainted: G O 4.2.5-yocto-standard #1
> [ 11.686172] Hardware name: Generic AM33XX (Flattened Device Tree)
> [ 11.692551] task: dc00d100 ti: dc068000 task.ti: dc068000
> [ 11.698219] PC is at btrfs_sync_file+0x104/0x3f4
> [ 11.703051] LR is at btrfs_sync_file+0x100/0x3f4
> [ 11.707883] pc : []lr : []psr: 60060013
> [ 11.707883] sp : dc069e98 ip : dc069e98 fp : dc069f1c
> [ 11.719897] r10: r9 : 026d r8 : dd746b40
> [ 11.725363] r7 : dcef8dcc r6 : r5 : dcef8d68 r4 : 0001
> [ 11.732192] r3 : 7fff r2 : r1 : r0 : dcef8dcc
> [ 11.739024] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> [ 11.746491] Control: 10c5387d Table: 9c090019 DAC: 0015
> [ 11.752502] Process connmand (pid: 248, stack limit = 0xdc068210)
> [ 11.758878] Stack: (0xdc069e98 to 0xdc06a000)
> [ 11.763436] 9e80: 7fff
> [ 11.771998] 9ea0: dc069f38 c014582c b30b 37f6 81a4 d5012f68
> [ 11.780559] 9ec0: 8000 008d
> [ 11.789121] 9ee0: 0400 0001 20b04e34 563924e6 dd746b40 dce2beb8
> [ 11.797683] 9f00: dc068000 dc069f54 dc069f20 c0171d00 c0217940
> [ 11.806245] 9f20: 7fff dc069f38 c0145b6c dd746b40 dd746b40
> [ 11.814807] 9f40: 0076 c000f7a4 dc069f74 dc069f58 c0171d3c c0171c40 7fff
> [ 11.823369] 9f60: c015cc9c dc069f94 dc069f78 c0171d7c c0171d14 bebe2bf8
> [ 11.831931] 9f80: 00e64c4d b6a104d0 dc069fa4 dc069f98 c017204c c0171d50 dc069fa8
> [ 11.840492] 9fa0: c000f600 c017203c bebe2bf8 00e64c4d 0009 bebe2bf8 008d
> [ 11.849054] 9fc0: bebe2bf8 00e64c4d b6a104d0 0076 00e64810 00e60cd0 0009
> [ 11.857616] 9fe0: bebe2bd4 b6e63384 b6dcef00 60060010 0009 044a1103 099b0303
> [ 11.866199] [] (btrfs_sync_file) from [] (vfs_fsync_range+0xcc/0xd4)
> [ 11.874674] [] (vfs_fsync_range) from [] (vfs_fsync+0x34/0x3c)
> [ 11.882600] [] (vfs_fsync) from [] (do_fsync+0x38/0x54)
> [ 11.889890] [] (do_fsync) from [] (SyS_fsync+0x1c/0x20)
> [ 11.897188] [] (SyS_fsync) from [] (ret_fast_syscall+0x0/0x3c)
> [ 11.905116] Code: e14b05fc e1a7 eb1251f7 e1993f9f (e2833001)
> [ 13.418969] ---[ end trace d7bcd93aea7d243c ]---
> [ 13.442420] Kernel panic - not syncing: Fatal exception

I looked into this a bit further, and it looks like my issue is that
the btrfs_root* from BTRFS_I(inode)->root is somehow `1` instead of an
actual pointer value, so it appears this isn't quite an alignment
issue.

Kernel starts without issue if I just stop doing the overlay (and as a
result stop using btrfs at all).

I've added some debugging, and the actual file it's trying to fsync
appears to be: `/var/lib/connman/settings.4CAC7X`

That directory is an overlayfs dir with a btrfs upper and a squashfs
lower.

Ideas on why the btrfs_root pointer is 0x1?
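One way to read the numbers in the log, assuming root really is 0x1: the faulting address 0x026d would then be 1 plus the offset of log_batch inside struct btrfs_root, which puts that field at offset 0x26c in this build. That offset is inferred from the fault address only; I have not checked it against the headers. The trapping instruction e1993f9f decodes as an ldrex through r9 (which holds 026d), and ARM exclusive loads need a naturally aligned address, which would explain why a bogus pointer surfaces as an alignment trap. The struct in the sketch below is a stand-in with padding, not the real btrfs_root, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct btrfs_root: only the inferred field offset matters. */
struct fake_root {
	char pad[0x26c];	/* everything before log_batch (assumed size) */
	int log_batch;		/* stands in for the atomic_t log_batch */
};

/* Address that &root->log_batch evaluates to for a given root value. */
uintptr_t log_batch_addr(uintptr_t root)
{
	return root + offsetof(struct fake_root, log_batch);
}

/* Nonzero if a 32-bit exclusive load from addr would be misaligned. */
int is_misaligned32(uintptr_t addr)
{
	return (addr & 3) != 0;
}
```

With root == 1 this gives 0x26d, which is not 4-byte aligned; any properly allocated root would give an aligned address, so the trap points at the pointer value rather than at the atomic itself.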
Re: btrfs_sync_file alignment trap on arm (kernel 4.2.5)
On Fri, Nov 6, 2015 at 10:16 PM, Cody P Schafer wrote:
> On Wed, Nov 4, 2015 at 5:55 PM, Cody P Schafer wrote:
>> Ideas as to what could cause this would be appreciated.
>>
>> This consistently is triggered shortly after boot (I presume due to
>> connmand calling fsync on a file).
>>
>> Note that I'm not quite running 4.2.5, but none of the changes I have
>> additionally applied are to btrfs or atomics.
>>
>> Let me know if there is a way for me to get you more info.
>>
>> It looks like the line is:
>>
>>         mutex_lock(&inode->i_mutex);
>>         atomic_inc(&root->log_batch);
>>         full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
>>                              &BTRFS_I(inode)->runtime_flags);
>>
>> addr2line of the trapping instruction:
>> addr2line -e ./work/beaglebone-poky-linux-gnueabi/linux-yocto-ikabit/4.2.5+gitAUTOINC+c29ac1-r1/linux-beaglebone-standard-build/arch/arm/boot/vmlinux c0217a34 -i
>>
>> /home/cody/obj/y/tmp/work-shared/beaglebone/kernel-source/arch/arm/include/asm/atomic.h:194
>> /home/cody/obj/y/tmp/work-shared/beaglebone/kernel-source/fs/btrfs/file.c:1886
>>
>> fault log:
>>
>> [ 11.488382] Alignment trap: not handling instruction e1993f9f at []
>> [ 11.560558] Unhandled fault: alignment exception (0x001) at 0x026d
>> [ 11.607301] pgd = dc09
>> [ 11.610166] [026d] *pgd=9c063831, *pte=, *ppte=
>> [ 11.665548] Internal error: : 1 [#1] PREEMPT ARM
>> [ 11.670388] Modules linked in: omaplfb(O) bufferclass_ti(O) pvrsrvkm(O)
>> [ 11.677341] CPU: 0 PID: 248 Comm: connmand Tainted: G O 4.2.5-yocto-standard #1
>> [ 11.686172] Hardware name: Generic AM33XX (Flattened Device Tree)
>> [ 11.692551] task: dc00d100 ti: dc068000 task.ti: dc068000
>> [ 11.698219] PC is at btrfs_sync_file+0x104/0x3f4
>> [ 11.703051] LR is at btrfs_sync_file+0x100/0x3f4
>> [ 11.707883] pc : []lr : []psr: 60060013
>> [ 11.707883] sp : dc069e98 ip : dc069e98 fp : dc069f1c
>> [ 11.719897] r10: r9 : 026d r8 : dd746b40
>> [ 11.725363] r7 : dcef8dcc r6 : r5 : dcef8d68 r4 : 0001
>> [ 11.732192] r3 : 7fff r2 : r1 : r0 : dcef8dcc
>> [ 11.739024] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>> [ 11.746491] Control: 10c5387d Table: 9c090019 DAC: 0015
>> [ 11.752502] Process connmand (pid: 248, stack limit = 0xdc068210)
>> [ 11.758878] Stack: (0xdc069e98 to 0xdc06a000)
>> [ 11.763436] 9e80: 7fff
>> [ 11.771998] 9ea0: dc069f38 c014582c b30b 37f6 81a4 d5012f68
>> [ 11.780559] 9ec0: 8000 008d
>> [ 11.789121] 9ee0: 0400 0001 20b04e34 563924e6 dd746b40 dce2beb8
>> [ 11.797683] 9f00: dc068000 dc069f54 dc069f20 c0171d00 c0217940
>> [ 11.806245] 9f20: 7fff dc069f38 c0145b6c dd746b40 dd746b40
>> [ 11.814807] 9f40: 0076 c000f7a4 dc069f74 dc069f58 c0171d3c c0171c40 7fff
>> [ 11.823369] 9f60: c015cc9c dc069f94 dc069f78 c0171d7c c0171d14 bebe2bf8
>> [ 11.831931] 9f80: 00e64c4d b6a104d0 dc069fa4 dc069f98 c017204c c0171d50 dc069fa8
>> [ 11.840492] 9fa0: c000f600 c017203c bebe2bf8 00e64c4d 0009 bebe2bf8 008d
>> [ 11.849054] 9fc0: bebe2bf8 00e64c4d b6a104d0 0076 00e64810 00e60cd0 0009
>> [ 11.857616] 9fe0: bebe2bd4 b6e63384 b6dcef00 60060010 0009 044a1103 099b0303
>> [ 11.866199] [] (btrfs_sync_file) from [] (vfs_fsync_range+0xcc/0xd4)
>> [ 11.874674] [] (vfs_fsync_range) from [] (vfs_fsync+0x34/0x3c)
>> [ 11.882600] [] (vfs_fsync) from [] (do_fsync+0x38/0x54)
>> [ 11.889890] [] (do_fsync) from [] (SyS_fsync+0x1c/0x20)
>> [ 11.897188] [] (SyS_fsync) from [] (ret_fast_syscall+0x0/0x3c)
>> [ 11.905116] Code: e14b05fc e1a7 eb1251f7 e1993f9f (e2833001)
>> [ 13.418969] ---[ end trace d7bcd93aea7d243c ]---
>> [ 13.442420] Kernel panic - not syncing: Fatal exception
>
> I looked into this a bit further, and it looks like my issue is that
> the btrfs_root* from BTRFS_I(inode)->root is somehow `1` instead of an
> actual pointer value, so it appears this isn't quite an alignment
> issue.
>
> Kernel starts without issue if I just stop doing the overlay (and as a
> result stop using btrfs at all).
>
> I've added some debugging, and the actual file it's trying to fsync
> appears to be: `/var/lib/connman/settings.4CAC7X`
>
> That directory is an overlayfs dir with a btrfs upper and a squashfs
> lower.
>
> Ideas on why the btrfs_root pointer is 0x1?

Just a bad interaction with overlayfs; this is being discussed in this
thread:

http://www.spinics.net/lists/linux-btrfs/msg47744.html