[PATCH v2] btrfs-progs: skip fs with no seed when build seed/sprout mapping for fi show
There is no need to try to build seed/sprout mapping for those btrfs without seed devices, so just skip such fs. We could get the total number of devices from the disk super block, if it equals the number of items in list @fs_devices->devices, then there shouldn't be any seed devices. Signed-off-by: Gui Hecheng --- changlog v1->v2: adopt more natural function name: no_seed_devices ==> has_seed_devices --- cmds-filesystem.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index e4b2785..c55c486 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -750,6 +750,22 @@ static int find_and_copy_seed(struct btrfs_fs_devices *seed, return 1; } +static int has_seed_devices(struct btrfs_fs_devices *fs_devices) +{ + struct btrfs_device *device; + int dev_cnt_total, dev_cnt = 0; + + device = list_first_entry(&fs_devices->devices, struct btrfs_device, + dev_list); + + dev_cnt_total = device->total_devs; + + list_for_each_entry(device, &fs_devices->devices, dev_list) + dev_cnt++; + + return dev_cnt_total != dev_cnt; +} + static int map_seed_devices(struct list_head *all_uuids, char *search, int *found) { @@ -775,6 +791,10 @@ static int map_seed_devices(struct list_head *all_uuids, *found = 1; } + /* skip fs without seeds */ + if (!has_seed_devices(cur_fs)) + continue; + /* skip all fs already shown as mounted fs */ if (is_seen_fsid(cur_fs->fsid)) continue; @@ -956,8 +976,8 @@ devs_only: } /* -* scan_for_btrfs() don't build seed/sprout mapping, -* do mapping build for each scanned fs here +* The seed/sprout mapping are not detected yet, +* do mapping build for all umounted fs */ ret = map_seed_devices(&all_uuids, search, &found); if (ret) { -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs-progs: fix wrong num_devices for btrfs fi show with seed devices
On Sat, 2014-11-08 at 19:47 +, Mike Fleetwood wrote: > On 7 November 2014 18:16, David Sterba wrote: > > On Fri, Nov 07, 2014 at 10:07:43AM +0800, Gui Hecheng wrote: > >> The @fi_args->num_devices in @get_fs_info() does not include seed devices. > >> We could just correct it by searching the chunk tree and count how > >> many dev_items there are in total which includes seed devices. > >> > >> Signed-off-by: Gui Hecheng > >> --- > >> *Note* > >> This is just a temporary workaround to fix this problem in order to > >> make users happy, because a new ioctl or sysfs interface to handle this > >> problem needs more discussions and efforts. After the work implemented > >> and accepted, we could drop this. > > > > Nice, thanks. I agree that this kind of workaround is best possible for > > the moment, and I'm glad to see that it's not that much code to get the > > seeding devices right. This would also work with older kernels without > > the updated sysfs/ioctl interfaces, so this is likely to stay for a long > > time. > > > >> +u64 find_max_id(struct btrfs_ioctl_search_args *search_args, int nr_items) > > > > That's a very generic name for a function that does a very specialized > > thing, but I don't have a suggestion right now. > > Is find_max_device_id a suitable name? I think this one is really more clear. > >> +int correct_fs_info(int fd, struct btrfs_ioctl_fs_info_args *fi_args) > > > > Same here, make fs_info correct but in what way? A comment would be good > > as well. > > Sorry, no suggestion for this function name. may be @search_chunk_tree_for_fs_info? Thanks for the suggestions. -Gui > Thanks, > Mike -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix dead lock while running replace and defrag concurrently
This can be reproduced by fstests: btrfs/070 The scenario is like the following: replace worker thread defrag thread - - copy_nocow_pages_worker btrfs_defrag_file copy_nocow_pages_for_inode... btrfs_writepages |A| lock_extent_bits extent_write_cache_pages |B| lock_page __extent_writepage ... writepage_delalloc find_lock_delalloc_range |B| lock_extent_bits find_or_create_page pagecache_get_page |A| lock_page This leads to an ABBA pattern deadlock. To fix it, o we just change it to an AABB pattern which means to @unlock_extent_bits() before we @lock_page(), and in this way the @extent_read_full_page_nolock() is no longer in an locked context, so change it back to @extent_read_full_page() to regain protection. o Since we @unlock_extent_bits() earlier, then before @write_page_nocow(), the extent may not really point at the physical block we want, so we have to check it before write. Signed-off-by: Gui Hecheng --- fs/btrfs/scrub.c | 90 +--- 1 file changed, 60 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index efa0831..4325bb0 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3310,6 +3310,50 @@ out: scrub_pending_trans_workers_dec(sctx); } +static int check_extent_to_block(struct inode *inode, u64 start, u64 len, +u64 logical) +{ + struct extent_state *cached_state = NULL; + struct btrfs_ordered_extent *ordered; + struct extent_io_tree *io_tree; + struct extent_map *em; + u64 lockstart = start, lockend = start + len - 1; + int ret = 0; + + io_tree = &BTRFS_I(inode)->io_tree; + + lock_extent_bits(io_tree, lockstart, lockend, 0, &cached_state); + ordered = btrfs_lookup_ordered_range(inode, lockstart, len); + if (ordered) { + btrfs_put_ordered_extent(ordered); + ret = 1; + goto out_unlock; + } + + em = btrfs_get_extent(inode, NULL, 0, start, len, 0); + if (IS_ERR(em)) { + ret = PTR_ERR(em); + goto out_unlock; + } + + /* +* This extent does not actually cover the logical extent anymore, +* move on to the next inode. +*/ + if (em->block_start > logical || + em->block_start + em->block_len < logical + len) { + free_extent_map(em); + ret = 1; + goto out_unlock; + } + free_extent_map(em); + +out_unlock: + unlock_extent_cached(io_tree, lockstart, lockend, &cached_state, +GFP_NOFS); + return ret; +} + static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, struct scrub_copy_nocow_ctx *nocow_ctx) { @@ -3318,13 +3362,10 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, struct inode *inode; struct page *page; struct btrfs_root *local_root; - struct btrfs_ordered_extent *ordered; - struct extent_map *em; - struct extent_state *cached_state = NULL; struct extent_io_tree *io_tree; u64 physical_for_dev_replace; + u64 nocow_ctx_logical; u64 len = nocow_ctx->len; - u64 lockstart = offset, lockend = offset + len - 1; unsigned long index; int srcu_index; int ret = 0; @@ -3356,30 +3397,13 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, physical_for_dev_replace = nocow_ctx->physical_for_dev_replace; io_tree = &BTRFS_I(inode)->io_tree; + nocow_ctx_logical = nocow_ctx->logical; - lock_extent_bits(io_tree, lockstart, lockend, 0, &cached_state); - ordered = btrfs_lookup_ordered_range(inode, lockstart, len); - if (ordered) { - btrfs_put_ordered_extent(ordered); - goto out_unlock; - } - - em = btrfs_get_extent(inode, NULL, 0, lockstart, len, 0); - if (IS_ERR(em)) { - ret = PTR_ERR(em); - goto out_unlock; - } - - /* -* This extent does not actually cover the logical extent anymore, -* move on to the next inode. -*/ - if (em->block_start > nocow_ctx->logical || - em->block_start + em->block_len < nocow_ctx->logical + len) { - free_extent_map(em); - goto out_unlock; + ret = check_extent_to_block(inode, offset, len, nocow_ctx_logical); + if (ret) { + ret = ret > 0 ? 0 : ret; + goto out; } - free_extent_map(em); while (len >=
Re: [PATCH] Btrfs: fix incorrect compression ratio detection
On Tue, 7 Oct 2014 18:44:35 -0400, Wang Shilong wrote: > Steps to reproduce: > # mkfs.btrfs -f /dev/sdb > # mount -t btrfs /dev/sdb /mnt -o compress=lzo > # dd if=/dev/zero of=/mnt/data bs=$((33*4096)) count=1 > > after previous steps, inode will be detected as bad compression ratio, > and NOCOMPRESS flag will be set for that inode. > > Reason is that compress have a max limit pages every time(128K), if a > 132k write in, it will be splitted into two write(128k+4k), this bug > is a leftover for commit 68bb462d42a(Btrfs: don't compress for a small write) > > Fix this problem by checking every time before compression, if it is a > small write(<=blocksize), we bail out and fall into nocompression directly. > > Signed-off-by: Wang Shilong Looks good. Reviewed-by: Miao Xie > --- > fs/btrfs/inode.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 344a322..b78e90a 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -411,14 +411,6 @@ static noinline int compress_file_range(struct inode > *inode, > (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size)) > btrfs_add_inode_defrag(NULL, inode); > > - /* > - * skip compression for a small file range(<=blocksize) that > - * isn't an inline extent, since it dosen't save disk space at all. > - */ > - if ((end - start + 1) <= blocksize && > - (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size)) > - goto cleanup_and_bail_uncompressed; > - > actual_end = min_t(u64, isize, end + 1); > again: > will_compress = 0; > @@ -440,6 +432,14 @@ again: > > total_compressed = actual_end - start; > > + /* > + * skip compression for a small file range(<=blocksize) that > + * isn't an inline extent, since it dosen't save disk space at all. > + */ > + if (total_compressed <= blocksize && > +(start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size)) > + goto cleanup_and_bail_uncompressed; > + > /* we want to make sure that amount of ram required to uncompress >* an extent is reasonable, so we limit the total size in ram >* of a compressed extent to 128k. This is a crucial number > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: corruption, bad block, input/output errors - do i run --repair?
Original Message Subject: corruption, bad block, input/output errors - do i run --repair? From: Matt McKinnon To: Date: 2014年11月07日 22:33 Hi All, I'm running into some corruption and I wanted to seek out advice on whether or not to run btrfs check --repair, or if I should fall back to my backup file server, or both. The system is mountable, and usable. # uname -a Linux cbmm-fs 3.17.2-custom #1 SMP Thu Oct 30 14:09:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version Btrfs v3.14.2 # btrfs fi show Label: none uuid: 30c15060-8fb4-4926-87d4-f7d08c3033c5 Total devices 1 FS bytes used 58.92TiB devid1 size 76.40TiB used 59.05TiB path /dev/sda1 # btrfs fi df /home Data, single: total=58.75TiB, used=58.75TiB System, DUP: total=32.00MiB, used=2.66MiB System, single: total=4.00MiB, used=3.68MiB Metadata, DUP: total=119.00GiB, used=116.63GiB Metadata, single: total=64.01GiB, used=57.68GiB GlobalReserve, single: total=512.00MiB, used=0.00B I did run into some RO snapshot corruption which caused me to run btrfs check: This is a known bug in 3.17 with RO snapshot. It's fixable but not with your old 3.14 btrfs-progs. Please update to 3.17 btrfs-progs, and re-run btrfsck(without --repair), and it will prompt you to use --repair if this is the exact bug. Then run with --repair should fix it. Thanks, Qu parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978 parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978 Ignoring transid failure Checking filesystem on /dev/sda1 UUID: 30c15060-8fb4-4926-87d4-f7d08c3033c5 checking extents bad block 69290357067776 Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots ... "dir isize wrong" 1 error "errors 500, file extent discount, nbytes wrong" 14 errors "errors 2001, no inode item, link count wrong" 257302 errors ... found 185063071745 bytes used err is 1 total csum bytes: 8428 total tree bytes: 1889284096 total fs tree bytes: 962678784 total extent tree bytes: 159297536 btree space waste bytes: 340014684 file data blocks allocated: 57344 referenced 57344 Btrfs v3.14.2 Output of a scrub: ERROR: scrubbing /home failed for device id 1 (Input/output error) scrub canceled for 30c15060-8fb4-4926-87d4-f7d08c3033c5 scrub started at Mon Nov 3 06:43:58 2014 and was aborted after 7613 seconds data_extents_scrubbed: 248507555 tree_extents_scrubbed: 10870729 data_bytes_scrubbed: 15375990317056 tree_bytes_scrubbed: 44526505984 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 15712 csum_discards: 988018 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 15425663205376 Output of a balance: ERROR: error during balancing '/home' - Input/output error There may be more info in syslog - try dmesg | tail [501087.506642] [ cut here ] [501087.543971] WARNING: CPU: 5 PID: 31885 at fs/btrfs/relocation.c:925 build_backref_tree+0x11f0/0x1230 [btrfs]() [501087.543991] Modules linked in: ipmi_devintf(E) autofs4(E) sb_edac(E) edac_core(E) joydev(E) mei_me(E) mei(E) lpc_ich(E) ioatdma(E) ipmi_si(E) wmi(E) mac_hid(E) bnep(E) rfcomm(E) bluetooth(E) lp(E) parport(E) nfsd(E) nfs_acl(E) auth_rpcgss(E) nfs(E) fscache(E) lockd(E) sunrpc(E) ses(E) enclosure(E) hid_generic(E) ahci(E) libahci(E) usbhid(E) hid(E) igb(E) dca(E) i2c_algo_bit(E) ptp(E) pps_core(E) megaraid_sas(E) btrfs(E) raid6_pq(E) xor(E) libcrc32c(E) [501087.543995] CPU: 5 PID: 31885 Comm: btrfs Tainted: G D E 3.17.2-custom #1 [501087.543997] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0a 12/27/2013 [501087.543999] 039d 88000eadb808 8176733c 0282 [501087.544001] 88000eadb848 8107163c 1000 [501087.544003] 8801d0d9acf0 880497c70380 0001 0001 [501087.544004] Call Trace: [501087.544014] [] dump_stack+0x46/0x58 [501087.544022] [] warn_slowpath_common+0x8c/0xc0 [501087.544024] [] warn_slowpath_null+0x1a/0x20 [501087.544039] [] build_backref_tree+0x11f0/0x1230 [btrfs] [501087.544052] [] relocate_tree_blocks+0x2d1/0x690 [btrfs] [501087.544060] [] ? kmem_cache_alloc_trace+0x39/0x1f0 [501087.544072] [] relocate_block_group+0x202/0x5f0 [btrfs] [501087.544083] [] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs] [501087.544098] [] btrfs_relocate_chunk.isra.62+0x75/0x760 [btrfs] [501087.544111] [] ? release_extent_buffer+0x36/0xe0 [btrfs] [501087.544124] [] ? free_extent_buffer+0x61/0xc0 [btrfs] [501087.544136] [] btrfs_balance+0x8ab/0xf50 [btrfs] [501087.544150] [] btrfs_ioctl_balance+0x1cc/0x530 [btrfs] [501087.544156] [] ? lru_cache_add_active_or_unevictable+0x2b/0xa0 [501087.544168]
Re: [PATCH] fstests: btrfs, add regression test for clone ioctl
Btrfs folks, review please... . On Tue, Oct 21, 2014 at 09:43:11PM +0100, Filipe Manana wrote: > Regression test for a btrfs clone ioctl issue where races between > a clone operation and concurrent target file reads would result in > leaving stale data in the page cache. After the clone operation > finished, reading from the clone target file would return the old > and no longer valid data. This affected only buffered reads (i.e. > didn't affect direct IO reads). > > This issue was fixed by the following linux kernel patch: > > Btrfs: ensure readers see new data after a clone operation > (commit c125b8bff1d9f6c8c91ce4eb8bd5616058c7d510) > > Signed-off-by: Filipe Manana > --- > tests/btrfs/081 | 131 > > tests/btrfs/081.out | 4 ++ > tests/btrfs/group | 1 + > 3 files changed, 136 insertions(+) > create mode 100755 tests/btrfs/081 > create mode 100644 tests/btrfs/081.out > > diff --git a/tests/btrfs/081 b/tests/btrfs/081 > new file mode 100755 > index 000..d2e3767 > --- /dev/null > +++ b/tests/btrfs/081 > @@ -0,0 +1,131 @@ > +#! /bin/bash > +# FSQA Test No. 081 > +# > +# Regression test for a btrfs clone ioctl issue where races between > +# a clone operation and concurrent target file reads would result in > +# leaving stale data in the page cache. After the clone operation > +# finished, reading from the clone target file would return the old > +# and no longer valid data. This affected only buffered reads (i.e. > +# didn't affect direct IO reads). > +# > +# This issue was fixed by the following linux kernel patch: > +# > +# Btrfs: ensure readers see new data after a clone operation > +# (commit c125b8bff1d9f6c8c91ce4eb8bd5616058c7d510) > +# > +#--- > +# > +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved. > +# Author: Filipe Manana > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > +_need_to_be_root > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > +_require_btrfs_cloner > + > +rm -f $seqres.full > + > +num_extents=100 > +extent_size=8192 > + > +create_source_file() > +{ > + name=$1 > + > + # Create a file with $num_extents extents, each with a size of > + # $extent_size bytes. > + touch $SCRATCH_MNT/$name > + for ((i = 0; i < $num_extents; i++)); do > + off=$((i * $extent_size)) > + run_check $XFS_IO_PROG \ > + -c "pwrite -S $i -b $extent_size $off $extent_size" \ > + -c "fsync" $SCRATCH_MNT/$name xfs_io dumps errors into the output file and that causes failures. there is no need to use run_check here. -Dave. > + done > +} > + > +create_target_file() > +{ > + name=$1 > + file_size=$(($num_extents * $extent_size)) > + > + run_check $XFS_IO_PROG -f -c "pwrite -S 0xff 0 $file_size" \ > + -c "fsync" $SCRATCH_MNT/$name > +} > + > +reader_loop() > +{ > + name=$1 > + > + while true; do > + cat $SCRATCH_MNT/$name > /dev/null > + done > +} > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > + > +create_source_file "foo" > +create_target_file "bar" > + > +reader_loop "bar" & > +reader_pid=$! > + > +$CLONER_PROG -s 0 -d 0 -l $(($num_extents * $extent_size)) \ > + $SCRATCH_MNT/foo $SCRATCH_MNT/bar > + > +kill $reader_pid > /dev/null 2>&1 > + > +# Now both foo and bar should have exactly the same content. > +# This didn't use to be the case before the btrfs kernel fix mentioned > +# above. The clone ioctl was racy, as it removed bar's pages from the > +# page cache and only after it would update bar's metadata to point to > +# the same extents that foo's metadata points to - and this was done in > +# an unprotected way, so that a file read request done right after the > +# clone ioctl removed the pages from the page cache and
Re: [PATCH v2] fstests: btrfs, add test for snapshoting after file write + truncate
Ping for Btrfs folks, review please... On Sun, Oct 26, 2014 at 11:39:03AM +, Filipe Manana wrote: > Regression test for a btrfs issue where if right after the snapshot > creation ioctl started, a file write followed by a file truncate > happened, with both operations increasing the file's size, the created > snapshot would capture an inconsistent state of the file system tree. > That state reflected the file truncation but it didn't reflect the > write operation, and left a gap between two file extent items (and > that gap corresponded to the total or a partial area of the write > operation's range). > > This issue was fixed by the following linux kernel patch: > > Btrfs: fix snapshot inconsistency after a file write followed by truncate > > Signed-off-by: Filipe Manana > --- > > V2: Added some background processes to cause some cpu load. This makes the > test fail always on environments with a non-debug kernel and where no > other significant load (other the test itself) is running. > > tests/btrfs/080 | 169 > > tests/btrfs/080.out | 2 + > tests/btrfs/group | 1 + > 3 files changed, 172 insertions(+) > create mode 100755 tests/btrfs/080 > create mode 100644 tests/btrfs/080.out > > diff --git a/tests/btrfs/080 b/tests/btrfs/080 > new file mode 100755 > index 000..a5d3b38 > --- /dev/null > +++ b/tests/btrfs/080 > @@ -0,0 +1,169 @@ > +#! /bin/bash > +# FSQA Test No. 080 > +# > +# Regression test for a btrfs issue where if right after the snapshot > creation > +# ioctl started, a file write followed by a file truncate happened, with both > +# operations increasing the file's size, the created snapshot would capture > an > +# inconsistent state of the file system tree. That state reflected the file > +# truncation but it didn't reflect the write operation, and left a gap > between > +# two file extent items (and that gap corresponded to the total or a partial > +# area of the write operation's range). > +# > +# This issue was fixed by the following linux kernel patch: > +# > +# Btrfs: fix snapshot inconsistency after a file write followed by > truncate > +# > +#--- > +# > +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved. > +# Author: Filipe Manana > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + for p in ${cpu_stress_pids[*]}; do > + kill $p &> /dev/null > + done > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > +_need_to_be_root > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch_nocheck > + > +rm -f $seqres.full > + > +create_snapshot() > +{ > + local ts=`date +'%H_%M_%S_%N'` > + > + _run_btrfs_util_prog subvolume snapshot -r \ > + $SCRATCH_MNT $SCRATCH_MNT/"${ts}_snap" > +} > + > +create_file() > +{ > + local name=$1 > + > + run_check $XFS_IO_PROG -f \ > + -c "pwrite -S 0xaa -b 32K 0 32K" \ > + -c "fsync" \ > + -c "pwrite -S 0xbb -b 32770 16K 32770" \ > + -c "truncate 90123" \ > + $SCRATCH_MNT/$name > +} > + > +workout() > +{ > + local name=$1 > + > + create_file $name & > + fpid=$! > + create_snapshot & > + spid=$! > + wait $fpid > + create_ret=$? > + wait $spid > + snap_ret=$? > + if [ $create_ret != 0 -o $snap_ret != 0 ]; then > + _fail "Failure creating file or snapshot, check $seqres.full > for details" > + fi > +} > + > +# If the installed btrfs mkfs supports the no-holes feature, make sure the > +# created fs doesn't get that feature enabled. With it enabled, the below > fsck > +# call wouldn't fail. This feature hasn't been enabled by default since it > was > +# introduced, but be safe and explicitly disable it. > +_scratch_mkfs -O list-all 2>&1 | grep -q '\bno\-holes\b' > +if [ $? -eq 0 ]; then > + mkfs_options="-O ^no-holes" >
Re: [PATCH] btrfs: regression test for btrfs extent merge
On Thu, Oct 16, 2014 at 04:32:06PM +0800, Qu Wenruo wrote: > Original Message > Subject: Re: [PATCH] btrfs: regression test for btrfs extent merge > From: Eryu Guan > >>+ > >>+tmp=`mktemp -d` > >I'm a bit concerned about this. Some functions in common/rc use > >$tmp.some_sufix as temp file. If $tmp defined as a dir here, I'm not > >sure if it will cause any trouble. > It seems that most tests uses /tmp/$$ as tmp, only some exception > like btrfs/077 uses mktemp -d. > So I'd better use the old /tmp/$$ method. Can you please send a patch to fix btrfs/077 as well? Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic
On Mon, Nov 10, 2014 at 12:49:12AM +, Filipe David Manana wrote: > On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner wrote: > > On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote: > >> This test verifies that replacing a xattr's value is an atomic > >> operation. This is motivated by an issue in btrfs where replacing > >> a xattr's value wasn't an atomic operation, it consisted of > >> removing the old value and then inserting the new value in a > >> btree. This made readers (getxattr and listxattrs) not getting > >> neither the old nor the new value during a short time window. > > > > OK, seems like a good thing to test that the application can only > > see the old or the new value. > > > > However, I can't help but wonder about whether the btrfs behaviour > > is crash safe as it wasn't designed to be atomic from the ground up. > > i.e. if the system crashes half way through a attribute overwrite, > > what does btrfs end up with as a result? XFS is guaranteed at a > > transactional level to return either the old or the new value, > > depending on where in the operaiton the crash occurred, but I'd just > > assumed that everyone did attribute replace atomically so it never > > occurred to me that it might be an issue... > > It's crash safe. Both the delete and insert were done in the same > transaction, so a crash in between both operations (or after both and > before the transaction commit) would result in always seeing the old > value (or better saying, the last persisted value by a transaction > commit or fsync). Alright, so no crash issues because all the modifications are in a single transaction. However, if both modifications are made in the same transaction, then this bug implies that a user can read a metadata object in the btree whilst somethign else is concurrently modifying it, right? i.e. that there is no serialisation between inode metadata reads and transactional modification operations? Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] fstests: add generic test to verify xattr replace operations are atomic
This test verifies that replacing a xattr's value is an atomic operation. This is motivated by an issue in btrfs where replacing a xattr's value wasn't an atomic operation, it consisted of removing the old value and then inserting the new value in a btree. This made readers (getxattr and listxattrs) not getting neither the old nor the new value during a short time window. Signed-off-by: Filipe Manana --- V2: Several changes after Dave's review. Renamed test file, added test to the metadata category and filtered xattr value out of getfattr to include it in the golden output. tests/generic/036 | 102 + tests/generic/036.out | 1001 + tests/generic/group |1 + 3 files changed, 1104 insertions(+) create mode 100755 tests/generic/036 create mode 100644 tests/generic/036.out diff --git a/tests/generic/036 b/tests/generic/036 new file mode 100755 index 000..ff5bef7 --- /dev/null +++ b/tests/generic/036 @@ -0,0 +1,102 @@ +#! /bin/bash +# FSQA Test No. 036 +# +# Verify that replacing a xattr's value is an atomic operation. +# This is motivated by an issue in btrfs where replacing a xattr's value +# wasn't an atomic operation, it consisted of removing the old value and +# then inserting the new value in a btree. This made readers (getxattr +# and listxattrs) not getting neither the old nor the new value during +# a short time window. +# +# The btrfs issue was fixed by the following linux kernel patch: +# +#Btrfs: make xattr replace operations atomic +# +#--- +# +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + kill $setter_pid + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/attr + +# real QA test starts here +_need_to_be_root +_supported_fs generic +_supported_os Linux +_require_scratch +_require_attrs + +rm -f $seqres.full + +xattr_name="user.something" +xattr_value1="foobar" +xattr_value2="rabbit_hole" + +set_xattr_loop() +{ + local name=$1 + local cur_val=$xattr_value1 + while true; do + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name + if [ "$cur_val" == "$xattr_value1" ]; then + cur_val=$xattr_value2 + else + cur_val=$xattr_value1 + fi + done +} + +value_filter() +{ + grep "$xattr_name=" | cut -d '=' -f 2 | \ + sed -e "s/$xattr_value1/GOOD/" -e "s/$xattr_value2/GOOD/" +} + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +test_file="test_xattr_replace" +touch $SCRATCH_MNT/$test_file +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file + +set_xattr_loop $test_file & +setter_pid=$! + +for ((i = 0; i < 1000; i++)); do + $GETFATTR_PROG --absolute-names -n $xattr_name \ + $SCRATCH_MNT/$test_file | value_filter +done + +status=0 +exit diff --git a/tests/generic/036.out b/tests/generic/036.out new file mode 100644 index 000..c440719 --- /dev/null +++ b/tests/generic/036.out @@ -0,0 +1,1001 @@ +QA output created by 036 +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD" +"GOOD
Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic
On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner wrote: > On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote: >> This test verifies that replacing a xattr's value is an atomic >> operation. This is motivated by an issue in btrfs where replacing >> a xattr's value wasn't an atomic operation, it consisted of >> removing the old value and then inserting the new value in a >> btree. This made readers (getxattr and listxattrs) not getting >> neither the old nor the new value during a short time window. > > OK, seems like a good thing to test that the application can only > see the old or the new value. > > However, I can't help but wonder about whether the btrfs behaviour > is crash safe as it wasn't designed to be atomic from the ground up. > i.e. if the system crashes half way through a attribute overwrite, > what does btrfs end up with as a result? XFS is guaranteed at a > transactional level to return either the old or the new value, > depending on where in the operaiton the crash occurred, but I'd just > assumed that everyone did attribute replace atomically so it never > occurred to me that it might be an issue... It's crash safe. Both the delete and insert were done in the same transaction, so a crash in between both operations (or after both and before the transaction commit) would result in always seeing the old value (or better saying, the last persisted value by a transaction commit or fsync). > > Perhaps another test involving dm-flakey to trigger errors and > shutdowns whilst in the middle of replacing attributes might be a > good thing? > > [] > >> +_cleanup() >> +{ >> + if [ ! -z $setter_pid ]; then >> + kill $setter_pid &> /dev/null >> + fi > > No need to do this if[] then. You're already redirecting all errors > to /dev/null, so if $setter_pid is empty it will just output the > help string. > >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/attr >> + >> +# real QA test starts here >> +_need_to_be_root >> +_supported_fs generic >> +_supported_os Linux >> +_require_scratch >> +_require_attrs >> + >> +rm -f $seqres.full >> + >> +xattr_name="user.something" >> +xattr_value1="foobar" >> +xattr_value2="rabbit_hole" >> + >> +set_xattr_loop() >> +{ >> + local name=$1 >> + >> + local cur_val=$xattr_value1 >> + while true; do >> + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name >> + if [ "$cur_val" == "$xattr_value1" ]; then >> + cur_val=$xattr_value2 >> + else >> + cur_val=$xattr_value1 >> + fi >> + done >> +} >> + >> +_scratch_mkfs >>$seqres.full 2>&1 >> +_scratch_mount >> + >> +test_file="test_xattr_replace" >> +touch $SCRATCH_MNT/$test_file >> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file >> + >> +set_xattr_loop $test_file & >> +setter_pid=$! >> + >> +for ((i = 0; i < 1000; i++)); do >> + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \ >> + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f >> 2) >> + if [ "$xattr_val" != "\"$xattr_value1\"" -a \ >> + "$xattr_val" != "\"$xattr_value2\"" ]; then >> + _fail "Missing or unexpected xattr value: $xattr_val" >> + fi > > I'd suggest that a filter is a much better way to do this. > > name_filter() { > grep "$xattr_name=" | cut -d '=' -f 2 | \ > sed -e "s/$xattr_value1/GOOD/" \ > -e "s/$xattr_value2/GOOD/" > } > > for (); do > $GETFATTR_PROG --absolute-names -n $xattr_name \ > $SCRATCH_MNT/$test_file | name_filter > done > > And so the output file should just be: > > GOOD > GOOD > > > And if it is bad, then it outputs the bad attribute value instead > of "GOOD", and the test fails on the golden output match. > >> +kill $setter_pid &> /dev/null >> +unset setter_pid > > See above, no need to unset the var. >> + >> +echo "Silence is golden" >> +status=0 >> +exit >> diff --git a/tests/generic/326.out b/tests/generic/326.out >> new file mode 100644 >> index 000..4ac0db5 >> --- /dev/null >> +++ b/tests/generic/326.out >> @@ -0,0 +1,2 @@ >> +QA output created by 326 >> +Silence is golden > > Except when you are using comparisons and "_fail" instead of > filters. ;) > > BTW, new tests should be numbere as the first unused number, not at > the tail (i.e generic/036). If there are collisions with other new > tests when I merge them, I'll renumber them at merge time. > >> 323 auto aio stress >> 324 auto fsr quick >> 325 auto quick data log >> +326 auto quick xattr > > There is no existing xattr group - normally xattr tests are > considered part of the "metadata" group. If you want to introduce an > xattr test group, do it as a separate patch and include all the > tests (across all the test directories) that manipulate xattrs in > some way. Agreed. V2 follo
Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic
On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote: > This test verifies that replacing a xattr's value is an atomic > operation. This is motivated by an issue in btrfs where replacing > a xattr's value wasn't an atomic operation, it consisted of > removing the old value and then inserting the new value in a > btree. This made readers (getxattr and listxattrs) not getting > neither the old nor the new value during a short time window. OK, seems like a good thing to test that the application can only see the old or the new value. However, I can't help but wonder about whether the btrfs behaviour is crash safe as it wasn't designed to be atomic from the ground up. i.e. if the system crashes half way through a attribute overwrite, what does btrfs end up with as a result? XFS is guaranteed at a transactional level to return either the old or the new value, depending on where in the operaiton the crash occurred, but I'd just assumed that everyone did attribute replace atomically so it never occurred to me that it might be an issue... Perhaps another test involving dm-flakey to trigger errors and shutdowns whilst in the middle of replacing attributes might be a good thing? [] > +_cleanup() > +{ > + if [ ! -z $setter_pid ]; then > + kill $setter_pid &> /dev/null > + fi No need to do this if[] then. You're already redirecting all errors to /dev/null, so if $setter_pid is empty it will just output the help string. > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. ./common/attr > + > +# real QA test starts here > +_need_to_be_root > +_supported_fs generic > +_supported_os Linux > +_require_scratch > +_require_attrs > + > +rm -f $seqres.full > + > +xattr_name="user.something" > +xattr_value1="foobar" > +xattr_value2="rabbit_hole" > + > +set_xattr_loop() > +{ > + local name=$1 > + > + local cur_val=$xattr_value1 > + while true; do > + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name > + if [ "$cur_val" == "$xattr_value1" ]; then > + cur_val=$xattr_value2 > + else > + cur_val=$xattr_value1 > + fi > + done > +} > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > + > +test_file="test_xattr_replace" > +touch $SCRATCH_MNT/$test_file > +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file > + > +set_xattr_loop $test_file & > +setter_pid=$! > + > +for ((i = 0; i < 1000; i++)); do > + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \ > + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2) > + if [ "$xattr_val" != "\"$xattr_value1\"" -a \ > + "$xattr_val" != "\"$xattr_value2\"" ]; then > + _fail "Missing or unexpected xattr value: $xattr_val" > + fi I'd suggest that a filter is a much better way to do this. name_filter() { grep "$xattr_name=" | cut -d '=' -f 2 | \ sed -e "s/$xattr_value1/GOOD/" \ -e "s/$xattr_value2/GOOD/" } for (); do $GETFATTR_PROG --absolute-names -n $xattr_name \ $SCRATCH_MNT/$test_file | name_filter done And so the output file should just be: GOOD GOOD And if it is bad, then it outputs the bad attribute value instead of "GOOD", and the test fails on the golden output match. > +kill $setter_pid &> /dev/null > +unset setter_pid See above, no need to unset the var. > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/generic/326.out b/tests/generic/326.out > new file mode 100644 > index 000..4ac0db5 > --- /dev/null > +++ b/tests/generic/326.out > @@ -0,0 +1,2 @@ > +QA output created by 326 > +Silence is golden Except when you are using comparisons and "_fail" instead of filters. ;) BTW, new tests should be numbere as the first unused number, not at the tail (i.e generic/036). If there are collisions with other new tests when I merge them, I'll renumber them at merge time. > 323 auto aio stress > 324 auto fsr quick > 325 auto quick data log > +326 auto quick xattr There is no existing xattr group - normally xattr tests are considered part of the "metadata" group. If you want to introduce an xattr test group, do it as a separate patch and include all the tests (across all the test directories) that manipulate xattrs in some way. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] add progress indicator to btrfs-convert
Hi btrfs-ml, I have tried a ext4->btrfs conversation of a 1TB (90%full) and have seen that btrfs has no progressbar or something like that. Furthermore I have found [1] a project idea for this. I have tried a generic tasklet-based solution to print the status of the convert-process. Maybe it looks good to you than please add this to your repo. I have implemented this with the linux specific timerfd facility. To enable this you haver to use the '-p' option. I have noticed that a conversation doesn't show all inodes as converted at the end. How can I improve this? A complete convert-process is showd here: sfr@devhost ./btrfs-progs % ./btrfs-convert -p /dev/sde4 creating btrfs metadata. copy inodes [o] [96/ 106] creating ext2fs image file. cleaning up system chunk. conversion complete. [1] https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs-convert Bye and thanks, Silvio Silvio Fricke (2): btrfs-progs: add task-utils btrfs-progs: convert: use task for progress indication of metadata creation Documentation/btrfs-convert.txt | 2 + Makefile| 6 +-- btrfs-convert.c | 60 ++-- task-util.c | 101 task-util.h | 32 + 5 files changed, 193 insertions(+), 8 deletions(-) create mode 100644 task-util.c create mode 100644 task-util.h -- 2.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Quota question
Hey, I did some new tests and the issue with quota exceeded comes from the fact that the data is not sync to disk : http://pastebin.com/fZdh2YRu -- Cyril SCETBON > On 09 Nov 2014, at 18:55, Cyril Scetbon wrote: > > > -- > Cyril SCETBON > >> 1) but is the rewrite in that, or in >> 3.17, or in the coming 3.18, or not yet entirely there at all... that I >> simply don't know. > > If anyone can answer it'd be great ! > >> >> 2) What I *DO* know and *CAN* say with quite some confidence is this: >> >> With btrfs being in general a copy-on-write filesystem, even deletes >> normally require /some/ space, because the metadata pointing at the file >> in question must be rewritten, and that requires free space in ordered to >> happen. > ok >> >> At least in the case of full filesystem ENOSPC errors, the space-related >> questions in the FAQ on the btrfs wiki suggest truncating a file first. >> Find an existing file that either doesn't matter or that you have a >> backup of elsewhere, and echo > filename (or echo >| filename if you have >> noclobber set), therefore truncating the existing file. A few of those >> and with luck you'll free enough space to do some actual deletions. >> >> While to some extent I'm guessing here since I'm not directly familiar >> with quotas myself, it sounds as if such a severe ENOSPC condition, due >> in your case to quotas instead of the general filesystem limits I'm more >> familiar with, might be what you are seeing, particularly since in your >> tests you first triggered the limit, then found a way to write a bit more >> anyway, and only THEN found yourself unable to write anything. >> >> If that's the case, then try the truncate trick and see if it helps. > Unfortunately : http://pastebin.com/1Mw0yiUc >> >> In the generic filesystem case, if the truncate trick doesn't help, >> another possible trick is to temporarily add another device of several >> gigs capacity to the filesystem, enough to let a balance work and >> hopefully reduce the data/metadata imbalance, after which there should be >> enough chunk-unallocated free space left on the original device, to do a >> btrfs device delete of the temporary device, generally still leaving >> space left, since the balance should have reclaimed some otherwise wasted >> chunk allocation. >> >> I can't see how that'd apply to quotas, however, and suspect that the >> corresponding quota-system solution would be to temporarily up the quota >> for the affected user, only long enough to let them do the deletions they >> need to do. But beyond that I can't go, since I simply don't know enough >> about the quota domain to further speculate, at this point. > > I'm pretty sure it'd work by setting a temporary higher quota, but my > questions are : > > - why can I exceed the quotas (580MB when 500MB was set as the limit) ? > > qgroupid exclmax_excl > --- > 1/101999 608632832 524288000 > > - I really think that the quota limit should take into account an amount of > space for operations that would free space > > For this last question I may think that this issue is caused by the fact that > I have only one file taking all the space which won't really be the regular > use case. but i'd like to get an answer :) >> >> While admittedly limited due to my lack of quota-domain knowledge, >> perhaps that sheds at least some light on what happened and why, and >> hopefully on how to get out of the situation. > > Thank you Duncan !-- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs-progs: add task-utils
Signed-off-by: Silvio Fricke --- task-util.c | 121 task-util.h | 33 + 2 files changed, 154 insertions(+) create mode 100644 task-util.c create mode 100644 task-util.h diff --git a/task-util.c b/task-util.c new file mode 100644 index 000..9268df7 --- /dev/null +++ b/task-util.c @@ -0,0 +1,121 @@ + +#include +#include +#include +#include +#include + +#include "task-util.h" + +struct task_info *task_init(void *(*threadfn)(void *), int (*postfn)(void *), + void *thread_private) +{ + struct task_info *info = calloc(1, sizeof(struct task_info)); + + if (!info) + return NULL; + + info->private_data = thread_private; + info->threadfn = threadfn; + info->postfn = postfn; + + return info; +} + +int task_start(struct task_info *info) +{ + int ret; + + if (!info) + return -1; + + if (!info->threadfn) + return -1; + + ret = pthread_create(&info->id, NULL, info->threadfn, +info->private_data); + + if (ret == 0) + pthread_detach(info->id); + else + info->id = -1; + + return ret; +} + +void task_stop(struct task_info *info) +{ + if (!info) + return; + + if (info->periodic.timer_fd) + close(info->periodic.timer_fd); + + if (info->id > 0) + pthread_cancel(info->id); + + if (info->postfn) + info->postfn(info->private_data); +} + +void task_deinit(struct task_info *info) +{ + if (!info) + return; + + free(info); +} + +int task_period_start(struct task_info *info, unsigned int period_ms) +{ + unsigned int ns; + unsigned int sec; + struct itimerspec itval; + + if (!info) + return -1; + + info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0); + if (info->periodic.timer_fd == -1) + return info->periodic.timer_fd; + + info->periodic.wakeups_missed = 0; + + sec = period_ms/1000; + ns = (period_ms - (sec * 1000)) * 1000; + itval.it_interval.tv_sec = sec; + itval.it_interval.tv_nsec = ns; + itval.it_value.tv_sec = sec; + itval.it_value.tv_nsec = ns; + + return timerfd_settime(info->periodic.timer_fd, 0, &itval, NULL); +}; + +void task_period_wait(struct task_info *info) +{ + unsigned long long missed; + int ret; + + if (!info) + return; + + ret = read(info->periodic.timer_fd, &missed, sizeof (missed)); + if (ret == -1) { + perror ("read timer"); + return; + } + + if (missed > 0) + info->periodic.wakeups_missed += (missed - 1); +} + +void task_period_stop(struct task_info *info) +{ + if (!info) + return; + + if (info->periodic.timer_fd) { + timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL); + close(info->periodic.timer_fd); + } +} diff --git a/task-util.h b/task-util.h new file mode 100644 index 000..95f7b5b --- /dev/null +++ b/task-util.h @@ -0,0 +1,33 @@ + +#ifndef __PROGRESS_ +#define __PROGRESS_ + +#include + +struct periodic_info { + int timer_fd; + unsigned long long wakeups_missed; +}; + +struct task_info { + struct periodic_info periodic; + pthread_t id; + void *private_data; + void *(*threadfn)(void *); + int (*postfn)(void *); +}; + +/* task life cycle */ +struct task_info *task_init(void *(*threadfn)(void *), int (*postfn)(void *), + void *thread_private); +int task_start(struct task_info *info); +void task_stop(struct task_info *info); +void task_deinit(struct task_info *info); + +/* periodic life cycle */ +int task_period_start(struct task_info *info, unsigned int period_ms); +void task_period_wait(struct task_info *info); +void task_period_stop(struct task_info *info); + +#endif /* __PROGRESS_ */ + -- 2.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs-progs: convert: use task for progress indication of metadata creation
Signed-off-by: Silvio Fricke --- Documentation/btrfs-convert.txt | 2 ++ Makefile| 6 ++-- btrfs-convert.c | 64 + 3 files changed, 64 insertions(+), 8 deletions(-) diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt index 555fb35..9cc326c 100644 --- a/Documentation/btrfs-convert.txt +++ b/Documentation/btrfs-convert.txt @@ -29,6 +29,8 @@ Roll back to ext2fs. set filesystem label during conversion. -L:: use label from the converted filesystem. +-p:: +Show progress of convertation. EXIT STATUS --- diff --git a/Makefile b/Makefile index 203597c..f76c6b2 100644 --- a/Makefile +++ b/Makefile @@ -8,7 +8,7 @@ AM_CFLAGS = -Wall -D_FILE_OFFSET_BITS=64 -DBTRFS_FLAT_INCLUDES -fno-strict-alias CFLAGS = -g -O1 -fno-strict-aliasing -rdynamic objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \ - extent-cache.o extent_io.o volumes.o utils.o repair.o \ + extent-cache.o extent_io.o volumes.o utils.o repair.o task-util.o \ qgroup.o raid6.o free-space-cache.o list_sort.o props.o \ ulist.o qgroup-verify.o backref.o cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ @@ -20,13 +20,13 @@ libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \ uuid-tree.o utils-lib.o rbtree-utils.o libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \ crc32c.h list.h kerncompat.h radix-tree.h extent-cache.h \ - extent_io.h ioctl.h ctree.h btrfsck.h version.h + extent_io.h ioctl.h ctree.h btrfsck.h version.h task-util.h TESTS = fsck-tests.sh convert-tests.sh INSTALL = install prefix ?= /usr/local bindir = $(prefix)/bin -lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L. +lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L. -pthread libdir ?= $(prefix)/lib incdir = $(prefix)/include/btrfs LIBS = $(lib_LIBS) $(libs_static) diff --git a/btrfs-convert.c b/btrfs-convert.c index a544fc6..f8ba920 100644 --- a/btrfs-convert.c +++ b/btrfs-convert.c @@ -38,6 +38,7 @@ #include "transaction.h" #include "crc32c.h" #include "utils.h" +#include "task-util.h" #include #include #include @@ -45,6 +46,41 @@ #define INO_OFFSET (BTRFS_FIRST_FREE_OBJECTID - EXT2_ROOT_INO) #define EXT2_IMAGE_SUBVOL_OBJECTID BTRFS_FIRST_FREE_OBJECTID +struct private { + uint32_t max_copy_inodes; + uint32_t cur_copy_inodes; + struct task_info *info; +}; + +static void *print_copied_inodes(void *p) +{ + struct private *priv = p; + const char work_indicator[] = { '.', 'o', 'O', 'o' }; + uint32_t count = 0; + + task_period_start(priv->info, 1000 /* 1s */); + while (1) { + count++; + printf("copy inodes [%c] [%10d/%10d]\r", + work_indicator[count % 4], priv->cur_copy_inodes, + priv->max_copy_inodes); + fflush(stdout); + task_period_wait(priv->info); + } + + return NULL; +} + +static int after_copied_inodes(void *p) +{ + struct private *priv = p; + + printf("\n"); + task_period_stop(priv->info); + + return 0; +} + /* * Open Ext2fs in readonly mode, read block allocation bitmap and * inode bitmap into memory. @@ -1036,7 +1072,7 @@ fail: * scan ext2's inode bitmap and copy all used inodes. */ static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs, - int datacsum, int packing, int noxattr) + int datacsum, int packing, int noxattr, struct private *p) { int ret; errcode_t err; @@ -1068,6 +1104,7 @@ static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs, objectid, ext2_fs, ext2_ino, &ext2_inode, datacsum, packing, noxattr); + p->cur_copy_inodes++; if (ret) return ret; if (trans->blocks_used >= 4096) { @@ -2197,7 +2234,7 @@ err: } static int do_convert(const char *devname, int datacsum, int packing, int noxattr, - int copylabel, const char *fslabel) + int copylabel, const char *fslabel, int progress) { int i, ret; int fd = -1; @@ -2275,11 +2312,23 @@ static int do_convert(const char *devname, int datacsum, int packing, int noxatt goto fail; } printf("creating btrfs metadata.\n"); - ret = copy_inodes(root, ext2_fs, datacsum, packing, noxattr); + struct private p = { + .max_copy_inodes = (ext2_fs->super->s_inodes_count - ext2_fs->super->s_free_inodes_count), + .cur_copy_inodes = 0, + }; + if (progress) { +
Re: Quota question
-- Cyril SCETBON > 1) but is the rewrite in that, or in > 3.17, or in the coming 3.18, or not yet entirely there at all... that I > simply don't know. If anyone can answer it'd be great ! > > 2) What I *DO* know and *CAN* say with quite some confidence is this: > > With btrfs being in general a copy-on-write filesystem, even deletes > normally require /some/ space, because the metadata pointing at the file > in question must be rewritten, and that requires free space in ordered to > happen. ok > > At least in the case of full filesystem ENOSPC errors, the space-related > questions in the FAQ on the btrfs wiki suggest truncating a file first. > Find an existing file that either doesn't matter or that you have a > backup of elsewhere, and echo > filename (or echo >| filename if you have > noclobber set), therefore truncating the existing file. A few of those > and with luck you'll free enough space to do some actual deletions. > > While to some extent I'm guessing here since I'm not directly familiar > with quotas myself, it sounds as if such a severe ENOSPC condition, due > in your case to quotas instead of the general filesystem limits I'm more > familiar with, might be what you are seeing, particularly since in your > tests you first triggered the limit, then found a way to write a bit more > anyway, and only THEN found yourself unable to write anything. > > If that's the case, then try the truncate trick and see if it helps. Unfortunately : http://pastebin.com/1Mw0yiUc > > In the generic filesystem case, if the truncate trick doesn't help, > another possible trick is to temporarily add another device of several > gigs capacity to the filesystem, enough to let a balance work and > hopefully reduce the data/metadata imbalance, after which there should be > enough chunk-unallocated free space left on the original device, to do a > btrfs device delete of the temporary device, generally still leaving > space left, since the balance should have reclaimed some otherwise wasted > chunk allocation. > > I can't see how that'd apply to quotas, however, and suspect that the > corresponding quota-system solution would be to temporarily up the quota > for the affected user, only long enough to let them do the deletions they > need to do. But beyond that I can't go, since I simply don't know enough > about the quota domain to further speculate, at this point. I'm pretty sure it'd work by setting a temporary higher quota, but my questions are : - why can I exceed the quotas (580MB when 500MB was set as the limit) ? qgroupid exclmax_excl --- 1/101999 608632832 524288000 - I really think that the quota limit should take into account an amount of space for operations that would free space For this last question I may think that this issue is caused by the fact that I have only one file taking all the space which won't really be the regular use case. but i'd like to get an answer :) > > While admittedly limited due to my lack of quota-domain knowledge, > perhaps that sheds at least some light on what happened and why, and > hopefully on how to get out of the situation. Thank you Duncan !-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] Btrfs: make xattr replace operations atomic
Replacing a xattr consists of doing a lookup for its existing value, delete the current value from the respective leaf, release the search path and then finally insert the new value. This leaves a time window where readers (getxattr, listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs, so this has security implications. This change also fixes 2 other existing issues which were: *) Deleting the old xattr value without verifying first if the new xattr will fit in the existing leaf item (in case multiple xattrs are packed in the same item due to name hash collision); *) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't exist but we have have an existing item that packs muliple xattrs with the same name hash as the input xattr. In this case we should return ENOSPC. A test case for xfstests follows soon. Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace implementation. Reported-by: Alexandre Oliva Signed-off-by: Filipe Manana --- V2: Added missing btrfs_mark_buffer_dirty() call for the case where a replace happens for an item with no other xattrs and the new xattr value size is the same as the old xattr value size; Made btrfs_search_slot not release the path if it returns EOVERFLOW (from split_leaf), so that we can get the current size of the item when doing the replace and extend the leaf to a size smaller then what we passed to btrfs_search_slot if the xattr exists in the item (and the new extend size is new_size - old_size, excluding xattr's name and sizeof struct btrfs_dir_item). V3: Made btrfs_search_slot not release the path on error logic more explicit and generic. V4: Added missing ENOSPC condition - if overflow detected, there's no matching dir_item for our xattr in the leaf item and there are no flags. Ensure a replace can't insert a new value. fs/btrfs/ctree.c| 2 +- fs/btrfs/ctree.h| 5 ++ fs/btrfs/dir-item.c | 10 ++-- fs/btrfs/xattr.c| 150 4 files changed, 102 insertions(+), 65 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 19bc616..8172341 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2939,7 +2939,7 @@ done: */ if (!p->leave_spinning) btrfs_set_path_blocking(p); - if (ret < 0) + if (ret < 0 && !p->skip_release_on_error) btrfs_release_path(p); return ret; } diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a5e471a..e3ab75e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -607,6 +607,7 @@ struct btrfs_path { unsigned int leave_spinning:1; unsigned int search_commit_root:1; unsigned int need_commit_sem:1; + unsigned int skip_release_on_error:1; }; /* @@ -3687,6 +3688,10 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans, int verify_dir_item(struct btrfs_root *root, struct extent_buffer *leaf, struct btrfs_dir_item *dir_item); +struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, +struct btrfs_path *path, +const char *name, +int name_len); /* orphan.c */ int btrfs_insert_orphan_item(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c index fc8df86..1752625 100644 --- a/fs/btrfs/dir-item.c +++ b/fs/btrfs/dir-item.c @@ -21,10 +21,6 @@ #include "hash.h" #include "transaction.h" -static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, - struct btrfs_path *path, - const char *name, int name_len); - /* * insert a name into a directory, doing overflow properly if there is a hash * collision. data_size indicates how big the item inserted should be. On @@ -383,9 +379,9 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans, * this walks through all the entries in a dir item and finds one * for a specific name. */ -static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, - struct btrfs_path *path, - const char *name, int name_len) +struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, +struct btrfs_path *path, +const char *name, int name_len) { struct btrfs_dir_item *dir_item; unsigned long name_ptr; diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c index dcf2013..47b1946 100644 --- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -29,6 +29,7 @@ #include "xattr.h" #include "disk-io.h" #include "props.h" +#include "locking.h" ssize_t __btrfs_getxatt