Re: btrfs btree_ctree_super fault
We have just encountered the same bug on 4.9.0-rc2. Any solution now?

> kernel BUG at fs/btrfs/ctree.c:3172!
> invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> task: 8804ffde37c0 task.stack: c90002188000
> RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> RSP: :c9000218b8a8 EFLAGS: 00010246
> RAX: RBX: 8804fddcf348 RCX: 1000
> RDX: RSI: c9000218b9ce RDI: c9000218b8c7
> RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
> R10: R11: 0001 R12: c9000218b8b6
> R13: c9000218b9ce R14: 0001 R15: 880480684a88
> FS: 7f7c7f998b40() GS:88050780() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: CR3: 00044f15f000 CR4: 001406f0
> DR0: 7f4ce439d000 DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0600
> Stack:
>  88050143 d305a00a2245 006c0002 0510
>  6c0002d3 1000 6427eebb 880480684a88
>  8804fddcf348 2000
> Call Trace:
>  [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] generic: test concurrent non-overlapping direct I/O on the same extents
On Wed, Nov 16, 2016 at 04:29:34PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval
>
> There have been a couple of logic bugs in `btrfs_get_extent()` which
> could lead to spurious -EEXIST errors from read or write. This test
> exercises those conditions by having two threads race to add an extent
> to the extent map.
>
> This is fixed by Linux commit 8dff9c853410 ("Btrfs: deal with duplciates
> during extent_map insertion in btrfs_get_extent") and the patch "Btrfs:
> deal with existing encompassing extent map in btrfs_get_extent()"
> (http://marc.info/?l=linux-btrfs&m=147873402311143&w=2).
>
> Although the bug is Btrfs-specific, nothing about the test is.
>
> Signed-off-by: Omar Sandoval
> ---
[snip]
> +# real QA test starts here
> +
> +_supported_fs generic
> +_supported_os Linux
> +_require_test
> +_require_xfs_io_command "falloc"
> +_require_test_program "dio-interleaved"
> +
> +extent_size="$(($(stat -f -c '%S' "$TEST_DIR") * 2))"

There's a helper to get the fs block size: "get_block_size".

> +num_extents=1024
> +testfile="$TEST_DIR/$$-testfile"
> +
> +truncate -s 0 "$testfile"

I prefer using xfs_io to do the truncate:

$XFS_IO_PROG -fc "truncate 0" "$testfile"

Because in rare cases truncate(1) may be unavailable, e.g. on RHEL5.
Usually it's not a big issue, but xfs_io works all the time; we have a
better way, so why not :)

> +for ((off = 0; off < num_extents * extent_size; off += extent_size)); do
> +	xfs_io -c "falloc $off $extent_size" "$testfile"

Use $XFS_IO_PROG, not bare xfs_io.

I can fix all the tiny issues at commit time.

Thanks,
Eryu

> +done
> +
> +# To reproduce the Btrfs bug, the extent map must not be cached in memory.
> +sync
> +echo 3 > /proc/sys/vm/drop_caches
> +
> +"$here/src/dio-interleaved" "$extent_size" "$num_extents" "$testfile"
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/390.out b/tests/generic/390.out
> new file mode 100644
> index 000..3c7b405
> --- /dev/null
> +++ b/tests/generic/390.out
> @@ -0,0 +1,2 @@
> +QA output created by 390
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index 08007d7..d137d01 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -392,3 +392,4 @@
>  387 auto clone
>  388 auto log metadata
>  389 auto quick acl
> +390 auto quick rw
> --
> 2.10.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
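Eryu's two suggestions above (a block-size helper instead of open-coded stat(1), and $XFS_IO_PROG instead of bare xfs_io/truncate(1)) can be sketched as a stand-alone dry run. Outside a real fstests environment the helper and variables don't exist, so the sketch below substitutes a stat(1)-based stand-in for the helper and only prints the xfs_io commands instead of executing them:

```shell
# Sketch of the setup hunk with the review applied. get_block_size here
# is a stand-in for the fstests helper; the xfs_io invocations are
# echoed as a dry run since no scratch filesystem is assumed.
XFS_IO_PROG=${XFS_IO_PROG:-xfs_io}
TEST_DIR=${TEST_DIR:-/tmp}

get_block_size() { stat -f -c '%S' "$1"; }   # stand-in for the real helper

extent_size=$(( $(get_block_size "$TEST_DIR") * 2 ))
num_extents=4
testfile="$TEST_DIR/$$-testfile"

echo "$XFS_IO_PROG -fc 'truncate 0' $testfile"
for ((off = 0; off < num_extents * extent_size; off += extent_size)); do
	echo "$XFS_IO_PROG -c 'falloc $off $extent_size' $testfile"
done
```

In the real test only the two substitutions change; the loop structure and the doubled-block-size extent size stay exactly as in the patch.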
[PATCH] btrfs-progs: qgroup: fix error in ASSERT condition expression
Options -f, -F and --sort don't work because the conditional expression of
an ASSERT is wrong.

Signed-off-by: Tsutomu Itoh
---
 qgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index 9d10cb8..071d15e 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -480,7 +480,7 @@ int btrfs_qgroup_setup_comparer(struct btrfs_qgroup_comparer_set **comp_set,
 		*comp_set = set;
 	}
 
-	ASSERT(set->comps[set->ncomps].comp_func != NULL);
+	ASSERT(set->comps[set->ncomps].comp_func == NULL);
 
 	set->comps[set->ncomps].comp_func = all_comp_funcs[comparer];
 	set->comps[set->ncomps].is_descending = is_descending;
@@ -847,7 +847,7 @@ int btrfs_qgroup_setup_filter(struct btrfs_qgroup_filter_set **filter_set,
 		*filter_set = set;
 	}
 
-	ASSERT(set->filters[set->nfilters].filter_func != NULL);
+	ASSERT(set->filters[set->nfilters].filter_func == NULL);
 
 	set->filters[set->nfilters].filter_func = all_filter_funcs[filter];
 	set->filters[set->nfilters].data = data;
 	set->nfilters++;
-- 
2.9.3
Re: Announcing btrfs-dedupe
On Wed, Nov 16, 2016 at 11:24:33PM +0100, Niccolò Belli wrote:
> On Tuesday, 15 November 2016 18:52:01 CET, Zygo Blaxell wrote:
> > Like I said, millions of extents per week...
> >
> > 64K is an enormous dedup block size, especially if it comes with a 64K
> > alignment constraint as well.
> >
> > These are the top ten duplicate block sizes from a sample of 95251
> > dedup ops on a medium-sized production server with 4TB of filesystem
> > (about one machine-day of data):
>
> Which software do you use to dedupe your data? I tried duperemove but it
> gets killed by the OOM killer because it triggers some kind of memory leak:
> https://github.com/markfasheh/duperemove/issues/163

Duperemove does use a lot of memory, but the logs at that URL only show 2G
of RAM in duperemove--not nearly enough to trigger OOM under normal
conditions on an 8G machine. There's another process with 6G of virtual
address space (although much less than that resident) that looks more
interesting (i.e. duperemove might just be the victim of some interaction
between baloo_file and the OOM killer).

On the other hand, the logs also show kernel 4.8. 100% of my test machines
failed to finish booting before they were cut down by OOM on 4.7.x kernels.
The same problem occurs on early kernels in the 4.8.x series. I am having
good results with 4.8.6 and later, but you should be aware that significant
changes have been made to the way OOM works in these kernel versions, and
maybe you're hitting a regression for your use case.

> Niccolò Belli
[PATCH] fstests: Introduce check for explicit SHARED extent flag reporting
For filesystems that support reflink, some of them (OK, btrfs again) don't
split the SHARED flag when reporting extents through fiemap.

For example:

        0         4K        8K
File 1: |<------- Extent 0 ------->|
        |<----- on-disk extent --->|
File 2: |<- Extent 0 ->|

A filesystem that supports explicit SHARED extent reporting should report
fiemap like:
  File1: 2 extents
    Extent 0-4K: SHARED
    Extent 4-8K:
  File2: 1 extent
    Extent 0-4K: SHARED

A filesystem that doesn't support explicit reporting will report fiemap
like:
  File1: 1 extent
    Extent 0-8K: SHARED
  File2: 1 extent
    Extent 0-4K: SHARED

Test cases like generic/372 that require explicit reporting will cause a
false alert on btrfs. Add a runtime check for that requirement.

Signed-off-by: Qu Wenruo
---
 common/reflink    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/372 |  1 +
 2 files changed, 45 insertions(+)

diff --git a/common/reflink b/common/reflink
index 8b34046..9ada2e8 100644
--- a/common/reflink
+++ b/common/reflink
@@ -78,6 +78,50 @@ _require_scratch_reflink()
 	_scratch_unmount
 }
 
+# this test requires the scratch fs to report an explicit SHARED flag
+# e.g.
+#         0         4K        8K
+# File 1: |<------- Extent 0 ------->|
+#         |<----- on-disk extent --->|
+# File 2: |<- Extent 0 ->|
+#
+# Fs supporting explicit SHARED extent reporting should report fiemap like:
+# File1: 2 extents
+#   Extent 0-4K: SHARED
+#   Extent 4-8K:
+# File2: 1 extent
+#   Extent 0-4K: SHARED
+#
+# Fs not supporting explicit reporting will report fiemap like:
+# File1: 1 extent
+#   Extent 0-8K: SHARED
+# File2: 1 extent
+#   Extent 0-4K: SHARED
+_require_scratch_explicit_shared_extents()
+{
+	_require_scratch
+	_require_fiemap
+	_require_scratch_reflink
+	_require_xfs_io_command "reflink"
+	local nr_extents
+
+	_scratch_mkfs > /dev/null
+	_scratch_mount
+
+	_pwrite_byte 0x61 0 128k $SCRATCH_MNT/file1
+	_reflink_range $SCRATCH_MNT/file1 0 $SCRATCH_MNT/file2 0 64k
+
+	_scratch_cycle_mount
+
+	nr_extents=$(_count_extents $SCRATCH_MNT/file1)
+	if [ $nr_extents -eq 1 ]; then
+		_notrun "Explicit SHARED flag reporting not supported by filesystem type: $FSTYP"
+	fi
+	_scratch_unmount
+}
+
 # this test requires the test fs support dedupe...
 _require_test_dedupe()
 {
diff --git a/tests/generic/372 b/tests/generic/372
index 31dff20..51a3eca 100755
--- a/tests/generic/372
+++ b/tests/generic/372
@@ -47,6 +47,7 @@ _supported_os Linux
 _supported_fs generic
 _require_scratch_reflink
 _require_fiemap
+_require_scratch_explicit_shared_extents
 
 echo "Format and mount"
 _scratch_mkfs > $seqres.full 2>&1
-- 
2.7.4
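The runtime check above turns on counting fiemap extents. A minimal stand-in for _count_extents, fed fabricated fiemap-style listings for the two reporting styles described in the commit message, shows how the probe distinguishes them. Note the sample listings below are made up for illustration, not captured from a real filesystem, and the real fstests helper is richer than this awk one-liner:

```shell
# count_extents: count the per-extent lines ("N: [...]...") in
# xfs_io -c "fiemap -v"-style output, skipping the filename header.
count_extents() { awk 'NR > 1 && $1 ~ /^[0-9]+:/ { n++ } END { print n }'; }

# Explicit SHARED reporting: file1 is split at the shared boundary
# (0x2000 is FIEMAP_EXTENT_SHARED, 0x1 is FIEMAP_EXTENT_LAST).
explicit='file1:
0: [0..7]: 1000..1007 0x2001
1: [8..15]: 1008..1015 0x1'

# Merged reporting (btrfs): one big extent, all of it flagged SHARED.
merged='file1:
0: [0..15]: 1000..1015 0x2001'

echo "$explicit" | count_extents   # prints 2 -> explicit reporting
echo "$merged" | count_extents     # prints 1 -> would trigger _notrun
```

A count of 1 after reflinking only the first half of the file is exactly the condition the helper uses to _notrun the test.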
Re: Btrfs Heatmap - v2 - block group internals!
At 11/17/2016 04:30 AM, Hans van Kranenburg wrote:
> In the last two days I've added the --blockgroup option to btrfs heatmap
> to let it create pictures of block group internals.
>
> Examples and more instructions are to be found in the README at:
> https://github.com/knorrie/btrfs-heatmap/blob/master/README.md
>
> To use the new functionality it needs a fairly recent python-btrfs for
> the 'skinny' METADATA_ITEM_KEY to be present. Latest python-btrfs
> release is v0.3, created yesterday.
>
> Yay,

Wow, really cool!

I have always dreamed about a visualization tool to represent the chunk
and extent levels of btrfs. This should really save me from reading the
boring decimal numbers from btrfs-debug-tree.

Although IMHO the full-fs output mixes the extent and chunk levels
together, which makes it a little hard to represent the multi-device case,
it's still an awesome tool!

And considering the "show-block" tool in btrfs-progs is quite old, I think
if this tool gets further polished it may have a chance to get into
btrfs-progs.

Thanks,
Qu
Re: [PATCH] fstests: Block btrfs from test case generic/372
At 11/17/2016 05:12 AM, Dave Chinner wrote:
> (Did you forget to cc fste...@vger.kernel.org?)
>
> On Tue, Nov 15, 2016 at 04:13:32PM +0800, Qu Wenruo wrote:
>> Since btrfs always returns the whole extent even if only part of it is
>> shared with other files, the hole/extent counts differ for "file1" in
>> this test case.
>>
>> For example:
>>
>>          /------- File 1 Extent 0 -------\
>>         |<----------- Extent A ----------->|
>>          \ File 2 /            \ File 2 /
>>          Ext 0~4K              Ext 64k~68K
>>
>> In that case, fiemap on File 1 will only return 1 large extent A with
>> the SHARED flag, while XFS will split it into 3 extents: the first and
>> last 4K with the SHARED flag and the rest without it.
>
> fiemap should behave the same across all filesystems if at all possible.
> This test failure indicates btrfs doesn't report an accurate
> representation of shared extents which, IMO, is a btrfs issue that needs
> fixing, not a test problem. Regardless of this

Considering only btrfs implements CoW using an extent-booking
mechanism(*), it does affect a lot of behavior, from such SHARED flag
representation to hole punching behavior.

I hope there is a well-documented standard on what every flag means and
how it should be represented, while I'm not quite sure if it's worth it
for btrfs to change its representation.

Even if btrfs reported "SHARED" "NON-SHARED" "SHARED" for File 1, hole
punching the "NON-SHARED" range wouldn't free any space. (Which I assume
differs from xfs, and that's what makes things confusing.)

>> This makes the test case meaningless as btrfs doesn't follow such an
>> assumption, so blacklist btrfs for this test case to avoid a false alert.
>
> ... we are not going to add ad-hoc filesystem blacklists for random
> tests. Adding "blacklists" without any explanation of why something has
> been blacklisted is simply a bad practice. We use _require rules to
> specifically document what functionality is required for the test and
> check that it is provided. i.e. this:
>
> _require_explicit_shared_extents()
> {
> 	if [ $FSTYP == "btrfs" ]; then
> 		_not_run "btrfs can't report accurate shared extent ranges in fiemap"
> 	fi
> }

Right, this is much more helpful than the blabla I wrote in the commit
message.

Although I'd prefer to detect it at runtime rather than just checking the
fs type. Maybe one day btrfs will support it. (Although we should solve
the above-mentioned behavior difference first.)

> documents /exactly/ why this test is not run on btrfs.
>
> And, quite frankly, while this is /better/ it still ignores the fact we
> have functions like _within_tolerance for allowing a range of result
> values to be considered valid rather than just a fixed value.
>
> IOWs, changing the check of the extent count of file 1 post reflink to
> use a _within_tolerance range would mean the test would validate file1
> on all reflink supporting filesystems and we don't need to exclude
> btrfs at all...

I really agree with this idea, although for me the difference is too big:
for file 1, xfs reports 5 extents while btrfs only reports 1.

If we use _within_tolerance to cover that range, and one day some
mysterious xfs bug (OK, I don't really believe it will happen, since it's
xfs, not btrfs) makes it report 4 extents, or one btrfs bug (on the other
hand, quite possible) makes btrfs report 2 extents, then we can't detect
the bug either.

So I'd prefer the _require_explicit_shared_extents() method.

Thanks,
Qu

> Cheers,
> Dave.
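The _within_tolerance trade-off being debated can be made concrete with a toy stand-in for the fstests helper (the real one, in xfstests' common code, takes more arguments than this sketch). Using the extent counts quoted in the thread, any window wide enough to accept both xfs (5) and btrfs (1) also accepts the buggy values Qu worries about:

```shell
# Toy stand-in: succeed if VALUE lies within EXPECT +/- TOL.
within_tolerance() {
	local value=$1 expect=$2 tol=$3
	[ "$value" -ge $((expect - tol)) ] && [ "$value" -le $((expect + tol)) ]
}

# Accepting both xfs (5 extents) and btrfs (1 extent) needs 3 +/- 2 ...
within_tolerance 5 3 2 && echo "xfs result accepted"
within_tolerance 1 3 2 && echo "btrfs result accepted"
# ... but then the hypothetical regressions slip through the same window:
within_tolerance 4 3 2 && echo "xfs regression NOT caught"
within_tolerance 2 3 2 && echo "btrfs regression NOT caught"
```

All four lines print, which is precisely Qu's objection: a tolerance wide enough to paper over the reporting difference also masks off-by-one regressions on either filesystem.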
Re: [PATCH] Btrfs: deal with existing encompassing extent map in btrfs_get_extent()
On Thu, Nov 10, 2016 at 02:45:36PM -0800, Omar Sandoval wrote:
> On Thu, Nov 10, 2016 at 02:38:14PM -0800, Liu Bo wrote:
> > On Thu, Nov 10, 2016 at 12:24:13PM -0800, Omar Sandoval wrote:
> > > On Thu, Nov 10, 2016 at 12:09:06PM -0800, Omar Sandoval wrote:
> > > > On Thu, Nov 10, 2016 at 12:01:20PM -0800, Liu Bo wrote:
> > > > > On Wed, Nov 09, 2016 at 03:26:50PM -0800, Omar Sandoval wrote:
> > > > > > From: Omar Sandoval
> > > > > >
> > > > > > My QEMU VM was seeing inexplicable I/O errors that I tracked down to
> > > > > > errors coming from the qcow2 virtual drive in the host system. The
> > > > > > qcow2 file is a nocow file on my Btrfs drive, which QEMU opens with
> > > > > > O_DIRECT. Every once in awhile, pread() or pwrite() would return
> > > > > > EEXIST, which makes no sense. This turned out to be a bug in
> > > > > > btrfs_get_extent().
> > > > > >
> > > > > > Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
> > > > > > insertion in btrfs_get_extent") fixed a case in btrfs_get_extent()
> > > > > > where two threads race on adding the same extent map to an inode's
> > > > > > extent map tree. However, if the added em is merged with an adjacent
> > > > > > em in the extent tree, then we'll end up with an existing extent
> > > > > > that is not identical to but instead encompasses the extent we tried
> > > > > > to add. When we call merge_extent_mapping() to find the
> > > > > > nonoverlapping part of the new em, the arithmetic overflows because
> > > > > > there is no such thing. We then end up trying to add a bogus em to
> > > > > > the em_tree, which results in a EEXIST that can bubble all the way
> > > > > > up to userspace.
> > > > > I don't get how this could happen (even after reading commit
> > > > > 8dff9c853410): btrfs_get_extent in direct_IO is protected by
> > > > > lock_extent_direct. The assumption is that a racy thread should be
> > > > > blocked by lock_extent_direct and, when it gets the lock, it finds
> > > > > the just-inserted em when going into btrfs_get_extent if its offset
> > > > > is within [em->start, extent_map_end(em)].
> > > > >
> > > > > I think we may also need to figure out why the above doesn't work as
> > > > > expected besides fixing another special case.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -liubo
> > > >
> > > > lock_extent_direct() only protects the range you're doing I/O into,
> > > > not the entire extent. If two threads are doing two non-overlapping
> > > > reads in the same extent, then you can get this race.
> > >
> > > More concretely, assume the extent tree on disk has:
> > >
> > > +---------------------------+---------------------------------+
> > > | start=0,len=8192,bytenr=0 | start=8192,len=8192,bytenr=8192 |
> > > +---------------------------+---------------------------------+
> > >
> > > And the extent map tree in memory has a single em cached for the second
> > > extent {start=8192, len=8192, bytenr=8192}. Then, two threads try to do
> > > direct I/O reads:
> > >
> > > Thread 1                               | Thread 2
> > > ---------------------------------------+---------------------------------------
> > > pread(offset=0, nbyte=4096)            | pread(offset=4096, nbyte=4096)
> > > lock_extent_direct(start=0, end=4095)  | lock_extent_direct(start=4096, end=8191)
> > > btrfs_get_extent(start=0, len=4096)    | btrfs_get_extent(start=4096, len=4096)
> > > lookup_extent_mapping() = NULL         | lookup_extent_mapping() = NULL
> > > reads extent from B-tree               | reads extent from B-tree
> > >                                        | write_lock(&em_tree->lock)
> > >                                        | add_extent_mapping(start=0, len=8192, bytenr=0)
> > >                                        | try_merge_map()
> > >                                        | em_tree now has {start=0, len=16384, bytenr=0}
> > >                                        | write_unlock(&em_tree->lock)
> > > write_lock(&em_tree->lock)             |
> > > add_extent_mapping(start=0, len=8192,  |
> > >                    bytenr=0) = -EEXIST |
> > > search_extent_mapping() = {start=0,    |
> > >     len=16384, bytenr=0}               |
> > > merge_extent_mapping() does bogus math |
> > >     and overflows, returns EEXIST      |
> >
> > Yeah, so much fun.
> >
> > The problem is that we lock and request [0, 4096], but we insert an em
> > of [0, 8192] instead. So if we insert a [0, 4096] em, then we can make
> > sure that the em returned by btrfs_get_extent is protected from race by
> > the range of lock_extent_direct.
> >
> > I'll give it a shot and do some testing.
> >
> > For this patch,
> >
> > Reviewed-by: Liu Bo
>
> Thank you!
>
> > Would you please make a reproducer for fstests?
>
> Sure. Trying to trigger this with xfs_io never works because it's such a
> narrow race window, but
[PATCH] generic: test concurrent non-overlapping direct I/O on the same extents
From: Omar Sandoval

There have been a couple of logic bugs in `btrfs_get_extent()` which
could lead to spurious -EEXIST errors from read or write. This test
exercises those conditions by having two threads race to add an extent
to the extent map.

This is fixed by Linux commit 8dff9c853410 ("Btrfs: deal with duplciates
during extent_map insertion in btrfs_get_extent") and the patch "Btrfs:
deal with existing encompassing extent map in btrfs_get_extent()"
(http://marc.info/?l=linux-btrfs&m=147873402311143&w=2).

Although the bug is Btrfs-specific, nothing about the test is.

Signed-off-by: Omar Sandoval
---
 .gitignore            |  1 +
 src/Makefile          |  2 +-
 src/dio-interleaved.c | 98 +++
 tests/generic/390     | 76 +++
 tests/generic/390.out |  2 ++
 tests/generic/group   |  1 +
 6 files changed, 179 insertions(+), 1 deletion(-)
 create mode 100644 src/dio-interleaved.c
 create mode 100755 tests/generic/390
 create mode 100644 tests/generic/390.out

diff --git a/.gitignore b/.gitignore
index 915d2d8..b8d13a0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -44,6 +44,7 @@
 /src/bulkstat_unlink_test_modified
 /src/dbtest
 /src/devzero
+/src/dio-interleaved
 /src/dirperf
 /src/dirstress
 /src/dmiperf
diff --git a/src/Makefile b/src/Makefile
index dd51216..4056496 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -21,7 +21,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test \
 	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
 	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
-	attr-list-by-handle-cursor-test listxattr
+	attr-list-by-handle-cursor-test listxattr dio-interleaved
 
 SUBDIRS =
 
diff --git a/src/dio-interleaved.c b/src/dio-interleaved.c
new file mode 100644
index 000..831a191
--- /dev/null
+++ b/src/dio-interleaved.c
@@ -0,0 +1,98 @@
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+static pthread_barrier_t barrier;
+
+static unsigned long extent_size;
+static unsigned long num_extents;
+
+struct dio_thread_data {
+	int fd;
+	int thread_id;
+};
+
+static void *dio_thread(void *arg)
+{
+	struct dio_thread_data *data = arg;
+	off_t off;
+	ssize_t ret;
+	void *buf;
+
+	if ((errno = posix_memalign(&buf, extent_size / 2, extent_size / 2))) {
+		perror("malloc");
+		return NULL;
+	}
+	memset(buf, 0, extent_size / 2);
+
+	off = (num_extents - 1) * extent_size;
+	if (data->thread_id)
+		off += extent_size / 2;
+	while (off >= 0) {
+		pthread_barrier_wait(&barrier);
+
+		ret = pread(data->fd, buf, extent_size / 2, off);
+		if (ret == -1)
+			perror("pread");
+
+		off -= extent_size;
+	}
+
+	free(buf);
+	return NULL;
+}
+
+int main(int argc, char **argv)
+{
+	struct dio_thread_data data[2];
+	pthread_t thread;
+	int fd;
+
+	if (argc != 4) {
+		fprintf(stderr, "usage: %s SECTORSIZE NUM_EXTENTS PATH\n",
+			argv[0]);
+		return EXIT_FAILURE;
+	}
+
+	extent_size = strtoul(argv[1], NULL, 0);
+	num_extents = strtoul(argv[2], NULL, 0);
+
+	errno = pthread_barrier_init(&barrier, NULL, 2);
+	if (errno) {
+		perror("pthread_barrier_init");
+		return EXIT_FAILURE;
+	}
+
+	fd = open(argv[3], O_RDONLY | O_DIRECT);
+	if (fd == -1) {
+		perror("open");
+		return EXIT_FAILURE;
+	}
+
+	data[0].fd = fd;
+	data[0].thread_id = 0;
+	errno = pthread_create(&thread, NULL, dio_thread, &data[0]);
+	if (errno) {
+		perror("pthread_create");
+		close(fd);
+		return EXIT_FAILURE;
+	}
+
+	data[1].fd = fd;
+	data[1].thread_id = 1;
+	dio_thread(&data[1]);
+
+	pthread_join(thread, NULL);
+
+	close(fd);
+	return EXIT_SUCCESS;
+}
diff --git a/tests/generic/390 b/tests/generic/390
new file mode 100755
index 000..0ef6537
--- /dev/null
+++ b/tests/generic/390
@@ -0,0 +1,76 @@
+#! /bin/bash
+# FS QA Test 390
+#
+# Test two threads doing non-overlapping direct I/O in the same extents.
+# Motivated by a bug in Btrfs' direct I/O get_block function which would lead
+# to spurious -EEXIST failures from direct I/O reads.
+#
+#---
+# Copyright (c) 2016 Facebook. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundat
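The access pattern the reproducer generates can be shown in miniature: thread 0 reads the first half of each extent, thread 1 the second half, both walking from the last extent back to the first in lockstep behind the barrier. Printing the offsets for a tiny made-up case (extent_size=8192, num_extents=4; the real test uses twice the fs block size and 1024 extents) makes the interleaving visible:

```shell
# Reproduce the reproducer's offset arithmetic from dio-interleaved.c:
# start at (num_extents - 1) * extent_size, thread 1 shifted by half an
# extent, and step backwards one extent at a time.
extent_size=8192
num_extents=4

for tid in 0 1; do
	off=$(( (num_extents - 1) * extent_size ))
	[ "$tid" -eq 1 ] && off=$((off + extent_size / 2))
	offs=""
	while [ "$off" -ge 0 ]; do
		offs="$offs $off"
		off=$((off - extent_size))
	done
	echo "thread $tid reads $((extent_size / 2)) bytes at:$offs"
done
```

The two reads in each round land in the same extent but never overlap, which is exactly the condition the btrfs_get_extent race needs.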
Re: Announcing btrfs-dedupe
On Tuesday, 15 November 2016 18:52:01 CET, Zygo Blaxell wrote:
> Like I said, millions of extents per week...
>
> 64K is an enormous dedup block size, especially if it comes with a 64K
> alignment constraint as well.
>
> These are the top ten duplicate block sizes from a sample of 95251
> dedup ops on a medium-sized production server with 4TB of filesystem
> (about one machine-day of data):

Which software do you use to dedupe your data? I tried duperemove but it
gets killed by the OOM killer because it triggers some kind of memory leak:
https://github.com/markfasheh/duperemove/issues/163

Niccolò Belli
Re: [PATCH] fstests: Block btrfs from test case generic/372
(Did you forget to cc fste...@vger.kernel.org?)

On Tue, Nov 15, 2016 at 04:13:32PM +0800, Qu Wenruo wrote:
> Since btrfs always returns the whole extent even if only part of it is
> shared with other files, the hole/extent counts differ for "file1" in
> this test case.
>
> For example:
>
>          /------- File 1 Extent 0 -------\
>         |<----------- Extent A ----------->|
>          \ File 2 /            \ File 2 /
>          Ext 0~4K              Ext 64k~68K
>
> In that case, fiemap on File 1 will only return 1 large extent A with
> the SHARED flag, while XFS will split it into 3 extents: the first and
> last 4K with the SHARED flag and the rest without it.

fiemap should behave the same across all filesystems if at all possible.
This test failure indicates btrfs doesn't report an accurate
representation of shared extents which, IMO, is a btrfs issue that needs
fixing, not a test problem. Regardless of this

> This makes the test case meaningless as btrfs doesn't follow such an
> assumption, so blacklist btrfs for this test case to avoid a false alert.

... we are not going to add ad-hoc filesystem blacklists for random
tests. Adding "blacklists" without any explanation of why something has
been blacklisted is simply a bad practice. We use _require rules to
specifically document what functionality is required for the test and
check that it is provided. i.e. this:

_require_explicit_shared_extents()
{
	if [ $FSTYP == "btrfs" ]; then
		_not_run "btrfs can't report accurate shared extent ranges in fiemap"
	fi
}

documents /exactly/ why this test is not run on btrfs.

And, quite frankly, while this is /better/ it still ignores the fact we
have functions like _within_tolerance for allowing a range of result
values to be considered valid rather than just a fixed value.

IOWs, changing the check of the extent count of file 1 post reflink to
use a _within_tolerance range would mean the test would validate file1
on all reflink supporting filesystems and we don't need to exclude
btrfs at all...

Cheers,
Dave.
-- 
Dave Chinner
da...@fromorbit.com
[PATCH] btrfs-progs: check: fix missing newlines
From: Omar Sandoval

Also, the other progress messages go to stderr, so "checking extents"
probably should, as well.

Fixes: c7a1f66a205f ("btrfs-progs: check: switch some messages to common helpers")
Signed-off-by: Omar Sandoval
---
As a side note, it seems almost completely random whether we print to
stdout or stderr for any given message. That could probably use some
cleaning up for consistency. A quick run of e2fsck indicated that it
prints almost everything on stdout except for usage and administrative
problems. xfs_repair just seems to put everything in stderr. I personally
like the e2fsck approach. Anyone have any preference?

 cmds-check.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 57c4300..3fb3bd7 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -11467,13 +11467,13 @@ int cmd_check(int argc, char **argv)
 	}
 
 	if (!ctx.progress_enabled)
-		printf("checking extents");
+		fprintf(stderr, "checking extents\n");
 	if (check_mode == CHECK_MODE_LOWMEM)
 		ret = check_chunks_and_extents_v2(root);
 	else
 		ret = check_chunks_and_extents(root);
 	if (ret)
-		printf("Errors found in extent allocation tree or chunk allocation");
+		error("errors found in extent allocation tree or chunk allocation");
 
 	ret = repair_root_items(info);
 	if (ret < 0)
-- 
2.10.2
Re: Send/receive snapshot from/between backup
On 11/02/2016 05:13 PM, Piotr Pawłow wrote:
> On 02.11.2016 15:23, René Bühlmann wrote:
>> Origin: S2 S3
>>
>> USB: S1 S2
>>
>> SSH: S1
>>
>> Transferring S3 to USB is no problem as S2 is on both btrfs drives. But
>> how can I transfer S3 to SSH?
> If I understand correctly how send / receive works, for the incremental
> receive to work there must be a subvolume on the destination which has
> "received uuid" equal to the uuid of the parent chosen for the
> incremental send.
>
>> I tried to transfer...
>>
>> 1. S3 from Origin to SSH -> does not work as there is no common snapshot.
>>
>> 2. S2 from USB to SSH -> did not work.
> The "received uuid" of S1 on SSH is the uuid S1 had on Origin. The uuid
> of S1 on USB is different, so when chosen as parent for the incremental
> send it doesn't match.
>
>> 3. S1 from USB to Origin (such that there is a common snapshot with SSH)
>> -> did not work.
> There are no previously received subvolumes on Origin at all, so it
> isn't going to work.
>
>> Is it correct that 1. would work if a common snapshot is present on
>> Origin and SSH?
> If there was a snapshot received from Origin that still exists on
> Origin, then yes, you could use it as a clone source for incremental
> send.
>
>> Is it expected that 2. and 3. do not work?
>>
>> Is there some other way to achieve it?
> I doubt you can do it without some "hacking" to fool btrfs receive.
>
> You would need a tool that can issue the BTRFS_IOC_SET_RECEIVED_SUBVOL
> ioctl to change the received uuid. Then you could:
>
> 1. Change received uuid of S1 on SSH to match S1's uuid on USB.
> 2. Send incremental S1-S2 from USB to SSH.
> 3. Change received uuid of S2 on SSH to match S2 on Origin.
> 4. Send incremental S2-S3 from Origin to SSH.
>
> Regards

Thanks for all the input. I did successfully try this approach: I could
change the "received uuid" and then transfer a snapshot from a different
source. So far so good.
But: Due to a lot of errors during btrfs check on SSH, I decided to
recreate the BTRFS filesystem on SSH, still with the goal of not
transferring all the data over the network. These were the steps:

1. Create a new btrfs (calling it SSH')
2. Full transfer of S1 from SSH to SSH'
3. Incremental transfer of S2 from USB to SSH' (S1 as parent)
4. Incremental transfer of S3 from Origin to SSH' (S2 as parent)
5. btrfs check SSH'
6. Used rsync (with checksum diff) to verify that S3 on Origin and SSH'
   contain the same files.

Step 2 worked, and besides a single checksum error on SSH the transfer
completed without errors.

Steps 3 and 4 worked as well, and surprisingly I did not even have to
update the "received uuid". They can't have been full transfers, as that
would have taken months with my bandwidth. How can this be?

Step 5 did not return any errors.

Step 6 found a single file differing, which is due to the checksum error
in step 2.

So, everything seems to be fine now. I just do not understand why this
worked without any updating of the UUID. Do you have an explanation for
that?

In any case, thanks for your help.

René
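The transfer chain from steps 2-4 above can be sketched as a dry run that only prints the btrfs-progs invocations instead of executing them. The `btrfs send`/`btrfs receive` syntax (with `-p` naming the incremental parent) is real; the mount points are hypothetical placeholders for the three filesystems, and in practice each pipeline would span an ssh or USB hop:

```shell
# Dry run of the three-step chain onto the rebuilt filesystem SSH'.
# Swap 'echo' for real execution (plus ssh plumbing) on the actual hosts.
run() { echo "$*"; }

run "btrfs send /mnt/ssh/S1 | btrfs receive /mnt/ssh-new"                      # step 2: full
run "btrfs send -p /mnt/usb/S1 /mnt/usb/S2 | btrfs receive /mnt/ssh-new"       # step 3: incremental
run "btrfs send -p /mnt/origin/S2 /mnt/origin/S3 | btrfs receive /mnt/ssh-new" # step 4: incremental
```

Each incremental step only works if the receive side can match the parent named by `-p`, which is exactly the received-uuid bookkeeping the thread is puzzling over.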
Btrfs Heatmap - v2 - block group internals!
In the last two days I've added the --blockgroup option to btrfs heatmap
to let it create pictures of block group internals.

Examples and more instructions are to be found in the README at:
https://github.com/knorrie/btrfs-heatmap/blob/master/README.md

To use the new functionality it needs a fairly recent python-btrfs for
the 'skinny' METADATA_ITEM_KEY to be present. Latest python-btrfs release
is v0.3, created yesterday.

Yay,
-- 
Hans van Kranenburg
[PATCH v2] fstests: generic/098 update test for truncating a file into the middle of a hole
This updates generic/098 by adding a sync option, i.e. 'sync' after the
second write; with btrfs's NO_HOLES feature we could still get a wrong
isize after remount. This gets fixed by the patch

'Btrfs: fix truncate down when no_holes feature is enabled'

Signed-off-by: Liu Bo
---
v2: use 'local' for local variable and add comments for 'sync' option.

 tests/generic/098     | 60 ++++++++++++++++++++++++++++++---------------------
 tests/generic/098.out | 10 ++++++++++
 2 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/tests/generic/098 b/tests/generic/098
index 838bb5d..8ab0ad4 100755
--- a/tests/generic/098
+++ b/tests/generic/098
@@ -64,27 +64,45 @@ rm -f $seqres.full
 _scratch_mkfs >>$seqres.full 2>&1
 _scratch_mount
 
-# Create our test file with some data and durably persist it.
-$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | _filter_xfs_io
-sync
-
-# Append some data to the file, increasing its size, and leave a hole between
-# the old size and the start offset of the following write. So our file gets
-# a hole in the range [128Kb, 256Kb[.
-$XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | _filter_xfs_io
-
-# Now truncate our file to a smaller size that is in the middle of the hole we
-# previously created. On most truncate implementations the data we appended
-# before gets discarded from memory (with truncate_setsize()) and never ends
-# up being written to disk.
-$XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo
-
-_scratch_cycle_mount
-
-# We expect to see a file with a size of 160Kb, with the first 128Kb of data all
-# having the value 0xaa and the remaining 32Kb of data all having the value 0x00
-echo "File content after remount:"
-od -t x1 $SCRATCH_MNT/foo
+workout()
+{
+	local need_sync=$1
+
+	# Create our test file with some data and durably persist it.
+	$XFS_IO_PROG -t -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | _filter_xfs_io
+	sync
+
+	# Append some data to the file, increasing its size, and leave a hole between
+	# the old size and the start offset of the following write. So our file gets
+	# a hole in the range [128Kb, 256Kb[.
+	$XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+	# This 'sync' is to flush the file extent on disk and update the on-disk
+	# inode size. This is required to trigger a bug in btrfs truncate where it
+	# updates the on-disk inode size incorrectly.
+	if [ $need_sync -eq 1 ]; then
+		sync
+	fi
+
+	# Now truncate our file to a smaller size that is in the middle of the hole we
+	# previously created.
+	# If we don't flush the dirty page cache above, on most truncate
+	# implementations the data we appended before gets discarded from
+	# memory (with truncate_setsize()) and never ends up being written to
+	# disk.
+	$XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo
+
+	_scratch_cycle_mount
+
+	# We expect to see a file with a size of 160Kb, with the first 128Kb of data all
+	# having the value 0xaa and the remaining 32Kb of data all having the value 0x00
+	echo "File content after remount:"
+	od -t x1 $SCRATCH_MNT/foo
+}
+
+workout 0
+# flush after each write
+workout 1
 
 status=0
 exit
diff --git a/tests/generic/098.out b/tests/generic/098.out
index 37415ee..f87f046 100644
--- a/tests/generic/098.out
+++ b/tests/generic/098.out
@@ -9,3 +9,13 @@ File content after remount:
 040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
 050
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 262144
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+File content after remount:
+000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
+*
+040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+*
+050
-- 
2.5.0
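Outside the fstests harness, the core scenario the test exercises can be sketched with plain coreutils; the paths here are throwaway temp files, not part of the test itself:

```shell
# Write 0-128K, leave a hole, write 256K-288K, sync so the on-disk
# inode size covers the hole, then truncate into the middle of the
# hole. On a correct filesystem the resulting size is exactly 160K.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1K count=128 conv=notrunc status=none
dd if=/dev/zero of="$f" bs=1K count=32 seek=256 conv=notrunc status=none
sync
truncate -s 160K "$f"       # truncate into the hole [128K, 256K)
stat -c %s "$f"             # prints 163840 (160K)
rm -f "$f"
```

The bug being tested is that with NO_HOLES the on-disk isize could stay wrong after a remount; the sketch above only shows the layout, the real check needs the `_scratch_cycle_mount` step from the test.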
Re: [PATCH] fstests: generic/098 update test for truncating a file into the middle of a hole
On Tue, Nov 15, 2016 at 02:53:12PM +0800, Eryu Guan wrote: > On Fri, Nov 11, 2016 at 02:30:04PM -0800, Liu Bo wrote: > > This updates generic/098 by adding a sync option, i.e. 'sync' after the > > second > > write, and with btrfs's NO_HOLES, we could still get wrong isize after > > remount. > > > > This gets fixed by the patch > > > > 'Btrfs: fix truncate down when no_holes feature is enabled' > > > > Signed-off-by: Liu Bo > > Looks good to me, just some nitpicks inline :) > > > --- > > tests/generic/098 | 57 > > --- > > tests/generic/098.out | 10 + > > 2 files changed, 46 insertions(+), 21 deletions(-) > > > > diff --git a/tests/generic/098 b/tests/generic/098 > > index 838bb5d..3b89939 100755 > > --- a/tests/generic/098 > > +++ b/tests/generic/098 > > @@ -64,27 +64,42 @@ rm -f $seqres.full > > _scratch_mkfs >>$seqres.full 2>&1 > > _scratch_mount > > > > -# Create our test file with some data and durably persist it. > > -$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | > > _filter_xfs_io > > -sync > > - > > -# Append some data to the file, increasing its size, and leave a hole > > between > > -# the old size and the start offset if the following write. So our file > > gets > > -# a hole in the range [128Kb, 256Kb[. > > -$XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | _filter_xfs_io > > - > > -# Now truncate our file to a smaller size that is in the middle of the > > hole we > > -# previously created. On most truncate implementations the data we appended > > -# before gets discarded from memory (with truncate_setsize()) and never > > ends > > -# up being written to disk. 
> > -$XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo > > - > > -_scratch_cycle_mount > > - > > -# We expect to see a file with a size of 160Kb, with the first 128Kb of > > data all > > -# having the value 0xaa and the remaining 32Kb of data all having the > > value 0x00 > > -echo "File content after remount:" > > -od -t x1 $SCRATCH_MNT/foo > > +workout() > > +{ > > + NEED_SYNC=$1 > > Use "local" to declare this var, and in lower case. Usually we use upper > case for global variables. OK. > > > + > > + # Create our test file with some data and durably persist it. > > + $XFS_IO_PROG -t -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | > > _filter_xfs_io > > + sync > > + > > + # Append some data to the file, increasing its size, and leave a hole > > between > > + # the old size and the start offset if the following write. So our file > > gets > > + # a hole in the range [128Kb, 256Kb[. > > + $XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | > > _filter_xfs_io > > + > > + if [ $NEED_SYNC -eq 1 ]; then > > + sync > > + fi > > Good to see some comments to explain why we need this to test > with/without sync case. Sure, will fix in v2. Thanks, -liubo > > Thanks, > Eryu > > > + > > + # Now truncate our file to a smaller size that is in the middle of the > > hole we > > + # previously created. > > + # If we don't flush dirty page cache above, on most truncate > > + # implementations the data we appended before gets discarded from > > + # memory (with truncate_setsize()) and never ends up being written to > > + # disk. 
> > + $XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo > > + > > + _scratch_cycle_mount > > + > > + # We expect to see a file with a size of 160Kb, with the first 128Kb of > > data all > > + # having the value 0xaa and the remaining 32Kb of data all having the > > value 0x00 > > + echo "File content after remount:" > > + od -t x1 $SCRATCH_MNT/foo > > +} > > + > > +workout 0 > > +# flush after each write > > +workout 1 > > > > status=0 > > exit > > diff --git a/tests/generic/098.out b/tests/generic/098.out > > index 37415ee..f87f046 100644 > > --- a/tests/generic/098.out > > +++ b/tests/generic/098.out > > @@ -9,3 +9,13 @@ File content after remount: > > 040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > * > > 050 > > +wrote 131072/131072 bytes at offset 0 > > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > +wrote 32768/32768 bytes at offset 262144 > > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > +File content after remount: > > +000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa > > +* > > +040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > +* > > +050 > > -- > > 2.5.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fstests" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Mittwoch, 16. November 2016, 07:57:08 CET schrieb Austin S. Hemmelgarn: > On 2016-11-16 06:04, Martin Steigerwald wrote: > > Am Mittwoch, 16. November 2016, 16:00:31 CET schrieb Roman Mamedov: > >> On Wed, 16 Nov 2016 11:55:32 +0100 > >> > >> Martin Steigerwald wrote: […] > > As there seems to be no force option to override the limitation and I > > do not feel like compiling my own btrfs-tools right now, I will use rsync > > instead. > > In a case like this, I'd trust rsync more than send/receive. The > following rsync switches might also be of interest: > -a: This turns on a bunch of things almost everyone wants when using > rsync, similar to the same switch for cp, just with even more added in. > -H: This recreates hardlinks on the receiving end. > -S: This recreates sparse files. > -A: This copies POSIX ACL's > -X: This copies extended attributes (most of them at least, there are a > few that can't be arbitrarily written to). > Pre-creating the subvolumes by hand combined with using all of those > will get you almost everything covered by send/receive except for > sharing of extents and ctime. I usually use rsync -aAHXSP already :). I was able to rsync any relevant data of the disk which is now being deleted by shred command. Thank you, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] btrfs: change btrfs_csum_final result param type to u8
On Mon, Oct 31, 2016 at 05:47:24PM +0100, David Sterba wrote:
> On Thu, Oct 27, 2016 at 08:52:33AM +0100, Domagoj Tršan wrote:
> > The csum member of struct btrfs_super_block has array type u8. It makes
> > sense that the function btrfs_csum_final should also be declared to
> > accept u8 *. I changed the declaration
> > from: void btrfs_csum_final(u32 crc, char *result);
> > to:   void btrfs_csum_final(u32 crc, u8 *result);
>
> Sorry, I've noticed it just now: several callers of btrfs_csum_final
> cast the 2nd argument to (char *), which gets changed to u8 *. Can you
> please fix the callers? Thanks.

Done and committed.
Re: don't poke into bio internals
On Wed, Nov 16, 2016 at 01:52:07PM +0100, Christoph Hellwig wrote: > this series has a few patches that switch btrfs to use the proper helpers for > accessing bio internals. This helps to prepare for supporting multi-page > bio_vecs, which are currently under development. Looks good to me, thanks. I'll let it pass through tests, expected merge target is 4.10. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fs: Btrfs - Improvement in code readability when
On Thu, Nov 10, 2016 at 03:17:41PM +0530, Shailendra Verma wrote:
> From: "Shailendra Verma"
>
> There is no need to call kfree() if memdup_user() fails, as no memory
> was allocated and the error in the error-valued pointer should be returned.
>
> Signed-off-by: Shailendra Verma

Queued for 4.10. I've edited the subject line, as the original wasn't very
descriptive ("btrfs: return early from failed memory allocations in ioctl
handlers").
Re: [RFC] btrfs: make max inline data can be equal to sectorsize
On Mon, Nov 14, 2016 at 09:55:34AM +0800, Qu Wenruo wrote:
> At 11/12/2016 04:22 AM, Liu Bo wrote:
> > On Tue, Oct 11, 2016 at 02:47:42PM +0800, Wang Xiaoguang wrote:
> >> If we use mount option "-o max_inline=sectorsize", say 4096, indeed
> >> even for a fresh fs, say nodesize is 16k, we can not make the first
> >> 4k of data completely inline. I found this condition causing the issue:
> >>     !compressed_size && (actual_end & (root->sectorsize - 1)) == 0
> >> If it returns true, we'll not make data inline. For a 4k sectorsize,
> >> the data range 0~4094 can be made inline, but 0~4095 can not.
> >> I don't think this limitation is useful, so here remove it, which will
> >> allow max inline data to be equal to sectorsize.
> >
> > It's difficult to tell whether we need this; I'm not a big fan of using
> > a max_inline size larger than the default 2048, given that most reports
> > about ENOSPC are due to metadata, and inlining may make that worse.
>
> IMHO if we can use inline data extents to trigger ENOSPC more easily,
> then we should allow it, to dig into the problem further.
>
> Just ignoring it because it may cause more bugs will not solve the real
> problem anyway.

Not allowing the full 4k value as max_inline looks artificial to me. We've
removed other similar limitations in the past, so I'd tend to agree to do
the same here. There's no significant use for it as far as I can tell; if
you want to exhaust metadata, the difference from max_inline=4095 would be
really tiny in the end.

So, I'm okay with merging it. If anybody feels like adding his Reviewed-by,
please do so.
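For reference, the limit under discussion is set at mount time; a minimal sketch, with placeholder device and mountpoint names (not from the patch itself):

```shell
# With the patch applied, max_inline may equal the 4K sectorsize, so a
# file of exactly 4096 bytes can be stored inline in the metadata tree.
# /dev/sdb1 and /mnt/btrfs are placeholders for a real btrfs device
# and mountpoint.
mount -o max_inline=4096 /dev/sdb1 /mnt/btrfs
```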
Re: [PATCH 1/2] Btrfs: fix file extent corruption
On 11/14/2016 06:11 PM, Liu Bo wrote:
> On Mon, Nov 14, 2016 at 02:06:21PM -0500, Josef Bacik wrote:
> > In order to do hole punching we have a block reserve to hold the
> > reservation we need to drop the extents in our range. Since we could
> > end up dropping a lot of extents we set rsv->failfast so we can just
> > loop around again and drop the remainder of the range. Unfortunately
> > we unconditionally fill the hole extents in and start from the last
> > extent we encountered, which we may or may not have dropped. So this
> > can result in overlapping file extent entries, which can be tripped
> > over in a variety of ways: either by hitting BUG_ON(!ret) in
> > fill_holes() after the search, or in btrfs_set_item_key_safe() in
> > btrfs_drop_extents() at a later time by an unrelated task.
> >
> > Fix this by only setting drop_end to the last extent we did actually
> > drop. This way our holes are filled in properly for the range that we
> > did drop, and the rest of the range that remains to be dropped is
> > actually dropped. Thanks,
>
> Can you please share the reproducer?

Yup, here you go: https://paste.fedoraproject.org/483195/30633414

Thanks,

Josef
[PATCH 1/2][V2] Btrfs: fix file extent corruption
In order to do hole punching we have a block reserve to hold the
reservation we need to drop the extents in our range. Since we could end
up dropping a lot of extents we set rsv->failfast so we can just loop
around again and drop the remainder of the range.

Unfortunately we unconditionally fill the hole extents in and start from
the last extent we encountered, which we may or may not have dropped. So
this can result in overlapping file extent entries, which can be tripped
over in a variety of ways: either by hitting BUG_ON(!ret) in fill_holes()
after the search, or in btrfs_set_item_key_safe() in btrfs_drop_extents()
at a later time by an unrelated task.

Fix this by only setting drop_end to the last extent we did actually
drop. This way our holes are filled in properly for the range that we
did drop, and the rest of the range that remains to be dropped is
actually dropped. Thanks,

Signed-off-by: Josef Bacik
---
V1->V2:
- don't call fill_holes if our drop_end is == start.

 fs/btrfs/file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cbefdc8..23859e7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -706,6 +706,7 @@ int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
 	u64 num_bytes = 0;
 	u64 extent_offset = 0;
 	u64 extent_end = 0;
+	u64 last_end = start;
 	int del_nr = 0;
 	int del_slot = 0;
 	int extent_type;
@@ -797,8 +798,10 @@ next_slot:
 		 * extent item in the call to setup_items_for_insert() later
 		 * in this function.
 		 */
-		if (extent_end == key.offset && extent_end >= search_start)
+		if (extent_end == key.offset && extent_end >= search_start) {
+			last_end = extent_end;
 			goto delete_extent_item;
+		}
 
 		if (extent_end <= search_start) {
 			path->slots[0]++;
@@ -861,6 +864,12 @@ next_slot:
 			key.offset = start;
 		}
 		/*
+		 * From here on out we will have actually dropped something, so
+		 * last_end can be updated.
+		 */
+		last_end = extent_end;
+
+		/*
 		 * | range to drop - |
 		 *  | extent |
 		 */
@@ -1010,7 +1019,7 @@ delete_extent_item:
 	if (!replace_extent || !(*key_inserted))
 		btrfs_release_path(path);
 	if (drop_end)
-		*drop_end = found ? min(end, extent_end) : end;
+		*drop_end = found ? min(end, last_end) : end;
 
 	return ret;
 }
@@ -2526,7 +2535,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 		trans->block_rsv = &root->fs_info->trans_block_rsv;
 
-		if (cur_offset < ino_size) {
+		if (cur_offset < drop_end && cur_offset < ino_size) {
 			ret = fill_holes(trans, inode, path, cur_offset,
 					 drop_end);
 			if (ret) {
-- 
2.7.4
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
System panic'd overnight running 4.9rc5 & rsync. Attached a photo of the stack trace, and the 38 call traces in a 2 minute window shortly before, to the bugzilla case for those not on it's e-mail list: https://bugzilla.kernel.org/show_bug.cgi?id=186671 On Mon, Nov 14, 2016 at 3:56 PM, E V wrote: > Pretty sure it was the system after the OOM just did a history search > to check, though it is 3 days afterwards and several OOMs killed > several processes in somewhat rapid succession, I just listed the 1st. > I'll turn on CONFIG_DEBUG_VM and reboot again. > > On Mon, Nov 14, 2016 at 12:04 PM, Vlastimil Babka wrote: >> On 11/14/2016 02:27 PM, E V wrote: >>> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10 >>> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll >>> download the current 4.9rc and reboot, but in the mean time here's >>> xxd, vmstat & kern.log output: >>> 8532039 >> >> Hmm this would suggest that the memory is mostly free. But not according >> to vmstat. Is it possible you mistakenly provided the xxd from a fresh >> boot, but vmstat from after the OOM? >> >> But sure, a page_count() of zero is a reason why __isolate_lru_page() >> would fail due to its get_page_unless_zero(). The question is then how >> could it drop to zero without being freed at the same time, as >> put_page() does. >> >> I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to >> delete_from_page_cache(), but that's since 4.8 and you mention problems >> since 4.7. >> >> Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code >> usually has VM_BUG_ONs. 
>> >> Vlastimil >> >>>9324 0100 >>>2226 0200 >>> 405 0300 >>> 80 0400 >>> 34 0500 >>> 48 0600 >>> 17 0700 >>> 17 0800 >>> 32 0900 >>> 19 0a00 >>> 1 0c00 >>> 1 0d00 >>> 1 0e00 >>> 12 1000 >>> 8 1100 >>> 32 1200 >>> 10 1300 >>> 2 1400 >>> 11 1500 >>> 12 1600 >>> 7 1700 >>> 3 1800 >>> 5 1900 >>> 6 1a00 >>> 11 1b00 >>> 22 1c00 >>> 3 1d00 >>> 19 1e00 >>> 21 1f00 >>> 18 2000 >>> 28 2100 >>> 40 2200 >>> 38 2300 >>> 85 2400 >>> 59 2500 >>> 40520 81ff >>> >>> /proc/vmstat: >>> nr_free_pages 60965 >>> nr_zone_inactive_anon 4646 >>> nr_zone_active_anon 3265 >>> nr_zone_inactive_file 633882 >>> nr_zone_active_file 7017458 >>> nr_zone_unevictable 0 >>> nr_zone_write_pending 0 >>> nr_mlock 0 >>> nr_slab_reclaimable 299205 >>> nr_slab_unreclaimable 195497 >>> nr_page_table_pages 935 >>> nr_kernel_stack 4976 >>> nr_bounce 0 >>> numa_hit 3577063288 >>> numa_miss 541393191 >>> numa_foreign 541393191 >>> numa_interleave 19415 >>> numa_local 3577063288 >>> numa_other 0 >>> nr_free_cma 0 >>> nr_inactive_anon 4646 >>> nr_active_anon 3265 >>> nr_inactive_file 633882 >>> nr_active_file 7017458 >>> nr_unevictable 0 >>> nr_isolated_anon 0 >>> nr_isolated_file 0 >>> nr_pages_scanned 0 >>> workingset_refault 42685891 >>> workingset_activate 15247281 >>> workingset_nodereclaim 26375216 >>> nr_anon_pages 5067 >>> nr_mapped 5630 >>> nr_file_pages 7654746 >>> nr_dirty 0 >>> nr_writeback 0 >>> nr_writeback_temp 0 >>> nr_shmem 2504 >>> nr_shmem_hugepages 0 >>> nr_shmem_pmdmapped 0 >>> nr_anon_transparent_hugepages 0 >>> nr_unstable 0 >>> nr_vmscan_write 5243750485 >>> nr_vmscan_immediate_reclaim 4207633857 >>> nr_dirtied 1839143430 >>> nr_written 1832626107 >>> nr_dirty_threshold 1147728 >>> nr_dirty_background_threshold 151410 >>> pgpgin 166731189 >>> pgpgout 7328142335 >>> pswpin 98608 >>> pswpout 117794 >>> pgalloc_dma 29504 >>> pgalloc_dma32 1006726216 >>> pgalloc_normal 5275218188 >>> pgalloc_movable 0 >>> allocstall_dma 0 >>> allocstall_dma32 0 >>> allocstall_normal 36461 >>> 
allocstall_movable 5867 >>> pgskip_dma 0 >>> pgskip_dma32 0 >>> pgskip_normal 6417890 >>> pgskip_movable 0 >>> pgfree 6309223401 >>> pgactivate 35076483 >>> pgdeactivate 63556974 >>> pgfault 35753842 >>> pgmajfault 69126 >>> pglazyfreed 0 >>> pgrefill 70008598 >>> pgsteal_kswapd 3567289713 >>> pgsteal_direct 5878057 >>> pgscan_kswapd 9059309872 >>> pgscan_direct 4239367903 >>> pgscan_direct_throttle 0 >>> zone_reclaim_failed 0 >>> pginodesteal 102916 >>> slabs_scanned 460790262 >>> kswapd_inodesteal 9130243 >>> kswapd_low_wmark_hit_quickly 10634373 >>> kswapd_high_wmark_hit_quickly 7348173 >>> pageoutrun 18349115 >>> pgrotated 16291322 >>> drop_pagecache 0 >>> drop_slab 0 >>> pgmigrate_success 18912908 >>> pgmigrate_fail 63382146 >>> compact_migra
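The /proc/vmstat figures quoted above are a flat "name value" list; a quick way to snapshot just the reclaim-related counters discussed in this thread (counter names vary slightly across kernel versions):

```shell
# Print the dirty/writeback and reclaim counters from /proc/vmstat.
awk '$1 ~ /^(nr_dirty|nr_writeback|workingset_refault|pgscan_kswapd|pgscan_direct)$/' /proc/vmstat
```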
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On 2016-11-16 06:04, Martin Steigerwald wrote: Am Mittwoch, 16. November 2016, 16:00:31 CET schrieb Roman Mamedov: On Wed, 16 Nov 2016 11:55:32 +0100 Martin Steigerwald wrote: I do think that above kernel messages invite such a kind of interpretation tough. I took the "BTRFS: open_ctree failed" message as indicative to some structural issue with the filesystem. For the reason as to why the writable mount didn't work, check "btrfs fi df" for the filesystem to see if you have any "single" profile chunks on it: quite likely you did already mount it "degraded,rw" in the past *once*, after which those "single" chunks get created, and consequently it won't mount r/w anymore (without lifting the restriction on the number of missing devices as proposed). That exactly explains it. I very likely did a degraded mount without ro on this disk already. Funnily enough this creates another complication: merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/ someotherbtrfs ERROR: subvolume /mnt/zeit/somesubvolume is not read-only Yet: merkaba:/mnt/zeit> btrfs property get somesubvolume ro=false merkaba:/mnt/zeit> btrfs property set somesubvolume ro true ERROR: failed to set flags for somesubvolume: Read-only file system To me it seems right logic would be to allow the send to proceed in case the whole filesystem is readonly. It should, but doesn't currently. There was a thread about this a while back, but I don't think it ever resulted in anything changing. As there seems to be no force option to override the limitation and I do not feel like compiling my own btrfs-tools right now, I will use rsync instead. In a case like this, I'd trust rsync more than send/receive. The following rsync switches might also be of interest: -a: This turns on a bunch of things almost everyone wants when using rsync, similar to the same switch for cp, just with even more added in. -H: This recreates hardlinks on the receiving end. -S: This recreates sparse files. 
-A: This copies POSIX ACL's -X: This copies extended attributes (most of them at least, there are a few that can't be arbitrarily written to). Pre-creating the subvolumes by hand combined with using all of those will get you almost everything covered by send/receive except for sharing of extents and ctime. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
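Put together, the switches listed above amount to something like the following; the source and destination paths are placeholders, and destination subvolumes still have to be pre-created by hand:

```shell
# -a: archive mode; -H: preserve hardlinks; -S: recreate sparse files;
# -A: copy POSIX ACLs; -X: copy extended attributes.
# This covers everything send/receive would, except extent sharing
# and ctime.
rsync -aHSAX /mnt/degraded/subvol/ /mnt/new/subvol/
```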
[PATCH 5/9] btrfs: use bi_size
Instead of using bi_vcnt to calculate it. Signed-off-by: Christoph Hellwig --- fs/btrfs/compression.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 12a631d..8618ac3 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -562,7 +562,6 @@ static noinline int add_ra_bio_pages(struct inode *inode, * * bio->bi_iter.bi_sector points to the compressed extent on disk * bio->bi_io_vec points to all of the inode pages - * bio->bi_vcnt is a count of pages * * After the compressed pages are read, we copy the bytes into the * bio we were passed and then call the bio end_io calls @@ -574,7 +573,6 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, struct extent_map_tree *em_tree; struct compressed_bio *cb; struct btrfs_root *root = BTRFS_I(inode)->root; - unsigned long uncompressed_len = bio->bi_vcnt * PAGE_SIZE; unsigned long compressed_len; unsigned long nr_pages; unsigned long pg_index; @@ -619,7 +617,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, free_extent_map(em); em = NULL; - cb->len = uncompressed_len; + cb->len = bio->bi_iter.bi_size; cb->compressed_len = compressed_len; cb->compress_type = extent_compress_type(bio_flags); cb->orig_bio = bio; @@ -647,8 +645,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, add_ra_bio_pages(inode, em_start + em_len, cb); /* include any pages we added in add_ra-bio_pages */ - uncompressed_len = bio->bi_vcnt * PAGE_SIZE; - cb->len = uncompressed_len; + cb->len = bio->bi_iter.bi_size; comp_bio = compressed_bio_alloc(bdev, cur_disk_byte, GFP_NOFS); if (!comp_bio) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/9] btrfs: calculate end of bio offset properly
Use the bvec offset and len members to prepare for multipage bvecs. Signed-off-by: Christoph Hellwig --- fs/btrfs/compression.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 8618ac3..27e9feb 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -445,6 +445,13 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start, return 0; } +static u64 bio_end_offset(struct bio *bio) +{ + struct bio_vec *last = &bio->bi_io_vec[bio->bi_vcnt - 1]; + + return page_offset(last->bv_page) + last->bv_len - last->bv_offset; +} + static noinline int add_ra_bio_pages(struct inode *inode, u64 compressed_end, struct compressed_bio *cb) @@ -463,8 +470,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, u64 end; int misses = 0; - page = cb->orig_bio->bi_io_vec[cb->orig_bio->bi_vcnt - 1].bv_page; - last_offset = (page_offset(page) + PAGE_SIZE); + last_offset = bio_end_offset(cb->orig_bio); em_tree = &BTRFS_I(inode)->extent_tree; tree = &BTRFS_I(inode)->io_tree; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 9/9] btrfs: only check bio size to see if a repair bio should have the failfast flag
The number of pages in a bio is a bad indicator for the number of splits
lower levels could do, and with the multipage bio_vec work even that
measure goes away and will become a number of segments of physically
contiguous areas instead. Check the total bio size vs the sector size
instead, which gives us an indication without any false negatives,
although the false positive rate might increase a bit.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/extent_io.c | 4 ++--
 fs/btrfs/inode.c     | 4 +---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea9ade7..a05fc41 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2296,7 +2296,7 @@ int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
 	 * a) deliver good data to the caller
 	 * b) correct the bad sectors on disk
 	 */
-	if (failed_bio->bi_vcnt > 1) {
+	if (failed_bio->bi_iter.bi_size > BTRFS_I(inode)->root->sectorsize) {
 		/*
 		 * to fulfill b), we need to know the exact failing sectors, as
 		 * we don't want to rewrite any more than the failed ones. thus,
@@ -2403,7 +2403,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 		return -EIO;
 	}
 
-	if (failed_bio->bi_vcnt > 1)
+	if (failed_bio->bi_iter.bi_size > BTRFS_I(inode)->root->sectorsize)
 		read_mode = READ_SYNC | REQ_FAILFAST_DEV;
 	else
 		read_mode = READ_SYNC;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3f09cb6..54afe41 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7933,9 +7933,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
 		return -EIO;
 	}
 
-	if ((failed_bio->bi_vcnt > 1)
-	    || (failed_bio->bi_io_vec->bv_len
-		> BTRFS_I(inode)->root->sectorsize))
+	if (failed_bio->bi_iter.bi_size > BTRFS_I(inode)->root->sectorsize)
 		read_mode = READ_SYNC | REQ_FAILFAST_DEV;
 	else
 		read_mode = READ_SYNC;
-- 
2.1.4
[PATCH 7/9] btrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all
Rework the loop a little bit to use the generic bio_for_each_segment_all helper for iterating over the bio. Signed-off-by: Christoph Hellwig --- fs/btrfs/file-item.c | 31 +++ 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index fa8aa53..54ccb91 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -163,7 +163,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode, struct bio *bio, u64 logical_offset, u32 *dst, int dio) { - struct bio_vec *bvec = bio->bi_io_vec; + struct bio_vec *bvec; struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio); struct btrfs_csum_item *item = NULL; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; @@ -177,7 +177,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, u32 diff; int nblocks; int bio_index = 0; - int count; + int count = 0; u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy); path = btrfs_alloc_path(); @@ -223,8 +223,11 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, if (dio) offset = logical_offset; - page_bytes_left = bvec->bv_len; - while (bio_index < bio->bi_vcnt) { + bio_for_each_segment_all(bvec, bio, bio_index) { + page_bytes_left = bvec->bv_len; + if (count) + goto next; + if (!dio) offset = page_offset(bvec->bv_page) + bvec->bv_offset; count = btrfs_find_ordered_sum(inode, offset, disk_bytenr, @@ -285,29 +288,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, found: csum += count * csum_size; nblocks -= count; - +next: while (count--) { disk_bytenr += root->sectorsize; offset += root->sectorsize; page_bytes_left -= root->sectorsize; - if (!page_bytes_left) { - bio_index++; - /* -* make sure we're still inside the -* bio before we update page_bytes_left -*/ - if (bio_index >= bio->bi_vcnt) { - WARN_ON_ONCE(count); - goto done; - } - bvec++; - page_bytes_left = bvec->bv_len; - } - + if (!page_bytes_left) + break; /* move to next bio */ } } -done: + 
WARN_ON_ONCE(count); btrfs_free_path(path); return 0; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/9] btrfs: use bio_for_each_segment_all in __btrfsic_submit_bio
And remove the bogus check for a NULL return value from kmap, which
can't happen.

While we're at it: I don't think that kmapping up to 256 pages will work
without deadlocks on highmem machines; a better idea would be to use
vm_map_ram to map all of them into a single virtual address range.
Incidentally, that would also simplify the code a lot.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/check-integrity.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index a6f657f..86f681f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2819,10 +2819,11 @@ static void __btrfsic_submit_bio(struct bio *bio)
	 * btrfsic_mount(), this might return NULL */
	dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
	if (NULL != dev_state &&
-	    (bio_op(bio) == REQ_OP_WRITE) && NULL != bio->bi_io_vec) {
+	    (bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
		unsigned int i;
		u64 dev_bytenr;
		u64 cur_bytenr;
+		struct bio_vec *bvec;
		int bio_is_patched;
		char **mapped_datav;

@@ -2840,32 +2841,23 @@ static void __btrfsic_submit_bio(struct bio *bio)
		if (!mapped_datav)
			goto leave;
		cur_bytenr = dev_bytenr;
-		for (i = 0; i < bio->bi_vcnt; i++) {
-			BUG_ON(bio->bi_io_vec[i].bv_len != PAGE_SIZE);
-			mapped_datav[i] = kmap(bio->bi_io_vec[i].bv_page);
-			if (!mapped_datav[i]) {
-				while (i > 0) {
-					i--;
-					kunmap(bio->bi_io_vec[i].bv_page);
-				}
-				kfree(mapped_datav);
-				goto leave;
-			}
+
+		bio_for_each_segment_all(bvec, bio, i) {
+			BUG_ON(bvec->bv_len != PAGE_SIZE);
+			mapped_datav[i] = kmap(bvec->bv_page);
+
			if (dev_state->state->print_mask &
			    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
				pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
-				       i, cur_bytenr, bio->bi_io_vec[i].bv_len,
-				       bio->bi_io_vec[i].bv_offset);
-			cur_bytenr += bio->bi_io_vec[i].bv_len;
+				       i, cur_bytenr, bvec->bv_len, bvec->bv_offset);
+			cur_bytenr += bvec->bv_len;
		}
		btrfsic_process_written_block(dev_state, dev_bytenr,
					      mapped_datav, bio->bi_vcnt,
					      bio, &bio_is_patched,
					      NULL, bio->bi_opf);
-		while (i > 0) {
-			i--;
-			kunmap(bio->bi_io_vec[i].bv_page);
-		}
+		bio_for_each_segment_all(bvec, bio, i)
+			kunmap(bvec->bv_page);
		kfree(mapped_datav);
	} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
		if (dev_state->state->print_mask &
-- 
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] btrfs: don't access the bio directly in the raid5/6 code
Just use bio_for_each_segment_all to iterate over all segments.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/raid56.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index d016d4a..da941fb 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1144,10 +1144,10 @@ static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
 static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 {
	struct bio *bio;
+	struct bio_vec *bvec;
	u64 start;
	unsigned long stripe_offset;
	unsigned long page_index;
-	struct page *p;
	int i;

	spin_lock_irq(&rbio->bio_list_lock);
@@ -1156,10 +1156,8 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
		stripe_offset = start - rbio->bbio->raid_map[0];
		page_index = stripe_offset >> PAGE_SHIFT;

-		for (i = 0; i < bio->bi_vcnt; i++) {
-			p = bio->bi_io_vec[i].bv_page;
-			rbio->bio_pages[page_index + i] = p;
-		}
+		bio_for_each_segment_all(bvec, bio, i)
+			rbio->bio_pages[page_index + i] = bvec->bv_page;
	}
	spin_unlock_irq(&rbio->bio_list_lock);
 }
@@ -1433,13 +1431,11 @@ static int fail_bio_stripe(struct btrfs_raid_bio *rbio,
  */
 static void set_bio_pages_uptodate(struct bio *bio)
 {
+	struct bio_vec *bvec;
	int i;
-	struct page *p;

-	for (i = 0; i < bio->bi_vcnt; i++) {
-		p = bio->bi_io_vec[i].bv_page;
-		SetPageUptodate(p);
-	}
+	bio_for_each_segment_all(bvec, bio, i)
+		SetPageUptodate(bvec->bv_page);
 }

 /*
-- 
2.1.4
[PATCH 4/9] btrfs: don't access the bio directly in btrfs_csum_one_bio
Use bio_for_each_segment_all to iterate over the segments instead. This
requires a bit of reshuffling so that we only look up the ordered item
once, inside the bio_for_each_segment_all loop.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/file-item.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index d0d571c..fa8aa53 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -447,13 +447,12 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
		       struct bio *bio, u64 file_start, int contig)
 {
	struct btrfs_ordered_sum *sums;
-	struct btrfs_ordered_extent *ordered;
+	struct btrfs_ordered_extent *ordered = NULL;
	char *data;
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int bio_index = 0;
+	struct bio_vec *bvec;
	int index;
	int nr_sectors;
-	int i;
+	int i, j;
	unsigned long total_bytes = 0;
	unsigned long this_sum_bytes = 0;
	u64 offset;
@@ -470,17 +469,20 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
	if (contig)
		offset = file_start;
	else
-		offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+		offset = 0; /* shut up gcc */

-	ordered = btrfs_lookup_ordered_extent(inode, offset);
-	BUG_ON(!ordered); /* Logic error */
	sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
	index = 0;

-	while (bio_index < bio->bi_vcnt) {
+	bio_for_each_segment_all(bvec, bio, j) {
		if (!contig)
			offset = page_offset(bvec->bv_page) + bvec->bv_offset;

+		if (!ordered) {
+			ordered = btrfs_lookup_ordered_extent(inode, offset);
+			BUG_ON(!ordered); /* Logic error */
+		}
+
		data = kmap_atomic(bvec->bv_page);

		nr_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info,
@@ -529,9 +531,6 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
		}

		kunmap_atomic(data);
-
-		bio_index++;
-		bvec++;
	}
	this_sum_bytes = 0;
	btrfs_add_ordered_sum(inode, ordered, sums);
-- 
2.1.4
[PATCH 3/9] btrfs: don't access the bio directly in btrfs_submit_direct_hook
Just use bio_for_each_segment_all to iterate over all segments.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/inode.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 147df4c..3f09cb6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8394,7 +8394,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
	struct btrfs_root *root = BTRFS_I(inode)->root;
	struct bio *bio;
	struct bio *orig_bio = dip->orig_bio;
-	struct bio_vec *bvec = orig_bio->bi_io_vec;
+	struct bio_vec *bvec;
	u64 start_sector = orig_bio->bi_iter.bi_sector;
	u64 file_offset = dip->logical_offset;
	u64 submit_len = 0;
@@ -8403,7 +8403,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
	int async_submit = 0;
	int nr_sectors;
	int ret;
-	int i;
+	int i, j;

	map_length = orig_bio->bi_iter.bi_size;
	ret = btrfs_map_block(root->fs_info, btrfs_op(orig_bio),
@@ -8433,7 +8433,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
	btrfs_io_bio(bio)->logical = file_offset;
	atomic_inc(&dip->pending_bios);

-	while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
+	bio_for_each_segment_all(bvec, orig_bio, j) {
		nr_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, bvec->bv_len);
		i = 0;
next_block:
@@ -8487,7 +8487,6 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
			i++;
			goto next_block;
		}
-		bvec++;
	}
 }
-- 
2.1.4
[PATCH 1/9] btrfs: use bio iterators for the decompression handlers
Pass the full bio to the decompression routines and use bio iterators
to iterate over the data in the bio.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/compression.c | 122 +
 fs/btrfs/compression.h |  12 ++---
 fs/btrfs/lzo.c         |  17 ++-
 fs/btrfs/zlib.c        |  15 ++
 4 files changed, 54 insertions(+), 112 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d4d8b7e..12a631d 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -81,9 +81,9 @@ struct compressed_bio {
	u32 sums;
 };

-static int btrfs_decompress_biovec(int type, struct page **pages_in,
-				   u64 disk_start, struct bio_vec *bvec,
-				   int vcnt, size_t srclen);
+static int btrfs_decompress_bio(int type, struct page **pages_in,
+				u64 disk_start, struct bio *orig_bio,
+				size_t srclen);

 static inline int compressed_bio_size(struct btrfs_root *root,
				      unsigned long disk_size)
@@ -175,11 +175,10 @@ static void end_compressed_bio_read(struct bio *bio)
	/* ok, we're the last bio for this extent, lets start
	 * the decompression.
	 */
-	ret = btrfs_decompress_biovec(cb->compress_type,
+	ret = btrfs_decompress_bio(cb->compress_type,
				      cb->compressed_pages,
				      cb->start,
-				      cb->orig_bio->bi_io_vec,
-				      cb->orig_bio->bi_vcnt,
+				      cb->orig_bio,
				      cb->compressed_len);
 csum_failed:
	if (ret)
@@ -959,9 +958,7 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
  *
  * disk_start is the starting logical offset of this array in the file
  *
- * bvec is a bio_vec of pages from the file that we want to decompress into
- *
- * vcnt is the count of pages in the biovec
+ * orig_bio contains the pages from the file that we want to decompress into
  *
  * srclen is the number of bytes in pages_in
  *
@@ -970,18 +967,18 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
  * be contiguous.  They all correspond to the range of bytes covered by
  * the compressed extent.
  */
-static int btrfs_decompress_biovec(int type, struct page **pages_in,
-				   u64 disk_start, struct bio_vec *bvec,
-				   int vcnt, size_t srclen)
+static int btrfs_decompress_bio(int type, struct page **pages_in,
+				u64 disk_start, struct bio *orig_bio,
+				size_t srclen)
 {
	struct list_head *workspace;
	int ret;

	workspace = find_workspace(type);
-	ret = btrfs_compress_op[type-1]->decompress_biovec(workspace, pages_in,
-							   disk_start,
-							   bvec, vcnt, srclen);
+	ret = btrfs_compress_op[type-1]->decompress_bio(workspace, pages_in,
+							disk_start, orig_bio,
+							srclen);
	free_workspace(type, workspace);
	return ret;
 }
@@ -1021,9 +1018,7 @@ void btrfs_exit_compress(void)
  */
 int btrfs_decompress_buf2page(char *buf, unsigned long buf_start,
			      unsigned long total_out, u64 disk_start,
-			      struct bio_vec *bvec, int vcnt,
-			      unsigned long *pg_index,
-			      unsigned long *pg_offset)
+			      struct bio *bio)
 {
	unsigned long buf_offset;
	unsigned long current_buf_start;
@@ -1031,13 +1026,13 @@ int btrfs_decompress_buf2page(char *buf, unsigned long buf_start,
	unsigned long working_bytes = total_out - buf_start;
	unsigned long bytes;
	char *kaddr;
-	struct page *page_out = bvec[*pg_index].bv_page;
+	struct bio_vec bvec = bio_iter_iovec(bio, bio->bi_iter);

	/*
	 * start byte is the first byte of the page we're currently
	 * copying into relative to the start of the compressed data.
	 */
-	start_byte = page_offset(page_out) - disk_start;
+	start_byte = page_offset(bvec.bv_page) - disk_start;

	/* we haven't yet hit data corresponding to this page */
	if (total_out <= start_byte)
@@ -1057,80 +1052,45 @@ int btrfs_decompress_buf2page(char *buf, unsigned long buf_start,

	/* copy bytes from the working buffer into the pages */
	while (working_bytes > 0) {
-		bytes = min(PAGE_SIZE - *pg_offset,
-			    PAGE_SIZE - buf_offset);
+		bytes = min_t(unsigned long, bvec.bv_len,
don't poke into bio internals
Hi all,

this series has a few patches that switch btrfs to use the proper helpers
for accessing bio internals. This helps to prepare for supporting
multi-page bio_vecs, which are currently under development.
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On 2016-11-16 05:55, Martin Steigerwald wrote:
> Am Mittwoch, 16. November 2016, 15:43:36 CET schrieb Roman Mamedov:
>> On Wed, 16 Nov 2016 11:25:00 +0100 Martin Steigerwald wrote:
>>> merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
>>> mount: Falscher Dateisystemtyp, ungültige Optionen, der
>>> Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende
>>> Kodierungsseite oder ein anderer Fehler
>>>
>>> Manchmal liefert das Systemprotokoll wertvolle Informationen –
>>> versuchen Sie dmesg | tail oder ähnlich
>>>
>>> merkaba:~#32> dmesg | tail -6
>>> [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
>>> [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
>>> [ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
>>> [ 3080.120706] BTRFS info (device dm-13): has skinny extents
>>> [ 3080.150957] BTRFS warning (device dm-13): missing devices (1)
>>> exceeds the limit (0), writeable mount is not allowed
>>> [ 3080.195941] BTRFS: open_ctree failed
>>
>> I have to wonder did you read the above message? What you need at this
>> point is simply "-o degraded,ro". But I don't see that tried anywhere
>> down the line.
>>
>> See also (or try): https://patchwork.kernel.org/patch/9419189/
>
> Actually I read that one, but I read more into it than what it was
> saying: I read into it that BTRFS would automatically use a read-only
> mount.
>
> merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit
>
> actually really works. *Thank you*, Roman.
>
> I do think that the above kernel messages invite such a kind of
> interpretation though. I took the "BTRFS: open_ctree failed" message as
> indicative of some structural issue with the filesystem.
Technically, the fact that a device is missing is a structural issue with
the FS. Whether or not that falls under what any arbitrary person
considers a structural issue is a different story.

General background though: open_ctree is one of the core functions in the
BTRFS code used during mounting the filesystem. Everything that calls it
checks the return code and spits out 'BTRFS: open_ctree failed' if it
failed. The problem is, just about everything internal (and many external
things as well) to the BTRFS code that could prevent the FS from mounting
happens either in open_ctree, or in a function it calls, so all that that
line tells us is that the mount failed, which is less than useful in most
cases.

Given both the confusion you've experienced regarding this (which has
happened to other people too), combined with the amount of effort I've had
to put in to get the rest of the SysOps people where I work to understand
that that message just means 'mount failed', I would really love to see
that just be replaced with 'mount failed' in non-debug builds, preferably
with better info about _why_ things failed (the case of a degraded
filesystem is pretty well covered, but most other cases other than
incompatible feature bits are not).

> So mounting works, although for some reason scrubbing is aborted (I had
> this issue a long time ago on my laptop as well). After removing the
> /var/lib/btrfs scrub status file for the filesystem:
Last I knew, scrub doesn't work on degraded filesystems (in fact, by
definition, it _can't_ work on a degraded array). It absolutely won't work,
though, without the read-only flag on filesystems which are mounted
read-only.

> merkaba:~> btrfs scrub start /mnt/zeit
> scrub started on /mnt/zeit, fsid […] (pid=9054)
> merkaba:~> btrfs scrub status /mnt/zeit
> scrub status for […]
>         scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
> 00:00:00
>         total bytes scrubbed: 0.00B with 0 errors
>
> Anyway, I will now just rsync off the files.
>
> Interestingly enough btrfs restore complained about looping over certain
> files… lets see whether the rsync or btrfs send/receive proceeds through.
I'd expect rsync to be more likely to work than send/receive. In general,
if you can read the files, rsync will work, whereas send/receive needs to
read some low-level data from the FS which may not be touched when just
reading files, so there are cases where rsync will work but send/receive
won't.
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Wednesday, 16 November 2016, 11:55:32 CET, you wrote:
> So mounting works, although for some reason scrubbing is aborted (I had
> this issue a long time ago on my laptop as well). After removing the
> /var/lib/btrfs scrub status file for the filesystem:
>
> merkaba:~> btrfs scrub start /mnt/zeit
> scrub started on /mnt/zeit, fsid […] (pid=9054)
> merkaba:~> btrfs scrub status /mnt/zeit
> scrub status for […]
>         scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
> 00:00:00
>         total bytes scrubbed: 0.00B with 0 errors
>
> Anyway, I will now just rsync off the files.
>
> Interestingly enough btrfs restore complained about looping over certain
> files… lets see whether the rsync or btrfs send/receive proceeds through.

I have an idea on why scrubbing may not work: the filesystem is mounted
read-only, and on checksum errors on one disk scrub would try to repair
them with the good copy from the other disk.

Yes, this is it:

merkaba:~> btrfs scrub start -r /dev/satafp1/daten
scrub started on /dev/satafp1/daten, fsid […] (pid=9375)
merkaba:~> btrfs scrub status /dev/satafp1/daten
scrub status for […]
        scrub started at Wed Nov 16 12:13:27 2016, running for 00:00:10
        total bytes scrubbed: 45.53MiB with 0 errors

It would be helpful to receive a proper error message on this one.

Okay, seems today I learned quite something about BTRFS.

Thanks,
-- 
Martin Steigerwald | Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

Tel.: +49 911 30999 55 | Fax: +49 911 30999 99
mail: martin.steigerw...@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de

Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller

teamix Support Hotline: +49 911 30999-112
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Wednesday, 16 November 2016, 16:00:31 CET, Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:55:32 +0100
> Martin Steigerwald wrote:
>> I do think that the above kernel messages invite such a kind of
>> interpretation though. I took the "BTRFS: open_ctree failed" message as
>> indicative of some structural issue with the filesystem.
>
> For the reason as to why the writable mount didn't work, check "btrfs fi df"
> for the filesystem to see if you have any "single" profile chunks on it:
> quite likely you did already mount it "degraded,rw" in the past *once*,
> after which those "single" chunks get created, and consequently it won't
> mount r/w anymore (without lifting the restriction on the number of missing
> devices as proposed).

That exactly explains it. I very likely did a degraded mount without ro on
this disk already.

Funnily enough this creates another complication:

merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/someotherbtrfs
ERROR: subvolume /mnt/zeit/somesubvolume is not read-only

Yet:

merkaba:/mnt/zeit> btrfs property get somesubvolume
ro=false
merkaba:/mnt/zeit> btrfs property set somesubvolume ro true
ERROR: failed to set flags for somesubvolume: Read-only file system

To me it seems the right logic would be to allow the send to proceed in
case the whole filesystem is read-only. As there seems to be no force
option to override the limitation, and I do not feel like compiling my own
btrfs-tools right now, I will use rsync instead.

Thanks,
-- 
Martin Steigerwald | Trainer
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Wed, 16 Nov 2016 11:55:32 +0100
Martin Steigerwald wrote:

> I do think that the above kernel messages invite such a kind of
> interpretation though. I took the "BTRFS: open_ctree failed" message as
> indicative of some structural issue with the filesystem.

For the reason as to why the writable mount didn't work, check "btrfs fi df"
for the filesystem to see if you have any "single" profile chunks on it:
quite likely you did already mount it "degraded,rw" in the past *once*,
after which those "single" chunks get created, and consequently it won't
mount r/w anymore (without lifting the restriction on the number of missing
devices as proposed).

-- 
With respect,
Roman
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Wednesday, 16 November 2016, 15:43:36 CET, Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:25:00 +0100
> Martin Steigerwald wrote:
>> merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
>> mount: Falscher Dateisystemtyp, ungültige Optionen, der
>> Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende
>> Kodierungsseite oder ein anderer Fehler
>>
>> Manchmal liefert das Systemprotokoll wertvolle Informationen –
>> versuchen Sie dmesg | tail oder ähnlich
>>
>> merkaba:~#32> dmesg | tail -6
>> [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
>> [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
>> [ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
>> [ 3080.120706] BTRFS info (device dm-13): has skinny extents
>> [ 3080.150957] BTRFS warning (device dm-13): missing devices (1)
>> exceeds the limit (0), writeable mount is not allowed
>> [ 3080.195941] BTRFS: open_ctree failed
>
> I have to wonder did you read the above message? What you need at this point
> is simply "-o degraded,ro". But I don't see that tried anywhere down the
> line.
>
> See also (or try): https://patchwork.kernel.org/patch/9419189/

Actually I read that one, but I read more into it than what it was saying:
I read into it that BTRFS would automatically use a read-only mount.

merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit

actually really works. *Thank you*, Roman.

I do think that the above kernel messages invite such a kind of
interpretation though. I took the "BTRFS: open_ctree failed" message as
indicative of some structural issue with the filesystem.

So mounting works, although for some reason scrubbing is aborted (I had
this issue a long time ago on my laptop as well). After removing the
/var/lib/btrfs scrub status file for the filesystem:

merkaba:~> btrfs scrub start /mnt/zeit
scrub started on /mnt/zeit, fsid […] (pid=9054)
merkaba:~> btrfs scrub status /mnt/zeit
scrub status for […]
        scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
00:00:00
        total bytes scrubbed: 0.00B with 0 errors

Anyway, I will now just rsync off the files.

Interestingly enough btrfs restore complained about looping over certain
files… lets see whether the rsync or btrfs send/receive proceeds through.

Ciao,
-- 
Martin Steigerwald | Trainer
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Wed, 16 Nov 2016 11:25:00 +0100
Martin Steigerwald wrote:

> merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
> mount: Falscher Dateisystemtyp, ungültige Optionen, der
> Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende
> Kodierungsseite oder ein anderer Fehler
>
> Manchmal liefert das Systemprotokoll wertvolle Informationen –
> versuchen Sie dmesg | tail oder ähnlich
>
> merkaba:~#32> dmesg | tail -6
> [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
> [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
> [ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
> [ 3080.120706] BTRFS info (device dm-13): has skinny extents
> [ 3080.150957] BTRFS warning (device dm-13): missing devices (1) exceeds
> the limit (0), writeable mount is not allowed
> [ 3080.195941] BTRFS: open_ctree failed

I have to wonder did you read the above message? What you need at this point
is simply "-o degraded,ro". But I don't see that tried anywhere down the
line.

See also (or try): https://patchwork.kernel.org/patch/9419189/

-- 
With respect,
Roman
degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Hello!

A degraded BTRFS RAID 1 from one 3TB SATA HDD of my former workstation is
not mountable. Debian 4.8 kernel + btrfs-tools 4.7.3.

A btrfs restore seems to work well enough, so on one hand there is no
urgency. But on the other hand I want to repurpose the harddisk and I
think I want to do it next weekend. So if you want me to gather some debug
data, please speak up quickly. Thank you.

AFAIR I have been able to mount the filesystems in degraded mode, but this
may have been on the other SATA HDD that I already wiped with the shred
command.

I have this:

merkaba:~> btrfs fi sh
[…]
warning, device 2 is missing
warning, device 2 is missing
warning, device 2 is missing
Label: 'debian'  uuid: […]
        Total devices 2 FS bytes used 20.10GiB
        devid    1 size 50.00GiB used 29.03GiB path /dev/mapper/satafp1-debian
        *** Some devices missing

Label: 'daten'  uuid: […]
        Total devices 2 FS bytes used 135.02GiB
        devid    1 size 1.00TiB used 142.06GiB path /dev/mapper/satafp1-daten
        *** Some devices missing

Label: 'backup'  uuid: […]
        Total devices 2 FS bytes used 88.38GiB
        devid    1 size 1.00TiB used 93.06GiB path /dev/mapper/satafp1-backup
        *** Some devices missing

But none of these filesystems seem to be mountable. Here some attempts:

merkaba:~#130> LANG=C mount -o degraded /dev/satafp1/backup /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-daten,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

merkaba:~> dmesg | tail -5
[ 2945.155943] BTRFS info (device dm-13): allowing degraded mounts
[ 2945.155953] BTRFS info (device dm-13): disk space caching is enabled
[ 2945.155957] BTRFS info (device dm-13): has skinny extents
[ 2945.611236] BTRFS warning (device dm-13): missing devices (1) exceeds
the limit (0), writeable mount is not allowed
[ 2945.646719] BTRFS: open_ctree failed

merkaba:~> LANG=C mount -o usebackuproot /dev/satafp1/daten /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-daten,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

merkaba:~#32> dmesg | tail -5
[ 5739.051433] BTRFS info (device dm-12): trying to use backup root at mount time
[ 5739.051441] BTRFS info (device dm-12): disk space caching is enabled
[ 5739.051444] BTRFS info (device dm-12): has skinny extents
[ 5739.103153] BTRFS error (device dm-12): failed to read chunk tree: -5
[ 5739.130304] BTRFS: open_ctree failed

merkaba:~> LANG=C mount -o degraded,usebackuproot /dev/satafp1/daten /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-daten,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

merkaba:~#32> dmesg | tail -5
[ 5801.704202] BTRFS info (device dm-12): trying to use backup root at mount time
[ 5801.704206] BTRFS info (device dm-12): disk space caching is enabled
[ 5801.704208] BTRFS info (device dm-12): has skinny extents
[ 5803.928059] BTRFS warning (device dm-12): missing devices (1) exceeds
the limit (0), writeable mount is not allowed
[ 5804.064638] BTRFS: open_ctree failed

`btrfs check` reports:

merkaba:~#32> btrfs check /dev/satafp1/backup
warning, device 2 is missing
Checking filesystem on /dev/satafp1/backup
UUID: 01cf0493-476f-42e8-8905-61ef205313db
checking extents
checking free space cache
failed to load free space cache for block group 58003030016
failed to load free space cache for block group 60150513664
failed to load free space cache for block group 62297997312
[…]
checking fs roots
^C

I aborted it at this time as I wanted to try the clear_cache mount option
after seeing this. I can redo this thing after btrfs restore completed.

merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
mount: Falscher Dateisystemtyp, ungültige Optionen, der
       Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende
       Kodierungsseite oder ein anderer Fehler

       Manchmal liefert das Systemprotokoll wertvolle Informationen –
       versuchen Sie dmesg | tail oder ähnlich

merkaba:~#32> dmesg | tail -6
[ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
[ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
[ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
[ 3080.120706] BTRFS info (device dm-13): has skinny extents
[ 3080.150