Re: Btrfs partition mount error
On 2019/9/30 下午2:58, Andrey Ivanov wrote: > On 30.09.2019 9:27, Qu Wenruo wrote: >>> This is the kernel message for /dev/sda4. It is also have some problems, >>> but it is successfully mounted. I can't mount /dev/sdc1. >>> >>> Kernel message for /dev/sdc1: >>> [ 6.503265] BTRFS info (device sdc1): disk space caching is enabled >>> [ 6.503266] BTRFS info (device sdc1): has skinny extents >>> [ 8.890135] BTRFS critical (device sdc1): corrupt leaf: root=2 >>> block=855738744832 slot=20, unexpected item end, have 15287 expect 15223 >>> [ 8.899635] BTRFS critical (device sdc1): corrupt leaf: root=2 >>> block=855738744832 slot=20, unexpected item end, have 15287 expect 15223 >> >> This is from extent tree. >> >> Please provide the following dump: >> >> # btrfs ins dump-tree -b 855738744832 /dev/sdc1 > > attached btrfs-ins-dump-tree-855738744832-sdc1.output OK, tree-checker is again detecting the problem correctly. item 19 key (613039370240 EXTENT_ITEM 1048576) itemoff 15223 itemsize 53 refs 1 gen 52116 flags DATA extent data backref root FS_TREE objectid 5841996 offset 607125504 count 1 item 20 key (613040418816 EXTENT_ITEM 1032192) itemoff 15170 itemsize 117 refs 1 gen 52162 flags DATA extent data backref root FS_TREE objectid 5842000 offset 927858688 count 1 Item 20 should be a regular single reference, but its size is 117. But please check this: 117 = 0x75 53 = 0x35 The difference is 0x40, which is a single bit. And if itemsize for slot 20 is regular 53, then it matches its neighbors without any problem. So at least to me, this looks like a bitflip. Although I'm not sure if it's btrfs itself causing the problem or it's the hardware or even 3rd party kernel modules to blame. For this one, I could help you by just reverting that bit, and then you may be able to continue mounting the fs or at least run btrfs check on it. Please prepare an environment to compile btrfs-progs (at least btrfs-corrupt-block) if you want to try. > > [[EXTRA INFO]] Please provide the following dump of that tree block by: # btrfs ins dump-tree -b 134079905792 /dev/sda4 >>> >>> attached btrfs-ins-dump-tree-134079905792.output >> >> Confirmed. >> >> It looks like the data_len got some bitflips. >> >> In that offending leaf, there is only two data_len and all are one bit >> flipped. >> Considering that directory is not that old, it looks like some memory >> bit flip. >> >> It's recommended to do a memtest to ensure it's not memory causing the >> problem. > > I did a memtest earlier. All had passed without errors. I can do a > memtest again, > but it seems to me that if the memory was faulty, the system would not > be stable > and often hung, but the system works fine. Indeed, especially considering that there are already two bitflips in one leaf, which should be pretty rare. Any out-of-tree kernel modules? Or would you like to try v5.2.15 kernel, which has a self detection for such problem. Thanks, Qu > > >> I recommend to do a "btrfs check" on all fses. > > Can't do check on /dev/sdc1: > > # btrfs check /dev/sdc1 > Opening filesystem to check... > incorrect offsets 15223 15287 > ERROR: cannot open file system signature.asc Description: OpenPGP digital signature
Re: Btrfs partition mount error
On 30.09.2019 10:31, Qu Wenruo wrote: For this one, I could help you by just reverting that bit, and then you may be able to continue mounting the fs or at least run btrfs check on it. Please prepare an environment to compile btrfs-progs (at least btrfs-corrupt-block) if you want to try. Great, I'm ready to do it. Environment is ready. I did a memtest earlier. All had passed without errors. I can do a memtest again, but it seems to me that if the memory was faulty, the system would not be stable and often hung, but the system works fine. Indeed, especially considering that there are already two bitflips in one leaf, which should be pretty rare. Any out-of-tree kernel modules? I only have vmware out-of-tree kernel modules. Or would you like to try v5.2.15 kernel, which has a self detection for such problem. If I understand correctly, if I install v5.2.15 or later kernel then I'll fix my /dev/sda4?
Re: Btrfs partition mount error
On 2019/9/30 下午4:00, Andrey Ivanov wrote: > On 30.09.2019 10:31, Qu Wenruo wrote: >> For this one, I could help you by just reverting that bit, and then you >> may be able to continue mounting the fs or at least run btrfs check on >> it. >> >> Please prepare an environment to compile btrfs-progs (at least >> btrfs-corrupt-block) if you want to try. > > Great, I'm ready to do it. Environment is ready. > > >>> I did a memtest earlier. All had passed without errors. I can do a >>> memtest again, >>> but it seems to me that if the memory was faulty, the system would not >>> be stable >>> and often hung, but the system works fine. >> >> Indeed, especially considering that there are already two bitflips in >> one leaf, which should be pretty rare. >> >> Any out-of-tree kernel modules? > > I only have vmware out-of-tree kernel modules. > > >> Or would you like to try v5.2.15 kernel, >> which has a self detection for such problem. > > If I understand correctly, if I install v5.2.15 or later kernel > then I'll fix my /dev/sda4? Nope, v5.2.15 is only to detect such problem earlier and prevent such bad data to reach disk. Thanks, Qu signature.asc Description: OpenPGP digital signature
[PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid
Add a regression test to check if btrfs can handle high devid. The test will add and remove devices to a btrfs fs, so that the devid will increase to uncommon but still valid values. The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: tree-checker: Verify dev item"). The fix is titled "btrfs: tree-checker: Fix wrong check on max devid". Signed-off-by: Qu Wenruo --- tests/btrfs/194 | 73 + tests/btrfs/194.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 76 insertions(+) create mode 100755 tests/btrfs/194 create mode 100644 tests/btrfs/194.out diff --git a/tests/btrfs/194 b/tests/btrfs/194 new file mode 100755 index ..7a52ed86 --- /dev/null +++ b/tests/btrfs/194 @@ -0,0 +1,73 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 194 +# +# Test if btrfs can handle large device ids. +# +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: +# tree-checker: Verify dev item"). +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid" +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch_dev_pool 2 +_scratch_dev_pool_get 2 + +# The wrong check limit is based on node size, to reduce runtime, we only +# support 4K page size system for now +if [ $(get_page_size) != 4096 ]; then + _notrun "This test need 4k page size" +fi + +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}') +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}') + +echo device_1=$device_1 device_2=$device_2 >> $seqres.full + +# Use 4K nodesize to reduce runtime +_scratch_mkfs -n 4k >> $seqres.full +_scratch_mount + +# Add and remove device in a loop, one loop will increase devid by 2. +# for 4k nodesize, the wrong check will be triggered at devid 123. +# here 64 is enough to trigger such regression +for (( i = 0; i < 64; i++ )); do + $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT + $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT + $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT + $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT +done +_scratch_dev_pool_put + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out new file mode 100644 index ..7bfd50ff --- /dev/null +++ b/tests/btrfs/194.out @@ -0,0 +1,2 @@ +QA output created by 194 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index b92cb12c..ef1f0e1b 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -196,3 +196,4 @@ 191 auto quick send dedupe 192 auto replay snapshot stress 193 auto quick qgroup enospc limit +194 auto -- 2.22.0
Re: [PATCH 2/3] btrfs-progs: check/original: Add check and repair for invalid inode generation
On 24.09.19 г. 11:11 ч., Qu Wenruo wrote: > There are at least two bug reports of kernel tree-checker complaining > about invalid inode generation. > > All offending inodes seem to be caused by old kernel around 2014, with > inode generation overflow. > > So add such check and repair ability to lowmem mode check first. > > This involves: > - Calculate the inode generation upper limit > Unlike the original mode context, we don't have anyway to determine if Perhaps you meant "lowmem mode context" since in lowmem you deduce whether it's a log-tree inode or not. > this inode belongs to log tree. > So we use super_generation + 1 as upper limit, just like what we did > in kernel tree checker. > > - Check if the inode generation is larger than the upper limit > > - Repair by resetting inode generation to current transaction > generation > The difference is, in original mode, we have a common trans handle for > all repair and reset path for each repair. > > Reported-by: Charles Wright > Signed-off-by: Qu Wenruo I tested that code with the image provided and it does work as expected as well as it's fairly obvious what's happening. So: Reviewed-by: Nikolay Borisov Tested-by: Nikolay Borisov > --- > check/main.c | 50 ++- > check/mode-original.h | 1 + > 2 files changed, 50 insertions(+), 1 deletion(-) > > diff --git a/check/main.c b/check/main.c > index c2d0f394..018707c8 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -604,6 +604,8 @@ static void print_inode_error(struct btrfs_root *root, > struct inode_record *rec) > if (errors & I_ERR_INVALID_IMODE) > fprintf(stderr, ", invalid inode mode bit 0%o", > rec->imode & ~0); > + if (errors & I_ERR_INVALID_GEN) > + fprintf(stderr, ", invalid inode generation"); > fprintf(stderr, "\n"); > > /* Print the holes if needed */ > @@ -857,6 +859,7 @@ static int process_inode_item(struct extent_buffer *eb, > { > struct inode_record *rec; > struct btrfs_inode_item *item; > + u64 gen_uplimit; > u64 flags; > > rec = active_node->current; > @@ -879,6 +882,13 @@ static int process_inode_item(struct extent_buffer *eb, > if (S_ISLNK(rec->imode) && > flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) > rec->errors |= I_ERR_ODD_INODE_FLAGS; > + /* > + * We don't have accurate root info to determine the correct > + * inode generation uplimit, use super_generation + 1 anyway > + */ > + gen_uplimit = btrfs_super_generation(global_info->super_copy) + 1; > + if (btrfs_inode_generation(eb, item) > gen_uplimit) > + rec->errors |= I_ERR_INVALID_GEN; > maybe_free_inode_rec(&active_node->inode_cache, rec); > return 0; > } > @@ -2774,6 +2784,41 @@ static int repair_imode_original(struct > btrfs_trans_handle *trans, > return ret; > } > > +static int repair_inode_gen_original(struct btrfs_trans_handle *trans, > + struct btrfs_root *root, > + struct btrfs_path *path, > + struct inode_record *rec) > +{ > + struct btrfs_inode_item *ii; > + struct btrfs_key key; > + int ret; > + > + key.objectid = rec->ino; > + key.offset = 0; > + key.type = BTRFS_INODE_ITEM_KEY; > + > + ret = btrfs_search_slot(trans, root, &key, path, 0, 1); > + if (ret > 0) { > + ret = -ENOENT; > + error("no inode item found for ino %llu", rec->ino); > + return ret; > + } > + if (ret < 0) { > + errno = -ret; > + error("failed to search inode item for ino %llu: %m", rec->ino); > + return ret; > + } > + ii = btrfs_item_ptr(path->nodes[0], path->slots[0], > + struct btrfs_inode_item); > + btrfs_set_inode_generation(path->nodes[0], ii, trans->transid); > + btrfs_mark_buffer_dirty(path->nodes[0]); > + btrfs_release_path(path); > + printf("resetting inode generation to %llu for ino %llu\n", > + trans->transid, rec->ino); > + rec->errors &= ~I_ERR_INVALID_GEN; > + return 0; > +} > + > static int try_repair_inode(struct btrfs_root *root, struct inode_record > *rec) > { > struct btrfs_trans_handle *trans; > @@ -2794,7 +2839,8 @@ static int try_repair_inode(struct btrfs_root *root, > struct inode_record *rec) >I_ERR_FILE_NBYTES_WRONG | >I_ERR_INLINE_RAM_BYTES_WRONG | >I_ERR_MISMATCH_DIR_HASH | > - I_ERR_UNALIGNED_EXTENT_REC))) > + I_ERR_UNALIGNED_EXTENT_REC | > + I_ERR_INVALID_GEN))) > return rec->errors; > > /* > @@ -2829,6 +2875,8 @@ static int try_repair_inode(struct btrfs_root *root, > struct
Re: Btrfs partition mount error
On 2019/9/30 下午4:00, Andrey Ivanov wrote: > On 30.09.2019 10:31, Qu Wenruo wrote: >> For this one, I could help you by just reverting that bit, and then you >> may be able to continue mounting the fs or at least run btrfs check on >> it. >> >> Please prepare an environment to compile btrfs-progs (at least >> btrfs-corrupt-block) if you want to try. > > Great, I'm ready to do it. Environment is ready. Here it is. https://github.com/adam900710/btrfs-progs/tree/dirty_fix To use it: $ ./btrfs-corrupt-block -X /dev/sdc1 But please keep in mind that, the offending tree block has more problems than I originally thought: - Bad item size at slot 20 The original one reported by kernel. - Bad key offset at slot 9 - Bad item size at slot 41 All bitflips. So all 3 bit flips happened in one leaf, it's really too rare. I really don't have any clue how this could happen. Anyway, I don't think the dirty fix is enough considering how many strange things I have found just in one leaf. Thanks, Qu > > >>> I did a memtest earlier. All had passed without errors. I can do a >>> memtest again, >>> but it seems to me that if the memory was faulty, the system would not >>> be stable >>> and often hung, but the system works fine. >> >> Indeed, especially considering that there are already two bitflips in >> one leaf, which should be pretty rare. >> >> Any out-of-tree kernel modules? > > I only have vmware out-of-tree kernel modules. > > >> Or would you like to try v5.2.15 kernel, >> which has a self detection for such problem. > > If I understand correctly, if I install v5.2.15 or later kernel > then I'll fix my /dev/sda4? signature.asc Description: OpenPGP digital signature
Re: [PATCH 2/3] btrfs-progs: check/original: Add check and repair for invalid inode generation
On 2019/9/30 下午4:41, Nikolay Borisov wrote: > > > On 24.09.19 г. 11:11 ч., Qu Wenruo wrote: >> There are at least two bug reports of kernel tree-checker complaining >> about invalid inode generation. >> >> All offending inodes seem to be caused by old kernel around 2014, with >> inode generation overflow. >> >> So add such check and repair ability to lowmem mode check first. >> >> This involves: >> - Calculate the inode generation upper limit >> Unlike the original mode context, we don't have anyway to determine if > > Perhaps you meant "lowmem mode context" since in lowmem you deduce > whether it's a log-tree inode or not. Oh, right. Would you mind to fix this in commit time, David? > >> this inode belongs to log tree. >> So we use super_generation + 1 as upper limit, just like what we did >> in kernel tree checker. >> >> - Check if the inode generation is larger than the upper limit >> >> - Repair by resetting inode generation to current transaction >> generation >> The difference is, in original mode, we have a common trans handle for >> all repair and reset path for each repair. >> >> Reported-by: Charles Wright >> Signed-off-by: Qu Wenruo > > I tested that code with the image provided and it does work as expected > as well as it's fairly obvious what's happening. So: > > Reviewed-by: Nikolay Borisov > Tested-by: Nikolay Borisov Thanks for the review and test. Thanks, Qu > >> --- >> check/main.c | 50 ++- >> check/mode-original.h | 1 + >> 2 files changed, 50 insertions(+), 1 deletion(-) >> >> diff --git a/check/main.c b/check/main.c >> index c2d0f394..018707c8 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -604,6 +604,8 @@ static void print_inode_error(struct btrfs_root *root, >> struct inode_record *rec) >> if (errors & I_ERR_INVALID_IMODE) >> fprintf(stderr, ", invalid inode mode bit 0%o", >> rec->imode & ~0); >> +if (errors & I_ERR_INVALID_GEN) >> +fprintf(stderr, ", invalid inode generation"); >> fprintf(stderr, "\n"); >> >> /* Print the holes if needed */ >> @@ -857,6 +859,7 @@ static int process_inode_item(struct extent_buffer *eb, >> { >> struct inode_record *rec; >> struct btrfs_inode_item *item; >> +u64 gen_uplimit; >> u64 flags; >> >> rec = active_node->current; >> @@ -879,6 +882,13 @@ static int process_inode_item(struct extent_buffer *eb, >> if (S_ISLNK(rec->imode) && >> flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) >> rec->errors |= I_ERR_ODD_INODE_FLAGS; >> +/* >> + * We don't have accurate root info to determine the correct >> + * inode generation uplimit, use super_generation + 1 anyway >> + */ >> +gen_uplimit = btrfs_super_generation(global_info->super_copy) + 1; >> +if (btrfs_inode_generation(eb, item) > gen_uplimit) >> +rec->errors |= I_ERR_INVALID_GEN; >> maybe_free_inode_rec(&active_node->inode_cache, rec); >> return 0; >> } >> @@ -2774,6 +2784,41 @@ static int repair_imode_original(struct >> btrfs_trans_handle *trans, >> return ret; >> } >> >> +static int repair_inode_gen_original(struct btrfs_trans_handle *trans, >> + struct btrfs_root *root, >> + struct btrfs_path *path, >> + struct inode_record *rec) >> +{ >> +struct btrfs_inode_item *ii; >> +struct btrfs_key key; >> +int ret; >> + >> +key.objectid = rec->ino; >> +key.offset = 0; >> +key.type = BTRFS_INODE_ITEM_KEY; >> + >> +ret = btrfs_search_slot(trans, root, &key, path, 0, 1); >> +if (ret > 0) { >> +ret = -ENOENT; >> +error("no inode item found for ino %llu", rec->ino); >> +return ret; >> +} >> +if (ret < 0) { >> +errno = -ret; >> +error("failed to search inode item for ino %llu: %m", rec->ino); >> +return ret; >> +} >> +ii = btrfs_item_ptr(path->nodes[0], path->slots[0], >> +struct btrfs_inode_item); >> +btrfs_set_inode_generation(path->nodes[0], ii, trans->transid); >> +btrfs_mark_buffer_dirty(path->nodes[0]); >> +btrfs_release_path(path); >> +printf("resetting inode generation to %llu for ino %llu\n", >> +trans->transid, rec->ino); >> +rec->errors &= ~I_ERR_INVALID_GEN; >> +return 0; >> +} >> + >> static int try_repair_inode(struct btrfs_root *root, struct inode_record >> *rec) >> { >> struct btrfs_trans_handle *trans; >> @@ -2794,7 +2839,8 @@ static int try_repair_inode(struct btrfs_root *root, >> struct inode_record *rec) >> I_ERR_FILE_NBYTES_WRONG | >> I_ERR_INLINE_RAM_BYTES_WRONG | >> I_ERR_MISMATCH_DIR_HASH | >> - I_ERR_UNALIGNED_EXTENT_REC))) >> +
[PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap
From: Filipe Manana When we have a buffered write that starts at an offset greater than or equals to the file's size happening concurrently with a full ranged fiemap, we can end up leaking an extent state structure. Suppose we have a file with a size of 1Mb, and before the buffered write and fiemap are performed, it has a single extent state in its io tree representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set. The following sequence diagram shows how the memory leak happens if a fiemap a buffered write, starting at offset 1Mb and with a length of 4Kb, are performed concurrently. CPU 1 CPU 2 extent_fiemap() --> it's a full ranged fiemap range from 0 to LLONG_MAX - 1 (9223372036854775807) --> locks range in the inode's io tree --> after this we have 2 extent states in the io tree: --> 1 for range [0, 1Mb[ with the bits EXTENT_LOCKED and EXTENT_DELALLOC_BITS set --> 1 for the range [1Mb, LLONG_MAX[ with the EXTENT_LOCKED bit set --> start buffered write at offset 1Mb with a length of 4Kb btrfs_file_write_iter() btrfs_buffered_write() --> cached_state is NULL lock_and_cleanup_extent_if_need() --> returns 0 and does not lock range because it starts at current i_size / eof --> cached_state remains NULL btrfs_dirty_pages() btrfs_set_extent_delalloc() (...) __set_extent_bit() --> splits extent state for range [1Mb, LLONG_MAX[ and now we have 2 extent states: --> one for the range [1Mb, 1Mb + 4Kb[ with EXTENT_LOCKED set --> another one for the range [1Mb + 4Kb, LLONG_MAX[ with EXTENT_LOCKED set as well --> sets EXTENT_DELALLOC on the extent state for the range [1Mb, 1Mb + 4Kb[ --> caches extent state [1Mb, 1Mb + 4Kb[ into @cached_state because it has the bit EXTENT_LOCKED set --> btrfs_buffered_write() ends up with a non-NULL cached_state and never calls anything to release its reference on it, resulting in a memory leak Fix this by calling free_extent_state() on cached_state if the range was not locked by lock_and_cleanup_extent_if_need(). The same issue can happen if anything else other than fiemap locks a range that covers eof and beyond. This could be triggered, sporadically, by test case generic/561 from the fstests suite, which makes duperemove run concurrently with fsstress, and duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set the leak is reported in dmesg/syslog when removing the btrfs module with a message like the following: [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1 Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak. CC: sta...@vger.kernel.org # 4.16+ Signed-off-by: Filipe Manana --- fs/btrfs/file.c | 13 - 1 file changed, 12
RE: BTRFS checksum mismatch - false positives
Hi Christ, Thank you very much for your help and insight. Has helped me a lot understanding the setup with BTRFS on top of SHR. I will rebuild the whole thing from scratch, as you recommend, since it seems only to get worse. The file mapping tool from Synology (synodataverifier) used on the files with checksum errors indicate that the corrupted blocks are on two different discs (out of 8), one quite new (3 TB drive) and one older (500 GB drive). Having two partially defect drives on a RAID 5'ish system of course will create a lot of trouble. Only wonder why none of them have any errors on S.M.A.R.T. (also extended tests) at all - and that Synology did not report any read errors. I'm halfway through my second full backup of all files in addition to my cloud backup, so then I'll toss the two suspicious drives and build a new system and RAID (RAID 6 / SHR 2) with two parity disks instead of just one. BTW, Do you have any idea of when the fixed RAID 5/6 BTRFS functionality might become mainstream, so we can abandon MD and simplify the stack to purely BTRFS? Thanks again Hoegge -Original Message- From: Chris Murphy Sent: 2019-09-26 22:28 To: hoe...@gmail.com Cc: Chris Murphy ; Btrfs BTRFS Subject: Re: BTRFS checksum mismatch - false positives >From the log offlist 2019-09-08T17:27:02+02:00 MHPNAS kernel: [ 22.396165] md: invalid raid superblock magic on sda5 2019-09-08T17:27:02+02:00 MHPNAS kernel: [ 22.401816] md: sda5 does not have a valid v0.90 superblock, not importing! That doesn't sound good. It's not a Btrfs problem but a md/mdadm problem. You'll have to get support for this from Synology, only they understand the design of the storage stack layout and whether these error messages are important or not and how to fix them. Anyone else speculating could end up causing damage to the NAS and data to be lost. 2019-09-08T17:27:02+02:00 MHPNAS kernel: [ 22.913298] md: sda2 has different UUID to sda1 There are several messages like this. I can't tell if they're just informational and benign or a problem. Also not related to Btrfs. 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199] BTRFS warning (device dm-1): BTRFS: dm-1 checksum verify failed on 375259512832 wanted EA1A10E3 found 3080B64F level 0 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199] 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453] BTRFS warning (device dm-1): BTRFS: dm-1 checksum verify failed on 375259512832 wanted EA1A10E3 found 3080B64F level 0 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453] 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.528385] BTRFS: read error corrected: ino 1 off 375259512832 (dev /dev/vg1/volume_1 sector 751819488) 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.539631] BTRFS: read error corrected: ino 1 off 375259516928 (dev /dev/vg1/volume_1 sector 751819496) 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.550785] BTRFS: read error corrected: ino 1 off 375259521024 (dev /dev/vg1/volume_1 sector 751819504) 2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.561990] BTRFS: read error corrected: ino 1 off 375259525120 (dev /dev/vg1/volume_1 sector 751819512) There are a bunch of messages like this. Btrfs is finding metadata checksum errors, some kind of corruption has happened with one of the copies, and it's been fixed up. But why are things being corrupt in the first place? Ordinary bad sectors maybe? There's a lot of these - like really a lot. Hundreds of affected sectors. There are too many for me to read through and see if all of them were corrected by DUP metadata. 2019-09-22T21:24:27+02:00 MHPNAS kernel: [1224856.764098] md2: syno_self_heal_is_valid_md_stat(496): md's current state is not suitable for data correction What does that mean? Also not a Btrfs problem. There are quite a few of these. 2019-09-23T11:49:20+02:00 MHPNAS kernel: [1276791.652946] BTRFS error (device dm-1): BTRFS: dm-1 failed to repair btree csum error on 1353162506240, mirror = 1 OK and a few of these also. This means that some metadata could not be repaired, likely because both copies are corrupt. My recommendation is to freshen your backups now while you still can, and prepare to rebuild the NAS. i.e. these are not likely repairable problems. Once both copies of Btrfs metadata are bad, it's usually not fixable you just have to recreate the file system from scratch. You'll have to move everything off the NAS - and anything that's really important you will want at least two independent copies of, of course, and then you're going to obliterate the array and start from scratch. While you're at it, you might as well make sure you've got the latest supported version of the software for this product. Start with that. Then follow the Synology procedure to wipe the NAS totally and set it up again. You'll want to make sure the procedure you use writes out all new metadata for everything: mdadm, lvm, Btrfs. Nothi
Re: BTRFS Raid5 error during Scrub.
I've upgraded to btrfs-progs v5.2.1 Here is the output from btrfs check -p --readonly /dev/sda Opening filesystem to check... Checking filesystem on /dev/sda UUID: f7573191-664f-4540-a830-71ad654d9301 [1/7] checking root items (0:01:17 elapsed, 5138533 items checked) parent transid verify failed on 48781340082176 wanted 109181 found 109008items checked) parent transid verify failed on 48781340082176 wanted 109181 found 109008 parent transid verify failed on 48781340082176 wanted 109181 found 109008 Ignoring transid failure leaf parent key incorrect 48781340082176 bad block 48781340082176 [2/7] checking extents (0:03:22 elapsed, 1143429 items checked) ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space cache(0:05:10 elapsed, 7236 items checked) parent transid verify failed on 48781340082176 wanted 109181 found 109008ems checked) Ignoring transid failure root 15197 inode 81781 errors 1000, some csum missing48 elapsed, 33952 items checked) [4/7] checking fs roots(0:42:53 elapsed, 34145 items checked) ERROR: errors found in fs roots found 22975533985792 bytes used, error(s) found total csum bytes: 16806711120 total tree bytes: 18733842432 total fs tree bytes: 130121728 total extent tree bytes: 466305024 btree space waste bytes: 1100711497 file data blocks allocated: 3891333279744 referenced 1669470507008 Am Montag, den 30.09.2019, 09:01 +0300 schrieb Nikolay Borisov: > > On 30.09.19 г. 0:38 ч., Robert Krig wrote: > > Hi guys. First off, I've got backups so no worries there. I'm just > > trying to understand what's happening and which files are affected. > > I've got a scrub running and the kernel dmesg buffer spit out the > > following: > > > > BTRFS warning (device sda): checksum/header error at logical > > 48781340082176 on dev /dev/sdb, physical 5651115966464: metadata > > leaf > > (level 0) in tree 7 > > This indicates you have corrupted checksum tree blocks. Concretely > the > checksum in the header does not match the data in the block. > > > BTRFS warning (device sda): checksum/header error at logical > > 48781340082176 on dev /dev/sdb, physical 5651115966464: metadata > > leaf > > (level 0) in tree 7 > > BTRFS error (device sda): bdev /dev/sdb errs: wr 0, rd 0, flush 0, > > corrupt 0, gen 1 > > BTRFS error (device sda): unable to fixup (regular) error at > > logical > > 48781340082176 on dev /dev/sdb > > BTRFS warning (device sda): checksum/header error at logical > > 48781340082176 on dev /dev/sdc, physical 5651115966464: metadata > > leaf > > (level 0) in tree 7 > > BTRFS warning (device sda): checksum/header error at logical > > 48781340082176 on dev /dev/sdc, physical 5651115966464: metadata > > leaf > > (level 0) in tree 7 > > BTRFS error (device sda): bdev /dev/sdc errs: wr 0, rd 0, flush 0, > > corrupt 0, gen 1 > > BTRFS error (device sda): unable to fixup (regular) error at > > logical > > 48781340082176 on dev /dev/sdc > > > > Is there any way I can find out which files are affected so that I > > can > > just restore them from a backup? > > > > I'm running Debian Buster with Kernel 5.2. > > There was a known corruption issue in 5.2 kernel up to v5.2.15. > However > it could result in partial writes of transaction data whereas your > report seems to indicate data has been written corrupted on-disk. > > As a first step run : > > btrfs check repair --readonly > > (this is a read only command so it cannot do further damage) and post > the log to see what else btrfs check finds. > > > Btrfs-progs v4.20.1 > > > > > > Let me know if you need any further info. > > > >
Re: [BTRFS Raid5 error during Scrub.
On 29/09/2019 22:38, Robert Krig wrote: > I'm running Debian Buster with Kernel 5.2. > Btrfs-progs v4.20.1 I am running Debian testing (bullseye) and have chosen not to install the 5.2 kernel yet because the version of it in bullseye (linux-image-5.2.0-2-amd64) is based on 5.2.9 and (as far as I can tell) does not contain the BTRFS corruption fix. I do not know which version of the 5.2 kernel is in buster but you may want to check that it contains a backport of the BTRFS fix or downgrade to the 4.19 kernel until you can be sure. I note that linux-image-5.2.0-3-amd64 is in unstable and is based on 5.2.17 so should have the fix. I presume it will make its way to testing soon. If anyone can confirm which versions of the Debian kernel package the 5.2 corruption fixes are in, it would be helpful.
Re: Btrfs partition mount error
On 30.09.2019 11:59, Qu Wenruo wrote: Please prepare an environment to compile btrfs-progs (at least btrfs-corrupt-block) if you want to try. Great, I'm ready to do it. Environment is ready. Here it is. https://github.com/adam900710/btrfs-progs/tree/dirty_fix To use it: $ ./btrfs-corrupt-block -X /dev/sdc1 I had built and run it: # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X /dev/sdc1 incorrect offsets 15223 15287 Open ctree failed
Re: Btrfs partition mount error
On 2019/9/30 下午6:01, Andrey Ivanov wrote: > On 30.09.2019 11:59, Qu Wenruo wrote: Please prepare an environment to compile btrfs-progs (at least btrfs-corrupt-block) if you want to try. >>> >>> Great, I'm ready to do it. Environment is ready. >> >> Here it is. >> >> https://github.com/adam900710/btrfs-progs/tree/dirty_fix >> >> To use it: >> >> $ ./btrfs-corrupt-block -X /dev/sdc1 > > I had built and run it: > > # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block > -X /dev/sdc1 > incorrect offsets 15223 15287 > Open ctree failed That branch updated to skip the item offset check completely. But please keep in mind that, that check itself still makes sense, so please use the original "btrfs check" to check the fs. Thanks, Qu signature.asc Description: OpenPGP digital signature
Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid
On 9/30/19 4:37 PM, Qu Wenruo wrote: Add a regression test to check if btrfs can handle high devid. The test will add and remove devices to a btrfs fs, so that the devid will increase to uncommon but still valid values. The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: tree-checker: Verify dev item"). The fix is titled "btrfs: tree-checker: Fix wrong check on max devid". Signed-off-by: Qu Wenruo Suggested-by: Anand Jain Reviewed-by: Anand Jain one nit below. --- tests/btrfs/194 | 73 + tests/btrfs/194.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 76 insertions(+) create mode 100755 tests/btrfs/194 create mode 100644 tests/btrfs/194.out diff --git a/tests/btrfs/194 b/tests/btrfs/194 new file mode 100755 index ..7a52ed86 --- /dev/null +++ b/tests/btrfs/194 @@ -0,0 +1,73 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 194 +# +# Test if btrfs can handle large device ids. +# +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: +# tree-checker: Verify dev item"). +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid" +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch_dev_pool 2 +_scratch_dev_pool_get 2 + +# The wrong check limit is based on node size, to reduce runtime, we only +# support 4K page size system for now +if [ $(get_page_size) != 4096 ]; then + _notrun "This test need 4k page size" +fi + +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}') +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}') + +echo device_1=$device_1 device_2=$device_2 >> $seqres.full + +# Use 4K nodesize to reduce runtime +_scratch_mkfs -n 4k >> $seqres.full This init part should be _scratch_mkfs on only one device right? Thanks, Anand +_scratch_mount + +# Add and remove device in a loop, one loop will increase devid by 2. +# for 4k nodesize, the wrong check will be triggered at devid 123. +# here 64 is enough to trigger such regression +for (( i = 0; i < 64; i++ )); do + $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT + $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT + $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT + $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT +done +_scratch_dev_pool_put + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out new file mode 100644 index ..7bfd50ff --- /dev/null +++ b/tests/btrfs/194.out @@ -0,0 +1,2 @@ +QA output created by 194 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index b92cb12c..ef1f0e1b 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -196,3 +196,4 @@ 191 auto quick send dedupe 192 auto replay snapshot stress 193 auto quick qgroup enospc limit +194 auto
Re: Btrfs partition mount error
On 30.09.2019 13:10, Qu Wenruo wrote: I had built and run it: # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X /dev/sdc1 incorrect offsets 15223 15287 Open ctree failed That branch updated to skip the item offset check completely. But please keep in mind that, that check itself still makes sense, so please use the original "btrfs check" to check the fs. # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X /dev/sdc1 key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf 1073755934 > 16283 Open ctree failed # btrfs check /dev/sdc1 Opening filesystem to check... incorrect offsets 15223 15287 ERROR: cannot open file system
Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid
On Mon, Sep 30, 2019 at 9:39 AM Qu Wenruo wrote: > > Add a regression test to check if btrfs can handle high devid. > > The test will add and remove devices to a btrfs fs, so that the devid > will increase to uncommon but still valid values. > > The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: > tree-checker: Verify dev item"). > The fix is titled "btrfs: tree-checker: Fix wrong check on max devid". > > Signed-off-by: Qu Wenruo > --- > tests/btrfs/194 | 73 + > tests/btrfs/194.out | 2 ++ > tests/btrfs/group | 1 + > 3 files changed, 76 insertions(+) > create mode 100755 tests/btrfs/194 > create mode 100644 tests/btrfs/194.out > > diff --git a/tests/btrfs/194 b/tests/btrfs/194 > new file mode 100755 > index ..7a52ed86 > --- /dev/null > +++ b/tests/btrfs/194 > @@ -0,0 +1,73 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 194 > +# > +# Test if btrfs can handle large device ids. > +# > +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: > +# tree-checker: Verify dev item"). > +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid" type, titlted -> titled > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch_dev_pool 2 > +_scratch_dev_pool_get 2 > + > +# The wrong check limit is based on node size, to reduce runtime, we only > +# support 4K page size system for now > +if [ $(get_page_size) != 4096 ]; then > + _notrun "This test need 4k page size" > +fi > + > +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}') > +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}') > + > +echo device_1=$device_1 device_2=$device_2 >> $seqres.full > + > +# Use 4K nodesize to reduce runtime How does the node size reduces runtime? It's obvious when one wants to create deeper/larger trees for some purpose, but this test doesn't populate the filesystem, it uses an empty filesystem. So that deserves an explanation of how the node size influences the test's running time. > +_scratch_mkfs -n 4k >> $seqres.full > +_scratch_mount > + > +# Add and remove device in a loop, one loop will increase devid by 2. " ... one loop will ..." -> "... each iteration will ..." > +# for 4k nodesize, the wrong check will be triggered at devid 123. Why 123? It's not clear to me why, the test case doesn't explain and neither does the btrfs patch that fixes the regression. If it's related to the value of the constants/functions BTRFS_MAX_DEVS and BTRFS_MAX_DEVS_SYS_CHUNK, it should be mentioned here. > +# here 64 is enough to trigger such regression > +for (( i = 0; i < 64; i++ )); do > + $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT > + $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT > + $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT > + $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT > +done > +_scratch_dev_pool_put > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out > new file mode 100644 > index ..7bfd50ff > --- /dev/null > +++ b/tests/btrfs/194.out > @@ -0,0 +1,2 @@ > +QA output created by 194 > +Silence is golden > diff --git a/tests/btrfs/group b/tests/btrfs/group > index b92cb12c..ef1f0e1b 100644 > --- a/tests/btrfs/group > +++ b/tests/btrfs/group > @@ -196,3 +196,4 @@ > 191 auto quick send dedupe > 192 auto replay snapshot stress > 193 auto quick qgroup enospc limit > +194 auto Maybe volume group as well. Thanks Qu. > -- > 2.22.0 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
Re: Btrfs partition mount error
On 9/30/19 12:23 PM, Andrey Ivanov wrote: > > # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X > /dev/sdc1 > key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf 1073755934 > > 16283 > Open ctree failed > > [...] bin(1073755934) '0b11101110000' Your system is really flippin' bits all over the place. Hans
Re: Btrfs partition mount error
On 2019/9/30 下午6:23, Andrey Ivanov wrote: > On 30.09.2019 13:10, Qu Wenruo wrote: >>> I had built and run it: >>> >>> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block >>> -X /dev/sdc1 >>> incorrect offsets 15223 15287 >>> Open ctree failed >> >> That branch updated to skip the item offset check completely. >> >> But please keep in mind that, that check itself still makes sense, so >> please use the original "btrfs check" to check the fs. > > # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block > -X /dev/sdc1 > key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf > 1073755934 > 16283 > Open ctree failed This means there are more corruptions in just that one leaf. It's really hard to fix them one by one manually. I'd recommend to go btrfs-restore or use this patchset: https://patchwork.kernel.org/project/linux-btrfs/list/?series=170715 Then mount with ro,rescue=skipbg. Either way, the bitflip is way beyond my imagination. Thanks, Qu > > # btrfs check /dev/sdc1 > Opening filesystem to check... > incorrect offsets 15223 15287 > ERROR: cannot open file system signature.asc Description: OpenPGP digital signature
Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid
On 2019/9/30 下午6:33, Filipe Manana wrote: > On Mon, Sep 30, 2019 at 9:39 AM Qu Wenruo wrote: >> >> Add a regression test to check if btrfs can handle high devid. >> >> The test will add and remove devices to a btrfs fs, so that the devid >> will increase to uncommon but still valid values. >> >> The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: >> tree-checker: Verify dev item"). >> The fix is titled "btrfs: tree-checker: Fix wrong check on max devid". >> >> Signed-off-by: Qu Wenruo >> --- >> tests/btrfs/194 | 73 + >> tests/btrfs/194.out | 2 ++ >> tests/btrfs/group | 1 + >> 3 files changed, 76 insertions(+) >> create mode 100755 tests/btrfs/194 >> create mode 100644 tests/btrfs/194.out >> >> diff --git a/tests/btrfs/194 b/tests/btrfs/194 >> new file mode 100755 >> index ..7a52ed86 >> --- /dev/null >> +++ b/tests/btrfs/194 >> @@ -0,0 +1,73 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved. >> +# >> +# FS QA Test 194 >> +# >> +# Test if btrfs can handle large device ids. >> +# >> +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs: >> +# tree-checker: Verify dev item"). >> +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid" > > type, titlted -> titled > >> +# >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> + >> +# remove previous $seqres.full before test >> +rm -f $seqres.full >> + >> +# real QA test starts here >> + >> +# Modify as appropriate. >> +_supported_fs btrfs >> +_supported_os Linux >> +_require_scratch_dev_pool 2 >> +_scratch_dev_pool_get 2 >> + >> +# The wrong check limit is based on node size, to reduce runtime, we only >> +# support 4K page size system for now >> +if [ $(get_page_size) != 4096 ]; then >> + _notrun "This test need 4k page size" >> +fi >> + >> +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}') >> +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}') >> + >> +echo device_1=$device_1 device_2=$device_2 >> $seqres.full >> + >> +# Use 4K nodesize to reduce runtime > > How does the node size reduces runtime? The old wrong check is based on how large a chunk item can be. The macro we use is BTRFS_MAX_DEVS() which takes fs_info->nodesize to calculate. The largest chunk item (or any item) is based on nodesize. Thus nodesize will affect the runtime. > It's obvious when one wants to create deeper/larger trees for some > purpose, but this test doesn't populate the filesystem, it uses an > empty filesystem. > So that deserves an explanation of how the node size influences the > test's running time. Indeed. > >> +_scratch_mkfs -n 4k >> $seqres.full >> +_scratch_mount >> + >> +# Add and remove device in a loop, one loop will increase devid by 2. > > " ... one loop will ..." -> "... each iteration will ..." > >> +# for 4k nodesize, the wrong check will be triggered at devid 123. > > Why 123? It's not clear to me why, the test case doesn't explain and > neither does the btrfs patch that fixes the regression. > If it's related to the value of the constants/functions > BTRFS_MAX_DEVS and BTRFS_MAX_DEVS_SYS_CHUNK, it should be mentioned > here. OK, I'll mention > >> +# here 64 is enough to trigger such regression >> +for (( i = 0; i < 64; i++ )); do >> + $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT >> + $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT >> + $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT >> + $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT >> +done >> +_scratch_dev_pool_put >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out >> new file mode 100644 >> index ..7bfd50ff >> --- /dev/null >> +++ b/tests/btrfs/194.out >> @@ -0,0 +1,2 @@ >> +QA output created by 194 >> +Silence is golden >> diff --git a/tests/btrfs/group b/tests/btrfs/group >> index b92cb12c..ef1f0e1b 100644 >> --- a/tests/btrfs/group >> +++ b/tests/btrfs/group >> @@ -196,3 +196,4 @@ >> 191 auto quick send dedupe >> 192 auto replay snapshot stress >> 193 auto quick qgroup enospc limit >> +194 auto > > Maybe volume group as well. Thanks for the hint, I was looking into the groups but can't find a good candidate. At least volume makes more sense. Thanks, Qu > > Thanks Qu. > >> -- >> 2.22.0 >> > > signature.asc Description: OpenPGP digital signature
Re: Btrfs partition mount error
On 2019/9/30 下午6:57, Qu Wenruo wrote: > > > On 2019/9/30 下午6:23, Andrey Ivanov wrote: >> On 30.09.2019 13:10, Qu Wenruo wrote: I had built and run it: # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X /dev/sdc1 incorrect offsets 15223 15287 Open ctree failed >>> >>> That branch updated to skip the item offset check completely. >>> >>> But please keep in mind that, that check itself still makes sense, so >>> please use the original "btrfs check" to check the fs. >> >> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block >> -X /dev/sdc1 >> key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf >> 1073755934 > 16283 >> Open ctree failed > > This means there are more corruptions in just that one leaf. > > It's really hard to fix them one by one manually. > > I'd recommend to go btrfs-restore or use this patchset: > https://patchwork.kernel.org/project/linux-btrfs/list/?series=170715 > > Then mount with ro,rescue=skipbg. > > Either way, the bitflip is way beyond my imagination. One strange thing hit me, not sure if it's just a coincident. We have another report internally about a similar corruption (multiple bit flips in a single fs), and they are also using VMware, along with VMware guest kernel modules. Would you mind to migrate to KVM based hypervisor to see if the corruption happens again? Thanks, Qu > > Thanks, > Qu > >> >> # btrfs check /dev/sdc1 >> Opening filesystem to check... >> incorrect offsets 15223 15287 >> ERROR: cannot open file system > signature.asc Description: OpenPGP digital signature
[PATCH] btrfs: removed unused return variable
From: Aliasgar Surti Removed unused return variable and replaced it with returning the value directly Signed-off-by: Aliasgar Surti --- fs/btrfs/disk-io.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 044981c..c80fa67 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct rb_node *node; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_delayed_ref_node *ref; - int ret = 0; delayed_refs = &trans->delayed_refs; @@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, if (atomic_read(&delayed_refs->num_entries) == 0) { spin_unlock(&delayed_refs->lock); btrfs_info(fs_info, "delayed_refs has NO entry"); - return ret; + return 0; } while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { @@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, spin_unlock(&delayed_refs->lock); - return ret; + return 0; } static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) -- 2.7.4
Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation
On 24.09.19 г. 11:11 ч., Qu Wenruo wrote: > There are at least two bug reports of kernel tree-checker complaining > about invalid inode generation. > > All offending inodes seem to be caused by old kernel around 2014, with > inode generation overflow. > > So add such check and repair ability to lowmem mode check first. > > This involves: > - Calculate the inode generation upper limit > If it's an inode from log tree, then the upper limit is > super_generation + 1, otherwise it's super_generation. > > - Check if the inode generation is larger than the upper limit > > - Repair by resetting inode generation to current transaction > generation > > Reported-by: Charles Wright > Signed-off-by: Qu Wenruo Tested-by: Nikolay Borisov There is one small nit with the assert once rectified you can add: Reviewed-by: Nikolay Borisov > --- > check/mode-lowmem.c | 76 + > 1 file changed, 76 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 5f7f101d..7af29ba9 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -2472,6 +2472,59 @@ static bool has_orphan_item(struct btrfs_root *root, > u64 ino) > return false; > } > > +static int repair_inode_gen_lowmem(struct btrfs_root *root, > +struct btrfs_path *path) > +{ > + struct btrfs_trans_handle *trans; > + struct btrfs_inode_item *ii; > + struct btrfs_key key; > + u64 transid; > + int ret; > + > + trans = btrfs_start_transaction(root, 1); > + if (IS_ERR(trans)) { > + ret = PTR_ERR(trans); > + errno = -ret; > + error("failed to start transaction for inode gen repair: %m"); > + return ret; > + } > + transid = trans->transid; > + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); > + ASSERT(key.type == BTRFS_INODE_ITEM_KEY); nit: This function's sole caller, check_inode_item, is guaranteed to be called with a path pointing to BTRFS_INODE_ITEM_KEY thanks to the logic in the 'for' loop in process_one_leaf. This renders the assert redundant. At the very least I think it should be moved to check_inode_item. > + > + btrfs_release_path(path); > + > + ret = btrfs_search_slot(trans, root, &key, path, 0, 1); > + if (ret > 0) { > + ret = -ENOENT; > + error("no inode item found for ino %llu", key.objectid); > + goto error; > + } > + if (ret < 0) { > + errno = -ret; > + error("failed to find inode item for ino %llu: %m", > + key.objectid); > + goto error; > + } > + ii = btrfs_item_ptr(path->nodes[0], path->slots[0], > + struct btrfs_inode_item); > + btrfs_set_inode_generation(path->nodes[0], ii, trans->transid); > + btrfs_mark_buffer_dirty(path->nodes[0]); > + ret = btrfs_commit_transaction(trans, root); > + if (ret < 0) { > + errno = -ret; > + error("failed to commit transaction: %m"); > + goto error; > + } > + printf("reseting inode generation to %llu for ino %llu\n", > + transid, key.objectid); > + return ret; > + > +error: > + btrfs_abort_transaction(trans, ret); > + return ret; > +} > + > /* > * Check INODE_ITEM and related ITEMs (the same inode number) > * 1. check link count > @@ -2487,6 +2540,7 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > struct btrfs_inode_item *ii; > struct btrfs_key key; > struct btrfs_key last_key; > + struct btrfs_super_block *super = root->fs_info->super_copy; > u64 inode_id; > u32 mode; > u64 flags; > @@ -2497,6 +2551,8 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > u64 refs = 0; > u64 extent_end = 0; > u64 extent_size = 0; > + u64 generation; > + u64 gen_uplimit; > unsigned int dir; > unsigned int nodatasum; > bool is_orphan = false; > @@ -2527,6 +2583,7 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > flags = btrfs_inode_flags(node, ii); > dir = imode_to_type(mode) == BTRFS_FT_DIR; > nlink = btrfs_inode_nlink(node, ii); > + generation = btrfs_inode_generation(node, ii); > nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; > > if (!is_valid_imode(mode)) { > @@ -2540,6 +2597,25 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > } > } > > + if (btrfs_super_log_root(super) != 0 && > + root->objectid == BTRFS_TREE_LOG_OBJECTID) > + gen_uplimit = btrfs_super_generation(super) + 1; > + else > + gen_uplimit = btrfs_super_generation(super); > + > + if (generation > gen_uplimit) { > + error( > + "invalid inode generation for i
Re: [PATCH] btrfs: removed unused return variable
On 30.09.19 г. 14:17 ч., Aliasgar Surti wrote: > From: Aliasgar Surti > > Removed unused return variable and replaced it with returning > the value directly > > Signed-off-by: Aliasgar Surti Actually this function should be turned to void and even the declaration at the top of disk-io can be removed. > --- > fs/btrfs/disk-io.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 044981c..c80fa67 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct > btrfs_transaction *trans, > struct rb_node *node; > struct btrfs_delayed_ref_root *delayed_refs; > struct btrfs_delayed_ref_node *ref; > - int ret = 0; > > delayed_refs = &trans->delayed_refs; > > @@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct > btrfs_transaction *trans, > if (atomic_read(&delayed_refs->num_entries) == 0) { > spin_unlock(&delayed_refs->lock); > btrfs_info(fs_info, "delayed_refs has NO entry"); > - return ret; > + return 0; > } > > while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { > @@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct > btrfs_transaction *trans, > > spin_unlock(&delayed_refs->lock); > > - return ret; > + return 0; > } > > static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) >
Re: Btrfs partition mount error
We have another report internally about a similar corruption (multiple bit flips in a single fs), and they are also using VMware, along with VMware guest kernel modules. Would you mind to migrate to KVM based hypervisor to see if the corruption happens again? I had this problem with btrfs after more than a year of using btrfs and vmware. If I switch to KVM based hypervisor, then I am not sure that this problem will occur again in a short period of time. I used virtualbox earlier, but switched to vmware because virtualbox has slow graphics.
Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation
On 2019/9/30 下午7:36, Nikolay Borisov wrote: > > > On 24.09.19 г. 11:11 ч., Qu Wenruo wrote: >> There are at least two bug reports of kernel tree-checker complaining >> about invalid inode generation. >> >> All offending inodes seem to be caused by old kernel around 2014, with >> inode generation overflow. >> >> So add such check and repair ability to lowmem mode check first. >> >> This involves: >> - Calculate the inode generation upper limit >> If it's an inode from log tree, then the upper limit is >> super_generation + 1, otherwise it's super_generation. >> >> - Check if the inode generation is larger than the upper limit >> >> - Repair by resetting inode generation to current transaction >> generation >> >> Reported-by: Charles Wright >> Signed-off-by: Qu Wenruo > > Tested-by: Nikolay Borisov > > There is one small nit with the assert once rectified you can add: > > Reviewed-by: Nikolay Borisov > >> --- >> check/mode-lowmem.c | 76 + >> 1 file changed, 76 insertions(+) >> >> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c >> index 5f7f101d..7af29ba9 100644 >> --- a/check/mode-lowmem.c >> +++ b/check/mode-lowmem.c >> @@ -2472,6 +2472,59 @@ static bool has_orphan_item(struct btrfs_root *root, >> u64 ino) >> return false; >> } >> >> +static int repair_inode_gen_lowmem(struct btrfs_root *root, >> + struct btrfs_path *path) >> +{ >> +struct btrfs_trans_handle *trans; >> +struct btrfs_inode_item *ii; >> +struct btrfs_key key; >> +u64 transid; >> +int ret; >> + >> +trans = btrfs_start_transaction(root, 1); >> +if (IS_ERR(trans)) { >> +ret = PTR_ERR(trans); >> +errno = -ret; >> +error("failed to start transaction for inode gen repair: %m"); >> +return ret; >> +} >> +transid = trans->transid; > >> +btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); >> +ASSERT(key.type == BTRFS_INODE_ITEM_KEY); > > nit: This function's sole caller, check_inode_item, is guaranteed to be > called with a path pointing to BTRFS_INODE_ITEM_KEY thanks to the logic > in the 'for' loop in process_one_leaf. This renders the assert > redundant. At the very least I think it should be moved to > check_inode_item. Yes, the ASSERT() doesn't make much sense by itself. However I still believe it won't be a problem. It's compiler's job to remove such dead ASSERT(), but for human reader, I still believe this ASSERT() could still make sense, especially when the caller or callee can get more and more complex. Thanks, Qu > >> + >> +btrfs_release_path(path); >> + >> +ret = btrfs_search_slot(trans, root, &key, path, 0, 1); >> +if (ret > 0) { >> +ret = -ENOENT; >> +error("no inode item found for ino %llu", key.objectid); >> +goto error; >> +} >> +if (ret < 0) { >> +errno = -ret; >> +error("failed to find inode item for ino %llu: %m", >> + key.objectid); >> +goto error; >> +} >> +ii = btrfs_item_ptr(path->nodes[0], path->slots[0], >> +struct btrfs_inode_item); >> +btrfs_set_inode_generation(path->nodes[0], ii, trans->transid); >> +btrfs_mark_buffer_dirty(path->nodes[0]); >> +ret = btrfs_commit_transaction(trans, root); >> +if (ret < 0) { >> +errno = -ret; >> +error("failed to commit transaction: %m"); >> +goto error; >> +} >> +printf("reseting inode generation to %llu for ino %llu\n", >> +transid, key.objectid); >> +return ret; >> + >> +error: >> +btrfs_abort_transaction(trans, ret); >> +return ret; >> +} >> + >> /* >> * Check INODE_ITEM and related ITEMs (the same inode number) >> * 1. check link count >> @@ -2487,6 +2540,7 @@ static int check_inode_item(struct btrfs_root *root, >> struct btrfs_path *path) >> struct btrfs_inode_item *ii; >> struct btrfs_key key; >> struct btrfs_key last_key; >> +struct btrfs_super_block *super = root->fs_info->super_copy; >> u64 inode_id; >> u32 mode; >> u64 flags; >> @@ -2497,6 +2551,8 @@ static int check_inode_item(struct btrfs_root *root, >> struct btrfs_path *path) >> u64 refs = 0; >> u64 extent_end = 0; >> u64 extent_size = 0; >> +u64 generation; >> +u64 gen_uplimit; >> unsigned int dir; >> unsigned int nodatasum; >> bool is_orphan = false; >> @@ -2527,6 +2583,7 @@ static int check_inode_item(struct btrfs_root *root, >> struct btrfs_path *path) >> flags = btrfs_inode_flags(node, ii); >> dir = imode_to_type(mode) == BTRFS_FT_DIR; >> nlink = btrfs_inode_nlink(node, ii); >> +generation = btrfs_inode_generation(node, ii); >> nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; >> >> if (!is_valid_imode(mode)) { >> @@ -2540,6 +2597,25 @@ static int check_inode_item(struct bt
Re: Btrfs partition mount error
On 2019/9/30 下午8:10, Andrey Ivanov wrote: >> >> We have another report internally about a similar corruption (multiple >> bit flips in a single fs), and they are also using VMware, along with >> VMware guest kernel modules. >> >> Would you mind to migrate to KVM based hypervisor to see if the >> corruption happens again? > > I had this problem with btrfs after more than a year of using btrfs and > vmware. So it looks like some regression, although still not sure who is to blame. But anyway, I'll add more asserts for those obvious members which should never exceed 64K, to either to make sure it's not btrfs, or to confirm it's us to be blamed. Thanks, Qu > If I switch to KVM based hypervisor, then I am not sure that this > problem will occur again > in a short period of time. > I used virtualbox earlier, but switched to vmware because virtualbox has > slow graphics. signature.asc Description: OpenPGP digital signature
Re: [PATCH] btrfs: removed unused return variable
On Mon, Sep 30, 2019 at 02:53:12PM +0300, Nikolay Borisov wrote: > > > On 30.09.19 г. 14:17 ч., Aliasgar Surti wrote: > > From: Aliasgar Surti > > > > Removed unused return variable and replaced it with returning > > the value directly > > > > Signed-off-by: Aliasgar Surti > > Actually this function should be turned to void and even the declaration > at the top of disk-io can be removed. Okay, will send out v2 incorporating this change. Thanks, Aliasgar > > > --- > > fs/btrfs/disk-io.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index 044981c..c80fa67 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct > > btrfs_transaction *trans, > > struct rb_node *node; > > struct btrfs_delayed_ref_root *delayed_refs; > > struct btrfs_delayed_ref_node *ref; > > - int ret = 0; > > > > delayed_refs = &trans->delayed_refs; > > > > @@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct > > btrfs_transaction *trans, > > if (atomic_read(&delayed_refs->num_entries) == 0) { > > spin_unlock(&delayed_refs->lock); > > btrfs_info(fs_info, "delayed_refs has NO entry"); > > - return ret; > > + return 0; > > } > > > > while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { > > @@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct > > btrfs_transaction *trans, > > > > spin_unlock(&delayed_refs->lock); > > > > - return ret; > > + return 0; > > } > > > > static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) > >
[PATCH v2] btrfs: removed unused return variable
From: Aliasgar Surti Removed unused return variable and replaced it with returning the value directly Signed-off-by: Aliasgar Surti --- v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration --- fs/btrfs/disk-io.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 044981c..0a8243a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -52,8 +52,6 @@ static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); static void btrfs_destroy_ordered_extents(struct btrfs_root *root); -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, - struct btrfs_fs_info *fs_info); static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root); static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, struct extent_io_tree *dirty_pages, @@ -4249,13 +4247,12 @@ static void btrfs_destroy_all_ordered_extents(struct btrfs_fs_info *fs_info) btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); } -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, +static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info) { struct rb_node *node; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_delayed_ref_node *ref; - int ret = 0; delayed_refs = &trans->delayed_refs; @@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, if (atomic_read(&delayed_refs->num_entries) == 0) { spin_unlock(&delayed_refs->lock); btrfs_info(fs_info, "delayed_refs has NO entry"); - return ret; + return; } while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { @@ -4306,8 +4303,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, } spin_unlock(&delayed_refs->lock); - - return ret; } static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) -- 2.7.4
Re: [PATCH v2] btrfs: removed unused return variable
On 30.09.19 г. 16:00 ч., Aliasgar Surti wrote: > From: Aliasgar Surti > > Removed unused return variable and replaced it with returning > the value directly The changelog could be expanded, something like: btrfs_destroy_delayed_refs always returns 0 so it can be converted to void. While at it also remove the unnecessary forward declaration in disk-io.c > > Signed-off-by: Aliasgar Surti Code wise it's ok: Reviewed-by: Nikolay Borisov > --- > v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration > --- > fs/btrfs/disk-io.c | 9 ++--- > 1 file changed, 2 insertions(+), 7 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 044981c..0a8243a 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -52,8 +52,6 @@ > static const struct extent_io_ops btree_extent_io_ops; > static void end_workqueue_fn(struct btrfs_work *work); > static void btrfs_destroy_ordered_extents(struct btrfs_root *root); > -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, > - struct btrfs_fs_info *fs_info); > static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root); > static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, > struct extent_io_tree *dirty_pages, > @@ -4249,13 +4247,12 @@ static void btrfs_destroy_all_ordered_extents(struct > btrfs_fs_info *fs_info) > btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); > } > > -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, > +static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, > struct btrfs_fs_info *fs_info) ^^ nit: It's good practice to also adjust the alignment of following lines. > { > struct rb_node *node; > struct btrfs_delayed_ref_root *delayed_refs; > struct btrfs_delayed_ref_node *ref; > - int ret = 0; > > delayed_refs = &trans->delayed_refs; > > @@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct > btrfs_transaction *trans, > if (atomic_read(&delayed_refs->num_entries) == 0) { > spin_unlock(&delayed_refs->lock); > btrfs_info(fs_info, "delayed_refs has NO entry"); > - return ret; > + return; > } > > while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { > @@ -4306,8 +4303,6 @@ static int btrfs_destroy_delayed_refs(struct > btrfs_transaction *trans, > } > > spin_unlock(&delayed_refs->lock); > - > - return ret; > } > > static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) >
Re: [PATCH v2] btrfs: removed unused return variable
On Mon, Sep 30, 2019 at 06:30:21PM +0530, Aliasgar Surti wrote: > From: Aliasgar Surti > > Removed unused return variable and replaced it with returning > the value directly This change has been sent several times and I give the same answer each time: https://lore.kernel.org/linux-btrfs/20190418154913.go20...@twin.jikos.cz/ "When switching a fuction to return void, please check the whole callgraph for functions that do not properly handler errors and do BUG_ON. You won't see errors passed from them so this gives the impression no error handling is needed in the caller. This has been sent in the past, so I can point you to the 2nd paragraph in https://lore.kernel.org/linux-btrfs/20180815124425.gm24...@twin.jikos.cz/ hint: btrfs_pin_extent"
Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation
On 30.09.19 г. 15:24 ч., Qu Wenruo wrote: > Yes, the ASSERT() doesn't make much sense by itself. > > However I still believe it won't be a problem. It won't be a problem but it feels wrong to have this assert this deep into the call chain. IMO It should be put where it can trigger at the earliest which seems to be in check_inode_item. That function assumes it's working with an inode item and goes to dereference inode members so if the type is wrong we'd crash there instead of in repair_inode_gen_lowmem. > > It's compiler's job to remove such dead ASSERT(), but for human reader, > I still believe this ASSERT() could still make sense, especially when > the caller or callee can get more and more complex. > > Thanks,
Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap
Hi, [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: 4.16+ The bot has tested the following trees: v5.3.1, v5.2.17, v4.19.75. v5.3.1: Build OK! v5.2.17: Build OK! v4.19.75: Failed to apply! Possible dependencies: 7073017aeb98 ("btrfs: use offset_in_page instead of open-coding it") NOTE: The patch will not be queued to stable trees until it is upstream. How should we proceed with this patch? -- Thanks, Sasha
Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap
On Mon, Sep 30, 2019 at 10:20:25AM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we have a buffered write that starts at an offset greater than or > equals to the file's size happening concurrently with a full ranged > fiemap, we can end up leaking an extent state structure. > > Suppose we have a file with a size of 1Mb, and before the buffered write > and fiemap are performed, it has a single extent state in its io tree > representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set. > > The following sequence diagram shows how the memory leak happens if a > fiemap a buffered write, starting at offset 1Mb and with a length of > 4Kb, are performed concurrently. > > CPU 1 CPU 2 > > extent_fiemap() > --> it's a full ranged fiemap > range from 0 to LLONG_MAX - 1 > (9223372036854775807) > > --> locks range in the inode's > io tree > --> after this we have 2 extent > states in the io tree: > --> 1 for range [0, 1Mb[ with > the bits EXTENT_LOCKED and > EXTENT_DELALLOC_BITS set > --> 1 for the range > [1Mb, LLONG_MAX[ with > the EXTENT_LOCKED bit set > > --> start buffered write at > offset > 1Mb with a length of 4Kb > > btrfs_file_write_iter() > > btrfs_buffered_write() > --> cached_state is NULL > > > lock_and_cleanup_extent_if_need() > --> returns 0 and > does not lock > range because it > starts > at current i_size > / eof > > --> cached_state > remains NULL > > btrfs_dirty_pages() > > btrfs_set_extent_delalloc() > (...) > __set_extent_bit() > > --> splits extent > state for range > [1Mb, > LLONG_MAX[ and now we > have 2 extent > states: > > --> one for > the range > [1Mb, 1Mb > + 4Kb[ with > > EXTENT_LOCKED set > --> another > one for the range > [1Mb + > 4Kb, LLONG_MAX[ with > > EXTENT_LOCKED set as well > > --> sets > EXTENT_DELALLOC on the > extent state > for the range > [1Mb, 1Mb + > 4Kb[ > --> caches extent > state > [1Mb, 1Mb + > 4Kb[ into > @cached_state > because it has > the bit > EXTENT_LOCKED set > > --> > btrfs_buffered_write() ends up > with a non-NULL > cached_state and > never calls anything > to release its > reference on it, > resulting in a > memory leak > > Fix this by calling free_extent_state() on cached_state if the range was > not locked by lock_and_cleanup_extent_if_need(). > > The same issue can happen if anything else other than fiemap locks a range > that covers eof and beyond. > > This could be triggered, sporadically, by test case generic/561 from the > fstests suite, which makes duperemove run concurrently with fsstress, and > duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set > the leak is reported in dmesg/syslog when removing the btrfs module with > a message like
Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation
On 2019/9/30 下午9:34, Nikolay Borisov wrote: > > > On 30.09.19 г. 15:24 ч., Qu Wenruo wrote: >> Yes, the ASSERT() doesn't make much sense by itself. >> >> However I still believe it won't be a problem. > > It won't be a problem but it feels wrong to have this assert this deep > into the call chain. IMO It should be put where it can trigger at the > earliest which seems to be in check_inode_item. That function assumes > it's working with an inode item and goes to dereference inode members so > if the type is wrong we'd crash there instead of in repair_inode_gen_lowmem. This is going to be a preference thing. To my taste, if possible, I would put ASSERT() in every function with more than 24 lines, as an alternative to comment, to show the prerequisite or the assumption. Thanks, Qu > >> >> It's compiler's job to remove such dead ASSERT(), but for human reader, >> I still believe this ASSERT() could still make sense, especially when >> the caller or callee can get more and more complex. >> >> Thanks,
[GIT PULL] Btrfs fixes for 5.4-rc2
Hi, a bunch of fixes that accumulated in recent weeks, mostly material for stable. Summary: - fix for regression from 5.3 that prevents to use balance convert with single profile - qgroup fixes: rescan race, accounting leak with multiple writers, potential leak after io failure recovery - fix for use after free in relocation (reported by KASAN) - other error handling fixups Please pull, thanks. The following changes since commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e: btrfs: Relinquish CPUs in btrfs_compare_trees (2019-09-09 14:59:20 +0200) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.4-rc1-tag for you to fetch changes up to d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3: btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls (2019-09-27 15:24:34 +0200) Dennis Zhou (1): btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer Filipe Manana (3): Btrfs: fix selftests failure due to uninitialized i_mode in test inodes Btrfs: fix missing error return if writeback for extent buffer never started Btrfs: fix race setting up and completing qgroup rescan workers Qu Wenruo (4): btrfs: relocation: fix use-after-free on dead relocation roots btrfs: Fix a regression which we can't convert to SINGLE profile btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls fs/btrfs/extent_io.c | 13 + fs/btrfs/qgroup.c| 38 +++--- fs/btrfs/relocation.c| 9 - fs/btrfs/tests/btrfs-tests.c | 8 +++- fs/btrfs/volumes.c | 8 +++- 5 files changed, 58 insertions(+), 18 deletions(-)
[PATCH] Setup GitLab-CI for btrfs-progs
Make use of GitLab-CI nested virutal environment to start QEMU instance inside containers and perform btrfs-progs build, execute unit test cases and save the logs. More details can be found at https://github.com/kdave/btrfs-progs/issues/171 Signed-off-by: Lakshmipathi.G --- .gitlab-ci.yml| 181 ++ gitlab-ci/Dockerfile | 3 + gitlab-ci/btrfs-progs-tests.service | 13 +++ gitlab-ci/build_or_run_btrfs-progs.sh | 37 +++ gitlab-ci/kernel_build.sh | 30 ++ gitlab-ci/run_tests.sh| 9 ++ gitlab-ci/setup_image.sh | 42 7 files changed, 315 insertions(+) create mode 100644 .gitlab-ci.yml create mode 100644 gitlab-ci/Dockerfile create mode 100644 gitlab-ci/btrfs-progs-tests.service create mode 100755 gitlab-ci/build_or_run_btrfs-progs.sh create mode 100755 gitlab-ci/kernel_build.sh create mode 100755 gitlab-ci/run_tests.sh create mode 100755 gitlab-ci/setup_image.sh diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml new file mode 100644 index 000..2afde50 --- /dev/null +++ b/.gitlab-ci.yml @@ -0,0 +1,181 @@ +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public +# License v2 as published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public +# License along with this program; if not, write to the +# Free Software Foundation, Inc., 59 Temple Place - Suite 330, +# Boston, MA 021110-1307, USA. +# + +image: docker:18.09.7 + +services: +- docker:18.09.7-dind + +variables: + DOCKER_DRIVER: overlay2 + +stages: + - build + - btrfs-progs build + - test + +variables: + DOCKER_DRIVER: overlay2 + IMAGE_TAG: registry.gitlab.com/$CI_PROJECT_NAMESPACE/btrfs-progs:gitlab-ci + +before_script: + - docker login --username $CI_REGISTRY_USER --password $CI_REGISTRY_PASSWORD $CI_REGISTRY + +docker build: + stage: build + script: +- cd gitlab-ci +- docker pull $IMAGE_TAG > /dev/null && echo "Downloaded image" || ( docker build -t $IMAGE_TAG . && docker push $IMAGE_TAG ) +- cd .. + +## To enable or disable Kernel Build set BUILD_KERNEL: "1" or BUILD_KERNEL: "0" +## If you disable Kernel Build, make sure PREBUILT_KERNEL_ID points to previously built the kernel job id. + +kernel build: + variables: +BUILD_KERNEL: "1" +PREBUILT_KERNEL_ID: "288159334" + before_script: +- apk add curl unzip + stage: build + services: +- docker:18.09.7-dind + script: + - if [ "$BUILD_KERNEL" == "1" ]; then + docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/kernel_build.sh; + else + curl -o bzImage.zip --location --header "JOB-TOKEN:$CI_JOB_TOKEN" "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_KERNEL_ID/artifacts"; && unzip bzImage.zip; + fi; + artifacts: +when: always +paths: + - bzImage + +# To enable or disable image build update BUILD_IMAGE value to "1" or "0". +# If you disable Image Build, make sure PREBUILT_IMAGE_ID points to previously built rootfs job id. + +image build: + variables: +BUILD_IMAGE: "1" +PREBUILT_IMAGE_ID: "288506168" + before_script: +- apk add curl unzip + stage: build + services: +- docker:18.09.7-dind + script: + - if [ "$BUILD_IMAGE" == "1" ]; then + docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/setup_image.sh; + else + curl -o qemu-image.img.zip --location --header "JOB-TOKEN:$CI_JOB_TOKEN" "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_IMAGE_ID/artifacts"; && unzip qemu-image.img.zip; + fi; + artifacts: +when: always +paths: + - qemu-image.img + +btrfs-progs build: + stage: btrfs-progs build + services: +- docker:18.09.7-dind + script: + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh + artifacts: +expire_in: 1 week +when: always +paths: + - qemu-image.img + +cli tests: + stage: test + services: +- docker:18.09.7-dind + script: + - echo "./cli-tests.sh" > $PWD/cmd + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh + - test -e "result" || exit 1 # If result doesn't exists, job failed. + artifacts: +when: always +paths: + - "*tests-results.txt" + +convert tests: + only: +- devel + stage: test + services: +- docker:18.09
[PATCH] btrfs: fix incorrect updating of log root tree
We've historically had reports of being unable to mount file systems because the tree log root couldn't be read. Usually this is the "parent transid failure", but could be any of the related errors, including "fsid mismatch" or "bad tree block", depending on which block got allocated. The modification of the individual log root items are serialized on the per-log root root_mutex. This means that any modification to the per-subvol log root_item is completely protected. However we update the root item in the log root tree outside of the log root tree log_mutex. We do this in order to allow multiple subvolumes to be updated in each log transaction. This is problematic however because when we are writing the log root tree out we update the super block with the _current_ log root node information. Since these two operations happen independently of each other, you can end up updating the log root tree in between writing out the dirty blocks and setting the super block to point at the current root. This means we'll point at the new root node that hasn't been written out, instead of the one we should be pointing at. Thus whatever garbage or old block we end up pointing at complains when we mount the file system later and try to replay the log. Fix this by copying the log's root item into a local root item copy. Then once we're safely under the log_root_tree->log_mutex we update the root item in the log_root_tree. This way we do not modify the log_root_tree while we're committing it, fixing the problem. cc: sta...@vger.kernel.org Signed-off-by: Josef Bacik Reviewed-by: Chris Mason --- fs/btrfs/tree-log.c | 36 +++- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 7cac09a6f007..1d7f22951ef2 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2908,7 +2908,8 @@ static int walk_log_tree(struct btrfs_trans_handle *trans, * in the tree of log roots */ static int update_log_root(struct btrfs_trans_handle *trans, - struct btrfs_root *log) + struct btrfs_root *log, + struct btrfs_root_item *root_item) { struct btrfs_fs_info *fs_info = log->fs_info; int ret; @@ -2916,10 +2917,10 @@ static int update_log_root(struct btrfs_trans_handle *trans, if (log->log_transid == 1) { /* insert root item on the first sync */ ret = btrfs_insert_root(trans, fs_info->log_root_tree, - &log->root_key, &log->root_item); + &log->root_key, root_item); } else { ret = btrfs_update_root(trans, fs_info->log_root_tree, - &log->root_key, &log->root_item); + &log->root_key, root_item); } return ret; } @@ -3017,6 +3018,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log = root->log_root; struct btrfs_root *log_root_tree = fs_info->log_root_tree; + struct btrfs_root_item new_root_item; int log_transid = 0; struct btrfs_log_ctx root_log_ctx; struct blk_plug plug; @@ -3080,17 +3082,25 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, goto out; } + /* +* We _must_ update under the root->log_mutex in order to make sure we +* have a consistent view of the log root we are trying to commit at +* this moment. +* +* We _must_ copy this into a local copy, because we are not holding the +* log_root_tree->log_mutex yet. This is important because when we +* commit the log_root_tree we must have a consistent view of the +* log_root_tree when we update the super block to point at the +* log_root_tree bytenr. If we update the log_root_tree here we'll race +* with the commit and possibly point at the new block which we may not +* have written out. +*/ btrfs_set_root_node(&log->root_item, log->node); + memcpy(&new_root_item, &log->root_item, sizeof(new_root_item)); root->log_transid++; log->log_transid = root->log_transid; root->log_start_pid = 0; - /* -* Update or create log root item under the root's log_mutex to prevent -* races with concurrent log syncs that can lead to failure to update -* log root item because it was not created yet. -*/ - ret = update_log_root(trans, log); /* * IO has been started, blocks of the log tree have WRITTEN flag set * in their headers. new modifications of the log will be written to @@ -3111,6 +3121,14 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(&log_root_tree->log_mutex); mutex_lock(&log_root_tree->log_mutex); + +
BTRFS corruption, mounts but comes read-only
Quick history, the system was rejecting ssh connections, and was not responding to ACPI shutdown, so I had to force power off. On restart, everything looked ok, but when I did a scrub, the system quickly reached a point of forced read-only., (kernel Ubuntu 4.15.0-52) I installed a current Arch for recovery, (kernel 5.3.1 and btrfs-progs 5.2.2) Here is the result of btrfs check btrfs check /dev/sdd4 Opening filesystem to check... Checking filesystem on /dev/sdd4 UUID: cd54199e-753f-4bac-862d-c9d353d3e155 [1/7] checking root items [2/7] checking extents corrupt extent record: key [5126707298304,169,16384] incorrect local backref count on 5126707298304 root 5890615246848 owner 1507958538109110 offset 386035672063981056 found 0 wanted 11927552 back 0x55737707d980 backref disk bytenr does not match extent record, bytenr=5126707298304, ref bytenr=0 tree backref 5126707298304 parent 5890436890624 root 5890436890624 not found in extent tree tree backref 5126707298304 parent 5890463039488 root 5890463039488 not found in extent tree tree backref 5126707298304 parent 5125855133696 root 5125855133696 not found in extent tree tree backref 5126707298304 parent 4717386399744 root 4717386399744 not found in extent tree tree backref 5126707298304 parent 5125777653760 root 5125777653760 not found in extent tree tree backref 5126707298304 parent 5126231179264 root 5126231179264 not found in extent tree tree backref 5126707298304 parent 4716869992448 root 4716869992448 not found in extent tree tree backref 5126707298304 parent 4716547129344 root 4716547129344 not found in extent tree tree backref 5126707298304 parent 4716575670272 root 4716575670272 not found in extent tree tree backref 5126707298304 parent 4418457927680 root 4418457927680 not found in extent tree tree backref 5126707298304 parent 4418653339648 root 4418653339648 not found in extent tree tree backref 5126707298304 parent 4716604555264 root 4716604555264 not found in extent tree tree backref 5126707298304 parent 4716953616384 root 4716953616384 not found in extent tree tree backref 5126707298304 parent 5890615246848 root 5890615246848 not found in extent tree tree backref 5126707298304 parent 4134522355712 root 4134522355712 not found in extent tree tree backref 5126707298304 parent 4418052734976 root 4418052734976 not found in extent tree tree backref 5126707298304 parent 4034125447168 root 4034125447168 not found in extent tree tree backref 5126707298304 parent 4034188083200 root 4034188083200 not found in extent tree tree backref 5126707298304 parent 4133585420288 root 4133585420288 not found in extent tree tree backref 5126707298304 parent 4034073935872 root 4034073935872 not found in extent tree tree backref 5126707298304 parent 4032730021888 root 4032730021888 not found in extent tree tree backref 5126707298304 parent 4033156546560 root 4033156546560 not found in extent tree tree backref 5126707298304 parent 4031949127680 root 4031949127680 not found in extent tree tree backref 5126707298304 parent 4032101384192 root 4032101384192 not found in extent tree tree backref 5126707298304 parent 4032197132288 root 4032197132288 not found in extent tree tree backref 5126707298304 parent 4034077949952 root 4034077949952 not found in extent tree tree backref 5126707298304 parent 4031202394112 root 4031202394112 not found in extent tree tree backref 5126707298304 parent 4030981292032 root 4030981292032 not found in extent tree tree backref 5126707298304 parent 3874566619136 root 3874566619136 not found in extent tree tree backref 5126707298304 parent 3874229714944 root 3874229714944 not found in extent tree tree backref 5126707298304 parent 3873657765888 root 3873657765888 not found in extent tree tree backref 5126707298304 parent 3874460270592 root 3874460270592 not found in extent tree tree backref 5126707298304 parent 4030556012544 root 4030556012544 not found in extent tree tree backref 5126707298304 parent 4031297765376 root 4031297765376 not found in extent tree tree backref 5126707298304 parent 4418354544640 root 4418354544640 not found in extent tree backpointer mismatch on [5126707298304 16384] bad extent [5126707298304, 5126707314688), type mismatch with chunk ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree cache and super generation don't match, space cache will be invalidated [4/7] checking fs roots [5/7] checking only csums items (without verifying data) [6/7] checking root refs [7/7] checking quota groups skipped (not enabled on this FS) found 4501017206864 bytes used, error(s) found total csum bytes: 4386928432 total tree bytes: 9074900992 total fs tree bytes: 3698982912 total extent tree bytes: 377012224 btree space waste bytes: 1430502412 file data blocks allocated: 16729280847872 referenced 8948709453824 Thank you.
Re: [GIT PULL] Btrfs fixes for 5.4-rc2
The pull request you sent on Mon, 30 Sep 2019 16:25:08 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.4-rc1-tag has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/bb48a59135926ece9b1361e8b96b33fc658830bc Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH][v3] btrfs: use refcount_inc_not_zero in kill_all_nodes
On Thu, Sep 26, 2019 at 08:29:32AM -0400, Josef Bacik wrote: > We hit the following warning while running down a different problem > > [ 6197.175850] [ cut here ] > [ 6197.185082] refcount_t: underflow; use-after-free. > [ 6197.194704] WARNING: CPU: 47 PID: 966 at lib/refcount.c:190 > refcount_sub_and_test_checked+0x53/0x60 > [ 6197.521792] Call Trace: > [ 6197.526687] __btrfs_release_delayed_node+0x76/0x1c0 > [ 6197.536615] btrfs_kill_all_delayed_nodes+0xec/0x130 > [ 6197.546532] ? __btrfs_btree_balance_dirty+0x60/0x60 > [ 6197.556482] btrfs_clean_one_deleted_snapshot+0x71/0xd0 > [ 6197.566910] cleaner_kthread+0xfa/0x120 > [ 6197.574573] kthread+0x111/0x130 > [ 6197.581022] ? kthread_create_on_node+0x60/0x60 > [ 6197.590086] ret_from_fork+0x1f/0x30 > [ 6197.597228] ---[ end trace 424bb7ae00509f56 ]--- > > This is because the free side drops the ref without the lock, and then > takes the lock if our refcount is 0. So you can have nodes on the tree > that have a refcount of 0. This sounds like breaking the assumptions of the refcounts, if the object is in the tree it should not be accessible by anything else than the freeing code, with refs == 0. Now the delayed nodes do not follow that and there were bugs where the 0 was increased again (ec35e48b286959991c "btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes"). Your patch fixes another instance, I still see some other refcount_inc that seem to be safe but the call to refcount_inc_not_zero should have a comment, like is in the mentioned commit. > Fix this by zero'ing out that element in our > temporary array so we don't try to kill it again. The fix looks correct.
Re: BTRFS corruption, mounts but comes read-only
Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log: [10524.830640] BTRFS info (device sdf4): using free space tree [10524.830644] BTRFS info (device sdf4): has skinny extents [10549.561143] BTRFS info (device sdf4): checking UUID tree [10606.900264] BTRFS info (device sdf4): leaf 3873928691712 gen 935882 total ptrs 147 free space 6236 owner 2 [10606.900268] item 0 key (5126703497216 169 0) itemoff 15845 itemsize 438 [10606.900270] extent refs 46 gen 885323 flags 2 [10606.900272] ref#0: tree block backref root 10262 [10606.900274] ref#1: shared block backref parent 6300271443968 [10606.900276] ref#2: shared block backref parent 6300177645568 [10606.900278] ref#3: shared block backref parent 6300087877632 [10606.900279] ref#4: shared block backref parent 6299851292672 [10606.900281] ref#5: shared block backref parent 6299576254464 [10606.900283] ref#6: shared block backref parent 6299308621824 [10606.900284] ref#7: shared block backref parent 6240101318656 [10606.900286] ref#8: shared block backref parent 6239972589568 [10606.900288] ref#9: shared block backref parent 5891141632000 [10606.900289] ref#10: shared block backref parent 5891082911744 [10606.900291] ref#11: shared block backref parent 5891042951168 [10606.900293] ref#12: shared block backref parent 5890615246848 [10606.900294] ref#13: shared block backref parent 5890463039488 [10606.900296] ref#14: shared block backref parent 5890436890624 [10606.900298] ref#15: shared block backref parent 5126231179264 [10606.900299] ref#16: shared block backref parent 5125855133696 [10606.900301] ref#17: shared block backref parent 5125777653760 [10606.900302] ref#18: shared block backref parent 4717386399744 [10606.900304] ref#19: shared block backref parent 4716953616384 [10606.900306] ref#20: shared block backref parent 4716869992448 [10606.900307] ref#21: shared block backref parent 4716604555264 [10606.900309] ref#22: shared block backref parent 4716575670272 [10606.900311] ref#23: shared block backref parent 4716547129344 [10606.900312] ref#24: shared block backref parent 4418653339648 [10606.900314] ref#25: shared block backref parent 4418457927680 [10606.900316] ref#26: shared block backref parent 4418052734976 [10606.900317] ref#27: shared block backref parent 4134522355712 [10606.900319] ref#28: shared block backref parent 4133585420288 [10606.900320] ref#29: shared block backref parent 4034188083200 [10606.900322] ref#30: shared block backref parent 4034125447168 [10606.900324] ref#31: shared block backref parent 4034077949952 [10606.900325] ref#32: shared block backref parent 4034073935872 [10606.900327] ref#33: shared block backref parent 4033156546560 [10606.900329] ref#34: shared block backref parent 4032730021888 [10606.900330] ref#35: shared block backref parent 4032197132288 [10606.900332] ref#36: shared block backref parent 4032101384192 [10606.900333] ref#37: shared block backref parent 4031949127680 [10606.900335] ref#38: shared block backref parent 4031297765376 [10606.900337] ref#39: shared block backref parent 4031202394112 [10606.900338] ref#40: shared block backref parent 4030981292032 [10606.900340] ref#41: shared block backref parent 4030556012544 [10606.900342] ref#42: shared block backref parent 3874566619136 [10606.900343] ref#43: shared block backref parent 3874460270592 [10606.900345] ref#44: shared block backref parent 3874229714944 [10606.900347] ref#45: shared block backref parent 3873657765888 [10606.900350] item 1 key (5126703529984 169 0) itemoff 15812 itemsize 33 [10606.900351] extent refs 1 gen 571786 flags 258 [10606.900352] ref#0: tree block backref root 46426 [10606.900355] item 2 key (5126703546368 169 0) itemoff 15779 itemsize 33 [10606.900357] extent refs 1 gen 626172 flags 2 [10606.900390] ref#0: tree block backref root 7 [10606.900393] item 3 key (5126703562752 169 0) itemoff 15746 itemsize 33 [10606.900395] extent refs 1 gen 626172 flags 2 [10606.900396] ref#0: tree block backref root 7 [10606.900399] item 4 key (5126703579136 169 0) itemoff 15713 itemsize 33 [10606.900401] extent refs 1 gen 626172 flags 2 [10606.900402] ref#0: tree block backref root 7 [10606.900404] item 5 key (5126703661056 169 0) itemoff 15680 itemsize 33 [10606.900406] extent refs 1 gen 573020 flags 258 [10606.900407] ref#0: shared block backref parent 5891148972032 [10606.900410] item 6 key (5126703677440 169 0) itemoff 15647 itemsize 33 [10606.900412] extent refs 1 gen 626172 flags 2 [10606.9004
Re: [PATCH RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()
On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote: > [BUG] > There is one BUG_ON() report where a transaction is aborted during > balance, then kernel BUG_ON() in merge_reloc_roots(): Do you have details from the report, eg. what's the error code? > void merge_reloc_roots(struct reloc_control *rc) > { > ... > BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<< > } > > [CAUSE] > It's still uncertain why we can get to such situation. > As all __add_reloc_root() calls will also link that reloc root to > rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots. > > So the root cause is still uncertain. > > [FIX] > But we can still hunt down all the BUG_ON() in merge_reloc_roots(). > > There are 3 BUG_ON()s in it: > - BUG_ON() for read_fs_root() result > - BUG_ON() for root->reloc_root != reloc_root case > - BUG_ON() for the non-empty reloc_root_tree relocation.c is worst regarding number of BUG_ONs, some of them look like runtime assertions of the invariants but some of them are substituting for error handling. The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are assertions > > For the first two, just grab the return value and goto out tag for > cleanup. > > For the last one, make it more graceful by: > - grab the lock before doing read/write > - warn instead of panic > - cleanup the nodes in rc->reloc_root_tree > > Signed-off-by: Qu Wenruo > --- > Reason for RFC: > The root cause to leak nodes in reloc_root_tree is still unknown. > --- > fs/btrfs/relocation.c | 39 --- > 1 file changed, 36 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 655f1d5a8c27..d562b5c52a40 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc) > if (btrfs_root_refs(&reloc_root->root_item) > 0) { > root = read_fs_root(fs_info, > reloc_root->root_key.offset); > - BUG_ON(IS_ERR(root)); > - BUG_ON(root->reloc_root != reloc_root); > + if (IS_ERR(root)) { This is bug_on -> error handling, ok > + ret = PTR_ERR(root); > + btrfs_err(fs_info, > + "failed to read root %llu: %d", > + reloc_root->root_key.offset, ret); > + goto out; > + } > + if (root->reloc_root != reloc_root) { With this one I'm not sure it could happen but replacing the bug on is still good. > + btrfs_err(fs_info, > + "reloc root mismatch for root %llu", Would be good to print the number of the other root as well. > + root->root_key.objectid); > + ret = -EINVAL; > + goto out; > + } > > ret = merge_reloc_root(rc, root); > if (ret) { > + btrfs_err(fs_info, > + "failed to merge reloc tree for root %llu: %d", > + root->root_key.objectid, ret); > if (list_empty(&reloc_root->root_list)) > list_add_tail(&reloc_root->root_list, > &reloc_roots); > @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc) > free_reloc_roots(&reloc_roots); > } > > - BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); This one looks more like the invariant, the tree should be really empty here. While the cleanup is trying to make things work despite there's a problem, I think the warning should not be debugging only. > + spin_lock(&rc->reloc_root_tree.lock); > + /* Cleanup dirty reloc tree nodes */ > + if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) { > + struct mapping_node *node; > + struct mapping_node *next; > + > + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); ...
Re: Btrfs partition mount error
On 30.09.2019 15:27, Qu Wenruo wrote: We have another report internally about a similar corruption (multiple bit flips in a single fs), and they are also using VMware, along with VMware guest kernel modules. Would you mind to migrate to KVM based hypervisor to see if the corruption happens again? I had this problem with btrfs after more than a year of using btrfs and vmware. So it looks like some regression, although still not sure who is to blame. One more clarification. I had this problem with btrfs after switching from linux-4.19.52-gentoo kernel to linux-4.19.66-gentoo.
Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap
On Mon, Sep 30, 2019 at 10:20:25AM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we have a buffered write that starts at an offset greater than or > equals to the file's size happening concurrently with a full ranged > fiemap, we can end up leaking an extent state structure. > > Suppose we have a file with a size of 1Mb, and before the buffered write > and fiemap are performed, it has a single extent state in its io tree > representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set. > > The following sequence diagram shows how the memory leak happens if a > fiemap a buffered write, starting at offset 1Mb and with a length of > 4Kb, are performed concurrently. > > CPU 1 CPU 2 > > extent_fiemap() > --> it's a full ranged fiemap > range from 0 to LLONG_MAX - 1 > (9223372036854775807) > > --> locks range in the inode's > io tree > --> after this we have 2 extent > states in the io tree: > --> 1 for range [0, 1Mb[ with > the bits EXTENT_LOCKED and > EXTENT_DELALLOC_BITS set > --> 1 for the range > [1Mb, LLONG_MAX[ with > the EXTENT_LOCKED bit set > > --> start buffered write at > offset > 1Mb with a length of 4Kb > > btrfs_file_write_iter() > > btrfs_buffered_write() > --> cached_state is NULL > > > lock_and_cleanup_extent_if_need() > --> returns 0 and > does not lock > range because it > starts > at current i_size > / eof > > --> cached_state > remains NULL > > btrfs_dirty_pages() > > btrfs_set_extent_delalloc() > (...) > __set_extent_bit() > > --> splits extent > state for range > [1Mb, > LLONG_MAX[ and now we > have 2 extent > states: > > --> one for > the range > [1Mb, 1Mb > + 4Kb[ with > > EXTENT_LOCKED set > --> another > one for the range > [1Mb + > 4Kb, LLONG_MAX[ with > > EXTENT_LOCKED set as well > > --> sets > EXTENT_DELALLOC on the > extent state > for the range > [1Mb, 1Mb + > 4Kb[ > --> caches extent > state > [1Mb, 1Mb + > 4Kb[ into > @cached_state > because it has > the bit > EXTENT_LOCKED set > > --> > btrfs_buffered_write() ends up > with a non-NULL > cached_state and > never calls anything > to release its > reference on it, > resulting in a > memory leak > > Fix this by calling free_extent_state() on cached_state if the range was > not locked by lock_and_cleanup_extent_if_need(). > > The same issue can happen if anything else other than fiemap locks a range > that covers eof and beyond. > > This could be triggered, sporadically, by test case generic/561 from the > fstests suite, which makes duperemove run concurrently with fsstress, and > duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set > the leak is reported in dmesg/syslog when removing the btrfs module with > a message like
Re: [PATCH RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()
On 2019/10/1 上午1:51, David Sterba wrote: > On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote: >> [BUG] >> There is one BUG_ON() report where a transaction is aborted during >> balance, then kernel BUG_ON() in merge_reloc_roots(): > > Do you have details from the report, eg. what's the error code? Nope, what I get is only kernel message. Not enough to get the error code. > >> void merge_reloc_roots(struct reloc_control *rc) >> { >> ... >> BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<< >> } >> >> [CAUSE] >> It's still uncertain why we can get to such situation. >> As all __add_reloc_root() calls will also link that reloc root to >> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots. >> >> So the root cause is still uncertain. >> >> [FIX] >> But we can still hunt down all the BUG_ON() in merge_reloc_roots(). >> >> There are 3 BUG_ON()s in it: >> - BUG_ON() for read_fs_root() result >> - BUG_ON() for root->reloc_root != reloc_root case >> - BUG_ON() for the non-empty reloc_root_tree > > relocation.c is worst regarding number of BUG_ONs, some of them look > like runtime assertions of the invariants but some of them are > substituting for error handling. Yeah, if we inject kmalloc error into btrfs_relocate_block_group(), it's too easy to hit BUG_ON(). > > The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are > assertions > >> >> For the first two, just grab the return value and goto out tag for >> cleanup. >> >> For the last one, make it more graceful by: >> - grab the lock before doing read/write >> - warn instead of panic >> - cleanup the nodes in rc->reloc_root_tree >> >> Signed-off-by: Qu Wenruo >> --- >> Reason for RFC: >> The root cause to leak nodes in reloc_root_tree is still unknown. >> --- >> fs/btrfs/relocation.c | 39 --- >> 1 file changed, 36 insertions(+), 3 deletions(-) >> >> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c >> index 655f1d5a8c27..d562b5c52a40 100644 >> --- a/fs/btrfs/relocation.c >> +++ b/fs/btrfs/relocation.c >> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc) >> if (btrfs_root_refs(&reloc_root->root_item) > 0) { >> root = read_fs_root(fs_info, >> reloc_root->root_key.offset); >> -BUG_ON(IS_ERR(root)); >> -BUG_ON(root->reloc_root != reloc_root); >> +if (IS_ERR(root)) { > > This is bug_on -> error handling, ok > >> +ret = PTR_ERR(root); >> +btrfs_err(fs_info, >> + "failed to read root %llu: %d", >> + reloc_root->root_key.offset, ret); >> +goto out; >> +} >> +if (root->reloc_root != reloc_root) { > > With this one I'm not sure it could happen but replacing the bug on is > still good. > >> +btrfs_err(fs_info, >> +"reloc root mismatch for root %llu", > > Would be good to print the number of the other root as well. > >> +root->root_key.objectid); >> +ret = -EINVAL; >> +goto out; >> +} >> >> ret = merge_reloc_root(rc, root); >> if (ret) { >> +btrfs_err(fs_info, >> +"failed to merge reloc tree for root %llu: %d", >> + root->root_key.objectid, ret); >> if (list_empty(&reloc_root->root_list)) >> list_add_tail(&reloc_root->root_list, >>&reloc_roots); >> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc) >> free_reloc_roots(&reloc_roots); >> } >> >> -BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); > > This one looks more like the invariant, the tree should be really empty > here. While the cleanup is trying to make things work despite there's a > problem, I think the warning should not be debugging only. Indeed, we should output the warning no matter whatever. Thanks, Qu > >> +spin_lock(&rc->reloc_root_tree.lock); >> +/* Cleanup dirty reloc tree nodes */ >> +if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) { >> +struct mapping_node *node; >> +struct mapping_node *next; >> + >> +WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > > ... > signature.asc Description: OpenPGP digital signature
Re: BTRFS corruption, mounts but comes read-only
On 2019/10/1 上午5:08, Remi Gauvin wrote: > Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log: As that btrfs check shows, your extent tree is corrupted. Quite some backref is lost, thus no wonder some write opeartion would cause problem. In that case, it looks like only extent tree is corrupted, and no transid error. If you have data backed up, you would like to btrfs check --repair to see if it can repair them. Thanks, Qu > > [10524.830640] BTRFS info (device sdf4): using free space tree > [10524.830644] BTRFS info (device sdf4): has skinny extents > [10549.561143] BTRFS info (device sdf4): checking UUID tree > [10606.900264] BTRFS info (device sdf4): leaf 3873928691712 gen 935882 > total ptrs 147 free space 6236 owner 2 > [10606.900268]item 0 key (5126703497216 169 0) itemoff 15845 itemsize > 438 > [10606.900270]extent refs 46 gen 885323 flags 2 > [10606.900272]ref#0: tree block backref root 10262 > [10606.900274]ref#1: shared block backref parent 6300271443968 > [10606.900276]ref#2: shared block backref parent 6300177645568 > [10606.900278]ref#3: shared block backref parent 6300087877632 > [10606.900279]ref#4: shared block backref parent 6299851292672 > [10606.900281]ref#5: shared block backref parent 6299576254464 > [10606.900283]ref#6: shared block backref parent 6299308621824 > [10606.900284]ref#7: shared block backref parent 6240101318656 > [10606.900286]ref#8: shared block backref parent 6239972589568 > [10606.900288]ref#9: shared block backref parent 5891141632000 > [10606.900289]ref#10: shared block backref parent > 5891082911744 > [10606.900291]ref#11: shared block backref parent > 5891042951168 > [10606.900293]ref#12: shared block backref parent > 5890615246848 > [10606.900294]ref#13: shared block backref parent > 5890463039488 > [10606.900296]ref#14: shared block backref parent > 5890436890624 > [10606.900298]ref#15: shared block backref parent > 5126231179264 > [10606.900299]ref#16: shared block backref parent > 5125855133696 > [10606.900301]ref#17: shared block backref parent > 5125777653760 > [10606.900302]ref#18: shared block backref parent > 4717386399744 > [10606.900304]ref#19: shared block backref parent > 4716953616384 > [10606.900306]ref#20: shared block backref parent > 4716869992448 > [10606.900307]ref#21: shared block backref parent > 4716604555264 > [10606.900309]ref#22: shared block backref parent > 4716575670272 > [10606.900311]ref#23: shared block backref parent > 4716547129344 > [10606.900312]ref#24: shared block backref parent > 4418653339648 > [10606.900314]ref#25: shared block backref parent > 4418457927680 > [10606.900316]ref#26: shared block backref parent > 4418052734976 > [10606.900317]ref#27: shared block backref parent > 4134522355712 > [10606.900319]ref#28: shared block backref parent > 4133585420288 > [10606.900320]ref#29: shared block backref parent > 4034188083200 > [10606.900322]ref#30: shared block backref parent > 4034125447168 > [10606.900324]ref#31: shared block backref parent > 4034077949952 > [10606.900325]ref#32: shared block backref parent > 4034073935872 > [10606.900327]ref#33: shared block backref parent > 4033156546560 > [10606.900329]ref#34: shared block backref parent > 4032730021888 > [10606.900330]ref#35: shared block backref parent > 4032197132288 > [10606.900332]ref#36: shared block backref parent > 4032101384192 > [10606.900333]ref#37: shared block backref parent > 4031949127680 > [10606.900335]ref#38: shared block backref parent > 4031297765376 > [10606.900337]ref#39: shared block backref parent > 4031202394112 > [10606.900338]ref#40: shared block backref parent > 4030981292032 > [10606.900340]ref#41: shared block backref parent > 4030556012544 > [10606.900342]ref#42: shared block backref parent > 3874566619136 > [10606.900343]ref#43: shared block backref parent > 3874460270592 > [10606.900345]ref#44: shared block backref parent > 3874229714944 > [10606.900347]ref#45: shared block backref parent > 3873657765888 > [10606.900350]item 1 key (5126703529984 169 0) itemoff 15812 itemsize > 33 > [10606.900351]extent refs 1 gen 571786 flags 258 > [10606.900352]ref#0: tree block backref root 46426 > [
Re: BTRFS corruption, mounts but comes read-only
On 2019-09-30 9:30 p.m., Qu Wenruo wrote: > > > On 2019/10/1 上午5:08, Remi Gauvin wrote: >> Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log: > > As that btrfs check shows, your extent tree is corrupted. > Quite some backref is lost, thus no wonder some write opeartion would > cause problem. > > In that case, it looks like only extent tree is corrupted, and no > transid error. > > If you have data backed up, you would like to btrfs check --repair to > see if it can repair them. > > Thanks, > Qu This is itself a back-up system, but not wishing to tempt fate, I'm going to have to create a working substitute before I do anything risky with it. It would take me several days and lots of travel to re-aquire them. Any idea what could have happened? The data and meta data should be raid1, and none of the drives have any io errors of any kinds in their SMART log. Thank you for your guidance. signature.asc Description: OpenPGP digital signature
Re: BTRFS corruption, mounts but comes read-only
On 2019/10/1 上午9:41, Remi Gauvin wrote: > On 2019-09-30 9:30 p.m., Qu Wenruo wrote: >> >> >> On 2019/10/1 上午5:08, Remi Gauvin wrote: >>> Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log: >> >> As that btrfs check shows, your extent tree is corrupted. >> Quite some backref is lost, thus no wonder some write opeartion would >> cause problem. >> >> In that case, it looks like only extent tree is corrupted, and no >> transid error. >> >> If you have data backed up, you would like to btrfs check --repair to >> see if it can repair them. >> >> Thanks, >> Qu > > > This is itself a back-up system, but not wishing to tempt fate, I'm > going to have to create a working substitute before I do anything risky > with it. It would take me several days and lots of travel to re-aquire > them. > > Any idea what could have happened? Maybe some existing bugs, since it's an older kernel. But at least, no tree relationship corruption. So you should be pretty OK. You could try btrfs check --init-extent-tree, as I see no error report from fs tree check. Since it's extent tree only mismatch, you can in fact mount the fs RO and grab all data. > The data and meta data should be > raid1, and none of the drives have any io errors of any kinds in their > SMART log. It's indeed a symptom of btrfs kernel module bug. But at least looks repairable. Thanks, Qu > > Thank you for your guidance. > > > > signature.asc Description: OpenPGP digital signature
Re: Btrfs partition mount error
On 30.09.2019 9:27, Qu Wenruo wrote: I recommend to do a "btrfs check" on all fses. I had done "btrfs check" on /dev/sda4: attached btrfs-check-sda4.output There are some errors. How to fix them? [1/7] checking root items [2/7] checking extents incorrect local backref count on 1533538304 root 5 owner 7662482 offset 0 found 1 wanted 4194305 back 0x55bbb3f482a0 backpointer mismatch on [1533538304 16384] data backref 1533759488 root 5 owner 10122609 offset 0 num_refs 0 not found in extent tree incorrect local backref count on 1533759488 root 5 owner 10122609 offset 0 found 1 wanted 0 back 0x55bbb1c76e40 incorrect local backref count on 1533759488 root 16389 owner 10122609 offset 0 found 0 wanted 1 back 0x55bbb3f49d00 backref disk bytenr does not match extent record, bytenr=1533759488, ref bytenr=0 backpointer mismatch on [1533759488 4096] data backref 1534713856 root 5 owner 7312396 offset 4096 num_refs 0 not found in extent tree incorrect local backref count on 1534713856 root 5 owner 7312396 offset 4096 found 1 wanted 0 back 0x55bbac7e8ad0 incorrect local backref count on 1534713856 root 274877906949 owner 7312396 offset 4096 found 0 wanted 1 back 0x55bbb3be83c0 backref disk bytenr does not match extent record, bytenr=1534713856, ref bytenr=0 backpointer mismatch on [1534713856 4096] backref bytes do not match extent backref, bytenr=1985052672, ref bytes=20480, backref bytes=86016 backpointer mismatch on [1985052672 20480] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space cache [4/7] checking fs roots root 5 inode 843065 errors 100, file extent discount Found file extent holes: start: 0, len: 4294971392 unresolved ref dir 843063 index 3 namelen 13 name Templates.msf filetype 1 errors 4, no inode ref unresolved ref dir 843063 index 3 namelen 13 name Templates/msf filetype 0 errors 3, no dir item, no dir index unresolved ref dir 843063 index 5 namelen 6 name Drafts filetype 1 errors 4, no inode ref unresolved ref dir 843063 index 5 namelen 6 name Dsafts filetype 0 errors 3, no dir item, no dir index root 5 inode 843068 errors 400, nbytes wrong unresolved ref dir 843063 index 15 namelen 14 name filterlog.html filetype 1 errors 2, no dir index unresolved ref dir 843063 index 18 namelen 8 name Sent.sbd filetype 2 errors 2, no dir index root 5 inode 908624 errors 1, no inode item unresolved ref dir 843063 index 15 namelen 14 name filterlog.html filetype 1 errors 5, no dir item, no inode ref root 5 inode 908627 errors 1, no inode item unresolved ref dir 843063 index 18 namelen 8 name Sent/sbd filetype 2 errors 5, no dir item, no inode ref root 5 inode 7287329 errors 100, file extent discount Found file extent holes: start: 0, len: 8192 ERROR: errors found in fs roots Opening filesystem to check... Checking filesystem on /dev/sda4 UUID: a942b8da-e92d-4348-8de9-ded1e5e095ad found 184471662592 bytes used, error(s) found total csum bytes: 178843432 total tree bytes: 1172783104 total fs tree bytes: 921796608 total extent tree bytes: 52232192 btree space waste bytes: 184041078 file data blocks allocated: 538597539840 referenced 190557810688
Re: Btrfs partition mount error
On 30.09.2019 10:31, Qu Wenruo wrote: Please provide the following dump: # btrfs ins dump-tree -b 855738744832 /dev/sdc1 attached btrfs-ins-dump-tree-855738744832-sdc1.output OK, tree-checker is again detecting the problem correctly. item 19 key (613039370240 EXTENT_ITEM 1048576) itemoff 15223 itemsize 53 refs 1 gen 52116 flags DATA extent data backref root FS_TREE objectid 5841996 offset 607125504 count 1 item 20 key (613040418816 EXTENT_ITEM 1032192) itemoff 15170 itemsize 117 refs 1 gen 52162 flags DATA extent data backref root FS_TREE objectid 5842000 offset 927858688 count 1 Item 20 should be a regular single reference, but its size is 117. But please check this: 117 = 0x75 53 = 0x35 The difference is 0x40, which is a single bit. And if itemsize for slot 20 is regular 53, then it matches its neighbors without any problem. So at least to me, this looks like a bitflip. Although I'm not sure if it's btrfs itself causing the problem or it's the hardware or even 3rd party kernel modules to blame. I ran memtest86 and discovered that, indeed, one of the memory modules has a bit flip: https://drive.google.com/file/d/1pnHfUcoSwH8DEnWfl0S7newMLBVo3eXE/view?usp=sharing https://drive.google.com/file/d/1iDOlSFfMSZ1ffxrSQTeXDmv0luOHT3yS/view?usp=sharing As you can see on screenshots, two bits flip: - 0x0040 - 0x0001 > 117 = 0x75 > 53 = 0x35 > > The difference is 0x40, which is a single bit. I think it's the 0x0040 bit.
Re: Btrfs partition mount error
On 2019/10/1 下午12:08, Andrey Ivanov wrote: > On 30.09.2019 9:27, Qu Wenruo wrote: >> I recommend to do a "btrfs check" on all fses. > > I had done "btrfs check" on /dev/sda4: > > attached btrfs-check-sda4.output > > There are some errors. How to fix them? It looks like "btrfs check --repair" can handle most of these bugs. Please do a backup and try repair. Thanks, Qu signature.asc Description: OpenPGP digital signature
Re: [PATCH] Setup GitLab-CI for btrfs-progs
Hi, I'm not yet a GitLab expert myself, but AFAIK ... Am 30.09.19 um 18:56 schrieb Lakshmipathi.G: > diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml > new file mode 100644 > index 000..2afde50 > --- /dev/null > +++ b/.gitlab-ci.yml > @@ -0,0 +1,181 @@ > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public > +# License v2 as published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# General Public License for more details. > +# > +# You should have received a copy of the GNU General Public > +# License along with this program; if not, write to the > +# Free Software Foundation, Inc., 59 Temple Place - Suite 330, > +# Boston, MA 021110-1307, USA. > +# > + > +image: docker:18.09.7 > + > +services: > +- docker:18.09.7-dind > + > +variables: > + DOCKER_DRIVER: overlay2 > + > +stages: > + - build > + - btrfs-progs build > + - test > + > +variables: You already have a "variables" section above - merge them? > + DOCKER_DRIVER: overlay2 > + IMAGE_TAG: registry.gitlab.com/$CI_PROJECT_NAMESPACE/btrfs-progs:gitlab-ci > + > +before_script: > + - docker login --username $CI_REGISTRY_USER --password > $CI_REGISTRY_PASSWORD $CI_REGISTRY > + > +docker build: > + stage: build > + script: > +- cd gitlab-ci > +- docker pull $IMAGE_TAG > /dev/null && echo "Downloaded image" || ( > docker build -t $IMAGE_TAG . && docker push $IMAGE_TAG ) > +- cd .. > + > +## To enable or disable Kernel Build set BUILD_KERNEL: "1" or BUILD_KERNEL: > "0" > +## If you disable Kernel Build, make sure PREBUILT_KERNEL_ID points to > previously built the kernel job id. > + > +kernel build: > + variables: > +BUILD_KERNEL: "1" > +PREBUILT_KERNEL_ID: "288159334" > + before_script: > +- apk add curl unzip > + stage: build > + services: > +- docker:18.09.7-dind You already have "services" defined globally - no need to repeat that here again. > + script: > + - if [ "$BUILD_KERNEL" == "1" ]; then > + docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged > --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/kernel_build.sh; > + else > + curl -o bzImage.zip --location --header "JOB-TOKEN:$CI_JOB_TOKEN" > "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_KERNEL_ID/artifacts"; > && unzip bzImage.zip; > + fi; > + artifacts: > +when: always > +paths: > + - bzImage > + > +# To enable or disable image build update BUILD_IMAGE value to "1" or "0". > +# If you disable Image Build, make sure PREBUILT_IMAGE_ID points to > previously built rootfs job id. > + > +image build: > + variables: > +BUILD_IMAGE: "1" > +PREBUILT_IMAGE_ID: "288506168" > + before_script: > +- apk add curl unzip > + stage: build > + services: > +- docker:18.09.7-dind dito > + script: > + - if [ "$BUILD_IMAGE" == "1" ]; then > + docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged > --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/setup_image.sh; > + else > + curl -o qemu-image.img.zip --location --header > "JOB-TOKEN:$CI_JOB_TOKEN" > "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_IMAGE_ID/artifacts"; > && unzip qemu-image.img.zip; > + fi; > + artifacts: > +when: always > +paths: > + - qemu-image.img > + > +btrfs-progs build: > + stage: btrfs-progs build > + services: > +- docker:18.09.7-dind dito > + script: > + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged > --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh > + artifacts: > +expire_in: 1 week > +when: always > +paths: > + - qemu-image.img > + > +cli tests: > + stage: test > + services: > +- docker:18.09.7-dind dito > + script: > + - echo "./cli-tests.sh" > $PWD/cmd > + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged > --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh > + - test -e "result" || exit 1 # If result doesn't exists, job failed. > + artifacts: > +when: always > +paths: > + - "*tests-results.txt" > + > +convert tests: > + only: > +- devel > + stage: test > + services: > +- docker:18.09.7-dind dito > + script: > + - echo "./convert-tests.sh" > $PWD/cmd > + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged > --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh > + - test -e "result" || exit 1 > + artifacts: > +when: always > +paths: > + - "*tests-results.txt" > + > +fsck tests: > + stage: test > + services: > +- docker:18.09.7-dind dito > + script: > + - echo "./fsck-tests.sh" >
[PATCH v3] btrfs: removed unused return variable
From: Aliasgar Surti Removed unused return variable and replaced it with returning the value directly. Also removed the unnecessary forward declaration. Signed-off-by: Aliasgar Surti --- v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration v3: Reverted back the change where the function was made void from int --- fs/btrfs/disk-io.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 044981c..e381368 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -52,8 +52,6 @@ static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); static void btrfs_destroy_ordered_extents(struct btrfs_root *root); -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, - struct btrfs_fs_info *fs_info); static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root); static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, struct extent_io_tree *dirty_pages, @@ -4255,7 +4253,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct rb_node *node; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_delayed_ref_node *ref; - int ret = 0; delayed_refs = &trans->delayed_refs; @@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, if (atomic_read(&delayed_refs->num_entries) == 0) { spin_unlock(&delayed_refs->lock); btrfs_info(fs_info, "delayed_refs has NO entry"); - return ret; + return 0; } while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { @@ -4306,8 +4303,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, } spin_unlock(&delayed_refs->lock); - - return ret; + return 0; } static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) -- 2.7.4