Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午2:58, Andrey Ivanov wrote:
> On 30.09.2019 9:27, Qu Wenruo wrote:
>>> This is the kernel message for /dev/sda4. It is also have some problems,
>>> but it is successfully mounted. I can't mount /dev/sdc1.
>>>
>>> Kernel message for /dev/sdc1:
>>> [    6.503265] BTRFS info (device sdc1): disk space caching is enabled
>>> [    6.503266] BTRFS info (device sdc1): has skinny extents
>>> [    8.890135] BTRFS critical (device sdc1): corrupt leaf: root=2
>>> block=855738744832 slot=20, unexpected item end, have 15287 expect 15223
>>> [    8.899635] BTRFS critical (device sdc1): corrupt leaf: root=2
>>> block=855738744832 slot=20, unexpected item end, have 15287 expect 15223
>>
>> This is from extent tree.
>>
>> Please provide the following dump:
>>
>> # btrfs ins dump-tree -b 855738744832 /dev/sdc1
> 
> attached btrfs-ins-dump-tree-855738744832-sdc1.output

OK, tree-checker is again detecting the problem correctly.

item 19 key (613039370240 EXTENT_ITEM 1048576) itemoff 15223 itemsize 53
refs 1 gen 52116 flags DATA
extent data backref root FS_TREE objectid 5841996 offset 
607125504 count 1
item 20 key (613040418816 EXTENT_ITEM 1032192) itemoff 15170 itemsize 
117
refs 1 gen 52162 flags DATA
extent data backref root FS_TREE objectid 5842000 offset 
927858688 count 1

Item 20 should be a regular single reference, but its size is 117.
But please check this:

117 = 0x75
53  = 0x35

The difference is 0x40, which is a single bit.

And if itemsize for slot 20 is regular 53, then it matches its neighbors
without any problem.

So at least to me, this looks like a bitflip.
Although I'm not sure if it's btrfs itself causing the problem or it's
the hardware or even 3rd party kernel modules to blame.

For this one, I could help you by just reverting that bit, and then you
may be able to continue mounting the fs or at least run btrfs check on it.

Please prepare an environment to compile btrfs-progs (at least
btrfs-corrupt-block) if you want to try.

> 
> 
 [[EXTRA INFO]]
 Please provide the following dump of that tree block by:
 # btrfs ins dump-tree -b 134079905792 /dev/sda4
>>>
>>> attached btrfs-ins-dump-tree-134079905792.output
>>
>> Confirmed.
>>
>> It looks like the data_len got some bitflips.
>>
>> In that offending leaf, there is only two data_len and all are one bit
>> flipped.
>> Considering that directory is not that old, it looks like some memory
>> bit flip.
>>
>> It's recommended to do a memtest to ensure it's not memory causing the
>> problem.
> 
> I did a memtest earlier. All had passed without errors. I can do a
> memtest again,
> but it seems to me that if the memory was faulty, the system would not
> be stable
> and often hung, but the system works fine.

Indeed, especially considering that there are already two bitflips in
one leaf, which should be pretty rare.

Any out-of-tree kernel modules? Or would you like to try v5.2.15 kernel,
which has a self detection for such problem.

Thanks,
Qu

> 
> 
>> I recommend to do a "btrfs check" on all fses.
> 
> Can't do check on /dev/sdc1:
> 
> # btrfs check /dev/sdc1
> Opening filesystem to check...
> incorrect offsets 15223 15287
> ERROR: cannot open file system



signature.asc
Description: OpenPGP digital signature


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 10:31, Qu Wenruo wrote:

For this one, I could help you by just reverting that bit, and then you
may be able to continue mounting the fs or at least run btrfs check on it.

Please prepare an environment to compile btrfs-progs (at least
btrfs-corrupt-block) if you want to try.


Great, I'm ready to do it. Environment is ready.



I did a memtest earlier. All had passed without errors. I can do a
memtest again,
but it seems to me that if the memory was faulty, the system would not
be stable
and often hung, but the system works fine.


Indeed, especially considering that there are already two bitflips in
one leaf, which should be pretty rare.

Any out-of-tree kernel modules? 


I only have vmware out-of-tree kernel modules.



Or would you like to try v5.2.15 kernel,
which has a self detection for such problem.


If I understand correctly, if I install v5.2.15 or later kernel
then I'll fix my /dev/sda4?


Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午4:00, Andrey Ivanov wrote:
> On 30.09.2019 10:31, Qu Wenruo wrote:
>> For this one, I could help you by just reverting that bit, and then you
>> may be able to continue mounting the fs or at least run btrfs check on
>> it.
>>
>> Please prepare an environment to compile btrfs-progs (at least
>> btrfs-corrupt-block) if you want to try.
> 
> Great, I'm ready to do it. Environment is ready.
> 
> 
>>> I did a memtest earlier. All had passed without errors. I can do a
>>> memtest again,
>>> but it seems to me that if the memory was faulty, the system would not
>>> be stable
>>> and often hung, but the system works fine.
>>
>> Indeed, especially considering that there are already two bitflips in
>> one leaf, which should be pretty rare.
>>
>> Any out-of-tree kernel modules? 
> 
> I only have vmware out-of-tree kernel modules.
> 
> 
>> Or would you like to try v5.2.15 kernel,
>> which has a self detection for such problem.
> 
> If I understand correctly, if I install v5.2.15 or later kernel
> then I'll fix my /dev/sda4?

Nope, v5.2.15 is only to detect such problem earlier and prevent such
bad data to reach disk.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


[PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid

2019-09-30 Thread Qu Wenruo
Add a regression test to check if btrfs can handle high devid.

The test will add and remove devices to a btrfs fs, so that the devid
will increase to uncommon but still valid values.

The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
tree-checker: Verify dev item").
The fix is titled "btrfs: tree-checker: Fix wrong check on max devid".

Signed-off-by: Qu Wenruo 
---
 tests/btrfs/194 | 73 +
 tests/btrfs/194.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/btrfs/194
 create mode 100644 tests/btrfs/194.out

diff --git a/tests/btrfs/194 b/tests/btrfs/194
new file mode 100755
index ..7a52ed86
--- /dev/null
+++ b/tests/btrfs/194
@@ -0,0 +1,73 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 194
+#
+# Test if btrfs can handle large device ids.
+#
+# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
+# tree-checker: Verify dev item").
+# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid"
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+# The wrong check limit is based on node size, to reduce runtime, we only
+# support 4K page size system for now
+if [ $(get_page_size) != 4096 ]; then
+   _notrun "This test need 4k page size"
+fi
+
+device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
+device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
+
+echo device_1=$device_1 device_2=$device_2 >> $seqres.full
+
+# Use 4K nodesize to reduce runtime
+_scratch_mkfs -n 4k >> $seqres.full
+_scratch_mount
+
+# Add and remove device in a loop, one loop will increase devid by 2.
+# for 4k nodesize, the wrong check will be triggered at devid 123.
+# here 64 is enough to trigger such regression
+for (( i = 0; i < 64; i++ )); do
+   $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT
+done
+_scratch_dev_pool_put
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out
new file mode 100644
index ..7bfd50ff
--- /dev/null
+++ b/tests/btrfs/194.out
@@ -0,0 +1,2 @@
+QA output created by 194
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b92cb12c..ef1f0e1b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -196,3 +196,4 @@
 191 auto quick send dedupe
 192 auto replay snapshot stress
 193 auto quick qgroup enospc limit
+194 auto
-- 
2.22.0



Re: [PATCH 2/3] btrfs-progs: check/original: Add check and repair for invalid inode generation

2019-09-30 Thread Nikolay Borisov



On 24.09.19 г. 11:11 ч., Qu Wenruo wrote:
> There are at least two bug reports of kernel tree-checker complaining
> about invalid inode generation.
> 
> All offending inodes seem to be caused by old kernel around 2014, with
> inode generation overflow.
> 
> So add such check and repair ability to lowmem mode check first.
> 
> This involves:
> - Calculate the inode generation upper limit
>   Unlike the original mode context, we don't have anyway to determine if

Perhaps you meant "lowmem mode context" since in lowmem you deduce
whether it's a log-tree inode or not.

>   this inode belongs to log tree.
>   So we use super_generation + 1 as upper limit, just like what we did
>   in kernel tree checker.
> 
> - Check if the inode generation is larger than the upper limit
> 
> - Repair by resetting inode generation to current transaction
>   generation
>   The difference is, in original mode, we have a common trans handle for
>   all repair and reset path for each repair.
> 
> Reported-by: Charles Wright 
> Signed-off-by: Qu Wenruo 

I tested that code with the image provided and it does work as expected
as well as it's fairly obvious what's happening. So:

Reviewed-by: Nikolay Borisov 
Tested-by: Nikolay Borisov 

> ---
>  check/main.c  | 50 ++-
>  check/mode-original.h |  1 +
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/check/main.c b/check/main.c
> index c2d0f394..018707c8 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -604,6 +604,8 @@ static void print_inode_error(struct btrfs_root *root, 
> struct inode_record *rec)
>   if (errors & I_ERR_INVALID_IMODE)
>   fprintf(stderr, ", invalid inode mode bit 0%o",
>   rec->imode & ~0);
> + if (errors & I_ERR_INVALID_GEN)
> + fprintf(stderr, ", invalid inode generation");
>   fprintf(stderr, "\n");
>  
>   /* Print the holes if needed */
> @@ -857,6 +859,7 @@ static int process_inode_item(struct extent_buffer *eb,
>  {
>   struct inode_record *rec;
>   struct btrfs_inode_item *item;
> + u64 gen_uplimit;
>   u64 flags;
>  
>   rec = active_node->current;
> @@ -879,6 +882,13 @@ static int process_inode_item(struct extent_buffer *eb,
>   if (S_ISLNK(rec->imode) &&
>   flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
>   rec->errors |= I_ERR_ODD_INODE_FLAGS;
> + /*
> +  * We don't have accurate root info to determine the correct
> +  * inode generation uplimit, use super_generation + 1 anyway
> +  */
> + gen_uplimit = btrfs_super_generation(global_info->super_copy) + 1;
> + if (btrfs_inode_generation(eb, item) > gen_uplimit)
> + rec->errors |= I_ERR_INVALID_GEN;
>   maybe_free_inode_rec(&active_node->inode_cache, rec);
>   return 0;
>  }
> @@ -2774,6 +2784,41 @@ static int repair_imode_original(struct 
> btrfs_trans_handle *trans,
>   return ret;
>  }
>  
> +static int repair_inode_gen_original(struct btrfs_trans_handle *trans,
> +  struct btrfs_root *root,
> +  struct btrfs_path *path,
> +  struct inode_record *rec)
> +{
> + struct btrfs_inode_item *ii;
> + struct btrfs_key key;
> + int ret;
> +
> + key.objectid = rec->ino;
> + key.offset = 0;
> + key.type = BTRFS_INODE_ITEM_KEY;
> +
> + ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> + if (ret > 0) {
> + ret = -ENOENT;
> + error("no inode item found for ino %llu", rec->ino);
> + return ret;
> + }
> + if (ret < 0) {
> + errno = -ret;
> + error("failed to search inode item for ino %llu: %m", rec->ino);
> + return ret;
> + }
> + ii = btrfs_item_ptr(path->nodes[0], path->slots[0],
> + struct btrfs_inode_item);
> + btrfs_set_inode_generation(path->nodes[0], ii, trans->transid);
> + btrfs_mark_buffer_dirty(path->nodes[0]);
> + btrfs_release_path(path);
> + printf("resetting inode generation to %llu for ino %llu\n",
> + trans->transid, rec->ino);
> + rec->errors &= ~I_ERR_INVALID_GEN;
> + return 0;
> +}
> +
>  static int try_repair_inode(struct btrfs_root *root, struct inode_record 
> *rec)
>  {
>   struct btrfs_trans_handle *trans;
> @@ -2794,7 +2839,8 @@ static int try_repair_inode(struct btrfs_root *root, 
> struct inode_record *rec)
>I_ERR_FILE_NBYTES_WRONG |
>I_ERR_INLINE_RAM_BYTES_WRONG |
>I_ERR_MISMATCH_DIR_HASH |
> -  I_ERR_UNALIGNED_EXTENT_REC)))
> +  I_ERR_UNALIGNED_EXTENT_REC |
> +  I_ERR_INVALID_GEN)))
>   return rec->errors;
>  
>   /*
> @@ -2829,6 +2875,8 @@ static int try_repair_inode(struct btrfs_root *root, 
> struct 

Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午4:00, Andrey Ivanov wrote:
> On 30.09.2019 10:31, Qu Wenruo wrote:
>> For this one, I could help you by just reverting that bit, and then you
>> may be able to continue mounting the fs or at least run btrfs check on
>> it.
>>
>> Please prepare an environment to compile btrfs-progs (at least
>> btrfs-corrupt-block) if you want to try.
> 
> Great, I'm ready to do it. Environment is ready.

Here it is.

https://github.com/adam900710/btrfs-progs/tree/dirty_fix

To use it:

$ ./btrfs-corrupt-block -X /dev/sdc1

But please keep in mind that, the offending tree block has more problems
than I originally thought:
- Bad item size at slot 20
  The original one reported by kernel.

- Bad key offset at slot 9
- Bad item size at slot 41
  All bitflips.

So all 3 bit flips happened in one leaf, it's really too rare.

I really don't have any clue how this could happen.

Anyway, I don't think the dirty fix is enough considering how many
strange things I have found just in one leaf.

Thanks,
Qu
> 
> 
>>> I did a memtest earlier. All had passed without errors. I can do a
>>> memtest again,
>>> but it seems to me that if the memory was faulty, the system would not
>>> be stable
>>> and often hung, but the system works fine.
>>
>> Indeed, especially considering that there are already two bitflips in
>> one leaf, which should be pretty rare.
>>
>> Any out-of-tree kernel modules? 
> 
> I only have vmware out-of-tree kernel modules.
> 
> 
>> Or would you like to try v5.2.15 kernel,
>> which has a self detection for such problem.
> 
> If I understand correctly, if I install v5.2.15 or later kernel
> then I'll fix my /dev/sda4?



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/3] btrfs-progs: check/original: Add check and repair for invalid inode generation

2019-09-30 Thread Qu Wenruo



On 2019/9/30 下午4:41, Nikolay Borisov wrote:
>
>
> On 24.09.19 г. 11:11 ч., Qu Wenruo wrote:
>> There are at least two bug reports of kernel tree-checker complaining
>> about invalid inode generation.
>>
>> All offending inodes seem to be caused by old kernel around 2014, with
>> inode generation overflow.
>>
>> So add such check and repair ability to lowmem mode check first.
>>
>> This involves:
>> - Calculate the inode generation upper limit
>>   Unlike the original mode context, we don't have anyway to determine if
>
> Perhaps you meant "lowmem mode context" since in lowmem you deduce
> whether it's a log-tree inode or not.

Oh, right.

Would you mind to fix this in commit time, David?

>
>>   this inode belongs to log tree.
>>   So we use super_generation + 1 as upper limit, just like what we did
>>   in kernel tree checker.
>>
>> - Check if the inode generation is larger than the upper limit
>>
>> - Repair by resetting inode generation to current transaction
>>   generation
>>   The difference is, in original mode, we have a common trans handle for
>>   all repair and reset path for each repair.
>>
>> Reported-by: Charles Wright 
>> Signed-off-by: Qu Wenruo 
>
> I tested that code with the image provided and it does work as expected
> as well as it's fairly obvious what's happening. So:
>
> Reviewed-by: Nikolay Borisov 
> Tested-by: Nikolay Borisov 

Thanks for the review and test.

Thanks,
Qu
>
>> ---
>>  check/main.c  | 50 ++-
>>  check/mode-original.h |  1 +
>>  2 files changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index c2d0f394..018707c8 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -604,6 +604,8 @@ static void print_inode_error(struct btrfs_root *root, 
>> struct inode_record *rec)
>>  if (errors & I_ERR_INVALID_IMODE)
>>  fprintf(stderr, ", invalid inode mode bit 0%o",
>>  rec->imode & ~0);
>> +if (errors & I_ERR_INVALID_GEN)
>> +fprintf(stderr, ", invalid inode generation");
>>  fprintf(stderr, "\n");
>>
>>  /* Print the holes if needed */
>> @@ -857,6 +859,7 @@ static int process_inode_item(struct extent_buffer *eb,
>>  {
>>  struct inode_record *rec;
>>  struct btrfs_inode_item *item;
>> +u64 gen_uplimit;
>>  u64 flags;
>>
>>  rec = active_node->current;
>> @@ -879,6 +882,13 @@ static int process_inode_item(struct extent_buffer *eb,
>>  if (S_ISLNK(rec->imode) &&
>>  flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
>>  rec->errors |= I_ERR_ODD_INODE_FLAGS;
>> +/*
>> + * We don't have accurate root info to determine the correct
>> + * inode generation uplimit, use super_generation + 1 anyway
>> + */
>> +gen_uplimit = btrfs_super_generation(global_info->super_copy) + 1;
>> +if (btrfs_inode_generation(eb, item) > gen_uplimit)
>> +rec->errors |= I_ERR_INVALID_GEN;
>>  maybe_free_inode_rec(&active_node->inode_cache, rec);
>>  return 0;
>>  }
>> @@ -2774,6 +2784,41 @@ static int repair_imode_original(struct 
>> btrfs_trans_handle *trans,
>>  return ret;
>>  }
>>
>> +static int repair_inode_gen_original(struct btrfs_trans_handle *trans,
>> + struct btrfs_root *root,
>> + struct btrfs_path *path,
>> + struct inode_record *rec)
>> +{
>> +struct btrfs_inode_item *ii;
>> +struct btrfs_key key;
>> +int ret;
>> +
>> +key.objectid = rec->ino;
>> +key.offset = 0;
>> +key.type = BTRFS_INODE_ITEM_KEY;
>> +
>> +ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
>> +if (ret > 0) {
>> +ret = -ENOENT;
>> +error("no inode item found for ino %llu", rec->ino);
>> +return ret;
>> +}
>> +if (ret < 0) {
>> +errno = -ret;
>> +error("failed to search inode item for ino %llu: %m", rec->ino);
>> +return ret;
>> +}
>> +ii = btrfs_item_ptr(path->nodes[0], path->slots[0],
>> +struct btrfs_inode_item);
>> +btrfs_set_inode_generation(path->nodes[0], ii, trans->transid);
>> +btrfs_mark_buffer_dirty(path->nodes[0]);
>> +btrfs_release_path(path);
>> +printf("resetting inode generation to %llu for ino %llu\n",
>> +trans->transid, rec->ino);
>> +rec->errors &= ~I_ERR_INVALID_GEN;
>> +return 0;
>> +}
>> +
>>  static int try_repair_inode(struct btrfs_root *root, struct inode_record 
>> *rec)
>>  {
>>  struct btrfs_trans_handle *trans;
>> @@ -2794,7 +2839,8 @@ static int try_repair_inode(struct btrfs_root *root, 
>> struct inode_record *rec)
>>   I_ERR_FILE_NBYTES_WRONG |
>>   I_ERR_INLINE_RAM_BYTES_WRONG |
>>   I_ERR_MISMATCH_DIR_HASH |
>> - I_ERR_UNALIGNED_EXTENT_REC)))
>> +   

[PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap

2019-09-30 Thread fdmanana
From: Filipe Manana 

When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.

Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.

The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.

  CPU 1  CPU 2

  extent_fiemap()
--> it's a full ranged fiemap
range from 0 to LLONG_MAX - 1
(9223372036854775807)

--> locks range in the inode's
io tree
  --> after this we have 2 extent
  states in the io tree:
  --> 1 for range [0, 1Mb[ with
  the bits EXTENT_LOCKED and
  EXTENT_DELALLOC_BITS set
  --> 1 for the range
  [1Mb, LLONG_MAX[ with
  the EXTENT_LOCKED bit set

  --> start buffered write at 
offset
  1Mb with a length of 4Kb

  btrfs_file_write_iter()

btrfs_buffered_write()
  --> cached_state is NULL

  
lock_and_cleanup_extent_if_need()
--> returns 0 and does 
not lock
range because it 
starts
at current i_size / 
eof

  --> cached_state remains 
NULL

  btrfs_dirty_pages()

btrfs_set_extent_delalloc()
  (...)
  __set_extent_bit()

--> splits extent 
state for range
[1Mb, 
LLONG_MAX[ and now we
have 2 extent 
states:

--> one for the 
range
[1Mb, 1Mb + 
4Kb[ with

EXTENT_LOCKED set
--> another one 
for the range
[1Mb + 4Kb, 
LLONG_MAX[ with

EXTENT_LOCKED set as well

--> sets 
EXTENT_DELALLOC on the
extent state 
for the range
[1Mb, 1Mb + 4Kb[
--> caches extent 
state
[1Mb, 1Mb + 
4Kb[ into
@cached_state 
because it has
the bit 
EXTENT_LOCKED set

--> btrfs_buffered_write() 
ends up
with a non-NULL 
cached_state and
never calls anything to 
release its
reference on it, 
resulting in a
memory leak

Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().

The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.

This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:

  [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in 
tree 0 refs 1

Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.

CC: sta...@vger.kernel.org # 4.16+
Signed-off-by: Filipe Manana 
---
 fs/btrfs/file.c | 13 -
 1 file changed, 12

RE: BTRFS checksum mismatch - false positives

2019-09-30 Thread hoegge
Hi Christ,

Thank you very much for your help and insight. Has helped me a lot 
understanding the setup with BTRFS on top of SHR. I will rebuild the whole 
thing from scratch, as you recommend, since it seems only to get worse. The 
file mapping tool from Synology (synodataverifier) used on the files with 
checksum errors indicate that the corrupted blocks are on two different discs 
(out of 8), one quite new (3 TB drive) and one older (500 GB drive). Having two 
partially defect drives on a RAID 5'ish system of course will create a lot of 
trouble. Only wonder why none of them have any errors on S.M.A.R.T. (also 
extended tests) at all - and that Synology did not report any read errors.

I'm halfway through my second full backup of all files in addition to my cloud 
backup, so then I'll toss the two suspicious drives and build a new system and 
RAID (RAID 6 / SHR 2) with two parity disks instead of just one. 

BTW, Do you have any idea of when the fixed RAID 5/6 BTRFS functionality might 
become mainstream, so we can abandon MD and simplify the stack to purely BTRFS?

Thanks again

Hoegge



-Original Message-
From: Chris Murphy  
Sent: 2019-09-26 22:28
To: hoe...@gmail.com
Cc: Chris Murphy ; Btrfs BTRFS 

Subject: Re: BTRFS checksum mismatch - false positives

>From the log offlist

2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.396165] md: invalid
raid superblock magic on sda5
2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.401816] md: sda5 does
not have a valid v0.90 superblock, not importing!

That doesn't sound good. It's not a Btrfs problem but a md/mdadm problem. 
You'll have to get support for this from Synology, only they understand the 
design of the storage stack layout and whether these error messages are 
important or not and how to fix them. Anyone else speculating could end up 
causing damage to the NAS and data to be lost.


2019-09-08T17:27:02+02:00 MHPNAS kernel: [   22.913298] md: sda2 has
different UUID to sda1

There are several messages like this. I can't tell if they're just 
informational and benign or a problem. Also not related to Btrfs.


2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199] BTRFS warning (device 
dm-1): BTRFS: dm-1 checksum verify failed on 375259512832 wanted EA1A10E3 found 
3080B64F level 0
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.419199]
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453] BTRFS warning (device 
dm-1): BTRFS: dm-1 checksum verify failed on 375259512832 wanted EA1A10E3 found 
3080B64F level 0
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.458453]
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.528385] BTRFS: read error 
corrected: ino 1 off 375259512832 (dev /dev/vg1/volume_1 sector
751819488)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.539631] BTRFS: read error 
corrected: ino 1 off 375259516928 (dev /dev/vg1/volume_1 sector
751819496)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.550785] BTRFS: read error 
corrected: ino 1 off 375259521024 (dev /dev/vg1/volume_1 sector
751819504)
2019-09-08T22:09:33+02:00 MHPNAS kernel: [16997.561990] BTRFS: read error 
corrected: ino 1 off 375259525120 (dev /dev/vg1/volume_1 sector
751819512)

There are a bunch of messages like this. Btrfs is finding metadata checksum 
errors, some kind of corruption has happened with one of the copies, and it's 
been fixed up. But why are things being corrupt in the first place? Ordinary 
bad sectors maybe? There's a lot of these  - like really a lot. Hundreds of 
affected sectors. There are too many for me to read through and see if all of 
them were corrected by DUP metadata.



2019-09-22T21:24:27+02:00 MHPNAS kernel: [1224856.764098] md2:
syno_self_heal_is_valid_md_stat(496): md's current state is not suitable for 
data correction

What does that mean? Also not a Btrfs problem. There are quite a few of these.



2019-09-23T11:49:20+02:00 MHPNAS kernel: [1276791.652946] BTRFS error (device 
dm-1): BTRFS: dm-1 failed to repair btree csum error on 1353162506240, mirror = 
1

OK and a few of these also. This means that some metadata could not be 
repaired, likely because both copies are corrupt.

My recommendation is to freshen your backups now while you still can, and 
prepare to rebuild the NAS. i.e. these are not likely repairable problems. Once 
both copies of Btrfs metadata are bad, it's usually not fixable you just have 
to recreate the file system from scratch.

You'll have to move everything off the NAS - and anything that's really 
important you will want at least two independent copies of, of course, and then 
you're going to obliterate the array and start from scratch. While you're at 
it, you might as well make sure you've got the latest supported version of the 
software for this product. Start with that. Then follow the Synology procedure 
to wipe the NAS totally and set it up again. You'll want to make sure the 
procedure you use writes out all new metadata for everything: mdadm, lvm, 
Btrfs. Nothi

Re: BTRFS Raid5 error during Scrub.

2019-09-30 Thread Robert Krig
I've upgraded to btrfs-progs v5.2.1
Here is the output from btrfs check -p --readonly /dev/sda


Opening filesystem to check...
Checking filesystem on /dev/sda
UUID: f7573191-664f-4540-a830-71ad654d9301
[1/7] checking root items  (0:01:17 elapsed,
5138533 items checked)
parent transid verify failed on 48781340082176 wanted 109181 found
109008items checked)
parent transid verify failed on 48781340082176 wanted 109181 found
109008
parent transid verify failed on 48781340082176 wanted 109181 found
109008
Ignoring transid failure
leaf parent key incorrect 48781340082176
bad block 48781340082176
[2/7] checking extents (0:03:22 elapsed,
1143429 items checked)
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache(0:05:10 elapsed, 7236
items checked)
parent transid verify failed on 48781340082176 wanted 109181 found
109008ems checked)
Ignoring transid failure
root 15197 inode 81781 errors 1000, some csum missing48 elapsed, 33952
items checked)
[4/7] checking fs roots(0:42:53 elapsed, 34145
items checked)
ERROR: errors found in fs roots
found 22975533985792 bytes used, error(s) found
total csum bytes: 16806711120
total tree bytes: 18733842432
total fs tree bytes: 130121728
total extent tree bytes: 466305024
btree space waste bytes: 1100711497
file data blocks allocated: 3891333279744
 referenced 1669470507008



Am Montag, den 30.09.2019, 09:01 +0300 schrieb Nikolay Borisov:
> 
> On 30.09.19 г. 0:38 ч., Robert Krig wrote:
> > Hi guys. First off, I've got backups so no worries there. I'm just
> > trying to understand what's happening and which files are affected.
> > I've got a scrub running and the kernel dmesg buffer spit out the
> > following:
> > 
> > BTRFS warning (device sda): checksum/header error at logical
> > 48781340082176 on dev /dev/sdb, physical 5651115966464: metadata
> > leaf
> > (level 0) in tree 7
> 
> This indicates you have corrupted checksum tree blocks. Concretely
> the
> checksum in the header does not match the data in the block.
> 
> > BTRFS warning (device sda): checksum/header error at logical
> > 48781340082176 on dev /dev/sdb, physical 5651115966464: metadata
> > leaf
> > (level 0) in tree 7
> > BTRFS error (device sda): bdev /dev/sdb errs: wr 0, rd 0, flush 0,
> > corrupt 0, gen 1
> > BTRFS error (device sda): unable to fixup (regular) error at
> > logical
> > 48781340082176 on dev /dev/sdb
> > BTRFS warning (device sda): checksum/header error at logical
> > 48781340082176 on dev /dev/sdc, physical 5651115966464: metadata
> > leaf
> > (level 0) in tree 7
> > BTRFS warning (device sda): checksum/header error at logical
> > 48781340082176 on dev /dev/sdc, physical 5651115966464: metadata
> > leaf
> > (level 0) in tree 7
> > BTRFS error (device sda): bdev /dev/sdc errs: wr 0, rd 0, flush 0,
> > corrupt 0, gen 1
> > BTRFS error (device sda): unable to fixup (regular) error at
> > logical
> > 48781340082176 on dev /dev/sdc
> > 
> > Is there any way I can find out which files are affected so that I
> > can
> > just restore them from a backup? 
> > 
> > I'm running Debian Buster with Kernel 5.2.
> 
> There was a known corruption issue in 5.2 kernel up to v5.2.15.
> However
> it could result in partial writes of transaction data whereas your
> report seems to indicate data has been written corrupted on-disk.
> 
> As a first step run :
> 
> btrfs check repair --readonly
> 
> (this is a read only command so it cannot do further damage) and post
> the log to see what else btrfs check finds.
> 
> > Btrfs-progs v4.20.1
> > 
> > 
> > Let me know if you need any further info.
> > 
> > 



Re: [BTRFS Raid5 error during Scrub.

2019-09-30 Thread Graham Cobb
On 29/09/2019 22:38, Robert Krig wrote:
> I'm running Debian Buster with Kernel 5.2.
> Btrfs-progs v4.20.1

I am running Debian testing (bullseye) and have chosen not to install
the 5.2 kernel yet because the version of it in bullseye
(linux-image-5.2.0-2-amd64) is based on 5.2.9 and (as far as I can tell)
does not contain the BTRFS corruption fix.

I do not know which version of the 5.2 kernel is in buster but you may
want to check that it contains a backport of the BTRFS fix or downgrade
to the 4.19 kernel until you can be sure.

I note that linux-image-5.2.0-3-amd64 is in unstable and is based on
5.2.17 so should have the fix. I presume it will make its way to testing
soon.

If anyone can confirm which versions of the Debian kernel package the
5.2 corruption fixes are in, it would be helpful.


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 11:59, Qu Wenruo wrote:

Please prepare an environment to compile btrfs-progs (at least
btrfs-corrupt-block) if you want to try.


Great, I'm ready to do it. Environment is ready.


Here it is.

https://github.com/adam900710/btrfs-progs/tree/dirty_fix

To use it:

$ ./btrfs-corrupt-block -X /dev/sdc1


I had built and run it:

# /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X 
/dev/sdc1
incorrect offsets 15223 15287
Open ctree failed


Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午6:01, Andrey Ivanov wrote:
> On 30.09.2019 11:59, Qu Wenruo wrote:
 Please prepare an environment to compile btrfs-progs (at least
 btrfs-corrupt-block) if you want to try.
>>>
>>> Great, I'm ready to do it. Environment is ready.
>>
>> Here it is.
>>
>> https://github.com/adam900710/btrfs-progs/tree/dirty_fix
>>
>> To use it:
>>
>> $ ./btrfs-corrupt-block -X /dev/sdc1
> 
> I had built and run it:
> 
> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
> -X /dev/sdc1
> incorrect offsets 15223 15287
> Open ctree failed

That branch updated to skip the item offset check completely.

But please keep in mind that, that check itself still makes sense, so
please use the original "btrfs check" to check the fs.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid

2019-09-30 Thread Anand Jain




On 9/30/19 4:37 PM, Qu Wenruo wrote:

Add a regression test to check if btrfs can handle high devid.

The test will add and remove devices to a btrfs fs, so that the devid
will increase to uncommon but still valid values.

The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
tree-checker: Verify dev item").
The fix is titled "btrfs: tree-checker: Fix wrong check on max devid".

Signed-off-by: Qu Wenruo 


Suggested-by: Anand Jain 
Reviewed-by: Anand Jain 

one nit below.


---
  tests/btrfs/194 | 73 +
  tests/btrfs/194.out |  2 ++
  tests/btrfs/group   |  1 +
  3 files changed, 76 insertions(+)
  create mode 100755 tests/btrfs/194
  create mode 100644 tests/btrfs/194.out

diff --git a/tests/btrfs/194 b/tests/btrfs/194
new file mode 100755
index ..7a52ed86
--- /dev/null
+++ b/tests/btrfs/194
@@ -0,0 +1,73 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 194
+#
+# Test if btrfs can handle large device ids.
+#
+# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
+# tree-checker: Verify dev item").
+# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid"
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+# The wrong check limit is based on node size, to reduce runtime, we only
+# support 4K page size system for now
+if [ $(get_page_size) != 4096 ]; then
+   _notrun "This test need 4k page size"
+fi
+
+device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
+device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
+
+echo device_1=$device_1 device_2=$device_2 >> $seqres.full
+
+# Use 4K nodesize to reduce runtime
+_scratch_mkfs -n 4k >> $seqres.full


 This init part should be _scratch_mkfs on only one device right?

Thanks, Anand


+_scratch_mount
+
+# Add and remove device in a loop, one loop will increase devid by 2.
+# for 4k nodesize, the wrong check will be triggered at devid 123.
+# here 64 is enough to trigger such regression





+for (( i = 0; i < 64; i++ )); do
+   $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT
+   $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT
+done




+_scratch_dev_pool_put
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out
new file mode 100644
index ..7bfd50ff
--- /dev/null
+++ b/tests/btrfs/194.out
@@ -0,0 +1,2 @@
+QA output created by 194
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b92cb12c..ef1f0e1b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -196,3 +196,4 @@
  191 auto quick send dedupe
  192 auto replay snapshot stress
  193 auto quick qgroup enospc limit
+194 auto



Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 13:10, Qu Wenruo wrote:

I had built and run it:

# /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
-X /dev/sdc1
incorrect offsets 15223 15287
Open ctree failed


That branch updated to skip the item offset check completely.

But please keep in mind that, that check itself still makes sense, so
please use the original "btrfs check" to check the fs.


# /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X 
/dev/sdc1
key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf 1073755934 > 
16283
Open ctree failed

# btrfs check /dev/sdc1
Opening filesystem to check...
incorrect offsets 15223 15287
ERROR: cannot open file system


Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid

2019-09-30 Thread Filipe Manana
On Mon, Sep 30, 2019 at 9:39 AM Qu Wenruo  wrote:
>
> Add a regression test to check if btrfs can handle high devid.
>
> The test will add and remove devices to a btrfs fs, so that the devid
> will increase to uncommon but still valid values.
>
> The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
> tree-checker: Verify dev item").
> The fix is titled "btrfs: tree-checker: Fix wrong check on max devid".
>
> Signed-off-by: Qu Wenruo 
> ---
>  tests/btrfs/194 | 73 +
>  tests/btrfs/194.out |  2 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 76 insertions(+)
>  create mode 100755 tests/btrfs/194
>  create mode 100644 tests/btrfs/194.out
>
> diff --git a/tests/btrfs/194 b/tests/btrfs/194
> new file mode 100755
> index ..7a52ed86
> --- /dev/null
> +++ b/tests/btrfs/194
> @@ -0,0 +1,73 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 194
> +#
> +# Test if btrfs can handle large device ids.
> +#
> +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
> +# tree-checker: Verify dev item").
> +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid"

type, titlted -> titled

> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +_scratch_dev_pool_get 2
> +
> +# The wrong check limit is based on node size, to reduce runtime, we only
> +# support 4K page size system for now
> +if [ $(get_page_size) != 4096 ]; then
> +   _notrun "This test need 4k page size"
> +fi
> +
> +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
> +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
> +
> +echo device_1=$device_1 device_2=$device_2 >> $seqres.full
> +
> +# Use 4K nodesize to reduce runtime

How does the node size reduces runtime?
It's obvious when one wants to create deeper/larger trees for some
purpose, but this test doesn't populate the filesystem, it uses an
empty filesystem.
So that deserves an explanation of how the node size influences the
test's running time.

> +_scratch_mkfs -n 4k >> $seqres.full
> +_scratch_mount
> +
> +# Add and remove device in a loop, one loop will increase devid by 2.

" ... one loop will ..." -> "... each iteration will ..."

> +# for 4k nodesize, the wrong check will be triggered at devid 123.

Why 123? It's not clear to me why, the test case doesn't explain and
neither does the btrfs patch that fixes the regression.
If it's related to the value of the constants/functions
BTRFS_MAX_DEVS and BTRFS_MAX_DEVS_SYS_CHUNK, it should be mentioned
here.

> +# here 64 is enough to trigger such regression
> +for (( i = 0; i < 64; i++ )); do
> +   $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT
> +   $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT
> +   $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT
> +   $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT
> +done
> +_scratch_dev_pool_put
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out
> new file mode 100644
> index ..7bfd50ff
> --- /dev/null
> +++ b/tests/btrfs/194.out
> @@ -0,0 +1,2 @@
> +QA output created by 194
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index b92cb12c..ef1f0e1b 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -196,3 +196,4 @@
>  191 auto quick send dedupe
>  192 auto replay snapshot stress
>  193 auto quick qgroup enospc limit
> +194 auto

Maybe volume group as well.

Thanks Qu.

> --
> 2.22.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


Re: Btrfs partition mount error

2019-09-30 Thread Hans van Kranenburg
On 9/30/19 12:23 PM, Andrey Ivanov wrote:
> 
> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block -X 
> /dev/sdc1
> key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf 1073755934 > 
> 16283
> Open ctree failed
> 
> [...]
bin(1073755934)
'0b11101110000'

Your system is really flippin' bits all over the place.

Hans


Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午6:23, Andrey Ivanov wrote:
> On 30.09.2019 13:10, Qu Wenruo wrote:
>>> I had built and run it:
>>>
>>> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
>>> -X /dev/sdc1
>>> incorrect offsets 15223 15287
>>> Open ctree failed
>>
>> That branch updated to skip the item offset check completely.
>>
>> But please keep in mind that, that check itself still makes sense, so
>> please use the original "btrfs check" to check the fs.
> 
> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
> -X /dev/sdc1
> key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf
> 1073755934 > 16283
> Open ctree failed

This means there are more corruptions in just that one leaf.

It's really hard to fix them one by one manually.

I'd recommend to go btrfs-restore or use this patchset:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=170715

Then mount with ro,rescue=skipbg.

Either way, the bitflip is way beyond my imagination.

Thanks,
Qu

> 
> # btrfs check /dev/sdc1
> Opening filesystem to check...
> incorrect offsets 15223 15287
> ERROR: cannot open file system



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] fstests: btrfs: Add regression test to check if btrfs can handle high devid

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午6:33, Filipe Manana wrote:
> On Mon, Sep 30, 2019 at 9:39 AM Qu Wenruo  wrote:
>>
>> Add a regression test to check if btrfs can handle high devid.
>>
>> The test will add and remove devices to a btrfs fs, so that the devid
>> will increase to uncommon but still valid values.
>>
>> The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
>> tree-checker: Verify dev item").
>> The fix is titled "btrfs: tree-checker: Fix wrong check on max devid".
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  tests/btrfs/194 | 73 +
>>  tests/btrfs/194.out |  2 ++
>>  tests/btrfs/group   |  1 +
>>  3 files changed, 76 insertions(+)
>>  create mode 100755 tests/btrfs/194
>>  create mode 100644 tests/btrfs/194.out
>>
>> diff --git a/tests/btrfs/194 b/tests/btrfs/194
>> new file mode 100755
>> index ..7a52ed86
>> --- /dev/null
>> +++ b/tests/btrfs/194
>> @@ -0,0 +1,73 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 194
>> +#
>> +# Test if btrfs can handle large device ids.
>> +#
>> +# The regression is introduced by kernel commit ab4ba2e13346 ("btrfs:
>> +# tree-checker: Verify dev item").
>> +# The fix is titlted: "btrfs: tree-checker: Fix wrong check on max devid"
> 
> type, titlted -> titled
> 
>> +#
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1   # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +   cd /
>> +   rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch_dev_pool 2
>> +_scratch_dev_pool_get 2
>> +
>> +# The wrong check limit is based on node size, to reduce runtime, we only
>> +# support 4K page size system for now
>> +if [ $(get_page_size) != 4096 ]; then
>> +   _notrun "This test need 4k page size"
>> +fi
>> +
>> +device_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
>> +device_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
>> +
>> +echo device_1=$device_1 device_2=$device_2 >> $seqres.full
>> +
>> +# Use 4K nodesize to reduce runtime
> 
> How does the node size reduces runtime?

The old wrong check is based on how large a chunk item can be.
The macro we use is BTRFS_MAX_DEVS() which takes fs_info->nodesize to
calculate.

The largest chunk item (or any item) is based on nodesize.
Thus nodesize will affect the runtime.

> It's obvious when one wants to create deeper/larger trees for some
> purpose, but this test doesn't populate the filesystem, it uses an
> empty filesystem.
> So that deserves an explanation of how the node size influences the
> test's running time.

Indeed.

> 
>> +_scratch_mkfs -n 4k >> $seqres.full
>> +_scratch_mount
>> +
>> +# Add and remove device in a loop, one loop will increase devid by 2.
> 
> " ... one loop will ..." -> "... each iteration will ..."
> 
>> +# for 4k nodesize, the wrong check will be triggered at devid 123.
> 
> Why 123? It's not clear to me why, the test case doesn't explain and
> neither does the btrfs patch that fixes the regression.
> If it's related to the value of the constants/functions
> BTRFS_MAX_DEVS and BTRFS_MAX_DEVS_SYS_CHUNK, it should be mentioned
> here.

OK, I'll mention
> 
>> +# here 64 is enough to trigger such regression
>> +for (( i = 0; i < 64; i++ )); do
>> +   $BTRFS_UTIL_PROG device add -f $device_2 $SCRATCH_MNT
>> +   $BTRFS_UTIL_PROG device del $device_1 $SCRATCH_MNT
>> +   $BTRFS_UTIL_PROG device add -f $device_1 $SCRATCH_MNT
>> +   $BTRFS_UTIL_PROG device del $device_2 $SCRATCH_MNT
>> +done
>> +_scratch_dev_pool_put
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/194.out b/tests/btrfs/194.out
>> new file mode 100644
>> index ..7bfd50ff
>> --- /dev/null
>> +++ b/tests/btrfs/194.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 194
>> +Silence is golden
>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>> index b92cb12c..ef1f0e1b 100644
>> --- a/tests/btrfs/group
>> +++ b/tests/btrfs/group
>> @@ -196,3 +196,4 @@
>>  191 auto quick send dedupe
>>  192 auto replay snapshot stress
>>  193 auto quick qgroup enospc limit
>> +194 auto
> 
> Maybe volume group as well.

Thanks for the hint, I was looking into the groups but can't find a good
candidate.

At least volume makes more sense.

Thanks,
Qu
> 
> Thanks Qu.
> 
>> --
>> 2.22.0
>>
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午6:57, Qu Wenruo wrote:
> 
> 
> On 2019/9/30 下午6:23, Andrey Ivanov wrote:
>> On 30.09.2019 13:10, Qu Wenruo wrote:
 I had built and run it:

 # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
 -X /dev/sdc1
 incorrect offsets 15223 15287
 Open ctree failed
>>>
>>> That branch updated to skip the item offset check completely.
>>>
>>> But please keep in mind that, that check itself still makes sense, so
>>> please use the original "btrfs check" to check the fs.
>>
>> # /home/andrey/devel/src/btrfs/btrfs-progs-dirty_fix/btrfs-corrupt-block
>> -X /dev/sdc1
>> key (613019873280 EXTENT_ITEM 1048576)slot end outside of leaf
>> 1073755934 > 16283
>> Open ctree failed
> 
> This means there are more corruptions in just that one leaf.
> 
> It's really hard to fix them one by one manually.
> 
> I'd recommend to go btrfs-restore or use this patchset:
> https://patchwork.kernel.org/project/linux-btrfs/list/?series=170715
> 
> Then mount with ro,rescue=skipbg.
> 
> Either way, the bitflip is way beyond my imagination.

One strange thing hit me, not sure if it's just a coincident.


We have another report internally about a similar corruption (multiple
bit flips in a single fs), and they are also using VMware, along with
VMware guest kernel modules.

Would you mind to migrate to KVM based hypervisor to see if the
corruption happens again?

Thanks,
Qu

> 
> Thanks,
> Qu
> 
>>
>> # btrfs check /dev/sdc1
>> Opening filesystem to check...
>> incorrect offsets 15223 15287
>> ERROR: cannot open file system
> 



signature.asc
Description: OpenPGP digital signature


[PATCH] btrfs: removed unused return variable

2019-09-30 Thread Aliasgar Surti
From: Aliasgar Surti 

Removed unused return variable and replaced it with returning
the value directly

Signed-off-by: Aliasgar Surti 
---
 fs/btrfs/disk-io.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981c..c80fa67 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
struct rb_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_delayed_ref_node *ref;
-   int ret = 0;
 
delayed_refs = &trans->delayed_refs;
 
@@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
if (atomic_read(&delayed_refs->num_entries) == 0) {
spin_unlock(&delayed_refs->lock);
btrfs_info(fs_info, "delayed_refs has NO entry");
-   return ret;
+   return 0;
}
 
while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
@@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
 
spin_unlock(&delayed_refs->lock);
 
-   return ret;
+   return 0;
 }
 
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
-- 
2.7.4



Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation

2019-09-30 Thread Nikolay Borisov



On 24.09.19 г. 11:11 ч., Qu Wenruo wrote:
> There are at least two bug reports of kernel tree-checker complaining
> about invalid inode generation.
> 
> All offending inodes seem to be caused by old kernel around 2014, with
> inode generation overflow.
> 
> So add such check and repair ability to lowmem mode check first.
> 
> This involves:
> - Calculate the inode generation upper limit
>   If it's an inode from log tree, then the upper limit is
>   super_generation + 1, otherwise it's super_generation.
> 
> - Check if the inode generation is larger than the upper limit
> 
> - Repair by resetting inode generation to current transaction
>   generation
> 
> Reported-by: Charles Wright 
> Signed-off-by: Qu Wenruo 

Tested-by: Nikolay Borisov 

There is one small nit with the assert once rectified you can add:

Reviewed-by: Nikolay Borisov 

> ---
>  check/mode-lowmem.c | 76 +
>  1 file changed, 76 insertions(+)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 5f7f101d..7af29ba9 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2472,6 +2472,59 @@ static bool has_orphan_item(struct btrfs_root *root, 
> u64 ino)
>   return false;
>  }
>  
> +static int repair_inode_gen_lowmem(struct btrfs_root *root,
> +struct btrfs_path *path)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_inode_item *ii;
> + struct btrfs_key key;
> + u64 transid;
> + int ret;
> +
> + trans = btrfs_start_transaction(root, 1);
> + if (IS_ERR(trans)) {
> + ret = PTR_ERR(trans);
> + errno = -ret;
> + error("failed to start transaction for inode gen repair: %m");
> + return ret;
> + }
> + transid = trans->transid;

> + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> + ASSERT(key.type == BTRFS_INODE_ITEM_KEY);

nit: This function's sole caller, check_inode_item, is guaranteed to be
called with a path pointing to BTRFS_INODE_ITEM_KEY thanks to the logic
in the 'for' loop in process_one_leaf. This renders the assert
redundant. At the very least I think it should be moved to
check_inode_item.

> +
> + btrfs_release_path(path);
> +
> + ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> + if (ret > 0) {
> + ret = -ENOENT;
> + error("no inode item found for ino %llu", key.objectid);
> + goto error;
> + }
> + if (ret < 0) {
> + errno = -ret;
> + error("failed to find inode item for ino %llu: %m",
> +   key.objectid);
> + goto error;
> + }
> + ii = btrfs_item_ptr(path->nodes[0], path->slots[0],
> + struct btrfs_inode_item);
> + btrfs_set_inode_generation(path->nodes[0], ii, trans->transid);
> + btrfs_mark_buffer_dirty(path->nodes[0]);
> + ret = btrfs_commit_transaction(trans, root);
> + if (ret < 0) {
> + errno = -ret;
> + error("failed to commit transaction: %m");
> + goto error;
> + }
> + printf("reseting inode generation to %llu for ino %llu\n",
> + transid, key.objectid);
> + return ret;
> +
> +error:
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> +}
> +
>  /*
>   * Check INODE_ITEM and related ITEMs (the same inode number)
>   * 1. check link count
> @@ -2487,6 +2540,7 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   struct btrfs_inode_item *ii;
>   struct btrfs_key key;
>   struct btrfs_key last_key;
> + struct btrfs_super_block *super = root->fs_info->super_copy;
>   u64 inode_id;
>   u32 mode;
>   u64 flags;
> @@ -2497,6 +2551,8 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   u64 refs = 0;
>   u64 extent_end = 0;
>   u64 extent_size = 0;
> + u64 generation;
> + u64 gen_uplimit;
>   unsigned int dir;
>   unsigned int nodatasum;
>   bool is_orphan = false;
> @@ -2527,6 +2583,7 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   flags = btrfs_inode_flags(node, ii);
>   dir = imode_to_type(mode) == BTRFS_FT_DIR;
>   nlink = btrfs_inode_nlink(node, ii);
> + generation = btrfs_inode_generation(node, ii);
>   nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
>  
>   if (!is_valid_imode(mode)) {
> @@ -2540,6 +2597,25 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   }
>   }
>  
> + if (btrfs_super_log_root(super) != 0 &&
> + root->objectid == BTRFS_TREE_LOG_OBJECTID)
> + gen_uplimit = btrfs_super_generation(super) + 1;
> + else
> + gen_uplimit = btrfs_super_generation(super);
> +
> + if (generation > gen_uplimit) {
> + error(
> + "invalid inode generation for i

Re: [PATCH] btrfs: removed unused return variable

2019-09-30 Thread Nikolay Borisov



On 30.09.19 г. 14:17 ч., Aliasgar Surti wrote:
> From: Aliasgar Surti 
> 
> Removed unused return variable and replaced it with returning
> the value directly
> 
> Signed-off-by: Aliasgar Surti 

Actually this function should be turned to void and even the declaration
at the top of disk-io can be removed.

> ---
>  fs/btrfs/disk-io.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 044981c..c80fa67 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct 
> btrfs_transaction *trans,
>   struct rb_node *node;
>   struct btrfs_delayed_ref_root *delayed_refs;
>   struct btrfs_delayed_ref_node *ref;
> - int ret = 0;
>  
>   delayed_refs = &trans->delayed_refs;
>  
> @@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct 
> btrfs_transaction *trans,
>   if (atomic_read(&delayed_refs->num_entries) == 0) {
>   spin_unlock(&delayed_refs->lock);
>   btrfs_info(fs_info, "delayed_refs has NO entry");
> - return ret;
> + return 0;
>   }
>  
>   while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
> @@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct 
> btrfs_transaction *trans,
>  
>   spin_unlock(&delayed_refs->lock);
>  
> - return ret;
> + return 0;
>  }
>  
>  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
> 


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov


We have another report internally about a similar corruption (multiple
bit flips in a single fs), and they are also using VMware, along with
VMware guest kernel modules.

Would you mind to migrate to KVM based hypervisor to see if the
corruption happens again?


I had this problem with btrfs after more than a year of using btrfs and vmware.
If I switch to KVM based hypervisor, then I am not sure that this problem will 
occur again
in a short period of time.
I used virtualbox earlier, but switched to vmware because virtualbox has slow 
graphics.


Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation

2019-09-30 Thread Qu Wenruo



On 2019/9/30 下午7:36, Nikolay Borisov wrote:
>
>
> On 24.09.19 г. 11:11 ч., Qu Wenruo wrote:
>> There are at least two bug reports of kernel tree-checker complaining
>> about invalid inode generation.
>>
>> All offending inodes seem to be caused by old kernel around 2014, with
>> inode generation overflow.
>>
>> So add such check and repair ability to lowmem mode check first.
>>
>> This involves:
>> - Calculate the inode generation upper limit
>>   If it's an inode from log tree, then the upper limit is
>>   super_generation + 1, otherwise it's super_generation.
>>
>> - Check if the inode generation is larger than the upper limit
>>
>> - Repair by resetting inode generation to current transaction
>>   generation
>>
>> Reported-by: Charles Wright 
>> Signed-off-by: Qu Wenruo 
>
> Tested-by: Nikolay Borisov 
>
> There is one small nit with the assert once rectified you can add:
>
> Reviewed-by: Nikolay Borisov 
>
>> ---
>>  check/mode-lowmem.c | 76 +
>>  1 file changed, 76 insertions(+)
>>
>> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
>> index 5f7f101d..7af29ba9 100644
>> --- a/check/mode-lowmem.c
>> +++ b/check/mode-lowmem.c
>> @@ -2472,6 +2472,59 @@ static bool has_orphan_item(struct btrfs_root *root, 
>> u64 ino)
>>  return false;
>>  }
>>
>> +static int repair_inode_gen_lowmem(struct btrfs_root *root,
>> +   struct btrfs_path *path)
>> +{
>> +struct btrfs_trans_handle *trans;
>> +struct btrfs_inode_item *ii;
>> +struct btrfs_key key;
>> +u64 transid;
>> +int ret;
>> +
>> +trans = btrfs_start_transaction(root, 1);
>> +if (IS_ERR(trans)) {
>> +ret = PTR_ERR(trans);
>> +errno = -ret;
>> +error("failed to start transaction for inode gen repair: %m");
>> +return ret;
>> +}
>> +transid = trans->transid;
>
>> +btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
>
> nit: This function's sole caller, check_inode_item, is guaranteed to be
> called with a path pointing to BTRFS_INODE_ITEM_KEY thanks to the logic
> in the 'for' loop in process_one_leaf. This renders the assert
> redundant. At the very least I think it should be moved to
> check_inode_item.

Yes, the ASSERT() doesn't make much sense by itself.

However I still believe it won't be a problem.

It's compiler's job to remove such dead ASSERT(), but for human reader,
I still believe this ASSERT() could still make sense, especially when
the caller or callee can get more and more complex.

Thanks,
Qu

>
>> +
>> +btrfs_release_path(path);
>> +
>> +ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
>> +if (ret > 0) {
>> +ret = -ENOENT;
>> +error("no inode item found for ino %llu", key.objectid);
>> +goto error;
>> +}
>> +if (ret < 0) {
>> +errno = -ret;
>> +error("failed to find inode item for ino %llu: %m",
>> +  key.objectid);
>> +goto error;
>> +}
>> +ii = btrfs_item_ptr(path->nodes[0], path->slots[0],
>> +struct btrfs_inode_item);
>> +btrfs_set_inode_generation(path->nodes[0], ii, trans->transid);
>> +btrfs_mark_buffer_dirty(path->nodes[0]);
>> +ret = btrfs_commit_transaction(trans, root);
>> +if (ret < 0) {
>> +errno = -ret;
>> +error("failed to commit transaction: %m");
>> +goto error;
>> +}
>> +printf("reseting inode generation to %llu for ino %llu\n",
>> +transid, key.objectid);
>> +return ret;
>> +
>> +error:
>> +btrfs_abort_transaction(trans, ret);
>> +return ret;
>> +}
>> +
>>  /*
>>   * Check INODE_ITEM and related ITEMs (the same inode number)
>>   * 1. check link count
>> @@ -2487,6 +2540,7 @@ static int check_inode_item(struct btrfs_root *root, 
>> struct btrfs_path *path)
>>  struct btrfs_inode_item *ii;
>>  struct btrfs_key key;
>>  struct btrfs_key last_key;
>> +struct btrfs_super_block *super = root->fs_info->super_copy;
>>  u64 inode_id;
>>  u32 mode;
>>  u64 flags;
>> @@ -2497,6 +2551,8 @@ static int check_inode_item(struct btrfs_root *root, 
>> struct btrfs_path *path)
>>  u64 refs = 0;
>>  u64 extent_end = 0;
>>  u64 extent_size = 0;
>> +u64 generation;
>> +u64 gen_uplimit;
>>  unsigned int dir;
>>  unsigned int nodatasum;
>>  bool is_orphan = false;
>> @@ -2527,6 +2583,7 @@ static int check_inode_item(struct btrfs_root *root, 
>> struct btrfs_path *path)
>>  flags = btrfs_inode_flags(node, ii);
>>  dir = imode_to_type(mode) == BTRFS_FT_DIR;
>>  nlink = btrfs_inode_nlink(node, ii);
>> +generation = btrfs_inode_generation(node, ii);
>>  nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
>>
>>  if (!is_valid_imode(mode)) {
>> @@ -2540,6 +2597,25 @@ static int check_inode_item(struct bt

Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/9/30 下午8:10, Andrey Ivanov wrote:
>>
>> We have another report internally about a similar corruption (multiple
>> bit flips in a single fs), and they are also using VMware, along with
>> VMware guest kernel modules.
>>
>> Would you mind to migrate to KVM based hypervisor to see if the
>> corruption happens again?
> 
> I had this problem with btrfs after more than a year of using btrfs and
> vmware.

So it looks like some regression, although still not sure who is to blame.

But anyway, I'll add more asserts for those obvious members which should
never exceed 64K, to either to make sure it's not btrfs, or to confirm
it's us to be blamed.

Thanks,
Qu

> If I switch to KVM based hypervisor, then I am not sure that this
> problem will occur again
> in a short period of time.
> I used virtualbox earlier, but switched to vmware because virtualbox has
> slow graphics.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] btrfs: removed unused return variable

2019-09-30 Thread Aliasgar Surti
On Mon, Sep 30, 2019 at 02:53:12PM +0300, Nikolay Borisov wrote:
> 
> 
> On 30.09.19 г. 14:17 ч., Aliasgar Surti wrote:
> > From: Aliasgar Surti 
> > 
> > Removed unused return variable and replaced it with returning
> > the value directly
> > 
> > Signed-off-by: Aliasgar Surti 
> 
> Actually this function should be turned to void and even the declaration
> at the top of disk-io can be removed.

Okay, will send out v2 incorporating this change.

Thanks,
Aliasgar
> 
> > ---
> >  fs/btrfs/disk-io.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 044981c..c80fa67 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -4255,7 +4255,6 @@ static int btrfs_destroy_delayed_refs(struct 
> > btrfs_transaction *trans,
> > struct rb_node *node;
> > struct btrfs_delayed_ref_root *delayed_refs;
> > struct btrfs_delayed_ref_node *ref;
> > -   int ret = 0;
> >  
> > delayed_refs = &trans->delayed_refs;
> >  
> > @@ -4263,7 +4262,7 @@ static int btrfs_destroy_delayed_refs(struct 
> > btrfs_transaction *trans,
> > if (atomic_read(&delayed_refs->num_entries) == 0) {
> > spin_unlock(&delayed_refs->lock);
> > btrfs_info(fs_info, "delayed_refs has NO entry");
> > -   return ret;
> > +   return 0;
> > }
> >  
> > while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
> > @@ -4307,7 +4306,7 @@ static int btrfs_destroy_delayed_refs(struct 
> > btrfs_transaction *trans,
> >  
> > spin_unlock(&delayed_refs->lock);
> >  
> > -   return ret;
> > +   return 0;
> >  }
> >  
> >  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
> > 


[PATCH v2] btrfs: removed unused return variable

2019-09-30 Thread Aliasgar Surti
From: Aliasgar Surti 

Removed unused return variable and replaced it with returning
the value directly

Signed-off-by: Aliasgar Surti 
---
v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration
---
 fs/btrfs/disk-io.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981c..0a8243a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,8 +52,6 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
- struct btrfs_fs_info *fs_info);
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
 static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
struct extent_io_tree *dirty_pages,
@@ -4249,13 +4247,12 @@ static void btrfs_destroy_all_ordered_extents(struct 
btrfs_fs_info *fs_info)
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
 }
 
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info)
 {
struct rb_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_delayed_ref_node *ref;
-   int ret = 0;
 
delayed_refs = &trans->delayed_refs;
 
@@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
if (atomic_read(&delayed_refs->num_entries) == 0) {
spin_unlock(&delayed_refs->lock);
btrfs_info(fs_info, "delayed_refs has NO entry");
-   return ret;
+   return;
}
 
while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
@@ -4306,8 +4303,6 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
}
 
spin_unlock(&delayed_refs->lock);
-
-   return ret;
 }
 
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
-- 
2.7.4



Re: [PATCH v2] btrfs: removed unused return variable

2019-09-30 Thread Nikolay Borisov



On 30.09.19 г. 16:00 ч., Aliasgar Surti wrote:
> From: Aliasgar Surti 
> 
> Removed unused return variable and replaced it with returning
> the value directly

The changelog could be expanded, something like:

btrfs_destroy_delayed_refs always returns 0 so it can be converted to
void. While at it also remove the unnecessary forward declaration in
disk-io.c

> 
> Signed-off-by: Aliasgar Surti 

Code wise it's ok:

Reviewed-by: Nikolay Borisov 

> ---
> v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration
> ---
>  fs/btrfs/disk-io.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 044981c..0a8243a 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -52,8 +52,6 @@
>  static const struct extent_io_ops btree_extent_io_ops;
>  static void end_workqueue_fn(struct btrfs_work *work);
>  static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
> -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> -   struct btrfs_fs_info *fs_info);
>  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
>  static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
>   struct extent_io_tree *dirty_pages,
> @@ -4249,13 +4247,12 @@ static void btrfs_destroy_all_ordered_extents(struct 
> btrfs_fs_info *fs_info)
>   btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
>  }
>  
> -static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> +static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> struct btrfs_fs_info *fs_info)
   ^^
nit: It's good practice to also adjust the alignment of following lines.

>  {
>   struct rb_node *node;
>   struct btrfs_delayed_ref_root *delayed_refs;
>   struct btrfs_delayed_ref_node *ref;
> - int ret = 0;
>  
>   delayed_refs = &trans->delayed_refs;
>  
> @@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct 
> btrfs_transaction *trans,
>   if (atomic_read(&delayed_refs->num_entries) == 0) {
>   spin_unlock(&delayed_refs->lock);
>   btrfs_info(fs_info, "delayed_refs has NO entry");
> - return ret;
> + return;
>   }
>  
>   while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
> @@ -4306,8 +4303,6 @@ static int btrfs_destroy_delayed_refs(struct 
> btrfs_transaction *trans,
>   }
>  
>   spin_unlock(&delayed_refs->lock);
> -
> - return ret;
>  }
>  
>  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
> 


Re: [PATCH v2] btrfs: removed unused return variable

2019-09-30 Thread David Sterba
On Mon, Sep 30, 2019 at 06:30:21PM +0530, Aliasgar Surti wrote:
> From: Aliasgar Surti 
> 
> Removed unused return variable and replaced it with returning
> the value directly

This change has been sent several times and I give the same answer each
time:

https://lore.kernel.org/linux-btrfs/20190418154913.go20...@twin.jikos.cz/

"When switching a fuction to return void, please check the whole
callgraph for functions that do not properly handler errors and do
BUG_ON. You won't see errors passed from them so this gives the
impression no error handling is needed in the caller.

This has been sent in the past, so I can point you to the 2nd paragraph
in
https://lore.kernel.org/linux-btrfs/20180815124425.gm24...@twin.jikos.cz/

hint: btrfs_pin_extent"


Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation

2019-09-30 Thread Nikolay Borisov



On 30.09.19 г. 15:24 ч., Qu Wenruo wrote:
> Yes, the ASSERT() doesn't make much sense by itself.
> 
> However I still believe it won't be a problem.

It won't be a problem but it feels wrong to have this assert this deep
into the call chain. IMO It should be put where it can trigger at the
earliest which seems to be in check_inode_item. That function assumes
it's working with an inode item and goes to dereference inode members so
if the type is wrong we'd crash there instead of in repair_inode_gen_lowmem.

> 
> It's compiler's job to remove such dead ASSERT(), but for human reader,
> I still believe this ASSERT() could still make sense, especially when
> the caller or callee can get more and more complex.
> 
> Thanks,


Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap

2019-09-30 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: 4.16+

The bot has tested the following trees: v5.3.1, v5.2.17, v4.19.75.

v5.3.1: Build OK!
v5.2.17: Build OK!
v4.19.75: Failed to apply! Possible dependencies:
7073017aeb98 ("btrfs: use offset_in_page instead of open-coding it")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha


Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap

2019-09-30 Thread Josef Bacik
On Mon, Sep 30, 2019 at 10:20:25AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we have a buffered write that starts at an offset greater than or
> equals to the file's size happening concurrently with a full ranged
> fiemap, we can end up leaking an extent state structure.
> 
> Suppose we have a file with a size of 1Mb, and before the buffered write
> and fiemap are performed, it has a single extent state in its io tree
> representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
> 
> The following sequence diagram shows how the memory leak happens if a
> fiemap a buffered write, starting at offset 1Mb and with a length of
> 4Kb, are performed concurrently.
> 
>   CPU 1  CPU 2
> 
>   extent_fiemap()
> --> it's a full ranged fiemap
> range from 0 to LLONG_MAX - 1
> (9223372036854775807)
> 
> --> locks range in the inode's
> io tree
>   --> after this we have 2 extent
>   states in the io tree:
>   --> 1 for range [0, 1Mb[ with
>   the bits EXTENT_LOCKED and
>   EXTENT_DELALLOC_BITS set
>   --> 1 for the range
>   [1Mb, LLONG_MAX[ with
>   the EXTENT_LOCKED bit set
> 
>   --> start buffered write at 
> offset
>   1Mb with a length of 4Kb
> 
>   btrfs_file_write_iter()
> 
> btrfs_buffered_write()
>   --> cached_state is NULL
> 
>   
> lock_and_cleanup_extent_if_need()
> --> returns 0 and 
> does not lock
> range because it 
> starts
> at current i_size 
> / eof
> 
>   --> cached_state 
> remains NULL
> 
>   btrfs_dirty_pages()
> 
> btrfs_set_extent_delalloc()
>   (...)
>   __set_extent_bit()
> 
> --> splits extent 
> state for range
> [1Mb, 
> LLONG_MAX[ and now we
> have 2 extent 
> states:
> 
> --> one for 
> the range
> [1Mb, 1Mb 
> + 4Kb[ with
> 
> EXTENT_LOCKED set
> --> another 
> one for the range
> [1Mb + 
> 4Kb, LLONG_MAX[ with
> 
> EXTENT_LOCKED set as well
> 
> --> sets 
> EXTENT_DELALLOC on the
> extent state 
> for the range
> [1Mb, 1Mb + 
> 4Kb[
> --> caches extent 
> state
> [1Mb, 1Mb + 
> 4Kb[ into
> @cached_state 
> because it has
> the bit 
> EXTENT_LOCKED set
> 
> --> 
> btrfs_buffered_write() ends up
> with a non-NULL 
> cached_state and
> never calls anything 
> to release its
> reference on it, 
> resulting in a
> memory leak
> 
> Fix this by calling free_extent_state() on cached_state if the range was
> not locked by lock_and_cleanup_extent_if_need().
> 
> The same issue can happen if anything else other than fiemap locks a range
> that covers eof and beyond.
> 
> This could be triggered, sporadically, by test case generic/561 from the
> fstests suite, which makes duperemove run concurrently with fsstress, and
> duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
> the leak is reported in dmesg/syslog when removing the btrfs module with
> a message like

Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add check and repair for invalid inode generation

2019-09-30 Thread Qu Wenruo



On 2019/9/30 下午9:34, Nikolay Borisov wrote:
>
>
> On 30.09.19 г. 15:24 ч., Qu Wenruo wrote:
>> Yes, the ASSERT() doesn't make much sense by itself.
>>
>> However I still believe it won't be a problem.
>
> It won't be a problem but it feels wrong to have this assert this deep
> into the call chain. IMO It should be put where it can trigger at the
> earliest which seems to be in check_inode_item. That function assumes
> it's working with an inode item and goes to dereference inode members so
> if the type is wrong we'd crash there instead of in repair_inode_gen_lowmem.

This is going to be a preference thing.

To my taste, if possible, I would put ASSERT() in every function with
more than 24 lines, as an alternative to comment, to show the
prerequisite or the assumption.

Thanks,
Qu

>
>>
>> It's compiler's job to remove such dead ASSERT(), but for human reader,
>> I still believe this ASSERT() could still make sense, especially when
>> the caller or callee can get more and more complex.
>>
>> Thanks,


[GIT PULL] Btrfs fixes for 5.4-rc2

2019-09-30 Thread David Sterba
Hi,

a bunch of fixes that accumulated in recent weeks, mostly material for
stable.

Summary:

- fix for regression from 5.3 that prevents to use balance convert with
  single profile

- qgroup fixes: rescan race, accounting leak with multiple writers,
  potential leak after io failure recovery

- fix for use after free in relocation (reported by KASAN)

- other error handling fixups

Please pull, thanks.


The following changes since commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e:

  btrfs: Relinquish CPUs in btrfs_compare_trees (2019-09-09 14:59:20 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.4-rc1-tag

for you to fetch changes up to d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3:

  btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls 
(2019-09-27 15:24:34 +0200)


Dennis Zhou (1):
  btrfs: adjust dirty_metadata_bytes after writeback failure of extent 
buffer

Filipe Manana (3):
  Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
  Btrfs: fix missing error return if writeback for extent buffer never 
started
  Btrfs: fix race setting up and completing qgroup rescan workers

Qu Wenruo (4):
  btrfs: relocation: fix use-after-free on dead relocation roots
  btrfs: Fix a regression which we can't convert to SINGLE profile
  btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data 
space
  btrfs: qgroup: Fix reserved data space leak if we have multiple reserve 
calls

 fs/btrfs/extent_io.c | 13 +
 fs/btrfs/qgroup.c| 38 +++---
 fs/btrfs/relocation.c|  9 -
 fs/btrfs/tests/btrfs-tests.c |  8 +++-
 fs/btrfs/volumes.c   |  8 +++-
 5 files changed, 58 insertions(+), 18 deletions(-)


[PATCH] Setup GitLab-CI for btrfs-progs

2019-09-30 Thread Lakshmipathi.G
Make use of GitLab-CI nested virutal environment to start QEMU instance inside 
containers
and perform btrfs-progs build, execute unit test cases and save the logs.

More details can be found at https://github.com/kdave/btrfs-progs/issues/171

Signed-off-by: Lakshmipathi.G 
---
 .gitlab-ci.yml| 181 ++
 gitlab-ci/Dockerfile  |   3 +
 gitlab-ci/btrfs-progs-tests.service   |  13 +++
 gitlab-ci/build_or_run_btrfs-progs.sh |  37 +++
 gitlab-ci/kernel_build.sh |  30 ++
 gitlab-ci/run_tests.sh|   9 ++
 gitlab-ci/setup_image.sh  |  42 
 7 files changed, 315 insertions(+)
 create mode 100644 .gitlab-ci.yml
 create mode 100644 gitlab-ci/Dockerfile
 create mode 100644 gitlab-ci/btrfs-progs-tests.service
 create mode 100755 gitlab-ci/build_or_run_btrfs-progs.sh
 create mode 100755 gitlab-ci/kernel_build.sh
 create mode 100755 gitlab-ci/run_tests.sh
 create mode 100755 gitlab-ci/setup_image.sh

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
new file mode 100644
index 000..2afde50
--- /dev/null
+++ b/.gitlab-ci.yml
@@ -0,0 +1,181 @@
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public
+# License v2 as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public
+# License along with this program; if not, write to the
+# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+# Boston, MA 021110-1307, USA.
+#
+
+image: docker:18.09.7
+
+services:
+- docker:18.09.7-dind
+
+variables:
+  DOCKER_DRIVER: overlay2
+
+stages:
+  - build
+  - btrfs-progs build
+  - test
+
+variables:
+  DOCKER_DRIVER: overlay2
+  IMAGE_TAG: registry.gitlab.com/$CI_PROJECT_NAMESPACE/btrfs-progs:gitlab-ci
+
+before_script:
+   - docker login --username $CI_REGISTRY_USER --password 
$CI_REGISTRY_PASSWORD $CI_REGISTRY
+
+docker build:
+  stage: build
+  script:
+- cd gitlab-ci
+- docker pull $IMAGE_TAG > /dev/null && echo "Downloaded image" || ( 
docker build -t $IMAGE_TAG . && docker push $IMAGE_TAG )
+- cd ..
+
+## To enable or disable Kernel Build set BUILD_KERNEL: "1" or BUILD_KERNEL: 
"0" 
+## If you disable Kernel Build, make sure PREBUILT_KERNEL_ID points to 
previously built the kernel job id.
+
+kernel build:
+  variables:
+BUILD_KERNEL: "1"
+PREBUILT_KERNEL_ID: "288159334"
+  before_script:
+- apk add curl unzip 
+  stage: build
+  services:
+- docker:18.09.7-dind
+  script:
+ - if [ "$BUILD_KERNEL" == "1" ]; then
+ docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
--device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/kernel_build.sh;
+   else
+ curl -o bzImage.zip --location --header "JOB-TOKEN:$CI_JOB_TOKEN"  
"https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_KERNEL_ID/artifacts";
 && unzip bzImage.zip;
+   fi;
+  artifacts:
+when: always
+paths:
+  - bzImage
+
+# To enable or disable image build update BUILD_IMAGE value to "1" or "0".
+# If you disable Image Build, make sure PREBUILT_IMAGE_ID points to previously 
built rootfs job id.
+ 
+image build:
+  variables:
+BUILD_IMAGE: "1"
+PREBUILT_IMAGE_ID: "288506168"
+  before_script:
+- apk add curl unzip 
+  stage: build
+  services:
+- docker:18.09.7-dind
+  script:
+ - if [ "$BUILD_IMAGE" == "1" ]; then
+  docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
--device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/setup_image.sh;
+   else
+  curl  -o qemu-image.img.zip --location --header 
"JOB-TOKEN:$CI_JOB_TOKEN" 
"https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_IMAGE_ID/artifacts";
 && unzip qemu-image.img.zip;
+   fi;
+  artifacts:
+when: always
+paths:
+  - qemu-image.img
+
+btrfs-progs build:
+  stage: btrfs-progs build
+  services:
+- docker:18.09.7-dind
+  script:
+ - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
--device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh
+  artifacts:
+expire_in: 1 week
+when: always
+paths:
+  - qemu-image.img
+
+cli tests:
+  stage: test
+  services:
+- docker:18.09.7-dind
+  script:
+ - echo "./cli-tests.sh" > $PWD/cmd
+ - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
--device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh
+ - test -e "result" || exit 1 # If result doesn't exists, job failed.
+  artifacts:
+when: always
+paths:
+  - "*tests-results.txt"
+
+convert tests:
+  only: 
+- devel
+  stage: test
+  services:
+- docker:18.09

[PATCH] btrfs: fix incorrect updating of log root tree

2019-09-30 Thread Josef Bacik
We've historically had reports of being unable to mount file systems
because the tree log root couldn't be read.  Usually this is the "parent
transid failure", but could be any of the related errors, including
"fsid mismatch" or "bad tree block", depending on which block got
allocated.

The modification of the individual log root items are serialized on the
per-log root root_mutex.  This means that any modification to the
per-subvol log root_item is completely protected.

However we update the root item in the log root tree outside of the log
root tree log_mutex.  We do this in order to allow multiple subvolumes
to be updated in each log transaction.

This is problematic however because when we are writing the log root
tree out we update the super block with the _current_ log root node
information.  Since these two operations happen independently of each
other, you can end up updating the log root tree in between writing out
the dirty blocks and setting the super block to point at the current
root.

This means we'll point at the new root node that hasn't been written
out, instead of the one we should be pointing at.  Thus whatever garbage
or old block we end up pointing at complains when we mount the file
system later and try to replay the log.

Fix this by copying the log's root item into a local root item copy.
Then once we're safely under the log_root_tree->log_mutex we update the
root item in the log_root_tree.  This way we do not modify the
log_root_tree while we're committing it, fixing the problem.

cc: sta...@vger.kernel.org
Signed-off-by: Josef Bacik 
Reviewed-by: Chris Mason 
---
 fs/btrfs/tree-log.c | 36 +++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7cac09a6f007..1d7f22951ef2 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2908,7 +2908,8 @@ static int walk_log_tree(struct btrfs_trans_handle *trans,
  * in the tree of log roots
  */
 static int update_log_root(struct btrfs_trans_handle *trans,
-  struct btrfs_root *log)
+  struct btrfs_root *log,
+  struct btrfs_root_item *root_item)
 {
struct btrfs_fs_info *fs_info = log->fs_info;
int ret;
@@ -2916,10 +2917,10 @@ static int update_log_root(struct btrfs_trans_handle 
*trans,
if (log->log_transid == 1) {
/* insert root item on the first sync */
ret = btrfs_insert_root(trans, fs_info->log_root_tree,
-   &log->root_key, &log->root_item);
+   &log->root_key, root_item);
} else {
ret = btrfs_update_root(trans, fs_info->log_root_tree,
-   &log->root_key, &log->root_item);
+   &log->root_key, root_item);
}
return ret;
 }
@@ -3017,6 +3018,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_root *log = root->log_root;
struct btrfs_root *log_root_tree = fs_info->log_root_tree;
+   struct btrfs_root_item new_root_item;
int log_transid = 0;
struct btrfs_log_ctx root_log_ctx;
struct blk_plug plug;
@@ -3080,17 +3082,25 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
goto out;
}
 
+   /*
+* We _must_ update under the root->log_mutex in order to make sure we
+* have a consistent view of the log root we are trying to commit at
+* this moment.
+*
+* We _must_ copy this into a local copy, because we are not holding the
+* log_root_tree->log_mutex yet.  This is important because when we
+* commit the log_root_tree we must have a consistent view of the
+* log_root_tree when we update the super block to point at the
+* log_root_tree bytenr.  If we update the log_root_tree here we'll race
+* with the commit and possibly point at the new block which we may not
+* have written out.
+*/
btrfs_set_root_node(&log->root_item, log->node);
+   memcpy(&new_root_item, &log->root_item, sizeof(new_root_item));
 
root->log_transid++;
log->log_transid = root->log_transid;
root->log_start_pid = 0;
-   /*
-* Update or create log root item under the root's log_mutex to prevent
-* races with concurrent log syncs that can lead to failure to update
-* log root item because it was not created yet.
-*/
-   ret = update_log_root(trans, log);
/*
 * IO has been started, blocks of the log tree have WRITTEN flag set
 * in their headers. new modifications of the log will be written to
@@ -3111,6 +3121,14 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
mutex_unlock(&log_root_tree->log_mutex);
 
mutex_lock(&log_root_tree->log_mutex);
+
+

BTRFS corruption, mounts but comes read-only

2019-09-30 Thread Remi Gauvin
Quick history, the system was rejecting ssh connections, and was not
responding to ACPI shutdown, so I had to force power off.

On restart, everything looked ok, but when I did a scrub, the system
quickly reached a point of forced read-only., (kernel Ubuntu 4.15.0-52)

I installed a current Arch for recovery, (kernel 5.3.1 and btrfs-progs
5.2.2)

Here is the result of btrfs check

btrfs check /dev/sdd4
Opening filesystem to check...
Checking filesystem on /dev/sdd4
UUID: cd54199e-753f-4bac-862d-c9d353d3e155
[1/7] checking root items
[2/7] checking extents
corrupt extent record: key [5126707298304,169,16384]
incorrect local backref count on 5126707298304 root 5890615246848 owner
1507958538109110 offset 386035672063981056 found 0 wanted 11927552 back
0x55737707d980
backref disk bytenr does not match extent record, bytenr=5126707298304,
ref bytenr=0
tree backref 5126707298304 parent 5890436890624 root 5890436890624 not
found in extent tree
tree backref 5126707298304 parent 5890463039488 root 5890463039488 not
found in extent tree
tree backref 5126707298304 parent 5125855133696 root 5125855133696 not
found in extent tree
tree backref 5126707298304 parent 4717386399744 root 4717386399744 not
found in extent tree
tree backref 5126707298304 parent 5125777653760 root 5125777653760 not
found in extent tree
tree backref 5126707298304 parent 5126231179264 root 5126231179264 not
found in extent tree
tree backref 5126707298304 parent 4716869992448 root 4716869992448 not
found in extent tree
tree backref 5126707298304 parent 4716547129344 root 4716547129344 not
found in extent tree
tree backref 5126707298304 parent 4716575670272 root 4716575670272 not
found in extent tree
tree backref 5126707298304 parent 4418457927680 root 4418457927680 not
found in extent tree
tree backref 5126707298304 parent 4418653339648 root 4418653339648 not
found in extent tree
tree backref 5126707298304 parent 4716604555264 root 4716604555264 not
found in extent tree
tree backref 5126707298304 parent 4716953616384 root 4716953616384 not
found in extent tree
tree backref 5126707298304 parent 5890615246848 root 5890615246848 not
found in extent tree
tree backref 5126707298304 parent 4134522355712 root 4134522355712 not
found in extent tree
tree backref 5126707298304 parent 4418052734976 root 4418052734976 not
found in extent tree
tree backref 5126707298304 parent 4034125447168 root 4034125447168 not
found in extent tree
tree backref 5126707298304 parent 4034188083200 root 4034188083200 not
found in extent tree
tree backref 5126707298304 parent 4133585420288 root 4133585420288 not
found in extent tree
tree backref 5126707298304 parent 4034073935872 root 4034073935872 not
found in extent tree
tree backref 5126707298304 parent 4032730021888 root 4032730021888 not
found in extent tree
tree backref 5126707298304 parent 4033156546560 root 4033156546560 not
found in extent tree
tree backref 5126707298304 parent 4031949127680 root 4031949127680 not
found in extent tree
tree backref 5126707298304 parent 4032101384192 root 4032101384192 not
found in extent tree
tree backref 5126707298304 parent 4032197132288 root 4032197132288 not
found in extent tree
tree backref 5126707298304 parent 4034077949952 root 4034077949952 not
found in extent tree
tree backref 5126707298304 parent 4031202394112 root 4031202394112 not
found in extent tree
tree backref 5126707298304 parent 4030981292032 root 4030981292032 not
found in extent tree
tree backref 5126707298304 parent 3874566619136 root 3874566619136 not
found in extent tree
tree backref 5126707298304 parent 3874229714944 root 3874229714944 not
found in extent tree
tree backref 5126707298304 parent 3873657765888 root 3873657765888 not
found in extent tree
tree backref 5126707298304 parent 3874460270592 root 3874460270592 not
found in extent tree
tree backref 5126707298304 parent 4030556012544 root 4030556012544 not
found in extent tree
tree backref 5126707298304 parent 4031297765376 root 4031297765376 not
found in extent tree
tree backref 5126707298304 parent 4418354544640 root 4418354544640 not
found in extent tree
backpointer mismatch on [5126707298304 16384]
bad extent [5126707298304, 5126707314688), type mismatch with chunk
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 4501017206864 bytes used, error(s) found
total csum bytes: 4386928432
total tree bytes: 9074900992
total fs tree bytes: 3698982912
total extent tree bytes: 377012224
btree space waste bytes: 1430502412
file data blocks allocated: 16729280847872
 referenced 8948709453824



Thank you.


Re: [GIT PULL] Btrfs fixes for 5.4-rc2

2019-09-30 Thread pr-tracker-bot
The pull request you sent on Mon, 30 Sep 2019 16:25:08 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.4-rc1-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/bb48a59135926ece9b1361e8b96b33fc658830bc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH][v3] btrfs: use refcount_inc_not_zero in kill_all_nodes

2019-09-30 Thread David Sterba
On Thu, Sep 26, 2019 at 08:29:32AM -0400, Josef Bacik wrote:
> We hit the following warning while running down a different problem
> 
> [ 6197.175850] [ cut here ]
> [ 6197.185082] refcount_t: underflow; use-after-free.
> [ 6197.194704] WARNING: CPU: 47 PID: 966 at lib/refcount.c:190 
> refcount_sub_and_test_checked+0x53/0x60
> [ 6197.521792] Call Trace:
> [ 6197.526687]  __btrfs_release_delayed_node+0x76/0x1c0
> [ 6197.536615]  btrfs_kill_all_delayed_nodes+0xec/0x130
> [ 6197.546532]  ? __btrfs_btree_balance_dirty+0x60/0x60
> [ 6197.556482]  btrfs_clean_one_deleted_snapshot+0x71/0xd0
> [ 6197.566910]  cleaner_kthread+0xfa/0x120
> [ 6197.574573]  kthread+0x111/0x130
> [ 6197.581022]  ? kthread_create_on_node+0x60/0x60
> [ 6197.590086]  ret_from_fork+0x1f/0x30
> [ 6197.597228] ---[ end trace 424bb7ae00509f56 ]---
> 
> This is because the free side drops the ref without the lock, and then
> takes the lock if our refcount is 0.  So you can have nodes on the tree
> that have a refcount of 0.

This sounds like breaking the assumptions of the refcounts, if the
object is in the tree it should not be accessible by anything else than
the freeing code, with refs == 0.

Now the delayed nodes do not follow that and there were bugs where the 0
was increased again (ec35e48b286959991c "btrfs: fix refcount_t usage
when deleting btrfs_delayed_nodes").

Your patch fixes another instance, I still see some other refcount_inc
that seem to be safe but the call to refcount_inc_not_zero should have a
comment, like is in the mentioned commit.

> Fix this by zero'ing out that element in our
> temporary array so we don't try to kill it again.

The fix looks correct.


Re: BTRFS corruption, mounts but comes read-only

2019-09-30 Thread Remi Gauvin
Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log:

[10524.830640] BTRFS info (device sdf4): using free space tree
[10524.830644] BTRFS info (device sdf4): has skinny extents
[10549.561143] BTRFS info (device sdf4): checking UUID tree
[10606.900264] BTRFS info (device sdf4): leaf 3873928691712 gen 935882
total ptrs 147 free space 6236 owner 2
[10606.900268]  item 0 key (5126703497216 169 0) itemoff 15845 itemsize 438
[10606.900270]  extent refs 46 gen 885323 flags 2
[10606.900272]  ref#0: tree block backref root 10262
[10606.900274]  ref#1: shared block backref parent 6300271443968
[10606.900276]  ref#2: shared block backref parent 6300177645568
[10606.900278]  ref#3: shared block backref parent 6300087877632
[10606.900279]  ref#4: shared block backref parent 6299851292672
[10606.900281]  ref#5: shared block backref parent 6299576254464
[10606.900283]  ref#6: shared block backref parent 6299308621824
[10606.900284]  ref#7: shared block backref parent 6240101318656
[10606.900286]  ref#8: shared block backref parent 6239972589568
[10606.900288]  ref#9: shared block backref parent 5891141632000
[10606.900289]  ref#10: shared block backref parent 5891082911744
[10606.900291]  ref#11: shared block backref parent 5891042951168
[10606.900293]  ref#12: shared block backref parent 5890615246848
[10606.900294]  ref#13: shared block backref parent 5890463039488
[10606.900296]  ref#14: shared block backref parent 5890436890624
[10606.900298]  ref#15: shared block backref parent 5126231179264
[10606.900299]  ref#16: shared block backref parent 5125855133696
[10606.900301]  ref#17: shared block backref parent 5125777653760
[10606.900302]  ref#18: shared block backref parent 4717386399744
[10606.900304]  ref#19: shared block backref parent 4716953616384
[10606.900306]  ref#20: shared block backref parent 4716869992448
[10606.900307]  ref#21: shared block backref parent 4716604555264
[10606.900309]  ref#22: shared block backref parent 4716575670272
[10606.900311]  ref#23: shared block backref parent 4716547129344
[10606.900312]  ref#24: shared block backref parent 4418653339648
[10606.900314]  ref#25: shared block backref parent 4418457927680
[10606.900316]  ref#26: shared block backref parent 4418052734976
[10606.900317]  ref#27: shared block backref parent 4134522355712
[10606.900319]  ref#28: shared block backref parent 4133585420288
[10606.900320]  ref#29: shared block backref parent 4034188083200
[10606.900322]  ref#30: shared block backref parent 4034125447168
[10606.900324]  ref#31: shared block backref parent 4034077949952
[10606.900325]  ref#32: shared block backref parent 4034073935872
[10606.900327]  ref#33: shared block backref parent 4033156546560
[10606.900329]  ref#34: shared block backref parent 4032730021888
[10606.900330]  ref#35: shared block backref parent 4032197132288
[10606.900332]  ref#36: shared block backref parent 4032101384192
[10606.900333]  ref#37: shared block backref parent 4031949127680
[10606.900335]  ref#38: shared block backref parent 4031297765376
[10606.900337]  ref#39: shared block backref parent 4031202394112
[10606.900338]  ref#40: shared block backref parent 4030981292032
[10606.900340]  ref#41: shared block backref parent 4030556012544
[10606.900342]  ref#42: shared block backref parent 3874566619136
[10606.900343]  ref#43: shared block backref parent 3874460270592
[10606.900345]  ref#44: shared block backref parent 3874229714944
[10606.900347]  ref#45: shared block backref parent 3873657765888
[10606.900350]  item 1 key (5126703529984 169 0) itemoff 15812 itemsize 33
[10606.900351]  extent refs 1 gen 571786 flags 258
[10606.900352]  ref#0: tree block backref root 46426
[10606.900355]  item 2 key (5126703546368 169 0) itemoff 15779 itemsize 33
[10606.900357]  extent refs 1 gen 626172 flags 2
[10606.900390]  ref#0: tree block backref root 7
[10606.900393]  item 3 key (5126703562752 169 0) itemoff 15746 itemsize 33
[10606.900395]  extent refs 1 gen 626172 flags 2
[10606.900396]  ref#0: tree block backref root 7
[10606.900399]  item 4 key (5126703579136 169 0) itemoff 15713 itemsize 33
[10606.900401]  extent refs 1 gen 626172 flags 2
[10606.900402]  ref#0: tree block backref root 7
[10606.900404]  item 5 key (5126703661056 169 0) itemoff 15680 itemsize 33
[10606.900406]  extent refs 1 gen 573020 flags 258
[10606.900407]  ref#0: shared block backref parent 5891148972032
[10606.900410]  item 6 key (5126703677440 169 0) itemoff 15647 itemsize 33
[10606.900412]  extent refs 1 gen 626172 flags 2
[10606.9004

Re: [PATCH RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()

2019-09-30 Thread David Sterba
On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
> [BUG]
> There is one BUG_ON() report where a transaction is aborted during
> balance, then kernel BUG_ON() in merge_reloc_roots():

Do you have details from the report, eg. what's the error code?

>   void merge_reloc_roots(struct reloc_control *rc)
>   {
>   ...
>   BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>   }
> 
> [CAUSE]
> It's still uncertain why we can get to such situation.
> As all __add_reloc_root() calls will also link that reloc root to
> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
> 
> So the root cause is still uncertain.
> 
> [FIX]
> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
> 
> There are 3 BUG_ON()s in it:
> - BUG_ON() for read_fs_root() result
> - BUG_ON() for root->reloc_root != reloc_root case
> - BUG_ON() for the non-empty reloc_root_tree

relocation.c is worst regarding number of BUG_ONs, some of them look
like runtime assertions of the invariants but some of them are
substituting for error handling.

The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are
assertions

> 
> For the first two, just grab the return value and goto out tag for
> cleanup.
> 
> For the last one, make it more graceful by:
> - grab the lock before doing read/write
> - warn instead of panic
> - cleanup the nodes in rc->reloc_root_tree
> 
> Signed-off-by: Qu Wenruo 
> ---
> Reason for RFC:
> The root cause to leak nodes in reloc_root_tree is still unknown.
> ---
>  fs/btrfs/relocation.c | 39 ---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 655f1d5a8c27..d562b5c52a40 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
>   if (btrfs_root_refs(&reloc_root->root_item) > 0) {
>   root = read_fs_root(fs_info,
>   reloc_root->root_key.offset);
> - BUG_ON(IS_ERR(root));
> - BUG_ON(root->reloc_root != reloc_root);
> + if (IS_ERR(root)) {

This is bug_on -> error handling, ok

> + ret = PTR_ERR(root);
> + btrfs_err(fs_info,
> +   "failed to read root %llu: %d",
> +   reloc_root->root_key.offset, ret);
> + goto out;
> + }
> + if (root->reloc_root != reloc_root) {

With this one I'm not sure it could happen but replacing the bug on is
still good.

> + btrfs_err(fs_info,
> + "reloc root mismatch for root %llu",

Would be good to print the number of the other root as well.

> + root->root_key.objectid);
> + ret = -EINVAL;
> + goto out;
> + }
>  
>   ret = merge_reloc_root(rc, root);
>   if (ret) {
> + btrfs_err(fs_info,
> + "failed to merge reloc tree for root %llu: %d",
> +   root->root_key.objectid, ret);
>   if (list_empty(&reloc_root->root_list))
>   list_add_tail(&reloc_root->root_list,
> &reloc_roots);
> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
>   free_reloc_roots(&reloc_roots);
>   }
>  
> - BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));

This one looks more like the invariant, the tree should be really empty
here. While the cleanup is trying to make things work despite there's a
problem, I think the warning should not be debugging only.

> + spin_lock(&rc->reloc_root_tree.lock);
> + /* Cleanup dirty reloc tree nodes */
> + if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
> + struct mapping_node *node;
> + struct mapping_node *next;
> +
> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));

...


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 15:27, Qu Wenruo wrote:

We have another report internally about a similar corruption (multiple
bit flips in a single fs), and they are also using VMware, along with
VMware guest kernel modules.

Would you mind to migrate to KVM based hypervisor to see if the
corruption happens again?


I had this problem with btrfs after more than a year of using btrfs and
vmware.


So it looks like some regression, although still not sure who is to blame.


One more clarification. I had this problem with btrfs after switching from
linux-4.19.52-gentoo kernel to linux-4.19.66-gentoo.


Re: [PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap

2019-09-30 Thread David Sterba
On Mon, Sep 30, 2019 at 10:20:25AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we have a buffered write that starts at an offset greater than or
> equals to the file's size happening concurrently with a full ranged
> fiemap, we can end up leaking an extent state structure.
> 
> Suppose we have a file with a size of 1Mb, and before the buffered write
> and fiemap are performed, it has a single extent state in its io tree
> representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
> 
> The following sequence diagram shows how the memory leak happens if a
> fiemap a buffered write, starting at offset 1Mb and with a length of
> 4Kb, are performed concurrently.
> 
>   CPU 1  CPU 2
> 
>   extent_fiemap()
> --> it's a full ranged fiemap
> range from 0 to LLONG_MAX - 1
> (9223372036854775807)
> 
> --> locks range in the inode's
> io tree
>   --> after this we have 2 extent
>   states in the io tree:
>   --> 1 for range [0, 1Mb[ with
>   the bits EXTENT_LOCKED and
>   EXTENT_DELALLOC_BITS set
>   --> 1 for the range
>   [1Mb, LLONG_MAX[ with
>   the EXTENT_LOCKED bit set
> 
>   --> start buffered write at 
> offset
>   1Mb with a length of 4Kb
> 
>   btrfs_file_write_iter()
> 
> btrfs_buffered_write()
>   --> cached_state is NULL
> 
>   
> lock_and_cleanup_extent_if_need()
> --> returns 0 and 
> does not lock
> range because it 
> starts
> at current i_size 
> / eof
> 
>   --> cached_state 
> remains NULL
> 
>   btrfs_dirty_pages()
> 
> btrfs_set_extent_delalloc()
>   (...)
>   __set_extent_bit()
> 
> --> splits extent 
> state for range
> [1Mb, 
> LLONG_MAX[ and now we
> have 2 extent 
> states:
> 
> --> one for 
> the range
> [1Mb, 1Mb 
> + 4Kb[ with
> 
> EXTENT_LOCKED set
> --> another 
> one for the range
> [1Mb + 
> 4Kb, LLONG_MAX[ with
> 
> EXTENT_LOCKED set as well
> 
> --> sets 
> EXTENT_DELALLOC on the
> extent state 
> for the range
> [1Mb, 1Mb + 
> 4Kb[
> --> caches extent 
> state
> [1Mb, 1Mb + 
> 4Kb[ into
> @cached_state 
> because it has
> the bit 
> EXTENT_LOCKED set
> 
> --> 
> btrfs_buffered_write() ends up
> with a non-NULL 
> cached_state and
> never calls anything 
> to release its
> reference on it, 
> resulting in a
> memory leak
> 
> Fix this by calling free_extent_state() on cached_state if the range was
> not locked by lock_and_cleanup_extent_if_need().
> 
> The same issue can happen if anything else other than fiemap locks a range
> that covers eof and beyond.
> 
> This could be triggered, sporadically, by test case generic/561 from the
> fstests suite, which makes duperemove run concurrently with fsstress, and
> duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
> the leak is reported in dmesg/syslog when removing the btrfs module with
> a message like

Re: [PATCH RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()

2019-09-30 Thread Qu Wenruo


On 2019/10/1 上午1:51, David Sterba wrote:
> On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is one BUG_ON() report where a transaction is aborted during
>> balance, then kernel BUG_ON() in merge_reloc_roots():
> 
> Do you have details from the report, eg. what's the error code?

Nope, what I get is only kernel message.

Not enough to get the error code.

> 
>>   void merge_reloc_roots(struct reloc_control *rc)
>>   {
>>  ...
>>  BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>>   }
>>
>> [CAUSE]
>> It's still uncertain why we can get to such situation.
>> As all __add_reloc_root() calls will also link that reloc root to
>> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
>>
>> So the root cause is still uncertain.
>>
>> [FIX]
>> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
>>
>> There are 3 BUG_ON()s in it:
>> - BUG_ON() for read_fs_root() result
>> - BUG_ON() for root->reloc_root != reloc_root case
>> - BUG_ON() for the non-empty reloc_root_tree
> 
> relocation.c is worst regarding number of BUG_ONs, some of them look
> like runtime assertions of the invariants but some of them are
> substituting for error handling.

Yeah, if we inject kmalloc error into btrfs_relocate_block_group(), it's
too easy to hit BUG_ON().

> 
> The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are
> assertions
> 
>>
>> For the first two, just grab the return value and goto out tag for
>> cleanup.
>>
>> For the last one, make it more graceful by:
>> - grab the lock before doing read/write
>> - warn instead of panic
>> - cleanup the nodes in rc->reloc_root_tree
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> Reason for RFC:
>> The root cause to leak nodes in reloc_root_tree is still unknown.
>> ---
>>  fs/btrfs/relocation.c | 39 ---
>>  1 file changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 655f1d5a8c27..d562b5c52a40 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
>>  if (btrfs_root_refs(&reloc_root->root_item) > 0) {
>>  root = read_fs_root(fs_info,
>>  reloc_root->root_key.offset);
>> -BUG_ON(IS_ERR(root));
>> -BUG_ON(root->reloc_root != reloc_root);
>> +if (IS_ERR(root)) {
> 
> This is bug_on -> error handling, ok
> 
>> +ret = PTR_ERR(root);
>> +btrfs_err(fs_info,
>> +  "failed to read root %llu: %d",
>> +  reloc_root->root_key.offset, ret);
>> +goto out;
>> +}
>> +if (root->reloc_root != reloc_root) {
> 
> With this one I'm not sure it could happen but replacing the bug on is
> still good.
> 
>> +btrfs_err(fs_info,
>> +"reloc root mismatch for root %llu",
> 
> Would be good to print the number of the other root as well.
> 
>> +root->root_key.objectid);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>>  
>>  ret = merge_reloc_root(rc, root);
>>  if (ret) {
>> +btrfs_err(fs_info,
>> +"failed to merge reloc tree for root %llu: %d",
>> +  root->root_key.objectid, ret);
>>  if (list_empty(&reloc_root->root_list))
>>  list_add_tail(&reloc_root->root_list,
>>&reloc_roots);
>> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
>>  free_reloc_roots(&reloc_roots);
>>  }
>>  
>> -BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
> 
> This one looks more like the invariant, the tree should be really empty
> here. While the cleanup is trying to make things work despite there's a
> problem, I think the warning should not be debugging only.

Indeed, we should output the warning no matter whatever.

Thanks,
Qu
> 
>> +spin_lock(&rc->reloc_root_tree.lock);
>> +/* Cleanup dirty reloc tree nodes */
>> +if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
>> +struct mapping_node *node;
>> +struct mapping_node *next;
>> +
>> +WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> 
> ...
> 



signature.asc
Description: OpenPGP digital signature


Re: BTRFS corruption, mounts but comes read-only

2019-09-30 Thread Qu Wenruo


On 2019/10/1 上午5:08, Remi Gauvin wrote:
> Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log:

As that btrfs check shows, your extent tree is corrupted.
Quite some backref is lost, thus no wonder some write opeartion would
cause problem.

In that case, it looks like only extent tree is corrupted, and no
transid error.

If you have data backed up, you would like to btrfs check --repair to
see if it can repair them.

Thanks,
Qu
> 
> [10524.830640] BTRFS info (device sdf4): using free space tree
> [10524.830644] BTRFS info (device sdf4): has skinny extents
> [10549.561143] BTRFS info (device sdf4): checking UUID tree
> [10606.900264] BTRFS info (device sdf4): leaf 3873928691712 gen 935882
> total ptrs 147 free space 6236 owner 2
> [10606.900268]item 0 key (5126703497216 169 0) itemoff 15845 itemsize 
> 438
> [10606.900270]extent refs 46 gen 885323 flags 2
> [10606.900272]ref#0: tree block backref root 10262
> [10606.900274]ref#1: shared block backref parent 6300271443968
> [10606.900276]ref#2: shared block backref parent 6300177645568
> [10606.900278]ref#3: shared block backref parent 6300087877632
> [10606.900279]ref#4: shared block backref parent 6299851292672
> [10606.900281]ref#5: shared block backref parent 6299576254464
> [10606.900283]ref#6: shared block backref parent 6299308621824
> [10606.900284]ref#7: shared block backref parent 6240101318656
> [10606.900286]ref#8: shared block backref parent 6239972589568
> [10606.900288]ref#9: shared block backref parent 5891141632000
> [10606.900289]ref#10: shared block backref parent 
> 5891082911744
> [10606.900291]ref#11: shared block backref parent 
> 5891042951168
> [10606.900293]ref#12: shared block backref parent 
> 5890615246848
> [10606.900294]ref#13: shared block backref parent 
> 5890463039488
> [10606.900296]ref#14: shared block backref parent 
> 5890436890624
> [10606.900298]ref#15: shared block backref parent 
> 5126231179264
> [10606.900299]ref#16: shared block backref parent 
> 5125855133696
> [10606.900301]ref#17: shared block backref parent 
> 5125777653760
> [10606.900302]ref#18: shared block backref parent 
> 4717386399744
> [10606.900304]ref#19: shared block backref parent 
> 4716953616384
> [10606.900306]ref#20: shared block backref parent 
> 4716869992448
> [10606.900307]ref#21: shared block backref parent 
> 4716604555264
> [10606.900309]ref#22: shared block backref parent 
> 4716575670272
> [10606.900311]ref#23: shared block backref parent 
> 4716547129344
> [10606.900312]ref#24: shared block backref parent 
> 4418653339648
> [10606.900314]ref#25: shared block backref parent 
> 4418457927680
> [10606.900316]ref#26: shared block backref parent 
> 4418052734976
> [10606.900317]ref#27: shared block backref parent 
> 4134522355712
> [10606.900319]ref#28: shared block backref parent 
> 4133585420288
> [10606.900320]ref#29: shared block backref parent 
> 4034188083200
> [10606.900322]ref#30: shared block backref parent 
> 4034125447168
> [10606.900324]ref#31: shared block backref parent 
> 4034077949952
> [10606.900325]ref#32: shared block backref parent 
> 4034073935872
> [10606.900327]ref#33: shared block backref parent 
> 4033156546560
> [10606.900329]ref#34: shared block backref parent 
> 4032730021888
> [10606.900330]ref#35: shared block backref parent 
> 4032197132288
> [10606.900332]ref#36: shared block backref parent 
> 4032101384192
> [10606.900333]ref#37: shared block backref parent 
> 4031949127680
> [10606.900335]ref#38: shared block backref parent 
> 4031297765376
> [10606.900337]ref#39: shared block backref parent 
> 4031202394112
> [10606.900338]ref#40: shared block backref parent 
> 4030981292032
> [10606.900340]ref#41: shared block backref parent 
> 4030556012544
> [10606.900342]ref#42: shared block backref parent 
> 3874566619136
> [10606.900343]ref#43: shared block backref parent 
> 3874460270592
> [10606.900345]ref#44: shared block backref parent 
> 3874229714944
> [10606.900347]ref#45: shared block backref parent 
> 3873657765888
> [10606.900350]item 1 key (5126703529984 169 0) itemoff 15812 itemsize 
> 33
> [10606.900351]extent refs 1 gen 571786 flags 258
> [10606.900352]ref#0: tree block backref root 46426
> [

Re: BTRFS corruption, mounts but comes read-only

2019-09-30 Thread Remi Gauvin
On 2019-09-30 9:30 p.m., Qu Wenruo wrote:
> 
> 
> On 2019/10/1 上午5:08, Remi Gauvin wrote:
>> Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log:
> 
> As that btrfs check shows, your extent tree is corrupted.
> Quite some backref is lost, thus no wonder some write opeartion would
> cause problem.
> 
> In that case, it looks like only extent tree is corrupted, and no
> transid error.
> 
> If you have data backed up, you would like to btrfs check --repair to
> see if it can repair them.
> 
> Thanks,
> Qu


This is itself a back-up system, but not wishing to tempt fate, I'm
going to have to create a working substitute before I do anything risky
with it.  It would take me several days and lots of travel to re-aquire
them.

Any idea what could have happened?  The data and meta data should be
raid1, and none of the drives have any io errors of any kinds in their
SMART log.

Thank you for your guidance.






signature.asc
Description: OpenPGP digital signature


Re: BTRFS corruption, mounts but comes read-only

2019-09-30 Thread Qu Wenruo


On 2019/10/1 上午9:41, Remi Gauvin wrote:
> On 2019-09-30 9:30 p.m., Qu Wenruo wrote:
>>
>>
>> On 2019/10/1 上午5:08, Remi Gauvin wrote:
>>> Mounting the FS after that btrfs check, kernel 5.3.1, here is the dmesg log:
>>
>> As that btrfs check shows, your extent tree is corrupted.
>> Quite some backref is lost, thus no wonder some write opeartion would
>> cause problem.
>>
>> In that case, it looks like only extent tree is corrupted, and no
>> transid error.
>>
>> If you have data backed up, you would like to btrfs check --repair to
>> see if it can repair them.
>>
>> Thanks,
>> Qu
> 
> 
> This is itself a back-up system, but not wishing to tempt fate, I'm
> going to have to create a working substitute before I do anything risky
> with it.  It would take me several days and lots of travel to re-aquire
> them.
> 
> Any idea what could have happened?

Maybe some existing bugs, since it's an older kernel.

But at least, no tree relationship corruption. So you should be pretty OK.
You could try btrfs check --init-extent-tree, as I see no error report
from fs tree check.

Since it's extent tree only mismatch, you can in fact mount the fs RO
and grab all data.

>  The data and meta data should be
> raid1, and none of the drives have any io errors of any kinds in their
> SMART log.

It's indeed a symptom of btrfs kernel module bug. But at least looks
repairable.

Thanks,
Qu
> 
> Thank you for your guidance.
> 
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 9:27, Qu Wenruo wrote:

I recommend to do a "btrfs check" on all fses.


I had done "btrfs check" on /dev/sda4:

attached btrfs-check-sda4.output

There are some errors. How to fix them?
[1/7] checking root items
[2/7] checking extents
incorrect local backref count on 1533538304 root 5 owner 7662482 offset 0 found 
1 wanted 4194305 back 0x55bbb3f482a0
backpointer mismatch on [1533538304 16384]
data backref 1533759488 root 5 owner 10122609 offset 0 num_refs 0 not found in 
extent tree
incorrect local backref count on 1533759488 root 5 owner 10122609 offset 0 
found 1 wanted 0 back 0x55bbb1c76e40
incorrect local backref count on 1533759488 root 16389 owner 10122609 offset 0 
found 0 wanted 1 back 0x55bbb3f49d00
backref disk bytenr does not match extent record, bytenr=1533759488, ref 
bytenr=0
backpointer mismatch on [1533759488 4096]
data backref 1534713856 root 5 owner 7312396 offset 4096 num_refs 0 not found 
in extent tree
incorrect local backref count on 1534713856 root 5 owner 7312396 offset 4096 
found 1 wanted 0 back 0x55bbac7e8ad0
incorrect local backref count on 1534713856 root 274877906949 owner 7312396 
offset 4096 found 0 wanted 1 back 0x55bbb3be83c0
backref disk bytenr does not match extent record, bytenr=1534713856, ref 
bytenr=0
backpointer mismatch on [1534713856 4096]
backref bytes do not match extent backref, bytenr=1985052672, ref bytes=20480, 
backref bytes=86016
backpointer mismatch on [1985052672 20480]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 843065 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4294971392
unresolved ref dir 843063 index 3 namelen 13 name Templates.msf 
filetype 1 errors 4, no inode ref
unresolved ref dir 843063 index 3 namelen 13 name Templates/msf 
filetype 0 errors 3, no dir item, no dir index
unresolved ref dir 843063 index 5 namelen 6 name Drafts filetype 1 
errors 4, no inode ref
unresolved ref dir 843063 index 5 namelen 6 name Dsafts filetype 0 
errors 3, no dir item, no dir index
root 5 inode 843068 errors 400, nbytes wrong
unresolved ref dir 843063 index 15 namelen 14 name filterlog.html 
filetype 1 errors 2, no dir index
unresolved ref dir 843063 index 18 namelen 8 name Sent.sbd filetype 2 
errors 2, no dir index
root 5 inode 908624 errors 1, no inode item
unresolved ref dir 843063 index 15 namelen 14 name filterlog.html 
filetype 1 errors 5, no dir item, no inode ref
root 5 inode 908627 errors 1, no inode item
unresolved ref dir 843063 index 18 namelen 8 name Sent/sbd filetype 2 
errors 5, no dir item, no inode ref
root 5 inode 7287329 errors 100, file extent discount
Found file extent holes:
start: 0, len: 8192
ERROR: errors found in fs roots
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: a942b8da-e92d-4348-8de9-ded1e5e095ad
found 184471662592 bytes used, error(s) found
total csum bytes: 178843432
total tree bytes: 1172783104
total fs tree bytes: 921796608
total extent tree bytes: 52232192
btree space waste bytes: 184041078
file data blocks allocated: 538597539840
 referenced 190557810688


Re: Btrfs partition mount error

2019-09-30 Thread Andrey Ivanov

On 30.09.2019 10:31, Qu Wenruo wrote:

Please provide the following dump:

# btrfs ins dump-tree -b 855738744832 /dev/sdc1


attached btrfs-ins-dump-tree-855738744832-sdc1.output


OK, tree-checker is again detecting the problem correctly.

item 19 key (613039370240 EXTENT_ITEM 1048576) itemoff 15223 itemsize 53
refs 1 gen 52116 flags DATA
extent data backref root FS_TREE objectid 5841996 offset 
607125504 count 1
item 20 key (613040418816 EXTENT_ITEM 1032192) itemoff 15170 itemsize 
117
refs 1 gen 52162 flags DATA
extent data backref root FS_TREE objectid 5842000 offset 
927858688 count 1

Item 20 should be a regular single reference, but its size is 117.
But please check this:

117 = 0x75
53  = 0x35

The difference is 0x40, which is a single bit.

And if itemsize for slot 20 is regular 53, then it matches its neighbors
without any problem.

So at least to me, this looks like a bitflip.
Although I'm not sure if it's btrfs itself causing the problem or it's
the hardware or even 3rd party kernel modules to blame.


I ran memtest86 and discovered that, indeed, one of the memory modules has a 
bit flip:

   
https://drive.google.com/file/d/1pnHfUcoSwH8DEnWfl0S7newMLBVo3eXE/view?usp=sharing
   
https://drive.google.com/file/d/1iDOlSFfMSZ1ffxrSQTeXDmv0luOHT3yS/view?usp=sharing

As you can see on screenshots, two bits flip:
- 0x0040
- 0x0001


 > 117 = 0x75
 > 53  = 0x35
 >
 > The difference is 0x40, which is a single bit.

I think it's the 0x0040 bit.



Re: Btrfs partition mount error

2019-09-30 Thread Qu Wenruo


On 2019/10/1 下午12:08, Andrey Ivanov wrote:
> On 30.09.2019 9:27, Qu Wenruo wrote:
>> I recommend to do a "btrfs check" on all fses.
> 
> I had done "btrfs check" on /dev/sda4:
> 
> attached btrfs-check-sda4.output
> 
> There are some errors. How to fix them?

It looks like "btrfs check --repair" can handle most of these bugs.

Please do a backup and try repair.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Setup GitLab-CI for btrfs-progs

2019-09-30 Thread Philipp Hahn
Hi,

I'm not yet a GitLab expert myself, but AFAIK ...

Am 30.09.19 um 18:56 schrieb Lakshmipathi.G:
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> new file mode 100644
> index 000..2afde50
> --- /dev/null
> +++ b/.gitlab-ci.yml
> @@ -0,0 +1,181 @@
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public
> +# License v2 as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public
> +# License along with this program; if not, write to the
> +# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> +# Boston, MA 021110-1307, USA.
> +#
> +
> +image: docker:18.09.7
> +
> +services:
> +- docker:18.09.7-dind
> +
> +variables:
> +  DOCKER_DRIVER: overlay2
> +
> +stages:
> +  - build
> +  - btrfs-progs build
> +  - test
> +
> +variables:

You already have a "variables" section above - merge them?

> +  DOCKER_DRIVER: overlay2
> +  IMAGE_TAG: registry.gitlab.com/$CI_PROJECT_NAMESPACE/btrfs-progs:gitlab-ci
> +
> +before_script:
> +   - docker login --username $CI_REGISTRY_USER --password 
> $CI_REGISTRY_PASSWORD $CI_REGISTRY
> +
> +docker build:
> +  stage: build
> +  script:
> +- cd gitlab-ci
> +- docker pull $IMAGE_TAG > /dev/null && echo "Downloaded image" || ( 
> docker build -t $IMAGE_TAG . && docker push $IMAGE_TAG )
> +- cd ..
> +
> +## To enable or disable Kernel Build set BUILD_KERNEL: "1" or BUILD_KERNEL: 
> "0" 
> +## If you disable Kernel Build, make sure PREBUILT_KERNEL_ID points to 
> previously built the kernel job id.
> +
> +kernel build:
> +  variables:
> +BUILD_KERNEL: "1"
> +PREBUILT_KERNEL_ID: "288159334"
> +  before_script:
> +- apk add curl unzip 
> +  stage: build
> +  services:
> +- docker:18.09.7-dind

You already have "services" defined globally - no need to repeat that
here again.

> +  script:
> + - if [ "$BUILD_KERNEL" == "1" ]; then
> + docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
> --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/kernel_build.sh;
> +   else
> + curl -o bzImage.zip --location --header "JOB-TOKEN:$CI_JOB_TOKEN"  
> "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_KERNEL_ID/artifacts";
>  && unzip bzImage.zip;
> +   fi;
> +  artifacts:
> +when: always
> +paths:
> +  - bzImage
> +
> +# To enable or disable image build update BUILD_IMAGE value to "1" or "0".
> +# If you disable Image Build, make sure PREBUILT_IMAGE_ID points to 
> previously built rootfs job id.
> + 
> +image build:
> +  variables:
> +BUILD_IMAGE: "1"
> +PREBUILT_IMAGE_ID: "288506168"
> +  before_script:
> +- apk add curl unzip 
> +  stage: build
> +  services:
> +- docker:18.09.7-dind

dito

> +  script:
> + - if [ "$BUILD_IMAGE" == "1" ]; then
> +  docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
> --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/setup_image.sh;
> +   else
> +  curl  -o qemu-image.img.zip --location --header 
> "JOB-TOKEN:$CI_JOB_TOKEN" 
> "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/jobs/$PREBUILT_IMAGE_ID/artifacts";
>  && unzip qemu-image.img.zip;
> +   fi;
> +  artifacts:
> +when: always
> +paths:
> +  - qemu-image.img
> +
> +btrfs-progs build:
> +  stage: btrfs-progs build
> +  services:
> +- docker:18.09.7-dind

dito

> +  script:
> + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
> --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh
> +  artifacts:
> +expire_in: 1 week
> +when: always
> +paths:
> +  - qemu-image.img
> +
> +cli tests:
> +  stage: test
> +  services:
> +- docker:18.09.7-dind

dito

> +  script:
> + - echo "./cli-tests.sh" > $PWD/cmd
> + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
> --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh
> + - test -e "result" || exit 1 # If result doesn't exists, job failed.
> +  artifacts:
> +when: always
> +paths:
> +  - "*tests-results.txt"
> +
> +convert tests:
> +  only: 
> +- devel
> +  stage: test
> +  services:
> +- docker:18.09.7-dind

dito

> +  script:
> + - echo "./convert-tests.sh" > $PWD/cmd
> + - docker run --cap-add SYS_PTRACE --cap-add sys_admin --privileged 
> --device=/dev/kvm -v $PWD:/repo $IMAGE_TAG /repo/gitlab-ci/run_tests.sh
> + - test -e "result" || exit 1
> +  artifacts:
> +when: always
> +paths:
> +  - "*tests-results.txt"
> +
> +fsck tests:
> +  stage: test
> +  services:
> +- docker:18.09.7-dind

dito

> +  script:
> + - echo "./fsck-tests.sh" >

[PATCH v3] btrfs: removed unused return variable

2019-09-30 Thread Aliasgar Surti
From: Aliasgar Surti 

Removed unused return variable and replaced it with returning
the value directly. Also removed the unnecessary forward declaration.

Signed-off-by: Aliasgar Surti 
---
v2: Made btrfs_destroy_delayed_refs() as void and removed its declaration
v3: Reverted back the change where the function was made void from int
---
 fs/btrfs/disk-io.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981c..e381368 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,8 +52,6 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
- struct btrfs_fs_info *fs_info);
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
 static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
struct extent_io_tree *dirty_pages,
@@ -4255,7 +4253,6 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
struct rb_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_delayed_ref_node *ref;
-   int ret = 0;
 
delayed_refs = &trans->delayed_refs;
 
@@ -4263,7 +4260,7 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
if (atomic_read(&delayed_refs->num_entries) == 0) {
spin_unlock(&delayed_refs->lock);
btrfs_info(fs_info, "delayed_refs has NO entry");
-   return ret;
+   return 0;
}
 
while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
@@ -4306,8 +4303,7 @@ static int btrfs_destroy_delayed_refs(struct 
btrfs_transaction *trans,
}
 
spin_unlock(&delayed_refs->lock);
-
-   return ret;
+   return 0;
 }
 
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
-- 
2.7.4