date:20180620

[RFC PATCH] btrfs: Remove V0 extent support

2018-06-20 Thread Nikolay Borisov

The v0 compat code was introduced in commit 5d4f98a28c7d
("Btrfs: Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)") 9
years ago, which was merged in 2.6.31. This means that the code is
there to support filesystems which are _VERY_ old and if you are using
btrfs on such an old kernel, you have much bigger problems. This coupled
with the fact that no one is likely testing/maintaining this code likely
means it has bugs lurking. All things considered I think 43 kernel
releases later it's high time this remnant of the past got removed.

This patch removes all code wrapped in #ifdefs but leaves the BUG_ONs in case
we have a v0 with no support intact as a sort of safety-net. 
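
For readers unfamiliar with how a v0 item is recognised: the quoted hunks
compare the item size against the size of the current extent item struct. A
minimal userspace sketch of that check follows. The toy_* structs and the
function name are hypothetical, simplified stand-ins (the real on-disk structs
are packed and little-endian), and the sketch returns an error where the
kernel code in this patch hits BUG().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the v0 and current extent items. */
struct toy_extent_item_v0 {
	uint32_t refs;
};

struct toy_extent_item {
	uint64_t refs;
	uint64_t generation;
	uint64_t flags;
};

/*
 * Returns 0 when the item is large enough to be a current-format extent
 * item, -1 when it can only be an old (v0) item. This mirrors the
 * "item_size < sizeof(*ei)" test in the diff; it is not kernel code.
 */
static int toy_check_extent_item(size_t item_size)
{
	if (item_size < sizeof(struct toy_extent_item))
		return -1;
	return 0;
}
```

With the conversion code gone, the only remaining question for such an item is
how loudly to fail, which is what the retained BUG_ONs answer.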

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c   |   6 +-
 fs/btrfs/ctree.h   |   2 -
 fs/btrfs/extent-tree.c | 209 +
 fs/btrfs/print-tree.c  |  30 +--
 fs/btrfs/relocation.c  | 151 +--
 5 files changed, 4 insertions(+), 394 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6879697520d5..d1273ec2e0d5 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -888,11 +888,7 @@ int btrfs_block_can_be_shared(struct btrfs_root *root,
 btrfs_root_last_snapshot(&root->root_item) ||
 btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
return 1;
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-   if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
-   btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
-   return 1;
-#endif
+
return 0;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e671a1fcbbec..bc52bf7ac572 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -55,8 +55,6 @@ struct btrfs_ordered_sum;
 
 #define BTRFS_OLDEST_GENERATION	0ULL
 
-#define BTRFS_COMPAT_EXTENT_TREE_V0
-
 /*
  * the max metadata block size.  This limit is somewhat artificial,
  * but the memmove costs go through the roof for larger blocks.
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0f7e797236dc..4129831523a2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -870,17 +870,7 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
num_refs = btrfs_extent_refs(leaf, ei);
extent_flags = btrfs_extent_flags(leaf, ei);
} else {
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-   struct btrfs_extent_item_v0 *ei0;
-   BUG_ON(item_size != sizeof(*ei0));
-   ei0 = btrfs_item_ptr(leaf, path->slots[0],
-struct btrfs_extent_item_v0);
-   num_refs = btrfs_extent_refs_v0(leaf, ei0);
-   /* FIXME: this isn't correct for data */
-   extent_flags = BTRFS_BLOCK_FLAG_FULL_BACKREF;
-#else
BUG();
-#endif
}
BUG_ON(num_refs == 0);
} else {
@@ -1039,89 +1029,6 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
  * tree block info structure.
  */
 
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
- struct btrfs_path *path,
- u64 owner, u32 extra_size)
-{
-   struct btrfs_root *root = fs_info->extent_root;
-   struct btrfs_extent_item *item;
-   struct btrfs_extent_item_v0 *ei0;
-   struct btrfs_extent_ref_v0 *ref0;
-   struct btrfs_tree_block_info *bi;
-   struct extent_buffer *leaf;
-   struct btrfs_key key;
-   struct btrfs_key found_key;
-   u32 new_size = sizeof(*item);
-   u64 refs;
-   int ret;
-
-   leaf = path->nodes[0];
-   BUG_ON(btrfs_item_size_nr(leaf, path->slots[0]) != sizeof(*ei0));
-
-   btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-   ei0 = btrfs_item_ptr(leaf, path->slots[0],
-struct btrfs_extent_item_v0);
-   refs = btrfs_extent_refs_v0(leaf, ei0);
-
-   if (owner == (u64)-1) {
-   while (1) {
-   if (path->slots[0] >= btrfs_header_nritems(leaf)) {
-   ret = btrfs_next_leaf(root, path);
-   if (ret < 0)
-   return ret;
-   BUG_ON(ret > 0); /* Corruption */
-   leaf = path->nodes[0];
-   }
-   btrfs_item_key_to_cpu(leaf, &found_key,
- path->slots[0]);
-   BUG_ON(key.objectid != found_key.objectid);
-   if (found_key.type != BTRFS_EXTENT_REF_V0_KEY) {
-   path->slots[0]++;
-   continue;
-

Re: [PATCH 34/34] btrfs: Remove fs_info from convert_extent_item_v0

2018-06-20 Thread Nikolay Borisov




On 21.06.2018 05:05, Qu Wenruo wrote:
> 
> 
> On 2018-06-20 20:49, Nikolay Borisov wrote:
>> It can be referenced from trans since the function is always called
>> within a transaction
>>
>> Signed-off-by: Nikolay Borisov 
>> ---
>>  fs/btrfs/extent-tree.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index c3c3e6f3b72c..9c0e15b057a0 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1038,10 +1038,10 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
>>  
>>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
> 
> Do we really need to update this code?
> 
> V0 extent tree is already deprecated, maybe it's a good time to remove
> it now?

I did it for the sake of consistency. David said in the past that
actually removing the v0 code might be what we want to do. I don't have
strong preferences either way.
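
The mechanical change being discussed can be shown in isolation. The sketch
below is a userspace illustration only: the toy_* types and functions are
hypothetical stand-ins, and the only thing taken from the patch is the idea
that the transaction handle already carries a pointer to fs_info, so the
extra argument is redundant.

```c
/* Hypothetical stand-ins for btrfs_fs_info and btrfs_trans_handle. */
struct toy_fs_info {
	int id;
};

struct toy_trans_handle {
	struct toy_fs_info *fs_info;	/* always set for a valid handle */
};

/* Before: the caller passes fs_info even though trans already has it. */
static int toy_convert_old(struct toy_trans_handle *trans,
			   struct toy_fs_info *fs_info)
{
	(void)trans;
	return fs_info->id;
}

/* After: fs_info is derived from the transaction handle. */
static int toy_convert_new(struct toy_trans_handle *trans)
{
	struct toy_fs_info *fs_info = trans->fs_info;

	return fs_info->id;
}
```

Both variants see the same fs_info as long as the handle is valid, which is
the property the series relies on.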

> 
> Thanks,
> Qu
> 
>>  static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
>> -  struct btrfs_fs_info *fs_info,
>>struct btrfs_path *path,
>>u64 owner, u32 extra_size)
>>  {
>> +struct btrfs_fs_info *fs_info = trans->fs_info;
>>  struct btrfs_root *root = fs_info->extent_root;
>>  struct btrfs_extent_item *item;
>>  struct btrfs_extent_item_v0 *ei0;
>> @@ -1682,8 +1682,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
>>  err = -ENOENT;
>>  goto out;
>>  }
>> -ret = convert_extent_item_v0(trans, fs_info, path, owner,
>> - extra_size);
>> +ret = convert_extent_item_v0(trans, path, owner, extra_size);
>>  if (ret < 0) {
>>  err = ret;
>>  goto out;
>> @@ -2384,7 +2383,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
>>  item_size = btrfs_item_size_nr(leaf, path->slots[0]);
>>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
>>  if (item_size < sizeof(*ei)) {
>> -ret = convert_extent_item_v0(trans, fs_info, path, (u64)-1, 0);
>> +ret = convert_extent_item_v0(trans, path, (u64)-1, 0);
>>  if (ret < 0) {
>>  err = ret;
>>  goto out;
>> @@ -6937,8 +6936,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
>>  if (item_size < sizeof(*ei)) {
>>  BUG_ON(found_extent || extent_slot != path->slots[0]);
>> -ret = convert_extent_item_v0(trans, info, path, owner_objectid,
>> - 0);
>> +ret = convert_extent_item_v0(trans, path, owner_objectid, 0);
>>  if (ret < 0) {
>>  btrfs_abort_transaction(trans, ret);
>>  goto out;
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/34] fs_info cleanup of extent-tree.c

2018-06-20 Thread Qu Wenruo




On 2018-06-20 20:48, Nikolay Borisov wrote:
> Hello, 
> 
> This series aims at removing all the redundant btrfs_fs_info args being 
> passed to functions in extent-tree.c. Each patch removes the arg from
> one function, hence it should be fairly easy to review each one of those
> patches. I'm mainly exploiting the fact that most of the time we have a 
> function which takes a transaction handle, which is always valid (ie can't be 
> null) and at the same time we are passing an fs_info. The former actually 
> contains a reference to the fs info so can be referenced directly from the 
> transaction. Additionally, 2 patches also exploit the fact that block group 
> cache structs also hold a reference to fs_info so there is no point in 
> passing it there as well. 
> 
> To spice things up a bit, here is the output of stackdelta before/after the 
> patch set is applied: 
> 
> ./fs/btrfs/extent-tree.c  __btrfs_inc_extent_ref            152  144   -8
> ./fs/btrfs/extent-tree.c  __btrfs_run_delayed_refs          256  248   -8
> ./fs/btrfs/extent-tree.c  alloc_reserved_file_extent        128  136   +8
> ./fs/btrfs/extent-tree.c  btrfs_alloc_logged_file_extent    104   88  -16
> ./fs/btrfs/extent-tree.c  btrfs_alloc_reserved_file_extent   56   48   -8
> ./fs/btrfs/extent-tree.c  btrfs_alloc_tree_block            176  168   -8
> ./fs/btrfs/extent-tree.c  btrfs_force_chunk_alloc            24   16   -8
> ./fs/btrfs/extent-tree.c  btrfs_free_extent                 104   96   -8
> ./fs/btrfs/extent-tree.c  btrfs_free_tree_block             112  104   -8
> ./fs/btrfs/extent-tree.c  btrfs_inc_block_group_ro           56   48   -8
> ./fs/btrfs/extent-tree.c  btrfs_inc_extent_ref              112  104   -8
> ./fs/btrfs/extent-tree.c  caching_thread                    216  208   -8
> ./fs/btrfs/extent-tree.c  convert_extent_item_v0            120  112   -8
> ./fs/btrfs/extent-tree.c  insert_inline_extent_backref      120  112   -8
> ./fs/btrfs/extent-tree.c  lookup_inline_extent_backref      176  184   +8
> ./fs/btrfs/extent-tree.c  remove_extent_data_ref            104   96   -8
> 
> Also the output of bloat-o-meter : 
> 
> add/remove: 5/5 grow/shrink: 6/24 up/down: 2275/-2554 (-279)
> Function                                     old     new   delta
> insert_extent_data_ref                         -     738    +738
> lookup_extent_data_ref                         -     613    +613
> remove_extent_data_ref                         -     535    +535
> lookup_tree_block_ref                          -     227    +227
> insert_tree_block_ref                          -     139    +139
> btrfs_inc_extent_ref                         235     242      +7
> btrfs_make_block_group                       831     837      +6
> update_inline_extent_backref                 681     685      +4
> exclude_super_stripes                        356     360      +4
> free_excluded_extents                         95      96      +1
> alloc_reserved_file_extent                   954     955      +1
> check_system_chunk                           362     361      -1
> insert_inline_extent_backref                 224     221      -3
> flush_space                                 1691    1688      -3
> cache_block_group                           1132    1129      -3
> btrfs_free_block_groups                     1140    1137      -3
> btrfs_alloc_tree_block                      1024    1021      -3
> remove_extent_backref                        104     100      -4
> find_free_extent                            5436    5431      -5
> convert_extent_item_v0                       735     730      -5
> do_chunk_alloc                               846     838      -8
> btrfs_remove_block_group                    2805    2797      -8
> btrfs_free_extent                            306     298      -8
> btrfs_alloc_data_chunk_ondemand             1242    1234      -8
> btrfs_free_tree_block                        862     853      -9
> btrfs_force_chunk_alloc                       45      35     -10
> btrfs_read_block_groups                     2245    2233     -12
> btrfs_alloc_logged_file_extent               249     237     -12
> btrfs_alloc_reserved_file_extent              70      57     -13
> btrfs_inc_block_group_ro                     352     338     -14
> lookup_inline_extent_backref                1532    1516     -16
> caching_thread                              1465    1446     -19
> __btrfs_run_delayed_refs                    5469    5446     -23
> __btrfs_inc_extent_ref.isra                  608     566     -42
> __btrfs_free_extent.isra                    3136    3082     -54
> insert_tree_block_ref.isra                   131       -    -131
> lookup_tree_block_ref.isra                   216       -    -216
> remove_extent_data_ref.isra                  543       -    -543
> lookup_extent_data_ref.isra                  649       -    -649
> insert_extent_data_ref.isra

Re: [PATCH 34/34] btrfs: Remove fs_info from convert_extent_item_v0

2018-06-20 Thread Qu Wenruo




On 2018-06-20 20:49, Nikolay Borisov wrote:
> It can be referenced from trans since the function is always called
> within a transaction
> 
> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/extent-tree.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c3c3e6f3b72c..9c0e15b057a0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1038,10 +1038,10 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
>  
>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0

Do we really need to update this code?

V0 extent tree is already deprecated, maybe it's a good time to remove
it now?

Thanks,
Qu

>  static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
> -   struct btrfs_fs_info *fs_info,
> struct btrfs_path *path,
> u64 owner, u32 extra_size)
>  {
> + struct btrfs_fs_info *fs_info = trans->fs_info;
>   struct btrfs_root *root = fs_info->extent_root;
>   struct btrfs_extent_item *item;
>   struct btrfs_extent_item_v0 *ei0;
> @@ -1682,8 +1682,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
>   err = -ENOENT;
>   goto out;
>   }
> - ret = convert_extent_item_v0(trans, fs_info, path, owner,
> -  extra_size);
> + ret = convert_extent_item_v0(trans, path, owner, extra_size);
>   if (ret < 0) {
>   err = ret;
>   goto out;
> @@ -2384,7 +2383,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
>   item_size = btrfs_item_size_nr(leaf, path->slots[0]);
>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
>   if (item_size < sizeof(*ei)) {
> - ret = convert_extent_item_v0(trans, fs_info, path, (u64)-1, 0);
> + ret = convert_extent_item_v0(trans, path, (u64)-1, 0);
>   if (ret < 0) {
>   err = ret;
>   goto out;
> @@ -6937,8 +6936,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
>   if (item_size < sizeof(*ei)) {
>   BUG_ON(found_extent || extent_slot != path->slots[0]);
> - ret = convert_extent_item_v0(trans, info, path, owner_objectid,
> -  0);
> + ret = convert_extent_item_v0(trans, path, owner_objectid, 0);
>   if (ret < 0) {
>   btrfs_abort_transaction(trans, ret);
>   goto out;
> 

Re: [LKP] [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression

2018-06-20 Thread Huang, Ying

Chris Mason  writes:

> On 19 Jun 2018, at 23:51, Huang, Ying wrote:
 "Huang, Ying"  writes:

> Hi, Josef,
>
> Do you have time to take a look at the regression?
>
> kernel test robot  writes:
>
>> Greeting,
>>
>> FYI, we noticed a -12.3% regression of blogbench.write_score and
>> a +9.6% improvement
>> of blogbench.read_score due to commit:
>>
>>
>> commit: 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use
>> sc->priority for slab shrink targets")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
>> master
>>
>> in testcase: blogbench
>> on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @
>> 2.10GHz with 8G memory
>> with following parameters:
>>
>>  disk: 1SSD
>>  fs: btrfs
>>  cpufreq_governor: performance
>>
>> test-description: Blogbench is a portable filesystem benchmark
>> that tries to reproduce the load of a real-world busy file
>> server.
>> test-url:
>
> I'm surprised, this patch is a big win in production here at FB.  I'll
> have to reproduce these results to better understand what is going on.
> My first guess is that since we have fewer inodes in slab, we're
> reading more inodes from disk in order to do the writes.
>
> But that should also make our read scores lower.

Thanks for looking at this.  If you need more information, please let me
know.

Best Regards,
Huang, Ying


Re: [PATCH v2 3/3] btrfs: Fix a C compliance issue

2018-06-20 Thread David Sterba

On Thu, Jun 21, 2018 at 05:16:06AM +0800, kbuild test robot wrote:
> Hi Bart,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on v4.18-rc1]
> [also build test ERROR on next-20180620]
> [cannot apply to btrfs/next]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Three-patches-that-address-static-analyzer-reports/20180621-041247
> config: m68k-sun3_defconfig (attached as .config)
> compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=m68k 
> 
> All errors (new ones prefixed by >>):
> 
>fs//btrfs/super.c: In function 'btrfs_print_mod_info':
> >> fs//btrfs/super.c:2386:4: error: expected expression before ';' token
>;
>^

That's probably because none of the config options is turned on, which
results in

const char options[] = ;

but a "" should fix it.
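
A minimal sketch of that suggestion, assuming the array is built purely from
adjacent string literals: starting the initializer with an empty literal
guarantees there is always an expression before the semicolon, even when no
option is enabled. TOY_DEBUG and TOY_ASSERT are hypothetical stand-ins for the
CONFIG_BTRFS_* symbols, and toy_print_mod_info is not the kernel function.

```c
#include <stdio.h>

/*
 * The leading "" keeps the initializer valid when none of the
 * TOY_* macros (hypothetical stand-ins for CONFIG_BTRFS_*) is defined.
 */
static const char toy_options[] = ""
#ifdef TOY_DEBUG
	", debug=on"
#endif
#ifdef TOY_ASSERT
	", assert=on"
#endif
	;

static void toy_print_mod_info(void)
{
	printf("Btrfs loaded%s\n", toy_options);
}
```

Adjacent string literals are concatenated at translation time, so the empty
literal adds nothing to the resulting string.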

Re: [PATCH v2 3/3] btrfs: Fix a C compliance issue

2018-06-20 Thread kbuild test robot

Hi Bart,

I love your patch! Yet something to improve:

[auto build test ERROR on v4.18-rc1]
[also build test ERROR on next-20180620]
[cannot apply to btrfs/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Three-patches-that-address-static-analyzer-reports/20180621-041247
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   fs//btrfs/super.c: In function 'btrfs_print_mod_info':
>> fs//btrfs/super.c:2386:4: error: expected expression before ';' token
   ;
   ^

vim +2386 fs//btrfs/super.c

  2370  
  2371  static void __init btrfs_print_mod_info(void)
  2372  {
  2373  static const char options[] =
  2374  #ifdef CONFIG_BTRFS_DEBUG
  2375  ", debug=on"
  2376  #endif
  2377  #ifdef CONFIG_BTRFS_ASSERT
  2378  ", assert=on"
  2379  #endif
  2380  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
  2381  ", integrity-checker=on"
  2382  #endif
  2383  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
  2384  ", ref-verify=on"
  2385  #endif
> 2386  ;
  2387  pr_info("Btrfs loaded, crc32c=%s%s\n", crc32c_impl(), options);
  2388  }
  2389  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread David Sterba

On Wed, Jun 20, 2018 at 03:48:08PM -0400, Chris Mason wrote:
> 
> 
> On 20 Jun 2018, at 15:33, David Sterba wrote:
> 
> > On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
> >> We've been hunting the root cause of data crc errors here at FB for a 
> >> while.
> >> We'd find one or two corrupted files, usually displaying crc errors 
> >> without any
> >> corresponding IO errors from the storage.  The bug was rare enough 
> >> that we'd
> >> need to watch a large number of machines for a few days just to catch 
> >> it
> >> happening.
> >>
> >> We're still running these patches through testing, but the fixup 
> >> worker bug
> >> seems to account for the vast majority of crc errors we're seeing in 
> >> the fleet.
> >> It's cleaning pages that were dirty, and creating a window where they 
> >> can be
> >> reclaimed before we finish processing the page.
> >
> > I'm having flashbacks when I see 'fixup worker',
> 
> Yeah, I don't understand how so much pain can live in one little 
> function.
> 
> > and the test generic/208 does not make it better:
> >
> > generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 
> > 2018-06-20 18:07:03
> 
> Hmpf, I pass both 095 and 208 here.
> 
> > [ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba 
> > devid 1 transid 5 /dev/vdb
> > [ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
> > [ 3774.877723] BTRFS info (device vdb): has skinny extents
> > [ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata 
> > feature
> > [ 3774.885020] BTRFS info (device vdb): checking UUID tree
> > [ 3775.593329] Page cache invalidation failure on direct I/O.  
> > Possible data corruption due to collision with buffered I/O!
> > [ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
> > [ 3776.642812] Page cache invalidation failure on direct I/O.  
> > Possible data corruption due to collision with buffered I/O!
> > [ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
> > [ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
> > btrfs_destroy_inode+0x1d5/0x290 [btrfs]
> 
> 
> Which warning is this in your tree?  The file_write patch is more likely 
> to have screwed up our bits and the fixup worker is more likely to have 
> screwed up nrpages.

 9311 void btrfs_destroy_inode(struct inode *inode)
 9312 {
 9313 struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 9314 struct btrfs_ordered_extent *ordered;
 9315 struct btrfs_root *root = BTRFS_I(inode)->root;
 9316
 9317 WARN_ON(!hlist_empty(&inode->i_dentry));
 9318 WARN_ON(inode->i_data.nrpages);
 9319 WARN_ON(BTRFS_I(inode)->block_rsv.reserved);

The branch is the last pull, ie. no other 4.18-rc1 stuff plus your two patches.

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread Chris Mason





On 20 Jun 2018, at 15:33, David Sterba wrote:

> On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
>> We've been hunting the root cause of data crc errors here at FB for a while.
>> We'd find one or two corrupted files, usually displaying crc errors without any
>> corresponding IO errors from the storage.  The bug was rare enough that we'd
>> need to watch a large number of machines for a few days just to catch it
>> happening.
>>
>> We're still running these patches through testing, but the fixup worker bug
>> seems to account for the vast majority of crc errors we're seeing in the fleet.
>> It's cleaning pages that were dirty, and creating a window where they can be
>> reclaimed before we finish processing the page.
>
> I'm having flashbacks when I see 'fixup worker',

Yeah, I don't understand how so much pain can live in one little
function.

> and the test generic/208 does not make it better:
>
> generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 2018-06-20 18:07:03

Hmpf, I pass both 095 and 208 here.

> [ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba devid 1 transid 5 /dev/vdb
> [ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
> [ 3774.877723] BTRFS info (device vdb): has skinny extents
> [ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 3774.885020] BTRFS info (device vdb): checking UUID tree
> [ 3775.593329] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
> [ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
> [ 3776.642812] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
> [ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
> [ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 btrfs_destroy_inode+0x1d5/0x290 [btrfs]

Which warning is this in your tree?  The file_write patch is more likely
to have screwed up our bits and the fixup worker is more likely to have
screwed up nrpages.


-chris

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread David Sterba

On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
> We've been hunting the root cause of data crc errors here at FB for a while.
> We'd find one or two corrupted files, usually displaying crc errors without 
> any
> corresponding IO errors from the storage.  The bug was rare enough that we'd
> need to watch a large number of machines for a few days just to catch it
> happening.
> 
> We're still running these patches through testing, but the fixup worker bug
> seems to account for the vast majority of crc errors we're seeing in the 
> fleet.
> It's cleaning pages that were dirty, and creating a window where they can be
> reclaimed before we finish processing the page.

I'm having flashbacks when I see 'fixup worker', and the test generic/208 does
not make it better:

generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 
2018-06-20 18:07:03
[ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba devid 1 
transid 5 /dev/vdb
[ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
[ 3774.877723] BTRFS info (device vdb): has skinny extents
[ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata feature
[ 3774.885020] BTRFS info (device vdb): checking UUID tree
[ 3775.593329] Page cache invalidation failure on direct I/O.  Possible data 
corruption due to collision with buffered I/O!
[ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
[ 3776.642812] Page cache invalidation failure on direct I/O.  Possible data 
corruption due to collision with buffered I/O!
[ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
[ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
btrfs_destroy_inode+0x1d5/0x290 [btrfs]
[ 3776.924182] Modules linked in: btrfs libcrc32c xor zstd_decompress 
zstd_compress xxhash raid6_pq loop [last unloaded: libcrc32c]
[ 3776.927703] CPU: 0 PID: 12036 Comm: umount Not tainted 4.17.0-rc7-default+ 
#153
[ 3776.929164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3776.931006] RIP: 0010:btrfs_destroy_inode+0x1d5/0x290 [btrfs]
[ 3776.932052] RSP: 0018:b2dac5943dc8 EFLAGS: 00010206
[ 3776.933066] RAX: 9ab763fe1000 RBX: 9ab7796bf4d8 RCX: 
[ 3776.934366] RDX:  RSI:  RDI: 9ab7796bf4d8
[ 3776.935708] RBP: b2dac5943e38 R08:  R09: 0002
[ 3776.93] R10: b2dac5943d28 R11: f9929087e0f2246e R12: 9ab7796bf4d8
[ 3776.937511] R13: a1dfb4b1 R14: 9ab775c657a0 R15: 9ab7796bd4b8
[ 3776.938346] FS:  7f0c97635fc0() GS:9ab77fc0() 
knlGS:
[ 3776.939502] CS:  0010 DS:  ES:  CR0: 80050033
[ 3776.940701] CR2: 7f0c96f7f793 CR3: 63819000 CR4: 06f0
[ 3776.942396] Call Trace:
[ 3776.942994]  dispose_list+0x51/0x80
[ 3776.943758]  evict_inodes+0x15b/0x1b0
[ 3776.944558]  generic_shutdown_super+0x3a/0x110
[ 3776.945501]  kill_anon_super+0xe/0x20
[ 3776.946272]  btrfs_kill_super+0x12/0xa0 [btrfs]
[ 3776.947313]  deactivate_locked_super+0x34/0x60
[ 3776.948421]  cleanup_mnt+0x3b/0x70
[ 3776.949201]  task_work_run+0x8d/0xc0
[ 3776.949971]  exit_to_usermode_loop+0x99/0xa0
[ 3776.950872]  do_syscall_64+0x17d/0x190
[ 3776.951783]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3776.952724] RIP: 0033:0x7f0c96efea57
[ 3776.953320] RSP: 002b:7ffc3ae13b98 EFLAGS: 0246 ORIG_RAX: 
00a6
[ 3776.954294] RAX:  RBX: 55cf12f21970 RCX: 7f0c96efea57
[ 3776.955196] RDX: 0001 RSI:  RDI: 55cf12f21b50
[ 3776.956648] RBP:  R08: 0005 R09: 
[ 3776.957964] R10: 55cf12f21b70 R11: 0246 R12: 55cf12f21b50
[ 3776.959657] R13: 7f0c974191c4 R14:  R15: 
[ 3776.961345] Code: ef e8 90 a7 fe ff e9 5f ff ff ff 0f 0b 48 83 bb d8 02 00 
00 00 0f 84 76 fe ff ff 0f 0b 48 83 bb f0 fe ff ff 00 0f 84 74 fe ff ff <0f> 0b 
48 83 bb e8 fe ff ff 00 0f 84 72 fe ff ff 0f 0b 8b 93 e4 
[ 3776.965122] irq event stamp: 12936
[ 3776.965598] hardirqs last  enabled at (12935): [] 
_raw_spin_unlock_irq+0x29/0x50
[ 3776.966691] hardirqs last disabled at (12936): [] 
error_entry+0x6c/0xc0
[ 3776.968171] softirqs last  enabled at (5088): [] 
__do_softirq+0x3a8/0x518
[ 3776.969521] softirqs last disabled at (5065): [] 
irq_exit+0xc1/0xd0
[ 3776.971686] ---[ end trace e11771ebe2e788d0 ]---
[ 3776.972746] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9320 
btrfs_destroy_inode+0x1e5/0x290 [btrfs]
[ 3776.974875] Modules linked in: btrfs libcrc32c xor zstd_decompress 
zstd_compress xxhash raid6_pq loop [last unloaded: libcrc32c]
[ 3776.977451] CPU: 0 PID: 12036 Comm: umount Tainted: GW 
4.17.0-rc7-default+ #153
[ 3776.978663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3776.980291]

Re: [PATCH] btrfs: Fix a C compliance issue

2018-06-20 Thread Bart Van Assche

On Wed, 2018-06-20 at 13:19 -0400, Jeff Mahoney wrote:
> The shed should be yellow.
> 
> -Jeff
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 891cd2ed5dd4..57c9da0b459f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2375,21 +2375,20 @@ static __cold void btrfs_interface_exit(void)
> 
>  static void __init btrfs_print_mod_info(void)
>  {
> - pr_info("Btrfs loaded, crc32c=%s"
> + pr_info("Btrfs loaded, crc32c=%s", crc32c_impl());
>  #ifdef CONFIG_BTRFS_DEBUG
> - ", debug=on"
> + pr_cont(", debug=on");
>  #endif
>  #ifdef CONFIG_BTRFS_ASSERT
> - ", assert=on"
> + pr_cont(", assert=on");
>  #endif
>  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> - ", integrity-checker=on"
> + pr_cont(", integrity-checker=on");
>  #endif
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
> - ", ref-verify=on"
> + pr_cont(", ref-verify=on");
>  #endif
> - "\n",
> - crc32c_impl());
> + pr_cont("\n");
>  }
> 
>  static int null_open(struct block_device *bdev, fmode_t mode)

Since we are doing bikeshedding, let me contribute to it :-)

From scripts/checkpatch.pl:

if ($line =~ /\bprintk\s*\(\s*KERN_CONT\b|\bpr_cont\s*\(/) {
WARN("LOGGING_CONTINUATION",
 "Avoid logging continuation uses where feasible\n" 
. $herecurr);
}

Bart.

[PATCH 4/7] btrfs: lift uuid_mutex to callers of btrfs_open_devices

2018-06-20 Thread David Sterba

Preparatory work to fix race between mount and device scan.

The callers will have to manage the critical section, eg.  mount wants
to scan and then call btrfs_open_devices without the ioctl scan walking
in and modifying the fs devices in the meantime.

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c   | 2 ++
 fs/btrfs/volumes.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 735402ed3154..ee82d02f5453 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1569,7 +1569,9 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
goto error_fs_info;
}
 
+   mutex_lock(&uuid_mutex);
error = btrfs_open_devices(fs_devices, mode, fs_type);
+   mutex_unlock(&uuid_mutex);
if (error)
goto error_fs_info;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 958bfe1a725c..5336d9832ba4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1145,7 +1145,8 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 {
int ret;
 
-   mutex_lock(&uuid_mutex);
+   lockdep_assert_held(&uuid_mutex);
+
mutex_lock(&fs_devices->device_list_mutex);
if (fs_devices->opened) {
fs_devices->opened++;
@@ -1155,7 +1156,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
ret = open_fs_devices(fs_devices, flags, holder);
}
mutex_unlock(&fs_devices->device_list_mutex);
-   mutex_unlock(&uuid_mutex);
 
return ret;
 }
-- 
2.17.0
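
The pattern in this patch (the callee stops taking the lock and instead
asserts that its caller holds it) can be tried out in userspace. The sketch
below is a single-threaded toy, not kernel code: toy_mutex tracks only a
"held" flag and plain assert() stands in for lockdep_assert_held(); every
toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held (no real exclusion). */
struct toy_mutex {
	bool held;
};

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

static struct toy_mutex toy_uuid_mutex;
static int toy_opened;

/* Callee: no longer locks, only checks that the caller did. */
static int toy_open_devices(void)
{
	assert(toy_uuid_mutex.held);	/* stand-in for lockdep_assert_held() */
	return ++toy_opened;
}

/* Caller: owns the critical section around the call. */
static int toy_mount_root(void)
{
	int ret;

	toy_lock(&toy_uuid_mutex);
	ret = toy_open_devices();
	toy_unlock(&toy_uuid_mutex);
	return ret;
}
```

Moving the lock outward is what later allows the caller to widen the critical
section to cover more than the single call.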


[PATCH 6/7] btrfs: reorder initialization before the mount locks uuid_mutex

2018-06-20 Thread David Sterba

In preparation to take a big lock, move resource initialization before
the critical section. It's not obvious from the diff, the desired order
is:

- initialize mount security options
- allocate temporary fs_info
- allocate superblock buffers

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e3324ddf2777..1780eb41f203 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1528,14 +1528,6 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
 
-   mutex_lock(&uuid_mutex);
-   error = btrfs_parse_early_options(data, mode, fs_type,
- &fs_devices);
-   mutex_unlock(&uuid_mutex);
-   if (error) {
-   return ERR_PTR(error);
-   }
-
security_init_mnt_opts(&new_sec_opts);
if (data) {
error = parse_security_options(data, &new_sec_opts);
@@ -1543,12 +1535,6 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
return ERR_PTR(error);
}
 
-   mutex_lock(&uuid_mutex);
-   error = btrfs_scan_one_device(device_name, mode, fs_type, &fs_devices);
-   mutex_unlock(&uuid_mutex);
-   if (error)
-   goto error_sec_opts;
-
/*
 * Setup a dummy root and fs_info for test/set super.  This is because
 * we don't actually fill this stuff out until open_ctree, but we need
@@ -1561,8 +1547,6 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
goto error_sec_opts;
}
 
-   fs_info->fs_devices = fs_devices;
-
fs_info->super_copy = kzalloc(BTRFS_SUPER_INFO_SIZE, GFP_KERNEL);
fs_info->super_for_commit = kzalloc(BTRFS_SUPER_INFO_SIZE, GFP_KERNEL);
security_init_mnt_opts(&fs_info->security_opts);
@@ -1571,6 +1555,20 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
goto error_fs_info;
}
 
+   mutex_lock(&uuid_mutex);
+   error = btrfs_parse_early_options(data, mode, fs_type, &fs_devices);
+   mutex_unlock(&uuid_mutex);
+   if (error)
+   goto error_fs_info;
+
+   mutex_lock(&uuid_mutex);
+   error = btrfs_scan_one_device(device_name, mode, fs_type, &fs_devices);
+   mutex_unlock(&uuid_mutex);
+   if (error)
+   goto error_fs_info;
+
+   fs_info->fs_devices = fs_devices;
+
mutex_lock(&uuid_mutex);
error = btrfs_open_devices(fs_devices, mode, fs_type);
mutex_unlock(&uuid_mutex);
-- 
2.17.0

[PATCH 7/7] btrfs: fix mount and ioctl device scan ioctl race

2018-06-20 Thread David Sterba

Technically this extends the critical section covered by uuid_mutex to:

- parse early mount options -- here we can call device scan on paths
  that can be passed as 'device=/dev/...'

- scan the device passed to mount

- open the devices related to the fs_devices -- this increases
  fs_devices::opened

The race can happen when mount calls one of the scans and there's
another one called eg. by mkfs or 'btrfs dev scan':

Mount  Scan
-  
scan_one_device (dev1, fsid1)
   scan_one_device (dev2, fsid2)
   add the device
   free stale devices
   fsid1 fs_devices::opened == 0
   find fsid1:dev1
   free fsid1:dev1
   if it's the last one,
free fs_devices of fsid1
too

open_devices (dev1, fsid1)
   dev1 not found

When fixed, the uuid mutex will make sure that mount will increase
fs_devices::opened and this will not be touched by the racing scan
ioctl.

Reported-and-tested-by: syzbot+909a5177749d7990f...@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+ceb2606025ec1cc34...@syzkaller.appspotmail.com
Signed-off-by: David Sterba 
---
 fs/btrfs/super.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1780eb41f203..b13b871bc584 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1557,19 +1557,19 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
 
mutex_lock(&uuid_mutex);
error = btrfs_parse_early_options(data, mode, fs_type, &fs_devices);
-   mutex_unlock(&uuid_mutex);
-   if (error)
+   if (error) {
+   mutex_unlock(&uuid_mutex);
goto error_fs_info;
+   }
 
-   mutex_lock(&uuid_mutex);
error = btrfs_scan_one_device(device_name, mode, fs_type, &fs_devices);
-   mutex_unlock(&uuid_mutex);
-   if (error)
+   if (error) {
+   mutex_unlock(&uuid_mutex);
goto error_fs_info;
+   }
 
fs_info->fs_devices = fs_devices;
 
-   mutex_lock(&uuid_mutex);
error = btrfs_open_devices(fs_devices, mode, fs_type);
mutex_unlock(&uuid_mutex);
if (error)
-- 
2.17.0
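
The effect of the extended critical section can be sketched with a small
single-threaded model. This is an illustration under stated assumptions, not
the kernel code: the names registry, scan_locked, open_locked, racing_scan,
mount_split and mount_held are hypothetical, a plain bool stands in for
uuid_mutex, and stale-device freeing is simplified to dropping every unopened
entry that belongs to another filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one registry entry per filesystem id (hypothetical names). */
#define MAX_FS 4

struct fs_entry {
	bool registered;	/* a device of this fsid has been scanned */
	int opened;		/* models fs_devices::opened */
};

static struct fs_entry registry[MAX_FS];
static bool uuid_lock_held;	/* stands in for uuid_mutex in this sketch */

static void registry_reset(void)
{
	for (size_t i = 0; i < MAX_FS; i++) {
		registry[i].registered = false;
		registry[i].opened = 0;
	}
	uuid_lock_held = false;
}

/* Register fsid, then free stale (unopened) entries of other filesystems. */
static void scan_locked(int fsid)
{
	assert(uuid_lock_held);	/* caller must hold the lock */
	registry[fsid].registered = true;
	for (int i = 0; i < MAX_FS; i++)
		if (i != fsid && registry[i].opened == 0)
			registry[i].registered = false;
}

/* Returns 0 on success, -1 if the scanned device is gone. */
static int open_locked(int fsid)
{
	assert(uuid_lock_held);	/* caller must hold the lock */
	if (!registry[fsid].registered)
		return -1;
	registry[fsid].opened++;
	return 0;
}

/* A racing scan (e.g. from the ioctl); it only runs if the lock is free. */
static bool racing_scan(int fsid)
{
	if (uuid_lock_held)
		return false;	/* a real mutex would block here instead */
	uuid_lock_held = true;
	scan_locked(fsid);
	uuid_lock_held = false;
	return true;
}

/* Lock dropped between scan and open: the racing scan fits in the gap. */
static int mount_split(int fsid, int racer_fsid)
{
	int ret;

	uuid_lock_held = true;
	scan_locked(fsid);
	uuid_lock_held = false;

	racing_scan(racer_fsid);	/* runs inside the window */

	uuid_lock_held = true;
	ret = open_locked(fsid);
	uuid_lock_held = false;
	return ret;
}

/* One critical section covering scan and open: the racer is kept out. */
static int mount_held(int fsid, int racer_fsid)
{
	int ret;

	uuid_lock_held = true;
	scan_locked(fsid);
	racing_scan(racer_fsid);	/* refused, the lock is held */
	ret = open_locked(fsid);
	uuid_lock_held = false;
	return ret;
}
```

In this model mount_split(1, 2) fails because the scan of fsid 2 frees the
still unopened entry of fsid 1, while mount_held(1, 2) succeeds and leaves
opened at 1, so a later scan no longer treats the entry as stale.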

[PATCH 3/7] btrfs: lift uuid_mutex to callers of btrfs_scan_one_device

2018-06-20 Thread David Sterba

Preparatory work to fix race between mount and device scan.

The callers will have to manage the critical section, e.g. mount wants to
scan and then call btrfs_open_devices without the ioctl scan walking in
and modifying the fs devices in the meantime.

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c   | 12 +++-
 fs/btrfs/volumes.c |  4 ++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 81107ad49f3a..735402ed3154 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -917,8 +917,10 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
error = -ENOMEM;
goto out;
}
+   mutex_lock(&uuid_mutex);
error = btrfs_scan_one_device(device_name,
flags, holder, fs_devices);
+   mutex_unlock(&uuid_mutex);
kfree(device_name);
if (error)
goto out;
@@ -1539,7 +1541,9 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
return ERR_PTR(error);
}
 
+   mutex_lock(&uuid_mutex);
error = btrfs_scan_one_device(device_name, mode, fs_type, &fs_devices);
+   mutex_unlock(&uuid_mutex);
if (error)
goto error_sec_opts;
 
@@ -2234,15 +2238,21 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
 
switch (cmd) {
case BTRFS_IOC_SCAN_DEV:
+   mutex_lock(&uuid_mutex);
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_root_fs_type, &fs_devices);
+   mutex_unlock(&uuid_mutex);
break;
case BTRFS_IOC_DEVICES_READY:
+   mutex_lock(&uuid_mutex);
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_root_fs_type, &fs_devices);
-   if (ret)
+   if (ret) {
+   mutex_unlock(&uuid_mutex);
break;
+   }
ret = !(fs_devices->num_devices == fs_devices->total_devices);
+   mutex_unlock(&uuid_mutex);
break;
case BTRFS_IOC_GET_SUPPORTED_FEATURES:
ret = btrfs_ioctl_get_supported_features((void __user*)arg);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 02246f9af0a3..958bfe1a725c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1226,6 +1226,8 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
int ret = 0;
u64 bytenr;
 
+   lockdep_assert_held(&uuid_mutex);
+
/*
 * we would like to check all the supers, but that would make
 * a btrfs mount succeed after a mkfs from a different FS.
@@ -1244,13 +1246,11 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error_bdev_put;
}
 
-   mutex_lock(&uuid_mutex);
device = device_list_add(path, disk_super);
if (IS_ERR(device))
ret = PTR_ERR(device);
else
*fs_devices_ret = device->fs_devices;
-   mutex_unlock(&uuid_mutex);
 
btrfs_release_disk_super(page);
 
-- 
2.17.0

[PATCH 1/7] btrfs: restore uuid_mutex in btrfs_open_devices

2018-06-20 Thread David Sterba

Commit 542c5908abfe84f7b4c1 ("btrfs: replace uuid_mutex by
device_list_mutex in btrfs_open_devices") switched to device_list_mutex
as we need that for the device list traversal, but we also need
uuid_mutex to protect access to fs_devices::opened to be consistent with
other users of that item.

CC: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e034ad9e23b4..1da162928d1a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1146,6 +1146,7 @@ int btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
 {
int ret;
 
+   mutex_lock(&uuid_mutex);
mutex_lock(&fs_devices->device_list_mutex);
if (fs_devices->opened) {
fs_devices->opened++;
@@ -1155,6 +1156,7 @@ int btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
ret = open_fs_devices(fs_devices, flags, holder);
}
mutex_unlock(&fs_devices->device_list_mutex);
+   mutex_unlock(&uuid_mutex);
 
return ret;
 }
-- 
2.17.0

[PATCH 5/7] btrfs: lift uuid_mutex to callers of btrfs_parse_early_options

2018-06-20 Thread David Sterba

Preparatory work to fix race between mount and device scan.

btrfs_parse_early_options calls the device scan from mount and we'll
need to let mount completely manage the critical section.

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ee82d02f5453..e3324ddf2777 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -892,6 +892,8 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
char *device_name, *opts, *orig, *p;
int error = 0;
 
+   lockdep_assert_held(&uuid_mutex);
+
if (!options)
return 0;
 
@@ -917,10 +919,8 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
error = -ENOMEM;
goto out;
}
-   mutex_lock(&uuid_mutex);
error = btrfs_scan_one_device(device_name,
flags, holder, fs_devices);
-   mutex_unlock(&uuid_mutex);
kfree(device_name);
if (error)
goto out;
@@ -1528,8 +1528,10 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
 
+   mutex_lock(&uuid_mutex);
error = btrfs_parse_early_options(data, mode, fs_type,
  &fs_devices);
+   mutex_unlock(&uuid_mutex);
if (error) {
return ERR_PTR(error);
}
-- 
2.17.0

[PATCH 2/7] btrfs: extend critical section when scanning a new device

2018-06-20 Thread David Sterba

The stale device list removal needs to be protected by device_list_mutex
too, as it can delete from the list, race with another list modification
and cause a crash.

The device needs to be fully initialized before it's added to the list,
so fs_devices also needs to be set under the mutex.

Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1da162928d1a..02246f9af0a3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -791,12 +791,11 @@ static noinline struct btrfs_device 
*device_list_add(const char *path,
rcu_assign_pointer(device->name, name);
 
mutex_lock(&fs_devices->device_list_mutex);
+   device->fs_devices = fs_devices;
list_add_rcu(&device->dev_list, &fs_devices->devices);
fs_devices->num_devices++;
-   mutex_unlock(&fs_devices->device_list_mutex);
-
-   device->fs_devices = fs_devices;
btrfs_free_stale_devices(path, device);
+   mutex_unlock(&fs_devices->device_list_mutex);
 
if (disk_super->label[0])
pr_info("BTRFS: device label %s devid %llu transid %llu 
%s\n",
-- 
2.17.0

[PATCH 0/7] Fix locking when scanning devices

2018-06-20 Thread David Sterba

This patchset fixes the bugs recently reported by syzbot. I've tried to
use patches from Anand [1] to fix that but in the end there were fixes
not suitable for merging to 4.18 and my final fix took a different
approach.

In short, fs_devices::opened is protected by uuid_mutex and this mutex
can be used to keep mount and scanning from interfering with each other.

The fstests pass and 2 syzbot reproducers reported no problems. I'd like
to push the patchset to 4.18 but not rc2 as it's too close. I'll add the
patchset to for-next soon if there are no major problems found, but
otherwise I'm open to comments.

[1]
https://patchwork.kernel.org/patch/10446779/
https://patchwork.kernel.org/patch/10437707/ 1-6

David Sterba (7):
  btrfs: restore uuid_mutex in btrfs_open_devices
  btrfs: extend critical section when scanning a new device
  btrfs: lift uuid_mutex to callers of btrfs_scan_one_device
  btrfs: lift uuid_mutex to callers of btrfs_open_devices
  btrfs: lift uuid_mutex to callers of btrfs_parse_early_options
  btrfs: reorder initialization before the mount locks uuid_mutex
  btrfs: fix mount and ioctl device scan ioctl race

 fs/btrfs/super.c   | 38 +-
 fs/btrfs/volumes.c | 11 ++-
 2 files changed, 31 insertions(+), 18 deletions(-)

-- 
2.17.0

Re: [PATCH] btrfs: Fix a C compliance issue

2018-06-20 Thread Jeff Mahoney

On 6/20/18 12:55 PM, David Sterba wrote:
> On Wed, Jun 20, 2018 at 04:44:54PM +, Bart Van Assche wrote:
>> On Mon, 2018-06-18 at 12:31 +0300, Nikolay Borisov wrote:
>>> On 18.06.2018 12:26, David Sterba wrote:
>>>> On Sat, Jun 16, 2018 at 01:28:13PM +0300, Nikolay Borisov wrote:
>>>>> I'd rather not see more printk being added. Nothing prevents from having
>>>>> the fmt string being passed to pr_info.
>>>>
>>>> So you mean to do
>>>>
>>>> +  static const char fmt[] = "Btrfs loaded, crc32c=%s"
>>>> +  pr_info(fmt);
>>>
>>> Pretty much, something along the lines of
>>>
>>> pr_info(fmt, crc32c_impl).
>>>
>>> printk requires having the KERN_INFO in the format string, which I see
>>> no point in doing, correct me if I'm wrong?
>>
>> You should know that what you proposed doesn't compile because pr_info()
>> relies on string concatenation and hence requires that its first argument is
>> a string constant instead of a const char pointer. Anyway, I will rework this
>> patch such that it uses pr_info() instead of printk().
> 
> Right, the pr_info(fmt,...) does not compile. The closest version I got to is
> below. It does not look pretty, but I can't think of a better version right
> now.
> 
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2369,7 +2369,8 @@ static __cold void btrfs_interface_exit(void)
>  
>  static void __init btrfs_print_mod_info(void)
>  {
> -   static const char fmt[] = KERN_INFO "Btrfs loaded, crc32c=%s"
> +   static const char fmt1[] = "Btrfs loaded, crc32c=";
> +   static const char fmt2[] =
>  #ifdef CONFIG_BTRFS_DEBUG
> ", debug=on"
>  #endif
> @@ -2383,7 +2384,7 @@ static void __init btrfs_print_mod_info(void)
> ", ref-verify=on"
>  #endif
> "\n";
> -   printk(fmt, crc32c_impl());
> +   pr_info("%s%s%s", fmt1, crc32c_impl(), fmt2);
>  }
>  
>  static int __init init_btrfs_fs(void)

The shed should be yellow.

-Jeff

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 891cd2ed5dd4..57c9da0b459f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2375,21 +2375,20 @@ static __cold void btrfs_interface_exit(void)

 static void __init btrfs_print_mod_info(void)
 {
-   pr_info("Btrfs loaded, crc32c=%s"
+   pr_info("Btrfs loaded, crc32c=%s", crc32c_impl());
 #ifdef CONFIG_BTRFS_DEBUG
-   ", debug=on"
+   pr_cont(", debug=on");
 #endif
 #ifdef CONFIG_BTRFS_ASSERT
-   ", assert=on"
+   pr_cont(", assert=on");
 #endif
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-   ", integrity-checker=on"
+   pr_cont(", integrity-checker=on");
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
-   ", ref-verify=on"
+   pr_cont(", ref-verify=on");
 #endif
-   "\n",
-   crc32c_impl());
+   pr_cont("\n");
 }

 static int null_open(struct block_device *bdev, fmode_t mode)



-- 
Jeff Mahoney
SUSE Labs

[PATCH v2 0/3] Three patches that address static analyzer reports

2018-06-20 Thread Bart Van Assche

Hello Chris and Josef,

The three patches in this series address complaints reported by static
analyzers (gcc + W=1, sparse, smatch). These patches do not change any
functionality. Please consider these for inclusion in the upstream
kernel.

Thanks,

Bart.

Bart Van Assche (3):
  btrfs: Fix indentation
  btrfs: Annotate fall-through
  btrfs: Fix a C compliance issue

 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/ioctl.c   | 4 ++--
 fs/btrfs/reada.c   | 2 +-
 fs/btrfs/super.c   | 7 ---
 4 files changed, 9 insertions(+), 8 deletions(-)

-- 
2.17.1

[PATCH v2 2/3] btrfs: Annotate fall-through

2018-06-20 Thread Bart Van Assche

This patch keeps the compiler from complaining about a missing
fall-through annotation when building with W=1.

Signed-off-by: Bart Van Assche 
---
 fs/btrfs/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 81107ad49f3a..3e298f26a383 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -760,6 +760,7 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
case Opt_recovery:
btrfs_warn(info,
   "'recovery' is deprecated, use 
'usebackuproot' instead");
+   /* fall through */
case Opt_usebackuproot:
btrfs_info(info,
   "trying to use backup root at mount time");
-- 
2.17.1
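
A deprecated option that deliberately shares the code of its replacement can
be sketched in plain C as follows. The enum and function names are
hypothetical stand-ins, not the btrfs option parser; the point is only that
the comment marks the missing break as intentional, which GCC's
-Wimplicit-fallthrough accepts at its default level.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical option tokens standing in for Opt_recovery and friends. */
enum demo_opt {
	DEMO_OPT_RECOVERY,
	DEMO_OPT_USEBACKUPROOT,
	DEMO_OPT_OTHER
};

/* Returns the number of messages emitted for the option. */
static int demo_parse_option(enum demo_opt opt, int *use_backup_root)
{
	int messages = 0;

	switch (opt) {
	case DEMO_OPT_RECOVERY:
		fprintf(stderr,
			"'recovery' is deprecated, use 'usebackuproot' instead\n");
		messages++;
		/* fall through */
	case DEMO_OPT_USEBACKUPROOT:
		fprintf(stderr, "trying to use backup root at mount time\n");
		messages++;
		*use_backup_root = 1;
		break;
	default:
		break;
	}
	return messages;
}
```

The deprecated token prints its warning and then runs the same code as the
new token, so both end up setting the flag.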

[PATCH v2 1/3] btrfs: Fix indentation

2018-06-20 Thread Bart Van Assche

This patch keeps smatch from complaining about inconsistent indentation
when it checks the BTRFS source code.

Signed-off-by: Bart Van Assche 
---
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/ioctl.c   | 4 ++--
 fs/btrfs/reada.c   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58c0080..db46ceb62b3f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6279,7 +6279,7 @@ static int update_block_group(struct btrfs_trans_handle 
*trans,
if (list_empty(&cache->dirty_list)) {
list_add_tail(&cache->dirty_list,
  &trans->transaction->dirty_bgs);
-   trans->transaction->num_dirty_bgs++;
+   trans->transaction->num_dirty_bgs++;
btrfs_get_block_group(cache);
}
spin_unlock(&trans->transaction->dirty_bgs_lock);
@@ -7534,7 +7534,7 @@ static noinline int find_free_extent(struct btrfs_fs_info 
*fs_info,
 * for the proper type.
 */
if (!block_group_bits(block_group, flags)) {
-   u64 extra = BTRFS_BLOCK_GROUP_DUP |
+   u64 extra = BTRFS_BLOCK_GROUP_DUP |
BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID5 |
BTRFS_BLOCK_GROUP_RAID6 |
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c2837a32d689..93b23549ee1e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2507,8 +2507,8 @@ static int btrfs_search_path_in_tree_user(struct inode 
*inode,
 static noinline int btrfs_ioctl_ino_lookup(struct file *file,
   void __user *argp)
 {
-struct btrfs_ioctl_ino_lookup_args *args;
-struct inode *inode;
+   struct btrfs_ioctl_ino_lookup_args *args;
+   struct inode *inode;
int ret = 0;
 
args = memdup_user(argp, sizeof(*args));
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 40f1bcef394d..4be425f70c2d 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -355,7 +355,7 @@ static struct reada_extent *reada_find_extent(struct 
btrfs_fs_info *fs_info,
dev = bbio->stripes[nzones].dev;
 
/* cannot read ahead on missing device. */
-if (!dev->bdev)
+   if (!dev->bdev)
continue;
 
zone = reada_find_zone(dev, logical, bbio);
-- 
2.17.1

[PATCH v2 3/3] btrfs: Fix a C compliance issue

2018-06-20 Thread Bart Van Assche

The C standard leaves the behavior undefined when preprocessor directives
appear inside macro arguments (pr_info() is defined as a macro). Hence rework
the pr_info() statement in btrfs_print_mod_info() such that it becomes
compliant. This patch allows tools like sparse to analyze the BTRFS
source code.

Fixes: 62e855771dac ("btrfs: convert printk(KERN_* to use pr_* calls")
Signed-off-by: Bart Van Assche 
Cc: Jeff Mahoney 
Cc: David Sterba 
Cc: Nikolay Borisov 
---
 fs/btrfs/super.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3e298f26a383..972d9fbd7e96 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2370,7 +2370,7 @@ static __cold void btrfs_interface_exit(void)
 
 static void __init btrfs_print_mod_info(void)
 {
-   pr_info("Btrfs loaded, crc32c=%s"
+   static const char options[] = ""
 #ifdef CONFIG_BTRFS_DEBUG
", debug=on"
 #endif
@@ -2383,8 +2383,8 @@ static void __init btrfs_print_mod_info(void)
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
", ref-verify=on"
 #endif
-   "\n",
-   crc32c_impl());
+   ;
+   pr_info("Btrfs loaded, crc32c=%s%s\n", crc32c_impl(), options);
 }
 
 static int __init init_btrfs_fs(void)
-- 
2.17.1
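
The approach can be tried outside the kernel with a minimal sketch. The
log_info() macro below is a hypothetical stand-in for pr_info() that, like
the real one, prepends a prefix by string-literal concatenation and therefore
needs a literal format; DEMO_DEBUG and DEMO_ASSERT are made-up switches in
place of the CONFIG_BTRFS_* symbols. The conditional pieces are assembled
into a separate array so that no directive appears inside the macro
arguments, and the leading empty literal keeps the initializer valid when no
switch is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for pr_info(): the prefix is glued onto fmt by
 * literal concatenation, so fmt must be a string literal.
 */
#define log_info(buf, size, fmt, ...) \
	snprintf((buf), (size), "INFO: " fmt, __VA_ARGS__)

/* Made-up configuration switches. */
#define DEMO_DEBUG 1
/* #define DEMO_ASSERT 1 */

/* Conditional pieces are concatenated outside of any macro invocation. */
static const char demo_options[] = ""
#ifdef DEMO_DEBUG
	", debug=on"
#endif
#ifdef DEMO_ASSERT
	", assert=on"
#endif
	;

/* Formats the banner into buf and returns snprintf()'s result. */
static int format_mod_info(char *buf, size_t size, const char *crc_impl)
{
	return log_info(buf, size, "Demo loaded, crc32c=%s%s\n",
			crc_impl, demo_options);
}
```

Passing the option list through a second %s costs one extra argument but
keeps the format a single literal, which is what the concatenating macro
requires.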

Re: [PATCH] btrfs: Fix a C compliance issue

2018-06-20 Thread David Sterba

On Wed, Jun 20, 2018 at 04:44:54PM +, Bart Van Assche wrote:
> On Mon, 2018-06-18 at 12:31 +0300, Nikolay Borisov wrote:
> > On 18.06.2018 12:26, David Sterba wrote:
> > > On Sat, Jun 16, 2018 at 01:28:13PM +0300, Nikolay Borisov wrote:
> > > > I'd rather not see more printk being added. Nothing prevents from having
> > > > the fmt string being passed to pr_info.
> > > 
> > > So you mean to do
> > > 
> > > + static const char fmt[] = "Btrfs loaded, crc32c=%s"
> > > + pr_info(fmt);
> > 
> > Pretty much, something along the lines of
> > 
> > pr_info(fmt, crc32c_impl).
> > 
> > printk requires having the KERN_INFO in the format string, which I see
> > no point in doing, correct me if I'm wrong?
> 
> You should know that what you proposed doesn't compile because pr_info()
> relies on string concatenation and hence requires that its first argument is
> a string constant instead of a const char pointer. Anyway, I will rework this
> patch such that it uses pr_info() instead of printk().

Right, the pr_info(fmt,...) does not compile. The closest version I got to is
below. It does not look pretty, but I can't think of a better version right
now.

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2369,7 +2369,8 @@ static __cold void btrfs_interface_exit(void)
 
 static void __init btrfs_print_mod_info(void)
 {
-   static const char fmt[] = KERN_INFO "Btrfs loaded, crc32c=%s"
+   static const char fmt1[] = "Btrfs loaded, crc32c=";
+   static const char fmt2[] =
 #ifdef CONFIG_BTRFS_DEBUG
", debug=on"
 #endif
@@ -2383,7 +2384,7 @@ static void __init btrfs_print_mod_info(void)
", ref-verify=on"
 #endif
"\n";
-   printk(fmt, crc32c_impl());
+   pr_info("%s%s%s", fmt1, crc32c_impl(), fmt2);
 }
 
 static int __init init_btrfs_fs(void)

Re: [PATCH] btrfs: Fix a C compliance issue

2018-06-20 Thread Bart Van Assche

On Mon, 2018-06-18 at 12:31 +0300, Nikolay Borisov wrote:
> On 18.06.2018 12:26, David Sterba wrote:
> > On Sat, Jun 16, 2018 at 01:28:13PM +0300, Nikolay Borisov wrote:
> > > I'd rather not see more printk being added. Nothing prevents from having
> > > the fmt string being passed to pr_info.
> > 
> > So you mean to do
> > 
> > +   static const char fmt[] = "Btrfs loaded, crc32c=%s"
> > +   pr_info(fmt);
> 
> Pretty much, something along the lines of
> 
> pr_info(fmt, crc32c_impl).
> 
> printk requires having the KERN_INFO in the format string, which I see
> no point in doing, correct me if I'm wrong?

You should know that what you proposed doesn't compile because pr_info()
relies on string concatenation and hence requires that its first argument is
a string constant instead of a const char pointer. Anyway, I will rework this
patch such that it uses pr_info() instead of printk().

Bart.

[PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread Chris Mason

We've been hunting the root cause of data crc errors here at FB for a while.
We'd find one or two corrupted files, usually displaying crc errors without any
corresponding IO errors from the storage.  The bug was rare enough that we'd
need to watch a large number of machines for a few days just to catch it
happening.

We're still running these patches through testing, but the fixup worker bug
seems to account for the vast majority of crc errors we're seeing in the fleet.
It's cleaning pages that were dirty, and creating a window where they can be
reclaimed before we finish processing the page.

btrfs_file_write() has a similar bug when copy_from_user catches a page fault
and we're writing to a page that was already dirty when file_write started.
This one is much harder to trigger, and I haven't confirmed yet that we're
seeing it in the fleet.

[PATCH 2/2] Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker

2018-06-20 Thread Chris Mason

For COW, btrfs expects dirty pages to have been through a few setup
steps.  This includes reserving space for the new block allocations and marking
the range in the state tree for delayed allocation.

A few places outside btrfs will dirty pages directly, especially when unmapping
mmap'd pages.  In order for these to properly go through COW, we run them
through a fixup worker to wait for stable pages, and do the delalloc prep.

Commit 87826df0ec36 added a window where the dirty pages were cleaned while still pending
more action from the fixup worker.  During this window, page migration can jump
in and relocate the page.  Once our fixup work actually starts, it finds
page->mapping is NULL and we end up freeing the page without ever writing it.

This leads to crc errors and other exciting problems, since it screws up the
whole state machine for waiting for ordered extents.  The fix here is to keep
the page dirty while we're waiting for the fixup worker to get to work.  This
also makes sure the error handling in btrfs_writepage_fixup_worker does the
right thing with dirty bits when we run out of space.

Signed-off-by: Chris Mason 
---
 fs/btrfs/inode.c | 67 +---
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0b86cf1..5538900 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2100,11 +2100,21 @@ static void btrfs_writepage_fixup_worker(struct 
btrfs_work *work)
page = fixup->page;
 again:
lock_page(page);
-   if (!page->mapping || !PageDirty(page) || !PageChecked(page)) {
-   ClearPageChecked(page);
+
+   /*
+* before we queued this fixup, we took a reference on the page.
+* page->mapping may go NULL, but it shouldn't be moved to a
+* different address space.
+*/
+   if (!page->mapping || !PageDirty(page) || !PageChecked(page))
goto out_page;
-   }
 
+   /*
+* we keep the PageChecked() bit set until we're done with the
+* btrfs_start_ordered_extent() dance that we do below.  That
+* drops and retakes the page lock, so we don't want new
+* fixup workers queued for this page during the churn.
+*/
inode = page->mapping->host;
page_start = page_offset(page);
page_end = page_offset(page) + PAGE_SIZE - 1;
@@ -2129,33 +2139,46 @@ static void btrfs_writepage_fixup_worker(struct 
btrfs_work *work)
 
ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
   PAGE_SIZE);
-   if (ret) {
-   mapping_set_error(page->mapping, ret);
-   end_extent_writepage(page, ret, page_start, page_end);
-   ClearPageChecked(page);
-   goto out;
-}
+   if (ret)
+   goto out_error;
 
ret = btrfs_set_extent_delalloc(inode, page_start, page_end, 0,
&cached_state, 0);
-   if (ret) {
-   mapping_set_error(page->mapping, ret);
-   end_extent_writepage(page, ret, page_start, page_end);
-   ClearPageChecked(page);
-   goto out;
-   }
+   if (ret)
+   goto out_error;
 
-   ClearPageChecked(page);
-   set_page_dirty(page);
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, false);
+
+   /*
+* everything went as planned, we're now the proud owners of a
+* Dirty page with delayed allocation bits set and space reserved
+* for our COW destination.
+*
+* The page was dirty when we started, nothing should have cleaned it.
+*/
+   BUG_ON(!PageDirty(page));
+
 out:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end,
 &cached_state);
 out_page:
+   ClearPageChecked(page);
unlock_page(page);
put_page(page);
kfree(fixup);
extent_changeset_free(data_reserved);
+   return;
+
+out_error:
+   /*
+* We hit ENOSPC or other errors.  Update the mapping and page to
+* reflect the errors and clean the page.
+*/
+   mapping_set_error(page->mapping, ret);
+   end_extent_writepage(page, ret, page_start, page_end);
+   clear_page_dirty_for_io(page);
+   SetPageError(page);
+   goto out;
 }
 
 /*
@@ -2179,6 +2202,13 @@ static int btrfs_writepage_start_hook(struct page *page, 
u64 start, u64 end)
if (TestClearPagePrivate2(page))
return 0;
 
+   /*
+* PageChecked is set below when we create a fixup worker for this page,
+* don't try to create another one if we're already PageChecked()
+*
+* The extent_io writepage code will redirty the page if we send
+* back EAGAIN.
+*/
if (PageChecked(page))
return -EAGAIN;
 
@@ -2192,7 +,8 @@ static int btr

[PATCH 1/2] Btrfs: don't clean dirty pages during buffered writes

2018-06-20 Thread Chris Mason

During buffered writes, we follow this basic series of steps:

again:
lock all the pages
wait for writeback on all the pages
Take the extent range lock
wait for ordered extents on the whole range
clean all the pages

if (copy_from_user_in_atomic() hits a fault) {
drop our locks
goto again;
}

dirty all the pages
release all the locks

The extra waiting, cleaning and locking are there to make sure we don't
modify pages in flight to the drive, after they've been crc'd.

If some of the pages in the range were already dirty when the write
began, and we need to goto again, we create a window where a dirty page
has been cleaned and unlocked.  It may be reclaimed before we're able to
lock it again, which means we'll read the old contents off the drive and
lose any modifications that had been pending writeback.

We don't actually need to clean the pages.  All of the other locking in
place makes sure we don't start IO on the pages, so we can just leave
them dirty for the duration of the write.

Fixes: 73d59314e6ed (the original btrfs merge)
Signed-off-by: Chris Mason 
---
 fs/btrfs/file.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f660ba1..89ec4d2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -534,6 +534,15 @@ int btrfs_dirty_pages(struct inode *inode, struct page 
**pages,
 
end_of_last_block = start_pos + num_bytes - 1;
 
+   /*
+* the pages may have already been dirty, clear out old accounting
+* so we can set things up properly
+*/
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
+EXTENT_DIRTY | EXTENT_DELALLOC |
+EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0,
+cached);
+
if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
if (start_pos >= isize &&
!(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
@@ -1504,18 +1513,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode 
*inode, struct page **pages,
}
if (ordered)
btrfs_put_ordered_extent(ordered);
-   clear_extent_bit(&inode->io_tree, start_pos, last_pos,
-EXTENT_DIRTY | EXTENT_DELALLOC |
-EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
-0, 0, cached_state);
+
*lockstart = start_pos;
*lockend = last_pos;
ret = 1;
}
 
+   /*
+* It's possible the pages are dirty right now, but we don't want
+* to clean them yet because copy_from_user may catch a page fault
+* and we might have to fall back to one page at a time.  If that
+* happens, we'll unlock these pages and we'd have a window where
+* reclaim could sneak in and drop the once-dirty page on the floor
+* without writing it.
+*
+* We have the pages locked and the extent range locked, so there's
+* no way someone can start IO on any dirty pages in this range.
+*
+* we'll call btrfs_dirty_pages() later on, and that will flip around
+* delalloc bits and dirty the pages as required.
+*/
for (i = 0; i < num_pages; i++) {
-   if (clear_page_dirty_for_io(pages[i]))
-   account_page_redirty(pages[i]);
set_page_extent_mapped(pages[i]);
WARN_ON(!PageLocked(pages[i]));
}
-- 
2.9.5
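
The window described in the commit message can be modeled in userspace.
What follows is a minimal sketch, not kernel code: struct page_model,
reclaim_model() and the two retry helpers are hypothetical names, and
reclaim is reduced to "drop any clean page".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: cached data plus the flags that matter for this race. */
struct page_model {
	bool present;	/* still in the page cache */
	bool dirty;	/* has modifications not yet written back */
	int data;	/* cached contents */
};

/* Reclaim may only drop clean pages; dirty pages must be written first. */
static void reclaim_model(struct page_model *page)
{
	if (!page->dirty)
		page->present = false;
}

/* Re-reading a dropped page brings back whatever is on disk. */
static int read_model(struct page_model *page, int on_disk)
{
	if (!page->present) {
		page->data = on_disk;
		page->present = true;
	}
	return page->data;
}

/* Old flow: clean the page, hit a fault, unlock, and let reclaim run. */
static int retry_after_cleaning(struct page_model *page, int on_disk)
{
	page->dirty = false;
	reclaim_model(page);
	return read_model(page, on_disk);
}

/* New flow: leave the page dirty across the fault, so reclaim skips it. */
static int retry_leaving_dirty(struct page_model *page, int on_disk)
{
	reclaim_model(page);
	return read_model(page, on_disk);
}
```

With a dirty page holding newer data than the disk, the first helper
returns the stale on-disk value while the second still sees the pending
modification.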

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: delayed-ref: simplify btrfs_add_delayed_tree_ref()

2018-06-20 Thread David Sterba

On Wed, May 23, 2018 at 11:22:20AM +0300, Nikolay Borisov wrote:
> 
> 
> On 23.05.2018 11:06, Su Yue wrote:
> > Commit 5a5003df98d5 ("btrfs: delayed-ref: double free in
> > btrfs_add_delayed_tree_ref()") fixed a double free problem by creating
> > an unnecessary label to jump to.
> > A more elegant way is just to change "ref" to "head_ref" and keep
> > btrfs_add_delayed_tree_ref() and btrfs_add_delayed_data_ref() in
> > a similar structure.
> 
> I agree, personally I'm a fan of multiple returns rather than jump
> labels, because at this point you know the function terminates and
> that's it.

Ok, let's do the freeing in-place, but it would be better to put them
before any other code, which is init_delayed_ref_common() in this case.
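
A rough userspace sketch of the shape being discussed, assuming nothing
about the real function bodies: the types, init_ref_model() (a stand-in
for init_delayed_ref_common()) and the fail_head switch are all
hypothetical. The failure path frees in place and returns, and it sits
before any initialization work.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct tree_ref_model { int level; };
struct ref_head_model { int refs; };

/* Hypothetical stand-in for the common init helper. */
static void init_ref_model(struct tree_ref_model *ref, int level)
{
	ref->level = level;
}

/* fail_head simulates a failed second allocation. */
static int add_tree_ref_model(int level, int fail_head,
			      struct tree_ref_model **ref_ret,
			      struct ref_head_model **head_ret)
{
	struct tree_ref_model *ref;
	struct ref_head_model *head;

	ref = malloc(sizeof(*ref));
	if (!ref)
		return -ENOMEM;

	head = fail_head ? NULL : malloc(sizeof(*head));
	if (!head) {
		/* Free in place and return; no jump label needed. */
		free(ref);
		return -ENOMEM;
	}

	/* All allocation checks are done before any other code runs. */
	init_ref_model(ref, level);
	head->refs = 1;

	*ref_ret = ref;
	*head_ret = head;
	return 0;
}
```

Because each exit frees exactly what it owns at that point, a double free
of "ref" cannot be reintroduced by a later shared label.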

[PATCH] btrfs: Streamline log_extent_csums a bit

2018-06-20 Thread Nikolay Borisov

Currently this function takes the root as an argument only to get the
log_root from it. Simplify this by directly passing the log root from
the caller. Also eliminate the fs_info local var, since it's used only
once, so directly reference it from the transaction handle.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/tree-log.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index daf32dc94dc3..b52ca6b8503e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4064,11 +4064,9 @@ static int extent_cmp(void *priv, struct list_head *a, 
struct list_head *b)
 
 static int log_extent_csums(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode,
-   struct btrfs_root *root,
+   struct btrfs_root *log,
const struct extent_map *em)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
-   struct btrfs_root *log = root->log_root;
u64 csum_offset;
u64 csum_len;
LIST_HEAD(ordered_sums);
@@ -4089,7 +4087,7 @@ static int log_extent_csums(struct btrfs_trans_handle 
*trans,
}
 
/* block start is already adjusted for the file extent offset. */
-   ret = btrfs_lookup_csums_range(fs_info->csum_root,
+   ret = btrfs_lookup_csums_range(trans->fs_info->csum_root,
   em->block_start + csum_offset,
   em->block_start + csum_offset +
   csum_len - 1, &ordered_sums, 0);
@@ -4125,7 +4123,7 @@ static int log_one_extent(struct btrfs_trans_handle 
*trans,
int ret;
int extent_inserted = 0;
 
-   ret = log_extent_csums(trans, inode, root, em);
+   ret = log_extent_csums(trans, inode, log, em);
if (ret)
return ret;
 
-- 
2.7.4


Re: [PATCH 3/3] btrfs: fix race between mkfs and mount

2018-06-20 Thread David Sterba

On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> In instrumented testing, a mount and a newer mkfs.btrfs thread on the
> same device can race. If the new mkfs.btrfs wins, it frees the older
> fs_devices, and the mount thread then oopses when it uses the stale
> fs_devices.
> 
> Thread1                                 Thread2
> -------                                 -------
> mkfs.btrfs -fq /dev/sdb
> mount /dev/sdb /btrfs
> |_btrfs_mount_root()
>   |_btrfs_scan_one_device(... &fs_devices)
>
>                                         mkfs.btrfs -fq /dev/sdb
>                                         |_btrfs_control_ioctl()
>                                           |_btrfs_scan_one_device(... &fs_devices)
>                                             |_::
>                                               |_btrfs_free_stale_devices()
>
>   |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> 
> Fix this with a mutually exclusive flag BTRFS_VOLUME_STATE_EXCL_OPS.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/super.c   |  6 ++++++
>  fs/btrfs/volumes.c | 10 +++++++++-
>  fs/btrfs/volumes.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f0c13defc9eb..b60e7cbe39f5 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1565,7 +1565,13 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>   goto error_fs_info;
>   }
>  
> + if (test_and_set_bit(BTRFS_VOLUME_STATE_EXCL_OPS, 
> &fs_devices->volume_state)) {
> + error = -EBUSY;

We'd need to wait until the bit is cleared instead of returning EBUSY, as
the parallel scan is not really a reason to fail the whole mount.

I'll post the patch series to address this problem today. It uses the
uuid_mutex in a way similar to what you are trying to do with the new bit,
but it will not lead to EBUSY.
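
To make the difference concrete, here is a small userspace model. It is
only a sketch and not the series mentioned above: fs_devices_model,
uuid_mutex_model and both mount helpers are hypothetical, and a pthread
mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct fs_devices_model {
	bool excl_ops;	/* models the proposed exclusive-operation bit */
	int opened;	/* number of successful opens */
};

static pthread_mutex_t uuid_mutex_model = PTHREAD_MUTEX_INITIALIZER;

/* Flag approach: a scan in progress makes the mount fail outright. */
static int mount_with_flag(struct fs_devices_model *devs)
{
	if (devs->excl_ops)
		return -EBUSY;
	devs->excl_ops = true;
	devs->opened++;
	devs->excl_ops = false;
	return 0;
}

/* Mutex approach: the mount sleeps until the scan drops the lock. */
static int mount_with_mutex(struct fs_devices_model *devs)
{
	pthread_mutex_lock(&uuid_mutex_model);
	devs->opened++;
	pthread_mutex_unlock(&uuid_mutex_model);
	return 0;
}
```

In the flag variant the caller has to handle or retry -EBUSY; in the mutex
variant the contention is invisible to the caller apart from the wait.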

Re: [PATCH 1/3] btrfs: convert volume rotating flag into bitmap

2018-06-20 Thread David Sterba

On Mon, Jun 04, 2018 at 11:00:28PM +0800, Anand Jain wrote:
> Add bitmap btrfs_fs_devices::volume_state to maintain the volume states and
> flags. This patch in particular converts btrfs_fs_devices::rotating into
> flag BTRFS_VOLUME_STATE_ROTATING.

I'm not sure we need this. There are 2 flags, we don't need the
atomicity of test_bit/set_bit, and the values don't change too often, so
protecting them with the mutex should be sufficient.

The size of btrfs_device is also not a big concern: it will be stored in
the 512 byte kmalloc bin and its size is already close to that, so
replacing 2 ints with one long will not gain anything.
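
The size argument can be illustrated with a simplified model of size
classes. This is an assumption-laden sketch: bin_size_model() is a
hypothetical helper that rounds up to the next power of two, whereas the
real allocator also has a few non-power-of-two caches, and the byte counts
in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two bin, starting at 8 bytes. */
static size_t bin_size_model(size_t bytes)
{
	size_t bin = 8;

	while (bin < bytes)
		bin <<= 1;
	return bin;
}
```

For a structure of, say, 500 bytes, shuffling a few bytes either way
(496 or 504) still lands in the 512 byte bin, so the allocation cost does
not change.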

The 3rd flag would cause other problems and is not the right solution to
the scan/mount problem.

Re: [PATCH 1/4] btrfs: always wait on ordered extents at fsync time

2018-06-20 Thread David Sterba

On Thu, May 24, 2018 at 11:49:04AM +0100, Filipe Manana wrote:
> On Wed, May 23, 2018 at 4:58 PM, Josef Bacik  wrote:
> > From: Josef Bacik 
> >
> > There's a priority inversion that exists currently with btrfs fsync.  In
> > some cases we will collect outstanding ordered extents onto a list and
> > only wait on them at the very last second.  However this "very last
> > second" falls inside of a transaction handle, so if we are in a lower
> > priority cgroup we can end up holding the transaction open for longer
> > than needed, so if a high priority cgroup is also trying to fsync()
> > it'll see latency.
> >
> > Signed-off-by: Josef Bacik 
> > ---
> >  fs/btrfs/file.c | 56 
> > 
> >  1 file changed, 4 insertions(+), 52 deletions(-)
> >
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 5772f0cbedef..2b1c36612384 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -2069,53 +2069,12 @@ int btrfs_sync_file(struct file *file, loff_t 
> > start, loff_t end, int datasync)
> > atomic_inc(&root->log_batch);
> > full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> >  &BTRFS_I(inode)->runtime_flags);
> > +
> > /*
> > -* We might have have had more pages made dirty after calling
> > -* start_ordered_ops and before acquiring the inode's i_mutex.
> > +* We have to do this here to avoid the priority inversion of waiting on
> > +* IO of a lower priority task while holding a transaction open.
> >  */
> > -   if (full_sync) {
> > -   /*
> > -* For a full sync, we need to make sure any ordered 
> > operations
> > -* start and finish before we start logging the inode, so 
> > that
> > -* all extents are persisted and the respective file extent
> > -* items are in the fs/subvol btree.
> > -*/
> > -   ret = btrfs_wait_ordered_range(inode, start, len);
> > -   } else {
> > -   /*
> > -* Start any new ordered operations before starting to log 
> > the
> > -* inode. We will wait for them to finish in 
> > btrfs_sync_log().
> > -*
> > -* Right before acquiring the inode's mutex, we might have 
> > new
> > -* writes dirtying pages, which won't immediately start the
> > -* respective ordered operations - that is done through the
> > -* fill_delalloc callbacks invoked from the writepage and
> > -* writepages address space operations. So make sure we 
> > start
> > -* all ordered operations before starting to log our inode. 
> > Not
> > -* doing this means that while logging the inode, writeback
> > -* could start and invoke writepage/writepages, which would 
> > call
> > -* the fill_delalloc callbacks (cow_file_range,
> > -* submit_compressed_extents). These callbacks add first an
> > -* extent map to the modified list of extents and then 
> > create
> > -* the respective ordered operation, which means in
> > -* tree-log.c:btrfs_log_inode() we might capture all 
> > existing
> > -* ordered operations (with btrfs_get_logged_extents()) 
> > before
> > -* the fill_delalloc callback adds its ordered operation, 
> > and by
> > -* the time we visit the modified list of extent maps (with
> > -* btrfs_log_changed_extents()), we see and process the 
> > extent
> > -* map they created. We then use the extent map to 
> > construct a
> > -* file extent item for logging without waiting for the
> > -* respective ordered operation to finish - this file extent
> > -* item points to a disk location that might not have yet 
> > been
> > -* written to, containing random data - so after a crash a 
> > log
> > -* replay will make our inode have file extent items that 
> > point
> > -* to disk locations containing invalid data, as we returned
> > -* success to userspace without waiting for the respective
> > -* ordered operation to finish, because it wasn't captured 
> > by
> > -* btrfs_get_logged_extents().
> > -*/
> > -   ret = start_ordered_ops(inode, start, end);
> > -   }
> > +   ret = btrfs_wait_ordered_range(inode, start, len);
> > if (ret) {
> > inode_unlock(inode);
> > goto out;
> > @@ -2240,13 +2199,6 @@ int btrfs_sync_file(struct file *file, loff_t start, 
> > loff_t end, int datasync)
> > goto out;
> > }
> > }
> > -   if

Re: [PATCH 2/4] btrfs: remove the wait ordered logic in the log_one_extent path

2018-06-20 Thread David Sterba

On Sat, May 26, 2018 at 03:48:31PM +0300, Nikolay Borisov wrote:
> > +   ret = log_extent_csums(trans, inode, root, em);
> 
> With the following minor diff: 
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index daf32dc94dc3..34d5b0630824 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> 
> @@ -4064,11 +4064,9 @@ static int extent_cmp(void *priv, struct list_head *a, 
> struct list_head *b)
>  
>  static int log_extent_csums(struct btrfs_trans_handle *trans,
> struct btrfs_inode *inode,
> -   struct btrfs_root *root,
> +   struct btrfs_root *log,
> const struct extent_map *em)
>  {
> -   struct btrfs_fs_info *fs_info = root->fs_info;
> -   struct btrfs_root *log = root->log_root;
> u64 csum_offset;
> u64 csum_len;
> LIST_HEAD(ordered_sums);
> @@ -4089,7 +4087,7 @@ static int log_extent_csums(struct btrfs_trans_handle 
> *trans,
> }
>  
> /* block start is already adjusted for the file extent offset. */
> -   ret = btrfs_lookup_csums_range(fs_info->csum_root,
> +   ret = btrfs_lookup_csums_range(trans->fs_info->csum_root,
>em->block_start + csum_offset,
>em->block_start + csum_offset +
>csum_len - 1, &ordered_sums, 0);
> @@ -4125,7 +4123,7 @@ static int log_one_extent(struct btrfs_trans_handle 
> *trans,
> int ret;
> int extent_inserted = 0;
>  
> -   ret = log_extent_csums(trans, inode, root, em);
> +   ret = log_extent_csums(trans, inode, log, em);
> if (ret)
> return ret;
> 
> Bloat-o-meter reports: 
> 
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-55 (-55)
> Function                                     old     new   delta
> btrfs_log_changed_extents.isra              4298    4243     -55
> Total: Before=64999, After=64944, chg -0.08%
> 
> I suggest you incorporate it in the patch

The patches are in misc-next now, please send that as a separate patch.

Re: [PATCH 1/4] btrfs: always wait on ordered extents at fsync time

2018-06-20 Thread David Sterba

On Wed, May 23, 2018 at 11:58:33AM -0400, Josef Bacik wrote:
> From: Josef Bacik 
> 
> There's a priority inversion that exists currently with btrfs fsync.  In
> some cases we will collect outstanding ordered extents onto a list and
> only wait on them at the very last second.  However this "very last
> second" falls inside of a transaction handle, so if we are in a lower
> priority cgroup we can end up holding the transaction open for longer
> than needed, so if a high priority cgroup is also trying to fsync()
> it'll see latency.
> 
> Signed-off-by: Josef Bacik 

1-4 added to misc-next, with Filipe's reviewed-by from the first
iteration.
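
The effect of moving the wait can be shown with a toy cost model. This is
a sketch only: struct fsync_cost and both helpers are hypothetical, and
"ticks" are arbitrary units rather than anything measured.

```c
#include <assert.h>

struct fsync_cost {
	int trans_held;	/* ticks spent with the transaction handle open */
	int total;	/* ticks for the whole fsync */
};

/* Old ordering: the ordered extent wait happens inside the transaction. */
static struct fsync_cost wait_inside_trans(int io_ticks, int log_ticks)
{
	struct fsync_cost cost = { 0, 0 };

	cost.trans_held += log_ticks;	/* log the inode */
	cost.trans_held += io_ticks;	/* wait at the very last second */
	cost.total = cost.trans_held;
	return cost;
}

/* New ordering: wait for ordered extents first, then open the transaction. */
static struct fsync_cost wait_before_trans(int io_ticks, int log_ticks)
{
	struct fsync_cost cost = { 0, 0 };

	cost.total += io_ticks;		/* wait with no transaction open */
	cost.trans_held += log_ticks;	/* log the inode */
	cost.total += cost.trans_held;
	return cost;
}
```

The total fsync time for the slow task is the same in both orderings; what
shrinks is the time the transaction is held open, which is what other
tasks end up waiting on.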

Re: [PATCH] btrfs: fix invalid-free in btrfs_extent_same

2018-06-20 Thread David Sterba

On Wed, Jun 20, 2018 at 03:11:46PM +0800, Lu Fengqi wrote:
> On Tue, Jun 19, 2018 at 03:27:54PM +0200, David Sterba wrote:
> >On Tue, Jun 19, 2018 at 02:54:38PM +0800, Lu Fengqi wrote:
> >> If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
> >>   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
> >> is hit, we will go to free the uninitialized cmp.src_pages and
> >> cmp.dst_pages.
> >> 
> >> Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
> >> Signed-off-by: Lu Fengqi 
> >> ---
> >>  fs/btrfs/ioctl.c | 10 +++++-----
> >>  1 file changed, 5 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> index c2837a32d689..43ecbe620dea 100644
> >> --- a/fs/btrfs/ioctl.c
> >> +++ b/fs/btrfs/ioctl.c
> >> @@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 
> >> loff, u64 olen,
> >>ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN,
> >>  dst, dst_loff, &cmp);
> >>if (ret)
> >> -  goto out_unlock;
> >> +  goto out_free;
> >>  
> >>loff += BTRFS_MAX_DEDUPE_LEN;
> >>dst_loff += BTRFS_MAX_DEDUPE_LEN;
> >> @@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, 
> >> u64 loff, u64 olen,
> >>ret = btrfs_extent_same_range(src, loff, tail_len, dst,
> >>  dst_loff, &cmp);
> >
> >The labels now switch order and there's one more 'goto out_free' that
> >actually also wants to unlock the pages, after an error from
> >btrfs_extent_same_range in the for loop. So this needs to be updated too.
> 
> Sorry, I'm not quite sure what needs to be updated. I would appreciate it
> if you could take the time to clarify. There are three goto statements
> here. The first one, between the lock and the malloc, jumps directly to
> the unlock label. The remaining goto statements (including the one after
> btrfs_extent_same_range in the for loop) come after the malloc and jump
> to the free label. No matter which label is jumped to, the pages will be
> freed and the inodes will be unlocked.

Sorry, I must have looked at the unpatched sources. The patch is fine as
sent and I'll add it to the 4.18 queue.
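
The label ordering from the discussion can be sketched in userspace as
follows. Everything here is hypothetical (the function, its fail_* knobs
and the counters); only the fall-through from the free label into the
unlock label mirrors what the thread describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int extent_same_model(int fail_before_alloc, int fail_after_alloc,
			     int *freed, int *unlocked)
{
	void *pages = NULL;
	int ret = 0;

	/* The inodes are considered locked from this point on. */
	if (fail_before_alloc) {
		ret = -EINVAL;
		goto out_unlock;	/* nothing allocated yet */
	}

	pages = malloc(16);
	if (!pages) {
		ret = -ENOMEM;
		goto out_unlock;
	}

	if (fail_after_alloc) {
		ret = -EIO;
		goto out_free;		/* must free, then unlock */
	}

out_free:
	free(pages);
	(*freed)++;
out_unlock:
	(*unlocked)++;
	return ret;
}
```

An error before the allocation skips the free, while every later exit
frees and then falls through to the unlock.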

[PATCH 04/34] btrfs: Remove fs_info from remove_extent_data_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction from where the
fs_info can be referenced. No functional change.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1171ad81e5c5..8891eea6fa2c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1383,7 +1383,6 @@ static noinline int insert_extent_data_ref(struct 
btrfs_trans_handle *trans,
 }
 
 static noinline int remove_extent_data_ref(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   struct btrfs_path *path,
   int refs_to_drop, int *last_ref)
 {
@@ -1420,7 +1419,7 @@ static noinline int remove_extent_data_ref(struct 
btrfs_trans_handle *trans,
num_refs -= refs_to_drop;
 
if (num_refs == 0) {
-   ret = btrfs_del_item(trans, fs_info->extent_root, path);
+   ret = btrfs_del_item(trans, trans->fs_info->extent_root, path);
*last_ref = 1;
} else {
if (key.type == BTRFS_EXTENT_DATA_REF_KEY)
@@ -2020,7 +2019,7 @@ static int remove_extent_backref(struct 
btrfs_trans_handle *trans,
update_inline_extent_backref(fs_info, path, iref,
 -refs_to_drop, NULL, last_ref);
} else if (is_data) {
-   ret = remove_extent_data_ref(trans, fs_info, path, refs_to_drop,
+   ret = remove_extent_data_ref(trans, path, refs_to_drop,
 last_ref);
} else {
*last_ref = 1;
-- 
2.7.4
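
The refactoring pattern used across this series can be shown with a tiny
userspace model. This is a sketch under assumed names: fs_info_model,
trans_handle_model and both helpers are hypothetical and only mimic the
fact that the transaction handle carries a pointer to fs_info.

```c
#include <assert.h>

struct fs_info_model {
	int extent_root_id;	/* stands in for fs_info->extent_root */
};

struct trans_handle_model {
	struct fs_info_model *fs_info;
};

/* Before: the caller passes both the handle and a redundant fs_info. */
static int del_item_old(struct trans_handle_model *trans,
			struct fs_info_model *fs_info)
{
	(void)trans;
	return fs_info->extent_root_id;
}

/* After: fs_info is reached through the handle, so one argument goes away. */
static int del_item_new(struct trans_handle_model *trans)
{
	return trans->fs_info->extent_root_id;
}
```

Both variants resolve the same object as long as the handle is valid,
which is why such a change carries no functional difference.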


[PATCH 02/34] btrfs: Remove fs_info from insert_extent_data_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. So remove the redundant argument.
No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f0e40884a908..5fc44c2e3e18 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1304,13 +1304,12 @@ static noinline int lookup_extent_data_ref(struct 
btrfs_trans_handle *trans,
 }
 
 static noinline int insert_extent_data_ref(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   struct btrfs_path *path,
   u64 bytenr, u64 parent,
   u64 root_objectid, u64 owner,
   u64 offset, int refs_to_add)
 {
-   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_root *root = trans->fs_info->extent_root;
struct btrfs_key key;
struct extent_buffer *leaf;
u32 size;
@@ -2002,9 +2001,9 @@ static int insert_extent_backref(struct 
btrfs_trans_handle *trans,
ret = insert_tree_block_ref(trans, path, bytenr, parent,
root_objectid);
} else {
-   ret = insert_extent_data_ref(trans, fs_info, path, bytenr,
-parent, root_objectid,
-owner, offset, refs_to_add);
+   ret = insert_extent_data_ref(trans, path, bytenr, parent,
+root_objectid, owner, offset,
+refs_to_add);
}
return ret;
 }
-- 
2.7.4


[PATCH 01/34] btrfs: Remove fs_info from insert_tree_block_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction so there is no
need to duplicate the fs_info, we can reference it directly from the
trans handle. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58c0080..f0e40884a908 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1523,7 +1523,6 @@ static noinline int lookup_tree_block_ref(struct 
btrfs_trans_handle *trans,
 }
 
 static noinline int insert_tree_block_ref(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_path *path,
  u64 bytenr, u64 parent,
  u64 root_objectid)
@@ -1540,7 +1539,7 @@ static noinline int insert_tree_block_ref(struct 
btrfs_trans_handle *trans,
key.offset = root_objectid;
}
 
-   ret = btrfs_insert_empty_item(trans, fs_info->extent_root,
+   ret = btrfs_insert_empty_item(trans, trans->fs_info->extent_root,
  path, &key, 0);
btrfs_release_path(path);
return ret;
@@ -2000,8 +1999,8 @@ static int insert_extent_backref(struct 
btrfs_trans_handle *trans,
int ret;
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
BUG_ON(refs_to_add != 1);
-   ret = insert_tree_block_ref(trans, fs_info, path, bytenr,
-   parent, root_objectid);
+   ret = insert_tree_block_ref(trans, path, bytenr, parent,
+   root_objectid);
} else {
ret = insert_extent_data_ref(trans, fs_info, path, bytenr,
 parent, root_objectid,
-- 
2.7.4


[PATCH 07/34] btrfs: Remove fs_info argument from update_inline_extent_backref

2018-06-20 Thread Nikolay Borisov

This function always uses the leaf's extent_buffer which already
contains a reference to the fs_info. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 211a9d0a94dd..1bbe6d403763 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1892,14 +1892,14 @@ static int lookup_extent_backref(struct 
btrfs_trans_handle *trans,
  * helper to update/remove inline back ref
  */
 static noinline_for_stack
-void update_inline_extent_backref(struct btrfs_fs_info *fs_info,
- struct btrfs_path *path,
+void update_inline_extent_backref(struct btrfs_path *path,
  struct btrfs_extent_inline_ref *iref,
  int refs_to_mod,
  struct btrfs_delayed_extent_op *extent_op,
  int *last_ref)
 {
-   struct extent_buffer *leaf;
+   struct extent_buffer *leaf = path->nodes[0];
+   struct btrfs_fs_info *fs_info = leaf->fs_info;
struct btrfs_extent_item *ei;
struct btrfs_extent_data_ref *dref = NULL;
struct btrfs_shared_data_ref *sref = NULL;
@@ -1910,7 +1910,6 @@ void update_inline_extent_backref(struct btrfs_fs_info 
*fs_info,
int type;
u64 refs;
 
-   leaf = path->nodes[0];
ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item);
refs = btrfs_extent_refs(leaf, ei);
WARN_ON(refs_to_mod < 0 && refs + refs_to_mod <= 0);
@@ -1977,8 +1976,8 @@ int insert_inline_extent_backref(struct 
btrfs_trans_handle *trans,
   owner, offset, 1);
if (ret == 0) {
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
-   update_inline_extent_backref(fs_info, path, iref,
-refs_to_add, extent_op, NULL);
+   update_inline_extent_backref(path, iref, refs_to_add,
+extent_op, NULL);
} else if (ret == -ENOENT) {
setup_inline_extent_backref(fs_info, path, iref, parent,
root_objectid, owner, offset,
@@ -2016,8 +2015,8 @@ static int remove_extent_backref(struct 
btrfs_trans_handle *trans,
 
BUG_ON(!is_data && refs_to_drop != 1);
if (iref) {
-   update_inline_extent_backref(fs_info, path, iref,
--refs_to_drop, NULL, last_ref);
+   update_inline_extent_backref(path, iref, -refs_to_drop, NULL,
+last_ref);
} else if (is_data) {
ret = remove_extent_data_ref(trans, path, refs_to_drop,
 last_ref);
-- 
2.7.4
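
Here fs_info comes from the leaf rather than from a transaction handle.
A minimal model of that pointer chain, with hypothetical type and function
names and an arbitrary node array size:

```c
#include <assert.h>

struct fs_info_model {
	int id;
};

struct extent_buffer_model {
	struct fs_info_model *fs_info;	/* each buffer knows its filesystem */
};

struct path_model {
	struct extent_buffer_model *nodes[8];	/* nodes[0] is the leaf */
};

/* The leaf is taken from the path and fs_info from the leaf. */
static int update_backref_model(struct path_model *path)
{
	struct extent_buffer_model *leaf = path->nodes[0];
	struct fs_info_model *fs_info = leaf->fs_info;

	return fs_info->id;
}
```

Initializing both locals at declaration also removes the later
"leaf = path->nodes[0]" assignment, as the patch does.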


[PATCH 05/34] btrfs: Remove fs_info from fixup_low_keys

2018-06-20 Thread Nikolay Borisov

This argument is unused. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4bc326df472e..18fd80e2f278 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3128,8 +3128,7 @@ int btrfs_search_slot_for_read(struct btrfs_root *root,
  * higher levels
  *
  */
-static void fixup_low_keys(struct btrfs_fs_info *fs_info,
-  struct btrfs_path *path,
+static void fixup_low_keys(struct btrfs_path *path,
   struct btrfs_disk_key *key, int level)
 {
int i;
@@ -3181,7 +3180,7 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info 
*fs_info,
btrfs_set_item_key(eb, &disk_key, slot);
btrfs_mark_buffer_dirty(eb);
if (slot == 0)
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
 }
 
 /*
@@ -3945,7 +3944,7 @@ static noinline int __push_leaf_left(struct btrfs_fs_info 
*fs_info,
clean_tree_block(fs_info, right);
 
btrfs_item_key(right, &disk_key, 0);
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
 
/* then fixup the leaf pointer in the path */
if (path->slots[0] < push_items) {
@@ -4320,7 +4319,7 @@ static noinline int split_leaf(struct btrfs_trans_handle 
*trans,
path->nodes[0] = right;
path->slots[0] = 0;
if (path->slots[1] == 0)
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
}
/*
 * We create a new leaf 'right' for the required ins_len and
@@ -4642,7 +4641,7 @@ void btrfs_truncate_item(struct btrfs_fs_info *fs_info,
btrfs_set_disk_key_offset(&disk_key, offset + size_diff);
btrfs_set_item_key(leaf, &disk_key, slot);
if (slot == 0)
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
}
 
item = btrfs_item_nr(slot);
@@ -4744,7 +4743,7 @@ void setup_items_for_insert(struct btrfs_root *root, 
struct btrfs_path *path,
 
if (path->slots[0] == 0) {
btrfs_cpu_key_to_disk(&disk_key, cpu_key);
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
}
btrfs_unlock_up_safe(path, 1);
 
@@ -4886,7 +4885,6 @@ int btrfs_insert_item(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
 static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
int level, int slot)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
struct extent_buffer *parent = path->nodes[level];
u32 nritems;
int ret;
@@ -4919,7 +4917,7 @@ static void del_ptr(struct btrfs_root *root, struct 
btrfs_path *path,
struct btrfs_disk_key disk_key;
 
btrfs_node_key(parent, &disk_key, 0);
-   fixup_low_keys(fs_info, path, &disk_key, level + 1);
+   fixup_low_keys(path, &disk_key, level + 1);
}
btrfs_mark_buffer_dirty(parent);
 }
@@ -5022,7 +5020,7 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
struct btrfs_disk_key disk_key;
 
btrfs_item_key(leaf, &disk_key, 0);
-   fixup_low_keys(fs_info, path, &disk_key, 1);
+   fixup_low_keys(path, &disk_key, 1);
}
 
/* delete the leaf if it is mostly empty */
-- 
2.7.4


[PATCH 06/34] btrfs: Remove fs_info from lookup_inline_extent_backref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where the fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8891eea6fa2c..211a9d0a94dd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1596,13 +1596,13 @@ static int find_next_key(struct btrfs_path *path, int 
level,
  */
 static noinline_for_stack
 int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_path *path,
 struct btrfs_extent_inline_ref **ref_ret,
 u64 bytenr, u64 num_bytes,
 u64 parent, u64 root_objectid,
 u64 owner, u64 offset, int insert)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_key key;
struct extent_buffer *leaf;
@@ -1868,9 +1868,9 @@ static int lookup_extent_backref(struct 
btrfs_trans_handle *trans,
 {
int ret;
 
-   ret = lookup_inline_extent_backref(trans, fs_info, path, ref_ret,
-  bytenr, num_bytes, parent,
-  root_objectid, owner, offset, 0);
+   ret = lookup_inline_extent_backref(trans, path, ref_ret, bytenr,
+  num_bytes, parent, root_objectid,
+  owner, offset, 0);
if (ret != -ENOENT)
return ret;
 
@@ -1972,9 +1972,9 @@ int insert_inline_extent_backref(struct 
btrfs_trans_handle *trans,
struct btrfs_extent_inline_ref *iref;
int ret;
 
-   ret = lookup_inline_extent_backref(trans, fs_info, path, &iref,
-  bytenr, num_bytes, parent,
-  root_objectid, owner, offset, 1);
+   ret = lookup_inline_extent_backref(trans, path, &iref, bytenr,
+  num_bytes, parent, root_objectid,
+  owner, offset, 1);
if (ret == 0) {
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
update_inline_extent_backref(fs_info, path, iref,
-- 
2.7.4


[PATCH 10/34] btrfs: Remove fs_info from lookup_extent_backref

2018-06-20 Thread Nikolay Borisov

This argument is unused. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index aeef5437ec8a..0c4c093201b5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1858,7 +1858,6 @@ void setup_inline_extent_backref(struct btrfs_fs_info 
*fs_info,
 }
 
 static int lookup_extent_backref(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_path *path,
 struct btrfs_extent_inline_ref **ref_ret,
 u64 bytenr, u64 num_bytes, u64 parent,
@@ -6844,9 +6843,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle 
*trans,
if (is_data)
skinny_metadata = false;
 
-   ret = lookup_extent_backref(trans, info, path, &iref,
-   bytenr, num_bytes, parent,
-   root_objectid, owner_objectid,
+   ret = lookup_extent_backref(trans, path, &iref, bytenr, num_bytes,
+   parent, root_objectid, owner_objectid,
owner_offset);
if (ret == 0) {
extent_slot = path->slots[0];
-- 
2.7.4


[PATCH 19/34] btrfs: Remove fs_info from run_delayed_extent_op

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle so
fs_info can be referenced from there. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index edc1c98c3556..f13b49497f67 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2318,10 +2318,10 @@ static void __run_delayed_extent_op(struct 
btrfs_delayed_extent_op *extent_op,
 }
 
 static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_head *head,
 struct btrfs_delayed_extent_op *extent_op)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_key key;
struct btrfs_path *path;
struct btrfs_extent_item *ei;
@@ -2526,7 +2526,7 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
return 0;
}
spin_unlock(&head->lock);
-   ret = run_delayed_extent_op(trans, fs_info, head, extent_op);
+   ret = run_delayed_extent_op(trans, head, extent_op);
btrfs_free_delayed_extent_op(extent_op);
return ret ? ret : 1;
 }
-- 
2.7.4


[PATCH 09/34] btrfs: Remove fs_info argument from lookup_extent_data_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1bac81f4f77f..aeef5437ec8a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1216,13 +1216,12 @@ static int match_extent_data_ref(struct extent_buffer *leaf,
 }
 
 static noinline int lookup_extent_data_ref(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   struct btrfs_path *path,
   u64 bytenr, u64 parent,
   u64 root_objectid,
   u64 owner, u64 offset)
 {
-   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_root *root = trans->fs_info->extent_root;
struct btrfs_key key;
struct btrfs_extent_data_ref *ref;
struct extent_buffer *leaf;
@@ -1880,9 +1879,8 @@ static int lookup_extent_backref(struct btrfs_trans_handle *trans,
ret = lookup_tree_block_ref(trans, path, bytenr, parent,
root_objectid);
} else {
-   ret = lookup_extent_data_ref(trans, fs_info, path, bytenr,
-parent, root_objectid, owner,
-offset);
+   ret = lookup_extent_data_ref(trans, path, bytenr, parent,
+root_objectid, owner, offset);
}
return ret;
 }
-- 
2.7.4


[PATCH 16/34] btrfs: Remove fs_info from alloc_reserved_file_extent

2018-06-20 Thread Nikolay Borisov

fs_info can be referenced from the transaction handle, which is always
valid. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 48d89ee676e0..449ebd1bb8c9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -60,7 +60,6 @@ static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op,
struct extent_buffer *leaf,
struct btrfs_extent_item *ei);
 static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  u64 parent, u64 root_objectid,
  u64 flags, u64 owner, u64 offset,
  struct btrfs_key *ins, int ref_mod);
@@ -2282,10 +2281,10 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
if (extent_op)
flags |= extent_op->flags_to_set;
-   ret = alloc_reserved_file_extent(trans, fs_info,
-parent, ref_root, flags,
-ref->objectid, ref->offset,
-&ins, node->ref_mod);
+   ret = alloc_reserved_file_extent(trans, parent, ref_root,
+flags, ref->objectid,
+ref->offset, &ins,
+node->ref_mod);
} else if (node->action == BTRFS_ADD_DELAYED_REF) {
ret = __btrfs_inc_extent_ref(trans, fs_info, node, parent,
 ref_root, ref->objectid,
@@ -8041,11 +8040,11 @@ int btrfs_free_and_pin_reserved_extent(struct btrfs_fs_info *fs_info,
 }
 
 static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  u64 parent, u64 root_objectid,
  u64 flags, u64 owner, u64 offset,
  struct btrfs_key *ins, int ref_mod)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
int ret;
struct btrfs_extent_item *extent_item;
struct btrfs_extent_inline_ref *iref;
@@ -8272,8 +8271,8 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
spin_unlock(&block_group->lock);
spin_unlock(&space_info->lock);
 
-   ret = alloc_reserved_file_extent(trans, fs_info, 0, root_objectid,
-0, owner, offset, ins, 1);
+   ret = alloc_reserved_file_extent(trans, 0, root_objectid, 0, owner,
+offset, ins, 1);
btrfs_put_block_group(block_group);
return ret;
 }
-- 
2.7.4


[PATCH 13/34] btrfs: Remove fs_info from btrfs_make_block_group

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where we can reference the fs_info. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 4 ++--
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/volumes.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 118346aceea9..907c14786680 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2716,8 +2716,8 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info);
 int btrfs_read_block_groups(struct btrfs_fs_info *info);
 int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
 int btrfs_make_block_group(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info, u64 bytes_used,
-  u64 type, u64 chunk_offset, u64 size);
+  u64 bytes_used, u64 type, u64 chunk_offset,
+  u64 size);
 void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info);
 struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cf54cd48c75d..95da427a66ea 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10207,10 +10207,10 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
trans->can_flush_pending_bgs = can_flush_pending_bgs;
 }
 
-int btrfs_make_block_group(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info, u64 bytes_used,
+int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
   u64 type, u64 chunk_offset, u64 size)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_block_group_cache *cache;
int ret;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e034ad9e23b4..4f376463fdd4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4898,7 +4898,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
refcount_inc(&em->refs);
write_unlock(&em_tree->lock);
 
-   ret = btrfs_make_block_group(trans, info, 0, type, start, num_bytes);
+   ret = btrfs_make_block_group(trans, 0, type, start, num_bytes);
if (ret)
goto error_del_extent;
 
@@ -5173,7 +5173,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
/*
 * There could be two corrupted data stripes, we need
 * to loop retry in order to rebuild the correct data.
-* 
+*
 * Fail a stripe at a time on every retry except the
 * stripe under reconstruction.
 */
-- 
2.7.4


[PATCH 24/34] btrfs: Remove fs_info from btrfs_alloc_chunk

2018-06-20 Thread Nikolay Borisov

fs_info can be referenced from trans since the function is always called
within a transaction.

Signed-off-by: Nikolay Borisov 
---
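A small user-space sketch of the shape btrfs_alloc_chunk takes after this
change: no fs_info parameter and no local copy, only two dereferences of
trans->fs_info. Everything named sketch_* is hypothetical; in particular the
boolean flag only imitates a held-lock assertion and is not how lockdep
works.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the "mutex" is just a flag so the held-lock
 * assertion can be imitated without kernel lockdep. */
struct sketch_fs_info {
	bool chunk_mutex_held;
	unsigned long long next_chunk_offset;
};

struct sketch_trans_handle {
	struct sketch_fs_info *fs_info;
};

/* No fs_info parameter and no local variable: with only two uses,
 * dereferencing trans->fs_info in place keeps the function short. */
unsigned long long sketch_alloc_chunk(struct sketch_trans_handle *trans)
{
	/* Stands in for lockdep_assert_held(&trans->fs_info->chunk_mutex). */
	assert(trans->fs_info->chunk_mutex_held);
	return trans->fs_info->next_chunk_offset;
}

/* Returns 0 when the offset is read through the handle as expected. */
int sketch_alloc_chunk_selfcheck(void)
{
	struct sketch_fs_info info = {
		.chunk_mutex_held = true,
		.next_chunk_offset = 1ULL << 30,
	};
	struct sketch_trans_handle trans = { .fs_info = &info };

	return sketch_alloc_chunk(&trans) == (1ULL << 30) ? 0 : 1;
}
```

Whether to introduce a local fs_info or dereference the handle in place is a
readability choice; other patches in this series use a local where the
function body refers to fs_info many times.
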
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/volumes.c | 7 +++
 fs/btrfs/volumes.h | 3 +--
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4c45d68179d4..ed38ccf1607c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4534,7 +4534,7 @@ void check_system_chunk(struct btrfs_trans_handle *trans,
 * the paths we visit in the chunk tree (they were already COWed
 * or created in the current transaction for example).
 */
-   ret = btrfs_alloc_chunk(trans, fs_info, flags);
+   ret = btrfs_alloc_chunk(trans, flags);
}
 
if (!ret) {
@@ -4636,7 +4636,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 */
check_system_chunk(trans, fs_info, flags);
 
-   ret = btrfs_alloc_chunk(trans, fs_info, flags);
+   ret = btrfs_alloc_chunk(trans, flags);
trans->allocating_chunk = false;
 
spin_lock(&space_info->lock);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fefff9405884..4a444aa82ac3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5036,13 +5036,12 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
  * require modifying the chunk tree. This division is important for the
  * bootstrap process of adding storage to a seed btrfs.
  */
-int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info, u64 type)
+int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type)
 {
u64 chunk_offset;
 
-   lockdep_assert_held(&fs_info->chunk_mutex);
-   chunk_offset = find_next_chunk(fs_info);
+   lockdep_assert_held(&trans->fs_info->chunk_mutex);
+   chunk_offset = find_next_chunk(trans->fs_info);
return __btrfs_alloc_chunk(trans, chunk_offset, type);
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5139ec8daf4c..df2d8bdf8c9a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -396,8 +396,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 u64 physical, u64 **logical, int *naddrs, int *stripe_len);
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
-int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info, u64 type);
+int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type);
 void btrfs_mapping_init(struct btrfs_mapping_tree *tree);
 void btrfs_mapping_tree_free(struct btrfs_mapping_tree *tree);
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-- 
2.7.4


[PATCH 08/34] btrfs: Remove fs_info argument from lookup_tree_block_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where the fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1bbe6d403763..1bac81f4f77f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1487,12 +1487,11 @@ static noinline u32 extent_data_ref_count(struct btrfs_path *path,
 }
 
 static noinline int lookup_tree_block_ref(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_path *path,
  u64 bytenr, u64 parent,
  u64 root_objectid)
 {
-   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_root *root = trans->fs_info->extent_root;
struct btrfs_key key;
int ret;
 
@@ -1878,8 +1877,8 @@ static int lookup_extent_backref(struct btrfs_trans_handle *trans,
*ref_ret = NULL;
 
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
-   ret = lookup_tree_block_ref(trans, fs_info, path, bytenr,
-   parent, root_objectid);
+   ret = lookup_tree_block_ref(trans, path, bytenr, parent,
+   root_objectid);
} else {
ret = lookup_extent_data_ref(trans, fs_info, path, bytenr,
 parent, root_objectid, owner,
-- 
2.7.4


[PATCH 15/34] btrfs: Remove fs_info from __btrfs_free_extent

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle so we
can reference the fs_info from there. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 93dc421723ed..48d89ee676e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -52,11 +52,10 @@ enum {
 };
 
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
-   struct btrfs_delayed_ref_node *node, u64 parent,
-   u64 root_objectid, u64 owner_objectid,
-   u64 owner_offset, int refs_to_drop,
-   struct btrfs_delayed_extent_op *extra_op);
+  struct btrfs_delayed_ref_node *node, u64 parent,
+  u64 root_objectid, u64 owner_objectid,
+  u64 owner_offset, int refs_to_drop,
+  struct btrfs_delayed_extent_op *extra_op);
 static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op,
struct extent_buffer *leaf,
struct btrfs_extent_item *ei);
@@ -2293,7 +2292,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 ref->offset, node->ref_mod,
 extent_op);
} else if (node->action == BTRFS_DROP_DELAYED_REF) {
-   ret = __btrfs_free_extent(trans, fs_info, node, parent,
+   ret = __btrfs_free_extent(trans, node, parent,
  ref_root, ref->objectid,
  ref->offset, node->ref_mod,
  extent_op);
@@ -2446,8 +2445,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
 ref->level, 0, 1,
 extent_op);
} else if (node->action == BTRFS_DROP_DELAYED_REF) {
-   ret = __btrfs_free_extent(trans, fs_info, node,
- parent, ref_root,
+   ret = __btrfs_free_extent(trans, node, parent, ref_root,
  ref->level, 0, 1, extent_op);
} else {
BUG();
@@ -6806,12 +6804,12 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 }
 
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *info,
-   struct btrfs_delayed_ref_node *node, u64 parent,
-   u64 root_objectid, u64 owner_objectid,
-   u64 owner_offset, int refs_to_drop,
-   struct btrfs_delayed_extent_op *extent_op)
+  struct btrfs_delayed_ref_node *node, u64 parent,
+  u64 root_objectid, u64 owner_objectid,
+  u64 owner_offset, int refs_to_drop,
+  struct btrfs_delayed_extent_op *extent_op)
 {
+   struct btrfs_fs_info *info = trans->fs_info;
struct btrfs_key key;
struct btrfs_path *path;
struct btrfs_root *extent_root = info->extent_root;
-- 
2.7.4


[PATCH 18/34] btrfs: Remove fs_info from run_delayed_data_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from where
fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b72a6d7d84c3..edc1c98c3556 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2255,7 +2255,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 }
 
 static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info,
struct btrfs_delayed_ref_node *node,
struct btrfs_delayed_extent_op *extent_op,
int insert_reserved)
@@ -2272,7 +2271,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
ins.type = BTRFS_EXTENT_ITEM_KEY;
 
ref = btrfs_delayed_node_to_data_ref(node);
-   trace_run_delayed_data_ref(fs_info, node, ref, node->action);
+   trace_run_delayed_data_ref(trans->fs_info, node, ref, node->action);
 
if (node->type == BTRFS_SHARED_DATA_REF_KEY)
parent = ref->parent;
@@ -2471,7 +2470,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
   insert_reserved);
else if (node->type == BTRFS_EXTENT_DATA_REF_KEY ||
 node->type == BTRFS_SHARED_DATA_REF_KEY)
-   ret = run_delayed_data_ref(trans, fs_info, node, extent_op,
+   ret = run_delayed_data_ref(trans, node, extent_op,
   insert_reserved);
else
BUG();
-- 
2.7.4


[PATCH 11/34] btrfs: Remove fs_info from btrfs_add_delayed_tree_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-ref.c | 4 ++--
 fs/btrfs/delayed-ref.h | 3 +--
 fs/btrfs/extent-tree.c | 8 
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 03dec673d12a..82ac1273c65f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -709,13 +709,13 @@ static void init_delayed_ref_common(struct btrfs_fs_info *fs_info,
  * to make sure the delayed ref is eventually processed before this
  * transaction commits.
  */
-int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
-  struct btrfs_trans_handle *trans,
+int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes, u64 parent,
   u64 ref_root,  int level, int action,
   struct btrfs_delayed_extent_op *extent_op,
   int *old_ref_mod, int *new_ref_mod)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_delayed_tree_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ea1aecb6a50d..31729302c827 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -234,8 +234,7 @@ static inline void btrfs_put_delayed_ref_head(struct btrfs_delayed_ref_head *hea
kmem_cache_free(btrfs_delayed_ref_head_cachep, head);
 }
 
-int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
-  struct btrfs_trans_handle *trans,
+int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes, u64 parent,
   u64 ref_root, int level, int action,
   struct btrfs_delayed_extent_op *extent_op,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0c4c093201b5..c9b6bfe001d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2176,7 +2176,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
   owner, offset, BTRFS_ADD_DELAYED_REF);
 
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
-   ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+   ret = btrfs_add_delayed_tree_ref(trans, bytenr,
 num_bytes, parent,
 root_objectid, (int)owner,
 BTRFS_ADD_DELAYED_REF, NULL,
@@ -7162,7 +7162,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
   root->root_key.objectid,
   btrfs_header_level(buf), 0,
   BTRFS_DROP_DELAYED_REF);
-   ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
+   ret = btrfs_add_delayed_tree_ref(trans, buf->start,
 buf->len, parent,
 root->root_key.objectid,
 btrfs_header_level(buf),
@@ -7241,7 +7241,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
old_ref_mod = new_ref_mod = 0;
ret = 0;
} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
-   ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+   ret = btrfs_add_delayed_tree_ref(trans, bytenr,
 num_bytes, parent,
 root_objectid, (int)owner,
 BTRFS_DROP_DELAYED_REF, NULL,
@@ -8457,7 +8457,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
btrfs_ref_tree_mod(root, ins.objectid, ins.offset, parent,
   root_objectid, level, 0,
   BTRFS_ADD_DELAYED_EXTENT);
-   ret = btrfs_add_delayed_tree_ref(fs_info, trans, ins.objectid,
+   ret = btrfs_add_delayed_tree_ref(trans, ins.objectid,
 ins.offset, parent,
 root_objectid, level,
 BTRFS_ADD_DELAYED_EXTENT,
-- 
2.7.4


[PATCH 20/34] btrfs: Remove unused fs_info from cleanup_extent_op

2018-06-20 Thread Nikolay Borisov

The argument is no longer used so remove it.

Signed-off-by: Nikolay Borisov 
---
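The argument became dead because the only callee it was forwarded to,
run_delayed_extent_op, no longer takes it. A rough user-space sketch of that
cascade, using hypothetical casc_* names rather than the kernel functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct casc_fs_info {
	int unused;
};

struct casc_trans_handle {
	struct casc_fs_info *fs_info;
};

/* Inner helper after its own fs_info parameter was removed. */
int casc_inner(struct casc_trans_handle *trans)
{
	return trans->fs_info != NULL ? 0 : -1;
}

/* Outer helper before: fs_info existed only to be forwarded, so once the
 * inner helper stopped taking it the parameter has no remaining use. */
int casc_outer_old(struct casc_trans_handle *trans,
		   struct casc_fs_info *fs_info)
{
	(void)fs_info;
	return casc_inner(trans);
}

/* Outer helper after: the dead parameter is dropped. */
int casc_outer_new(struct casc_trans_handle *trans)
{
	return casc_inner(trans);
}

/* Returns 0 when both outer variants behave identically. */
int casc_selfcheck(void)
{
	struct casc_fs_info info = { 0 };
	struct casc_trans_handle trans = { .fs_info = &info };

	return casc_outer_old(&trans, &info) == casc_outer_new(&trans) ? 0 : 1;
}
```
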
 fs/btrfs/extent-tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f13b49497f67..da0c222c013d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2512,7 +2512,6 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
 }
 
 static int cleanup_extent_op(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_head *head)
 {
struct btrfs_delayed_extent_op *extent_op = head->extent_op;
@@ -2540,7 +2539,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans,
 
delayed_refs = &trans->transaction->delayed_refs;
 
-   ret = cleanup_extent_op(trans, fs_info, head);
+   ret = cleanup_extent_op(trans, head);
if (ret < 0) {
unselect_delayed_ref_head(delayed_refs, head);
btrfs_debug(fs_info, "run_delayed_extent_op returned %d", ret);
-- 
2.7.4


[PATCH 14/34] btrfs: Remove fs_info from btrfs_remove_block_group

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where we can reference fs_info. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/volumes.c | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 907c14786680..f6b37911f41d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2723,8 +2723,7 @@ struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
struct btrfs_fs_info *fs_info,
const u64 chunk_offset);
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info, u64 group_start,
-struct extent_map *em);
+u64 group_start, struct extent_map *em);
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_get_block_group_trimming(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *cache);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 95da427a66ea..93dc421723ed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10300,9 +10300,9 @@ static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 }
 
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info, u64 group_start,
-struct extent_map *em)
+u64 group_start, struct extent_map *em)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_path *path;
struct btrfs_block_group_cache *block_group;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f376463fdd4..fefff9405884 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2883,7 +2883,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
}
}
 
-   ret = btrfs_remove_block_group(trans, fs_info, chunk_offset, em);
+   ret = btrfs_remove_block_group(trans, chunk_offset, em);
if (ret) {
btrfs_abort_transaction(trans, ret);
goto out;
-- 
2.7.4


[PATCH 12/34] btrfs: Remove fs_info from btrfs_add_delayed_data_ref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-ref.c | 4 ++--
 fs/btrfs/delayed-ref.h | 3 +--
 fs/btrfs/extent-tree.c | 7 +++
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 82ac1273c65f..6eb00eb65d76 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -791,13 +791,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 /*
  * add a delayed data ref. it's similar to btrfs_add_delayed_tree_ref.
  */
-int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
-  struct btrfs_trans_handle *trans,
+int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
   u64 parent, u64 ref_root,
   u64 owner, u64 offset, u64 reserved, int action,
   int *old_ref_mod, int *new_ref_mod)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_delayed_data_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 31729302c827..d9f2a4ebd5db 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -239,8 +239,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
   u64 ref_root, int level, int action,
   struct btrfs_delayed_extent_op *extent_op,
   int *old_ref_mod, int *new_ref_mod);
-int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
-  struct btrfs_trans_handle *trans,
+int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
   u64 parent, u64 ref_root,
   u64 owner, u64 offset, u64 reserved, int action,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c9b6bfe001d1..cf54cd48c75d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2182,7 +2182,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 BTRFS_ADD_DELAYED_REF, NULL,
 &old_ref_mod, &new_ref_mod);
} else {
-   ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+   ret = btrfs_add_delayed_data_ref(trans, bytenr,
 num_bytes, parent,
 root_objectid, owner, offset,
 0, BTRFS_ADD_DELAYED_REF,
@@ -7247,7 +7247,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 BTRFS_DROP_DELAYED_REF, NULL,
 &old_ref_mod, &new_ref_mod);
} else {
-   ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+   ret = btrfs_add_delayed_data_ref(trans, bytenr,
 num_bytes, parent,
 root_objectid, owner, offset,
 0, BTRFS_DROP_DELAYED_REF,
@@ -8221,7 +8221,6 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 u64 offset, u64 ram_bytes,
 struct btrfs_key *ins)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
int ret;
 
BUG_ON(root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID);
@@ -8230,7 +8229,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
   root->root_key.objectid, owner, offset,
   BTRFS_ADD_DELAYED_EXTENT);
 
-   ret = btrfs_add_delayed_data_ref(fs_info, trans, ins->objectid,
+   ret = btrfs_add_delayed_data_ref(trans, ins->objectid,
 ins->offset, 0,
 root->root_key.objectid, owner,
 offset, ram_bytes,
-- 
2.7.4


[PATCH 22/34] btrfs: Remove fs_info from run_delayed_tree_ref

2018-06-20 Thread Nikolay Borisov

fs_info can always be referenced from the passed transaction handle, since
the handle is always valid. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 95361d4195bc..229cb3a229b0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2409,7 +2409,6 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
 }
 
 static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info,
struct btrfs_delayed_ref_node *node,
struct btrfs_delayed_extent_op *extent_op,
int insert_reserved)
@@ -2420,14 +2419,14 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
u64 ref_root = 0;
 
ref = btrfs_delayed_node_to_tree_ref(node);
-   trace_run_delayed_tree_ref(fs_info, node, ref, node->action);
+   trace_run_delayed_tree_ref(trans->fs_info, node, ref, node->action);
 
if (node->type == BTRFS_SHARED_BLOCK_REF_KEY)
parent = ref->parent;
ref_root = ref->root;
 
if (node->ref_mod != 1) {
-   btrfs_err(fs_info,
+   btrfs_err(trans->fs_info,
"btree block(%llu) has %d references rather than 1: action %d ref_root %llu parent %llu",
  node->bytenr, node->ref_mod, node->action, ref_root,
  parent);
@@ -2466,7 +2465,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 
if (node->type == BTRFS_TREE_BLOCK_REF_KEY ||
node->type == BTRFS_SHARED_BLOCK_REF_KEY)
-   ret = run_delayed_tree_ref(trans, fs_info, node, extent_op,
+   ret = run_delayed_tree_ref(trans, node, extent_op,
   insert_reserved);
else if (node->type == BTRFS_EXTENT_DATA_REF_KEY ||
 node->type == BTRFS_SHARED_DATA_REF_KEY)
-- 
2.7.4


[PATCH 21/34] btrfs: Remove fs_info from cleanup_ref_head

2018-06-20 Thread Nikolay Borisov

fs_info can be referenced from the transaction handle, since it's always
valid. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index da0c222c013d..95361d4195bc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2531,9 +2531,10 @@ static int cleanup_extent_op(struct btrfs_trans_handle *trans,
 }
 
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info,
struct btrfs_delayed_ref_head *head)
 {
+
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_delayed_ref_root *delayed_refs;
int ret;
 
@@ -2688,7 +2689,7 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 * up and move on to the next ref_head.
 */
if (!ref) {
-   ret = cleanup_ref_head(trans, fs_info, locked_ref);
+   ret = cleanup_ref_head(trans, locked_ref);
if (ret > 0 ) {
/* We dropped our lock, we need to loop. */
ret = 0;
-- 
2.7.4


[PATCH 23/34] btrfs: Remove fs_info from do_chunk_alloc

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 229cb3a229b0..4c45d68179d4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -66,8 +66,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 struct btrfs_delayed_ref_node *node,
 struct btrfs_delayed_extent_op *extent_op);
-static int do_chunk_alloc(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info, u64 flags,
+static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
  int force);
 static int find_next_key(struct btrfs_path *path, int level,
 struct btrfs_key *key);
@@ -4272,7 +4271,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
if (IS_ERR(trans))
return PTR_ERR(trans);
 
-   ret = do_chunk_alloc(trans, fs_info, alloc_target,
+   ret = do_chunk_alloc(trans, alloc_target,
 CHUNK_ALLOC_NO_FORCE);
btrfs_end_transaction(trans);
if (ret < 0) {
@@ -4556,9 +4555,10 @@ void check_system_chunk(struct btrfs_trans_handle *trans,
  *- return 1 if it successfully allocates a chunk,
  *- return errors including -ENOSPC otherwise.
  */
-static int do_chunk_alloc(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info, u64 flags, int force)
+static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
+ int force)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_space_info *space_info;
int wait_for_alloc = 0;
int ret = 0;
@@ -4967,7 +4967,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
ret = PTR_ERR(trans);
break;
}
-   ret = do_chunk_alloc(trans, fs_info,
+   ret = do_chunk_alloc(trans,
 btrfs_metadata_alloc_profile(fs_info),
 CHUNK_ALLOC_NO_FORCE);
btrfs_end_transaction(trans);
@@ -7808,8 +7808,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
goto out;
}
 
-   ret = do_chunk_alloc(trans, fs_info, flags,
-CHUNK_ALLOC_FORCE);
+   ret = do_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE);
 
/*
 * If we can't allocate a new chunk we've already looped
@@ -9435,7 +9434,7 @@ int btrfs_inc_block_group_ro(struct btrfs_fs_info *fs_info,
 */
alloc_flags = update_block_group_flags(fs_info, cache->flags);
if (alloc_flags != cache->flags) {
-   ret = do_chunk_alloc(trans, fs_info, alloc_flags,
+   ret = do_chunk_alloc(trans, alloc_flags,
 CHUNK_ALLOC_FORCE);
/*
 * ENOSPC is allowed here, we may have enough space
@@ -9452,8 +9451,7 @@ int btrfs_inc_block_group_ro(struct btrfs_fs_info *fs_info,
if (!ret)
goto out;
alloc_flags = get_alloc_profile(fs_info, cache->space_info->flags);
-   ret = do_chunk_alloc(trans, fs_info, alloc_flags,
-CHUNK_ALLOC_FORCE);
+   ret = do_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
ret = inc_block_group_ro(cache, 0);
@@ -9475,7 +9473,7 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
 {
u64 alloc_flags = get_alloc_profile(fs_info, type);
 
-   return do_chunk_alloc(trans, fs_info, alloc_flags, CHUNK_ALLOC_FORCE);
+   return do_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
 }
 
 /*
-- 
2.7.4

[PATCH 33/34] btrfs: Remove fs_info from btrfs_force_chunk_alloc

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/extent-tree.c | 5 ++---
 fs/btrfs/relocation.c  | 3 +--
 fs/btrfs/volumes.c | 2 +-
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index caeca7592013..2062b393e3fd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2809,8 +2809,7 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
   u64 start, u64 end);
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 u64 num_bytes, u64 *actual_bytes);
-int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info, u64 type);
+int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
 
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d6f82a5c79d3..c3c3e6f3b72c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9463,10 +9463,9 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache)
return ret;
 }
 
-int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info, u64 type)
+int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type)
 {
-   u64 alloc_flags = get_alloc_profile(fs_info, type);
+   u64 alloc_flags = get_alloc_profile(trans->fs_info, type);
 
return do_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
 }
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ef1b5aad035e..22214033a4a2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4169,8 +4169,7 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
}
}
if (trans && progress && err == -ENOSPC) {
-   ret = btrfs_force_chunk_alloc(trans, fs_info,
- rc->block_group->flags);
+   ret = btrfs_force_chunk_alloc(trans, rc->block_group->flags);
if (ret == 1) {
err = 0;
progress = 0;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 36225e5b16e5..5bd6f3a40f9c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3057,7 +3057,7 @@ static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
if (IS_ERR(trans))
return PTR_ERR(trans);
 
-   ret = btrfs_force_chunk_alloc(trans, fs_info,
+   ret = btrfs_force_chunk_alloc(trans,
  BTRFS_BLOCK_GROUP_DATA);
btrfs_end_transaction(trans);
if (ret < 0)
-- 
2.7.4

[PATCH 25/34] btrfs: Remove fs_info from check_system_chunk

2018-06-20 Thread Nikolay Borisov

It can be referenced from trans since the function is always called
within a transaction

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/extent-tree.c | 8 
 fs/btrfs/volumes.c | 2 +-
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f6b37911f41d..6498abf8ba36 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2821,8 +2821,7 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
 int btrfs_start_write_no_snapshotting(struct btrfs_root *root);
 void btrfs_end_write_no_snapshotting(struct btrfs_root *root);
 void btrfs_wait_for_snapshot_creation(struct btrfs_root *root);
-void check_system_chunk(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info, const u64 type);
+void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
 u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
   u64 start, u64 end);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed38ccf1607c..57884cd72225 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4493,9 +4493,9 @@ static u64 get_profile_num_devs(struct btrfs_fs_info *fs_info, u64 type)
  * for allocating a chunk, otherwise if it's false, reserve space necessary for
  * removing a chunk.
  */
-void check_system_chunk(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info, u64 type)
+void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_space_info *info;
u64 left;
u64 thresh;
@@ -4634,7 +4634,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 * Check if we have enough space in SYSTEM chunk because we may need
 * to update devices.
 */
-   check_system_chunk(trans, fs_info, flags);
+   check_system_chunk(trans, flags);
 
ret = btrfs_alloc_chunk(trans, flags);
trans->allocating_chunk = false;
@@ -9459,7 +9459,7 @@ int btrfs_inc_block_group_ro(struct btrfs_fs_info *fs_info,
if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
alloc_flags = update_block_group_flags(fs_info, cache->flags);
mutex_lock(&fs_info->chunk_mutex);
-   check_system_chunk(trans, fs_info, alloc_flags);
+   check_system_chunk(trans, alloc_flags);
mutex_unlock(&fs_info->chunk_mutex);
}
mutex_unlock(&fs_info->ro_block_group_mutex);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4a444aa82ac3..36225e5b16e5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2827,7 +2827,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
}
map = em->map_lookup;
mutex_lock(&fs_info->chunk_mutex);
-   check_system_chunk(trans, fs_info, map->type);
+   check_system_chunk(trans, map->type);
mutex_unlock(&fs_info->chunk_mutex);
 
/*
-- 
2.7.4

[PATCH 30/34] btrfs: Remove fs_info from remove_extent_backref

2018-06-20 Thread Nikolay Borisov

It can be referenced directly from the transaction handle since it's
always valid

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b03fe240da97..47b80297dbc2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1998,7 +1998,6 @@ static int insert_extent_backref(struct btrfs_trans_handle *trans,
 }
 
 static int remove_extent_backref(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_path *path,
 struct btrfs_extent_inline_ref *iref,
 int refs_to_drop, int is_data, int *last_ref)
@@ -2014,7 +2013,7 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 last_ref);
} else {
*last_ref = 1;
-   ret = btrfs_del_item(trans, fs_info->extent_root, path);
+   ret = btrfs_del_item(trans, trans->fs_info->extent_root, path);
}
return ret;
 }
@@ -6862,7 +6861,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 #endif
if (!found_extent) {
BUG_ON(iref);
-   ret = remove_extent_backref(trans, info, path, NULL,
+   ret = remove_extent_backref(trans, path, NULL,
refs_to_drop,
is_data, &last_ref);
if (ret) {
@@ -7006,9 +7005,9 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(leaf);
}
if (found_extent) {
-   ret = remove_extent_backref(trans, info, path,
-   iref, refs_to_drop,
-   is_data, &last_ref);
+   ret = remove_extent_backref(trans, path, iref,
+   refs_to_drop, is_data,
+   &last_ref);
if (ret) {
btrfs_abort_transaction(trans, ret);
goto out;
-- 
2.7.4

[PATCH 32/34] btrfs: Remove fs_info from btrfs_inc_block_group_ro

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed bg cache.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/relocation.c  | 2 +-
 fs/btrfs/scrub.c   | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index eba0b1530843..caeca7592013 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2801,8 +2801,7 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
 struct btrfs_block_rsv *block_rsv,
 u64 num_bytes);
-int btrfs_inc_block_group_ro(struct btrfs_fs_info *fs_info,
-struct btrfs_block_group_cache *cache);
+int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 06823f69637b..d6f82a5c79d3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9392,10 +9392,10 @@ static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
return ret;
 }
 
-int btrfs_inc_block_group_ro(struct btrfs_fs_info *fs_info,
-struct btrfs_block_group_cache *cache)
+int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache)
 
 {
+   struct btrfs_fs_info *fs_info = cache->fs_info;
struct btrfs_trans_handle *trans;
u64 alloc_flags;
int ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 879b76fa881a..ef1b5aad035e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4375,7 +4375,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
rc->block_group = btrfs_lookup_block_group(fs_info, group_start);
BUG_ON(!rc->block_group);
 
-   ret = btrfs_inc_block_group_ro(fs_info, rc->block_group);
+   ret = btrfs_inc_block_group_ro(rc->block_group);
if (ret) {
err = ret;
goto out;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 572306036477..7599c4ff950e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3862,7 +3862,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 * -> btrfs_scrub_pause()
 */
scrub_pause_on(fs_info);
-   ret = btrfs_inc_block_group_ro(fs_info, cache);
+   ret = btrfs_inc_block_group_ro(cache);
if (!ret && is_dev_replace) {
/*
 * If we are doing a device replace wait for any tasks
-- 
2.7.4

[PATCH 31/34] btrfs: Remove fs_info from btrfs_alloc_logged_file_extent

2018-06-20 Thread Nikolay Borisov

It can be referenced from trans since the function is always called
within a valid transaction

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 1 -
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/tree-log.c| 1 -
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6498abf8ba36..eba0b1530843 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2676,7 +2676,6 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 u64 offset, u64 ram_bytes,
 struct btrfs_key *ins);
 int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   u64 root_objectid, u64 owner, u64 offset,
   struct btrfs_key *ins);
 int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 47b80297dbc2..06823f69637b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8229,10 +8229,10 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
  * space cache bits as well
  */
 int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   u64 root_objectid, u64 owner, u64 offset,
   struct btrfs_key *ins)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
int ret;
struct btrfs_block_group_cache *block_group;
struct btrfs_space_info *space_info;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f8220ec02036..ecdf45465798 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -715,7 +715,6 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 * allocation tree
 */
ret = btrfs_alloc_logged_file_extent(trans,
-   fs_info,
root->root_key.objectid,
key->objectid, offset, &ins);
if (ret)
-- 
2.7.4

[PATCH 27/34] btrfs: Remove fs_info from exclude_super_stripes

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed block group

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 196fd467cfac..d3d61e56b61a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -231,9 +231,9 @@ static void free_excluded_extents(struct btrfs_block_group_cache *cache)
  start, end, EXTENT_UPTODATE);
 }
 
-static int exclude_super_stripes(struct btrfs_fs_info *fs_info,
-struct btrfs_block_group_cache *cache)
+static int exclude_super_stripes(struct btrfs_block_group_cache *cache)
 {
+   struct btrfs_fs_info *fs_info = cache->fs_info;
u64 bytenr;
u64 *logical;
int stripe_len;
@@ -10068,7 +10068,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 * info has super bytes accounted for, otherwise we'll think
 * we have more space than we actually do.
 */
-   ret = exclude_super_stripes(info, cache);
+   ret = exclude_super_stripes(cache);
if (ret) {
/*
 * We may have excluded something, so call this just in
@@ -10219,7 +10219,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
cache->last_byte_to_unpin = (u64)-1;
cache->cached = BTRFS_CACHE_FINISHED;
cache->needs_free_space = 1;
-   ret = exclude_super_stripes(fs_info, cache);
+   ret = exclude_super_stripes(cache);
if (ret) {
/*
 * We may have excluded something, so call this just in
-- 
2.7.4

[PATCH 03/34] btrfs: Remove fs_info argument from insert_extent_backref

2018-06-20 Thread Nikolay Borisov

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fc44c2e3e18..1171ad81e5c5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1990,7 +1990,6 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 }
 
 static int insert_extent_backref(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_path *path,
 u64 bytenr, u64 parent, u64 root_objectid,
 u64 owner, u64 offset, int refs_to_add)
@@ -2254,8 +2253,8 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
path->reada = READA_FORWARD;
path->leave_spinning = 1;
/* now insert the actual backref */
-   ret = insert_extent_backref(trans, fs_info, path, bytenr, parent,
-   root_objectid, owner, offset, refs_to_add);
+   ret = insert_extent_backref(trans, path, bytenr, parent, root_objectid,
+   owner, offset, refs_to_add);
if (ret)
btrfs_abort_transaction(trans, ret);
 out:
-- 
2.7.4

[PATCH 28/34] btrfs: Remove fs_info from insert_inline_extent_backref

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed transaction handle, since it's
always valid.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d3d61e56b61a..08ad7572d025 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1954,7 +1954,6 @@ void update_inline_extent_backref(struct btrfs_path *path,
 
 static noinline_for_stack
 int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
 struct btrfs_path *path,
 u64 bytenr, u64 num_bytes, u64 parent,
 u64 root_objectid, u64 owner,
@@ -1972,7 +1971,7 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
update_inline_extent_backref(path, iref, refs_to_add,
 extent_op, NULL);
} else if (ret == -ENOENT) {
-   setup_inline_extent_backref(fs_info, path, iref, parent,
+   setup_inline_extent_backref(trans->fs_info, path, iref, parent,
root_objectid, owner, offset,
refs_to_add, extent_op);
ret = 0;
@@ -2201,7 +2200,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
  u64 owner, u64 offset, int refs_to_add,
  struct btrfs_delayed_extent_op *extent_op)
 {
-   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_extent_item *item;
@@ -2218,10 +2216,9 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
path->reada = READA_FORWARD;
path->leave_spinning = 1;
/* this will setup the path even if it fails to insert the back ref */
-   ret = insert_inline_extent_backref(trans, fs_info, path, bytenr,
-  num_bytes, parent, root_objectid,
-  owner, offset,
-  refs_to_add, extent_op);
+   ret = insert_inline_extent_backref(trans, path, bytenr, num_bytes,
+  parent, root_objectid, owner,
+  offset, refs_to_add, extent_op);
if ((ret < 0 && ret != -EAGAIN) || !ret)
goto out;
 
-- 
2.7.4

[PATCH 29/34] btrfs: Remove fs_info from run_one_delayed_ref

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed transaction handle, since it's
always valid

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 08ad7572d025..b03fe240da97 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2445,7 +2445,6 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
 
 /* helper function to actually process a single delayed ref entry */
 static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   struct btrfs_delayed_ref_node *node,
   struct btrfs_delayed_extent_op *extent_op,
   int insert_reserved)
@@ -2454,7 +2453,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 
if (trans->aborted) {
if (insert_reserved)
-   btrfs_pin_extent(fs_info, node->bytenr,
+   btrfs_pin_extent(trans->fs_info, node->bytenr,
 node->num_bytes, 1);
return 0;
}
@@ -2731,7 +2730,7 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
locked_ref->extent_op = NULL;
spin_unlock(&locked_ref->lock);
 
-   ret = run_one_delayed_ref(trans, fs_info, ref, extent_op,
+   ret = run_one_delayed_ref(trans, ref, extent_op,
  must_insert_reserved);
 
btrfs_free_delayed_extent_op(extent_op);
-- 
2.7.4

[PATCH 17/34] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

2018-06-20 Thread Nikolay Borisov

This function already takes a transaction which holds a reference to
the fs_info struct. Use that reference and remove the extra arg. No
functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 449ebd1bb8c9..b72a6d7d84c3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2197,12 +2197,12 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 }
 
 static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_delayed_ref_node *node,
  u64 parent, u64 root_objectid,
  u64 owner, u64 offset, int refs_to_add,
  struct btrfs_delayed_extent_op *extent_op)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_extent_item *item;
@@ -2286,10 +2286,9 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 ref->offset, &ins,
 node->ref_mod);
} else if (node->action == BTRFS_ADD_DELAYED_REF) {
-   ret = __btrfs_inc_extent_ref(trans, fs_info, node, parent,
-ref_root, ref->objectid,
-ref->offset, node->ref_mod,
-extent_op);
+   ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
+ref->objectid, ref->offset,
+node->ref_mod, extent_op);
} else if (node->action == BTRFS_DROP_DELAYED_REF) {
ret = __btrfs_free_extent(trans, node, parent,
  ref_root, ref->objectid,
@@ -2439,10 +2438,8 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
BUG_ON(!extent_op || !extent_op->update_flags);
ret = alloc_reserved_tree_block(trans, node, extent_op);
} else if (node->action == BTRFS_ADD_DELAYED_REF) {
-   ret = __btrfs_inc_extent_ref(trans, fs_info, node,
-parent, ref_root,
-ref->level, 0, 1,
-extent_op);
+   ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
+ref->level, 0, 1, extent_op);
} else if (node->action == BTRFS_DROP_DELAYED_REF) {
ret = __btrfs_free_extent(trans, node, parent, ref_root,
  ref->level, 0, 1, extent_op);
-- 
2.7.4

[PATCH 00/34] fs_info cleanup of extent-tree.c

2018-06-20 Thread Nikolay Borisov

Hello, 

This series aims at removing all the redundant btrfs_fs_info args being
passed to functions in extent-tree.c. Each patch removes the arg from a
single function, so each one should be fairly easy to review. I'm mainly
exploiting the fact that most of the time we have a function which takes a
transaction handle, which is always valid (i.e. can't be NULL), and at the
same time we are passing an fs_info. The transaction handle already holds a
reference to the fs_info, so fs_info can be referenced directly from the
transaction. Additionally, 2 patches exploit the fact that block group
cache structs also hold a reference to fs_info, so there is no point in
passing it separately there either.
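
A minimal sketch of the pattern described above, using simplified stand-in
structs rather than the real btrfs definitions. The struct layouts, the
extent_root_id field and the lookup_root_* helpers are all hypothetical and
exist only to show where fs_info comes from before and after the change:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the back-pointers matter for this sketch. */
struct fs_info {
	int extent_root_id;		/* hypothetical field for illustration */
};

struct trans_handle {
	struct fs_info *fs_info;	/* mirrors trans->fs_info */
};

struct block_group_cache {
	struct fs_info *fs_info;	/* mirrors cache->fs_info */
};

/* Before: fs_info is passed although trans already points at it. */
int lookup_root_old(struct trans_handle *trans, struct fs_info *fs_info)
{
	(void)trans;
	return fs_info->extent_root_id;
}

/* After: fs_info is derived from the (always valid) transaction handle. */
int lookup_root_new(struct trans_handle *trans)
{
	struct fs_info *fs_info;

	assert(trans != NULL);
	fs_info = trans->fs_info;
	return fs_info->extent_root_id;
}

/* Same idea for helpers that take a block group cache. */
int lookup_root_from_cache(struct block_group_cache *cache)
{
	assert(cache != NULL);
	return cache->fs_info->extent_root_id;
}
```

Callers then drop one argument at every call site, which is what each patch
in the series does for one real function.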

To spice things up a bit, here is the output of stackdelta before/after the 
patch set is applied: 

./fs/btrfs/extent-tree.c  __btrfs_inc_extent_ref            152  144   -8
./fs/btrfs/extent-tree.c  __btrfs_run_delayed_refs          256  248   -8
./fs/btrfs/extent-tree.c  alloc_reserved_file_extent        128  136   +8
./fs/btrfs/extent-tree.c  btrfs_alloc_logged_file_extent    104   88  -16
./fs/btrfs/extent-tree.c  btrfs_alloc_reserved_file_extent   56   48   -8
./fs/btrfs/extent-tree.c  btrfs_alloc_tree_block            176  168   -8
./fs/btrfs/extent-tree.c  btrfs_force_chunk_alloc            24   16   -8
./fs/btrfs/extent-tree.c  btrfs_free_extent                 104   96   -8
./fs/btrfs/extent-tree.c  btrfs_free_tree_block             112  104   -8
./fs/btrfs/extent-tree.c  btrfs_inc_block_group_ro           56   48   -8
./fs/btrfs/extent-tree.c  btrfs_inc_extent_ref              112  104   -8
./fs/btrfs/extent-tree.c  caching_thread                    216  208   -8
./fs/btrfs/extent-tree.c  convert_extent_item_v0            120  112   -8
./fs/btrfs/extent-tree.c  insert_inline_extent_backref      120  112   -8
./fs/btrfs/extent-tree.c  lookup_inline_extent_backref      176  184   +8
./fs/btrfs/extent-tree.c  remove_extent_data_ref            104   96   -8

Also the output of bloat-o-meter : 

add/remove: 5/5 grow/shrink: 6/24 up/down: 2275/-2554 (-279)
Function                              old     new   delta
insert_extent_data_ref                  -     738    +738
lookup_extent_data_ref                  -     613    +613
remove_extent_data_ref                  -     535    +535
lookup_tree_block_ref                   -     227    +227
insert_tree_block_ref                   -     139    +139
btrfs_inc_extent_ref                  235     242      +7
btrfs_make_block_group                831     837      +6
update_inline_extent_backref          681     685      +4
exclude_super_stripes                 356     360      +4
free_excluded_extents                  95      96      +1
alloc_reserved_file_extent            954     955      +1
check_system_chunk                    362     361      -1
insert_inline_extent_backref          224     221      -3
flush_space                          1691    1688      -3
cache_block_group                    1132    1129      -3
btrfs_free_block_groups              1140    1137      -3
btrfs_alloc_tree_block               1024    1021      -3
remove_extent_backref                 104     100      -4
find_free_extent                     5436    5431      -5
convert_extent_item_v0                735     730      -5
do_chunk_alloc                        846     838      -8
btrfs_remove_block_group             2805    2797      -8
btrfs_free_extent                     306     298      -8
btrfs_alloc_data_chunk_ondemand      1242    1234      -8
btrfs_free_tree_block                 862     853      -9
btrfs_force_chunk_alloc                45      35     -10
btrfs_read_block_groups              2245    2233     -12
btrfs_alloc_logged_file_extent        249     237     -12
btrfs_alloc_reserved_file_extent       70      57     -13
btrfs_inc_block_group_ro              352     338     -14
lookup_inline_extent_backref         1532    1516     -16
caching_thread                       1465    1446     -19
__btrfs_run_delayed_refs             5469    5446     -23
__btrfs_inc_extent_ref.isra           608     566     -42
__btrfs_free_extent.isra             3136    3082     -54
insert_tree_block_ref.isra            131       -    -131
lookup_tree_block_ref.isra            216       -    -216
remove_extent_data_ref.isra           543       -    -543
lookup_extent_data_ref.isra           649       -    -649
insert_extent_data_ref.isra           729       -    -729
Total: Before=91208, After=90929, chg -0.31%
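
For readers checking the summary lines, a small sketch of how the totals
relate to each other. The helper names net_delta and pct_change are made up
for this illustration and are not part of bloat-o-meter; only the input
numbers come from the output above:

```c
#include <assert.h>

/* Net size change in bytes: bytes added minus bytes removed. */
long net_delta(long up, long down)
{
	assert(up >= 0 && down >= 0);
	return up - down;
}

/* Relative size change in percent, negative when the code shrinks. */
double pct_change(long before, long after)
{
	assert(before > 0);
	return 100.0 * (double)(after - before) / (double)before;
}
```

With up=2275 and down=2554 the net delta is -279 bytes, which matches
After - Before (90929 - 91208), and -279 / 91208 is roughly -0.31%.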

Overall this series is a win both in code density as well as stack usage and 
brings us closer to completely elimina

[PATCH 34/34] btrfs: Remove fs_info from convert_extent_item_v0

2018-06-20 Thread Nikolay Borisov

It can be referenced from trans since the function is always called
within a transaction

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c3c3e6f3b72c..9c0e15b057a0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1038,10 +1038,10 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
 
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
 static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_path *path,
  u64 owner, u32 extra_size)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_extent_item *item;
struct btrfs_extent_item_v0 *ei0;
@@ -1682,8 +1682,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
err = -ENOENT;
goto out;
}
-   ret = convert_extent_item_v0(trans, fs_info, path, owner,
-extra_size);
+   ret = convert_extent_item_v0(trans, path, owner, extra_size);
if (ret < 0) {
err = ret;
goto out;
@@ -2384,7 +2383,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
item_size = btrfs_item_size_nr(leaf, path->slots[0]);
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
if (item_size < sizeof(*ei)) {
-   ret = convert_extent_item_v0(trans, fs_info, path, (u64)-1, 0);
+   ret = convert_extent_item_v0(trans, path, (u64)-1, 0);
if (ret < 0) {
err = ret;
goto out;
@@ -6937,8 +6936,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
if (item_size < sizeof(*ei)) {
BUG_ON(found_extent || extent_slot != path->slots[0]);
-   ret = convert_extent_item_v0(trans, info, path, owner_objectid,
-0);
+   ret = convert_extent_item_v0(trans, path, owner_objectid, 0);
if (ret < 0) {
btrfs_abort_transaction(trans, ret);
goto out;
-- 
2.7.4

[PATCH 26/34] btrfs: Remove fs_info from free_excluded_extents

2018-06-20 Thread Nikolay Borisov

It can be referenced from the passed block group

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 57884cd72225..196fd467cfac 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -217,9 +217,9 @@ static int add_excluded_extent(struct btrfs_fs_info *fs_info,
return 0;
 }
 
-static void free_excluded_extents(struct btrfs_fs_info *fs_info,
- struct btrfs_block_group_cache *cache)
+static void free_excluded_extents(struct btrfs_block_group_cache *cache)
 {
+   struct btrfs_fs_info *fs_info = cache->fs_info;
u64 start, end;
 
start = cache->key.objectid;
@@ -555,7 +555,7 @@ static noinline void caching_thread(struct btrfs_work *work)
caching_ctl->progress = (u64)-1;
 
up_read(&fs_info->commit_root_sem);
-   free_excluded_extents(fs_info, block_group);
+   free_excluded_extents(block_group);
mutex_unlock(&caching_ctl->mutex);
 
wake_up(&caching_ctl->wait);
@@ -663,7 +663,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
wake_up(&caching_ctl->wait);
if (ret == 1) {
put_caching_control(caching_ctl);
-   free_excluded_extents(fs_info, cache);
+   free_excluded_extents(cache);
return 0;
}
} else {
@@ -9826,7 +9826,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 */
if (block_group->cached == BTRFS_CACHE_NO ||
block_group->cached == BTRFS_CACHE_ERROR)
-   free_excluded_extents(info, block_group);
+   free_excluded_extents(block_group);
 
btrfs_remove_free_space_cache(block_group);
ASSERT(block_group->cached != BTRFS_CACHE_STARTED);
@@ -10074,7 +10074,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 * We may have excluded something, so call this just in
 * case.
 */
-   free_excluded_extents(info, cache);
+   free_excluded_extents(cache);
btrfs_put_block_group(cache);
goto error;
}
@@ -10089,14 +10089,14 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
if (found_key.offset == btrfs_block_group_used(&cache->item)) {
cache->last_byte_to_unpin = (u64)-1;
cache->cached = BTRFS_CACHE_FINISHED;
-   free_excluded_extents(info, cache);
+   free_excluded_extents(cache);
} else if (btrfs_block_group_used(&cache->item) == 0) {
cache->last_byte_to_unpin = (u64)-1;
cache->cached = BTRFS_CACHE_FINISHED;
add_new_free_space(cache, found_key.objectid,
   found_key.objectid +
   found_key.offset);
-   free_excluded_extents(info, cache);
+   free_excluded_extents(cache);
}
 
ret = btrfs_add_block_group_cache(info, cache);
@@ -10225,14 +10225,14 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 * We may have excluded something, so call this just in
 * case.
 */
-   free_excluded_extents(fs_info, cache);
+   free_excluded_extents(cache);
btrfs_put_block_group(cache);
return ret;
}
 
add_new_free_space(cache, chunk_offset, chunk_offset + size);
 
-   free_excluded_extents(fs_info, cache);
+   free_excluded_extents(cache);
 
 #ifdef CONFIG_BTRFS_DEBUG
if (btrfs_should_fragment_free_space(cache)) {
@@ -10316,7 +10316,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 * Free the reserved super bytes from this block group before
 * remove it.
 */
-   free_excluded_extents(fs_info, block_group);
+   free_excluded_extents(block_group);
btrfs_free_ref_tree_range(fs_info, block_group->key.objectid,
  block_group->key.offset);
 
-- 
2.7.4
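
[Illustration only, not part of the patch] The change works because every
block group already carries a back-pointer to its filesystem, so passing
fs_info separately is redundant. Below is a minimal user-space sketch of
that refactoring pattern. The struct and function names (fs_info_model,
block_group_model, free_excluded_old, free_excluded_new) are hypothetical
stand-ins and do not match the kernel definitions.

```c
#include <assert.h>

/* Simplified stand-in for the filesystem-wide structure (hypothetical). */
struct fs_info_model {
    unsigned long excluded_freed;   /* counts calls, for demonstration only */
};

/* Simplified stand-in for a block group (hypothetical). */
struct block_group_model {
    struct fs_info_model *fs_info;  /* back-pointer to the owning filesystem */
    unsigned long long start;
    unsigned long long length;
};

/* Before: callers pass both pointers and must keep them consistent. */
void free_excluded_old(struct fs_info_model *fs_info,
                       struct block_group_model *cache)
{
    assert(cache->fs_info == fs_info);  /* the redundancy being removed */
    fs_info->excluded_freed++;
}

/* After: the filesystem is reached through the block group itself. */
void free_excluded_new(struct block_group_model *cache)
{
    struct fs_info_model *fs_info = cache->fs_info;

    fs_info->excluded_freed++;
}
```

With one argument fewer, a caller can no longer pass a mismatched
fs_info/block group pair by mistake.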


Re: [LKP] [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression

2018-06-20 Thread Chris Mason


On 19 Jun 2018, at 23:51, Huang, Ying wrote:

"Huang, Ying"  writes:


Hi, Josef,

Do you have time to take a look at the regression?

kernel test robot  writes:


Greeting,

FYI, we noticed a -12.3% regression of blogbench.write_score and a +9.6% improvement
of blogbench.read_score due to commit:

commit: 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use sc->priority for slab shrink targets")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: blogbench
on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory

with following parameters:

disk: 1SSD
fs: btrfs
cpufreq_governor: performance

test-description: Blogbench is a portable filesystem benchmark 
that tries to reproduce the load of a real-world busy file server.

test-url:


I'm surprised, this patch is a big win in production here at FB.  I'll 
have to reproduce these results to better understand what is going on.  
My first guess is that since we have fewer inodes in slab, we're reading 
more inodes from disk in order to do the writes.


But that should also make our read scores lower.

-chris

Re: RAID56

2018-06-20 Thread Duncan

Gandalf Corvotempesta posted on Wed, 20 Jun 2018 11:15:03 +0200 as
excerpted:

> On Wed, 20 Jun 2018 at 10:34, Duncan <1i5t5.dun...@cox.net>
> wrote:
>> Parity-raid is certainly nice, but mandatory, especially when there's
>> already other parity solutions (both hardware and software) available
>> that btrfs can be run on top of, should a parity-raid solution be
>> /that/ necessary?
> 
> You can't be serious. hw raid has many more flaws than any sw raid.

I didn't say /good/ solutions, I said /other/ solutions.
FWIW, I'd go for mdraid at the lower level, were I to choose, here.

But for a 4-12-ish device solution, I'd probably go btrfs raid1 on a pair 
of mdraid-0s.  That gets you btrfs raid1 data integrity and recovery from 
its other mirror, while also being faster than the still not optimized 
btrfs raid 10.  Beyond about a dozen devices, six per "side" of the btrfs 
raid1, the risk of multi-device breakdown before recovery starts to get 
too high for comfort, but six 8 TB devices in raid0 gives you up to 48 TB 
to work with, and more than that arguably should be broken down into 
smaller blocks to work with in any case, because otherwise you're simply 
dealing with so much data it'll take you unreasonably long to do much of 
anything non-incremental with it, from any sort of fscks or btrfs 
maintenance, to trying to copy or move the data anywhere (including for 
backup/restore purposes), to ... whatever.
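
As a rough sketch of the capacity arithmetic above (illustration only; the
function names are made up, and sizes are in decimal TB): striping adds the
device sizes together, while two-way btrfs raid1 stores every block twice,
so usable space equals one "side".

```c
#include <assert.h>

/*
 * Usable capacity, in TB, of a two-copy btrfs raid1 built on two equally
 * sized raid0 arrays. Each side is the sum of its devices; mirroring
 * means only one side's worth of space is usable.
 */
unsigned int raid1_over_raid0_usable_tb(unsigned int devs_per_side,
                                        unsigned int tb_per_dev)
{
    assert(devs_per_side > 0 && tb_per_dev > 0);
    return devs_per_side * tb_per_dev;
}

/* Total number of physical devices for the given number of mirror copies. */
unsigned int total_devices(unsigned int devs_per_side, unsigned int copies)
{
    return devs_per_side * copies;
}
```

For six 8 TB devices per side, raid1_over_raid0_usable_tb(6, 8) gives the
48 TB mentioned above, using total_devices(6, 2) = 12 devices.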

Actually, I'd argue that point is reached well before 48 TB, but the 
point remains, at some point it's just too much data to do much of 
anything with, too much to risk losing all at once, too much to back up 
and restore all at once as it just takes too much time to do it, just too 
much...  And that point's well within ordinary raid sizes with a dozen 
devices or less, mirrored, these days.

Which is one of the reasons I'm so skeptical about parity-raid being 
mandatory "nowadays".  Maybe it was in the past, when disks were (say) 
half a TB or less and mirroring a few TB of data was resource-
prohibitive, but now?

Of course we've got a guy here who works with CERN and deals with their 
annual 50ish petabytes of data (49 in 2016, see Wikipedia's CERN 
article), but that's simply problems on a different scale.

Even so, I'd say it needs broken up into manageable chunks, and 50 PB is 
"only" a bit over 1000 48 TB filesystems worth.  OK, say 2000, so you're 
not filling them all absolutely full.

Meanwhile, I'm actually an N-way-mirroring proponent, here, as opposed to 
a parity-raid proponent.  And at that sort of scale, you /really/ don't 
want to have to restore from backups, so 3-way or even 4-5 way mirroring 
makes a lot of sense.  Hmm... 2.5 dozen for 5-way-mirroring, 2000 times, 
2.5*12*2000=... 60K devices!  That's a lot of hard drives!  And a lot of 
power to spin them.  But I guess it's a rounding error compared to what 
CERN uses for the LHC.
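
The back-of-the-envelope numbers above can be checked with a small sketch
(illustration only; the helper names are made up, 1 PB is taken as 1000 TB,
and each mirror copy is assumed to be a six-device raid0 as described
earlier):

```c
#include <assert.h>

/* Filesystems of fs_tb each needed to hold total_tb, rounded up. */
unsigned long filesystems_needed(unsigned long total_tb, unsigned long fs_tb)
{
    assert(fs_tb > 0);
    return (total_tb + fs_tb - 1) / fs_tb;
}

/*
 * Devices for a number of filesystems, each mirrored `copies` ways with
 * devs_per_side devices behind every copy.
 */
unsigned long devices_needed(unsigned long filesystems, unsigned long copies,
                             unsigned long devs_per_side)
{
    return filesystems * copies * devs_per_side;
}
```

filesystems_needed(50000, 48) comes to 1042, i.e. "a bit over 1000", and
devices_needed(2000, 5, 6) is the 60K devices for 5-way mirroring.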

FWIW, N-way-mirroring has been on the btrfs roadmap, since at least 
kernel 3.6, for "after raid56".  I've been waiting awhile too; no sign of 
it yet so I guess I'll be waiting awhile longer.  So as they say, 
"welcome to the club!"  I'm 51 now.  Maybe I'll see it before I die.  
Imagine, I'm in my 80s in the retirement home and get the news btrfs 
finally has N-way-mirroring in mainline.  I'll be jumping up and down and 
cause a ruckus when I break my hip!  Well, hoping it won't be /that/ 
long, but... =;^]

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-20 Thread Qu Wenruo



On 2018-06-20 17:33, Filipe Manana wrote:
> On Wed, Jun 20, 2018 at 10:22 AM, Qu Wenruo  wrote:
>>
>>
>> On 2018-06-20 17:13, Filipe Manana wrote:
>>> On Fri, Jun 15, 2018 at 2:35 AM, Qu Wenruo  wrote:
>>>> [BUG]
>>>> Under certain KVM load and LTP tests, we are possible to hit the
>>>> following calltrace if quota is enabled:
>>>> --
>>>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>>>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>>>> [ cut here ]
>>>> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172
>>>> blk_status_to_errno+0x1a/0x30
>>>> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15
>>>> (unreleased)
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>>> 1.0.0-prebuilt.qemu-project.org 04/01/2014
>>>> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>>>> task: 9f827b340bc0 task.stack: b4f8c0304000
>>>> RIP: 0010:blk_status_to_errno+0x1a/0x30
>>>> Call Trace:
>>>>  submit_extent_page+0x191/0x270 [btrfs]
>>>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>>>  __do_readpage+0x2d2/0x810 [btrfs]
>>>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>>  __extent_read_full_page+0xe7/0x100 [btrfs]
>>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>>>>  read_tree_block+0x31/0x60 [btrfs]
>>>>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>>>>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>>>>  ? kmem_cache_alloc+0x1a8/0x510
>>>>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>>>>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>>>>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>>>>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>>>>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>>>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>>>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>>>>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>>>>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>>>>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>>>>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>>>>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>>>>  ? pick_next_task_fair+0x2cd/0x530
>>>>  ? __switch_to+0x92/0x4b0
>>>>  btrfs_worker_helper+0x81/0x300 [btrfs]
>>>>  process_one_work+0x1da/0x3f0
>>>>  worker_thread+0x2b/0x3f0
>>>>  ? process_one_work+0x3f0/0x3f0
>>>>  kthread+0x11a/0x130
>>>>  ? kthread_create_on_node+0x40/0x40
>>>>  ret_from_fork+0x35/0x40
>>>> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40
>>>> 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb
>>>> ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
>>>> ---[ end trace f079fb809e7a862b ]---
>>>> BTRFS critical (device vda2): unable to find logical 8820195328 length
>>>> 16384
>>>> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO
>>>> failure
>>>> BTRFS info (device vda2): forced readonly
>>>> BTRFS error (device vda2): pending csums is 2887680
>>>> --
>>>>
>>>> [CAUSE]
>>>> It's caused by race with block group auto removal like the following
>>>> case:
>>>> - There is a meta block group X, which has only one tree block
>>>>   The tree block belongs to fs tree 257.
>>>> - In current transaction, some operation modified fs tree 257
>>>>   The tree block get CoWed, so the block group X is empty, and marked as
>>>>   unused, queued to be deleted.
>>>> - Some workload (like fsync) wakes up cleaner_kthread()
>>>>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>>>>   groups.
>>>>   So block group X along its chunk map get removed.
>>>> - Some delalloc work finished for fs tree 257
>>>>   Quota needs to get the original reference of the extent, which will
>>>>   reads tree blocks of commit root of 257.
>>>>   Then since the chunk map get removed, above warning get triggered.
>>>>
>>>> [FIX]
>>>> Just teach btrfs_delete_unused_bgs() to skip block group who still has
>>>> pinned bytes.
>>>>
>>>> However there is a minor side effect, since currently we only queue
>>>> empty blocks at update_block_group(), and such empty block group with
>>>> pinned bytes won't go through update_block_group() again, such block
>>>> group won't be removed, until it get new extent allocated and removed.
>>>
>>> So that can be fixed in a separate patch, to add it back to the list
>>> of block groups to be deleted once everything is unpinned and passes
>>> all other necessary criteria.
>>
>> That's the plan.
>> Although still something more need to be considered.
>>
>>>

>>>> But please note that, there are more problems related to extent
>>>> allocator with block group auto removal.
>>>
>>> The above isn't a problem of the allocator itself but rather in the
>>> way we manage COW, commit roots and unpinning.
>>>

>>>> Even a block group is mark

Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-20 Thread Filipe Manana

On Wed, Jun 20, 2018 at 10:22 AM, Qu Wenruo  wrote:
>
>
> On 2018-06-20 17:13, Filipe Manana wrote:
>> On Fri, Jun 15, 2018 at 2:35 AM, Qu Wenruo  wrote:
>>> [BUG]
>>> Under certain KVM load and LTP tests, we are possible to hit the
>>> following calltrace if quota is enabled:
>>> --
>>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>>> [ cut here ]
>>> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
>>> blk_status_to_errno+0x1a/0x30
>>> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
>>> (unreleased)
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>> 1.0.0-prebuilt.qemu-project.org 04/01/2014
>>> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>>> task: 9f827b340bc0 task.stack: b4f8c0304000
>>> RIP: 0010:blk_status_to_errno+0x1a/0x30
>>> Call Trace:
>>>  submit_extent_page+0x191/0x270 [btrfs]
>>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>>  __do_readpage+0x2d2/0x810 [btrfs]
>>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>  __extent_read_full_page+0xe7/0x100 [btrfs]
>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>>>  read_tree_block+0x31/0x60 [btrfs]
>>>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>>>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>>>  ? kmem_cache_alloc+0x1a8/0x510
>>>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>>>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>>>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>>>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>>>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>>>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>>>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>>>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>>>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>>>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>>>  ? pick_next_task_fair+0x2cd/0x530
>>>  ? __switch_to+0x92/0x4b0
>>>  btrfs_worker_helper+0x81/0x300 [btrfs]
>>>  process_one_work+0x1da/0x3f0
>>>  worker_thread+0x2b/0x3f0
>>>  ? process_one_work+0x3f0/0x3f0
>>>  kthread+0x11a/0x130
>>>  ? kthread_create_on_node+0x40/0x40
>>>  ret_from_fork+0x35/0x40
>>> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 
>>> 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb 
>>> ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
>>> ---[ end trace f079fb809e7a862b ]---
>>> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
>>> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
>>> failure
>>> BTRFS info (device vda2): forced readonly
>>> BTRFS error (device vda2): pending csums is 2887680
>>> --
>>>
>>> [CAUSE]
>>> It's caused by race with block group auto removal like the following
>>> case:
>>> - There is a meta block group X, which has only one tree block
>>>   The tree block belongs to fs tree 257.
>>> - In current transaction, some operation modified fs tree 257
>>>   The tree block get CoWed, so the block group X is empty, and marked as
>>>   unused, queued to be deleted.
>>> - Some workload (like fsync) wakes up cleaner_kthread()
>>>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>>>   groups.
>>>   So block group X along its chunk map get removed.
>>> - Some delalloc work finished for fs tree 257
>>>   Quota needs to get the original reference of the extent, which will
>>>   reads tree blocks of commit root of 257.
>>>   Then since the chunk map get removed, above warning get triggered.
>>>
>>> [FIX]
>>> Just teach btrfs_delete_unused_bgs() to skip block group who still has
>>> pinned bytes.
>>>
>>> However there is a minor side effect, since currently we only queue
>>> empty blocks at update_block_group(), and such empty block group with
>>> pinned bytes won't go through update_block_group() again, such block
>>> group won't be removed, until it get new extent allocated and removed.
>>
>> So that can be fixed in a separate patch, to add it back to the list
>> of block groups to be deleted once everything is unpinned and passes
>> all other necessary criteria.
>
> That's the plan.
> Although still something more need to be considered.
>
>>
>>>
>>> But please note that, there are more problems related to extent
>>> allocator with block group auto removal.
>>
>> The above isn't a problem of the allocator itself but rather in the
>> way we manage COW, commit roots and unpinning.
>>
>>>
>>> Even a block group is marked unused, extent allocator can still allocate
>>> new extents from unused block group.
>>
>> Why is that a problem?
>> It's ok (with some good benefits), as long

Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-20 Thread Qu Wenruo



On 2018-06-20 17:13, Filipe Manana wrote:
> On Fri, Jun 15, 2018 at 2:35 AM, Qu Wenruo  wrote:
>> [BUG]
>> Under certain KVM load and LTP tests, we are possible to hit the
>> following calltrace if quota is enabled:
>> --
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>> [ cut here ]
>> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
>> blk_status_to_errno+0x1a/0x30
>> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
>> (unreleased)
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.0.0-prebuilt.qemu-project.org 04/01/2014
>> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>> task: 9f827b340bc0 task.stack: b4f8c0304000
>> RIP: 0010:blk_status_to_errno+0x1a/0x30
>> Call Trace:
>>  submit_extent_page+0x191/0x270 [btrfs]
>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>  __do_readpage+0x2d2/0x810 [btrfs]
>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  __extent_read_full_page+0xe7/0x100 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>>  read_tree_block+0x31/0x60 [btrfs]
>>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>>  ? kmem_cache_alloc+0x1a8/0x510
>>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>>  ? pick_next_task_fair+0x2cd/0x530
>>  ? __switch_to+0x92/0x4b0
>>  btrfs_worker_helper+0x81/0x300 [btrfs]
>>  process_one_work+0x1da/0x3f0
>>  worker_thread+0x2b/0x3f0
>>  ? process_one_work+0x3f0/0x3f0
>>  kthread+0x11a/0x130
>>  ? kthread_create_on_node+0x40/0x40
>>  ret_from_fork+0x35/0x40
>> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 
>> 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff 
>> ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
>> ---[ end trace f079fb809e7a862b ]---
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
>> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
>> failure
>> BTRFS info (device vda2): forced readonly
>> BTRFS error (device vda2): pending csums is 2887680
>> --
>>
>> [CAUSE]
>> It's caused by race with block group auto removal like the following
>> case:
>> - There is a meta block group X, which has only one tree block
>>   The tree block belongs to fs tree 257.
>> - In current transaction, some operation modified fs tree 257
>>   The tree block get CoWed, so the block group X is empty, and marked as
>>   unused, queued to be deleted.
>> - Some workload (like fsync) wakes up cleaner_kthread()
>>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>>   groups.
>>   So block group X along its chunk map get removed.
>> - Some delalloc work finished for fs tree 257
>>   Quota needs to get the original reference of the extent, which will
>>   reads tree blocks of commit root of 257.
>>   Then since the chunk map get removed, above warning get triggered.
>>
>> [FIX]
>> Just teach btrfs_delete_unused_bgs() to skip block group who still has
>> pinned bytes.
>>
>> However there is a minor side effect, since currently we only queue
>> empty blocks at update_block_group(), and such empty block group with
>> pinned bytes won't go through update_block_group() again, such block
>> group won't be removed, until it get new extent allocated and removed.
> 
> So that can be fixed in a separate patch, to add it back to the list
> of block groups to be deleted once everything is unpinned and passes
> all other necessary criteria.

That's the plan.
Although still something more need to be considered.

> 
>>
>> But please note that, there are more problems related to extent
>> allocator with block group auto removal.
> 
> The above isn't a problem of the allocator itself but rather in the
> way we manage COW, commit roots and unpinning.
> 
>>
>> Even a block group is marked unused, extent allocator can still allocate
>> new extents from unused block group.
> 
> Why is that a problem?
> It's ok (with some good benefits), as long as the cleaner thread (or
> any thing that attempts to delete block groups in the unused list),
> doesn't delete it.

It in fact could cause problem under certain c

Re: RAID56

2018-06-20 Thread Gandalf Corvotempesta

On Wed, 20 Jun 2018 at 10:34, Duncan <1i5t5.dun...@cox.net>
wrote:
> Parity-raid is certainly nice, but mandatory, especially when there's
> already other parity solutions (both hardware and software) available
> that btrfs can be run on top of, should a parity-raid solution be /that/
> necessary?

You can't be serious. hw raid has many more flaws than any sw raid.
Current CPUs are much faster than any hw raid chipset, and there is no
longer any performance loss in using sw raid vs. hw raid.

The biggest difference is that you are not locked in to a single vendor.
When you have to move disks between servers, you can do so safely without
needing the same hw raid controller (with the same firmware). Almost
all raid controllers support only one-way upgrades: if your raid was created
with an older model, you can upgrade to a newer one, but then it's impossible
to move it back. If you have issues with the new controller, you can't fall
back to the previous one.
Almost no server vendor supports old-gen controllers on new-gen servers
(at least DELL doesn't), so you are forced to upgrade the raid controller
when you upgrade the whole server or move disks between servers. I could
go on for hours. No, you can't compare any modern software raid to any hw raid.

Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-20 Thread Filipe Manana

On Fri, Jun 15, 2018 at 2:35 AM, Qu Wenruo  wrote:
> [BUG]
> Under certain KVM load and LTP tests, we are possible to hit the
> following calltrace if quota is enabled:
> --
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> [ cut here ]
> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
> blk_status_to_errno+0x1a/0x30
> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
> (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> task: 9f827b340bc0 task.stack: b4f8c0304000
> RIP: 0010:blk_status_to_errno+0x1a/0x30
> Call Trace:
>  submit_extent_page+0x191/0x270 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  __do_readpage+0x2d2/0x810 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  __extent_read_full_page+0xe7/0x100 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>  read_tree_block+0x31/0x60 [btrfs]
>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>  ? kmem_cache_alloc+0x1a8/0x510
>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>  ? pick_next_task_fair+0x2cd/0x530
>  ? __switch_to+0x92/0x4b0
>  btrfs_worker_helper+0x81/0x300 [btrfs]
>  process_one_work+0x1da/0x3f0
>  worker_thread+0x2b/0x3f0
>  ? process_one_work+0x3f0/0x3f0
>  kthread+0x11a/0x130
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 
> ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff 
> ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
> ---[ end trace f079fb809e7a862b ]---
> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
> failure
> BTRFS info (device vda2): forced readonly
> BTRFS error (device vda2): pending csums is 2887680
> --
>
> [CAUSE]
> It's caused by race with block group auto removal like the following
> case:
> - There is a meta block group X, which has only one tree block
>   The tree block belongs to fs tree 257.
> - In current transaction, some operation modified fs tree 257
>   The tree block get CoWed, so the block group X is empty, and marked as
>   unused, queued to be deleted.
> - Some workload (like fsync) wakes up cleaner_kthread()
>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>   groups.
>   So block group X along its chunk map get removed.
> - Some delalloc work finished for fs tree 257
>   Quota needs to get the original reference of the extent, which will
>   reads tree blocks of commit root of 257.
>   Then since the chunk map get removed, above warning get triggered.
>
> [FIX]
> Just teach btrfs_delete_unused_bgs() to skip block group who still has
> pinned bytes.
>
> However there is a minor side effect, since currently we only queue
> empty blocks at update_block_group(), and such empty block group with
> pinned bytes won't go through update_block_group() again, such block
> group won't be removed, until it get new extent allocated and removed.

So that can be fixed in a separate patch, to add it back to the list
of block groups to be deleted once everything is unpinned and passes
all other necessary criteria.
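
[Illustration only] A toy user-space model of the behaviour discussed here:
the cleaner skips an empty block group that still has pinned bytes, and a
follow-up change would put the group back on the unused list once it is
unpinned. The struct and all function names (bg_model, bg_can_delete,
cleaner_try_delete, unpin_all) are hypothetical and are not the kernel's
code or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a block group; not struct btrfs_block_group_cache. */
struct bg_model {
    uint64_t used;          /* bytes referenced by live extents */
    uint64_t pinned;        /* freed this transaction, still read via commit roots */
    bool on_unused_list;    /* queued for the cleaner */
};

/* The check described under [FIX]: empty but pinned groups are not deletable. */
bool bg_can_delete(const struct bg_model *bg)
{
    return bg->used == 0 && bg->pinned == 0;
}

/* One cleaner pass over a single queued group; true if it was deleted. */
bool cleaner_try_delete(struct bg_model *bg)
{
    if (!bg->on_unused_list)
        return false;
    bg->on_unused_list = false;     /* dequeued whether or not it is deleted */
    return bg_can_delete(bg);
}

/* Follow-up idea: after unpinning, re-queue the group if it is still empty. */
void unpin_all(struct bg_model *bg)
{
    bg->pinned = 0;
    if (bg->used == 0)
        bg->on_unused_list = true;
}
```

Without the re-queue step in unpin_all(), a skipped group in this model
would never be looked at again, which is the side effect the patch
description mentions.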

>
> But please note that, there are more problems related to extent
> allocator with block group auto removal.

The above isn't a problem of the allocator itself but rather in the
way we manage COW, commit roots and unpinning.

>
> Even a block group is marked unused, extent allocator can still allocate
> new extents from unused block group.

Why is that a problem?
It's ok (with some good benefits), as long as the cleaner thread (or
any thing that attempts to delete block groups in the unused list),
doesn't delete it.

> Thus delaying block group to next transaction won't work.
> (Extents get allocated in current transaction, and removed again in next
> transaction).
>
> So the root fix need to co-operate with extent allocator.

What do you mean by co-operation with the extent allocator? I don't
think

[PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-20 Thread fdmanana

From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents (and other extents
with a special block_start value). This is because it always sets the
variable used to report the physical offset ("disko") as em->block_start
plus some offset, and em->block_start has the value 18446744073709551614
((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: flags:
0:0..4095: 18446744073709551614..  4093:   4096: last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  2018-06-18 18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.
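
[Illustration only] To see where that number comes from: (u64)-2 is
2^64 - 2, which rounds to 2^64 when converted to a double, and 2^64 * 4096
is 2^76, roughly 7.5558e+22. The sketch below models the multiplication in
floating point. It is an assumption that this matches how the test's perl
filter computes the value; the helper name blocks_to_bytes is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Value the commit message gives for inline extents: (u64)-2. */
#define INLINE_BLOCK_START ((uint64_t)-2)

/*
 * Multiply a block number by the block size in floating point.
 * This only approximates the test's filter, which is not shown here.
 */
double blocks_to_bytes(uint64_t block, uint64_t block_size)
{
    assert(block_size > 0);
    return (double)block * (double)block_size;
}
```

blocks_to_bytes(INLINE_BLOCK_START, 4096) lands at about 7.5558e+22, far
outside the range of any valid logical address.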

So fix this by ensuring the physical offset is always set to 0 when we
are processing an extent with a special block_start value.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero")
Signed-off-by: Filipe Manana 
---

v2: Set the physical offset to 0 for other extent maps with a special
block_start value as well.

 fs/btrfs/extent_io.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..1aa91d57404a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4545,8 +4545,11 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		offset_in_extent = em_start - em->start;
 		em_end = extent_map_end(em);
 		em_len = em_end - em_start;
-		disko = em->block_start + offset_in_extent;
 		flags = 0;
+		if (em->block_start < EXTENT_MAP_LAST_BYTE)
+			disko = em->block_start + offset_in_extent;
+		else
+			disko = 0;
 
 		/*
 		 * bump off for our next call to get_extent
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-20 Thread Filipe Manana

On Wed, Jun 20, 2018 at 3:55 AM, robbieko  wrote:
> fdman...@kernel.org wrote on 2018-06-19 19:31:
>
>> From: Filipe Manana 
>>
>> Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
>> fm_extent_count is zero") introduced a regression where we no longer
>> report 0 as the physical offset for inline extents. This is because it
>> always sets the variable used to report the physical offset ("disko")
>> as em->block_start plus some offset, and em->block_start has the value
>> 18446744073709551614 ((u64) -2) for inline extents.
>>
>> This made the btrfs test 004 (from fstests) often fail, for example, for
>> a file with an inline extent we have the following items in the subvolume
>> tree:
>>
>> item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
>>generation 25 transid 38 size 1525 nbytes 1525
>>block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
>>sequence 0 flags 0x2(none)
>>atime 1529342058.461891730 (2018-06-18 18:14:18)
>>ctime 1529342058.461891730 (2018-06-18 18:14:18)
>>mtime 1529342058.461891730 (2018-06-18 18:14:18)
>>otime 1529342055.869892885 (2018-06-18 18:14:15)
>> item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
>>index 25 namelen 3 name: fc7
>> item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
>>generation 38 type 0 (inline)
>>inline extent data size 1525 ram_bytes 1525 compression 0
>> (none)
>>
>> Then when test 004 invoked fiemap against the file it got a non-zero
>> physical offset:
>>
>>  $ filefrag -v /mnt/p0/d4/d7/fc7
>>  Filesystem type is: 9123683e
>>  File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
>>   ext: logical_offset:physical_offset: length:   expected:
>> flags:
>> 0:0..4095: 18446744073709551614..  4093:   4096:
>>   last,not_aligned,inline,eof
>>  /mnt/p0/d4/d7/fc7: 1 extent found
>>
>> This resulted in the test failing like this:
>>
>> btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
>> --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
>> +++
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  2018-06-18
>> 18:15:02.385872155 +0100
>> @@ -1,3 +1,10 @@
>>  QA output created by 004
>>  *** test backref walking
>> -*** done
>> +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer
>> expression expected
>> +ERROR: 7.55578637259143e+22 is not a valid numeric value.
>> +unexpected output from
>> +   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
>> logical-resolve -s 65536 -P 7.55578637259143e+22
>> /home/fdmanana/btrfs-tests/scratch_1
>> ...
>> (Run 'diff -u tests/btrfs/004.out
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see
>> the entire diff)
>> Ran: btrfs/004
>>
>> The large number in scientific notation reported as an invalid numeric
>> value is the result from the filter passed to perl which multiplies the
>> physical offset by the block size reported by fiemap.
>>
>> So fix this by ensuring the physical offset is always set to 0 when we
>> are processing an inline extent.
>>
>> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
>> fm_extent_count is zero")
>> Signed-off-by: Filipe Manana 
>> ---
>>  fs/btrfs/extent_io.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 8e4a7cdbc9f5..978327d98fc5 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct
>> fiemap_extent_info *fieinfo,
>> end = 1;
>> flags |= FIEMAP_EXTENT_LAST;
>> } else if (em->block_start == EXTENT_MAP_INLINE) {
>> +   disko = 0;
>> flags |= (FIEMAP_EXTENT_DATA_INLINE |
>>   FIEMAP_EXTENT_NOT_ALIGNED);
>> } else if (em->block_start == EXTENT_MAP_DELALLOC) {
>
>
>
> EXTENT_MAP_DELALLOC should have the same problem.
>
> em->block_start has some special values. The following values should not be
> considered disko
> #define EXTENT_MAP_LAST_BYTE((u64)-4)
> #define EXTENT_MAP_HOLE((u64)-3)
> #define EXTENT_MAP_INLINE((u64)-2)
> #define EXTENT_MAP_DELALLOC((u64)-1)
>
> Is the following change more suitable?
> if (em->block_start >= EXTENT_MAP_LAST_BYTE)
> disko = 0;
> else
> disko = em->block_start + offset_in_extent;

Yes, I was already thinking about that yesterday evening, at least for
holes/delalloc, and left it for this morning after leaving some tests
running during the evening.
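
For reference, the check being discussed can be sketched in user space
as below. The four sentinel values are copied from the quoted defines;
the helper name fiemap_disko() is hypothetical and only stands in for
the assignment inside extent_fiemap().

```c
#include <stdint.h>

/* Sentinel values copied from the thread; real block numbers sit below them. */
#define EXTENT_MAP_LAST_BYTE ((uint64_t)-4)
#define EXTENT_MAP_HOLE      ((uint64_t)-3)
#define EXTENT_MAP_INLINE    ((uint64_t)-2)
#define EXTENT_MAP_DELALLOC  ((uint64_t)-1)

/*
 * Hypothetical helper (not a kernel function): physical offset to report
 * to fiemap for an extent map with the given block_start.
 */
static uint64_t fiemap_disko(uint64_t block_start, uint64_t offset_in_extent)
{
	if (block_start >= EXTENT_MAP_LAST_BYTE)
		return 0;	/* hole, inline, delalloc: no on-disk address */
	return block_start + offset_in_extent;
}
```

A single range comparison covers every special value at once, so a new
sentinel added above EXTENT_MAP_LAST_BYTE would not need another branch.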

>
> Thanks.
> Robbie Ko

Re: btrfs balance did not progress after 12H

2018-06-20 Thread Duncan

Austin S. Hemmelgarn posted on Tue, 19 Jun 2018 12:58:44 -0400 as
excerpted:

> That said, I would question the value of repacking chunks that are
> already more than half full.  Anything above a 50% usage filter
> generally takes a long time, and has limited value in most cases (higher
> values are less likely to reduce the total number of allocated chunks).
> With `-dusage=50` or less, you're guaranteed to reduce the number of
> chunks if at least two match, and it isn't very time consuming for the
> allocator, all because you can pack at least two matching chunks into
> one 'new' chunk (new in quotes because it may re-pack them into existing
> slack space on the FS). Additionally, `-dusage=50` is usually sufficient
> to mitigate the typical ENOSPC issues that regular balancing is supposed
> to help with.

While I used to agree, 50% for best efficiency, perhaps 66 or 70% if 
you're really pressed for space, now that the allocator can repack into 
existing chunks more efficiently than it used to (at least in ssd mode, 
which all my storage is now), I've seen higher values result in practical/
noticeable recovery of space to unallocated as well.
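
To put rough numbers on that, here is a back-of-the-envelope sketch (it
is not how the btrfs allocator works, and the helper name is made up).
It takes the usage percentages of the chunks a usage filter matched and
returns the fewest chunks that could hold the same data, ignoring
fragmentation and any slack space elsewhere on the filesystem.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the usage percentages (0..100) of the chunks
 * matched by a usage filter, return the fewest chunks that could hold the
 * same data, ignoring fragmentation and slack space elsewhere.
 */
static size_t min_chunks_after_repack(const unsigned *usage_pct, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += usage_pct[i];

	return (size_t)((total + 99) / 100);	/* round up to whole chunks */
}
```

Two chunks at 50% or less always come out as one, which is the guarantee
quoted above. Two chunks at 70% still need two, but three chunks at 70,
70 and 60% could in principle shrink to two, so higher filter values can
pay off when enough chunks match.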

In fact, I routinely use usage=70 these days, and sometimes use higher, 
to 99 or even 100%[1].  But of course I'm on ssd so it's far faster, and 
partition it up with the biggest partitions being under 100 GiB, so even 
full unfiltered balances are normally under 10 minutes and normal 
filtered balances under a minute, to the point I usually issue the 
balance command and actually wait for completion, so it's a far different 
ball game than issuing a balance command on a multi-TB hard drive and 
expecting it to take hours or even days.  In that case, yeah, a 50% cap 
arguably makes sense, tho he was using 60, which still shouldn't (sans 
bugs like we seem to have here) be /too/ bad.

---
[1] usage=100: -musage=1..100 is the only way I've found to balance 
metadata without rebalancing system as well, with the unfortunate penalty 
for rebalancing system on small filesystems being an increase of the 
system chunk size from 8 MB original mkfs.btrfs size to 32 MB... only a 
few KiB used! =:^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


[PATCH] btrfs: check-integrity: Fix NULL pointer dereference for degraded mount

2018-06-20 Thread Qu Wenruo

Commit f8f84b2dfda5 ("btrfs: index check-integrity state hash by a dev_t")
changed how btrfsic indexes its device state hash.

Now we need to access device->bdev->bd_dev, while for degraded mount
it's completely possible to have device->bdev as NULL, thus it will
trigger a NULL pointer dereference at mount time.

Fix it by checking if the device is degraded before accessing
device->bdev->bd_dev.
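
The shape of that guard, reduced to a user-space sketch with stand-in
types (the structs and the helper below are simplified assumptions, not
the kernel definitions):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct bdev_stub { unsigned int bd_dev; };
struct device_stub {
	bool missing;			/* stands in for BTRFS_DEV_STATE_MISSING */
	struct bdev_stub *bdev;		/* NULL on a degraded mount */
};

/*
 * Hypothetical helper: fetch the dev_t used as the hash key, or report
 * failure instead of dereferencing a NULL bdev.
 */
static bool device_hash_key(const struct device_stub *device, unsigned int *key)
{
	if (device->missing || !device->bdev)
		return false;
	*key = device->bdev->bd_dev;
	return true;
}
```

The caller then has to cope with "no device state" for a missing
device, which is what setting block_ctx_out->dev to NULL does in the
patch.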

There are a lot of other places accessing device->bdev->bd_dev, however
the other call sites have either checked device->bdev, or the
device->bdev is passed from btrfsic_map_block(), so it won't cause harm.

Fixes: f8f84b2dfda5 ("btrfs: index check-integrity state hash by a dev_t")
Signed-off-by: Qu Wenruo 
---
Please note there are still quite a few problems with check-integrity,
including:
1) Warning for degraded mount
2) Meaningless empty lines output

This patch will only fix the obvious NULL pointer dereference exposed by
btrfs/027 with "check_int" mount option.
---
 fs/btrfs/check-integrity.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index a3fdb4fe967d..daf45472bef9 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1539,7 +1539,12 @@ static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len,
 	}
 
 	device = multi->stripes[0].dev;
-	block_ctx_out->dev = btrfsic_dev_state_lookup(device->bdev->bd_dev);
+	if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) ||
+	    !device->bdev || !device->name)
+		block_ctx_out->dev = NULL;
+	else
+		block_ctx_out->dev = btrfsic_dev_state_lookup(
+							device->bdev->bd_dev);
 	block_ctx_out->dev_bytenr = multi->stripes[0].physical;
 	block_ctx_out->start = bytenr;
 	block_ctx_out->len = len;
-- 
2.17.1


Re: RAID56

2018-06-20 Thread Nikolay Borisov




On 20.06.2018 10:34, Gandalf Corvotempesta wrote:
> On Wed, 20 Jun 2018 at 02:06, waxhead wrote:
>> First of all: I am not a BTRFS developer, but I follow the mailing list
>> closely and I too have a particular interest in the "RAID"5/6 feature
>> which realistically is probably about 3-4 years (if not more) in the future.
> 
> Ok.
> 
> [cut]
> 
>> Now keep in mind that this is just a humble user's analysis of the
>> situation based on whatever I have picked up from the mailing list which
>> may or may not be entirely accurate so take it for what it is!
> 
> I wasn't aware of all of these "restrictions".
> If this is true, now I understand why Red Hat lost interest in BTRFS.
> 3-4 more years for a "working" RAID56 is far too long. In that case,
> ZFS support for RAID-Z expansion/reduction (actively being worked on)
> will be released much earlier (probably a working test version later
> this year and a stable version next year).
> 
> RAID-Z single-disk expansion/removal is probably the real missing
> feature in ZFS that would allow it to be considered a general-purpose FS.
> 
> Device removal was added some months ago and is now possible (so, if
> you add a single disk to a mirrored vdev, you don't have to destroy
> the whole pool to remove the accidentally added disk).
> 
> In 3-4 years, maybe Oracle will release ZFS under a GPL-compatible
> license (Solaris is dying, its latest release was 3 years ago, so
> there is no need to keep an open-source FS compatible only with a
> dead OS).
> 
> Keep in mind that I'm not a ZFS fan (honestly, I don't like it), but
> with these 2 features added and tons of restrictions in BTRFS,
> there is no other choice.

Of course btrfs is open source and new contributors are always welcome.


Re: RAID56

2018-06-20 Thread Gandalf Corvotempesta

On Wed, 20 Jun 2018 at 02:06, waxhead wrote:
> First of all: I am not a BTRFS developer, but I follow the mailing list
> closely and I too have a particular interest in the "RAID"5/6 feature
> which realistically is probably about 3-4 years (if not more) in the future.

Ok.

[cut]

> Now keep in mind that this is just a humble user's analysis of the
> situation based on whatever I have picked up from the mailing list which
> may or may not be entirely accurate so take it for what it is!

I wasn't aware of all of these "restrictions".
If this is true, now I understand why Red Hat lost interest in BTRFS.
3-4 more years for a "working" RAID56 is far too long. In that case,
ZFS support for RAID-Z expansion/reduction (actively being worked on)
will be released much earlier (probably a working test version later
this year and a stable version next year).

RAID-Z single-disk expansion/removal is probably the real missing
feature in ZFS that would allow it to be considered a general-purpose FS.

Device removal was added some months ago and is now possible (so, if
you add a single disk to a mirrored vdev, you don't have to destroy
the whole pool to remove the accidentally added disk).

In 3-4 years, maybe Oracle will release ZFS under a GPL-compatible
license (Solaris is dying, its latest release was 3 years ago, so
there is no need to keep an open-source FS compatible only with a
dead OS).

Keep in mind that I'm not a ZFS fan (honestly, I don't like it), but
with these 2 features added and tons of restrictions in BTRFS,
there is no other choice.

Re: [PATCH] btrfs: fix invalid-free in btrfs_extent_same

2018-06-20 Thread Lu Fengqi

On Tue, Jun 19, 2018 at 03:27:54PM +0200, David Sterba wrote:
>On Tue, Jun 19, 2018 at 02:54:38PM +0800, Lu Fengqi wrote:
>> If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
>> (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
>> is hit, we will go to free the uninitialized cmp.src_pages and
>> cmp.dst_pages.
>> 
>> Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
>> Signed-off-by: Lu Fengqi 
>> ---
>>  fs/btrfs/ioctl.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index c2837a32d689..43ecbe620dea 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 
>> loff, u64 olen,
>>  ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN,
>>dst, dst_loff, &cmp);
>>  if (ret)
>> -goto out_unlock;
>> +goto out_free;
>>  
>>  loff += BTRFS_MAX_DEDUPE_LEN;
>>  dst_loff += BTRFS_MAX_DEDUPE_LEN;
>> @@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, u64 
>> loff, u64 olen,
>>  ret = btrfs_extent_same_range(src, loff, tail_len, dst,
>>dst_loff, &cmp);
>
>The labels now switch order and there's one more 'goto out_free' that
>actually also wants to unlock the pages, after error of
>btrfs_extent_same_range in the for loop. So this needs to be updated too.

Sorry, I'm not quite sure what needs to be updated. I would appreciate it
if you could take the time to clarify. There are three goto statements
here. The first one, between the lock and the malloc, jumps directly to
the unlock label. The other two (including the goto after
btrfs_extent_same_range in the for loop) come after the malloc and jump
to the free label, which falls through to the unlock label. So whichever
label is taken, any allocated pages are freed and the inodes are unlocked.
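
A stripped-down user-space sketch of that label ordering, for
illustration only (the function, the fail_at parameter and the two
state flags are made up so the behaviour can be observed; this is not
the ioctl code):

```c
#include <stdlib.h>

/* Hypothetical state so the ordering can be observed from a test. */
static int lock_held;
static int buffers_live;

/*
 * fail_at: 0 = succeed, 1 = fail before allocating, 2 = fail after.
 * Returns 0 on success, -1 on failure.
 */
static int compare_ranges(int fail_at)
{
	void *src_pages = NULL, *dst_pages = NULL;
	int ret = 0;

	lock_held = 1;
	if (fail_at == 1) {
		ret = -1;
		goto out_unlock;	/* nothing allocated yet */
	}

	src_pages = malloc(64);
	dst_pages = malloc(64);
	buffers_live = 1;
	if (!src_pages || !dst_pages || fail_at == 2) {
		ret = -1;
		goto out_free;
	}

	/* ... the actual range comparison would go here ... */

out_free:
	free(src_pages);
	free(dst_pages);
	buffers_live = 0;
out_unlock:
	lock_held = 0;
	return ret;
}
```

Because out_free sits above out_unlock, every path that allocated
buffers falls through both labels, while the early failure skips only
the free step.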

-- 
Thanks,
Lu

>
>>  
>> +out_free:
>> +kvfree(cmp.src_pages);
>> +kvfree(cmp.dst_pages);
>> +
>>  out_unlock:
>>  if (same_inode)
>>  inode_unlock(src);
>>  else
>>  btrfs_double_inode_unlock(src, dst);
>>  
>> -out_free:
>> -kvfree(cmp.src_pages);
>> -kvfree(cmp.dst_pages);
>> -
>>  return ret;
>>  }
>>  
>> -- 
>> 2.17.1
>> 
>> 
>> 

Re: RAID56

2018-06-20 Thread Duncan

Gandalf Corvotempesta posted on Tue, 19 Jun 2018 17:26:59 +0200 as
excerpted:

> Another kernel release was made.
> Any improvements in RAID56?

Btrfs feature improvements come in "btrfs time".  Think long term, 
multiple releases, even multiple years (5 releases per year). 

In fact, btrfs raid56 is a good example.  Originally it was supposed to 
be in kernel 3.6 (or even before, but 3.5 is when I really started 
getting into btrfs enough to know), but for various reasons primarily 
involving the complexity of the feature as well as btrfs itself and the 
number of devs actually working on btrfs, even partial raid56 support 
didn't get added until 3.9, and still-buggy full support for raid56 scrub 
and device replace wasn't there until 3.19, with 4.3 fixing some bugs 
while others remained hidden for many releases until they were finally 
fixed in 4.12.

Since 4.12, btrfs raid56 mode, as such, has the known major bugs fixed 
and is ready for "still cautious use"[1], but for rather technical 
reasons discussed below, may not actually meet people's general 
expectations for what btrfs raid56 should be in reliability terms.

And that's the long term 3+ years out bit that waxhead was talking about.

> I didn't see any changes in that sector, is something still being worked
> on or it's stuck waiting for something ?

Actually, if you look on the wiki page, there were indeed raid56 changes 
in 4.17.

https://btrfs.wiki.kernel.org/index.php/Changelog#v4.17_.28Jun_2018.29


* raid56:
** make sure target is identical to source when raid56 rebuild fails 
after dev-replace
** faster rebuild during scrub, batch by stripes and not block-by-block
** make more use of cached data when rebuilding from a missing device


Tho that's actually the small stuff, ignoring the "elephant in the room": 
the raid56 reliability expectations mentioned earlier, which will likely 
take years to deal with.

As for those long term issues...

The "elephant in the room" problem is simply the parity-raid "write hole" 
common to all parity-raid systems, unless they've taken specific measures 
to work around the issue in one way or another.


In simple terms, the "write hole" problem is just that parity-raid makes 
the assumption that an update to a stripe including its parity is atomic, 
it happens all at once, so that it's impossible for the parity to be out 
of sync with the data actually written on all the other stripe-component 
devices.  In "real life", that's an invalid assumption.  Should the 
system crash at the wrong time, in the middle of a stripe update, it's 
quite possible that the parity will not match what's actually written to 
the data devices in the stripe, because either the parity will have been 
updated while at least one data device was still writing at the time of 
the crash, or the data will be updated but the parity device won't have 
finished writing yet at the time of the crash.  Either way, the parity 
doesn't match the data that's actually in the stripe, and should a device 
be/go missing so the parity is actually needed to recover the missing 
data, that missing data will be calculated incorrectly because the parity 
doesn't match what the data actually was.
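
A toy illustration of that failure, assuming a stripe of just two data
bytes plus XOR parity (this is nothing like the real btrfs raid56 code,
and all names are made up):

```c
#include <stdint.h>

/* Toy stripe: two data bytes plus XOR parity, one per device. */
struct stripe { uint8_t d0, d1, parity; };

/* Rebuild d1 from the surviving data device and the parity. */
static uint8_t rebuild_d1(const struct stripe *s)
{
	return s->d0 ^ s->parity;
}

/*
 * Simulate a crash in the middle of an in-place update of d0: the new
 * data reaches its device but the parity write never happens.
 */
static void torn_update_d0(struct stripe *s, uint8_t new_d0)
{
	s->d0 = new_d0;
	/* crash here: s->parity still describes the old d0 */
}
```

Before the torn update, rebuild_d1() returns the real d1.  Afterward, if 
the d1 device goes missing, the rebuild returns garbage, even tho d1 
itself was never written to.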

Now as I already stated, that's a known problem common to parity-raid in 
general, so it's not unique at all to btrfs.

The problem specific to btrfs, however, is that in general it's copy-on-
write, with checksumming to guard against invalid data, so in general, it 
provides higher guarantees of data integrity than does a normal update-in-
place filesystem, and it'd be quite reasonable for someone to expect 
those guarantees to extend to btrfs raid56 mode as well, but they don't.

They don't, because while btrfs in general is copy-on-write and thus 
atomic update (in the event of a crash you get either the data as it was 
before the write or the completely written data, not some unpredictable 
mix of before and after), btrfs parity-raid stripes are *NOT* copy-on-
write, they're update-in-place, meaning the write-hole problem applies, 
and in the event of a crash when the parity-raid was already degraded, 
the integrity of the data or metadata being parity-raid written at the 
time of the crash is not guaranteed, nor at present, with the current 
raid56 implementation, /can/ it be guaranteed.

But as I said, the write hole problem is common to parity-raid in 
general, so for people that understand the problem and are prepared to 
deal with its reliability implications[3], btrfs raid56 mode 
should be reasonably ready for still cautious use, even tho it doesn't 
carry the same data integrity and reliability guarantees that btrfs in 
general does.

As for working around or avoiding the write-hole problem entirely, 
there are (at least) four possible solutions, each with its own drawbacks.

The arguably "most proper" but also longest term solution would be to 
rewrite btrfs raid56 mode so it does copy-on-write for partial-stripes in 
parity-mode as well (full-st
