On Tue, Apr 16, 2019 at 12:32 PM Qu Wenruo <w...@suse.de> wrote: > > > > On 2019/4/16 下午7:22, Filipe Manana wrote: > > On Tue, Apr 16, 2019 at 11:23 AM Qu Wenruo <w...@suse.com> wrote: > >> > >> There is a hidden call loop which can trigger itself: > >> > >> btrfs_reserve_extent() <--| > >> |- do_chunk_alloc() | > >> |- btrfs_alloc_chunk() | > >> |- btrfs_insert_item() | > >> |- btrfs_reserve_extent() <--| > >> > >> Currently, we're using root->ref_cows to determine whether we should do > >> chunk prealloc to avoid such loop. > >> > >> But that's still a hidden trap. Instead of solving it using some hidden > >> tricks, this patch will make chunk/block group allocation exclusive. > >> > >> Now if do_chunk_alloc() determines to alloc chunk, it will set a special > >> flag in transaction handler so it new do_chunk_alloc() will refuse to > >> allocate new chunk until current chunk allocation finishes. > > > > What if the chunk allocated during the recursion is of a different type? > > I.e. we're allocating a data chunk, then when inserting the items in the > > extent, chunk, device, etc trees, we run out of metadata data space and > > need to allocate a metadata chunk. What you are doing skips the allocation > > of the metadata chunk and makes the inserts/updates of the trees fail > > with ENOSPC. > > That's why we do preallocation to avoid such problem.
Ok, I wasn't aware about such preallocation in btrfs-progs. > > Just as the old code shows, even when we're allocating data chunks, it > will try to check metadata space first. > > And for the profile we're asking, we over-allocate 2M, definitely not > the best solution but should be enough for extent/chunk tree update. > > To make it as good as kernel, we need a lot of accounting which is not > here in btrfs-progs. > > So in your case, every btrfs_reserve_extent() call should either trigger > chunk allocation (either metadata or data, or both), or have 2M overhead. > > If that 2M can not meet the metadata space requirement for > chunk/extent/root tree update, then we hit the problem you described. > > However 2M looks pretty generous to me, so it should just work. Yes, that should be enough even for 64Kb nodes and trees with 8 (max) levels. thanks > > Thanks, > Qu > > > > > thanks > > > > > > > >> > >> Signed-off-by: Qu Wenruo <w...@suse.com> > >> --- > >> check/main.c | 2 +- > >> extent-tree.c | 12 ++++++++++++ > >> transaction.c | 3 ++- > >> transaction.h | 3 ++- > >> 4 files changed, 17 insertions(+), 3 deletions(-) > >> > >> diff --git a/check/main.c b/check/main.c > >> index 683c322eba6f..121be4b73c4f 100644 > >> --- a/check/main.c > >> +++ b/check/main.c > >> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv) > >> goto close_out; > >> } > >> > >> - trans->reinit_extent_tree = true; > >> + trans->reinit_extent_tree = 1; > >> if (init_extent_tree) { > >> printf("Creating a new extent tree\n"); > >> ret = reinit_extent_tree(trans, info, > >> diff --git a/extent-tree.c b/extent-tree.c > >> index 8c9cdeff3b02..e25ddf02e5cc 100644 > >> --- a/extent-tree.c > >> +++ b/extent-tree.c > >> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct > >> btrfs_trans_handle *trans, > >> (flags & BTRFS_BLOCK_GROUP_SYSTEM)) > >> return 0; > >> > >> + /* > >> + * We're going to allocate new chunk, during the process, we will > >> + * allocate new tree blocks, which can trigger new chunk allocation > >> + * again. > >> + * Avoid such loop call > >> + */ > >> + if (trans->allocating_chunk) > >> + return 0; > >> + trans->allocating_chunk = 1; > >> + > >> ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes, > >> space_info->flags); > >> if (ret == -ENOSPC) { > >> space_info->full = 1; > >> + trans->allocating_chunk = 0; > >> return 0; > >> } > >> > >> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle > >> *trans, > >> ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags, > >> start, num_bytes); > >> BUG_ON(ret); > >> + trans->allocating_chunk = 0; > >> return 0; > >> } > >> > >> diff --git a/transaction.c b/transaction.c > >> index 3a63988b0969..21377282f806 100644 > >> --- a/transaction.c > >> +++ b/transaction.c > >> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* > >> btrfs_start_transaction(struct btrfs_root *root, > >> fs_info->generation++; > >> h->transid = fs_info->generation; > >> h->blocks_reserved = num_blocks; > >> - h->reinit_extent_tree = false; > >> + h->reinit_extent_tree = 0; > >> + h->allocating_chunk = 0; > >> root->last_trans = h->transid; > >> root->commit_root = root->node; > >> extent_buffer_get(root->node); > >> diff --git a/transaction.h b/transaction.h > >> index 34060252dd5c..b426cd112447 100644 > >> --- a/transaction.h > >> +++ b/transaction.h > >> @@ -28,7 +28,8 @@ struct btrfs_trans_handle { > >> u64 transid; > >> u64 alloc_exclude_start; > >> u64 alloc_exclude_nr; > >> - bool reinit_extent_tree; > >> + unsigned int reinit_extent_tree:1; > >> + unsigned int allocating_chunk:1; > >> u64 delayed_ref_updates; > >> unsigned long blocks_reserved; > >> unsigned long blocks_used; > >> -- > >> 2.21.0 > >> > > > > > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”