Hi Josef,
is this the commit to look at:
6df9a95e63395f595d0d1eb5d561dd6c91c40270 ("Btrfs: make the chunk
allocator completely tree lockless")

or are other commits also relevant?

Alex.


On Tue, Feb 18, 2014 at 6:06 PM, Josef Bacik <jba...@fb.com> wrote:
>
>
>
> On 02/18/2014 10:47 AM, Alex Lyakas wrote:
>> Hello Josef,
>>
>> On Tue, Dec 18, 2012 at 3:52 PM, Josef Bacik <jba...@fusionio.com>
>> wrote:
>>> On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
>>>> A user reported that he hit an annoying deadlock while playing
>>>> with ceph on top of btrfs.
>>>>
>>>> Currently, updating the device tree requires space from a METADATA
>>>> chunk, so we -may- need to do a recursive chunk allocation when
>>>> adding/updating a dev extent; that is where the deadlock comes
>>>> from.
>>>>
>>>> If we use SYSTEM metadata to update the device tree, we can avoid
>>>> the recursion.
>>>>
>>>
>>> This is going to cause us to allocate many more system chunks than
>>> we used to, which could land us in trouble.  Instead let's just
>>> keep ourselves from re-entering if we're already allocating a
>>> chunk.  We do the chunk allocation when we don't have enough space
>>> for a cluster, but we'll likely have plenty of space to make an
>>> allocation.  Can you give this patch a try, Jim, and see if it
>>> fixes your problem?  Thanks,
>>>
>>> Josef
>>>
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index e152809..59df5e7 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -3564,6 +3564,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>>>  	int wait_for_alloc = 0;
>>>  	int ret = 0;
>>>  
>>> +	/* Don't re-enter if we're already allocating a chunk */
>>> +	if (trans->allocating_chunk)
>>> +		return -ENOSPC;
>>> +
>>>  	space_info = __find_space_info(extent_root->fs_info, flags);
>>>  	if (!space_info) {
>>>  		ret = update_space_info(extent_root->fs_info, flags,
>>> @@ -3606,6 +3610,8 @@ again:
>>>  		goto again;
>>>  	}
>>>  
>>> +	trans->allocating_chunk = true;
>>> +
>>>  	/*
>>>  	 * If we have mixed data/metadata chunks we want to make sure we keep
>>>  	 * allocating mixed chunks instead of individual chunks.
>>> @@ -3632,6 +3638,7 @@ again:
>>>  	check_system_chunk(trans, extent_root, flags);
>>>  
>>>  	ret = btrfs_alloc_chunk(trans, extent_root, flags);
>>> +	trans->allocating_chunk = false;
>>>  	if (ret < 0 && ret != -ENOSPC)
>>>  		goto out;
>>>  
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index e6509b9..47ad8be 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -388,6 +388,7 @@ again:
>>>  	h->qgroup_reserved = qgroup_reserved;
>>>  	h->delayed_ref_elem.seq = 0;
>>>  	h->type = type;
>>> +	h->allocating_chunk = false;
>>>  	INIT_LIST_HEAD(&h->qgroup_ref_list);
>>>  	INIT_LIST_HEAD(&h->new_bgs);
>>>  
>>> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
>>> index 0e8aa1e..69700f7 100644
>>> --- a/fs/btrfs/transaction.h
>>> +++ b/fs/btrfs/transaction.h
>>> @@ -68,6 +68,7 @@ struct btrfs_trans_handle {
>>>  	struct btrfs_block_rsv *orig_rsv;
>>>  	short aborted;
>>>  	short adding_csums;
>>> +	bool allocating_chunk;
>>>  	enum btrfs_trans_type type;
>>>  	/*
>>>  	 * this root is only needed to validate that the root passed to
>>
>> I hit this problem in the following scenario:
>>
>> - a data chunk allocation is triggered, and locks chunk_mutex
>> - the same thread now also wants to allocate a metadata chunk, so it
>>   recursively calls do_chunk_alloc, but cannot lock the chunk_mutex
>>   => deadlock
>> - btrfs has only one metadata chunk, the one that was initially
>>   allocated by mkfs; it has:
>>     total_bytes=8388608
>>     bytes_used=8130560
>>     bytes_pinned=77824
>>     bytes_reserved=180224
>>   so bytes_used + bytes_pinned + bytes_reserved == total_bytes
>>
>> Your patch would have returned ENOSPC and avoided the deadlock, but
>> there would still have been a failure to allocate a tree block for
>> metadata, so the transaction would probably have aborted.
>>
>> How such situation should be handled?
>>
>> Idea1:
>> - lock chunk_mutex
>> - if we are allocating a data chunk, check whether the metadata
>>   space is below some threshold; if yes, go and allocate a metadata
>>   chunk first and only then the data chunk
>>
>> Idea2:
>> - check if we are the same thread that already locked the
>>   chunk_mutex; if yes, allow the recursive call but don't attempt to
>>   lock/unlock the chunk_mutex this time
>>
>> Or some other way?
>>
>
> I fixed this with the delayed chunk allocation work, which doesn't
> actually do the block group creation until we end the transaction, so
> we can allocate metadata chunks without any issue.
> Thanks,
>
> Josef
