On Wed, May 9, 2018 at 2:10 AM, robbieko <robbi...@synology.com> wrote:
> Filipe Manana 於 2018-05-08 19:12 寫到:
>>
>> On Tue, May 8, 2018 at 11:30 AM, robbieko <robbi...@synology.com> wrote:
>>>
>>> Hi Filipe Manana,
>>>
>>> Although the snapshot is readonly, when the snapshot is created,
>>> in order to modify the last_snapshot, it will cause generation changes,
>>> and it will affect the commit_root modification.
>>
>>
>> So what you to say is that the problem happens when creating a
>> snapshot of snapshot that is being used by send.
>> You don't mention that anywhere, except in the example below.
>>
>> Just say that in the changelog, that the problem can happen if while
>> we are doing a send one of the snapshots used (parent or send) is
>> snapshotted, because snapshoting implies COWing the root of the source
>> subvolume/snaphot.
>> That is not mentioned anywhere in the changelog.
>
>
> OK, I will add this information to changelog.
>
>
>>
>> Now, why does this happen without your patch?
>> Before we use the commit roots, we call extent_buffer_get() on them to
>> make sure they are not released and we do that while holding the
>> semaphore commit_root_sem. Why isn't this enough to prevent
>> use-after-free problems?
>>
>
> Although we use extent_buffer_get to prevent the extent_buffer
> from being released, because of the snapshoting implies COWing
> the root, the commit_root original space (A) is released.
> Therefore, when other process need alloc the new tree block,
> it may use the space of A.
> However, alloc_extent_buffer is created by the bytenr.
> It will first find out if there is an existing extent_buffer through
> find_extent_buffer and cause the original extent_buffer to be modified.
> Eventually causing send process to access illegal memory.
> Thus extent_buffer_get can only prevent extent_buffer from being released,
> but it cannot prevent extent_buffer from being used by others.
>
> btrfs_alloc_tree_block
> --btrfs_reserve_extent (get A bytenr)
> --btrfs_init_new_buffer (use A bytenr)
> ----btrfs_find_create_tree_block
> ------alloc_extent_buffer
> --------find_extent_buffer (get A)
>
> So we fixed the problem by copy commit_root to avoid accessing illegal
> memory

Here it should be mentioned as well how the space gets released
despite the fact that the extent buffer has references.
I.e. that __btrfs_cow_block() calls btrfs_free_tree_block().

Please add all this information, in detail, to the changelog.
The changelog you provided had no useful information at all - didn't
mention how the problem can happen nor why.

thanks

>
>
>
>> That should be explained as well in the changelog.
>>
>> Those are the 2 pieces of information you need to put in the changelog.
>>
>>>
>>> Example:
>>> #btrfs sub show snap1
>>>         Name:                   snap1
>>>         UUID:                   17ba3140-b79d-1649-a2e1-b238c3253a40
>>>         Parent UUID:    6fe90a14-02f3-1b40-bdfc-be435060d023
>>>         Received UUID:  17ba3140-b79d-1649-a2e1-b238c3253a40
>>>         Creation time:  2018-04-27 18:04:11 +0800
>>>         Subvolume ID:   514
>>>         Generation:             4854
>>>         Gen at creation:        506
>>>         Parent ID:              511
>>>         Top level ID:   511
>>>         Flags:                  readonly
>>>         Snapshot(s):
>>>
>>> #btrfs-debug-tree -t 514 /dev/md2 | more
>>> btrfs-progs v4.0
>>> file tree key (514 ROOT_ITEM 506)
>>> node 259127066624 level 2 items 192 free 301 generation 4854 owner 514
>>> fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
>>> chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
>>>         key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505
>>>
>>> #btrfs sub snap send_snap1 test_subvol
>>> #btrfs-debug-tree -t 514 /dev/md2 | more
>>> btrfs-progs v4.0
>>> file tree key (514 ROOT_ITEM 506)
>>> node 258444509184 level 2 items 192 free 301 generation 4881 owner 514
>>> fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
>>> chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
>>>         key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505
>>>
>>> According to the above example node and generation will be updated.
>>>
>>> This means that as long as the send or parent snapshot is taken during
>>> the btrfs send, commit_root will change, and send process may access
>>> invalid memory.
>>>
>>> Thanks.
>>> Robbie Ko
>>>
>>>
>>> Filipe Manana 於 2018-05-08 17:15 寫到:
>>>>
>>>>
>>>> On Tue, May 8, 2018 at 9:15 AM, robbieko <robbi...@synology.com> wrote:
>>>>
>>>>> From: Robbie Ko <robbi...@synology.com>
>>>>>
>>>>> [BUG]
>>>>> btrfs send BUG on parent commit_root node receive destination
>>>>> is at the same volume.
>>>>
>>>>
>>>>
>>>> I can't make sense of that sentence.
>>>>
>>>>>
>>>>> [Example]
>>>>> btrfs send -e -p /mnt/snap1/ /mnt/snap2/ | btrfs receive -e /mnt/temp
>>>>
>>>>
>>>>
>>>> So, is -e necessary?
>>>> So doing that, for any parent and send snapshots, triggers the bug?
>>>> (and regardless of -e)
>>>> If that was enough I'm pretty sure almost everyone would be reporting
>>>> send/receive as unusable.
>>>> Please clarify.
>>>>
>>>>>
>>>>> [REASON]
>>>>> 1. When send with the parent, the send process will get the parent
>>>>>    commit_root(A), then send commit_root(B) and then compare the
>>>>>    tree between A and B.
>>>>> 2. The receive process will take snapshot from parent, then the parent
>>>>>    commit_root changes from A to C, so A will be released.
>>>>
>>>>
>>>>
>>>> I don't understand this. How can the commit_root change?
>>>> The snapshot is readonly, it's can't be modified after it's created.
>>>>
>>>>> 3. Then A will be assigned to other leaf nodes.
>>>>> 4. Therefore, the send process will use invalid A, resulting in
>>>>>    invalid memory access.
>>>>>
>>>>> [FIX]
>>>>> We create new root and copy from parent/send commit_root to
>>>>> avoid invalid memory access.
>>>>>
>>>>> CallTrace looks like this:
>>>>>  ------------[ cut here ]------------
>>>>>  kernel BUG at fs/btrfs/ctree.c:1861!
>>>>>  invalid opcode: 0000 [#1] SMP
>>>>>  CPU: 6 PID: 24235 Comm: btrfs Tainted: P           O 3.10.105 #23721
>>>>
>>>>
>>>>
>>>> 3.10 kernel....
>>>> You should reproduce any bugs you find on more recent kernels...
>>>>
>>>>>  ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000
>>>>>  RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs]
>>>>>  RSP: 0018:ffff88041b723b68  EFLAGS: 00010246
>>>>>  RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000
>>>>>  RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000
>>>>>  RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66
>>>>>  R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890
>>>>>  R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001
>>>>>  FS:  00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000)
>>>>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>  R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0
>>>>>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>  Stack:
>>>>>  ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880
>>>>>  ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50
>>>>>  ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400
>>>>>  Call Trace:
>>>>>  [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs]
>>>>>  [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs]
>>>>>  [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs]
>>>>>  [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs]
>>>>>  [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs]
>>>>>  [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs]
>>>>>  [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990
>>>>>  [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20
>>>>>  [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0
>>>>>  [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0
>>>>>  [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500
>>>>>  [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90
>>>>>  [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990
>>>>>  [<ffffffff81034f83>] ? do_fork+0x113/0x3b0
>>>>>  [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c
>>>>>  [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0
>>>>>  [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b
>>>>>  ---[ end trace 29576629ee80b2e1 ]---
>>>>>
>>>>> Signed-off-by: Robbie Ko <robbi...@synology.com>
>>>>> ---
>>>>>  fs/btrfs/ctree.c | 26 ++++++++++++++++++++++++--
>>>>>  1 file changed, 24 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>>>> index b88a79e..c9ce52f 100644
>>>>> --- a/fs/btrfs/ctree.c
>>>>> +++ b/fs/btrfs/ctree.c
>>>>> @@ -5398,6 +5398,8 @@ int btrfs_compare_trees(struct btrfs_root
>>>>> *left_root,
>>>>>         u64 right_blockptr;
>>>>>         u64 left_gen;
>>>>>         u64 right_gen;
>>>>> +       struct extent_buffer *left_root_node = NULL;
>>>>> +       struct extent_buffer *right_root_node = NULL;
>>>>>
>>>>>         left_path = btrfs_alloc_path();
>>>>>         if (!left_path) {
>>>>> @@ -5416,6 +5418,20 @@ int btrfs_compare_trees(struct btrfs_root
>>>>> *left_root,
>>>>>                 goto out;
>>>>>         }
>>>>>
>>>>> +       left_root_node = alloc_dummy_extent_buffer(left_root->fs_info,
>>>>> 0);
>>>>> +       if (!left_root_node) {
>>>>> +               ret = -ENOMEM;
>>>>> +               goto out;
>>>>> +       }
>>>>> +       extent_buffer_get(left_root_node);
>>>>> +
>>>>> +       right_root_node = alloc_dummy_extent_buffer(left_root->fs_info,
>>>>> 0);
>>>>> +       if (!right_root_node) {
>>>>> +               ret = -ENOMEM;
>>>>> +               goto out;
>>>>> +       }
>>>>> +       extent_buffer_get(right_root_node);
>>>>> +
>>>>>         left_path->search_commit_root = 1;
>>>>>         left_path->skip_locking = 1;
>>>>>         right_path->search_commit_root = 1;
>>>>> @@ -5460,12 +5476,16 @@ int btrfs_compare_trees(struct btrfs_root
>>>>> *left_root,
>>>>>         down_read(&fs_info->commit_root_sem);
>>>>>         left_level = btrfs_header_level(left_root->commit_root);
>>>>>         left_root_level = left_level;
>>>>> -       left_path->nodes[left_level] = left_root->commit_root;
>>>>> +       copy_extent_buffer_full(left_root_node,
>>>>> left_root->commit_root);
>>>>> +       set_bit(EXTENT_BUFFER_UPTODATE, &left_root_node->bflags);
>>>>> +       left_path->nodes[left_level] = left_root_node;
>>>>>         extent_buffer_get(left_path->nodes[left_level]);
>>>>>
>>>>>         right_level = btrfs_header_level(right_root->commit_root);
>>>>>         right_root_level = right_level;
>>>>> -       right_path->nodes[right_level] = right_root->commit_root;
>>>>> +       copy_extent_buffer_full(right_root_node,
>>>>> right_root->commit_root);
>>>>> +       set_bit(EXTENT_BUFFER_UPTODATE, &right_root_node->bflags);
>>>>> +       right_path->nodes[right_level] = right_root_node;
>>>>>         extent_buffer_get(right_path->nodes[right_level]);
>>>>>         up_read(&fs_info->commit_root_sem);
>>>>>
>>>>> @@ -5613,6 +5633,8 @@ int btrfs_compare_trees(struct btrfs_root
>>>>> *left_root,
>>>>>  out:
>>>>>         btrfs_free_path(left_path);
>>>>>         btrfs_free_path(right_path);
>>>>> +       free_extent_buffer(left_root_node);
>>>>> +       free_extent_buffer(right_root_node);
>>>>>         kvfree(tmp_buf);
>>>>>         return ret;
>>>>>  }
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>>>>> in
>>>>> the body of a message to majord...@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to