Re: [PATCH 2/2] Btrfs: fix use-after-free in the finishing procedure of the device replace

Stefan Behrens Fri, 07 Feb 2014 01:43:55 -0800

On Fri, 07 Feb 2014 17:17:45 +0800, Miao Xie wrote:
> On Fri, 07 Feb 2014 09:14:42 +0100, Stefan Behrens wrote:
>> On Thu, 30 Jan 2014 16:46:55 +0800, Miao Xie wrote:
>>> During device replace test, we hit a null pointer deference (It was very 
>>> easy
>>> to reproduce it by running xfstests' btrfs/011 on the devices with the 
>>> virtio
>>> scsi driver). There were two bugs that caused this problem:
>>> - We might allocate new chunks on the replaced device after we updated
>>>   the mapping tree. And we forgot to replace the source device in those
>>>   mapping of the new chunks.
>>> - We might get the mapping information which including the source device
>>>   before the mapping information update. And then submit the bio which was
>>>   based on that mapping information after we freed the source device.
>>>
>>> For the first bug, we can fix it by doing mapping tree update and source
>>> device remove in the same context of the chunk mutex. The chunk mutex is
>>> used to protect the allocable device list, the above method can avoid
>>> the new chunk allocation, and after we remove the source device, all
>>> the new chunks will be allocated on the new device. So it can fix
>>> the first bug.
>>>
>>> For the second bug, we need make sure all flighting bios are finished and
>>> no new bios are produced during we are removing the source device. To fix
>>> this problem, we introduced a global @bio_counter, we not only inc/dec
>>> @bio_counter outsize of map_blocks, but also inc it before submitting bio
>>> and dec @bio_counter when ending bios.
>>>
>>> Since Raid56 is a little different and device replace dosen't support raid56
>>> yet, it is not addressed in the patch and I add comments to make sure we 
>>> will
>>> fix it in the future.
>>>
>>> Reported-by: Qu Wenruo <quwen...@cn.fujitsu.com>
>>> Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
>>> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/ctree.h       |  9 ++++++
>>>  fs/btrfs/dev-replace.c | 74 
>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>  fs/btrfs/disk-io.c     | 12 +++++++-
>>>  fs/btrfs/volumes.c     | 30 +++++++++++++++-----
>>>  fs/btrfs/volumes.h     |  1 +
>>>  5 files changed, 111 insertions(+), 15 deletions(-)
[...]
>>>  static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>>>                                    int scrub_ret)
>>>  {
>>> @@ -458,12 +487,6 @@ static int btrfs_dev_replace_finishing(struct 
>>> btrfs_fs_info *fs_info,
>>>     src_device = dev_replace->srcdev;
>>>     btrfs_dev_replace_unlock(dev_replace);
>>>  
>>> -   /* replace old device with new one in mapping tree */
>>> -   if (!scrub_ret)
>>> -           btrfs_dev_replace_update_device_in_mapping_tree(fs_info,
>>> -                                                           src_device,
>>> -                                                           tgt_device);
>>> -
>>
>> Isn't this change the reason that you need to add code to count bios?
> 
> No, the change is not the reason to introduce a bio counter.
> 
> The above two bugs are independent, the second bug can happen even the first 
> one
> doesn't exist, for example:
>       Task1                   Task2
>                               btrfs_readpages()
>                                 btrfs_map_bio()
>                                   __btrfs_map_block()
>                                   <be blocked for a long time>


You are right!
Thanks.


>       device replace start
>       device replace finish
>         release source device
>                                   bio_size_ok()
>                                     bdev_get_queue()
>                                       oops
> 
>> I mean, code is there to flush everything to disk. It's needed for sync,
>> too. When btrfs_dev_replace_finishing() is called, the copy operation is
>> finished and the replaced and the replacement drive are logically
>> identical except that data is not yet forced to be flushed to disk. The
>> only difference between both disks is the bdev block device pointer and
>> the btrfs device pointer.
>>
>> I think this is the correct and lightweight procedure:
>>
>> 1. After the copy operation finished, from now on, always use the new
>> btrfs device and the new block device.
>> 2. Flush all outstanding bios and wait for them.
>> 3. Close the block dev and free the btrfs device.
>>
>> You said that the problem was that we might allocate new chunks on the
>> replaced device after we updated the mapping tree. Well, can't we just
>> make sure the new chunks are allocated using the new device pointers
>> after step 1 and all issues are resolved? I think that the first change
>> is not a good idea and causes the other issues that you fix with a lot
>> of code.
> 
> Step1 can fix the first problem, but need more code to recover the mapping
> tree when the flush fails.
> 
> And as I said above, the second bug is not introduced by the fix of the first
> bug, we have to add a bio counter. Any good idea?
> 
> BTW, the reason why I mixed two fixes into one patch is that we could not fix
> the oops by any one of them. I can split it into two patches if need. 
> 
> Thanks
> Miao


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] Btrfs: fix use-after-free in the finishing procedure of the device replace

Reply via email to