On Fri, 07 Feb 2014 17:17:45 +0800, Miao Xie wrote: > On Fri, 07 Feb 2014 09:14:42 +0100, Stefan Behrens wrote: >> On Thu, 30 Jan 2014 16:46:55 +0800, Miao Xie wrote: >>> During device replace test, we hit a null pointer deference (It was very >>> easy >>> to reproduce it by running xfstests' btrfs/011 on the devices with the >>> virtio >>> scsi driver). There were two bugs that caused this problem: >>> - We might allocate new chunks on the replaced device after we updated >>> the mapping tree. And we forgot to replace the source device in those >>> mapping of the new chunks. >>> - We might get the mapping information which including the source device >>> before the mapping information update. And then submit the bio which was >>> based on that mapping information after we freed the source device. >>> >>> For the first bug, we can fix it by doing mapping tree update and source >>> device remove in the same context of the chunk mutex. The chunk mutex is >>> used to protect the allocable device list, the above method can avoid >>> the new chunk allocation, and after we remove the source device, all >>> the new chunks will be allocated on the new device. So it can fix >>> the first bug. >>> >>> For the second bug, we need make sure all flighting bios are finished and >>> no new bios are produced during we are removing the source device. To fix >>> this problem, we introduced a global @bio_counter, we not only inc/dec >>> @bio_counter outsize of map_blocks, but also inc it before submitting bio >>> and dec @bio_counter when ending bios. >>> >>> Since Raid56 is a little different and device replace dosen't support raid56 >>> yet, it is not addressed in the patch and I add comments to make sure we >>> will >>> fix it in the future. >>> >>> Reported-by: Qu Wenruo <quwen...@cn.fujitsu.com> >>> Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com> >>> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com> >>> --- >>> fs/btrfs/ctree.h | 9 ++++++ >>> fs/btrfs/dev-replace.c | 74 >>> +++++++++++++++++++++++++++++++++++++++++++++----- >>> fs/btrfs/disk-io.c | 12 +++++++- >>> fs/btrfs/volumes.c | 30 +++++++++++++++----- >>> fs/btrfs/volumes.h | 1 + >>> 5 files changed, 111 insertions(+), 15 deletions(-) [...] >>> static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, >>> int scrub_ret) >>> { >>> @@ -458,12 +487,6 @@ static int btrfs_dev_replace_finishing(struct >>> btrfs_fs_info *fs_info, >>> src_device = dev_replace->srcdev; >>> btrfs_dev_replace_unlock(dev_replace); >>> >>> - /* replace old device with new one in mapping tree */ >>> - if (!scrub_ret) >>> - btrfs_dev_replace_update_device_in_mapping_tree(fs_info, >>> - src_device, >>> - tgt_device); >>> - >> >> Isn't this change the reason that you need to add code to count bios? > > No, the change is not the reason to introduce a bio counter. > > The above two bugs are independent, the second bug can happen even the first > one > doesn't exist, for example: > Task1 Task2 > btrfs_readpages() > btrfs_map_bio() > __btrfs_map_block() > <be blocked for a long time>
You are right! Thanks. > device replace start > device replace finish > release source device > bio_size_ok() > bdev_get_queue() > oops > >> I mean, code is there to flush everything to disk. It's needed for sync, >> too. When btrfs_dev_replace_finishing() is called, the copy operation is >> finished and the replaced and the replacement drive are logically >> identical except that data is not yet forced to be flushed to disk. The >> only difference between both disks is the bdev block device pointer and >> the btrfs device pointer. >> >> I think this is the correct and lightweight procedure: >> >> 1. After the copy operation finished, from now on, always use the new >> btrfs device and the new block device. >> 2. Flush all outstanding bios and wait for them. >> 3. Close the block dev and free the btrfs device. >> >> You said that the problem was that we might allocate new chunks on the >> replaced device after we updated the mapping tree. Well, can't we just >> make sure the new chunks are allocated using the new device pointers >> after step 1 and all issues are resolved? I think that the first change >> is not a good idea and causes the other issues that you fix with a lot >> of code. > > Step1 can fix the first problem, but need more code to recover the mapping > tree when the flush fails. > > And as I said above, the second bug is not introduced by the fix of the first > bug, we have to add a bio counter. Any good idea? > > BTW, the reason why I mixed two fixes into one patch is that we could not fix > the oops by any one of them. I can split it into two patches if need. > > Thanks > Miao -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html