2016-04-01 18:15 GMT-07:00 Anand Jain <anand.j...@oracle.com>:
>>>> Issue 2.
>>>> At the start of autoreplacing a drive by hotspare, the kernel crashes in
>>>> transaction handling code (inside btrfs_commit_transaction(), called by
>>>> the autoreplace initiating routines). I 'fixed' this by removing the
>>>> closing of the bdev in btrfs_close_one_device_dont_free(), see
>>>>
>>>> https://bitbucket.org/jekhor/linux-btrfs/commits/dfa441c9ec7b3833f6a5e4d0b6f8c678faea29bb?at=master
>>>> (oops text is attached also). The bdev is closed after the replace by
>>>> btrfs_dev_replace_finishing(), so this is safe, but it doesn't seem to
>>>> be the right way.
>>>
>>>
>>>   I have sent out V2. I don't see that issue with it;
>>>   could you please try?
>>
>>
>> Yes, it is reproducible on the v4.4.5 kernel. I will try with the current
>> 'for-linus-4.6' branch of Chris' tree soon.
>>
>> To emulate a drive failure, I disconnect the drive in VirtualBox, so the
>> bdev can be freed by the kernel after all references to it are released.
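
For reference, the VirtualBox disconnect can be scripted with VBoxManage,
roughly as below (the VM name, controller name and port are examples, not
my exact setup):

  # Detach the disk from the running VM to simulate a sudden drive failure.
  VBoxManage storageattach "btrfs-test-vm" \
      --storagectl "SATA" --port 1 --device 0 --medium none
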
>
>
>   So far the RAID group profile would adapt to a lower suitable
>   group profile when a device is missing/failed. This appears not
>   to be happening with RAID56, or there is stale IO which wasn't
>   flushed out. Anyway, to have this fixed I am moving the patch
>    btrfs: introduce device dynamic state transition to offline or failed
>   to the top in v3 for any potential changes.
>   But first we need a reliable test case, or a very carefully
>   crafted test case which can create this situation.
>
>   Below is the dm-error setup that I am using for testing, which
>   apparently doesn't report this issue. Could you please try on V3?
>   (Please note the device names are hard-coded in the test script,
>   sorry about that.) This would eventually become an fstests script.

Hi,

I have reproduced this oops with the attached script. I don't use any dm
layer, but just detach the drive at the SCSI layer as xfstests does (the
device management functions were copy-pasted from it).
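
For anyone without the attachment handy, a minimal sketch of the
SCSI-layer detach is below. This is not the attached script itself (that
one uses the helpers copied from xfstests); the device and SCSI host
names are just examples:

  # Simulate a sudden drive failure by deleting the device at the SCSI layer.
  echo 1 > /sys/block/sdb/device/delete

  # ... let btrfs notice the missing device and start the autoreplace ...

  # Later, bring the device back by rescanning the SCSI host it was on.
  echo "- - -" > /sys/class/scsi_host/host2/scan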

Attachment: test-autoreplace2-mainline.sh
Description: Bourne shell script
