2016-04-01 18:15 GMT-07:00 Anand Jain <anand.j...@oracle.com>:
>>>> Issue 2.
>>>> At start of autoreplacig drive by hotspare, kernel craches in
>>>> transaction handling code (inside of btrfs_commit_transaction()
>>>> called by autoreplace initiating routines). I 'fixed' this by
>>>> removing of closing of bdev in
>>>> btrfs_close_one_device_dont_free(), see
>>>>
>>>> https://bitbucket.org/jekhor/linux-btrfs/commits/dfa441c9ec7b3833f6a5e4d0b6f8c678faea29bb?at=master
>>>> (oops text is attached also). Bdev is closed after replacing by
>>>> btrfs_dev_replace_finishing(), so this is safe but doesn't seem
>>>> to be right way.
>>>
>>>
>>> I have sent out V2. I don't see that issue with this,
>>> could you pls try ?
>>
>>
>> Yes, it reproduced on v4.4.5 kernel. I will try with current
>> 'for-linus-4.6' Chris' tree soon.
>>
>> To emulate a drive failure, I disconnect the drive in VirtualBox, so bdev
>> can be freed by kernel after releasing of all references to it.
>
>
> So far the raid group profile would adapt to lower suitable
> group profile when device is missing/failed. This appears to
> be not happening with RAID56 OR there are stale IO which wasn't
> flushed out. Anyway to have this fixed I am moving the patch
>    btrfs: introduce device dynamic state transition to offline or failed
> to the top in v3 for any potential changes.
> But firstly we need a reliable test case, or a very carefully
> crafted test case which can create this situation
>
> Here below is the dm-error that I am using for testing, which
> apparently doesn't report this issue. Could you please try on V3. ?
> (pls note the device names are hard coded in the test script
> sorry about that) This would eventually be fstests script.

Hi,

I have reproduced this oops with the attached script. I don't use any dm layer; I just detach the drive at the SCSI layer, as xfstests does (the device management functions were copied from it).
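
For anyone reading without the attachment, detaching and re-attaching a drive at the SCSI layer can be sketched roughly as below. This is not the attached script, only a minimal sketch in the spirit of the xfstests device management helpers. The function names scsi_detach and scsi_attach and the SYSFS_ROOT variable are made up for illustration; SYSFS_ROOT exists only so the functions can be dry-run against a scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Sketch only: detach and re-attach a SCSI disk through sysfs.
# SYSFS_ROOT is a hypothetical override (an assumption, not part of the
# attached script); by default the real /sys tree is used.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Detach the device at address host:channel:target:lun, e.g. "2:0:0:0".
# Writing 1 to the "delete" attribute asks the kernel to remove the device.
scsi_detach() {
	hctl="$1"
	echo 1 > "$SYSFS_ROOT/class/scsi_device/$hctl/device/delete"
}

# Re-attach by asking the host adapter to scan that address again.
# The "scan" attribute expects "channel target lun" separated by spaces.
scsi_attach() {
	hctl="$1"
	host="${hctl%%:*}"                       # "2:0:0:0" -> "2"
	ctl="$(echo "${hctl#*:}" | tr ':' ' ')"  # "2:0:0:0" -> "0 0 0"
	echo "$ctl" > "$SYSFS_ROOT/class/scsi_host/host$host/scan"
}
```

On a real system this needs root, and the host:channel:target:lun address of the drive has to be looked up first (lsscsi prints it in square brackets). A test would call scsi_detach with that address while the filesystem is mounted and busy, then scsi_attach with the same address to bring the drive back.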
test-autoreplace2-mainline.sh
Description: Bourne shell script