Hi all,
While running xfs test testing some other features scheduled for 4.12
I came across this KASAN dump:
[ 638.913813] XFS (nvme0n1): Mounting V5 Filesystem
[ 638.917934] XFS (nvme0n1): Ending clean mount
[ 639.035070] blk_update_request: I/O error, dev nvme1n1, sector 0
[ 639.071764] XFS (nvme1n1): Mounting V5 Filesystem
[ 639.077052] XFS (nvme1n1): Ending clean mount
[ 639.110634] XFS (nvme0n1): Unmounting Filesystem
[ 639.260132] XFS (nvme0n1): Mounting V5 Filesystem
[ 639.264141] XFS (nvme0n1): Ending clean mount
[ 639.282112] run fstests generic/108 at 2017-04-20 14:30:05
[ 639.525274] XFS (nvme1n1): Unmounting Filesystem
[ 639.570999] scsi host2: scsi_debug: version 1.86 [20160430]
[ 639.570999] dev_size_mb=128, opts=0x0, submit_queues=1, statistics=0
[ 639.573698] scsi 2:0:0:0: Direct-Access Linux scsi_debug 0186
PQ: 0 ANSI: 7
[ 639.595927] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 639.598290] sd 2:0:0:0: [sda] 262144 512-byte logical blocks: (134 MB/128
MiB)
[ 639.599884] sd 2:0:0:0: [sda] Write Protect is off
[ 639.600246] sd 2:0:0:0: [sda] Mode Sense: 73 00 10 08
[ 639.602747] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled,
supports DPO and FUA
[ 639.626440] sd 2:0:0:0: [sda] Attached SCSI disk
[ 641.264666] XFS (dm-0): Mounting V5 Filesystem
[ 641.278227] XFS (dm-0): Ending clean mount
[ 641.394865] sd 2:0:0:0: rejecting I/O to offline device
[ 641.395353] sd 2:0:0:0: rejecting I/O to offline device
[ 641.395903] sd 2:0:0:0: rejecting I/O to offline device
[ 641.396362] sd 2:0:0:0: rejecting I/O to offline device
[ 641.396896] sd 2:0:0:0: rejecting I/O to offline device
[ 641.397347] sd 2:0:0:0: rejecting I/O to offline device
[ 641.397888] sd 2:0:0:0: rejecting I/O to offline device
[ 641.398358] sd 2:0:0:0: rejecting I/O to offline device
[ 641.400009] sd 2:0:0:0: rejecting I/O to offline device
[ 641.400378] blk_update_request: I/O error, dev sda, sector 0
[ 641.423308] XFS (dm-0): Unmounting Filesystem
[ 642.585631] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 642.637953]
==================================================================
[ 642.638860] BUG: KASAN: use-after-free in scsi_exit_rq+0xf3/0x120 at addr
ffff8802b7fedf00
[ 642.639362] Read of size 1 by task rcuos/5/53
[ 642.639713] CPU: 7 PID: 53 Comm: rcuos/6 Not tainted 4.11.0-rc5+ #13
[ 642.640170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 642.640923] Call Trace:
[ 642.641080] dump_stack+0x63/0x8f
[ 642.641289] kasan_object_err+0x21/0x70
[ 642.641531] kasan_report.part.1+0x231/0x500
[ 642.641823] ? scsi_exit_rq+0xf3/0x120
[ 642.642054] ? _raw_spin_unlock_irqrestore+0xe/0x10
[ 642.642353] ? free_percpu+0x1b7/0x340
[ 642.642586] ? put_task_stack+0x117/0x2b0
[ 642.642837] __asan_report_load1_noabort+0x2e/0x30
[ 642.643138] scsi_exit_rq+0xf3/0x120
[ 642.643366] free_request_size+0x44/0x60
[ 642.643614] mempool_destroy.part.6+0x9b/0x150
[ 642.643899] ? kasan_slab_free+0x87/0xb0
[ 642.644152] mempool_destroy+0x13/0x20
[ 642.644394] blk_exit_rl+0x36/0x40
[ 642.644614] blkg_free+0x146/0x200
[ 642.644836] __blkg_release_rcu+0x121/0x220
[ 642.645112] rcu_nocb_kthread+0x61f/0xca0
[ 642.645376] ? get_state_synchronize_rcu+0x20/0x20
[ 642.645690] ? pci_mmcfg_check_reserved+0x110/0x110
[ 642.646011] kthread+0x298/0x390
[ 642.646224] ? get_state_synchronize_rcu+0x20/0x20
[ 642.646535] ? kthread_park+0x160/0x160
[ 642.646787] ret_from_fork+0x2c/0x40
[ 642.647020] Object at ffff8802b7fedd80, in cache kmalloc-2048 size: 2048
[ 642.647435] Allocated:
[ 642.647591] PID = 3992
[ 642.647750] save_stack_trace+0x1b/0x20
[ 642.648000] save_stack+0x46/0xd0
[ 642.648208] kasan_kmalloc+0xad/0xe0
[ 642.648441] __kmalloc+0x134/0x220
[ 642.648664] scsi_host_alloc+0x6b/0x11c0
[ 642.648919] 0xffffffffc101d94a
[ 642.649124] driver_probe_device+0x49e/0xc60
[ 642.649398] __device_attach_driver+0x1d3/0x2a0
[ 642.649684] bus_for_each_drv+0x11a/0x1d0
[ 642.649946] __device_attach+0x1e1/0x2c0
[ 642.650200] device_initial_probe+0x13/0x20
[ 642.650557] bus_probe_device+0x19b/0x240
[ 642.651029] device_add+0x86d/0x1450
[ 642.651472] device_register+0x1a/0x20
[ 642.651778] 0xffffffffc10270ce
[ 642.651990] 0xffffffffc1048a62
[ 642.652197] do_one_initcall+0xa7/0x250
[ 642.652445] do_init_module+0x1d0/0x55d
[ 642.652696] load_module+0x7c9f/0x9850
[ 642.652937] SYSC_finit_module+0x189/0x1c0
[ 642.653200] SyS_finit_module+0xe/0x10
[ 642.653441] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 642.653735] Freed:
[ 642.653875] PID = 4128
[ 642.654304] save_stack_trace+0x1b/0x20
[ 642.654991] save_stack+0x46/0xd0
[ 642.655200] kasan_slab_free+0x71/0xb0
[ 642.655434] kfree+0x8d/0x1b0
[ 642.655624] scsi_host_dev_release+0x2cb/0x430
[ 642.655898] device_release+0x76/0x1e0
[ 642.656141] kobject_release+0x107/0x370
[ 642.656387] kobject_put+0x56/0xb0
[ 642.656596] put_device+0x17/0x20
[ 642.656800] scsi_host_put+0x15/0x20
[ 642.657027] 0xffffffffc101fcd7
[ 642.657225] device_release_driver_internal+0x26a/0x4e0
[ 642.657550] device_release_driver+0x12/0x20
[ 642.657821] bus_remove_device+0x2d0/0x590
[ 642.658071] device_del+0x55b/0x920
[ 642.658287] device_unregister+0x1a/0xa0
[ 642.658528] 0xffffffffc101e0ca
[ 642.658723] 0xffffffffc102fccc
[ 642.658933] SyS_delete_module+0x334/0x3e0
[ 642.659203] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 642.659505] Memory state around the buggy address:
[ 642.659815] ffff8802b7fede00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[ 642.660262] ffff8802b7fede80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[ 642.660725] >ffff8802b7fedf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[ 642.661187] ^
[ 642.661398] ffff8802b7fedf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[ 642.661854] ffff8802b7fee000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[ 642.662315]
==================================================================
I triggered this while running xfstests: generic/108.
I don't know how to fix this but I did debug it a bit and exposed a bit of the
call chain
leading up to the bug.
The scsi code in scsi_alloc_queue sets up a prep_request fn:
blk_queue_prep_rq(q, scsi_prep_fn);
This function fails due to the: [ 641.397347] sd 2:0:0:0: rejecting I/O to
offline device
So we kick out a BLKPREP_KILL error. In blk-core.c: blk_peek_request we end up
calling
__blk_end_request_all
__blk_end_bidi_request
blk_finish_request
__blk_put_request
blk_put_rl
* Triggers __blkg_release_rcu * which goes down the UAF call chain at
the top
I'm not exactly sure who is causing a put_device causing the release(), I had
an idea but
turned out to be wrong, so now I'm not sure why the driver is getting nuked.
I'm running on top of Jens' for-next tree with most recent commit:
commit b74baaabbc2da23c7870859316f9e15d10f876fe
Merge: 5e9a6fc caf7df1
Author: Jens Axboe <[email protected]>
Date: Thu Apr 20 12:16:19 2017 -0600
Merge branch 'for-4.12/block' into for-next
I can test some patches, if necessary.