[PATCH RFC 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices

2015-12-08 Thread Ewan D. Milne
From: "Ewan D. Milne" 

The klist traversal used by the reading of /proc/scsi/scsi is not interlocked
against device removal.  It takes a reference on the containing object, but
this does not prevent the device from being removed from the list.  Thus, we
get errors and eventually panic, as shown in the traces below.  Fix this by
keeping a klist iterator in the seq_file private data.
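
As a rough illustration of why a persistent iterator helps, here is a minimal
userspace sketch. It is not the patch itself, and it is not kernel code: the
names struct node, struct iter, iter_next(), list_remove() and
walk_with_removal() are hypothetical stand-ins that only mimic the klist
behaviour described above (the kernel's real entry points are
klist_iter_init_node(), klist_next() and klist_iter_exit()). The assumption
being modelled is that an iterator which lives across calls pins its current
node, so a removed node stays linked until the iterator has stepped past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a klist node. */
struct node {
    struct node *prev, *next;
    int refs;   /* 1 for list membership, +1 while an iterator is parked here */
    int dead;   /* removal requested; traversal skips dead nodes */
    int id;
};

struct list { struct node head; };                  /* circular, with sentinel */
struct iter { struct list *l; struct node *cur; };  /* cur == NULL: not started */

static void list_init(struct list *l) { l->head.prev = l->head.next = &l->head; }

static struct node *list_add_tail(struct list *l, int id)
{
    struct node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->id = id; n->refs = 1; n->dead = 0;
    n->prev = l->head.prev; n->next = &l->head;
    l->head.prev->next = n; l->head.prev = n;
    return n;
}

/* Drop one reference; unlink and free only when the last one goes away. */
static void node_put(struct node *n)
{
    if (--n->refs == 0) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        free(n);
    }
}

/* Removal marks the node dead and drops the membership reference. */
static void list_remove(struct node *n) { n->dead = 1; node_put(n); }

static void iter_init(struct iter *it, struct list *l) { it->l = l; it->cur = NULL; }

/* Advance: pin the next live node BEFORE releasing the previous one. */
static struct node *iter_next(struct iter *it)
{
    struct node *last = it->cur;
    struct node *n = last ? last->next : it->l->head.next;

    while (n != &it->l->head && n->dead)
        n = n->next;
    if (n == &it->l->head)
        n = NULL;
    if (n)
        n->refs++;
    it->cur = n;
    if (last)
        node_put(last);   /* may unlink and free a node removed meanwhile */
    return n;
}

static void iter_exit(struct iter *it)
{
    if (it->cur)
        node_put(it->cur);
    it->cur = NULL;
}

/* Walk four nodes; remove the current and the following node mid-walk. */
int walk_with_removal(void)
{
    struct list l;
    struct iter it;
    struct node *n, *nodes[4];
    int i, visited = 0;

    list_init(&l);
    for (i = 0; i < 4; i++)
        nodes[i] = list_add_tail(&l, i);

    iter_init(&it, &l);
    while ((n = iter_next(&it)) != NULL) {
        visited++;
        if (n->id == 1) {
            list_remove(nodes[1]);  /* pinned by the iterator: stays linked */
            list_remove(nodes[2]);  /* not pinned: unlinked and freed now */
        }
    }
    iter_exit(&it);

    list_remove(nodes[0]);
    list_remove(nodes[3]);
    assert(l.head.next == &l.head);  /* everything was released */
    return visited;
}
```

Calling walk_with_removal() from a main() visits nodes 0, 1 and 3 and returns
3. The contrast with the failing pattern in the traces is that the position is
never re-derived from a saved object whose list node may already be gone; the
iterator's own reference keeps its anchor valid between steps.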

The problem can be easily reproduced by repeatedly increasing scsi_debug's
max_luns to 30 and then deleting the devices via sysfs, while simultaneously
accessing /proc/scsi/scsi.

From a patch originally developed by David Jeffery.

Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: 81a20e77 880613acfd18 81321eef
Dec  3 13:22:02 localhost kernel: 880613acfd50 8107ca52 88061176b198
Dec  3 13:22:02 localhost kernel: 814542b0 880610cfb100 88061176b198 880613acfd60
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [] klist_iter_init_node+0x3d/0x50
Dec  3 13:22:02 localhost kernel: [] bus_find_device+0x51/0xb0
Dec  3 13:22:02 localhost kernel: [] scsi_seq_next+0x2d/0x40
Dec  3 13:22:02 localhost kernel: [] seq_read+0x290/0x370
Dec  3 13:22:02 localhost kernel: [] proc_reg_read+0x48/0x70
Dec  3 13:22:02 localhost kernel: [] __vfs_read+0x28/0xd0
Dec  3 13:22:02 localhost kernel: [] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:02 localhost kernel: [] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:02 localhost kernel: [] vfs_read+0x86/0x130
Dec  3 13:22:02 localhost kernel: [] SyS_read+0x46/0xa0
Dec  3 13:22:02 localhost kernel: [] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:02 localhost kernel: ---[ end trace 99a60fb1c41fc8c9 ]---
Dec  3 13:22:02 localhost kernel: ------------[ cut here ]------------
Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at lib/klist.c:189 klist_release+0xa8/0xb0()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Tainted: G W   4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: 81aaa040 880613acfcc0 81321eef
Dec  3 13:22:02 localhost kernel: 880613acfcf8 8107ca52 dead00f8 880613acfd80
Dec  3 13:22:02 localhost kernel: 88060f7aa368 88060f7aa380 88061176b198 880613acfd08
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [] klist_release+0xa8/0xb0
Dec  3 13:22:02 localhost kernel: [] ? bus_uevent_store+0x50/0x50
Dec  3 13:22:02 localhost kernel: [] klist_next+0x95/0xf0
Dec  3 13:22:02 localhost kernel: [] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [] bus_find_device+0x72/0xb0
Dec  3 13:22:02 localhost kernel: [] scsi_seq_next+0x2d/0x40
Dec  3 13:22:02 localhost kernel: [] seq_read+0x290/0x370
Dec  3 13:22:02 localhost kernel: [] proc_reg_read+0x48/0x70
Dec  3 13:22:02 localhost kernel: [] __vfs_read+0x28/0xd0
Dec  3 13:22:02 localhost kernel: [] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:02 localhost kernel: [] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:02 localhost kernel: [] vfs_read+0x86/0x130
Dec  3 13:22:02 localhost kernel: [] SyS_read+0x46/0xa0
Dec  3 13:22:02 localhost kernel: [] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:02 localhost kernel: ---[ end trace 99a60fb1c41fc8ca ]---
Dec  3 13:22:02 localhost kernel: ------------[ cut here ]------------
Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 a

Re: [PATCH RFC 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices

2015-12-18 Thread Hannes Reinecke

On 12/08/2015 03:08 PM, Ewan D. Milne wrote:

> From: "Ewan D. Milne"
>
> The klist traversal used by the reading of /proc/scsi/scsi is not interlocked
> against device removal.  It takes a reference on the containing object, but
> this does not prevent the device from being removed from the list.  Thus, we
> get errors and eventually panic, as shown in the traces below.  Fix this by
> keeping a klist iterator in the seq_file private data.
>
> The problem can be easily reproduced by repeatedly increasing scsi_debug's
> max_luns to 30 and then deleting the devices via sysfs, while simultaneously
> accessing /proc/scsi/scsi.
>
> From a patch originally developed by David Jeffery


That's now, what, the third attempt on fixing this?

All previous attempts have been rejected on the grounds that
/proc/scsi/scsi is deprecated and we shouldn't allow any updates to it.


Maybe this time we get lucky ...

Cheers,

Hannes
--
Dr. Hannes Reinecke		   zSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices

2016-01-04 Thread Martin K. Petersen
> "Ewan" == Ewan D Milne  writes:

Ewan> The klist traversal used by the reading of /proc/scsi/scsi is not
Ewan> interlocked against device removal.  It takes a reference on the
Ewan> containing object, but this does not prevent the device from being
Ewan> removed from the list.  Thus, we get errors and eventually panic,
Ewan> as shown in the traces below.  Fix this by keeping a klist
Ewan> iterator in the seq_file private data.

Applied to 4.5/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering