Re: Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?
On 08/14/2017 02:30 PM, Tejun Heo wrote: > Hello, Joe. > > On Thu, Aug 10, 2017 at 10:45:54AM -0400, Joe Lawrence wrote: >> In the case of my USB DVD -> laptop example, there is no media in my >> device, however I still see the DISK_EVENT_MEDIA_CHANGE event. This is >> a bit confusing, and I was wondering if there was an explanation for >> the following: >> >> drivers/scsi/sr.c :: sr_probe() >> >> disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST; >> ... >> cd->media_present = 1; >> >> DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and >> for some reason cd->media_present defaults to 1? More on that below... > > I don't have any concrete ideas but I think the only thing it's trying > to do is to always generate at least one changed event no matter what. > > ... >> sr_check_events() compares the previous (in this case, default) >> media_present value with what the TUR returns. If it has changed, then >> turn on the DISK_EVENT_MEDIA_CHANGE event bit. >> >> In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a, >> so last_present (1) and cd->media_present (0) mis-compare and the change >> event is set. That does not seem intuitive to me, is this a bug? > > It's not incorrect. We can try to change the behavior to avoid double > notifications but given that this has been like this for a really long > time and that it isn't technically incorrect, I'm not sure changing it > is a good idea. It might as well break other things. Without a definition of DISK_EVENT_MEDIA_CHANGE or its udev DISK_MEDIA_CHANGE counterpart it's kinda hard to say :) But I agree that changing this behavior could have inadvertent effects. >> Bringing this back to the reported BMC case, which presumably does have >> "media" present in the virtual device... is it reasonable to expect a >> DISK_EVENT_MEDIA_CHANGE even for a new device that contains media? (I >> haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might >> be enough to set media present.) > > Yeah, I think so. > >> If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions >> somewhere, feel free to point me there. > > AFAIK, there isn't any. The only thing it tries to do is generating > at least one event after media change. Given that media is reported > present after the last notification, I think userspace should be able > to do the right thing (and must have been doing the right thing until > recently). I have no idea if udev or other userspace has changed in this respect, or this is simply a timing window that this particular user has fallen into. This is a simulated device, so it might be fast/slower than any real hardware. Thanks for chiming in, -- Joe
Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?
Hi Tejun, Kay, I'm investigating a customer report which manifests itself all the way up in gnome-session when a BMC hotplug-adds a simulated DVD device. The user logs into their server's BMC and enables "media redirection", an emulated DVD device + .iso is dynamically added to the bus... in the past this has worked well, however, they are now noticing a timing condition on RHEL7 that prevents gnome from successfully auto-mounting the DVD media. With Harald's help, I've done some debugging and we've found out that on hotplug-add, the kernel sends two uevents (ADD, CHANGE) in short succession. (Example with an ordinary, physical USB DVD device on my laptop, is very similar): % udevadm monitor -k -e KERNEL[2409061.130338] add /devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 (block) ACTION=add DEVNAME=/dev/sr1 DEVPATH=/devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 DEVTYPE=disk MAJOR=11 MINOR=1 SEQNUM=5885 SUBSYSTEM=block ... KERNEL[2409061.134076] change /devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 (block) ACTION=change DEVNAME=/dev/sr1 DEVPATH=/devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 DEVTYPE=disk > DISK_MEDIA_CHANGE=1 MAJOR=11 MINOR=1 SEQNUM=5889 SUBSYSTEM=block (Both of these events trigger a call out to the 'cdrom_id' userspace program, the latter of which interferes with the gnome-session auto-mounting feature.) With a systemtap probe, I can also see that there are four userspace openers of the cdrom when it is added: (parent)-> (child) : system-tap probe-point --- > systemd-udevd(849) -> systemd-udevd(6783) : > module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980") systemd-udevd(6783) -> cdrom_id(6791) : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980") systemd-udevd(849) -> systemd-udevd(6783) : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980") systemd-udevd(6783) -> cdrom_id(6794) : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980") where on the first opener, the kernel eventually invokes sr_mod::sr_check_events() and gets a DISK_EVENT_MEDIA_CHANGE return code. In the case of my USB DVD -> laptop example, there is no media in my device, however I still see the DISK_EVENT_MEDIA_CHANGE event. This is a bit confusing, and I was wondering if there was an explanation for the following: drivers/scsi/sr.c :: sr_probe() disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST; ... cd->media_present = 1; DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and for some reason cd->media_present defaults to 1? More on that below... drivers/scsi/sr.c :: sr_check_events() ... do_tur: /* let's see whether the media is there with TUR */ last_present = cd->media_present; ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); /* * Media is considered to be present if TUR succeeds or fails with * sense data indicating something other than media-not-present * (ASC 0x3a). */ cd->media_present = scsi_status_is_good(ret) || (scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a); if (last_present != cd->media_present) cd->device->changed = 1; if (cd->device->changed) { events |= DISK_EVENT_MEDIA_CHANGE; cd->device->changed = 0; cd->tur_changed = true; } ... sr_check_events() compares the previous (in this case, default) media_present value with what the TUR returns. If it has changed, then turn on the DISK_EVENT_MEDIA_CHANGE event bit. In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a, so last_present (1) and cd->media_present (0) mis-compare and the change event is set. That does not seem intuitive to me, is this a bug? Bringing this back to the reported BMC case, which presumably does have "media" present in the virtual device... is it reasonable to expect a DISK_EVENT_MEDIA_CHANGE even for a new device that contains media? (I haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might be enough to set media present.) If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions somewhere, feel free to point me there. Thanks, -- Joe
[PATCH 2/2] mpt3sas - avoid mpt3sas_transport_port_add NULL parent_dev
If _scsih_sas_host_add's call to mpt3sas_config_get_sas_iounit_pg0 fails, ioc->sas_hba.parent_dev may be left uninitialized. A later device probe could invoke mpt3sas_transport_port_add which will call sas_port_alloc_num [scsi_transport_sas] with a NULL parent_dev pointer. Signed-off-by: Joe Lawrence --- drivers/scsi/mpt3sas/mpt3sas_transport.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c index 6a84b82d71bb..ff93286bc32f 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c @@ -705,6 +705,11 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle, goto out_fail; } + if (!sas_node->parent_dev) { + pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", + ioc->name, __FILE__, __LINE__, __func__); + goto out_fail; + } port = sas_port_alloc_num(sas_node->parent_dev); if ((sas_port_add(port))) { pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] mpt3sas - set num_phys after allocating phy[] space
In _scsih_sas_host_add, the number of HBA phys are determined and then later used to allocate an array of struct _sas_phy's. If the routine sets ioc->sas_hba.num_phys, but then fails to allocate the ioc->sas_hba.phy array (by kcalloc error or other intermediate error/exit path), ioc->sas_hba is left in a dangerous state: all readers of ioc->sas_hba.phy[] do so by indexing it from 0..ioc->sas_hba.num_phys without checking that the space was ever allocated. Modify _scsih_sas_host_add to set ioc->sas_hba.num_phys only after successfully allocating ioc->sas_hba.phy[]. Signed-off-by: Joe Lawrence --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index f2139e5604a3..6e36d20c9e0b 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -4893,13 +4893,22 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc) u16 ioc_status; u16 sz; u8 device_missing_delay; + u8 num_phys; - mpt3sas_config_get_number_hba_phys(ioc, &ioc->sas_hba.num_phys); - if (!ioc->sas_hba.num_phys) { + mpt3sas_config_get_number_hba_phys(ioc, &num_phys); + if (!num_phys) { pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name, __FILE__, __LINE__, __func__); return; } + ioc->sas_hba.phy = kcalloc(num_phys, + sizeof(struct _sas_phy), GFP_KERNEL); + if (!ioc->sas_hba.phy) { + pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", + ioc->name, __FILE__, __LINE__, __func__); + goto out; + } + ioc->sas_hba.num_phys = num_phys; /* sas_iounit page 0 */ sz = offsetof(Mpi2SasIOUnitPage0_t, PhyData) + (ioc->sas_hba.num_phys * @@ -4959,13 +4968,6 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc) MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK; ioc->sas_hba.parent_dev = &ioc->shost->shost_gendev; - ioc->sas_hba.phy = kcalloc(ioc->sas_hba.num_phys, - sizeof(struct _sas_phy), GFP_KERNEL); - if (!ioc->sas_hba.phy) { - pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", - ioc->name, __FILE__, __LINE__, __func__); - goto out; - } for (i = 0; i < ioc->sas_hba.num_phys ; i++) { if ((mpt3sas_config_get_phy_pg0(ioc, &mpi_reply, &phy_pg0, i))) { -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] _scsih_sas_host_add early exits can crash system
There are many error paths in _scsih_sas_host_add that lead to an early exit and a few that leave IOC resources uninitialized, setting the stage for a later crash. This can be emulated using a systemtap script like: % stap -g -e \ 'probe module("mpt3sas").function("mpt3sas_config_get_sas_iounit_pg0").return { $return = -1 }' to force early exit, while remove/re-adding an MPT3 adapter: % lspci -D | grep MPT :54:00.0 Mass storage controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) % SYSFS=$(find /sys/devices -name :54:00.0) % SYSFS_PARENT=$(dirname $SYSFS) % echo 1 > $SYSFS/remove % sleep 1m % echo 1 > $SYSFS_PARENT/rescan These two patches fix: 1) referencing unallocated ioc->sas_hba.phy[] space 2) passing a NULL ioc->sas_hba.parent_dev to the scsi_transport_sas layer. Note: these changes don't improve or retry adapter initialization, but only try to prevent the system from crashing Joe Lawrence (2): mpt3sas - set num_phys after allocating phy[] space mpt3sas - avoid mpt3sas_transport_port_add NULL parent_dev drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++- drivers/scsi/mpt3sas/mpt3sas_transport.c | 5 + 2 files changed, 16 insertions(+), 9 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix a bdi reregistration race, v3
On 05/05/2016 04:40 PM, Joe Lawrence wrote: > On 05/05/2016 03:58 PM, Bart Van Assche wrote: >> On 03/28/2016 02:29 PM, Bart Van Assche wrote: >>> Avoid that the sd driver registers a BDI device with a name that >>> is still in use. This patch avoids that the following warning gets >>> triggered: >>> >>> [ ... ] >> >> (replying to my own e-mail) >> >> If anyone could review this patch that would be very welcome. > > Hi Bart, > > I *think* I may be hitting this same problem running some tests here at > Stratus > ... snip... Hi Bart, Good news = With your v3 patch, I didn't see the "sysfs: cannot create duplicate filename '/devices/virtual/bdi/65:0'" warning during my weekend testing (573 surprise disk HBA removals). Bad news = I still crashed in add_disk > sysfs_create_link > sysfs_do_create_link_sd on a NULL target_kobj->sd ... unfortunately I don't have kdump working, so all I have is a serial console output to work with for now. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix a bdi reregistration race, v3
On 05/05/2016 03:58 PM, Bart Van Assche wrote: > On 03/28/2016 02:29 PM, Bart Van Assche wrote: >> Avoid that the sd driver registers a BDI device with a name that >> is still in use. This patch avoids that the following warning gets >> triggered: >> >> [ ... ] > > (replying to my own e-mail) > > If anyone could review this patch that would be very welcome. Hi Bart, I *think* I may be hitting this same problem running some tests here at Stratus -- I've got slub_debugging and other kernel debugging turned on while repeatedly hotplugging controllers at various timings during probe. It takes hours to hit, but I've seen it twice in the past week. I've got some other testing on my plate at the moment, but when I get a chance I can give your v3 a spin. Here's an example (sorry for the wide width) ... [ cut here ] WARNING: CPU: 5 PID: 30702 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/65:0' Modules linked in: btrfs xor raid6_pq msdos ext4 jbd2 mbcache matroxfb(OE) ccmod(POE) ftmod(OE) videosw(OE) ipmi_devintf fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack bonding ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter vfat fat x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd dm_service_time pcspkr ses enclosure sg wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c raid1 sr_mod cdrom crc32c_intel qla2xxx(OE) scsi_transport_fc ixgbe(OE) mdio igb(OE) ptp pps_core mpt3sas(OE) raid_class i2c_algo_bit sra_sense(OE) i2c_core sd_mod(OE) scsi_transport_sas dca scsi_hbas(OE) fjes ipmi_msghandler usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] CPU: 5 PID: 30702 Comm: kworker/u97:7 Tainted: PW OE 4.6.0-rc6+ #37 Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.1:64 02/03/2016 Workqueue: events_unbound async_run_entry_fn 0286 e136f6f8 88038ae03a50 8134359f 88038ae03aa0 88038ae03a90 8108af31 001f5bdb0398 880067e0e7e8 88058f92def0 88084bd2c008 Call Trace: [] dump_stack+0x63/0x84 [] __warn+0xd1/0xf0 [] warn_slowpath_fmt+0x5f/0x80 [] sysfs_warn_dup+0x64/0x80 [] sysfs_create_dir_ns+0x7e/0x90 [] kobject_add_internal+0xaa/0x320 [] ? device_private_init+0x23/0x70 [] kobject_add+0x75/0xd0 [] ? mutex_lock+0x12/0x2f [] device_add+0x125/0x610 [] device_create_groups_vargs+0xd8/0x100 [] device_create_vargs+0x1c/0x20 [] bdi_register+0x8c/0x180 [] bdi_register_dev+0x27/0x30 [] add_disk+0x17f/0x4a0 [] ? update_autosuspend+0x55/0x60 [] ? __pm_runtime_use_autosuspend+0x54/0x70 [] sd_probe_async+0x115/0x1d0 [sd_mod] [] async_run_entry_fn+0x4a/0x140 [] process_one_work+0x16e/0x420 [] worker_thread+0x125/0x4b0 [] ? __schedule+0x2ad/0x8a0 [] ? rescuer_thread+0x380/0x380 [] kthread+0xd8/0xf0 [] ret_from_fork+0x22/0x40 [] ? kthread_park+0x60/0x60 ---[ end trace 353add4e78cb2a75 ]--- [ cut here ] WARNING: CPU: 5 PID: 30702 at lib/kobject.c:240 kobject_add_internal+0x262/0x320 kobject_add_internal failed for 65:0 with -EEXIST, don't try to register things with the same name in the same directory. Modules linked in: btrfs xor raid6_pq msdos ext4 jbd2 mbcache matroxfb(OE) ccmod(POE) ftmod(OE) videosw(OE) ipmi_devintf fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack bonding ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter vfat fat x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd dm_service_time pcspkr ses enclosure sg wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c raid1 sr_mod cdrom crc32c_intel qla2xxx(OE) scsi_transport_fc ixgbe(OE) mdio igb(OE) ptp pps_core mpt3sas(OE) raid_class i2c_algo_bit sra_sense(OE) i2c_core sd_mod(OE) scsi_transport_sas dca scsi_hbas(OE) fjes ipmi_msghandler usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] CPU: 5 PID: 30702 Comm: kworker/u97:7 Tainted: PW OE 4.
[PATCH v2] mpt3sas - remove unused fw_event_work elements
Firmware events are queued up using the fw_event_work's struct work, not its delayed_work member. The initial driver for SAS2 controllers had handled firmware reset using the rescan barrier and was later redesigned through "mpt2sas: [Resend] Host Reset code cleanup". The delayed_work variables are now unused and may provoke CONFIG_DEBUG_OBJECTS_TIMERS "assert_init not available" false warnings in _scsih_fw_event_cleanup_queue. Cleanup fw_event_work's unused entries, update it's kerneldoc, and update _scsih_fw_event_cleanup_queue accordingly. Fixes: 146b16c8071f (mpt3sas: Refcount fw_events and fix unsafe list usage) Signed-off-by: Joe Lawrence --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e0e4920d0fa6..f2139e5604a3 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -174,13 +174,13 @@ struct sense_info { * struct fw_event_work - firmware event struct * @list: link list framework * @work: work object (ioc->fault_reset_work_q) - * @cancel_pending_work: flag set during reset handling * @ioc: per adapter object * @device_handle: device handle * @VF_ID: virtual function id * @VP_ID: virtual port id * @ignore: flag meaning this event has been marked to ignore - * @event: firmware event MPI2_EVENT_XXX defined in mpt2_ioc.h + * @event: firmware event MPI2_EVENT_XXX defined in mpi2_ioc.h + * @refcount: kref for this event * @event_data: reply event data payload follows * * This object stored on ioc->fw_event_list. @@ -188,8 +188,6 @@ struct sense_info { struct fw_event_work { struct list_headlist; struct work_struct work; - u8 cancel_pending_work; - struct delayed_work delayed_work; struct MPT3SAS_ADAPTER *ioc; u16 device_handle; @@ -2804,12 +2802,12 @@ _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc) /* * Wait on the fw_event to complete. If this returns 1, then * the event was never executed, and we need a put for the -* reference the delayed_work had on the fw_event. +* reference the work had on the fw_event. * * If it did execute, we wait for it to finish, and the put will * happen from _firmware_event_work() */ - if (cancel_delayed_work_sync(&fw_event->delayed_work)) + if (cancel_work_sync(&fw_event->work)) fw_event_work_put(fw_event); fw_event_work_put(fw_event); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mpt3sas - remove unused fw_event_work delayed_work
Hi Chaitra, while discussing this patch with Ewan, I realized that cancel_pending_work is unused as well. I can send a v2 with that and the associated kerneldoc update. Do we know why f1c35e6aea579 "mpt2sas: RESCAN Barrier work is added in case of HBA reset" was unneeded for the mpt3 version? If that is interesting, that info could be added to v2 commit message as well. Thanks, -- Joe On 04/11/2016 07:13 AM, Chaitra Basappa wrote: Hi, Please consider this patch as Ack-by: Chaitra P B Thanks, Chaitra -Original Message- From: Sathya Prakash [mailto:sathya.prak...@broadcom.com] Sent: Saturday, April 02, 2016 1:45 AM To: emi...@redhat.com; Joe Lawrence Cc: linux-scsi@vger.kernel.org; Chaitra Basappa; Suganath Prabu Subramani; Calvin Owens Subject: RE: [PATCH] mpt3sas - remove unused fw_event_work delayed_work We will look into this early next week and provide a detailed response. On the first look this is ACK from Broadcom, will reconfirm. -Original Message- From: Ewan D. Milne [mailto:emi...@redhat.com] Sent: Friday, April 01, 2016 2:04 PM To: Joe Lawrence Cc: linux-scsi@vger.kernel.org; Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; Calvin Owens Subject: Re: [PATCH] mpt3sas - remove unused fw_event_work delayed_work On Fri, 2016-04-01 at 15:13 -0400, Joe Lawrence wrote: On 04/01/2016 02:51 PM, Ewan D. Milne wrote: On Fri, 2016-04-01 at 13:56 -0400, Joe Lawrence wrote: @@ -2804,12 +2803,12 @@ _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc) /* * Wait on the fw_event to complete. If this returns 1, then * the event was never executed, and we need a put for the -* reference the delayed_work had on the fw_event. +* reference the work had on the fw_event. * * If it did execute, we wait for it to finish, and the put will * happen from _firmware_event_work() */ - if (cancel_delayed_work_sync(&fw_event->delayed_work)) + if (cancel_work_sync(&fw_event->work)) fw_event_work_put(fw_event); fw_event_work_put(fw_event); Hmm... Fixes: 146b16c8 (mpt3sas: Refcount fw_events and fix unsafe list usage) This could technically go back to f92363d12359 (mpt3sas: add new driver supporting 12GB SAS) ... but will probably only apply cleanly to _scsih_fw_event_cleanup_queue after 146b16c8 (mpt3sas: Refcount fw_events and fix unsafe list usage), so you're right. Since mpt3sas uses ->work instead of _delayed_work this would seem to be correct, however the deprecated mpt2sas driver had a commit that changed the firmware event work mechanism to use ->delayed_work instead of ->work: commit f1c35e6aea579d5bdb6dc02dfa99c67c7c3b3f67 Author: Kashyap, Desai Date: Tue Mar 9 16:31:43 2010 +0530 Okay, so this is pre-mpt3sas split. [SCSI] mpt2sas: RESCAN Barrier work is added in case of HBA reset. Add the cancel_pending_work flag from the fw_event_work structure, and then to set the flag during host reset, check the flag later from work threads context and if cancel_pending_work_flag is set ingore those events. More unused mpt2 vestiges in the mpt3 version? % cd drivers/scsi/mpt3sas/ % grep 'cancel_pending_work' *.{c,h} mpt3sas_scsih.c: * @cancel_pending_work: flag set during reset handling mpt3sas_scsih.c:u8 cancel_pending_work; Now Rescan after host reset is changed. Added special task MPT2SAS_RESCAN_AFTER_HOST_RESET. This task will be queued at the time of HBA reset. this task is treated as barrier. All work after MPT2SAS_RESCAN_AFTER_HOST_RESET will be treated as new work and will be server by callback handle. If host_recovery is going on while running RESCAN task, it will wait for shos_recovery_done completion which will be called from HBA reset DONE context. FWIW, I don't see anything like this in today's mpt3sas driver. Well, that's the question. Is there some functionality missing? Were the changes abandoned/replaced? mpt2sas used delayed_work for something else, so maybe that's why the firmware event changes initially used it (albeit with a 0 delay) but it's hard to know... cancel_delayed_work() will call del_timer() on delayed_work->timer, but it looks like kzalloc is used to allocate the fw_event_work objects so perhaps nothing bad will happen. I was wondering, though, because I have seen dumps of hung systems with requests that should have timed out but are not on any timer list. -Ewan Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mpt3sas - remove unused fw_event_work delayed_work
On 04/01/2016 02:51 PM, Ewan D. Milne wrote: > On Fri, 2016-04-01 at 13:56 -0400, Joe Lawrence wrote: >> @@ -2804,12 +2803,12 @@ _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER >> *ioc) >> /* >> * Wait on the fw_event to complete. If this returns 1, then >> * the event was never executed, and we need a put for the >> - * reference the delayed_work had on the fw_event. >> + * reference the work had on the fw_event. >> * >> * If it did execute, we wait for it to finish, and the put will >> * happen from _firmware_event_work() >> */ >> -if (cancel_delayed_work_sync(&fw_event->delayed_work)) >> +if (cancel_work_sync(&fw_event->work)) >> fw_event_work_put(fw_event); >> >> fw_event_work_put(fw_event); > > Hmm... Fixes: 146b16c8 (mpt3sas: Refcount fw_events and fix unsafe list usage) This could technically go back to f92363d12359 (mpt3sas: add new driver supporting 12GB SAS) ... but will probably only apply cleanly to _scsih_fw_event_cleanup_queue after 146b16c8 (mpt3sas: Refcount fw_events and fix unsafe list usage), so you're right. > Since mpt3sas uses ->work instead of _delayed_work this would seem to be > correct, however the deprecated mpt2sas driver had a commit that changed > the firmware event work mechanism to use ->delayed_work instead of ->work: > > commit f1c35e6aea579d5bdb6dc02dfa99c67c7c3b3f67 > Author: Kashyap, Desai > Date: Tue Mar 9 16:31:43 2010 +0530 Okay, so this is pre-mpt3sas split. > [SCSI] mpt2sas: RESCAN Barrier work is added in case of HBA reset. > > Add the cancel_pending_work flag from the fw_event_work structure, and > then to > set the flag during host reset, check the flag later from work threads > context and if cancel_pending_work_flag is set ingore those events. More unused mpt2 vestiges in the mpt3 version? % cd drivers/scsi/mpt3sas/ % grep 'cancel_pending_work' *.{c,h} mpt3sas_scsih.c: * @cancel_pending_work: flag set during reset handling mpt3sas_scsih.c:u8 cancel_pending_work; > Now Rescan after host reset is changed. > Added special task MPT2SAS_RESCAN_AFTER_HOST_RESET. This task will be > queued > at the time of HBA reset. this task is treated as barrier. All work after > MPT2SAS_RESCAN_AFTER_HOST_RESET will be treated as new work and will be > server by callback handle. If host_recovery is going on while running > RESCAN > task, it will wait for shos_recovery_done completion which will be called > from HBA reset DONE context. FWIW, I don't see anything like this in today's mpt3sas driver. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mpt3sas - remove unused fw_event_work delayed_work
The driver's fw events are queued up using the the fw_event_work's struct work, not its delayed_work member. The latter appears to be unused and may provoke CONFIG_DEBUG_OBJECTS_TIMERS "assert_init not available" false warnings in _scsih_fw_event_cleanup_queue. Remove it and update _scsih_fw_event_cleanup_queue accordingly. Signed-off-by: Joe Lawrence --- I think this goes all the way back to the introduction of the mpt3sas driver. The previous generation mpt2sas driver uses delayed_work, so perhaps it was simply copied and pasted into the mpt3sas but never updated. drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e0e4920d0fa6..67643602efbc 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -189,7 +189,6 @@ struct fw_event_work { struct list_headlist; struct work_struct work; u8 cancel_pending_work; - struct delayed_work delayed_work; struct MPT3SAS_ADAPTER *ioc; u16 device_handle; @@ -2804,12 +2803,12 @@ _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc) /* * Wait on the fw_event to complete. If this returns 1, then * the event was never executed, and we need a put for the -* reference the delayed_work had on the fw_event. +* reference the work had on the fw_event. * * If it did execute, we wait for it to finish, and the put will * happen from _firmware_event_work() */ - if (cancel_delayed_work_sync(&fw_event->delayed_work)) + if (cancel_work_sync(&fw_event->work)) fw_event_work_put(fw_event); fw_event_work_put(fw_event); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage
On 07/12/2015 12:24 AM, Calvin Owens wrote: > These objects can be referenced concurrently throughout the driver, we > need a way to make sure threads can't delete them out from under each > other. This patch adds the refcount, and refactors the code to use it. > > Additionally, we cannot iterate over the sas_device_list without > holding the lock, or we risk corrupting random memory if items are > added or deleted as we iterate. This patch refactors _scsih_probe_sas() > to use the sas_device_list in a safe way. > > Cc: Christoph Hellwig > Cc: Bart Van Assche > Signed-off-by: Calvin Owens > --- > drivers/scsi/mpt2sas/mpt2sas_base.h | 22 +- > drivers/scsi/mpt2sas/mpt2sas_scsih.c | 434 > --- > drivers/scsi/mpt2sas/mpt2sas_transport.c | 12 +- > 3 files changed, 315 insertions(+), 153 deletions(-) [ ... snip ... ] > @@ -2078,7 +2150,7 @@ _scsih_slave_configure(struct scsi_device *sdev) > } > > spin_lock_irqsave(&ioc->sas_device_lock, flags); > - sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc, > + sas_device = __mpt2sas_get_sdev_by_addr(ioc, > sas_device_priv_data->sas_target->sas_address); > if (!sas_device) { > spin_unlock_irqrestore(&ioc->sas_device_lock, flags); > @@ -2116,13 +2188,14 @@ _scsih_slave_configure(struct scsi_device *sdev) > if (!ssp_target) > _scsih_display_sata_capabilities(ioc, handle, sdev); > > - > _scsih_change_queue_depth(sdev, qdepth); > > if (ssp_target) { > sas_read_port_mode_page(sdev); > _scsih_enable_tlr(ioc, sdev); > } > + > + sas_device_put(sas_device); > return 0; > } Hi Calvin, Any reason why this sas_device_put is placed outside the sas_device lock? Most other instances in this patch were called just before unlocking. BTW I attempted testing, but needed to port to mpt3 and ended up with a driver that didn't boot :( Hopefully I can retry later this week, or find an older mpt2 box lying around. -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RESEND] qla2xxx: prevent board_disable from running during EEH
On 06/26/2015 12:33 PM, Mauricio Faria de Oliveira wrote: > Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx: > Disable adapter when we encounter a PCI disconnect.") has introduced a > code that disables the board, releasing some resources, when reading > 0x. > > In case this happens when there is an EEH, this read will trigger EEH > detection and set PCI channel offline. EEH will be able to recover the > card from this state by doing a reset, so it's a better option than > simply disabling the card. > > Since eeh_check_failure will mark the channel as offline before > returning the read value, in case there really was an EEH, we can simply > check for pci_channel_offline, preventing the board_disable code from > running if it's true. > > Without this patch, EEH code will try to access those same resources > that board_disable will try to free. This race can cause EEH recovery to > fail. > > [ 504.370577] EEH: Notify device driver to resume > [ 504.370580] qla2xxx [0001:07:00.0]-9002:2: The device failed to > resume I/O from slot/link_reset. > > Signed-off-by: Thadeu Lima de Souza Cascardo > Cc: > Cc: > --- > drivers/scsi/qla2xxx/qla_isr.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/qla2xxx/qla_isr.c > b/drivers/scsi/qla2xxx/qla_isr.c > index a04a1b1..8132926 100644 > --- a/drivers/scsi/qla2xxx/qla_isr.c > +++ b/drivers/scsi/qla2xxx/qla_isr.c > @@ -116,7 +116,7 @@ bool > qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) > { > /* Check for PCI disconnection */ > -if (reg == 0x) { > +if (reg == 0x && !pci_channel_offline(vha->hw->pdev)) { > if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && > !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) && > !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) { Hi Mauricio, Re: signed-off-by chain -- I believe if you are (re)sending another person's patch, you will need a: "From: Thadeu Lima de Souza Cascardo " tag at the top of the message body to retain original authorship and then your own: "Signed-off-by: Mauricio Faria de Oliveira " tag below Thadeu's since the patch has passed through you. Re: the patch -- I did some work last year to harden board_disable against various races, but without having EEH hardware available, I was uncertain about EEH behavior. For example: pci_read EEH -> marks channel offline pci_read returns ~0 When do the EEH error handler callbacks run? Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mpt2sas: Abort initialization if no memory I/O resources, detected
On 06/21/2015 02:46 PM, Timothy Pearson wrote: > On 06/16/2015 01:49 PM, Timothy Pearson wrote: >> On 06/16/2015 12:42 PM, Joe Lawrence wrote: >>> On 06/16/2015 12:28 PM, Timothy Pearson wrote: >>>> On 06/12/2015 05:05 PM, Timothy Pearson wrote: >>>>> The mpt2sas driver crashes if the BIOS does not set up at least one >>>>> memory I/O resource. This failure can happen if the device is too >>>>> slow to respond during POST and is missed by the BIOS, but Linux >>>>> then detects the device later in the boot process. >>>>> >>>>> This patch aborts initialization and prints a warning if no memory I/O >>>>> resources are found. >>>>> >>>>> Signed-off-by: Timothy Pearson >>>>> Tested-by: Timothy Pearson >>>>> --- >>>>> drivers/scsi/mpt2sas/mpt2sas_base.c | 9 + >>>>> 1 file changed, 9 insertions(+) >>>>> >>>>> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c >>>>> b/drivers/scsi/mpt2sas/mpt2sas_base.c >>>>> index 11248de..15c9504 100644 >>>>> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c >>>>> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c >>>>> @@ -6,6 +6,8 @@ >>>>> * Copyright (C) 2007-2014 LSI Corporation >>>>> * Copyright (C) 20013-2014 Avago Technologies >>>>> * (mailto: mpt-fusionlinux@avagotech.com) >>>>> + * Copyright (C) 2015 Raptor Engineering >>>>> + * (mailto: supp...@araptorengineeringinc.com) >>>>> * >>>>> * This program is free software; you can redistribute it and/or >>>>> * modify it under the terms of the GNU General Public License >>>>> @@ -1582,6 +1584,13 @@ mpt2sas_base_map_resources(struct >>>>> MPT2SAS_ADAPTER >>>>> *ioc) >>>>> } >>>>> } >>>>> >>>>> + if (ioc->chip == NULL) { >>>>> + printk(MPT2SAS_ERR_FMT "unable to map " >>>>> + "adapter memory (resource not found)!\n", ioc->name); >>>>> + r = -EINVAL; >>>>> + goto out_fail; >>>>> + } >>>>> + >>>>> _base_mask_interrupts(ioc); >>>>> >>>>> r = _base_get_ioc_facts(ioc, CAN_SLEEP); >>>> >>>> Just following up on this patch as I have not yet received any >>>> response. >>>> >>>> Thanks! >>> >>> Hi Tim -- just curious, why was the similar check on ioc->chip just a >>> few lines above the one added by the patch insufficient? >>> >>> That loop block sets memap_sz when it finds an IORESOURCE_MEM so that it >>> only sets ioc->chip once. I wonder if the fix might be simpler if the >>> existing ioc->chip check relocated entirely to where you put it (maybe >>> also pulling the entire error text onto one line for easier grepping). >>> >>> Regards, >>> >>> -- Joe >> >> If there are no IORESOURCE_MEM resources allocated by the BIOS (i.e. if >> the BIOS does not run resource allocation on the mpt2sas device) then >> the check you are referring to is not executed, and the driver attempts >> to perform operations on a null ioc->chip pointer. >> >> I can relocate the check if desired. >> > > On looking more closely at the code I think having the two separate > checks is useful for debugging. The first error message is triggered if > the resource exists but Linux cannot map it for some reason, and the > second error message is triggered if the resource does not exist at all > (buggy BIOS). Fair enough. The two error messages are unique, so grepping will lead to the correct check. Only other nits would be the addition of a copyright (up to Avago I guess) and an mpt3sas version (these drivers are 99% the same code). Acked-by: Joe Lawrence -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OOPS: unplugging western digital passport drive
Hi Alan -- last I heard from Stan on this bug was April 14 -- he was on "long holidays" at that time :) -- Joe On 06/17/2015 03:09 PM, Alan Stern wrote: > Joe or Stanisナ^ツaw: > > I never heard anything back about this. Does the patch fix the crash? > > Alan Stern > > > On Wed, 18 Mar 2015, Alan Stern wrote: > >> On Wed, 18 Mar 2015, Alan Stern wrote: >> >>> On Tue, 17 Mar 2015, Joe Lawrence wrote: >>> >>>> On 03/11/2015 12:25 AM, StanisÅ‚aw Pitucha wrote: >>>>> Hi linux-scsi, >>>>> I've got another case of reproducible crash when unplugging western >>>>> digital passport drives. This was mentioned before in >>>>> http://www.spinics.net/lists/linux-scsi/msg82603.html >>> >>> Like it says in that thread, the problem is somehow related to the ses >>> driver. If you remove or blacklist that driver, there won't be any >>> more crashes. >> >> Looks like I spoke too soon. I think the patch below will fix the >> problem. Let me know what happens. >> >> Alan Stern >> >> >> >> >> Index: usb-4.0/drivers/scsi/scsi_pm.c >> === >> --- usb-4.0.orig/drivers/scsi/scsi_pm.c >> +++ usb-4.0/drivers/scsi/scsi_pm.c >> @@ -217,14 +217,18 @@ static int sdev_runtime_suspend(struct d >> { >> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; >> struct scsi_device *sdev = to_scsi_device(dev); >> -int err; >> +struct device *blk_rpm_dev = sdev->request_queue->dev; >> +int err = 0; >> >> -err = blk_pre_runtime_suspend(sdev->request_queue); >> -if (err) >> -return err; >> +if (blk_rpm_dev) { >> +err = blk_pre_runtime_suspend(sdev->request_queue); >> +if (err) >> +return err; >> +} >> if (pm && pm->runtime_suspend) >> err = pm->runtime_suspend(dev); >> -blk_post_runtime_suspend(sdev->request_queue, err); >> +if (blk_rpm_dev) >> +blk_post_runtime_suspend(sdev->request_queue, err); >> >> return err; >> } >> @@ -246,12 +250,15 @@ static int sdev_runtime_resume(struct de >> { >> struct scsi_device *sdev = to_scsi_device(dev); >> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; >> +struct device *blk_rpm_dev = sdev->request_queue->dev; >> int err = 0; >> >> -blk_pre_runtime_resume(sdev->request_queue); >> +if (blk_rpm_dev) >> +blk_pre_runtime_resume(sdev->request_queue); >> if (pm && pm->runtime_resume) >> err = pm->runtime_resume(dev); >> -blk_post_runtime_resume(sdev->request_queue, err); >> +if (blk_rpm_dev) >> +blk_post_runtime_resume(sdev->request_queue, err); >> >> return err; >> } >> >> >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
Re: [PATCH 17/20 v1] [SCSI] mpt3sas: Use alloc_ordered_workqueue() API instead of create_singlethread_workqueue() API
On 06/18/2015 09:06 AM, Sreekanth Reddy wrote: > On Thu, Jun 18, 2015 at 5:40 PM, Joe Lawrence > wrote: >> On 06/16/2015 01:37 AM, Sreekanth Reddy wrote: >>> Created a thread using alloc_ordered_workqueue() API in order to process >>> the works from firmware Work-queue sequentially instead of >>> create_singlethread_workqueue() API. >>> >>> Changes in v1: >>> No need to check for backport compatibility in the upstream kernel. >>> so removing the else section where driver use >>> create_singlethread_workqueue() API if alloc_ordered_workqueue() API is >>> not defined, This else section is not required since in the latest upstream >>> kernel this alloc_ordered_workqueue() API is always defined. >>> >>> Signed-off-by: Sreekanth Reddy >>> --- >>> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c >>> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c >>> index b848458..7e5926c 100644 >>> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c >>> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c >>> @@ -8085,8 +8085,8 @@ _scsih_probe(struct pci_dev *pdev, const struct >>> pci_device_id *id) >>> /* event thread */ >>> snprintf(ioc->firmware_event_name, sizeof(ioc->firmware_event_name), >>> "fw_event%d", ioc->id); >>> - ioc->firmware_event_thread = create_singlethread_workqueue( >>> - ioc->firmware_event_name); >>> + ioc->firmware_event_thread = alloc_ordered_workqueue( >>> + ioc->firmware_event_name, WQ_MEM_RECLAIM); >>> if (!ioc->firmware_event_thread) { >>> pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", >>> ioc->name, __FILE__, __LINE__, __func__); >>> >> >> Hi Sreekanth, >> >> Is this change still needed after e09c2c2954684 workqueue: apply >> __WQ_ORDERED to create_singlethread_workqueue() ? (3.17+) > > I won't say that it is compulsory required, but I feel it is better if > these changes are included. since initially we thought that thread > created by using create_singlethread_workqueue() will process the > works sequentially but in-between it has broken and then it is fixed > by Tejun. So I thought it is better to directly use the > alloc_ordered_workqueue() as create_singlethead_workqueue() API also > invoked the same API. Ok, I was just wondering if maybe create_singlethread_workqueue was fixed after this patch was initially written. Since it's effectively the same... Reviewed-by: Joe Lawrence Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/20 v1] [SCSI] mpt3sas: Use alloc_ordered_workqueue() API instead of create_singlethread_workqueue() API
On 06/16/2015 01:37 AM, Sreekanth Reddy wrote: > Created a thread using alloc_ordered_workqueue() API in order to process > the works from firmware Work-queue sequentially instead of > create_singlethread_workqueue() API. > > Changes in v1: > No need to check for backport compatibility in the upstream kernel. > so removing the else section where driver use > create_singlethread_workqueue() API if alloc_ordered_workqueue() API is > not defined, This else section is not required since in the latest upstream > kernel this alloc_ordered_workqueue() API is always defined. > > Signed-off-by: Sreekanth Reddy > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index b848458..7e5926c 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -8085,8 +8085,8 @@ _scsih_probe(struct pci_dev *pdev, const struct > pci_device_id *id) > /* event thread */ > snprintf(ioc->firmware_event_name, sizeof(ioc->firmware_event_name), > "fw_event%d", ioc->id); > - ioc->firmware_event_thread = create_singlethread_workqueue( > - ioc->firmware_event_name); > + ioc->firmware_event_thread = alloc_ordered_workqueue( > + ioc->firmware_event_name, WQ_MEM_RECLAIM); > if (!ioc->firmware_event_thread) { > pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > ioc->name, __FILE__, __LINE__, __func__); > Hi Sreekanth, Is this change still needed after e09c2c2954684 workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() ? (3.17+) In upstream, this change looks cosmetic (unless Tejun has a preference for one over the other), but maybe converting to alloc_ordered_workqueue keeps your in house version in closer sync? Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mpt2sas: Abort initialization if no memory I/O resources, detected
On 06/16/2015 12:28 PM, Timothy Pearson wrote: > On 06/12/2015 05:05 PM, Timothy Pearson wrote: >> The mpt2sas driver crashes if the BIOS does not set up at least one >> memory I/O resource. This failure can happen if the device is too >> slow to respond during POST and is missed by the BIOS, but Linux >> then detects the device later in the boot process. >> >> This patch aborts initialization and prints a warning if no memory I/O >> resources are found. >> >> Signed-off-by: Timothy Pearson >> Tested-by: Timothy Pearson >> --- >> drivers/scsi/mpt2sas/mpt2sas_base.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c >> b/drivers/scsi/mpt2sas/mpt2sas_base.c >> index 11248de..15c9504 100644 >> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c >> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c >> @@ -6,6 +6,8 @@ >> * Copyright (C) 2007-2014 LSI Corporation >> * Copyright (C) 20013-2014 Avago Technologies >> * (mailto: mpt-fusionlinux@avagotech.com) >> + * Copyright (C) 2015 Raptor Engineering >> + * (mailto: supp...@araptorengineeringinc.com) >> * >> * This program is free software; you can redistribute it and/or >> * modify it under the terms of the GNU General Public License >> @@ -1582,6 +1584,13 @@ mpt2sas_base_map_resources(struct MPT2SAS_ADAPTER >> *ioc) >> } >> } >> >> + if (ioc->chip == NULL) { >> + printk(MPT2SAS_ERR_FMT "unable to map " >> + "adapter memory (resource not found)!\n", ioc->name); >> + r = -EINVAL; >> + goto out_fail; >> + } >> + >> _base_mask_interrupts(ioc); >> >> r = _base_get_ioc_facts(ioc, CAN_SLEEP); > > Just following up on this patch as I have not yet received any response. > > Thanks! Hi Tim -- just curious, why was the similar check on ioc->chip just a few lines above the one added by the patch insufficient? That loop block sets memap_sz when it finds an IORESOURCE_MEM so that it only sets ioc->chip once. I wonder if the fix might be simpler if the existing ioc->chip check relocated entirely to where you put it (maybe also pulling the entire error text onto one line for easier grepping). Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/20] [SCSI] mpt3sas: Use alloc_ordered_workqueue() API instead of create_singlethread_workqueue() API
On 06/12/2015 05:42 AM, Sreekanth Reddy wrote: ... > +#if defined(alloc_ordered_workqueue) > + ioc->firmware_event_thread = alloc_ordered_workqueue( > + ioc->firmware_event_name, WQ_MEM_RECLAIM); > +#else > + ioc->firmware_event_thread = create_singlethread_workqueue( > ioc->firmware_event_name); > +#endif Hi Sreekanth, I think the upstream version of this code can safely assume alloc_ordered_workqueue is defined, no? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
On 04/13/2015 10:38 AM, James Bottomley wrote: > On Mon, 2015-04-13 at 10:06 -0400, Joe Lawrence wrote: >> On 04/12/2015 08:54 PM, James Bottomley wrote: >>> On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote: >>>> On 12/30/2014 09:07 AM, Joe Lawrence wrote: >>>>> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly >>>>> check the PCI master abort pattern in _base_wait_for_doorbell_ack. This >>>>> pattern should be checked *prior* to any valid bit patterns, which would >>>>> always return true since a PCI read on master abort sets all bits high. >>>>> >>>>> The second patch adds similar checking to _base_wait_for_doorbell_int and >>>>> _base_wait_for_doorbell_not_used to avoid potentially long loops around >>>>> PCI reads. >>>>> >>>>> Joe Lawrence (2): >>>>> mpt2sas,mpt3sas: correct master-abort checking in doorbell ack >>>>> mpt2sas,mpt3sas: additional master abort checks >>>>> >>>>> drivers/scsi/mpt2sas/mpt2sas_base.c | 17 - >>>>> drivers/scsi/mpt3sas/mpt3sas_base.c | 17 - >>>>> 2 files changed, 24 insertions(+), 10 deletions(-) >>>>> >>>> >>>> Avago ping? >>>> >>>> This one was pretty straightforward: check 0x *before* any >>>> individual bit(s), i.e. before reading the doorbell register. >>> >>> OK, Joe, explain why this patch is important: what problems could result >>> from it not being present? If you convince everyone then no more mpt2/3 >>> sas patches until this is at least commented on and a plan of action >>> proposed. >> >> Hi James, >> >> As currently coded: If the PCI read returns a master abort, >> _base_wait_for_doorbell_ack will loop until it exhausts its timeout (up >> to 15 seconds). Other parts of the driver, like the periodic watchdog >> or EEH, may detect a similar problem before such a long time and cleanup >> the mess. However, complete device removal may be stalled until whoever >> called _base_wait_for_doorbell_ack is satisfied that it has finished. > > I think we all picked this up from the changelog. What I meant was in > what situations might a card get a master abort ... because that's when > the problem becomes user visible. It sounds like it's something that > might not occur very often or is a bit theoretical, is that right? Got it -- yes, master abort should be an infrequent event. On Stratus HW it typically indicates PCI hotplug removal of a device. We run many automated removal/add tests, which find corner case bugs and other race conditions in driver error paths and whatnot. >> This behavior is not really a bug, but feels like one in the making. >> Should additional code be introduced, copy/pasted, etc. it may not do >> what was intended. >> >> For future reference, would a repost have been more appropriate? This >> changeset was so small that I figured a status ping would have sufficed. > > Either works. I was just trying to work out what sort of attention > needs to be paid to the fix based on what sort of problem it fixes for > the end user. Ok. This particular piece of code isn't going to crash the machine or cause corruption, etc. It only seemed proper to fix the check that was already there. -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
On 04/12/2015 08:54 PM, James Bottomley wrote: > On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote: >> On 12/30/2014 09:07 AM, Joe Lawrence wrote: >>> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly >>> check the PCI master abort pattern in _base_wait_for_doorbell_ack. This >>> pattern should be checked *prior* to any valid bit patterns, which would >>> always return true since a PCI read on master abort sets all bits high. >>> >>> The second patch adds similar checking to _base_wait_for_doorbell_int and >>> _base_wait_for_doorbell_not_used to avoid potentially long loops around >>> PCI reads. >>> >>> Joe Lawrence (2): >>> mpt2sas,mpt3sas: correct master-abort checking in doorbell ack >>> mpt2sas,mpt3sas: additional master abort checks >>> >>> drivers/scsi/mpt2sas/mpt2sas_base.c | 17 - >>> drivers/scsi/mpt3sas/mpt3sas_base.c | 17 - >>> 2 files changed, 24 insertions(+), 10 deletions(-) >>> >> >> Avago ping? >> >> This one was pretty straightforward: check 0x *before* any >> individual bit(s), i.e. before reading the doorbell register. > > OK, Joe, explain why this patch is important: what problems could result > from it not being present? If you convince everyone then no more mpt2/3 > sas patches until this is at least commented on and a plan of action > proposed. Hi James, As currently coded: If the PCI read returns a master abort, _base_wait_for_doorbell_ack will loop until it exhausts its timeout (up to 15 seconds). Other parts of the driver, like the periodic watchdog or EEH, may detect a similar problem before such a long time and cleanup the mess. However, complete device removal may be stalled until whoever called _base_wait_for_doorbell_ack is satisfied that it has finished. This behavior is not really a bug, but feels like one in the making. Should additional code be introduced, copy/pasted, etc. it may not do what was intended. For future reference, would a repost have been more appropriate? This changeset was so small that I figured a status ping would have sufficed. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
On 12/30/2014 09:07 AM, Joe Lawrence wrote: > A colleague noticed that the mpt2 and mpt3sas drivers do not correctly > check the PCI master abort pattern in _base_wait_for_doorbell_ack. This > pattern should be checked *prior* to any valid bit patterns, which would > always return true since a PCI read on master abort sets all bits high. > > The second patch adds similar checking to _base_wait_for_doorbell_int and > _base_wait_for_doorbell_not_used to avoid potentially long loops around > PCI reads. > > Joe Lawrence (2): > mpt2sas,mpt3sas: correct master-abort checking in doorbell ack > mpt2sas,mpt3sas: additional master abort checks > > drivers/scsi/mpt2sas/mpt2sas_base.c | 17 - > drivers/scsi/mpt3sas/mpt3sas_base.c | 17 - > 2 files changed, 24 insertions(+), 10 deletions(-) > Avago ping? This one was pretty straightforward: check 0x *before* any individual bit(s), i.e. before reading the doorbell register. -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OOPS: unplugging western digital passport drive
On 03/11/2015 12:25 AM, Stanisław Pitucha wrote: > Hi linux-scsi, > I've got another case of reproducible crash when unplugging western > digital passport drives. This was mentioned before in > http://www.spinics.net/lists/linux-scsi/msg82603.html > > Is there any way I can get a more correct stacktrace of the original > bug without changing/recompiling the kernel? (or with?) > > My last log from oops is: > > [ 3343.469871]: usb 4-1: USB disconnect, device number 3 > [ 3343.521948]: BUG: unable to handle kernel NULL pointer dereference > at 01a0 > [ 3343.522052]: IP: [] blk_post_runtime_resume+0x65/0x80 > [ 3343.522131]: PGD 0 > [ 3343.522159]: Oops: 0002 [#1] PREEMPT SMP > [ 3343.522215]: Modules linked in: nls_utf8 hfsplus ses enclosure uas > usb_storage rfcomm fuse ctr ccm bnep ecb btusb bluetooth uvcvideo > videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev > joydev media mousedev iTCO_wdt iTCO_vendor_support arc4 coretemp > intel_rapl x86_pkg_temp_thermal intel_powerclamp iwldvm kvm_intel > mac80211 kvm evdev mac_hid psmouse serio_raw pcspkr snd_hda_codec_hdmi > iwlwifi snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic > rtsx_pci_ms i915 snd_hda_intel memstick snd_hda_controller cfg80211 > snd_hda_codec r8169 snd_hwdep mii snd_pcm drm_kms_helper snd_timer > lpc_ich shpchp drm thinkpad_acpi intel_gtt i2c_algo_bit nvram mei_me > i2c_core snd mei soundcore led_class rfkill hwmon tpm_tis tpm battery > wmi thermal video button processor ac sch_fq_codel usbhid > hid_logitech_dj > [ 3343.523406]: hid_generic hid nfs lockd grace sunrpc fscache ext4 > crc16 mbcache jbd2 sha256_ssse3 sha256_generic algif_skcipher af_alg > dm_crypt dm_mod sr_mod cdrom sd_mod rtsx_pci_sdmmc atkbd mmc_core > libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel > ahci libahci aesni_intel aes_x86_64 lrw gf128mul glue_helper > ablk_helper cryptd libata xhci_pci ehci_pci scsi_mod xhci_hcd ehci_hcd > rtsx_pci usbcore usb_common i8042 serio > [ 3343.523825]: CPU: 1 PID: 22490 Comm: kworker/1:0 Not tainted 3.18.6-1-ARCH > #1 > [ 3343.523882]: Hardware name: LENOVO 2481CTO/2481CTO, BIOS > G3ET36WW(1.10) 06/20/2012 > [ 3343.523953]: Workqueue: usb_hub_wq hub_event [usbcore] > [ 3343.523997]: task: 880211fd5080 ti: 8801971bc000 task.ti: > 8801971bc000 > [ 3343.524054]: RIP: 0010:[] [] > blk_post_runtime_resume+0x65/0x80 > [ 3343.524131]: RSP: 0018:8801971bf8c8 EFLAGS: 00010092 > [ 3343.524173]: RAX: RBX: 880211ca1698 RCX: > > [ 3343.524228]: RDX: 0001000dea0f RSI: 0009 RDI: > 880211ca1698 > [ 3343.524283]: RBP: 8801971bf8d8 R08: R09: > 880216c3f000 > [ 3343.524338]: R10: 88021f256db0 R11: ea0002987d40 R12: > > [ 3343.524392]: R13: R14: 88020df7b428 R15: > 0004 > [ 3343.524448]: FS: () GS:88021f24() > knlGS: > [ 3343.524509]: CS: 0010 DS: ES: CR0: 80050033 > [ 3343.524554]: CR2: 01a0 CR3: 01811000 CR4: > 001407e0 > [ 3343.524609]: Stack: > [ 3343.524628]: 8802123fe968 8801971bf908 > a0119564 > [ 3343.524695]: 8801971bf910 8802123fe968 8802123fea16 > a01194e0 > [ 3343.524762]: 8801971bf938 813c4636 8800a61f5780 > 8802123fe968 > [ 3343.524828]: Call Trace: > [ 3343.524866]: [] scsi_runtime_resume+0x84/0xd0 [scsi_mod] > [ 3343.524930]: [] ? > scsi_autopm_put_device+0x20/0x20 [scsi_mod] > [ 3343.524990]: [] __rpm_callback+0x36/0x90 > [ 3343.525035]: [] rpm_callback+0x26/0xa0 > [ 3343.525079]: [] rpm_resume+0x4b1/0x690 > [ 3343.525124]: [] __pm_runtime_resume+0x40/0x60 > [ 3343.525177]: [] __device_release_driver+0x29/0xf0 > [ 3343.525228]: [] device_release_driver+0x23/0x30 > [ 3343.525277]: [] bus_remove_device+0x108/0x180 > [ 3343.525325]: [] device_del+0x129/0x1f0 > [ 3343.525378]: [] __scsi_remove_device+0xcd/0xe0 > [scsi_mod] > [ 3343.525442]: [] scsi_forget_host+0x64/0x70 [scsi_mod] > [ 3343.525502]: [] scsi_remove_host+0x7b/0x130 [scsi_mod] > [ 3343.525558]: [] usb_stor_disconnect+0x59/0xd0 > [usb_storage] > [ 3343.525625]: [] usb_unbind_interface+0x1f8/0x2c0 > [usbcore] > [ 3343.525682]: [] ? rpm_idle+0x23/0x340 > [ 3343.525729]: [] __device_release_driver+0x7f/0xf0 > [ 3343.525779]: [] device_release_driver+0x23/0x30 > [ 3343.525828]: [] bus_remove_device+0x108/0x180 > [ 3343.525876]: [] device_del+0x129/0x1f0 > [ 3343.528574]: [] usb_disable_device+0x91/0x290 [usbcore] > [ 3343.531302]: [] usb_disconnect+0x94/0x2d0 [usbcore] > [ 3343.533981]: [] hub_event+0x66a/0x1640 [usbcore] > [ 3343.536657]: [] ? __schedule+0x3e8/0xa50 > [ 3343.539275]: [] process_one_work+0x145/0x400 > [ 3343.541919]: [] worker_thread+0x6b/0x480 > [ 3343.544524]: [] ? init_pwq.part.22+0x10/0x10 > [ 3343.547108]: [] kthread+0xea/0x100 > [ 3343.549708]: [
Re: usb-storage URB use-after-free
On Thu, 29 Jan 2015 11:42:18 -0500 Alan Stern wrote: > On Wed, 28 Jan 2015, Joe Lawrence wrote: > > > This one should have gone over to linux-usb. > > > > -- Joe > > > > On 01/28/2015 05:04 PM, Joe Lawrence wrote: > > > Hello linux-usb, > > > > > > We've hit a USB use-after-free on Stratus HW during device removal tests. > > > We're running fio disk I/O to a scsi disk hanging off USB when the USB > > > controller is hotplug removed. This crash is very consistent (usually the > > > first device pull during testing). Without I/O, it may take days to > > > encounter. > > > > > > general protection fault: [#1] SMP > > > ... > > > CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W > > > O-- 3.10.0-210.el7.dev02.x86_64 #1 > > > Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 > > > 11/12/2014 > > > Workqueue: scsi_tmf_872 scmd_eh_abort_handler > > > task: 88031bd91960 ti: 880981318000 task.ti: 880981318000 > > > RIP: 0010:[] [] kobject_put+0xd/0x60 > > > RSP: 0018:88098131bd28 EFLAGS: 00010002 > > > RAX: RBX: 6b6b6b6b6b6b6c0b RCX: 0001002f0009 > > > RDX: 002f RSI: ea0015719800 RDI: 6b6b6b6b6b6b6c0b > > > RBP: 88098131bd30 R08: 88055c6622f0 R09: 0001002f0008 > > > R10: 88085f407a80 R11: 81450503 R12: 8804bab6d248 > > > R13: ff98 R14: 0086 R15: 0c20 > > > FS: () GS:88085fce() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 7f2ebb5d6008 CR3: 001038dc5000 CR4: 001407e0 > > > DR0: DR1: DR2: > > > DR3: DR6: 0ff0 DR7: 0400 > > > Stack: > > > 88098131bd40 813cd327 88098131bd50 > > > 8140a65a 88098131bd78 81416795 > > > 880414791968 880414791978 88098131bd88 814175f6 > > > Call Trace: > > > [] put_device+0x17/0x20 > > > [] usb_put_dev+0x1a/0x20 > > > [] usb_hcd_unlink_urb+0x65/0xe0 > > > [] usb_unlink_urb+0x26/0x50 > > > [] usb_sg_cancel+0x82/0xe0 > > > [] usb_stor_stop_transport+0x31/0x50 [usb_storage] > > > [] command_abort+0xcd/0xe0 [usb_storage] > > > [] scmd_eh_abort_handler+0xbf/0x480 > > > [] process_one_work+0x17b/0x470 > > > [] worker_thread+0x11b/0x400 > > > [] ? rescuer_thread+0x400/0x400 > > > [] kthread+0xcf/0xe0 > > > [] ? kthread_create_on_node+0x140/0x140 > > > [] ret_from_fork+0x7c/0xb0 > > > [] ? kthread_create_on_node+0x140/0x140 > > > > > > from another crash, we know that the URB itself has been freed: > > > > but the underlying usb_device->device is still present: > > > > It looks like usb_hcd_unlink_urb takes a reference out on the underlying > > > device but not the URB, which in our test case consistently gets freed > > > just before usb_hcd_unlink_urb tries to return a reference on > > > urb->device. > > The analysis looks correct. > > > > Maybe the URB is a reference count short when usb_hcd_unlink_urb calls > > > unlink1? Somewhere usb-storage missed taking out a ref? > > No; it's a simple use-after-free error. > > > > A tourniquet hack-patch follows (probably inside out, as the URB should > > > stay > > > intact while usb_hcd_unlink_urb does its thing) and has been running a > > > little over 30 surprise device removals under I/O without incident. > > > > > > Any better suggestions? > > Please try this instead. > > Alan Stern > > > > Index: usb-3.19/drivers/usb/core/hcd.c > === > --- usb-3.19.orig/drivers/usb/core/hcd.c > +++ usb-3.19/drivers/usb/core/hcd.c > @@ -1618,6 +1618,7 @@ static int unlink1(struct usb_hcd *hcd, > int usb_hcd_unlink_urb (struct urb *urb, int status) > { > struct usb_hcd *hcd; > + struct usb_device *udev = urb->dev; > int retval = -EIDRM; > unsigned long flags; > > @@ -1629,20 +1630,19 @@ int usb_hcd_unlink_urb (struct urb *urb, > spin_lock_irqsave(&hcd_urb_unlink_lock, flags); > if (atomic_read(&urb->use_
Re: usb-storage URB use-after-free
This one should have gone over to linux-usb. -- Joe On 01/28/2015 05:04 PM, Joe Lawrence wrote: > Hello linux-usb, > > We've hit a USB use-after-free on Stratus HW during device removal tests. > We're running fio disk I/O to a scsi disk hanging off USB when the USB > controller is hotplug removed. This crash is very consistent (usually the > first device pull during testing). Without I/O, it may take days to > encounter. > > general protection fault: [#1] SMP > ... > CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-- > 3.10.0-210.el7.dev02.x86_64 #1 > Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 > 11/12/2014 > Workqueue: scsi_tmf_872 scmd_eh_abort_handler > task: 88031bd91960 ti: 880981318000 task.ti: 880981318000 > RIP: 0010:[] [] kobject_put+0xd/0x60 > RSP: 0018:88098131bd28 EFLAGS: 00010002 > RAX: RBX: 6b6b6b6b6b6b6c0b RCX: 0001002f0009 > RDX: 002f RSI: ea0015719800 RDI: 6b6b6b6b6b6b6c0b > RBP: 88098131bd30 R08: 88055c6622f0 R09: 0001002f0008 > R10: 88085f407a80 R11: 81450503 R12: 8804bab6d248 > R13: ff98 R14: 0086 R15: 0c20 > FS: () GS:88085fce() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f2ebb5d6008 CR3: 001038dc5000 CR4: 001407e0 > DR0: DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Stack: > 88098131bd40 813cd327 88098131bd50 > 8140a65a 88098131bd78 81416795 > 880414791968 880414791978 88098131bd88 814175f6 > Call Trace: > [] put_device+0x17/0x20 > [] usb_put_dev+0x1a/0x20 > [] usb_hcd_unlink_urb+0x65/0xe0 > [] usb_unlink_urb+0x26/0x50 > [] usb_sg_cancel+0x82/0xe0 > [] usb_stor_stop_transport+0x31/0x50 [usb_storage] > [] command_abort+0xcd/0xe0 [usb_storage] > [] scmd_eh_abort_handler+0xbf/0x480 > [] process_one_work+0x17b/0x470 > [] worker_thread+0x11b/0x400 > [] ? rescuer_thread+0x400/0x400 > [] kthread+0xcf/0xe0 > [] ? kthread_create_on_node+0x140/0x140 > [] ret_from_fork+0x7c/0xb0 > [] ? kthread_create_on_node+0x140/0x140 > > from another crash, we know that the URB itself has been freed: > > crash> struct -o urb | grep SIZE > SIZE: 0xc0 > crash> p 0xc0/8 > $2 = 0x18 > crash> rd 880846df9248 0x18 > 880846df9248: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df9258: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df9268: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df9278: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df9288: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df9298: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b > 880846df92f8: 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b kkk. > > but the underlying usb_device->device is still present: > > crash> struct usb_device 88084fd22e68 > ... > dev = { > parent = 0x881050c3d418, > p = 0x88084ff27ae8, > kobj = { > name = 0x88084ff071e0 "6-1", > entry = { > next = 0x88084ff09568, > prev = 0x88084ff09000 > }, > parent = 0x881050c3d428, > kset = 0x8808540dc9f0, > ktype = 0x819cc220 , > sd = 0x88084ff1b8b8, > kref = { > refcount = { > counter = 0x8 > } > }, > state_initialized = 0x1, > state_in_sysfs = 0x1, > state_add_uevent_sent = 0x1, > state_remove_uevent_sent = 0x0, > uevent_suppress = 0x0 > }, > ... > > I added trace printing to usb_{alloc_urb,get_urb,free_urb} and urb_destroy > showing the URB and the calling function, then crashed again: > > ... snip ... > usb-storage-687 [000] 350.013212: bprint: > usb_alloc_urb : JL: usb_sg_init+0xe1 -> usb_alloc_urb(0x8808489b0a28) > usb-storage-687 [000] 350.013212: bprint: usb_get_urb > : JL: usb_hcd_submit_urb+0x3b -> usb_get_urb(0x8808489b0a28) >
usb-storage URB use-after-free
Hello linux-usb, We've hit a USB use-after-free on Stratus HW during device removal tests. We're running fio disk I/O to a scsi disk hanging off USB when the USB controller is hotplug removed. This crash is very consistent (usually the first device pull during testing). Without I/O, it may take days to encounter. general protection fault: [#1] SMP ... CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-- 3.10.0-210.el7.dev02.x86_64 #1 Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 11/12/2014 Workqueue: scsi_tmf_872 scmd_eh_abort_handler task: 88031bd91960 ti: 880981318000 task.ti: 880981318000 RIP: 0010:[] [] kobject_put+0xd/0x60 RSP: 0018:88098131bd28 EFLAGS: 00010002 RAX: RBX: 6b6b6b6b6b6b6c0b RCX: 0001002f0009 RDX: 002f RSI: ea0015719800 RDI: 6b6b6b6b6b6b6c0b RBP: 88098131bd30 R08: 88055c6622f0 R09: 0001002f0008 R10: 88085f407a80 R11: 81450503 R12: 8804bab6d248 R13: ff98 R14: 0086 R15: 0c20 FS: () GS:88085fce() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f2ebb5d6008 CR3: 001038dc5000 CR4: 001407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Stack: 88098131bd40 813cd327 88098131bd50 8140a65a 88098131bd78 81416795 880414791968 880414791978 88098131bd88 814175f6 Call Trace: [] put_device+0x17/0x20 [] usb_put_dev+0x1a/0x20 [] usb_hcd_unlink_urb+0x65/0xe0 [] usb_unlink_urb+0x26/0x50 [] usb_sg_cancel+0x82/0xe0 [] usb_stor_stop_transport+0x31/0x50 [usb_storage] [] command_abort+0xcd/0xe0 [usb_storage] [] scmd_eh_abort_handler+0xbf/0x480 [] process_one_work+0x17b/0x470 [] worker_thread+0x11b/0x400 [] ? rescuer_thread+0x400/0x400 [] kthread+0xcf/0xe0 [] ? kthread_create_on_node+0x140/0x140 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x140/0x140 from another crash, we know that the URB itself has been freed: crash> struct -o urb | grep SIZE SIZE: 0xc0 crash> p 0xc0/8 $2 = 0x18 crash> rd 880846df9248 0x18 880846df9248: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df9258: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df9268: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df9278: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df9288: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df9298: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 880846df92f8: 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b kkk. but the underlying usb_device->device is still present: crash> struct usb_device 88084fd22e68 ... dev = { parent = 0x881050c3d418, p = 0x88084ff27ae8, kobj = { name = 0x88084ff071e0 "6-1", entry = { next = 0x88084ff09568, prev = 0x88084ff09000 }, parent = 0x881050c3d428, kset = 0x8808540dc9f0, ktype = 0x819cc220 , sd = 0x88084ff1b8b8, kref = { refcount = { counter = 0x8 } }, state_initialized = 0x1, state_in_sysfs = 0x1, state_add_uevent_sent = 0x1, state_remove_uevent_sent = 0x0, uevent_suppress = 0x0 }, ... I added trace printing to usb_{alloc_urb,get_urb,free_urb} and urb_destroy showing the URB and the calling function, then crashed again: ... snip ... usb-storage-687 [000] 350.013212: bprint: usb_alloc_urb : JL: usb_sg_init+0xe1 -> usb_alloc_urb(0x8808489b0a28) usb-storage-687 [000] 350.013212: bprint: usb_get_urb : JL: usb_hcd_submit_urb+0x3b -> usb_get_urb(0x8808489b0a28) kworker/u97:2-703 [006] 380.656791: bprint: usb_free_urb : JL: __usb_hcd_giveback_urb -> usb_free_urb(0x8808489b0a28) usb-storage-687 [000] 380.656793: bprint: usb_free_urb : JL: sg_clean+0x33 -> usb_free_urb(0x8808489b0a28) usb-storage-687 [000] 380.656794: bprint: urb_destroy : JL: usb_free_urb+0x40 -> urb_destroy(0x8808489b0a28) crash> bt 703 PID: 703TASK: 8808497f8000 CPU: 6 COMMAND: "kworker/u97:2" #0 [8808497f7c10] die at 8101736b #1 [8808497f7c40] do_general_protection at 8160be4e #2 [8808497f7c70] general_protection at 8160b768 [
Re: [SCIS] mpt3sas: wait_for_completion_timeout timeout not reported
On 12/30/2014 11:19 AM, Nicholas Mc Guire wrote: > wait_for_completion_timeout reaching timeout was being ignored, > this also should fail if timeout condition occurs. > > Thanks to Joe Lawrence for confirmation. How about this instead: Acked-by: Joe Lawrence You still probably want a review from the Avago folks though. BTW, don't worry about the "[SCSI]" subject prefix to the commit, "[PATCH]" is fine. I believe the former is a convention that the maintainer applies to patches as he collects them to indicate that they originated via the SCSI tree. Thanks, -- Joe > this was only compile tested with > x86_64_defconfig + CONFIG_SCSI_LOWLEVEL=y + CONFIG_SCSI_MPT3SAS=m > > patch is against linux-next 3.19.0-rc1 -next-20141226 > Signed-off-by: Nicholas Mc Guire > --- > drivers/scsi/mpt3sas/mpt3sas_config.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_config.c > b/drivers/scsi/mpt3sas/mpt3sas_config.c > index 4472c2a..04ff21b 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_config.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_config.c > @@ -393,7 +393,7 @@ _config_request(struct MPT3SAS_ADAPTER *ioc, > Mpi2ConfigRequest_t > mpt3sas_base_put_smid_default(ioc, smid); > timeleft = wait_for_completion_timeout(&ioc->config_cmds.done, > timeout*HZ); > - if (!(ioc->config_cmds.status & MPT3_CMD_COMPLETE)) { > + if (timeleft == 0 || !(ioc->config_cmds.status & MPT3_CMD_COMPLETE)) { > pr_err(MPT3SAS_FMT "%s: timeout\n", > ioc->name, __func__); > _debug_dump_mf(mpi_request, > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] mpt2sas,mpt3sas: additional master abort checks
Add checks for PCI master abort reads in _base_wait_for_doorbell_int and _base_wait_for_doorbell_not_used, which contain potentially long loops around readl calls. Signed-off-by: Joe Lawrence --- drivers/scsi/mpt2sas/mpt2sas_base.c | 10 -- drivers/scsi/mpt3sas/mpt3sas_base.c | 10 -- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index 16d6fd5e037e..6ad1268cc57b 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -2942,7 +2942,9 @@ _base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { int_status = readl(&ioc->chip->HostInterruptStatus); - if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { + if (int_status == 0x) + goto out; + else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: " "successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -2955,6 +2957,7 @@ _base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout, count++; } while (--cntdn); + out: printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), " "int_status(%x)!\n", ioc->name, __func__, count, int_status); return -EFAULT; @@ -3032,7 +3035,9 @@ _base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { doorbell_reg = readl(&ioc->chip->Doorbell); - if (!(doorbell_reg & MPI2_DOORBELL_USED)) { + if (doorbell_reg == 0x) + goto out; + else if (!(doorbell_reg & MPI2_DOORBELL_USED)) { dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: " "successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -3045,6 +3050,7 @@ _base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout, count++; } while (--cntdn); + out: printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), " "doorbell_reg(%x)!\n", ioc->name, __func__, count, doorbell_reg); return -EFAULT; diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 878bf6ddd2a0..d3b6549640c7 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -3115,7 +3115,9 @@ _base_wait_for_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { int_status = readl(&ioc->chip->HostInterruptStatus); - if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { + if (int_status == 0x) + goto out; + else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -3128,6 +3130,7 @@ _base_wait_for_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout, count++; } while (--cntdn); + out: pr_err(MPT3SAS_FMT "%s: failed due to timeout count(%d), int_status(%x)!\n", ioc->name, __func__, count, int_status); @@ -3207,7 +3210,9 @@ _base_wait_for_doorbell_not_used(struct MPT3SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { doorbell_reg = readl(&ioc->chip->Doorbell); - if (!(doorbell_reg & MPI2_DOORBELL_USED)) { + if (doorbell_reg == 0x) + goto out; + else if (!(doorbell_reg & MPI2_DOORBELL_USED)) { dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -3220,6 +3225,7 @@ _base_wait_for_doorbell_not_used(struct MPT3SAS_ADAPTER *ioc, int timeout, count++; } while (--cntdn); + out: pr_err(MPT3SAS_FMT "%s: failed due to timeout count(%d), doorbell_reg(%x)!\n", ioc->name, __func__, count, doorbell_reg); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
A colleague noticed that the mpt2 and mpt3sas drivers do not correctly check the PCI master abort pattern in _base_wait_for_doorbell_ack. This pattern should be checked *prior* to any valid bit patterns, which would always return true since a PCI read on master abort sets all bits high. The second patch adds similar checking to _base_wait_for_doorbell_int and _base_wait_for_doorbell_not_used to avoid potentially long loops around PCI reads. Joe Lawrence (2): mpt2sas,mpt3sas: correct master-abort checking in doorbell ack mpt2sas,mpt3sas: additional master abort checks drivers/scsi/mpt2sas/mpt2sas_base.c | 17 - drivers/scsi/mpt3sas/mpt3sas_base.c | 17 - 2 files changed, 24 insertions(+), 10 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
Test against the invalid data pattern ~0 before testing valid data patterns. Reported-by: Derek Shute Signed-off-by: Joe Lawrence --- drivers/scsi/mpt2sas/mpt2sas_base.c |7 --- drivers/scsi/mpt3sas/mpt3sas_base.c |7 --- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index 3fb01d1883c6..16d6fd5e037e 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -2983,7 +2983,9 @@ _base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { int_status = readl(&ioc->chip->HostInterruptStatus); - if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) { + if (int_status == 0x) + goto out; + else if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) { dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: " "successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -2995,8 +2997,7 @@ _base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout, mpt2sas_base_fault_info(ioc , doorbell); return -EFAULT; } - } else if (int_status == 0x) - goto out; + } if (sleep_flag == CAN_SLEEP) msleep(1); diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index b9c27398e206..878bf6ddd2a0 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -3157,7 +3157,9 @@ _base_wait_for_doorbell_ack(struct MPT3SAS_ADAPTER *ioc, int timeout, cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout; do { int_status = readl(&ioc->chip->HostInterruptStatus); - if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) { + if (int_status == 0x) + goto out; + else if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) { dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: successful count(%d), timeout(%d)\n", ioc->name, __func__, count, timeout)); @@ -3169,8 +3171,7 @@ _base_wait_for_doorbell_ack(struct MPT3SAS_ADAPTER *ioc, int timeout, mpt3sas_base_fault_info(ioc , doorbell); return -EFAULT; } - } else if (int_status == 0x) - goto out; + } if (sleep_flag == CAN_SLEEP) usleep_range(1000, 1500); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SCSI RFC] mpt2sas: wait_for_completion_timeout timeout not reported
On 12/29/2014 12:25 PM, Nicholas Mc Guire wrote: > wait_for_completion_timeout reaching timeout was being ignored, > this probably also should fail if timeout condition occurs ? > > this was only compile tested with > x86_64_defconfig + CONFIG_SCSI_LOWLEVEL=y + CONFIG_SCSI_MPT2SAS=m > > patch is against linux-next 3.19.0-rc1 -next-20141226 > > Signed-off-by: Nicholas Mc Guire > --- > drivers/scsi/mpt2sas/mpt2sas_config.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_config.c > b/drivers/scsi/mpt2sas/mpt2sas_config.c > index c72a2ff..02dc2d8 100644 > --- a/drivers/scsi/mpt2sas/mpt2sas_config.c > +++ b/drivers/scsi/mpt2sas/mpt2sas_config.c > @@ -391,7 +391,7 @@ _config_request(struct MPT2SAS_ADAPTER *ioc, > Mpi2ConfigRequest_t > mpt2sas_base_put_smid_default(ioc, smid); > timeleft = wait_for_completion_timeout(&ioc->config_cmds.done, > timeout*HZ); > - if (!(ioc->config_cmds.status & MPT2_CMD_COMPLETE)) { > + if (timeleft == 0 || !(ioc->config_cmds.status & MPT2_CMD_COMPLETE)) { > printk(MPT2SAS_ERR_FMT "%s: timeout\n", > ioc->name, __func__); > _debug_dump_mf(mpi_request, > Hi Nicholas, This would look like the correct thing to do here, for it eliminates a small window where we timeout when mpt2sas_config_done had set MPT2_CMD_COMPLETE and MPT3_CMD_REPLY_VALID, but has not finished copying the reply data into config_cmds.reply. It looks like the same patch should be applied to the mpt3sas driver as well. -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 31/35] qla2xxx: Disable PCI device in shutdown handler.
On Wed, 24 Sep 2014 11:37:55 -0400 Chad Dupuis wrote: > > > On Wed, 24 Sep 2014, Joe Lawrence wrote: > > > On Wed, 24 Sep 2014 03:08:34 -0400 > > Saurav Kashyap wrote: > > > >> From: Chad Dupuis > >> > >> Disable the PCI device during shutdown to prevent any races with > >> other PCI code such as the AER handling code. > >> > >> Signed-off-by: Chad Dupuis > >> Signed-off-by: Saurav Kashyap > >> --- > >> drivers/scsi/qla2xxx/qla_os.c |3 +++ > >> 1 files changed, 3 insertions(+), 0 deletions(-) > >> > >> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > >> index 08c8e9c..70445bc 100644 > >> --- a/drivers/scsi/qla2xxx/qla_os.c > >> +++ b/drivers/scsi/qla2xxx/qla_os.c > >> @@ -3037,6 +3037,9 @@ qla2x00_shutdown(struct pci_dev *pdev) > >>qla2x00_free_irqs(vha); > >> > >>qla2x00_free_fw_dump(ha); > >> + > >> + pci_disable_pcie_error_reporting(pdev); > >> + pci_disable_device(pdev); > >> } > >> > >> /* Deletes all the virtual ports for a given ha */ > > > > Hi Saurav, > > > > Just curious if there were any bug report instances to go along with > > this patch? (Or a short explanation of the race.) > > Joe, > > The race here was during reboot after taking a crash dump. We hit an AER > condition so the driver's AER handler is called after we've already > shutdown the adapter which causes us to try to free some already freed > structs. To prevent this we disable the PCI function from the shutdown > handler. Hi Chad, Could the driver's AER handler get invoked after qla2x00_free_fw_dump, but before pci_disable_pcie_error_reporting (ie, is there anything to hold off .shutdown from running at the same time as the AER callbacks)? Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 31/35] qla2xxx: Disable PCI device in shutdown handler.
On Wed, 24 Sep 2014 03:08:34 -0400 Saurav Kashyap wrote: > From: Chad Dupuis > > Disable the PCI device during shutdown to prevent any races with > other PCI code such as the AER handling code. > > Signed-off-by: Chad Dupuis > Signed-off-by: Saurav Kashyap > --- > drivers/scsi/qla2xxx/qla_os.c |3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 08c8e9c..70445bc 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -3037,6 +3037,9 @@ qla2x00_shutdown(struct pci_dev *pdev) > qla2x00_free_irqs(vha); > > qla2x00_free_fw_dump(ha); > + > + pci_disable_pcie_error_reporting(pdev); > + pci_disable_device(pdev); > } > > /* Deletes all the virtual ports for a given ha */ Hi Saurav, Just curious if there were any bug report instances to go along with this patch? (Or a short explanation of the race.) Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 20/35] qla2xxx: Unload of qla2xxx driver crashes the machine.
Hi Saurav, Will these changes conflict with those submitted in August [1] to Christoph's drivers-for-3.18 branch? In particular, "qla2xxx: Fix shost use-after-free on device removal" [2] fixed this same driver unload issue in a slightly different manner. That change was marked for stable as the bug was introduced by fe1b806f4f71 ("qla2xxx: Refactor shutdown code so some functionality can be reused") in previous releases. Regards, -- Joe [1] http://thread.gmane.org/gmane.linux.scsi/93859 [2] http://git.infradead.org/users/hch/scsi-queue.git/commitdiff/db7157d4cfce6edf052452fb1d327d4d11b67f4c On Wed, 24 Sep 2014 03:08:23 -0400 Saurav Kashyap wrote: > From: Arun Easi > > Signed-off-by: Arun Easi > Signed-off-by: Saurav Kashyap > --- > drivers/scsi/qla2xxx/qla_os.c |4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 8846883..6539513 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -3190,10 +3190,10 @@ qla2x00_remove_one(struct pci_dev *pdev) > > qla2x00_free_device(base_vha); > > - scsi_host_put(base_vha->host); > - > qla2x00_clear_drv_active(base_vha); > > + scsi_host_put(base_vha->host); > + > qla2x00_unmap_iobases(ha); > > pci_release_selected_regions(ha->pdev, ha->bars); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mpt2sas: remove duplicate disable_discovery MODULE_PARAM
The disable_discovery module parameter is declared and only used by mpt2sas_scsih.c. Remove the extra copy in mpt2sas_base.c. Signed-off-by: Joe Lawrence --- drivers/scsi/mpt2sas/mpt2sas_base.c |4 1 file changed, 4 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index cf51b48..bb8dcb0 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -84,10 +84,6 @@ static int mpt2sas_fwfault_debug; MODULE_PARM_DESC(mpt2sas_fwfault_debug, " enable detection of firmware fault " "and halt firmware - (default=0)"); -static int disable_discovery = -1; -module_param(disable_discovery, int, 0); -MODULE_PARM_DESC(disable_discovery, " disable discovery "); - /** * _scsih_set_fwfault_debug - global setting of ioc->fwfault_debug. * -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 6/6] qla2xxx: Prevent probe and board_disable race
The PCI register read checking introduced in commit fe1b806f4f71 ("qla2xxx: Disable adapter when we encounter a PCI disconnect") is active during driver probe. Hold off scheduling any board removal until the driver probe has completed. This ensures that the the board_disable work structure is initialized and more importantly, avoids racing qla2x00_probe_one. Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_def.h |1 + drivers/scsi/qla2xxx/qla_isr.c |3 ++- drivers/scsi/qla2xxx/qla_os.c |2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 1398818..de5a9c4 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3405,6 +3405,7 @@ typedef struct scsi_qla_host { unsigned long pci_flags; #define PFLG_DISCONNECTED 0 /* PCI device removed */ #define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */ +#define PFLG_DRIVER_PROBING2 /* PCI driver .probe */ uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index 341c64d..a0992bf 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -118,7 +118,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) /* Check for PCI disconnection */ if (reg == 0x) { if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && - !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) { + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) && + !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) { /* * Schedule this (only once) on the default system * workqueue so that all the adapter workqueues and the diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 84d4df6..f937859 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -2632,6 +2632,7 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) } pci_set_drvdata(pdev, base_vha); + set_bit(PFLG_DRIVER_PROBING, &base_vha->pci_flags); host = base_vha->host; base_vha->req = req; @@ -2928,6 +2929,7 @@ skip_dpc: qlt_add_target(ha, base_vha); + clear_bit(PFLG_DRIVER_PROBING, &base_vha->pci_flags); return 0; probe_init_failed: -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 3/6] qla2xxx: Collect PCI register checks and board_disable scheduling
Add an uint16_t variant of qla2x00_check_reg_for_disconnect and use these routines to check and schedule a PCI-disconnected board from a centralized place. Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_gbl.h |3 ++- drivers/scsi/qla2xxx/qla_isr.c | 28 +--- drivers/scsi/qla2xxx/qla_mr.c |2 +- drivers/scsi/qla2xxx/qla_nx.c |6 +++--- drivers/scsi/qla2xxx/qla_os.c |8 +--- 5 files changed, 20 insertions(+), 27 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h index d646540..43ef0db 100644 --- a/drivers/scsi/qla2xxx/qla_gbl.h +++ b/drivers/scsi/qla2xxx/qla_gbl.h @@ -475,7 +475,8 @@ extern uint8_t *qla25xx_read_nvram_data(scsi_qla_host_t *, uint8_t *, uint32_t, extern int qla25xx_write_nvram_data(scsi_qla_host_t *, uint8_t *, uint32_t, uint32_t); extern int qla2x00_is_a_vp_did(scsi_qla_host_t *, uint32_t); -bool qla2x00_check_reg_for_disconnect(scsi_qla_host_t *, uint32_t); +bool qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *, uint32_t); +bool qla2x00_check_reg16_for_disconnect(scsi_qla_host_t *, uint16_t); extern int qla2x00_beacon_on(struct scsi_qla_host *); extern int qla2x00_beacon_off(struct scsi_qla_host *); diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index bea0e74..016ebed 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -56,16 +56,8 @@ qla2100_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { hccr = RD_REG_WORD(®->hccr); - /* Check for PCI disconnection */ - if (hccr == 0x) { - /* -* Schedule this on the default system workqueue so that -* all the adapter workqueues and the DPC thread can be -* shutdown cleanly. -*/ - schedule_work(&ha->board_disable); + if (qla2x00_check_reg16_for_disconnect(vha, hccr)) break; - } if (hccr & HCCR_RISC_PAUSE) { if (pci_channel_offline(ha->pdev)) break; @@ -121,7 +113,7 @@ qla2100_intr_handler(int irq, void *dev_id) } bool -qla2x00_check_reg_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) +qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { @@ -136,6 +128,12 @@ qla2x00_check_reg_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) return false; } +bool +qla2x00_check_reg16_for_disconnect(scsi_qla_host_t *vha, uint16_t reg) +{ + return qla2x00_check_reg32_for_disconnect(vha, 0x | reg); +} + /** * qla2300_intr_handler() - Process interrupts for the ISP23xx and ISP63xx. * @irq: @@ -174,7 +172,7 @@ qla2300_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { stat = RD_REG_DWORD(®->u.isp2300.host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) break; if (stat & HSR_RISC_PAUSED) { if (unlikely(pci_channel_offline(ha->pdev))) @@ -2631,7 +2629,7 @@ qla24xx_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { stat = RD_REG_DWORD(®->host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) break; if (stat & HSRX_RISC_PAUSED) { if (unlikely(pci_channel_offline(ha->pdev))) @@ -2721,7 +2719,7 @@ qla24xx_msix_rsp_q(int irq, void *dev_id) * we process the response queue. */ stat = RD_REG_DWORD(®->host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) goto out; qla24xx_process_response_queue(vha, rsp); if (!ha->flags.disable_msix_handshake) { @@ -2761,7 +2759,7 @@ qla25xx_msix_rsp_q(int irq, void *dev_id) hccr = RD_REG_DWORD_RELAXED(®->hccr); spin_unlock_irqrestore(&ha->hardware_lock, flags); } - if (qla2x00_check_reg_for_disconnect(vha, hccr)) + if (qla2x00_check_reg32_for_disconnect(vha, hccr)) goto out; queue_work_on((int) (rsp->id - 1), ha->wq, &rsp->q_work); @@ -2796,7 +2794,7 @@ qla24xx_msix_default(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev);
[PATCH RESEND 4/6] qla2xxx: Schedule board_disable only once
There are various callers of qla2x00_check_reg{32,16}_for_disconnect that may schedule board removal on PCI-disconnect. Test-and-set a dedicated flag before scheduling board_disable so it is invoked only once. Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_def.h |3 +++ drivers/scsi/qla2xxx/qla_isr.c | 14 -- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index b643991..4267ec6 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3402,6 +3402,9 @@ typedef struct scsi_qla_host { #define FX00_CRITEMP_RECOVERY 25 #define FX00_HOST_INFO_RESEND 26 + unsigned long pci_flags; +#define PFLG_DISCONNECTED 0 /* PCI device removed */ + uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 #define DFLG_NO_CABLE BIT_1 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index 016ebed..fd75b91 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -117,12 +117,14 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { - /* -* Schedule this on the default system workqueue so that all the -* adapter workqueues and the DPC thread can be shutdown -* cleanly. -*/ - schedule_work(&vha->hw->board_disable); + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) { + /* +* Schedule this (only once) on the default system +* workqueue so that all the adapter workqueues and the +* DPC thread can be shutdown cleanly. +*/ + schedule_work(&vha->hw->board_disable); + } return true; } else return false; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 5/6] qla2xxx: Prevent removal and board_disable race
Introduce mutual exclusion between the qla2xxx_remove_one PCI driver callback and qla2x00_disable_board_on_pci_error, which is scheduled as board_disable work by qla2x00_check_reg{32,16}_for_disconnect: * Leave the driver-specific data attached to the underlying PCI device intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one has enough breadcrumbs to determine that any board_disable work has been completed. * In qla2xxx_remove_one, set a bit to prevent any subsequent board_disable work from scheduling, then cancel and wait until pending work has completed. * Reuse the PCI device enable count check in qla2x00_remove_one to determine if board_disable has occured. The original purpose of this check was unnecessary since the driver remove function wasn't called when the probe fails. Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_def.h |1 + drivers/scsi/qla2xxx/qla_isr.c |3 ++- drivers/scsi/qla2xxx/qla_os.c | 31 +++ 3 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 4267ec6..1398818 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host { unsigned long pci_flags; #define PFLG_DISCONNECTED 0 /* PCI device removed */ +#define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */ uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index fd75b91..341c64d 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { - if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) { + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) { /* * Schedule this (only once) on the default system * workqueue so that all the adapter workqueues and the diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 3bfa89d..84d4df6 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3131,15 +3131,25 @@ qla2x00_remove_one(struct pci_dev *pdev) scsi_qla_host_t *base_vha; struct qla_hw_data *ha; + base_vha = pci_get_drvdata(pdev); + ha = base_vha->hw; + + /* Indicate device removal to prevent future board_disable and wait +* until any pending board_disable has completed. */ + set_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags); + cancel_work_sync(&ha->board_disable); + /* -* If the PCI device is disabled that means that probe failed and any -* resources should be have cleaned up on probe exit. +* If the PCI device is disabled then there was a PCI-disconnect and +* qla2x00_disable_board_on_pci_error has taken care of most of the +* resources. */ - if (!atomic_read(&pdev->enable_cnt)) + if (!atomic_read(&pdev->enable_cnt)) { + scsi_host_put(base_vha->host); + kfree(ha); + pci_set_drvdata(pdev, NULL); return; - - base_vha = pci_get_drvdata(pdev); - ha = base_vha->hw; + } qla2x00_wait_for_hba_ready(base_vha); @@ -4799,18 +4809,15 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work) qla82xx_md_free(base_vha); qla2x00_free_queues(ha); - scsi_host_put(base_vha->host); - qla2x00_unmap_iobases(ha); pci_release_selected_regions(ha->pdev, ha->bars); - kfree(ha); - ha = NULL; - pci_disable_pcie_error_reporting(pdev); pci_disable_device(pdev); - pci_set_drvdata(pdev, NULL); + /* +* Let qla2x00_remove_one cleanup qla_hw_data on device removal. +*/ } /** -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 2/6] qla2xxx: Use qla2x00_clear_drv_active on probe failure
Take advantage of commit fe1b806f4f71 ("qla2xxx: Refactor shutdown code so some functionality can be reused") to remove an inlined copy of qla2x00_clear_drv_active in the driver's probe hardware error path. Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_os.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 8252c0e..53449d7 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -238,6 +238,7 @@ static int qla2xxx_eh_host_reset(struct scsi_cmnd *); static int qla2x00_change_queue_depth(struct scsi_device *, int, int); static int qla2x00_change_queue_type(struct scsi_device *, int); +static void qla2x00_clear_drv_active(struct qla_hw_data *); static void qla2x00_free_device(scsi_qla_host_t *); struct scsi_host_template qla2xxx_driver_template = { @@ -2954,16 +2955,8 @@ probe_failed: scsi_host_put(base_vha->host); probe_hw_failed: - if (IS_QLA82XX(ha)) { - qla82xx_idc_lock(ha); - qla82xx_clear_drv_active(ha); - qla82xx_idc_unlock(ha); - } - if (IS_QLA8044(ha)) { - qla8044_idc_lock(ha); - qla8044_clear_drv_active(ha); - qla8044_idc_unlock(ha); - } + qla2x00_clear_drv_active(ha); + iospace_config_failed: if (IS_P3P_TYPE(ha)) { if (!ha->nx_pcibase) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 1/6] qla2xxx: Fix shost use-after-free on device removal
Once calling scsi_host_put, be careful to not access qla_hw_data through the Scsi_Host private data (ie, scsi_qla_host base_vha). Fixes: fe1b806f4f71 ("qla2xxx: Refactor shutdown code so some functionality can be reused") Cc: sta...@vger.kernel.org # 3.14, 3.15, 3.16 Signed-off-by: Joe Lawrence Acked-by: Chad Dupuis --- drivers/scsi/qla2xxx/qla_os.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index be9698d..8252c0e 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3119,10 +3119,8 @@ qla2x00_unmap_iobases(struct qla_hw_data *ha) } static void -qla2x00_clear_drv_active(scsi_qla_host_t *vha) +qla2x00_clear_drv_active(struct qla_hw_data *ha) { - struct qla_hw_data *ha = vha->hw; - if (IS_QLA8044(ha)) { qla8044_idc_lock(ha); qla8044_clear_drv_active(ha); @@ -3193,7 +3191,7 @@ qla2x00_remove_one(struct pci_dev *pdev) scsi_host_put(base_vha->host); - qla2x00_clear_drv_active(base_vha); + qla2x00_clear_drv_active(ha); qla2x00_unmap_iobases(ha); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND 0/6] qla2xxx device removal fixups
Resending patchset against scsi-queue/drivers-for-3.18 with Acked-by and Cc stable annotations. Joe Lawrence (6): qla2xxx: Fix shost use-after-free on device removal qla2xxx: Use qla2x00_clear_drv_active on probe failure qla2xxx: Collect PCI register checks and board_disable scheduling qla2xxx: Schedule board_disable only once qla2xxx: Prevent removal and board_disable race qla2xxx: Prevent probe and board_disable race drivers/scsi/qla2xxx/qla_def.h |5 drivers/scsi/qla2xxx/qla_gbl.h |3 +- drivers/scsi/qla2xxx/qla_isr.c | 44 +++-- drivers/scsi/qla2xxx/qla_mr.c |2 +- drivers/scsi/qla2xxx/qla_nx.c |6 ++-- drivers/scsi/qla2xxx/qla_os.c | 60 ++-- 6 files changed, 61 insertions(+), 59 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] qla2xxx device removal fixups
On Wed, 18 Jun 2014 10:02:13 -0400 Joe Lawrence wrote: > Hi Chad, Giri, et al. > > Stratus has been testing the upstream qla2xxx driver against surprise > device removal and has found a few minor issues along the way. With > this patchset, results have been good. Although the following changes > may be most interesting on a Stratus platform, they should be applicable > to general hotplug scenarios on other hardware. > > The first two patches are independent from the others and can be > reviewed as such. One fixes up a use-after-free and the other > consolidates some code. Patch 1 corrects a bug introduced by commit fe1b806, which has made its way into three linux-stable releases: % git tag --contains fe1b806 | grep '^v3.[0-9]*$' v3.14 v3.15 v3.16 > The final four patches are more invasive and modify the scsi_qla_host > structure to avoid device removal race conditions. These changes were > written to demonstrate the problem at hand and minimally fix them in > order to continue testing. (For example, adding completely a new > pci_flags member to scsi_qla_host is probably overkill.) > > We would welcome any feedback or questions about our testing. Chad ack'd the series on 28 Jul 2014. Any chance this can make into hch's queue for 3.18? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] qla2xxx: Prevent removal and board_disable race
On Fri, 25 Jul 2014 11:23:02 -0400 Chad Dupuis wrote: > > > On Wed, 18 Jun 2014, Joe Lawrence wrote: > > > Introduce mutual exclusion between the qla2xxx_remove_one PCI driver > > callback and qla2x00_disable_board_on_pci_error, which is scheduled as > > board_disable work by qla2x00_check_reg{32,16}_for_disconnect: > > > > * Leave the driver-specific data attached to the underlying PCI device > > intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one > > has enough breadcrumbs to determine that any board_disable work has been > > completed. > > > > * In qla2xxx_remove_one, set a bit to prevent any subsequent > > board_disable work from scheduling, then cancel and wait until pending > > work has completed. > > > > * Reuse the PCI device enable count check in qla2x00_remove_one to > > determine if board_disable has occured. The original purpose of this > > check was unnecessary since the driver remove function wasn't called > > when the probe fails. > > > > Signed-off-by: Joe Lawrence > > --- > > drivers/scsi/qla2xxx/qla_def.h |1 + > > drivers/scsi/qla2xxx/qla_isr.c |3 ++- > > drivers/scsi/qla2xxx/qla_os.c | 31 +++ > > 3 files changed, 22 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h > > index 1267b11..7c441c9 100644 > > --- a/drivers/scsi/qla2xxx/qla_def.h > > +++ b/drivers/scsi/qla2xxx/qla_def.h > > @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host { > > > > unsigned long pci_flags; > > #define PFLG_DISCONNECTED 0 /* PCI device removed */ > > +#define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */ > > > > uint32_tdevice_flags; > > #define SWITCH_FOUNDBIT_0 > > diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c > > index 2741ad8..ee5eef4 100644 > > --- a/drivers/scsi/qla2xxx/qla_isr.c > > +++ b/drivers/scsi/qla2xxx/qla_isr.c > > @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t > > *vha, uint32_t reg) > > { > > /* Check for PCI disconnection */ > > if (reg == 0x) { > > - if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) { > > + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && > > + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) { > > In our remove function we set a bit that we are unloading: > > set_bit (UNLOADING, &base_vha->dpc_flags); > > could this be used instead? Hi Chad, Thanks for the comments. I think with a little bit of code shuffling this could be accomplished. My goal with the patchset was to try and present the problem/fix as plain as possible. It was easiest to collect all the atomic bits I needed inside a single variable. Should I be tacking on such flags to 'dpc_flags' ? > > > /* > > * Schedule this (only once) on the default system > > * workqueue so that all the adapter workqueues and the > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > > index 39c9953..51cba37 100644 > > --- a/drivers/scsi/qla2xxx/qla_os.c > > +++ b/drivers/scsi/qla2xxx/qla_os.c > > @@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev) > > scsi_qla_host_t *base_vha; > > struct qla_hw_data *ha; > > > > + base_vha = pci_get_drvdata(pdev); > > + ha = base_vha->hw; > > + > > + /* Indicate device removal to prevent future board_disable and wait > > +* until any pending board_disable has completed. */ > > + set_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags); > > + cancel_work_sync(&ha->board_disable); > > + > > /* > > -* If the PCI device is disabled that means that probe failed and any > > -* resources should be have cleaned up on probe exit. > > +* If the PCI device is disabled then there was a PCI-disconnect and > > +* qla2x00_disable_board_on_pci_error has taken care of most of the > > +* resources. > > */ > > - if (!atomic_read(&pdev->enable_cnt)) > > + if (!atomic_read(&pdev->enable_cnt)) { > > Should we also add a check here that this is a disconnection before > freeing these structs? The original intent of the check for > pdev->enable_cnt is to make sure we don't try to dereference an already > freed struct if probe failed. I'm not exactly
Re: [PATCH] lpfc: Avoid to disable pci_dev twice
[ +cc linux-pci and Bjorn, comments inline/below ... ] On Thu, 17 Jul 2014 02:32:31 -0400 Mike Qiu wrote: > In IBM Power servers, when hardware error occurs during probe > state, EEH subsystem will call driver's error_detected interface, > which will call pci_disable_device(). But driver's probe function also > call pci_disable_device() in this situation. > > So pci_dev will be disabled twice: > > Device lpfc disabling already-disabled device > [ cut here ] > WARNING: at drivers/pci/pci.c:1407 > CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW > 3.10.42-2002.pkvm2_1_1.6.ppc64 #1 > Workqueue: events .work_for_cpu_fn > task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000 > NIP: c0471b8c LR: c0471b88 CTR: c043ebe0 > REGS: c027d395b650 TRAP: 0700 Tainted: GW > (3.10.42-2002.pkvm2_1_1.6.ppc64) > MSR: 900100029032 CR: 28b52b44 XER: 2000 > CFAR: c0879ab8 SOFTE: 1 > ... > NIP .pci_disable_device+0xcc/0xe0 > LR .pci_disable_device+0xc8/0xe0 > Call Trace: > .pci_disable_device+0xc8/0xe0 (unreliable) > .lpfc_disable_pci_dev+0x50/0x80 [lpfc] > .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] > .local_pci_probe+0x68/0xb0 > .work_for_cpu_fn+0x38/0x60 > .process_one_work+0x1a4/0x4d0 > .worker_thread+0x37c/0x490 > .kthread+0xf0/0x100 > .ret_from_kernel_thread+0x5c/0x80 > > Signed-off-by: Mike Qiu > --- > drivers/scsi/lpfc/lpfc.h | 1 + > drivers/scsi/lpfc/lpfc_init.c | 59 > +++ > 2 files changed, 55 insertions(+), 5 deletions(-) > > diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h > index 434e903..0c7bad9 100644 > --- a/drivers/scsi/lpfc/lpfc.h > +++ b/drivers/scsi/lpfc/lpfc.h > @@ -813,6 +813,7 @@ struct lpfc_hba { > #define VPD_MASK0xf /* mask for any vpd data */ > > uint8_t soft_wwn_enable; > + uint8_t probe_done; > > struct timer_list fcp_poll_timer; > struct timer_list eratt_poll; > diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c > index 06f9a5b..c2e67ae 100644 > --- a/drivers/scsi/lpfc/lpfc_init.c > +++ b/drivers/scsi/lpfc/lpfc_init.c > @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const > struct pci_device_id *pid) > } > } > > + /* Set the probe flag */ > + phba->probe_done = 1; > + > /* Perform post initialization setup */ > lpfc_post_init_setup(phba); > > @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) > static void > lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) > { > + if (phba) > + return; > + Should that be "if *not* phba" like the others below? > lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > "2710 PCI channel disable preparing for reset\n"); > > @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) > > /* Disable interrupt and pci device */ > lpfc_sli_disable_intr(phba); > - pci_disable_device(phba->pcidev); > + if (phba->probe_done && phba->pcidev) > + pci_disable_device(phba->pcidev); > } > > /** > @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const > struct pci_device_id *pid) > goto out_disable_intr; > } > > + /* Set probe_done flag */ > + phba->probe_done = 1; > + > /* Log the current active interrupt mode */ > phba->intr_mode = intr_mode; > lpfc_log_intr_mode(phba, intr_mode); > @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) > static void > lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) > { > + if (!phba) > + return; > + > lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > "2826 PCI channel disable preparing for reset\n"); > > @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) > /* Disable interrupt and pci device */ > lpfc_sli4_disable_intr(phba); > lpfc_sli4_queue_destroy(phba); > - pci_disable_device(phba->pcidev); > + > + if (phba->probe_done && phba->pcidev) > + pci_disable_device(phba->pcidev); > } > > /** > @@ -10893,9 +10908,21 @@ static pci_ers_result_t > lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) > { > struct Scsi_Host *shost = pci_get_drvdata(pdev); > - struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; > + struct lpfc_hba *phba; > pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; > > + if (!shost) > + /* Run here means it may during probe state and > + * Scsi_Host has not been created and We can do nothing > + * in this state so call for hotplug*/ > + return PCI_ERS_RESULT_NONE; Is it possible to get here during device removal, ie lpfc_pci_remove_one? If so, we may have shost in hand now, but can these routi
Re: mpt2sas stuck installing
On Sat, 12 Jul 2014 03:13:07 + "Elliott, Robert (Server Storage)" wrote: > > > > -Original Message- > > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- > > ow...@vger.kernel.org] On Behalf Of Joe Lawrence > ... > > In your crash stack trace, the scsi error handler has issued a host > > reset, but then crashed in mpt2sas_base_get_iocstate. Reading through > > _scsih_shutdown, I don't believe that the mpt2sas .shutdown path takes > > any precaution before heading down mpt2sas_base_detach to free adapter > > resources. The ordinary .remove path looks similar, though it does > > call sas_remove_host before freeing resources, *then* scsi_remove_host > > and scsi_host_put. > > > > I wonder if this ordering needs to be reversed (and added to > > _scsih_shutdown) to properly de-register from the SCSI midlayer prior > > to removing the controller instance. > > > > Regards, > > > > -- Joe > > Nagalakshmi was working on an mpt3sas patch for the scsi-mq tree > to do just that. I don't recall if the patch has made it into the > scsi-mq tree yet. Apparently it's also needed for non-mq and mpt2sas. > > It is making this change: > > sas_remove_host(shost); > > + scsi_remove_host(shost); > > mpt3sas_base_detach(ioc); > > list_del(&ioc->list); > > - scsi_remove_host(shost); > > scsi_host_put(shost); > > We are making a similar change in hpsa. Doing so led to a general > protection fault, which unveiled that we also needed to change > cancel_delayed_work() calls to cancel_delayed_work_sync() to > ensure there are no worker functions still active after the > scsi_host structure is unregistered. It looks like Sreekanth posted those patches today -- the new ordering seems correct, but is there any reason why _scsih_shutdown skips {sas,scsi}_remove_host calls? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: mpt2sas: mpt2sas_base.c: Fix for possible null pointer dereference
On Wed, 28 May 2014, Christoph Hellwig wrote: > > - ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & MPI2_IOCSTATUS_MASK; > > + if (mpi_reply) { > > + ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & > > MPI2_IOCSTATUS_MASK; > > + } > > > > if (ioc_status != MPI2_IOCSTATUS_SUCCESS) > > ioc->port_enable_failed = 1; > > ioc_status isn't initialized without the reply and used here as well > as later in the function. I think we'll need input from LSI or others > with the spec on what to do when we didn't get a reply. Any update on this? The mpt3 version checks for !mpi_reply and returns 1. Which leads to another question -- should mpt{2,3}sas_port_enable_done ever return 0 (as their respective comments describe)? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drivers: message: fusion: Simplify rounding
ALIGN is certainly more readable to me. Reviewed-by: Joe Lawrence -- Joe On Tue, 1 Jul 2014, Rasmus Villemoes wrote: > Rounding up to a multiple of 4 should be done using the ALIGN > macro. As a bonus, this also makes the generated code smaller. > > In GetIocFacts(), sz is assigned to a few lines below without being > read in the meantime, so it is ok that it doesn't end up with the same > value as facts->FWImageSize. > > Signed-off-by: Rasmus Villemoes > --- > drivers/message/fusion/mptbase.c | 7 +-- > drivers/message/fusion/mptctl.c | 7 +-- > 2 files changed, 2 insertions(+), 12 deletions(-) > > diff --git a/drivers/message/fusion/mptbase.c > b/drivers/message/fusion/mptbase.c > index ebc0af7..10b16a1 100644 > --- a/drivers/message/fusion/mptbase.c > +++ b/drivers/message/fusion/mptbase.c > @@ -3175,12 +3175,7 @@ GetIocFacts(MPT_ADAPTER *ioc, int sleepFlag, int > reason) > facts->FWImageSize = le32_to_cpu(facts->FWImageSize); > } > > - sz = facts->FWImageSize; > - if ( sz & 0x01 ) > - sz += 1; > - if ( sz & 0x02 ) > - sz += 2; > - facts->FWImageSize = sz; > + facts->FWImageSize = ALIGN(facts->FWImageSize, 4); > > if (!facts->RequestFrameSize) { > /* Something is wrong! */ > diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c > index 8a050e8..1004392 100644 > --- a/drivers/message/fusion/mptctl.c > +++ b/drivers/message/fusion/mptctl.c > @@ -1749,12 +1749,7 @@ mptctl_replace_fw (unsigned long arg) > > /* Allocate memory for the new FW image >*/ > - newFwSize = karg.newImageSize; > - > - if (newFwSize & 0x01) > - newFwSize += 1; > - if (newFwSize & 0x02) > - newFwSize += 2; > + newFwSize = ALIGN(karg.newImageSize, 4); > > mpt_alloc_fw_memory(ioc, newFwSize); > if (ioc->cached_fw == NULL) > -- > 1.9.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 10/11] mptfusion: combine fw_event_work and its event_data
Tack the firmware reply event_data payload to the end of its corresponding struct fw_event_work allocation. Rework fw_event_work allocation calculations to include the event_data size where appropriate. This clarifies the code a bit and avoids the following smatch warnings: drivers/message/fusion/mptsas.c:1003 mptsas_queue_device_delete() error: memcpy() 'fw_event->event_data' too small (29 vs 36) drivers/message/fusion/mptsas.c:1017 mptsas_queue_rescan() error: not allocating enough data 168 vs 160 Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy Reviewed-by: Christoph Hellwig --- drivers/message/fusion/mptsas.c | 16 ++-- drivers/message/fusion/mptsas.h |2 +- 2 files changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 54b51a9..5454d838 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -990,11 +990,10 @@ mptsas_queue_device_delete(MPT_ADAPTER *ioc, MpiEventDataSasDeviceStatusChange_t *sas_event_data) { struct fw_event_work *fw_event; - int sz; - sz = offsetof(struct fw_event_work, event_data) + - sizeof(MpiEventDataSasDeviceStatusChange_t); - fw_event = kzalloc(sz, GFP_ATOMIC); + fw_event = kzalloc(sizeof(*fw_event) + + sizeof(MpiEventDataSasDeviceStatusChange_t), + GFP_ATOMIC); if (!fw_event) { printk(MYIOC_s_WARN_FMT "%s: failed at (line=%d)\n", ioc->name, __func__, __LINE__); @@ -1011,10 +1010,8 @@ static void mptsas_queue_rescan(MPT_ADAPTER *ioc) { struct fw_event_work *fw_event; - int sz; - sz = offsetof(struct fw_event_work, event_data); - fw_event = kzalloc(sz, GFP_ATOMIC); + fw_event = kzalloc(sizeof(*fw_event), GFP_ATOMIC); if (!fw_event) { printk(MYIOC_s_WARN_FMT "%s: failed at (line=%d)\n", ioc->name, __func__, __LINE__); @@ -4983,7 +4980,7 @@ static int mptsas_event_process(MPT_ADAPTER *ioc, EventNotificationReply_t *reply) { u32 event = le32_to_cpu(reply->Event); - int sz, event_data_sz; + int event_data_sz; struct fw_event_work *fw_event; unsigned long delay; @@ -5093,8 +5090,7 @@ mptsas_event_process(MPT_ADAPTER *ioc, EventNotificationReply_t *reply) event_data_sz = ((reply->MsgLength * 4) - offsetof(EventNotificationReply_t, Data)); - sz = offsetof(struct fw_event_work, event_data) + event_data_sz; - fw_event = kzalloc(sz, GFP_ATOMIC); + fw_event = kzalloc(sizeof(*fw_event) + event_data_sz, GFP_ATOMIC); if (!fw_event) { printk(MYIOC_s_WARN_FMT "%s: failed at (line=%d)\n", ioc->name, __func__, __LINE__); diff --git a/drivers/message/fusion/mptsas.h b/drivers/message/fusion/mptsas.h index 57e86ab..c396483 100644 --- a/drivers/message/fusion/mptsas.h +++ b/drivers/message/fusion/mptsas.h @@ -110,7 +110,7 @@ struct fw_event_work { MPT_ADAPTER *ioc; u32 event; u8 retries; - u8 __attribute__((aligned(4))) event_data[1]; + charevent_data[0] __aligned(4); }; struct mptsas_discovery_event { -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 02/11] mpt2sas: combine fw_event_work and its event_data
Tack the firmware reply event_data payload to the end of its corresponding struct fw_event_work allocation. This matches the convention in the mptfusion driver and simplifies the code. Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Christoph Hellwig Cc: Sreekanth Reddy --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 52 -- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 13e49c3..e7801ff 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -173,7 +173,7 @@ struct fw_event_work { u8 VP_ID; u8 ignore; u16 event; - void*event_data; + charevent_data[0] __aligned(4); }; /* raid transport support */ @@ -2834,7 +2834,6 @@ _scsih_fw_event_free(struct MPT2SAS_ADAPTER *ioc, struct fw_event_work spin_lock_irqsave(&ioc->fw_event_lock, flags); list_del(&fw_event->list); - kfree(fw_event->event_data); kfree(fw_event); spin_unlock_irqrestore(&ioc->fw_event_lock, flags); } @@ -3520,7 +3519,8 @@ _scsih_check_topo_delete_events(struct MPT2SAS_ADAPTER *ioc, if (fw_event->event != MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST || fw_event->ignore) continue; - local_event_data = fw_event->event_data; + local_event_data = (Mpi2EventDataSasTopologyChangeList_t *) + fw_event->event_data; if (local_event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_ADDED || local_event_data->ExpStatus == @@ -5504,7 +5504,9 @@ _scsih_sas_topology_change_event(struct MPT2SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; u8 link_rate, prev_link_rate; - Mpi2EventDataSasTopologyChangeList_t *event_data = fw_event->event_data; + Mpi2EventDataSasTopologyChangeList_t *event_data = + (Mpi2EventDataSasTopologyChangeList_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT2SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5699,7 +5701,8 @@ _scsih_sas_device_status_change_event(struct MPT2SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; Mpi2EventDataSasDeviceStatusChange_t *event_data = - fw_event->event_data; + (Mpi2EventDataSasDeviceStatusChange_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT2SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5794,6 +5797,7 @@ _scsih_sas_enclosure_dev_status_change_event(struct MPT2SAS_ADAPTER *ioc, #ifdef CONFIG_SCSI_MPT2SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) _scsih_sas_enclosure_dev_status_change_event_debug(ioc, +(Mpi2EventDataSasEnclDevStatusChange_t *) fw_event->event_data); #endif } @@ -5818,7 +5822,9 @@ _scsih_sas_broadcast_primitive_event(struct MPT2SAS_ADAPTER *ioc, u32 termination_count; u32 query_count; Mpi2SCSITaskManagementReply_t *mpi_reply; - Mpi2EventDataSasBroadcastPrimitive_t *event_data = fw_event->event_data; + Mpi2EventDataSasBroadcastPrimitive_t *event_data = + (Mpi2EventDataSasBroadcastPrimitive_t *) + fw_event->event_data; u16 ioc_status; unsigned long flags; int r; @@ -5969,7 +5975,9 @@ static void _scsih_sas_discovery_event(struct MPT2SAS_ADAPTER *ioc, struct fw_event_work *fw_event) { - Mpi2EventDataSasDiscovery_t *event_data = fw_event->event_data; + Mpi2EventDataSasDiscovery_t *event_data = + (Mpi2EventDataSasDiscovery_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT2SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) { @@ -6357,7 +6365,9 @@ _scsih_sas_ir_config_change_event(struct MPT2SAS_ADAPTER *ioc, Mpi2EventIrConfigElement_t *element; int i; u8 foreign_config; - Mpi2EventDataIrConfigChangeList_t *event_data = fw_event->event_data; + Mpi2EventDataIrConfigChangeList_t *event_data = + (Mpi2EventDataIrConfigChangeList_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT2SAS_LOGGING if ((ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -6425,7 +6435,9 @@ _scsih_sas_ir_volume_event(struct MPT2SAS_ADAPTER *ioc, u16 handle; u32 state; int rc; - Mpi2EventDataIrVolume_t *event_data = fw_event->event_data; + Mpi2EventDataIrVolume_t *event_data = + (Mpi2EventDataIrVolume_t *) +
[PATCH v2 11/11] mptfusion: tweak null pointer checks
Fixes the following smatch warnings: drivers/message/fusion/mptbase.c:652 mptbase_reply() warn: variable dereferenced before check 'reply' (see line 639) [JL: No-brainer, the enclosing switch statement dereferences reply, so we can't get here unless reply is valid.] drivers/message/fusion/mptsas.c:1255 mptsas_taskmgmt_complete() error: we previously assumed 'pScsiTmReply' could be null (see line 1227) [HCH: Reading the code in mptsas_taskmgmt_complete it's pretty obvious that it can't do anything useful if mr/pScsiTmReply are NULL, so I suspect it would be best to just return at the beginning of the function. I'd love to understand if it actually could ever be zero, which I doubt. Maybe the LSI people can shed some light on that?] drivers/message/fusion/mptsas.c:3888 mptsas_not_responding_devices() error: we previously assumed 'port_info->phy_info' could be null (see line 3875) [HCH: It's pretty obvious from reading mptsas_sas_io_unit_pg0 that we never register a port_info with a NULL phy_info in the lists, so all NULL checks on it could be deleted.] drivers/message/fusion/mptscsih.c:1284 mptscsih_info() error: we previously assumed 'h' could be null (see line 1274) [HCH: shost_priv can't return NULL, so the if (h) should be removed.] drivers/message/fusion/mptscsih.c:1388 mptscsih_qcmd() error: we previously assumed 'vdevice' could be null (see line 1373) [HCH: vdevice can't ever be NULL here, it's allocated in ->slave_alloc and thus guaranteed to be around when ->queuecommand is called.] Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptbase.c | 10 +++ drivers/message/fusion/mptsas.c | 52 ++--- drivers/message/fusion/mptscsih.c | 19 ++ 3 files changed, 37 insertions(+), 44 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 9d4c782..a896d94 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -649,12 +649,10 @@ mptbase_reply(MPT_ADAPTER *ioc, MPT_FRAME_HDR *req, MPT_FRAME_HDR *reply) case MPI_FUNCTION_CONFIG: case MPI_FUNCTION_SAS_IO_UNIT_CONTROL: ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_COMMAND_GOOD; - if (reply) { - ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_RF_VALID; - memcpy(ioc->mptbase_cmds.reply, reply, - min(MPT_DEFAULT_FRAME_SIZE, - 4 * reply->u.reply.MsgLength)); - } + ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_RF_VALID; + memcpy(ioc->mptbase_cmds.reply, reply, + min(MPT_DEFAULT_FRAME_SIZE, + 4 * reply->u.reply.MsgLength)); if (ioc->mptbase_cmds.status & MPT_MGMT_STATUS_PENDING) { ioc->mptbase_cmds.status &= ~MPT_MGMT_STATUS_PENDING; complete(&ioc->mptbase_cmds.done); diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 5454d838..e3494a6 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1203,27 +1203,28 @@ mptsas_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *mr) "(mf = %p, mr = %p)\n", ioc->name, mf, mr)); pScsiTmReply = (SCSITaskMgmtReply_t *)mr; - if (pScsiTmReply) { - dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT - "\tTaskMgmt completed: fw_channel = %d, fw_id = %d,\n" - "\ttask_type = 0x%02X, iocstatus = 0x%04X " - "loginfo = 0x%08X,\n\tresponse_code = 0x%02X, " - "term_cmnds = %d\n", ioc->name, - pScsiTmReply->Bus, pScsiTmReply->TargetID, - pScsiTmReply->TaskType, - le16_to_cpu(pScsiTmReply->IOCStatus), - le32_to_cpu(pScsiTmReply->IOCLogInfo), - pScsiTmReply->ResponseCode, - le32_to_cpu(pScsiTmReply->TerminationCount))); - - if (pScsiTmReply->ResponseCode) - mptscsih_taskmgmt_response_code(ioc, - pScsiTmReply->ResponseCode); - } - - if (pScsiTmReply && (pScsiTmReply->TaskType == + if (!pScsiTmReply) + return 0; + + dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT + "\tTaskMgmt completed: fw_channel = %d, fw_id = %d,\n" + "\ttask_type = 0x%02X, iocstatus = 0
[PATCH v2 09/11] mptfusion: make adapter prod_name[] a pointer
The struct _MPT_ADAPTER doesn't need a full copy of the product string, so prod_name can point to the string literal storage that the driver already provides. Avoids the following smatch warning: drivers/message/fusion/mptbase.c:2858 MptDisplayIocCapabilities() warn: this array is probably non-NULL. 'ioc->prod_name' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy Reviewed-by: Christoph Hellwig --- drivers/message/fusion/mptbase.c | 11 +-- drivers/message/fusion/mptbase.h |2 +- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index ea30033..9d4c782 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -1408,8 +1408,8 @@ mpt_verify_adapter(int iocid, MPT_ADAPTER **iocpp) * in /proc/mpt/summary and /sysfs/class/scsi_host/host/version_product * **/ -static void -mpt_get_product_name(u16 vendor, u16 device, u8 revision, char *prod_name) +static const char* +mpt_get_product_name(u16 vendor, u16 device, u8 revision) { char *product_str = NULL; @@ -1635,8 +1635,7 @@ mpt_get_product_name(u16 vendor, u16 device, u8 revision, char *prod_name) } out: - if (product_str) - sprintf(prod_name, "%s", product_str); + return product_str; } /** @@ -1887,8 +1886,8 @@ mpt_attach(struct pci_dev *pdev, const struct pci_device_id *id) dinitprintk(ioc, printk(MYIOC_s_INFO_FMT "facts @ %p, pfacts[0] @ %p\n", ioc->name, &ioc->facts, &ioc->pfacts[0])); - mpt_get_product_name(pdev->vendor, pdev->device, pdev->revision, -ioc->prod_name); + ioc->prod_name = mpt_get_product_name(pdev->vendor, pdev->device, + pdev->revision); switch (pdev->device) { diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h index 76c05bc..78c8104 100644 --- a/drivers/message/fusion/mptbase.h +++ b/drivers/message/fusion/mptbase.h @@ -605,7 +605,7 @@ typedef struct _MPT_ADAPTER int id;/* Unique adapter id N {0,1,2,...} */ int pci_irq; /* This irq */ char name[MPT_NAME_LENGTH]; /* "iocN" */ - char prod_name[MPT_NAME_LENGTH];/* "LSIFC9x9" */ + const char *prod_name;/* "LSIFC9x9" */ #ifdef CONFIG_FUSION_LOGGING /* used in mpt_display_event_info */ char evStr[EVENT_DESCR_STR_SZ]; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 08/11] mptfusion: use memdup_user
Let memdup_user handle the kmalloc, copy_from_user and error checking kfree code. Spotted by the following smatch (false positive) warning: drivers/message/fusion/mptctl.c:1369 mptctl_getiocinfo() warn: possible info leak 'karg' Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptctl.c | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c index 8a050e8..b0a892a 100644 --- a/drivers/message/fusion/mptctl.c +++ b/drivers/message/fusion/mptctl.c @@ -1261,19 +1261,11 @@ mptctl_getiocinfo (unsigned long arg, unsigned int data_size) else return -EFAULT; - karg = kmalloc(data_size, GFP_KERNEL); - if (karg == NULL) { - printk(KERN_ERR MYNAM "%s::mpt_ioctl_iocinfo() @%d - no memory available!\n", - __FILE__, __LINE__); - return -ENOMEM; - } - - if (copy_from_user(karg, uarg, data_size)) { - printk(KERN_ERR MYNAM "%s@%d::mptctl_getiocinfo - " - "Unable to read in mpt_ioctl_iocinfo struct @ %p\n", - __FILE__, __LINE__, uarg); - kfree(karg); - return -EFAULT; + karg = memdup_user(uarg, data_size); + if (IS_ERR(karg)) { + printk(KERN_ERR MYNAM "%s@%d::mpt_ioctl_iocinfo() - memdup_user returned error [%ld]\n", + __FILE__, __LINE__, PTR_ERR(karg)); + return PTR_ERR(karg); } if (((iocnum = mpt_verify_adapter(karg->hdr.iocnum, &ioc)) < 0) || -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 07/11] mptfusion: remove redundant kfree checks
Fixes the following smatch warnings: drivers/message/fusion/mptfc.c:529 mptfc_target_destroy() info: redundant null check on starget->hostdata calling kfree() drivers/message/fusion/mptspi.c:465 mptspi_target_destroy() info: redundant null check on starget->hostdata calling kfree() Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy Reviewed-by: Christoph Hellwig --- drivers/message/fusion/mptfc.c |3 +-- drivers/message/fusion/mptspi.c |3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/message/fusion/mptfc.c b/drivers/message/fusion/mptfc.c index 02a3eef..21ac845 100644 --- a/drivers/message/fusion/mptfc.c +++ b/drivers/message/fusion/mptfc.c @@ -525,8 +525,7 @@ mptfc_target_destroy(struct scsi_target *starget) if (ri) /* better be! */ ri->starget = NULL; } - if (starget->hostdata) - kfree(starget->hostdata); + kfree(starget->hostdata); starget->hostdata = NULL; } diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c index 7b4db9a..787933d 100644 --- a/drivers/message/fusion/mptspi.c +++ b/drivers/message/fusion/mptspi.c @@ -461,8 +461,7 @@ static int mptspi_target_alloc(struct scsi_target *starget) static void mptspi_target_destroy(struct scsi_target *starget) { - if (starget->hostdata) - kfree(starget->hostdata); + kfree(starget->hostdata); starget->hostdata = NULL; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 06/11] mptfusion: mark file-private functions as static
Fixes the following sparse warnings: drivers/message/fusion/mptbase.c:7011:1: warning: symbol 'mpt_SoftResetHandler' was not declared. Should it be static? drivers/message/fusion/mptsas.c:1578:23: warning: symbol 'mptsas_refreshing_device_handles' was not declared. Should it be static? drivers/message/fusion/mptsas.c:3653:24: warning: symbol 'mptsas_expander_add' was not declared. Should it be static? drivers/message/fusion/mptsas.c:5327:1: warning: symbol 'mptsas_shutdown' was not declared. Should it be static? drivers/message/fusion/mptspi.c:624:1: warning: symbol 'mptscsih_quiesce_raid' was not declared. Should it be static? Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy Reviewed-by: Christoph Hellwig --- drivers/message/fusion/mptbase.c |2 +- drivers/message/fusion/mptsas.c |6 +++--- drivers/message/fusion/mptspi.c |2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index ebc0af7..ea30033 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -7007,7 +7007,7 @@ EXPORT_SYMBOL(mpt_halt_firmware); * IOC doesn't reply to any outstanding request. This will transfer IOC * to READY state. **/ -int +static int mpt_SoftResetHandler(MPT_ADAPTER *ioc, int sleepFlag) { int rc; diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 711fcb5..54b51a9 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1575,7 +1575,7 @@ mptsas_del_end_device(MPT_ADAPTER *ioc, struct mptsas_phyinfo *phy_info) mptsas_port_delete(ioc, phy_info->port_details); } -struct mptsas_phyinfo * +static struct mptsas_phyinfo * mptsas_refreshing_device_handles(MPT_ADAPTER *ioc, struct mptsas_devinfo *sas_device) { @@ -3648,7 +3648,7 @@ mptsas_send_expander_event(struct fw_event_work *fw_event) * @handle: * */ -struct mptsas_portinfo * +static struct mptsas_portinfo * mptsas_expander_add(MPT_ADAPTER *ioc, u16 handle) { struct mptsas_portinfo buffer, *port_info; @@ -5321,7 +5321,7 @@ mptsas_probe(struct pci_dev *pdev, const struct pci_device_id *id) return error; } -void +static void mptsas_shutdown(struct pci_dev *pdev) { MPT_ADAPTER *ioc = pci_get_drvdata(pdev); diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c index 49d1133..7b4db9a 100644 --- a/drivers/message/fusion/mptspi.c +++ b/drivers/message/fusion/mptspi.c @@ -620,7 +620,7 @@ static void mptspi_read_parameters(struct scsi_target *starget) spi_width(starget) = (nego & MPI_SCSIDEVPAGE0_NP_WIDE) ? 1 : 0; } -int +static int mptscsih_quiesce_raid(MPT_SCSI_HOST *hd, int quiesce, u8 channel, u8 id) { MPT_ADAPTER *ioc = hd->ioc; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 05/11] mpt3sas: combine fw_event_work and its event_data
Tack the firmware reply event_data payload to the end of its corresponding struct fw_event_work allocation. This matches the convention in the mptfusion driver and simplifies the code. This avoids the following smatch warning: drivers/scsi/mpt3sas/mpt3sas_scsih.c:2519 mpt3sas_send_trigger_data_event() warn: possible memory leak of 'fw_event' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy Cc: Christoph Hellwig --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 56 +++--- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index f3ee3b4..a14be8f 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -190,7 +190,7 @@ struct fw_event_work { u8 VP_ID; u8 ignore; u16 event; - void*event_data; + charevent_data[0] __aligned(4); }; /* raid transport support */ @@ -2492,7 +2492,6 @@ _scsih_fw_event_free(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work spin_lock_irqsave(&ioc->fw_event_lock, flags); list_del(&fw_event->list); - kfree(fw_event->event_data); kfree(fw_event); spin_unlock_irqrestore(&ioc->fw_event_lock, flags); } @@ -2513,12 +2512,10 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc, if (ioc->is_driver_loading) return; - fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC); + fw_event = kzalloc(sizeof(*fw_event) + sizeof(*event_data), + GFP_ATOMIC); if (!fw_event) return; - fw_event->event_data = kzalloc(sizeof(*event_data), GFP_ATOMIC); - if (!fw_event->event_data) - return; fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG; fw_event->ioc = ioc; memcpy(fw_event->event_data, event_data, sizeof(*event_data)); @@ -3213,7 +3210,8 @@ _scsih_check_topo_delete_events(struct MPT3SAS_ADAPTER *ioc, if (fw_event->event != MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST || fw_event->ignore) continue; - local_event_data = fw_event->event_data; + local_event_data = (Mpi2EventDataSasTopologyChangeList_t *) + fw_event->event_data; if (local_event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_ADDED || local_event_data->ExpStatus == @@ -5045,7 +5043,9 @@ _scsih_sas_topology_change_event(struct MPT3SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; u8 link_rate, prev_link_rate; - Mpi2EventDataSasTopologyChangeList_t *event_data = fw_event->event_data; + Mpi2EventDataSasTopologyChangeList_t *event_data = + (Mpi2EventDataSasTopologyChangeList_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5243,7 +5243,8 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; Mpi2EventDataSasDeviceStatusChange_t *event_data = - fw_event->event_data; + (Mpi2EventDataSasDeviceStatusChange_t *) + fw_event->event_data; #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5339,6 +5340,7 @@ _scsih_sas_enclosure_dev_status_change_event(struct MPT3SAS_ADAPTER *ioc, #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) _scsih_sas_enclosure_dev_status_change_event_debug(ioc, +(Mpi2EventDataSasEnclDevStatusChange_t *) fw_event->event_data); #endif } @@ -5363,7 +5365,9 @@ _scsih_sas_broadcast_primitive_event(struct MPT3SAS_ADAPTER *ioc, u32 termination_count; u32 query_count; Mpi2SCSITaskManagementReply_t *mpi_reply; - Mpi2EventDataSasBroadcastPrimitive_t *event_data = fw_event->event_data; + Mpi2EventDataSasBroadcastPrimitive_t *event_data = + (Mpi2EventDataSasBroadcastPrimitive_t *) + fw_event->event_data; u16 ioc_status; unsigned long flags; int r; @@ -5515,7 +5519,8 @@ static void _scsih_sas_discovery_event(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event) { - Mpi2EventDataSasDiscovery_t *event_data = fw_event->event_data; + Mpi2EventDataSasDiscovery_t *event_data = + (Mpi2EventDataSasDiscovery_t *) fw_event->event_data; #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_
[PATCH v2 03/11] mpt2sas: annotate ioc->reply_post_host_index as __iomem
The MPT2SAS_ADAPTER reply_post_host_index[] holds calculated addresses in memory mapped register space. Add an "__iomem" annotation to silence the following sparse warnings: drivers/scsi/mpt2sas/mpt2sas_base.c:1006:43: warning: incorrect type in argument 2 (different address spaces) expected void volatile [noderef] *addr got unsigned long long [usertype] * drivers/scsi/mpt2sas/mpt2sas_base.c:4299:22: warning: cast removes address space of expression drivers/scsi/mpt2sas/mpt2sas_base.c:4303:27: warning: cast removes address space of expression Signed-off-by: Joe Lawrence Cc: Dan Carpenter Reviewed-by: Christoph Hellwig Acked-by: Sreekanth Reddy --- drivers/scsi/mpt2sas/mpt2sas_base.c |9 + drivers/scsi/mpt2sas/mpt2sas_base.h |2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index 8b88118..a31397c 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -4295,12 +4295,13 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc) goto out_free_resources; if (ioc->is_warpdrive) { - ioc->reply_post_host_index[0] = - (resource_size_t *)&ioc->chip->ReplyPostHostIndex; + ioc->reply_post_host_index[0] = (resource_size_t __iomem *) + &ioc->chip->ReplyPostHostIndex; for (i = 1; i < ioc->cpu_msix_table_sz; i++) - ioc->reply_post_host_index[i] = (resource_size_t *) - ((u8 *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) + ioc->reply_post_host_index[i] = + (resource_size_t __iomem *) + ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) * 4))); } diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h index fd3b998..0ac5815 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.h +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h @@ -837,7 +837,7 @@ struct MPT2SAS_ADAPTER { u8 msix_enable; u16 msix_vector_count; u8 *cpu_msix_table; - resource_size_t **reply_post_host_index; + resource_size_t __iomem **reply_post_host_index; u16 cpu_msix_table_sz; u32 ioc_reset_count; MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 04/11] mpt3sas: correct scsi_{target,device} hostdata allocation
In _scsih_{slave,target}_alloc, an incorrect structure type is passed to sizeof() when allocating storage for hostdata. Luckily larger structure types were used, so at least the wrong sizes were safe: struct scsi_device (1784 bytes) > struct MPT3SAS_DEVICE (24 bytes) struct scsi_target (760 bytes) > struct MPT3SAS_TARGET (32 bytes) This fixes the following smatch warnings: drivers/scsi/mpt3sas/mpt3sas_scsih.c:1166 _scsih_target_alloc() warn: struct type mismatch 'MPT3SAS_TARGET vs scsi_target' drivers/scsi/mpt3sas/mpt3sas_scsih.c:1280 _scsih_slave_alloc() warn: struct type mismatch 'MPT3SAS_DEVICE vs scsi_device' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Reviewed-by: Christoph Hellwig Acked-by: Sreekanth Reddy --- drivers/scsi/mpt3sas/mpt3sas_scsih.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 18e713d..f3ee3b4 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -1163,7 +1163,8 @@ _scsih_target_alloc(struct scsi_target *starget) unsigned long flags; struct sas_rphy *rphy; - sas_target_priv_data = kzalloc(sizeof(struct scsi_target), GFP_KERNEL); + sas_target_priv_data = kzalloc(sizeof(*sas_target_priv_data), + GFP_KERNEL); if (!sas_target_priv_data) return -ENOMEM; @@ -1277,7 +1278,8 @@ _scsih_slave_alloc(struct scsi_device *sdev) struct _sas_device *sas_device; unsigned long flags; - sas_device_priv_data = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); + sas_device_priv_data = kzalloc(sizeof(*sas_device_priv_data), + GFP_KERNEL); if (!sas_device_priv_data) return -ENOMEM; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 01/11] mpt2sas: correct scsi_{target,device} hostdata allocation
In _scsih_{slave,target}_alloc, an incorrect structure type is passed to sizeof() when allocating storage for hostdata. Luckily larger structure types were used, so at least the wrong sizes were safe: struct scsi_device (1784 bytes) > struct MPT2SAS_DEVICE (24 bytes) struct scsi_target (760 bytes) > struct MPT2SAS_TARGET (40 bytes) This fixes the following smatch warnings: drivers/scsi/mpt2sas/mpt2sas_scsih.c:1295 _scsih_target_alloc() warn: struct type mismatch 'MPT2SAS_TARGET vs scsi_target' drivers/scsi/mpt2sas/mpt2sas_scsih.c:1409 _scsih_slave_alloc() warn: struct type mismatch 'MPT2SAS_DEVICE vs scsi_device' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Reviewed-by: Christoph Hellwig Acked-by: Sreekanth Reddy --- drivers/scsi/mpt2sas/mpt2sas_scsih.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 5055f92..13e49c3 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -1292,7 +1292,8 @@ _scsih_target_alloc(struct scsi_target *starget) unsigned long flags; struct sas_rphy *rphy; - sas_target_priv_data = kzalloc(sizeof(struct scsi_target), GFP_KERNEL); + sas_target_priv_data = kzalloc(sizeof(*sas_target_priv_data), + GFP_KERNEL); if (!sas_target_priv_data) return -ENOMEM; @@ -1406,7 +1407,8 @@ _scsih_slave_alloc(struct scsi_device *sdev) struct _sas_device *sas_device; unsigned long flags; - sas_device_priv_data = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); + sas_device_priv_data = kzalloc(sizeof(*sas_device_priv_data), + GFP_KERNEL); if (!sas_device_priv_data) return -ENOMEM; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 00/11] mptfusion/mpt2/mpt3 static checker fixups
v2: Combined the mptfusion and mpt{2,3}sas static checker patches, re-ordering them by driver. Updated Reviewed-by and Acked-by tags as well as Sreekanth's email address. Based off v3.16-rc2, compile tested. patches dropped: * mpt3sas: fix possible memory leak in mpt3sas_send_trigger_data_event Christoph suggested combining this into a single allocation, so this patch was transformed into two new patches (removing the Reviewed-by and Acked-by tags): mpt2sas-combine-fw_event_work-and-its-event_data.patch mpt3sas-combine-fw_event_work-and-its-event_data.patch * mptfusion: initChainBuffers should return errno Christoph noted that the whole callchain uses -1 instead of errno. Let it be. patches modified: * mptfusion: zero kernel-space source of copy_to_user A static checker false-positive brought me here, Christoph suggested using memdup_user. * mptfusion: combine fw_event_work and its event_data Remove the unnecessary sz variables. * mptfusion: tweak null pointer checks Moved commentry from myself (JL) and Christoph (HCH) into the commit message. In the mpt{2,3}sas-combine-fw_event_work-and-its-event_data patches, I was wondering if the alignment attribute should be: __attribute__ ((aligned (sizeof(unsigned long instead of: char event_data[0] __aligned(4) Regards, Joe Lawrence (11): mpt2sas: correct scsi_{target,device} hostdata allocation mpt2sas: combine fw_event_work and its event_data mpt2sas: annotate ioc->reply_post_host_index as __iomem mpt3sas: correct scsi_{target,device} hostdata allocation mpt3sas: combine fw_event_work and its event_data mptfusion: mark file-private functions as static mptfusion: remove redundant kfree checks mptfusion: use memdup_user mptfusion: make adapter prod_name[] a pointer mptfusion: combine fw_event_work and its event_data mptfusion: tweak null pointer checks drivers/message/fusion/mptbase.c | 23 +-- drivers/message/fusion/mptbase.h |2 +- drivers/message/fusion/mptctl.c | 18 +++-- drivers/message/fusion/mptfc.c |3 +- drivers/message/fusion/mptsas.c | 74 -- drivers/message/fusion/mptsas.h |2 +- drivers/message/fusion/mptscsih.c| 19 - drivers/message/fusion/mptspi.c |5 +-- drivers/scsi/mpt2sas/mpt2sas_base.c |9 +++-- drivers/scsi/mpt2sas/mpt2sas_base.h |2 +- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 58 +++--- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 62 +++- 12 files changed, 137 insertions(+), 140 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] mptfusion static checker fixups
Hi Christoph, Sreekanth, Would it be easier to combine this set with the mpt2/mpt3 changes and repost a V2? There were a few changes Christoph request in each set, so I can post those up later today. -- Joe On Wed, 25 Jun 2014 13:01:50 +0200 Christoph Hellwig wrote: > Can I get another review for this series? > > On Wed, Jun 04, 2014 at 12:49:42PM -0400, Joe Lawrence wrote: > > While reviewing the mpt2/mpt3 static checker fixup patchset, Christoph > > inquired about mptfusion. None of the sparse / smatch warnings from the > > earlier patchset directly apply to fusion, but there were a few easy to > > fix warnings (compile tested only). > > > > The patchset is ordered from the smallest/easiest change up to the last > > three, which are bit more involved and should be reviewed by LSI, > > especially the last one "mptfusion: tweak null pointer checks". See the > > commentary in those patches after the signed-off-by line. > > > > Cc: Christoph Hellwig > > Cc: Dan Carpenter > > Cc: Sreekanth Reddy > > > > Joe Lawrence (7): > > mptfusion: mark file-private functions as static > > mptfusion: remove redundant kfree checks > > mptfusion: initChainBuffers should return errno > > mptfusion: zero kernel-space source of copy_to_user > > mptfusion: make adapter prod_name[] a pointer > > mptfusion: combine fw_event_work and its event_data > > mptfusion: tweak null pointer checks > > > > drivers/message/fusion/mptbase.c | 29 +-- > > drivers/message/fusion/mptbase.h |2 +- > > drivers/message/fusion/mptctl.c |2 +- > > drivers/message/fusion/mptfc.c|3 +- > > drivers/message/fusion/mptsas.c | 57 > > +++-- > > drivers/message/fusion/mptsas.h |2 +- > > drivers/message/fusion/mptscsih.c |7 +++-- > > drivers/message/fusion/mptspi.c |5 ++-- > > 8 files changed, 52 insertions(+), 55 deletions(-) > > > > -- > > 1.7.10.4 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > ---end quoted text--- > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] qla2xxx: Prevent probe and board_disable race
The PCI register read checking introduced in commit f3ddac19 "qla2xxx: Disable adapter when we encounter a PCI disconnect" is active during driver probe. Hold off scheduling any board removal until the driver probe has completed. This ensures that the the board_disable work structure is initialized and more importantly, avoids racing qla2x00_probe_one. Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_def.h |1 + drivers/scsi/qla2xxx/qla_isr.c |3 ++- drivers/scsi/qla2xxx/qla_os.c |2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 7c441c9..253a89d 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3405,6 +3405,7 @@ typedef struct scsi_qla_host { unsigned long pci_flags; #define PFLG_DISCONNECTED 0 /* PCI device removed */ #define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */ +#define PFLG_DRIVER_PROBING2 /* PCI driver .probe */ uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index ee5eef4..4ea8252 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -118,7 +118,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) /* Check for PCI disconnection */ if (reg == 0x) { if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && - !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) { + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) && + !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) { /* * Schedule this (only once) on the default system * workqueue so that all the adapter workqueues and the diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 51cba37..0b39425 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -2629,6 +2629,7 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) } pci_set_drvdata(pdev, base_vha); + set_bit(PFLG_DRIVER_PROBING, &base_vha->pci_flags); host = base_vha->host; base_vha->req = req; @@ -2920,6 +2921,7 @@ skip_dpc: qlt_add_target(ha, base_vha); + clear_bit(PFLG_DRIVER_PROBING, &base_vha->pci_flags); return 0; probe_init_failed: -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] qla2xxx: Prevent removal and board_disable race
Introduce mutual exclusion between the qla2xxx_remove_one PCI driver callback and qla2x00_disable_board_on_pci_error, which is scheduled as board_disable work by qla2x00_check_reg{32,16}_for_disconnect: * Leave the driver-specific data attached to the underlying PCI device intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one has enough breadcrumbs to determine that any board_disable work has been completed. * In qla2xxx_remove_one, set a bit to prevent any subsequent board_disable work from scheduling, then cancel and wait until pending work has completed. * Reuse the PCI device enable count check in qla2x00_remove_one to determine if board_disable has occured. The original purpose of this check was unnecessary since the driver remove function wasn't called when the probe fails. Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_def.h |1 + drivers/scsi/qla2xxx/qla_isr.c |3 ++- drivers/scsi/qla2xxx/qla_os.c | 31 +++ 3 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 1267b11..7c441c9 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host { unsigned long pci_flags; #define PFLG_DISCONNECTED 0 /* PCI device removed */ +#define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */ uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index 2741ad8..ee5eef4 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { - if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) { + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) && + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) { /* * Schedule this (only once) on the default system * workqueue so that all the adapter workqueues and the diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 39c9953..51cba37 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev) scsi_qla_host_t *base_vha; struct qla_hw_data *ha; + base_vha = pci_get_drvdata(pdev); + ha = base_vha->hw; + + /* Indicate device removal to prevent future board_disable and wait +* until any pending board_disable has completed. */ + set_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags); + cancel_work_sync(&ha->board_disable); + /* -* If the PCI device is disabled that means that probe failed and any -* resources should be have cleaned up on probe exit. +* If the PCI device is disabled then there was a PCI-disconnect and +* qla2x00_disable_board_on_pci_error has taken care of most of the +* resources. */ - if (!atomic_read(&pdev->enable_cnt)) + if (!atomic_read(&pdev->enable_cnt)) { + scsi_host_put(base_vha->host); + kfree(ha); + pci_set_drvdata(pdev, NULL); return; - - base_vha = pci_get_drvdata(pdev); - ha = base_vha->hw; + } qla2x00_wait_for_hba_ready(base_vha); @@ -4791,18 +4801,15 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work) qla82xx_md_free(base_vha); qla2x00_free_queues(ha); - scsi_host_put(base_vha->host); - qla2x00_unmap_iobases(ha); pci_release_selected_regions(ha->pdev, ha->bars); - kfree(ha); - ha = NULL; - pci_disable_pcie_error_reporting(pdev); pci_disable_device(pdev); - pci_set_drvdata(pdev, NULL); + /* +* Let qla2x00_remove_one cleanup qla_hw_data on device removal. +*/ } /** -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] qla2xxx: Collect PCI register checks and board_disable scheduling
Add an uint16_t variant of qla2x00_check_reg_for_disconnect and use these routines to check and schedule a PCI-disconnected board from a centralized place. Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_gbl.h |3 ++- drivers/scsi/qla2xxx/qla_isr.c | 28 +--- drivers/scsi/qla2xxx/qla_mr.c |2 +- drivers/scsi/qla2xxx/qla_nx.c |6 +++--- drivers/scsi/qla2xxx/qla_os.c |8 +--- 5 files changed, 20 insertions(+), 27 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h index d48dea8..a60c1b8 100644 --- a/drivers/scsi/qla2xxx/qla_gbl.h +++ b/drivers/scsi/qla2xxx/qla_gbl.h @@ -475,7 +475,8 @@ extern uint8_t *qla25xx_read_nvram_data(scsi_qla_host_t *, uint8_t *, uint32_t, extern int qla25xx_write_nvram_data(scsi_qla_host_t *, uint8_t *, uint32_t, uint32_t); extern int qla2x00_is_a_vp_did(scsi_qla_host_t *, uint32_t); -bool qla2x00_check_reg_for_disconnect(scsi_qla_host_t *, uint32_t); +bool qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *, uint32_t); +bool qla2x00_check_reg16_for_disconnect(scsi_qla_host_t *, uint16_t); extern int qla2x00_beacon_on(struct scsi_qla_host *); extern int qla2x00_beacon_off(struct scsi_qla_host *); diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index a56825c..e1b16ad 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -56,16 +56,8 @@ qla2100_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { hccr = RD_REG_WORD(®->hccr); - /* Check for PCI disconnection */ - if (hccr == 0x) { - /* -* Schedule this on the default system workqueue so that -* all the adapter workqueues and the DPC thread can be -* shutdown cleanly. -*/ - schedule_work(&ha->board_disable); + if (qla2x00_check_reg16_for_disconnect(vha, hccr)) break; - } if (hccr & HCCR_RISC_PAUSE) { if (pci_channel_offline(ha->pdev)) break; @@ -121,7 +113,7 @@ qla2100_intr_handler(int irq, void *dev_id) } bool -qla2x00_check_reg_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) +qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { @@ -136,6 +128,12 @@ qla2x00_check_reg_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) return false; } +bool +qla2x00_check_reg16_for_disconnect(scsi_qla_host_t *vha, uint16_t reg) +{ + return qla2x00_check_reg32_for_disconnect(vha, 0x | reg); +} + /** * qla2300_intr_handler() - Process interrupts for the ISP23xx and ISP63xx. * @irq: @@ -174,7 +172,7 @@ qla2300_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { stat = RD_REG_DWORD(®->u.isp2300.host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) break; if (stat & HSR_RISC_PAUSED) { if (unlikely(pci_channel_offline(ha->pdev))) @@ -2633,7 +2631,7 @@ qla24xx_intr_handler(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); for (iter = 50; iter--; ) { stat = RD_REG_DWORD(®->host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) break; if (stat & HSRX_RISC_PAUSED) { if (unlikely(pci_channel_offline(ha->pdev))) @@ -2723,7 +2721,7 @@ qla24xx_msix_rsp_q(int irq, void *dev_id) * we process the response queue. */ stat = RD_REG_DWORD(®->host_status); - if (qla2x00_check_reg_for_disconnect(vha, stat)) + if (qla2x00_check_reg32_for_disconnect(vha, stat)) goto out; qla24xx_process_response_queue(vha, rsp); if (!ha->flags.disable_msix_handshake) { @@ -2763,7 +2761,7 @@ qla25xx_msix_rsp_q(int irq, void *dev_id) hccr = RD_REG_DWORD_RELAXED(®->hccr); spin_unlock_irqrestore(&ha->hardware_lock, flags); } - if (qla2x00_check_reg_for_disconnect(vha, hccr)) + if (qla2x00_check_reg32_for_disconnect(vha, hccr)) goto out; queue_work_on((int) (rsp->id - 1), ha->wq, &rsp->q_work); @@ -2798,7 +2796,7 @@ qla24xx_msix_default(int irq, void *dev_id) vha = pci_get_drvdata(ha->pdev); do { stat = RD_REG_DWORD(®
[PATCH 2/6] qla2xxx: Use qla2x00_clear_drv_active on probe failure
Take advantage of commit fe1b806f "qla2xxx: Refactor shutdown code so some functionality can be reused" to remove an inlined copy of qla2x00_clear_drv_active in the driver's probe hardware error path. Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_os.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 6a512b7..31913bb 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -238,6 +238,7 @@ static int qla2xxx_eh_host_reset(struct scsi_cmnd *); static int qla2x00_change_queue_depth(struct scsi_device *, int, int); static int qla2x00_change_queue_type(struct scsi_device *, int); +static void qla2x00_clear_drv_active(struct qla_hw_data *); static void qla2x00_free_device(scsi_qla_host_t *); struct scsi_host_template qla2xxx_driver_template = { @@ -2946,16 +2947,8 @@ probe_failed: scsi_host_put(base_vha->host); probe_hw_failed: - if (IS_QLA82XX(ha)) { - qla82xx_idc_lock(ha); - qla82xx_clear_drv_active(ha); - qla82xx_idc_unlock(ha); - } - if (IS_QLA8044(ha)) { - qla8044_idc_lock(ha); - qla8044_clear_drv_active(ha); - qla8044_idc_unlock(ha); - } + qla2x00_clear_drv_active(ha); + iospace_config_failed: if (IS_P3P_TYPE(ha)) { if (!ha->nx_pcibase) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] qla2xxx device removal fixups
Hi Chad, Giri, et al. Stratus has been testing the upstream qla2xxx driver against surprise device removal and has found a few minor issues along the way. With this patchset, results have been good. Although the following changes may be most interesting on a Stratus platform, they should be applicable to general hotplug scenarios on other hardware. The first two patches are independent from the others and can be reviewed as such. One fixes up a use-after-free and the other consolidates some code. The final four patches are more invasive and modify the scsi_qla_host structure to avoid device removal race conditions. These changes were written to demonstrate the problem at hand and minimally fix them in order to continue testing. (For example, adding completely a new pci_flags member to scsi_qla_host is probably overkill.) We would welcome any feedback or questions about our testing. Regards, -- Joe Joe Lawrence (6): qla2xxx: Fix shost use-after-free on device removal scsi_qla_host *base_vha is allocated as part of the Scsi_Host private hostdata[] by qla2x00_create_host. When the last reference on the host is put by qla2x00_remove_one, it's released along with any hostdata[], rendering base_vha unsafe to dereference. To safely complete the adapter cleanup in qla2x00_remove_one, use the scsi_qla_host_t *qla_hw_data pointer that is already in hand. To reproduce, set kernel boot option slub_debug=FZPU and remove an adapter instance. qla2xxx: Use qla2x00_clear_drv_active on probe failure Clean up some duplicate code along the way. qla2xxx: Collect PCI register checks and board_disable scheduling PCI register read checks are performed throughout the driver to detect disconnected hardware. There is currently a function that verifies 32-bit values and schedules board_removal if needed. Other 16-bit registers are checked and board_removal scheduling scattered around the driver. This change pulls all of these together such that a new 16-bit register check function wrappers the existing 32-bit version and centralizes the scheduling of board_disable work to a single invocation. The following patches depend upon this change. qla2xxx: Schedule board_disable only once In automated hot-plug device removal testing, occasionally qla2x00_disable_board_on_pci_error would run twice. I believe this occurred by a second qla2x00_check_reg_for_disconnect scheduling board_disable while the first instance was still running. The easiest solution seemed to be adding a per-adapter flag that could be test_and_set before scheduling board_disable. qla2xxx: Prevent removal and board_disable race This race was discovered through many iterations of automated PCI hotplug. The automated test inserts faults on the Stratus platform to simulate device removal. After removal, it re-adds the devices, simulating PCI hotplug insertion. Rinse, repeat. The race occurs when a Stratus proprietary hotplug driver detects that PCI devices have gone missing and calls into the kernel PCI subsystem to remove them. At nearly the same time, the qla2xxx driver figures out the same issue and schedules a board_disable. It takes a few hundred device removals to hit, but it seemed that the problem was that qla2x00_disable_board_on_pci_error started, then qla2xxx_remove_one was invoked, so one of the two started touching already freed resources. Although the bug manifested on a Stratus platform running modified drivers, the window for qla2xxx_remove_one to race qla2x00_disable_board_on_pci_error is still present if running stock kernel/drivers on commodity hardware. It may be difficult to hit, but one could theoretically invoke PCI device removal via sysfs (or simply unload the qla2xxx driver) and for board_disable to run at the same time. The fix leaves the Scsi_Host intact during board_disable so that qla2x00_remove_one can safely stop the main timer and cancel_work_sync on any outstanding board_disable work. A PFLG_DRIVER_REMOVING bit is also set to prevent any other board_disable instances from scheduling. Once the asynchronous players are out of the way, qla2x00_remove_one can move forward cleaning up whatever remaining resources are still attached. The PCI device enable count check in qla2x00_remove_one was re-purposed to determine if board_disable had run. Its original intention to detect qla2x00_probe_one failure was invalid. As long as the driver .probe routine returns an -ERRNO, then its counterpart .remove routine is *not* called. qla2xxx: Prevent probe and board_disable race An automated Stratus device removal test hotplug removes devices shortly during/after their initialization. The test inserts devices, then proceeds to remove th
[PATCH 4/6] qla2xxx: Schedule board_disable only once
There are various callers of qla2x00_check_reg{32,16}_for_disconnect that may schedule board removal on PCI-disconnect. Test-and-set a dedicated flag before scheduling board_disable so it is invoked only once. Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_def.h |3 +++ drivers/scsi/qla2xxx/qla_isr.c | 14 -- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 1fa0104..1267b11 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3402,6 +3402,9 @@ typedef struct scsi_qla_host { #define FX00_CRITEMP_RECOVERY 25 #define FX00_HOST_INFO_RESEND 26 + unsigned long pci_flags; +#define PFLG_DISCONNECTED 0 /* PCI device removed */ + uint32_tdevice_flags; #define SWITCH_FOUND BIT_0 #define DFLG_NO_CABLE BIT_1 diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index e1b16ad..2741ad8 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -117,12 +117,14 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg) { /* Check for PCI disconnection */ if (reg == 0x) { - /* -* Schedule this on the default system workqueue so that all the -* adapter workqueues and the DPC thread can be shutdown -* cleanly. -*/ - schedule_work(&vha->hw->board_disable); + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) { + /* +* Schedule this (only once) on the default system +* workqueue so that all the adapter workqueues and the +* DPC thread can be shutdown cleanly. +*/ + schedule_work(&vha->hw->board_disable); + } return true; } else return false; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] qla2xxx: Fix shost use-after-free on device removal
Once calling scsi_host_put, be careful to not access qla_hw_data through the Scsi_Host private data (ie, scsi_qla_host base_vha). Signed-off-by: Joe Lawrence --- drivers/scsi/qla2xxx/qla_os.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 5a430c7..6a512b7 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3111,10 +3111,8 @@ qla2x00_unmap_iobases(struct qla_hw_data *ha) } static void -qla2x00_clear_drv_active(scsi_qla_host_t *vha) +qla2x00_clear_drv_active(struct qla_hw_data *ha) { - struct qla_hw_data *ha = vha->hw; - if (IS_QLA8044(ha)) { qla8044_idc_lock(ha); qla8044_clear_drv_active(ha); @@ -3185,7 +3183,7 @@ qla2x00_remove_one(struct pci_dev *pdev) scsi_host_put(base_vha->host); - qla2x00_clear_drv_active(base_vha); + qla2x00_clear_drv_active(ha); qla2x00_unmap_iobases(ha); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] mptfusion: combine fw_event_work and its event_data
On Thu, 5 Jun 2014 02:34:29 -0700 Christoph Hellwig wrote: > > diff --git a/drivers/message/fusion/mptsas.h > > b/drivers/message/fusion/mptsas.h > > index 57e86ab..c396483 100644 > > --- a/drivers/message/fusion/mptsas.h > > +++ b/drivers/message/fusion/mptsas.h > > @@ -110,7 +110,7 @@ struct fw_event_work { > > MPT_ADAPTER *ioc; > > u32 event; > > u8 retries; > > - u8 __attribute__((aligned(4))) event_data[1]; > > + charevent_data[0] __aligned(4); > > }; Is this alignment necessary and if so, should it be more like: __attribute__ ((aligned (sizeof(unsigned long > > > > - sz = offsetof(struct fw_event_work, event_data) + > > - sizeof(MpiEventDataSasDeviceStatusChange_t); > > + sz = sizeof(*fw_event) + > > + sizeof(MpiEventDataSasDeviceStatusChange_t); > > fw_event = kzalloc(sz, GFP_ATOMIC); > > Seems like there is no point in keeping the sz variable here and at > the other occurances. Not that it really matters, but if we make a pass > over this code we might as well fix that up, too. I'll take a look and post up a v2 next week, adding these to a v2 mpt2/mpt3 set. Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] mptfusion: zero kernel-space source of copy_to_user
On Wed, 4 Jun 2014 12:49:46 -0400 Joe Lawrence wrote: > Fixes the following smatch warning: > > drivers/message/fusion/mptctl.c:1369 mptctl_getiocinfo() warn: > possible info leak 'karg' > > Signed-off-by: Joe Lawrence > Cc: Christoph Hellwig > Cc: Dan Carpenter > Cc: Sreekanth Reddy > --- > drivers/message/fusion/mptctl.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c > index dcc8385..e6d8935 100644 > --- a/drivers/message/fusion/mptctl.c > +++ b/drivers/message/fusion/mptctl.c > @@ -1261,7 +1261,7 @@ mptctl_getiocinfo (unsigned long arg, unsigned int > data_size) > else > return -EFAULT; > > - karg = kmalloc(data_size, GFP_KERNEL); > + karg = kzalloc(data_size, GFP_KERNEL); > if (karg == NULL) { > printk(KERN_ERR MYNAM "%s::mpt_ioctl_iocinfo() @%d - no memory > available!\n", > __FILE__, __LINE__); Hi Dan, kzalloc silenced that smatch warning, but the code looks like: (calculate data_size) ... karg = kmalloc(data_size, GFP_KERNEL); ... if (copy_from_user(karg, uarg, data_size)) { ... if (copy_to_user((char __user *)arg, karg, data_size)) { where 'data_size' once calculated, is unchanged. Since the size allocated is the same copied from the user and the same copied back out to the user, would this really be considered an info leak? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] mptfusion: tweak null pointer checks
Fixes the following smatch warnings: drivers/message/fusion/mptbase.c:652 mptbase_reply() warn: variable dereferenced before check 'reply' (see line 639) drivers/message/fusion/mptsas.c:1255 mptsas_taskmgmt_complete() error: we previously assumed 'pScsiTmReply' could be null (see line 1227) drivers/message/fusion/mptsas.c:3888 mptsas_not_responding_devices() error: we previously assumed 'port_info->phy_info' could be null (see line 3875) drivers/message/fusion/mptscsih.c:1284 mptscsih_info() error: we previously assumed 'h' could be null (see line 1274) drivers/message/fusion/mptscsih.c:1388 mptscsih_qcmd() error: we previously assumed 'vdevice' could be null (see line 1373) Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- JL: Comments on the above warnings, in order: No-brainer, the enclosing switch statement dereferences 'reply', so we can't get here unless 'reply' is valid. "Retry target reset" needs to know the target ID and channel, so enclose that entire block inside a if (pScsiTmReply) block. The code before the loop checks for 'port_info->phy_info', but not many other places in the driver bother. Not sure what to do here. 'hostdata' is checked and then immediately dereferenced after that block. How about a default return string of "N/A" to indicate one isn't available? Not sure what to do here. 'vdevice' is needed to "set up the message frame" target ID and channel, so it should probably return an error in in this case. drivers/message/fusion/mptbase.c | 10 -- drivers/message/fusion/mptsas.c | 27 ++- drivers/message/fusion/mptscsih.c |7 --- 3 files changed, 22 insertions(+), 22 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 03263d6..c68a951 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -649,12 +649,10 @@ mptbase_reply(MPT_ADAPTER *ioc, MPT_FRAME_HDR *req, MPT_FRAME_HDR *reply) case MPI_FUNCTION_CONFIG: case MPI_FUNCTION_SAS_IO_UNIT_CONTROL: ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_COMMAND_GOOD; - if (reply) { - ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_RF_VALID; - memcpy(ioc->mptbase_cmds.reply, reply, - min(MPT_DEFAULT_FRAME_SIZE, - 4 * reply->u.reply.MsgLength)); - } + ioc->mptbase_cmds.status |= MPT_MGMT_STATUS_RF_VALID; + memcpy(ioc->mptbase_cmds.reply, reply, + min(MPT_DEFAULT_FRAME_SIZE, + 4 * reply->u.reply.MsgLength)); if (ioc->mptbase_cmds.status & MPT_MGMT_STATUS_PENDING) { ioc->mptbase_cmds.status &= ~MPT_MGMT_STATUS_PENDING; complete(&ioc->mptbase_cmds.done); diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 35c87b1..72e464fc 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1252,17 +1252,19 @@ mptsas_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *mr) ioc->name, jiffies_to_msecs(jiffies - target_reset_list->time_count)/1000)); - id = pScsiTmReply->TargetID; - channel = pScsiTmReply->Bus; - target_reset_list->time_count = jiffies; + if (pScsiTmReply) { + id = pScsiTmReply->TargetID; + channel = pScsiTmReply->Bus; + target_reset_list->time_count = jiffies; - /* -* retry target reset -*/ - if (!target_reset_list->target_reset_issued) { - if (mptsas_target_reset(ioc, channel, id)) - target_reset_list->target_reset_issued = 1; - return 1; + /* +* retry target reset +*/ + if (!target_reset_list->target_reset_issued) { + if (mptsas_target_reset(ioc, channel, id)) + target_reset_list->target_reset_issued = 1; + return 1; + } } /* @@ -3872,9 +3874,8 @@ retry_page: redo_expander_scan: list_for_each_entry(port_info, &ioc->sas_topology, list) { - if (port_info->phy_info && - (!(port_info->phy_info[0].identify.device_info & - MPI_SAS_DEVICE_INFO_SMP_TARGET))) + if (!(port_info->phy_info[0].identify.device_info & + MPI_SAS_DEVICE_INFO_SMP_TARGET))
[PATCH 6/7] mptfusion: combine fw_event_work and its event_data
Tack the firmware reply event_data payload to the end of its corresponding struct fw_event_work allocation. Rework fw_event_work allocation calculations to include the event_data size where appropriate. This clarifies the code a bit and avoids the following smatch warnings: drivers/message/fusion/mptsas.c:1003 mptsas_queue_device_delete() error: memcpy() 'fw_event->event_data' too small (29 vs 36) drivers/message/fusion/mptsas.c:1017 mptsas_queue_rescan() error: not allocating enough data 168 vs 160 Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- JL: In the existing code, struct fw_event_work includes an aligned char event_data[1] and when event data payload is required, a larger number of bytes is allocated. This is calculated by subtracting the offsetof(struct fw_event_work, event_data) from the sizeof(struct fw_event_work) and adding the required payload size. Was a single byte required to guarantee the alignment? If not, it seems clearer to just use a 0-length array and avoid the offsetof part. drivers/message/fusion/mptsas.c | 24 drivers/message/fusion/mptsas.h |2 +- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 8bb580d..35c87b1 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -992,15 +992,15 @@ mptsas_queue_device_delete(MPT_ADAPTER *ioc, struct fw_event_work *fw_event; int sz; - sz = offsetof(struct fw_event_work, event_data) + - sizeof(MpiEventDataSasDeviceStatusChange_t); + sz = sizeof(*fw_event) + + sizeof(MpiEventDataSasDeviceStatusChange_t); fw_event = kzalloc(sz, GFP_ATOMIC); if (!fw_event) { printk(MYIOC_s_WARN_FMT "%s: failed at (line=%d)\n", ioc->name, __func__, __LINE__); return; } - memcpy(fw_event->event_data, sas_event_data, + memcpy(&fw_event->event_data, sas_event_data, sizeof(MpiEventDataSasDeviceStatusChange_t)); fw_event->event = MPI_EVENT_SAS_DEVICE_STATUS_CHANGE; fw_event->ioc = ioc; @@ -1013,7 +1013,7 @@ mptsas_queue_rescan(MPT_ADAPTER *ioc) struct fw_event_work *fw_event; int sz; - sz = offsetof(struct fw_event_work, event_data); + sz = sizeof(*fw_event); fw_event = kzalloc(sz, GFP_ATOMIC); if (!fw_event) { printk(MYIOC_s_WARN_FMT "%s: failed at (line=%d)\n", @@ -3617,7 +3617,7 @@ mptsas_send_expander_event(struct fw_event_work *fw_event) ioc = fw_event->ioc; expander_data = (MpiEventDataSasExpanderStatusChange_t *) - fw_event->event_data; + &fw_event->event_data; memcpy(&sas_address, &expander_data->SASAddress, sizeof(__le64)); sas_address = le64_to_cpu(sas_address); port_info = mptsas_find_portinfo_by_sas_address(ioc, sas_address); @@ -3694,7 +3694,7 @@ mptsas_send_link_status_event(struct fw_event_work *fw_event) u8 link_rate; ioc = fw_event->ioc; - link_data = (MpiEventDataSasPhyLinkStatus_t *)fw_event->event_data; + link_data = (MpiEventDataSasPhyLinkStatus_t *) &fw_event->event_data; memcpy(&sas_address, &link_data->SASAddress, sizeof(__le64)); sas_address = le64_to_cpu(sas_address); @@ -4043,7 +4043,7 @@ mptsas_handle_queue_full_event(struct fw_event_work *fw_event) ioc = fw_event->ioc; - qfull_data = (EventDataQueueFull_t *)fw_event->event_data; + qfull_data = (EventDataQueueFull_t *) &fw_event->event_data; fw_id = qfull_data->TargetID; fw_channel = qfull_data->Bus; current_depth = le16_to_cpu(qfull_data->CurrentDepth); @@ -4584,7 +4584,7 @@ mptsas_send_sas_event(struct fw_event_work *fw_event) ioc = fw_event->ioc; sas_event_data = (EVENT_DATA_SAS_DEVICE_STATUS_CHANGE *) - fw_event->event_data; + &fw_event->event_data; device_info = le32_to_cpu(sas_event_data->DeviceInfo); if ((device_info & @@ -4652,7 +4652,7 @@ mptsas_send_raid_event(struct fw_event_work *fw_event) RaidPhysDiskPage0_t phys_disk; ioc = fw_event->ioc; - raid_event_data = (EVENT_DATA_RAID *)fw_event->event_data; + raid_event_data = (EVENT_DATA_RAID *) &fw_event->event_data; status = le32_to_cpu(raid_event_data->SettingsStatus); state = (status >> 8) & 0xff; @@ -4950,7 +4950,7 @@ mptsas_send_ir2_event(struct fw_event_work *fw_event) RaidPhysDiskPage0_t phys_disk; ioc = fw_event->ioc; - ir2_data = (MPI_EVENT_DATA_IR2 *)fw_event->event_data; + ir2_data = (M
[PATCH 5/7] mptfusion: make adapter prod_name[] a pointer
The struct _MPT_ADAPTER doesn't need a full copy of the product string, so prod_name can point to the string literal storage that the driver already provides. Avoids the following sparse warning: drivers/message/fusion/mptbase.c:2858 MptDisplayIocCapabilities() warn: this array is probably non-NULL. 'ioc->prod_name' Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptbase.c | 11 +-- drivers/message/fusion/mptbase.h |2 +- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index a70b873..03263d6 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -1408,8 +1408,8 @@ mpt_verify_adapter(int iocid, MPT_ADAPTER **iocpp) * in /proc/mpt/summary and /sysfs/class/scsi_host/host/version_product * **/ -static void -mpt_get_product_name(u16 vendor, u16 device, u8 revision, char *prod_name) +static const char* +mpt_get_product_name(u16 vendor, u16 device, u8 revision) { char *product_str = NULL; @@ -1635,8 +1635,7 @@ mpt_get_product_name(u16 vendor, u16 device, u8 revision, char *prod_name) } out: - if (product_str) - sprintf(prod_name, "%s", product_str); + return product_str; } /** @@ -1887,8 +1886,8 @@ mpt_attach(struct pci_dev *pdev, const struct pci_device_id *id) dinitprintk(ioc, printk(MYIOC_s_INFO_FMT "facts @ %p, pfacts[0] @ %p\n", ioc->name, &ioc->facts, &ioc->pfacts[0])); - mpt_get_product_name(pdev->vendor, pdev->device, pdev->revision, -ioc->prod_name); + ioc->prod_name = mpt_get_product_name(pdev->vendor, pdev->device, + pdev->revision); switch (pdev->device) { diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h index 76c05bc..78c8104 100644 --- a/drivers/message/fusion/mptbase.h +++ b/drivers/message/fusion/mptbase.h @@ -605,7 +605,7 @@ typedef struct _MPT_ADAPTER int id;/* Unique adapter id N {0,1,2,...} */ int pci_irq; /* This irq */ char name[MPT_NAME_LENGTH]; /* "iocN" */ - char prod_name[MPT_NAME_LENGTH];/* "LSIFC9x9" */ + const char *prod_name;/* "LSIFC9x9" */ #ifdef CONFIG_FUSION_LOGGING /* used in mpt_display_event_info */ char evStr[EVENT_DESCR_STR_SZ]; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] mptfusion: mark file-private functions as static
Fixes the following sparse warnings: drivers/message/fusion/mptbase.c:7011:1: warning: symbol 'mpt_SoftResetHandler' was not declared. Should it be static? drivers/message/fusion/mptsas.c:1578:23: warning: symbol 'mptsas_refreshing_device_handles' was not declared. Should it be static? drivers/message/fusion/mptsas.c:3653:24: warning: symbol 'mptsas_expander_add' was not declared. Should it be static? drivers/message/fusion/mptsas.c:5327:1: warning: symbol 'mptsas_shutdown' was not declared. Should it be static? drivers/message/fusion/mptspi.c:624:1: warning: symbol 'mptscsih_quiesce_raid' was not declared. Should it be static? Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptbase.c |2 +- drivers/message/fusion/mptsas.c |6 +++--- drivers/message/fusion/mptspi.c |2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 570b18a..5ef3f89 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -7007,7 +7007,7 @@ EXPORT_SYMBOL(mpt_halt_firmware); * IOC doesn't reply to any outstanding request. This will transfer IOC * to READY state. **/ -int +static int mpt_SoftResetHandler(MPT_ADAPTER *ioc, int sleepFlag) { int rc; diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 00d339c..8bb580d 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1575,7 +1575,7 @@ mptsas_del_end_device(MPT_ADAPTER *ioc, struct mptsas_phyinfo *phy_info) mptsas_port_delete(ioc, phy_info->port_details); } -struct mptsas_phyinfo * +static struct mptsas_phyinfo * mptsas_refreshing_device_handles(MPT_ADAPTER *ioc, struct mptsas_devinfo *sas_device) { @@ -3650,7 +3650,7 @@ mptsas_send_expander_event(struct fw_event_work *fw_event) * @handle: * */ -struct mptsas_portinfo * +static struct mptsas_portinfo * mptsas_expander_add(MPT_ADAPTER *ioc, u16 handle) { struct mptsas_portinfo buffer, *port_info; @@ -5323,7 +5323,7 @@ mptsas_probe(struct pci_dev *pdev, const struct pci_device_id *id) return error; } -void +static void mptsas_shutdown(struct pci_dev *pdev) { MPT_ADAPTER *ioc = pci_get_drvdata(pdev); diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c index 5653e50..ed8de0a 100644 --- a/drivers/message/fusion/mptspi.c +++ b/drivers/message/fusion/mptspi.c @@ -620,7 +620,7 @@ static void mptspi_read_parameters(struct scsi_target *starget) spi_width(starget) = (nego & MPI_SCSIDEVPAGE0_NP_WIDE) ? 1 : 0; } -int +static int mptscsih_quiesce_raid(MPT_SCSI_HOST *hd, int quiesce, u8 channel, u8 id) { MPT_ADAPTER *ioc = hd->ioc; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] mptfusion: remove redundant kfree checks
Fixes the following smatch warnings: drivers/message/fusion/mptfc.c:529 mptfc_target_destroy() info: redundant null check on starget->hostdata calling kfree() drivers/message/fusion/mptspi.c:465 mptspi_target_destroy() info: redundant null check on starget->hostdata calling kfree() Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptfc.c |3 +-- drivers/message/fusion/mptspi.c |3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/message/fusion/mptfc.c b/drivers/message/fusion/mptfc.c index fd75108..4bac2ad 100644 --- a/drivers/message/fusion/mptfc.c +++ b/drivers/message/fusion/mptfc.c @@ -525,8 +525,7 @@ mptfc_target_destroy(struct scsi_target *starget) if (ri) /* better be! */ ri->starget = NULL; } - if (starget->hostdata) - kfree(starget->hostdata); + kfree(starget->hostdata); starget->hostdata = NULL; } diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c index ed8de0a..7ce9a5e 100644 --- a/drivers/message/fusion/mptspi.c +++ b/drivers/message/fusion/mptspi.c @@ -461,8 +461,7 @@ static int mptspi_target_alloc(struct scsi_target *starget) static void mptspi_target_destroy(struct scsi_target *starget) { - if (starget->hostdata) - kfree(starget->hostdata); + kfree(starget->hostdata); starget->hostdata = NULL; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] mptfusion: zero kernel-space source of copy_to_user
Fixes the following smatch warning: drivers/message/fusion/mptctl.c:1369 mptctl_getiocinfo() warn: possible info leak 'karg' Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptctl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c index dcc8385..e6d8935 100644 --- a/drivers/message/fusion/mptctl.c +++ b/drivers/message/fusion/mptctl.c @@ -1261,7 +1261,7 @@ mptctl_getiocinfo (unsigned long arg, unsigned int data_size) else return -EFAULT; - karg = kmalloc(data_size, GFP_KERNEL); + karg = kzalloc(data_size, GFP_KERNEL); if (karg == NULL) { printk(KERN_ERR MYNAM "%s::mpt_ioctl_iocinfo() @%d - no memory available!\n", __FILE__, __LINE__); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] mptfusion: initChainBuffers should return errno
The lone caller of initChainBuffers checks the return code for < 0, so it is safe to appease smatch and return the proper errno value. Fixes the following smatch warnings: drivers/message/fusion/mptbase.c:4328 initChainBuffers() warn: returning -1 instead of -ENOMEM is sloppy drivers/message/fusion/mptbase.c:4335 initChainBuffers() warn: returning -1 instead of -ENOMEM is sloppy drivers/message/fusion/mptbase.c:4402 initChainBuffers() warn: returning -1 instead of -ENOMEM is sloppy Signed-off-by: Joe Lawrence Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/message/fusion/mptbase.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 5ef3f89..a70b873 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -4325,14 +4325,14 @@ initChainBuffers(MPT_ADAPTER *ioc) sz = ioc->req_depth * sizeof(int); mem = kmalloc(sz, GFP_ATOMIC); if (mem == NULL) - return -1; + return -ENOMEM; ioc->ReqToChain = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "ReqToChain alloc @ %p, sz=%d bytes\n", ioc->name, mem, sz)); mem = kmalloc(sz, GFP_ATOMIC); if (mem == NULL) - return -1; + return -ENOMEM; ioc->RequestNB = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "RequestNB alloc @ %p, sz=%d bytes\n", @@ -4399,7 +4399,7 @@ initChainBuffers(MPT_ADAPTER *ioc) if (ioc->ChainToChain == NULL) { mem = kmalloc(sz, GFP_ATOMIC); if (mem == NULL) - return -1; + return -ENOMEM; ioc->ChainToChain = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "ChainToChain alloc @ %p, sz=%d bytes\n", -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] mptfusion static checker fixups
While reviewing the mpt2/mpt3 static checker fixup patchset, Christoph inquired about mptfusion. None of the sparse / smatch warnings from the earlier patchset directly apply to fusion, but there were a few easy to fix warnings (compile tested only). The patchset is ordered from the smallest/easiest change up to the last three, which are bit more involved and should be reviewed by LSI, especially the last one "mptfusion: tweak null pointer checks". See the commentary in those patches after the signed-off-by line. Cc: Christoph Hellwig Cc: Dan Carpenter Cc: Sreekanth Reddy Joe Lawrence (7): mptfusion: mark file-private functions as static mptfusion: remove redundant kfree checks mptfusion: initChainBuffers should return errno mptfusion: zero kernel-space source of copy_to_user mptfusion: make adapter prod_name[] a pointer mptfusion: combine fw_event_work and its event_data mptfusion: tweak null pointer checks drivers/message/fusion/mptbase.c | 29 +-- drivers/message/fusion/mptbase.h |2 +- drivers/message/fusion/mptctl.c |2 +- drivers/message/fusion/mptfc.c|3 +- drivers/message/fusion/mptsas.c | 57 +++-- drivers/message/fusion/mptsas.h |2 +- drivers/message/fusion/mptscsih.c |7 +++-- drivers/message/fusion/mptspi.c |5 ++-- 8 files changed, 52 insertions(+), 55 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig wrote:
> On Mon, Jun 02, 2014 at 10:37:26AM -0400, Joe Lawrence wrote: > > If mpt3sas_send_trigger_data_event exits early without inserting a > > fw_event, be sure to undo any prior allocations. > > Looks good, but why don't we just allocate the two in a single > allocation? Hi Christoph, The following routines already handle two allocations safely: mpt2sas_scsih_event_callback mpt3sas_scsih_event_callback but if we wanted to merge them, it could look something like this (introducing a bunch of UglyCamelCaseCastings). Compile tested only... and doesn't include mpt3sas or fusion versions. -- >8 -- >From b34615dfb103613f228a82eb4eb6644a04036256 Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Sun, 1 Jun 2014 22:36:45 -0400 Subject: [PATCH] mpt3sas: combine fw_event_work and its event_data Tack the firmware reply event_data payload to the end of its corresponding struct fw_event_work allocation. This fixes the following smatch warning: drivers/scsi/mpt3sas/mpt3sas_scsih.c:2522 mpt3sas_send_trigger_data_event() warn: possible memory leak of 'fw_event' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 54 +- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e6f0720..8c2aa5d 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -190,7 +190,7 @@ struct fw_event_work { u8 VP_ID; u8 ignore; u16 event; - void*event_data; + charevent_data[0]; }; /* raid transport support */ @@ -2495,7 +2495,6 @@ _scsih_fw_event_free(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work spin_lock_irqsave(&ioc->fw_event_lock, flags); list_del(&fw_event->list); - kfree(fw_event->event_data); kfree(fw_event); spin_unlock_irqrestore(&ioc->fw_event_lock, flags); } @@ -2516,12 +2515,10 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc, if (ioc->is_driver_loading) return; - fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC); + fw_event = kzalloc(sizeof(*fw_event) + sizeof(*event_data), + GFP_ATOMIC); if (!fw_event) return; - fw_event->event_data = kzalloc(sizeof(*event_data), GFP_ATOMIC); - if (!fw_event->event_data) - return; fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG; fw_event->ioc = ioc; memcpy(fw_event->event_data, event_data, sizeof(*event_data)); @@ -3216,7 +3213,8 @@ _scsih_check_topo_delete_events(struct MPT3SAS_ADAPTER *ioc, if (fw_event->event != MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST || fw_event->ignore) continue; - local_event_data = fw_event->event_data; + local_event_data = (Mpi2EventDataSasTopologyChangeList_t *) + &fw_event->event_data; if (local_event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_ADDED || local_event_data->ExpStatus == @@ -5051,7 +5049,8 @@ _scsih_sas_topology_change_event(struct MPT3SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; u8 link_rate, prev_link_rate; - Mpi2EventDataSasTopologyChangeList_t *event_data = fw_event->event_data; + Mpi2EventDataSasTopologyChangeList_t *event_data = + (Mpi2EventDataSasTopologyChangeList_t *) &fw_event->event_data; #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5249,7 +5248,7 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc, u64 sas_address; unsigned long flags; Mpi2EventDataSasDeviceStatusChange_t *event_data = - fw_event->event_data; + (Mpi2EventDataSasDeviceStatusChange_t *) &fw_event->event_data; #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) @@ -5345,7 +5344,7 @@ _scsih_sas_enclosure_dev_status_change_event(struct MPT3SAS_ADAPTER *ioc, #ifdef CONFIG_SCSI_MPT3SAS_LOGGING if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) _scsih_sas_enclosure_dev_status_change_event_debug(ioc, -fw_event->event_data); +&fw_event->event_data); #endif } @@ -5369,7 +5368,8 @@ _scsih_sas_broadcast_primitive_event(struct MPT3SAS_ADAPTER *ioc, u32 termination_count; u32 query_count; Mpi2SCSITaskManagementReply_t *mpi_reply; - Mpi2EventDataSasBroa
Re: [PATCH 0/4] mpt2/mpt3 static checker fixups
On Mon, 2 Jun 2014 09:33:07 -0700 Christoph Hellwig wrote: > On Mon, Jun 02, 2014 at 10:34:28AM -0400, Joe Lawrence wrote: > > Hello Sreekanth, Dan, > > > > These are a few minor smatch and sparse static checker fixes for the LSI > > mpt2 and mpt3 drivers. The first three fix real potential bugs and the > > last cleans up a noisy complaint from sparse. > > Can you check if any of these also applies to drivers/message/fusion? > There is tons of copy & paste between all three drivers unfortunately. The fixes in this patchset don't apply to drivers/message/fusion, however there are a half-dozen or so warnings that I could take a look at if the driver is still maintained. -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] mpt2sas: annotate ioc->reply_post_host_index as __iomem
The MPT2SAS_ADAPTER reply_post_host_index[] holds calculated addresses in memory mapped register space. Add an "__iomem" annotation to silence the following sparse warnings: drivers/scsi/mpt2sas/mpt2sas_base.c:1006:43: warning: incorrect type in argument 2 (different address spaces) expected void volatile [noderef] *addr got unsigned long long [usertype] * drivers/scsi/mpt2sas/mpt2sas_base.c:4299:22: warning: cast removes address space of expression drivers/scsi/mpt2sas/mpt2sas_base.c:4303:27: warning: cast removes address space of expression Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/scsi/mpt2sas/mpt2sas_base.c |9 + drivers/scsi/mpt2sas/mpt2sas_base.h |2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index bde63f7..a6e477e 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -4295,12 +4295,13 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc) goto out_free_resources; if (ioc->is_warpdrive) { - ioc->reply_post_host_index[0] = - (resource_size_t *)&ioc->chip->ReplyPostHostIndex; + ioc->reply_post_host_index[0] = (resource_size_t __iomem *) + &ioc->chip->ReplyPostHostIndex; for (i = 1; i < ioc->cpu_msix_table_sz; i++) - ioc->reply_post_host_index[i] = (resource_size_t *) - ((u8 *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) + ioc->reply_post_host_index[i] = + (resource_size_t __iomem *) + ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) * 4))); } diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h index 1f2ac3a..23d0c85 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.h +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h @@ -837,7 +837,7 @@ struct MPT2SAS_ADAPTER { u8 msix_enable; u16 msix_vector_count; u8 *cpu_msix_table; - resource_size_t **reply_post_host_index; + resource_size_t __iomem **reply_post_host_index; u16 cpu_msix_table_sz; u32 ioc_reset_count; MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] mpt3sas: correct scsi_{target,device} hostdata allocation
In _scsih_{slave,target}_alloc, an incorrect structure type is passed to sizeof() when allocating storage for hostdata. Luckily larger structure types were used, so at least the wrong sizes were safe: struct scsi_device (1784 bytes) > struct MPT3SAS_DEVICE (24 bytes) struct scsi_target (760 bytes) > struct MPT3SAS_TARGET (32 bytes) This fixes the following smatch warnings: drivers/scsi/mpt3sas/mpt3sas_scsih.c:1166 _scsih_target_alloc() warn: struct type mismatch 'MPT3SAS_TARGET vs scsi_target' drivers/scsi/mpt3sas/mpt3sas_scsih.c:1280 _scsih_slave_alloc() warn: struct type mismatch 'MPT3SAS_DEVICE vs scsi_device' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/scsi/mpt3sas/mpt3sas_scsih.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index a961fe1..e6f0720 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -1163,7 +1163,8 @@ _scsih_target_alloc(struct scsi_target *starget) unsigned long flags; struct sas_rphy *rphy; - sas_target_priv_data = kzalloc(sizeof(struct scsi_target), GFP_KERNEL); + sas_target_priv_data = kzalloc(sizeof(*sas_target_priv_data), + GFP_KERNEL); if (!sas_target_priv_data) return -ENOMEM; @@ -1277,7 +1278,8 @@ _scsih_slave_alloc(struct scsi_device *sdev) struct _sas_device *sas_device; unsigned long flags; - sas_device_priv_data = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); + sas_device_priv_data = kzalloc(sizeof(*sas_device_priv_data), + GFP_KERNEL); if (!sas_device_priv_data) return -ENOMEM; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] mpt3sas: fix possible memory leak in mpt3sas_send_trigger_data_event
If mpt3sas_send_trigger_data_event exits early without inserting a fw_event, be sure to undo any prior allocations. This fixes the following smatch warning: drivers/scsi/mpt3sas/mpt3sas_scsih.c:2522 mpt3sas_send_trigger_data_event() warn: possible memory leak of 'fw_event' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/scsi/mpt3sas/mpt3sas_scsih.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e6f0720..25e3a22 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -2520,8 +2520,10 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc, if (!fw_event) return; fw_event->event_data = kzalloc(sizeof(*event_data), GFP_ATOMIC); - if (!fw_event->event_data) + if (!fw_event->event_data) { + kfree(fw_event); return; + } fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG; fw_event->ioc = ioc; memcpy(fw_event->event_data, event_data, sizeof(*event_data)); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] mpt2sas: correct scsi_{target,device} hostdata allocation
In _scsih_{slave,target}_alloc, an incorrect structure type is passed to sizeof() when allocating storage for hostdata. Luckily larger structure types were used, so at least the wrong sizes were safe: struct scsi_device (1784 bytes) > struct MPT2SAS_DEVICE (24 bytes) struct scsi_target (760 bytes) > struct MPT2SAS_TARGET (40 bytes) This fixes the following smatch warnings: drivers/scsi/mpt2sas/mpt2sas_scsih.c:1295 _scsih_target_alloc() warn: struct type mismatch 'MPT2SAS_TARGET vs scsi_target' drivers/scsi/mpt2sas/mpt2sas_scsih.c:1409 _scsih_slave_alloc() warn: struct type mismatch 'MPT2SAS_DEVICE vs scsi_device' Signed-off-by: Joe Lawrence Cc: Dan Carpenter Cc: Sreekanth Reddy --- drivers/scsi/mpt2sas/mpt2sas_scsih.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 6fd7d40..deb0bd2 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -1292,7 +1292,8 @@ _scsih_target_alloc(struct scsi_target *starget) unsigned long flags; struct sas_rphy *rphy; - sas_target_priv_data = kzalloc(sizeof(struct scsi_target), GFP_KERNEL); + sas_target_priv_data = kzalloc(sizeof(*sas_target_priv_data), + GFP_KERNEL); if (!sas_target_priv_data) return -ENOMEM; @@ -1406,7 +1407,8 @@ _scsih_slave_alloc(struct scsi_device *sdev) struct _sas_device *sas_device; unsigned long flags; - sas_device_priv_data = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); + sas_device_priv_data = kzalloc(sizeof(*sas_device_priv_data), + GFP_KERNEL); if (!sas_device_priv_data) return -ENOMEM; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] mpt2/mpt3 static checker fixups
Hello Sreekanth, Dan, These are a few minor smatch and sparse static checker fixes for the LSI mpt2 and mpt3 drivers. The first three fix real potential bugs and the last cleans up a noisy complaint from sparse. Joe Lawrence (4): mpt2sas: correct scsi_{target,device} hostdata allocation mpt3sas: correct scsi_{target,device} hostdata allocation mpt3sas: fix possible memory leak in mpt3sas_send_trigger_data_event mpt2sas: annotate ioc->reply_post_host_index as __iomem drivers/scsi/mpt2sas/mpt2sas_base.c |9 + drivers/scsi/mpt2sas/mpt2sas_base.h |2 +- drivers/scsi/mpt2sas/mpt2sas_scsih.c |6 -- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 10 +++--- 4 files changed, 17 insertions(+), 10 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make SCSI error handler code easier to understand
On Mon, 26 May 2014 17:12:27 +0200 Bart Van Assche wrote: > Every now and then someone asks how it is avoided that the SCSI error > handler and the SCSI completion handler are invoked concurrently for > the same SCSI command. Hence this patch series that should make the SCSI > error handler code a little easier to understand. Hi Bart, We talked about REQ_ATOM_COMPLETE a while back (so I may be one of those people who periodically ask about SCSI error handling / completion), and I just thought I'd chime in here to clarify the scenario we were worried about. Before d555a2ab "[SCSI] Fix spurious request sense in error handling" we saw the following sequence of events: [ 1394.345991] sd 2:0:1:0: [sdw] Done: [ 1394.361790] TIMEOUT [ 1394.374148] sd 2:0:1:0: [sdw] [ 1394.388775] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK [ 1394.409068] sd 2:0:1:0: [sdw] CDB: [ 1394.424782] Read(10): 28 00 00 00 00 18 00 00 08 00 [ 1394.443772] sd 2:0:1:0: [sdw] scmd 88104dce02c0 abort scheduled [ 1394.474026] sd 2:0:1:0: [sdw] aborting command 88104dce02c0 [ 1394.573338] qla2xxx [:46:00.0]-801c:2: Abort command issued nexus=2:1:0 -- 1 2002. [ 1394.609313] sd 2:0:1:0: [sdw] scmd 88104dce02c0 retry aborted command [ 1425.397434] sd 2:0:1:0: [sdw] Done: [ 1425.417802] TIMEOUT [ 1425.431914] sd 2:0:1:0: [sdw] [ 1425.448247] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK [ 1425.470736] sd 2:0:1:0: [sdw] CDB: [ 1425.488005] Read(10): 28 00 00 00 00 18 00 00 08 00 [ 1425.508472] sd 2:0:1:0: [sdw] scmd 88104dce02c0 previous abort failed [ 1425.533299] scsi_eh_2: waking up 0/1/1 [ 1425.551189] sd 2:0:1:0: scsi_eh_prt_fail_stats: cmds failed: 1, cancel: 0 [ 1425.575897] Total of 1 commands on 1 devices require eh work [ 1425.597970] sd 2:0:1:0: [sdw] scsi_eh_2: requesting sense ie, something like: CMD -> .queuecommand ... timeout ABORT -> .eh_abort_handler CMD (retry) -> .queuecommand ... timeout ABORT -> scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) ... into scsi_error_handler ... REQUEST SENSE -> .queuecommand We eventually found our way into scsi_unjam_host, which "*knows* that all commands on the bus have either completed, failed or timed out.", and then scsi_send_eh_cmnd, which "hijacks" a SCSI command structure. Presumably the second comment follows from the first. But in the case of timeout, when was the LLDD informed to abort or forget about that scsi command structure? In our testing, it appeared that the LLDD's .queuecommand was called back-to-back with the same scsi_cmnd pointer. The driver in question was qla2xxx, who keeps a driver-private structure to scsi_cmnd mapping (assumed to be one-to-one) that got confused by this sequence of events. After d555a2ab, in this scenario scsi_unjam_host skips the request sense and escalates to the next level (scsi_eh_abort_cmds), skipping straight to the abort as we would have expected... This avoids the scsi_cmnd "hijack"... but was it ever safe to do so in the first place? Or should LLDD expect to gracefully handle the same scsi_cmnd pointer coming into .queuecommand w/o an intermediate abort or other scsi_host_template callback? Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_transport_sas: move bsg destructor into sas_rphy_remove
The recent change in sysfs, bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 "sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric rphy device creation/deletion sequence in scsi_transport_sas: modprobe mpt2sas sas_rphy_add device_add A rphy->dev device_add B sas_device transport class device_add C sas_end_device transport class device_add D bsg class rmmod mpt2sas sas_rphy_delete sas_rphy_remove device_del B device_del C device_del A sysfs_remove_group recursive sysfs dir removal sas_rphy_free device_del D warning where device A is the parent of B, C, and D. When sas_rphy_free tries to unregister the bsg request queue (device D above), the ensuing sysfs cleanup discovers that its sysfs group has already been removed and emits a warning, "sysfs group... not found for kobject 'end_device-X:0'". Since bsg creation is a side effect of sas_rphy_add, move its complementary removal call into sas_rphy_remove. This imposes the following tear-down order for the devices above: D, B, C, A. Note the sas_device and sas_end_device transport class devices (B and C above) are created and destroyed both via the list match traversal in attribute_container_device_trigger, so the order in which they are handled is fixed. This is fine as long as they are deleted before their parent device. Signed-off-by: Joe Lawrence Cc: "James E.J. Bottomley" Cc: Christoph Hellwig Acked-by: Dan Williams --- drivers/scsi/scsi_transport_sas.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 1b68142..c341f85 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -1621,8 +1621,6 @@ void sas_rphy_free(struct sas_rphy *rphy) list_del(&rphy->list); mutex_unlock(&sas_host->lock); - sas_bsg_remove(shost, rphy); - transport_destroy_device(dev); put_device(dev); @@ -1681,6 +1679,7 @@ sas_rphy_remove(struct sas_rphy *rphy) } sas_rphy_unlink(rphy); + sas_bsg_remove(NULL, rphy); transport_remove_device(dev); device_del(dev); } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND] scsi_transport_sas: move bsg destructor into sas_rphy_remove
On Wed, 2 Apr 2014 16:48:52 -0400 Joe Lawrence wrote: > On Wed, 2 Apr 2014 16:40:41 -0400 > Joe Lawrence wrote: > > > Since bsg creation is a side effect of sas_rphy_add, move its > > complementary removal call into sas_rphy_remove. > > Hello James, > > This a resend of a patch posted to quiet an end-device sysfs warning in > its device deletion path when removing the mpt2sas driver: > > (initial report) > sysfs group not found for kobject on mpt2sas unload > http://marc.info/?t=13849746004 > > (patch + ACK + comments) > http://marc.info/?t=13860945551 > > (gentoo, LSI repro) > mpt2sas driver barfs when force removing a drive on 3.13.1 > http://marc.info/?t=13912235131 > > Dan Williams had a few other suggestions for cleanup in this area, but > those could be handled in a subsequent patch. This one missed 3.14 > inclusion and the warning still occurs in Linus's tree as of today > (post scsi/block merge) hence the repost. Ping? Just reproduced on 3.15-rc3. Also found this: (~500 repro count on Fedora Problem Tracker) http://retrace.fedoraproject.org/faf/problems/1623421/ -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[] [] kobject_put+0x11/0x70
On Mon, 14 Apr 2014 04:30:15 -0700 Christoph Hellwig wrote: > On Sat, Apr 12, 2014 at 01:34:31PM +0200, Sander Eikelenboom wrote: > > Hi, > > > > I just ran into the oops belowafter some uptime. > > Classic use after free introduced by my recent changes, sorry. > > This should fix it: > > --- > From: Christoph Hellwig > Subject: scsi: don't reference freed command in scsi_init_sgtable > > When scsi_init_io fails we have to release our device reference, but > we do this trying to reference the just freed command. Add a local > scsi_device pointer to fix this. > > Reported-by: Sander Eikelenboom > Signed-off-by: Christoph Hellwig > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 65a123d..54eff6a 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1044,6 +1044,7 @@ static int scsi_init_sgtable(struct request *req, > struct scsi_data_buffer *sdb, > */ > int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask) > { > + struct scsi_device *sdev = cmd->device; > struct request *rq = cmd->request; > > int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask); > @@ -1091,7 +1092,7 @@ err_exit: > scsi_release_buffers(cmd); > cmd->request->special = NULL; > scsi_put_command(cmd); > - put_device(&cmd->device->sdev_gendev); > + put_device(&sdev->sdev_gendev); > return error; > } > EXPORT_SYMBOL(scsi_init_io); Hi Christoph, I hit a similar crash last week on a franken-kernel here (3.14 + scsi misc + qlogic patches + out of tree drivers + terriblethingsIknow). I think there is one other similar use-after-free that's been in place for a while now: int scsi_prep_return(struct request_queue *q, struct request *req, int ret) { struct scsi_device *sdev = q->queuedata; switch (ret) { case BLKPREP_KILL: req->errors = DID_NO_CONNECT << 16; /* release the command and kill it */ if (req->special) { struct scsi_cmnd *cmd = req->special; scsi_release_buffers(cmd); scsi_put_command(cmd);<< put_device(&cmd->device->sdev_gendev);<< req->special = NULL; } break; ... and the backtrace looked like: general protection fault: [#1] SMP Modules linked in: ccmod(POF) ftmod(OF) ipmi_devintf ipmi_msghandler bonding sg x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ixgbe(OF) aesni_intel igb(OF) ptp lrw pps_core gf128mul glue_helper i2c_algo_bit mdio ablk_helper cryptd pcspkr dca ntb i2c_core matroxfb(OF) videosw(OF) nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c dm_service_time sd_mod(OF) crc_t10dif crct10dif_common qla2xxx(OF) scsi_transport_fc mpt2sas(OF) raid_class scsi_tgt scsi_transport_sas dm_mirror dm_region_hash dm_log dm_multipath dm_mod CPU: 8 PID: 23976 Comm: systemd-udevd Tainted: PF W O 3.14.0+ #2 Hardware name: Stratus ftServer 6400/G7LAY, BIOS BIOS Version 6.3:57 12/25/2013 task: 880420b3d7c0 ti: 880729138000 task.ti: 880729138000 RIP: 0010:[] [] kobject_put+0xd/0x60 RSP: 0018:880729139a80 EFLAGS: 00010002 RAX: RBX: 6b6b6b6b6b6b6ce3 RCX: 0001002e0003 RDX: 002e RSI: ea0021370400 RDI: 6b6b6b6b6b6b6ce3 RBP: 880729139a88 R08: 88084dc16300 R09: 0001002e0002 R10: 88104f603a80 R11: ea0021370400 R12: 88084dc16300 R13: 0001 R14: 881026935388 R15: 880ff56b3a18 FS: 7f2eed940880() GS:88085fd0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f2eed0e2c20 CR3: 00084d6c8000 CR4: 000407e0 Stack: 88076248d0a8 880729139a98 813e57d7 880729139ac0 8141bf3e 00018141be66 88076248d0a8 88104def6840 880729139b20 a00a46e0 88104def6840 88076248d0a8 Call Trace: [] put_device+0x17/0x20 [] scsi_prep_return+0x9e/0xc0 [] sd_prep_fn+0x70/0xcd0 [sd_mod] [] blk_peek_request+0x12f/0x250 [] scsi_request_fn+0x48/0x570 [] __blk_run_queue+0x33/0x40 [] queue_unplugged+0x2a/0xa0 [] blk_flush_plug_list+0x1d8/0x230 [] blk_finish_plug+0x14/0x40 [] __do_page_cache_readahead+0x209/0x290 [] force_page_cache_readahead+0x6d/0xa0 [] page_cache_sync_readahead+0x43/0x50 [] generic_file_aio_read+0x4f5/0x720 [] blkdev_aio_read+0x4b/0x70 [] do_sync_read+0x67/0xa0 [] vfs_read+0x9b/0x160 [] SyS_read+0x55/0xd0 [] system_call_fastpath+0x16/0x1b Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND] scsi_transport_sas: move bsg destructor into sas_rphy_remove
The recent change in sysfs, bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 "sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric rphy device creation/deletion sequence in scsi_transport_sas: modprobe mpt2sas sas_rphy_add device_add A rphy->dev device_add B sas_device transport class device_add C sas_end_device transport class device_add D bsg class rmmod mpt2sas sas_rphy_delete sas_rphy_remove device_del B device_del C device_del A sysfs_remove_group recursive sysfs dir removal sas_rphy_free device_del D warning where device A is the parent of B, C, and D. When sas_rphy_free tries to unregister the bsg request queue (device D above), the ensuing sysfs cleanup discovers that its sysfs group has already been removed and emits a warning, "sysfs group... not found for kobject 'end_device-X:0'". Since bsg creation is a side effect of sas_rphy_add, move its complementary removal call into sas_rphy_remove. This imposes the following tear-down order for the devices above: D, B, C, A. Note the sas_device and sas_end_device transport class devices (B and C above) are created and destroyed both via the list match traversal in attribute_container_device_trigger, so the order in which they are handled is fixed. This is fine as long as they are deleted before their parent device. Signed-off-by: Joe Lawrence Cc: "James E.J. Bottomley" Cc: Dan Williams Cc: "Reddy, Sreekanth" --- drivers/scsi/scsi_transport_sas.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 1b68142..c341f85 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -1621,8 +1621,6 @@ void sas_rphy_free(struct sas_rphy *rphy) list_del(&rphy->list); mutex_unlock(&sas_host->lock); - sas_bsg_remove(shost, rphy); - transport_destroy_device(dev); put_device(dev); @@ -1681,6 +1679,7 @@ sas_rphy_remove(struct sas_rphy *rphy) } sas_rphy_unlink(rphy); + sas_bsg_remove(NULL, rphy); transport_remove_device(dev); device_del(dev); } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND] scsi_transport_sas: move bsg destructor into sas_rphy_remove
On Wed, 2 Apr 2014 16:40:41 -0400 Joe Lawrence wrote: > Since bsg creation is a side effect of sas_rphy_add, move its > complementary removal call into sas_rphy_remove. Hello James, This a resend of a patch posted to quiet an end-device sysfs warning in its device deletion path when removing the mpt2sas driver: (initial report) sysfs group not found for kobject on mpt2sas unload http://marc.info/?t=13849746004 (patch + ACK + comments) http://marc.info/?t=13860945551 (gentoo, LSI repro) mpt2sas driver barfs when force removing a drive on 3.13.1 http://marc.info/?t=13912235131 Dan Williams had a few other suggestions for cleanup in this area, but those could be handled in a subsequent patch. This one missed 3.14 inclusion and the warning still occurs in Linus's tree as of today (post scsi/block merge) hence the repost. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_sas: move bsg destructor into sas_rphy_remove
On Wed, 12 Feb 2014 11:13:22 -0800 Dan Williams wrote: > > Acked-by: Dan Williams > > I think this is the right move, and some follow-up cleanups / fixes > come to mind: Dan -- Thanks for taking a look. A few comments below. > 1/ Why is sas_bsg_remove() multiplexing 2 use cases? Let's just have > a sas_host_bsg_remove() and a sas_device_bsg_remove(). Splitting the two use cases reduces them into a simple conditional call to bsg_unregister_queue(), that could just be done in place. On the other hand, sas_bsg_initialize() looks like it handles both cases as well. > 2/ I think sas_rphy_free() mishandles the reference counts in the > sas_rphy_add() failing case. > sas_rphy_free() does: > transport_destroy_device(dev); /* which is just put_device(dev) */ > put_device(dev); > > ...but if sas_rphy_add() failed we never took the reference from > transport_add_device() and I expect we only have the one reference > from device_initialize() to drop. > > We also ignore the return code from transport_add_device() which is > another hole. Ugh, I think fixing this properly would require piping the return code of the trigger function back out of the attribute container code: void transport_add_device() void attribute_container_device_trigger() loop through the attribute_container_list if match int transport_add_class_device() ... return error ... Not to mention several callers of sas_rphy_add keep on truckin' when it does return failure. Perhaps this can be handled in another way? > Commenting on this mail should refresh this in James' queue. But you > may want to resend anyways given this is a few months old. Whatever is easier for James. Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mpt2sas driver barfs when force removing a drive on 3.13.1
Hi Sreekanth, There hasn't been much recent activity in drivers/scsi/scsi_transport_sas.c, so I wasn't sure who could best review the patch. Dan Williams added sas_rphy_unlink() back in 2011, perhaps he can remember why sas_bsg_remove wasn't performed there? Regards, -- Joe On Wed, 12 Feb 2014 11:43:58 + "Reddy, Sreekanth" wrote: > Hi Joe, > > May I known the status of this patch having subject > "[PATCH] scsi_transport_sas: move bsg destructor into sas_rphy_remove". > > I need the status of this patch as I am also observing the same WARNING call > trace on latest 3.14.0-rc2+ kernel whenever we unload the mpt2sas/mpt3sas > driver or whenever we unplug the drive attached to the HBA. > But after applying yours patch, these WARNING call trace is not observed. > > Regards, > Sreekanth > > >-Original Message- > >From: Joe Lawrence [mailto:joe.lawre...@stratus.com] > >Sent: Monday, February 03, 2014 7:54 PM > >To: prometheanf...@gentoo.org > >Cc: Nandigama, Nagalakshmi; Reddy, Sreekanth; Support; DL-MPT Fusion > >Linux; linux-scsi@vger.kernel.org; jbottom...@parallels.com; linux- > >ker...@vger.kernel.org; ker...@gentoo.org > >Subject: Re: mpt2sas driver barfs when force removing a drive on 3.13.1 > > > >On Fri, 31 Jan 2014 20:49:39 -0600 > >Matthew Thode wrote: > > > >> I decided to pull a drive while it was in use out of laziness (it was > >> open via luks, but not in actual use). Got a fun trace as a result. > >> Just thought you'd like to know :D > >> > > > >Hi Matthew, > > > >The first trace looks a lot like what I see on mpt2sas driver removal [1]. I > >posted a suggested fix back in Dec [2], which you might try, however it is > >not > >reviewed at this point. > > > >[1] http://thread.gmane.org/gmane.linux.scsi/86237 > >[2] https://github.com/joe- > >lawrence/linux/compare/scsi_transport_sas_sysfs_warning.patch > > > >Regards, > > > >-- Joe > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mpt2sas driver barfs when force removing a drive on 3.13.1
On Fri, 31 Jan 2014 20:49:39 -0600 Matthew Thode wrote: > I decided to pull a drive while it was in use out of laziness (it was > open via luks, but not in actual use). Got a fun trace as a result. > Just thought you'd like to know :D > Hi Matthew, The first trace looks a lot like what I see on mpt2sas driver removal [1]. I posted a suggested fix back in Dec [2], which you might try, however it is not reviewed at this point. [1] http://thread.gmane.org/gmane.linux.scsi/86237 [2] https://github.com/joe-lawrence/linux/compare/scsi_transport_sas_sysfs_warning.patch Regards, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_transport_sas: move bsg destructor into sas_rphy_remove
The recent change in sysfs, bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 "sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric rphy device creation/deletion sequence in scsi_transport_sas: modprobe mpt2sas sas_rphy_add device_add A rphy->dev device_add B sas_device transport class device_add C sas_end_device transport class device_add D bsg class rmmod mpt2sas sas_rphy_delete sas_rphy_remove device_del B device_del C device_del A sysfs_remove_group recursive sysfs dir removal sas_rphy_free device_del D warning where device A is the parent of B, C, and D. When sas_rphy_free tries to unregister the bsg request queue (device D above), the ensuing sysfs cleanup discovers that its sysfs group has already been removed and emits a warning, "sysfs group... not found for kobject 'end_device-X:0'". Since bsg creation is a side effect of sas_rphy_add, move its complementary removal call into sas_rphy_remove. This imposes the following tear-down order for the devices above: D, B, C, A. Note the sas_device and sas_end_device transport class devices (B and C above) are created and destroyed both via the list match traversal in attribute_container_device_trigger, so the order in which they are handled is fixed. This is fine as long as they are deleted before their parent device. Signed-off-by: Joe Lawrence Cc: "James E.J. Bottomley" --- drivers/scsi/scsi_transport_sas.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 1b681427dde0..c341f855fadc 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -1621,8 +1621,6 @@ void sas_rphy_free(struct sas_rphy *rphy) list_del(&rphy->list); mutex_unlock(&sas_host->lock); - sas_bsg_remove(shost, rphy); - transport_destroy_device(dev); put_device(dev); @@ -1681,6 +1679,7 @@ sas_rphy_remove(struct sas_rphy *rphy) } sas_rphy_unlink(rphy); + sas_bsg_remove(NULL, rphy); transport_remove_device(dev); device_del(dev); } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sysfs group not found for kobject on mpt2sas unload
On Mon, 25 Nov 2013 11:23:39 -0500 Joe Lawrence wrote: > On Wed, 20 Nov 2013 14:08:40 -0500 > Joe Lawrence wrote: > > > Starting in 3.12, when loading and unloading the mpt2sas driver, I see > > the following warning: > > > > [ cut here ] > > WARNING: CPU: 20 PID: 19096 at fs/sysfs/group.c:214 > > sysfs_remove_group+0xc6/0xd0() > > sysfs group 81ca2f40 not found for kobject 'end_device-30:0' > > Modules linked in: mpt2sas(-) > > stap_edcc1781e2697fc53c3d320bc2530218_19063(OF) ebtable_nat osst > > nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat > > nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 > > iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 > > nf_defrag_ipv4 xt_conntrack nf_conntrack bonding ebtable_filter ebtables > > ip6table_filter ip6_tables ixgbe igb x86_pkg_temp_thermal ptp coretemp > > pps_core joydev mdio crc32_pclmul crc32c_intel raid_class pcspkr > > ghash_clmulni_intel scsi_transport_sas ipmi_si dca ipmi_msghandler ntb > > uinput dm_round_robin sd_mod qla2xxx syscopyarea sysfillrect sysimgblt > > i2c_algo_bit drm_kms_helper ttm drm scsi_transport_fc scsi_tgt usb_storage > > i2c_core dm_multipath [last unloaded: > > stap_2da929a187c82c607a23237c27bf2d06_18803] > > CPU: 20 PID: 19096 Comm: rmmod Tainted: GF W O 3.12.0+ #4 > > Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.2:52 > > 04/09/2013 > > 0009 88083ad3bb88 8165265e 88083ad3bbd0 > > 88083ad3bbc0 8105514d 81ca2f40 > > 880851f46ef8 88007a73cf38 88083fb8c6e8 88083ad3bc20 > > Call Trace: > > [] dump_stack+0x4d/0x66 > > [] warn_slowpath_common+0x7d/0xa0 > > [] warn_slowpath_fmt+0x4c/0x50 > > [] ? sysfs_get_dirent_ns+0x4e/0x70 > > [] sysfs_remove_group+0xc6/0xd0 > > [] dpm_sysfs_remove+0x43/0x50 > > [] device_del+0x45/0x1c0 > > [] device_unregister+0x1e/0x60 > > [] bsg_unregister_queue+0x5e/0xa0 > > [] sas_rphy_free+0x7a/0xb0 [scsi_transport_sas] > > [] sas_port_delete+0x35/0x160 [scsi_transport_sas] > > [] ? sysfs_remove_link+0x23/0x30 > > [] mpt2sas_transport_port_remove+0x19a/0x1e0 [mpt2sas] > > [] _scsih_remove_device+0xb0/0x100 [mpt2sas] > > [] > > mpt2sas_device_remove_by_sas_address.part.54+0x59/0x80 [mpt2sas] > > [] _scsih_remove+0xf9/0x210 [mpt2sas] > > [] pci_device_remove+0x3b/0xb0 > > [] __device_release_driver+0x7f/0xf0 > > [] driver_detach+0xc0/0xd0 > > [] bus_remove_driver+0x55/0xd0 > > [] driver_unregister+0x2c/0x50 > > [] pci_unregister_driver+0x23/0x80 > > [] _scsih_exit+0x25/0x912 [mpt2sas] > > [] SyS_delete_module+0x16d/0x2d0 > > [] ? do_page_fault+0xe/0x10 > > [] system_call_fastpath+0x16/0x1b > > ---[ end trace b4eef98870c871fd ]--- > > > > Instrumenting the module loading/unloading cycle with systemtap, it > > reports the following sequence of events: > > > > modprobe > > device_add(A) > > device_add(B child of A) > > device_add(C child of A) > > device_add(D child of A) > > > > rmmod > > device_del(B child of A) > > device_del(C child of A) > > device_del(A) > > device_del(D child of A) << WARNING > > > > The same sequence of device_add/del events occur in 3.11, but without > > the warning. Git bisect shows bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 > > "sysfs: make __sysfs_remove_dir() recursive" as the first bad commit. > > > > FWIW, I applied Mika's patch that Tejun posted the other day, "sysfs: > handle duplicate removal attempts in sysfs_remove_group()" [1] and the > warning goes away on mpt2sas unload. > > [1] > http://git.kernel.org/cgit/linux/kernel/git/tj/misc.git/commit/?h=review-sysfs-fixes&id=a69cc96d8c434c6cb64847f37caa890af705fc5c > > -- Joe Hi James, (Correction, the sysfs change introducing the warning is 3.13-rc1+, not 3.12.) Also, it seems that in 3.13-rc2, the patch from Mika that I had referenced was reverted and that bug handled separately. 81440e7 Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()" 54d7114 sysfs: handle duplicate removal attempts in sysfs_remove_group() So in 3.13-rc2, the warning persists when loading and unloading the mpt2sas driver. I looked at the code again, and I'm not sure why the sas_bsg_remove call was placed in sas_rphy_free and not sas_rphy_remove. The sas_rphy_remove function is tasked with undoing the actions of
Re: sysfs group not found for kobject on mpt2sas unload
On Wed, 20 Nov 2013 14:08:40 -0500 Joe Lawrence wrote: > Starting in 3.12, when loading and unloading the mpt2sas driver, I see > the following warning: > > [ cut here ] > WARNING: CPU: 20 PID: 19096 at fs/sysfs/group.c:214 > sysfs_remove_group+0xc6/0xd0() > sysfs group 81ca2f40 not found for kobject 'end_device-30:0' > Modules linked in: mpt2sas(-) stap_edcc1781e2697fc53c3d320bc2530218_19063(OF) > ebtable_nat osst nf_conntrack_netbios_ns nf_conntrack_broadcast > ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT > nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat > iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack > bonding ebtable_filter ebtables ip6table_filter ip6_tables ixgbe igb > x86_pkg_temp_thermal ptp coretemp pps_core joydev mdio crc32_pclmul > crc32c_intel raid_class pcspkr ghash_clmulni_intel scsi_transport_sas ipmi_si > dca ipmi_msghandler ntb uinput dm_round_robin sd_mod qla2xxx syscopyarea > sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm scsi_transport_fc > scsi_tgt usb_storage i2c_core dm_multipath [last unloaded: > stap_2da929a187c82c607a23237c27bf2d06_18803] > CPU: 20 PID: 19096 Comm: rmmod Tainted: GF W O 3.12.0+ #4 > Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.2:52 > 04/09/2013 > 0009 88083ad3bb88 8165265e 88083ad3bbd0 > 88083ad3bbc0 8105514d 81ca2f40 > 880851f46ef8 88007a73cf38 88083fb8c6e8 88083ad3bc20 > Call Trace: > [] dump_stack+0x4d/0x66 > [] warn_slowpath_common+0x7d/0xa0 > [] warn_slowpath_fmt+0x4c/0x50 > [] ? sysfs_get_dirent_ns+0x4e/0x70 > [] sysfs_remove_group+0xc6/0xd0 > [] dpm_sysfs_remove+0x43/0x50 > [] device_del+0x45/0x1c0 > [] device_unregister+0x1e/0x60 > [] bsg_unregister_queue+0x5e/0xa0 > [] sas_rphy_free+0x7a/0xb0 [scsi_transport_sas] > [] sas_port_delete+0x35/0x160 [scsi_transport_sas] > [] ? sysfs_remove_link+0x23/0x30 > [] mpt2sas_transport_port_remove+0x19a/0x1e0 [mpt2sas] > [] _scsih_remove_device+0xb0/0x100 [mpt2sas] > [] mpt2sas_device_remove_by_sas_address.part.54+0x59/0x80 > [mpt2sas] > [] _scsih_remove+0xf9/0x210 [mpt2sas] > [] pci_device_remove+0x3b/0xb0 > [] __device_release_driver+0x7f/0xf0 > [] driver_detach+0xc0/0xd0 > [] bus_remove_driver+0x55/0xd0 > [] driver_unregister+0x2c/0x50 > [] pci_unregister_driver+0x23/0x80 > [] _scsih_exit+0x25/0x912 [mpt2sas] > [] SyS_delete_module+0x16d/0x2d0 > [] ? do_page_fault+0xe/0x10 > [] system_call_fastpath+0x16/0x1b > ---[ end trace b4eef98870c871fd ]--- > > Instrumenting the module loading/unloading cycle with systemtap, it > reports the following sequence of events: > > modprobe > device_add(A) > device_add(B child of A) > device_add(C child of A) > device_add(D child of A) > > rmmod > device_del(B child of A) > device_del(C child of A) > device_del(A) > device_del(D child of A) << WARNING > > The same sequence of device_add/del events occur in 3.11, but without > the warning. Git bisect shows bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 > "sysfs: make __sysfs_remove_dir() recursive" as the first bad commit. > FWIW, I applied Mika's patch that Tejun posted the other day, "sysfs: handle duplicate removal attempts in sysfs_remove_group()" [1] and the warning goes away on mpt2sas unload. [1] http://git.kernel.org/cgit/linux/kernel/git/tj/misc.git/commit/?h=review-sysfs-fixes&id=a69cc96d8c434c6cb64847f37caa890af705fc5c -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
sysfs group not found for kobject on mpt2sas unload
Starting in 3.12, when loading and unloading the mpt2sas driver, I see the following warning: [ cut here ] WARNING: CPU: 20 PID: 19096 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0() sysfs group 81ca2f40 not found for kobject 'end_device-30:0' Modules linked in: mpt2sas(-) stap_edcc1781e2697fc53c3d320bc2530218_19063(OF) ebtable_nat osst nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bonding ebtable_filter ebtables ip6table_filter ip6_tables ixgbe igb x86_pkg_temp_thermal ptp coretemp pps_core joydev mdio crc32_pclmul crc32c_intel raid_class pcspkr ghash_clmulni_intel scsi_transport_sas ipmi_si dca ipmi_msghandler ntb uinput dm_round_robin sd_mod qla2xxx syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm scsi_transport_fc scsi_tgt usb_storage i2c_core dm_multipath [last unloaded: stap_2da929a187c82c607a23237c27bf2d06_18803] CPU: 20 PID: 19096 Comm: rmmod Tainted: GF W O 3.12.0+ #4 Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.2:52 04/09/2013 0009 88083ad3bb88 8165265e 88083ad3bbd0 88083ad3bbc0 8105514d 81ca2f40 880851f46ef8 88007a73cf38 88083fb8c6e8 88083ad3bc20 Call Trace: [] dump_stack+0x4d/0x66 [] warn_slowpath_common+0x7d/0xa0 [] warn_slowpath_fmt+0x4c/0x50 [] ? sysfs_get_dirent_ns+0x4e/0x70 [] sysfs_remove_group+0xc6/0xd0 [] dpm_sysfs_remove+0x43/0x50 [] device_del+0x45/0x1c0 [] device_unregister+0x1e/0x60 [] bsg_unregister_queue+0x5e/0xa0 [] sas_rphy_free+0x7a/0xb0 [scsi_transport_sas] [] sas_port_delete+0x35/0x160 [scsi_transport_sas] [] ? sysfs_remove_link+0x23/0x30 [] mpt2sas_transport_port_remove+0x19a/0x1e0 [mpt2sas] [] _scsih_remove_device+0xb0/0x100 [mpt2sas] [] mpt2sas_device_remove_by_sas_address.part.54+0x59/0x80 [mpt2sas] [] _scsih_remove+0xf9/0x210 [mpt2sas] [] pci_device_remove+0x3b/0xb0 [] __device_release_driver+0x7f/0xf0 [] driver_detach+0xc0/0xd0 [] bus_remove_driver+0x55/0xd0 [] driver_unregister+0x2c/0x50 [] pci_unregister_driver+0x23/0x80 [] _scsih_exit+0x25/0x912 [mpt2sas] [] SyS_delete_module+0x16d/0x2d0 [] ? do_page_fault+0xe/0x10 [] system_call_fastpath+0x16/0x1b ---[ end trace b4eef98870c871fd ]--- Instrumenting the module loading/unloading cycle with systemtap, it reports the following sequence of events: modprobe device_add(A) device_add(B child of A) device_add(C child of A) device_add(D child of A) rmmod device_del(B child of A) device_del(C child of A) device_del(A) device_del(D child of A) << WARNING The same sequence of device_add/del events occur in 3.11, but without the warning. Git bisect shows bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309 "sysfs: make __sysfs_remove_dir() recursive" as the first bad commit. The systemtap script and its output (from mpt2sas calls only) below: % stap -d scsi_transport_sas -d mpt2sas -e 'probe kernel.function("device_add"),kernel.function("device_del") { printf("%s dev=%p\n%s", pp(), $dev, $$parms$$); print_backtrace(); printf("\n"); } probe begin { printf("loaded\n") }' % modprobe mpt2sas kernel.function("device_add@drivers/base/core.c:976") dev=0x88007a73cf38 dev={.parent=0x88007a73bf60, .p=0x0, .kobj={.name="end_device-30:0", .entry={.next=0x88007a73cf50, .prev=0x88007a73cf50}, .parent=0x0, .kset=0x88085f878848, .ktype=0x81ca2300, .sd=0x0, .kref={.refcount={.counter=3}}, .state_initialized=1, .state_in_sysfs=0, .state_add_uevent_sent=0, .state_remove_uevent_sent=0, .uevent_suppress=0}, .init_name="", .type=0x0, .mutex={.count={.counter=1}, .wait_lock={={.rlock={.raw_lock={={.head_tail=0, .tickets={.head='\000', .t 0x813e4760 : device_add+0x0/0x640 [kernel] 0xa00dcd54 : sas_rphy_add+0x64/0x170 [scsi_transport_sas] 0xa0235c73 : mpt2sas_transport_port_add+0x273/0xa90 [mpt2sas] 0xa022aace : _scsih_scan_finished+0x20e/0x2c0 [mpt2sas] 0x8141dd77 : do_scsi_scan_host+0x77/0xa0 [kernel] 0x8141df6c : do_scan_async+0x1c/0x150 [kernel] 0x8107d829 : async_run_entry_fn+0x39/0x120 [kernel] 0x81070507 : process_one_work+0x177/0x460 [kernel] 0x810711eb : worker_thread+0x11b/0x3a0 [kernel] 0x81077822 : kthread+0xd2/0xf0 [kernel] 0x81661d2c : ret_from_fork+0x7c/0xb0 [kernel] kernel.function("device_add@drivers/base/core.c:976") dev=0x88007a73b4f8 dev={.parent=0x88007a73cf38, .p=0x0, .kobj={.name="end_device-30:0", .entry={.next=0x88007a73b510, .prev=0x88007a73b510}, .parent=0x0, .kset=0x88085f878848, .ktype=0x81ca2300, .sd=0x0, .kref={.refcount={.counter=2}}, .state_initialized=1, .state_in_sysfs=0, .state_add_uevent_sent=0, .state_remove_uevent_sen
Re: PATCH: st.c: Fix blk_get_queue usage
On Fri, 15 Nov 2013 12:23:06 +0100 Bodo Stroesser wrote: > On 14.11.2013, at 22.50, Joe Lawrence wrote: > > > On Thu, 14 Nov 2013 20:22:40 +0100 > > Bodo Stroesser wrote: > > > > > On 14.11.2013, at 20.05, Kai M??kisara (Kolumbus) > > > wrote: > > > > > > > > > > > On 14.11.2013, at 16.48, Bodo Stroesser > > > > wrote: > > > > > > < snip > > > > > > > > > > With this patch, blk_get_queue() is not called with the correct > > > > argument. > > > > Maybe change the call to blk_get_queue(SDp->request_queue) ? > > > > > > > > > if (blk_get_queue(disk->queue)) > > > > > > Yes, thank you. You are obviously right. Below is the revised patch. > > > Sorry for the mistake. > > > > > > Bodo > > > > > > > > goto out_put_disk; > > > > > + disk->queue = SDp->request_queue; > > > > > tpnt->driver = &st_template; > > > > > > > > > > tpnt->device = SDp-- > > > > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > > > > > in > > > > > the body of a message to majord...@vger.kernel.org > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > ; > > > > > > > > Thanks, > > > > Kai > > > > > > > > > > From: Bodo Stroesser > > > Date: Thu, 14 Nov 2013 14:35:00 +0100 > > > Subject: [PATCH] sg: fix blk_get_queue usage > > > > > > If blk_queue_get() in st_probe fails, disk->queue must not > > > be set to SDp->request_queue, as that would result in > > > put_disk() dropping a not taken reference. > > > > > > Thus, disk->queue should be set only after a successful > > > blk_queue_get(). > > > Revised patch due to a hint from Kai Makisara. > > > > > > Signed-off-by: Bodo Stroesser > > > > > > --- > > > > > > --- a/drivers/scsi/st.c 2013-11-14 14:10:40.0 +0100 > > > +++ b/drivers/scsi/st.c 2013-11-14 14:10:57.0 +0100 > > > @@ -4107,11 +4107,11 @@ > > > kref_init(&tpnt->kref); > > > tpnt->disk = disk; > > > disk->private_data = &tpnt->driver; > > > - disk->queue = SDp->request_queue; > > > /* SCSI tape doesn't register this gendisk via add_disk(). Manually > > >* take queue reference that release_disk() expects. */ > > > - if (blk_get_queue(disk->queue)) > > > + if (blk_get_queue(SDp->request_queue)) > > > goto out_put_disk; > > > + disk->queue = SDp->request_queue; > > > tpnt->driver = &st_template; > > > > > > tpnt->device = SDp; > > > ??{.n?+???+%??lzwm??b??r??zX?? ?(?? > > > }?z?&j:+v???zZ+??+zf???h???~i???z? ?w?&?)?? f > > > > Hi Bodo, > > > > Minor nit, wasn't blk_get_queue modified to return false on failure? > > > > 09ac46c42946 "block: misc updates to blk_get_queue()" > > > > Just curious what tree this patch was tested against. > > > > Thanks, > > > > -- Joe > > > > Hi Joe, > > we are intensively using SuSE SLES11 (currently kernel 3.0.80-x). At the > moment > I'm not involved with mainline. > > The patch resulted from the hunt for another problem and was accepted by > SuSE. They > told me that this would be relevant for mainline also. So I posted it but > didn't > check mainline for changes :-( > > Thus, here is the third revision of the patch that now is suited for mainline. > Sorry for the noise. > > Thank you. > > Bodo > > --- > > From: Bodo Stroesser > Date: Thu, 14 Nov 2013 14:35:00 +0100 > Subject: [PATCH] sg: fix blk_get_queue usage > > If blk_queue_get() in st_probe fails, disk->queue must not > be set to SDp->request_queue, as that would result in > put_disk() dropping a not taken reference. > > Thus, disk->queue should be set only after a successful > blk_queue_get(). > > 2nd revised patch due to hints from Kai Makisara and Joe Lawrence. > > Now this should be suited for mainline. > > Signed-off-by: Bodo Stroesser > > --- > > --- a/drivers/scsi/st.c 2013-11-15 11:58:39.
Re: PATCH: st.c: Fix blk_get_queue usage
On Thu, 14 Nov 2013 20:22:40 +0100 Bodo Stroesser wrote: > On 14.11.2013, at 20.05, Kai M??kisara (Kolumbus) > wrote: > > > > > On 14.11.2013, at 16.48, Bodo Stroesser wrote: > > > > > Hi, > > > > > > in st_probe(), st.c I stumbled across what I'd call a minor problem. > > > > > > So I'd like to suggest the following patch. > > > > > > Best Regards, > > > Bodo > > > > > > P.S.: Please CC me, I'm not on the list. > > > > > > - > > > > > > > > > From: Bodo Stroesser > > > Date: Thu, 14 Nov 2013 14:35:00 +0100 > > > Subject: [PATCH] sg: fix blk_get_queue usage > > > > > > If blk_queue_get() in st_probe fails, disk->queue must not > > > be set to SDp->request_queue, as that would result in > > > put_disk() dropping a not taken reference. > > > > > > Thus, disk->queue should be set only after a successful > > > blk_queue_get(). > > > > > > Signed-off-by: Bodo Stroesser > > > > > > --- > > > > > > --- a/drivers/scsi/st.c 2013-11-14 14:10:40.0 +0100 > > > +++ b/drivers/scsi/st.c 2013-11-14 14:10:57.0 +0100 > > > @@ -4107,11 +4107,11 @@ > > > kref_init(&tpnt->kref); > > > tpnt->disk = disk; > > > disk->private_data = &tpnt->driver; > > > - disk->queue = SDp->request_queue; > > > /* SCSI tape doesn't register this gendisk via add_disk(). Manually > > >* take queue reference that release_disk() expects. */ > > > > With this patch, blk_get_queue() is not called with the correct argument. > > Maybe change the call to blk_get_queue(SDp->request_queue) ? > > > > > if (blk_get_queue(disk->queue)) > > Yes, thank you. You are obviously right. Below is the revised patch. > Sorry for the mistake. > > Bodo > > > > goto out_put_disk; > > > + disk->queue = SDp->request_queue; > > > tpnt->driver = &st_template; > > > > > > tpnt->device = SDp-- > > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > > > the body of a message to majord...@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ; > > > > Thanks, > > Kai > > > > From: Bodo Stroesser > Date: Thu, 14 Nov 2013 14:35:00 +0100 > Subject: [PATCH] sg: fix blk_get_queue usage > > If blk_queue_get() in st_probe fails, disk->queue must not > be set to SDp->request_queue, as that would result in > put_disk() dropping a not taken reference. > > Thus, disk->queue should be set only after a successful > blk_queue_get(). > Revised patch due to a hint from Kai Makisara. > > Signed-off-by: Bodo Stroesser > > --- > > --- a/drivers/scsi/st.c 2013-11-14 14:10:40.0 +0100 > +++ b/drivers/scsi/st.c 2013-11-14 14:10:57.0 +0100 > @@ -4107,11 +4107,11 @@ > kref_init(&tpnt->kref); > tpnt->disk = disk; > disk->private_data = &tpnt->driver; > - disk->queue = SDp->request_queue; > /* SCSI tape doesn't register this gendisk via add_disk(). Manually >* take queue reference that release_disk() expects. */ > - if (blk_get_queue(disk->queue)) > + if (blk_get_queue(SDp->request_queue)) > goto out_put_disk; > + disk->queue = SDp->request_queue; > tpnt->driver = &st_template; > > tpnt->device = SDp; > ??{.n?+???+%??lzwm??b??r??zX???(??}?z?&j:+v???zZ+??+zf???h???~i???z??w?&?)??f Hi Bodo, Minor nit, wasn't blk_get_queue modified to return false on failure? 09ac46c42946 "block: misc updates to blk_get_queue()" Just curious what tree this patch was tested against. Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html