Re: General protection fault with use_blk_mq=1.
On 03/29/2018 02:12 AM, Zephaniah E. Loss-Cutler-Hull wrote: > On 03/28/2018 10:13 PM, Paolo Valente wrote: >> In addition, the outcome of your attempt without >> CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information: >> - if no failure occurs, then the issue is likely to be confined in >> that debugging code (which, on the bright side, is likely to be of >> occasional interest, for only a handful of developers) >> - if the issue still shows up, then we may have new hints on this odd >> failure >> >> Finally, consider that this issue has been reported to disappear from >> 4.16 [1], and, as a plus, that the service quality of BFQ had a >> further boost exactly from 4.16. > > I look forward to that either way then. >> >> Looking forward to your feedback, in case you try BFQ without >> CONFIG_DEBUG_BLK_CGROUP, > > I'm running that now, judging from the past if it survives until > tomorrow evening then we're good, so I should hopefully know in the next > day. Alright, I now have an uptime of over 20 hours, with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1 I did upgrade from 4.15.13 to 4.15.14 in the process, but a quick look at the changes doesn't have anything jump out at me as impacting this. So I'm reasonably comfortable stating that disabling CONFIG_DEBUG_BLK_CGROUP was sufficient to render this stable. Regards, Zephaniah E. Loss-Cutler-Hull.
Re: [PATCH] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
On Wed, 2018-03-28 at 16:43 +0800, Wen Yang wrote: > There would be so many identical lines printed by frequent printk if one > disk went wrong, like, > [ 546.185242] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185258] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185280] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185307] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185334] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185364] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185390] sd 0:1:0:0: rejecting I/O to offline device > [ 546.185410] sd 0:1:0:0: rejecting I/O to offline device > For a slow serial console, the frequent printk may block, and > if a spin_lock was acquired before the printk, like in scsi_request_fn, > that could trigger the watchdog. > > Relative discussion can be found here, Related? Anyway: Reviewed-by: Bart Van Assche
Re: [PATCH v5 7/7] scsi_io_completion convert BUGs to WARNs
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote: > The scsi_io_completion function contains three BUG() and BUG_ON() calls. > Replace them with WARN variants. Reviewed-by: Bart Van Assche
Re: [PATCH v5 6/7] scsi_io_completion hints on fastpath
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote: > - /* no bidi support for !blk_rq_is_passthrough yet */ > + /* no bidi support yet, other than in pass-through */ > > [ ... ] > > - /* Kill remainder if no retries. */ > + /* Kill remainder if no retries. */ If you repost this patch series, please fold these two changes into a previous patch, since they are not related to the branch optimization. Anyway: Reviewed-by: Bart Van Assche
Re: [PATCH v5 5/7] scsi_io_completion_reprep helper added
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote: > Since the action "reprep" is called from two places, rather than repeat > the code, make a new scsi_io_completion helper with "reprep" as its > suffix. Reviewed-by: Bart Van Assche
Re: [PATCH v5 4/7] scsi_io_completion_action helper added
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote: > + /* > + * Unprep the request and put it back at the head of the > + * queue. A new command will be prepared and issued. > + * This block is the same as case ACTION_REPREP in > + * scsi_io_completion_action() above. >*/ > if (q->mq_ops) { > scsi_mq_requeue_cmd(cmd); > @@ -1061,16 +1090,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, > unsigned int good_bytes) > scsi_release_buffers(cmd); > scsi_requeue_command(q, cmd); > } Although I'm not happy about this code duplication: Reviewed-by: Bart Van Assche
Re: [PATCH v5 3/7] scsi_io_completion_nz_result function added
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote: > Break out several intertwined paths when cmd->result is non zero and > place them in the scsi_io_completion_nz_result helper function. The > logic is not changed. Reviewed-by: Bart Van Assche
Re: [PATCH v5 2/7] scsi_io_completion rename variables
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote: > Change and add some variable names, adjust some associated comments > for clarity. Correct some misleading comments. Reviewed-by: Bart Van Assche
Re: [PATCH v3 00/41] cxlflash: OCXL transport support and miscellaneous fixes
> On Mar 28, 2018, at 4:34 PM, Martin K. Petersen > wrote: > > > Uma, > >> This patch series adds OCXL support to the cxlflash driver. With this >> support, new devices using the OCXL transport will be supported by the >> cxlflash driver along with the existing CXL devices. An effort is made >> to keep this transport-specific function independent of the existing >> core driver that communicates with the AFU. >> >> The first three patches contain a minor fix and staging improvements. >> >> This series is intended for 4.17 and is bisectable. > > Something this big really needs to be submitted around rc2/rc3. It's way > too late in the cycle to ensure proper zeroday coverage for all these > commits. > > I have started a 4.18/scsi-queue branch to hold this series for now. > The 4.18 branch will be rebased once 4.17rc1 is out in a few weeks. Your > changes won't show up in for-next until then either. I did send the first version of this series just after 4.16rc2, but the reviews took a while, and the V3 is cutting it close unless we do have an rc8 for 4.16 :) I understand what you are saying and agree with you. Thanks for keeping me posted.
Vmbus series...
Please feel free to put it in via the scsi tree: Acked-by: David S. Miller
Re: [PATCH v2 net-next] qed*: Utilize FW 8.33.11.0
From: Michal Kalderon Date: Wed, 28 Mar 2018 11:42:16 +0300 > This FW contains several fixes and features > > RDMA Features > - SRQ support > - XRC support > - Memory window support > - RDMA low latency queue support > - RDMA bonding support > > RDMA bug fixes > - RDMA remote invalidate during retransmit fix > - iWARP MPA connect interop issue with RTR fix > - iWARP Legacy DPM support > - Fix MPA reject flow > - iWARP error handling > - RQ WQE validation checks > > MISC > - Fix endianness of some HSI types > - New restriction: vlan insertion in core_tx_bd_data can't be set > for LB packets > > ETH > - HW QoS offload support > - Fix vlan, dcb and sriov flow of VF sending a packet with > inband VLAN tag instead of default VLAN > - Allow GRE version 1 offloads in RX flow > - Allow VXLAN steering > > iSCSI / FCoE > - Fix bd availability checking flow > - Support the 256th sge properly in iscsi/fcoe retransmit > - Performance improvement > - Fix handling of iSCSI command arrival with AHS and with immediate > - Fix ipv6 traffic class configuration > > DEBUG > - Update debug utilities > > Signed-off-by: Michal Kalderon > Signed-off-by: Tomer Tayar > Signed-off-by: Manish Rangankar > Signed-off-by: Ariel Elior > --- > Changes from v1 > Remove version bump and qedr module version addition Applied.
Re: Multi-Actuator SAS HDD First Look
On 2018-03-26 11:08 AM, Hannes Reinecke wrote: On Fri, 23 Mar 2018 08:57:12 -0600 Tim Walker wrote:

Seagate announced their split-actuator SAS drive, which will probably require some kernel changes for full support. It is targeted at cloud-provider JBODs and RAID. Here are some of the drive's architectural points. Since the two LUNs share many common components (e.g. the spindle), Seagate allocated some SCSI operations to be LUN-specific and some to affect the entire device, that is, both LUNs.
1. Two LUNs, 0 & 1, each with independent LBA space, and each connected to an independent read channel, actuator, and set of heads.
2. Each actuator addresses 1/2 of the media - no media is shared across the actuators. They seek independently.
3. One World Wide Name (WWN) is assigned to the port for the device address. Each Logical Unit has a separate World Wide Name for identification in the VPD page.
4. 128-deep command queue, shared across both LUNs.
5. Each LUN can pull commands from the queue independently, so they can implement their own sorting and optimization.
6. The Ordered tag attribute causes the command to be ordered across both Logical Units.
7. The Head of Queue attribute causes the command to be ordered with respect to a single Logical Unit.
8. Mode pages are device-based (shared across both Logical Units).
9. Log pages are device-based.
10. Inquiry VPD pages (with minor exceptions) are device-based.
11. Device health features (SMART, etc.) are device-based.
Seagate wants the multi-actuator design to integrate into the stack as painlessly as possible. The interface design is still in the early stages, so I am gathering requirements and recommendations, and also providing any information necessary to help scope integrating a multi-LUN device into the MQ stack. So, I am soliciting any pertinent feedback, including:
1. Painful incompatibilities between the Seagate proposal and the current MQ architecture
2. Linux changes needed
3. Drive interface changes needed
4. Anything else I may have overlooked

So far it looks okay; just make sure to have the VPD page 0x83 entries properly associated. To all intents and purposes these devices seem to look like 'normal' devices with two LUNs; nothing special with that. The real question would be in which areas those devices differ from the two-independent-LUN scenario. There might be issues with per-device information like SMART etc.; ideally it is available from _both_ LUNs. Otherwise it will show up as blank from one LUN, causing consternation with any management software.

Further to this point, some types of damage, such as to a head or (one side of) a platter, would degrade one LU, possibly making it unusable for storage, while the other side (and the other LU) would be fine. I'm curious how you plan to implement the START STOP UNIT command. If one side of the platter is in the "start" state and the other side in the "stop" state, will the heads on the stopped side be parked (if they can be parked)? And if both sides (LUs) are stopped I would hope you really would spin down the disk; then if either is started the disk would be spun up. Getting T10 to add a bit to the Block Device Characteristics VPD page might be helpful. It could be a "shares a spindle" bit, with the other LUs identified in the SCSI Ports VPD page. Such an indication would help an enclosure find out whether a multi-actuator disk was really spun down and ready to be removed or replaced. I think SES and smartmontools may need tweaks to handle this new device model sensibly. Doug Gilbert
Re: [PATCH] scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
Hi Bill, > On Mar 23, 2018, at 7:37 AM, Bill Kuzeja wrote: > > The code that fixes the crashes in the following commit introduced a > small memory leak: > > commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on > probe failure") > > Fixing this requires a bit of reworking, which I've explained. Also provide > some code cleanup. > > Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on > probe failure") > Signed-off-by: Bill Kuzeja > > --- > > There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues > fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes > respectively (the sizes of req and rsp). > > I originally put in checks to test for this condition which were based on > the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were > allocated, then rsp and req were allocated as well. This is incorrect. > There is a window between these allocations: > > ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp); >goto probe_hw_failed; > > [if successful, both rsp and req allocated] > > base_vha = qla2x00_create_host(sht, ha); >goto probe_hw_failed; > > ret = qla2x00_request_irqs(ha, rsp); >goto probe_failed; > > if (qla2x00_alloc_queues(ha, req, rsp)) { >goto probe_failed; > > [if successful, now ha->rsp_q_map and ha->req_q_map allocated] > > To simplify this, we should just set req and rsp to NULL after we free > them. Sounds simple enough? The problem is that req and rsp are pointers > defined in the qla2x00_probe_one and they are not always passed by > reference to the routines that free them. > > Here are paths which can free req and rsp: > > PATH 1: > qla2x00_probe_one > ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp); > [req and rsp are passed by reference, but if this fails, we currently >do not NULL out req and rsp. 
Easily fixed] > > PATH 2: > qla2x00_probe_one > failing in qla2x00_request_irqs or qla2x00_alloc_queues > probe_failed: > qla2x00_free_device(base_vha); >qla2x00_free_req_que(ha, req) >qla2x00_free_rsp_que(ha, rsp) > > PATH 3: > qla2x00_probe_one: > failing in qla2x00_mem_alloc or qla2x00_create_host > probe_hw_failed: > qla2x00_free_req_que(ha, req) > qla2x00_free_rsp_que(ha, rsp) > > PATH 1: This should currently work, but it doesn't because req and rsp > are not set to NULL in qla2x00_mem_alloc. Easily remedied. > > PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are > derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up > if qla2x00_alloc_queues succeeds. > > In qla2x00_free_queues, we are protected from crashing if these don't > exist because req_qid_map and rsp_qid_map are only set on their > allocation. We are guarded in this way: > >for (cnt = 0; cnt < ha->max_req_queues; cnt++) { >if (!test_bit(cnt, ha->req_qid_map)) >continue; > > PATH 3: This works. We haven't freed req or rsp yet (or they were never > allocated if qla2x00_mem_alloc failed), so we'll attempt to free them > here. > > To summarize, there are a few small changes to make this work correctly > (and for some cleanup): > > 1) (For PATH 1) Set *rsp and *req to NULL in case of failure in > qla2x00_mem_alloc so these are correctly set to NULL back in > qla2x00_probe_one > > 2) After jumping to probe_failed: and calling qla2x00_free_device, > explicitly set rsp and req to NULL so further calls with these pointers > do not crash, i.e. the free queue calls in the probe_hw_failed section > we fall through to. > > 3) Fix return code check in the call to qla2x00_alloc_queues. We currently > drop the return code on the floor. The probe fails but the caller of the > probe doesn't have an error code, so it attaches to pci. This can result > in a crash on module shutdown. 
> > 4) Remove unnecessary NULL checks in qla2x00_free_req_que, > qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and > vfrees in qla2x00_mem_free. > > I tested this out running a scenario where the card breaks at various > times during initialization. I made sure I forced every error exit path > in qla2x00_probe_one. > > --- > drivers/scsi/qla2xxx/qla_os.c | 44 +-- > 1 file changed, 21 insertions(+), 23 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 5c5dcca4..f70d7eb 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -471,9 +471,6 @@ static int qla2x00_alloc_queues(struct qla_hw_data *ha, > struct req_que *req, > > static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req) > { > - if (!ha->req_q_map) > - return; > - > if (IS_QLAFX00(ha)) { > if (req && req->ring_fx00) >
Re: [PATCH] qla2xxx: Avoid double completion of abort command
Hi Ben, > On Mar 20, 2018, at 2:05 PM, Ben Hutchings > wrote: > > qla2x00_tmf_sp_done() now deletes the timer that will run > qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer > already expired. Check the return value from del_timer() to avoid > calling complete() a second time. > > Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous > ...") > Fixes: 1514839b3664 ("scsi: qla2xxx: Fix NULL pointer crash due to active > ...") > Signed-off-by: Ben Hutchings > --- > drivers/scsi/qla2xxx/qla_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c > index 00329dda6179..b0aa8cc96f0f 100644 > --- a/drivers/scsi/qla2xxx/qla_init.c > +++ b/drivers/scsi/qla2xxx/qla_init.c > @@ -1546,8 +1546,8 @@ qla24xx_abort_sp_done(void *ptr, int res) > srb_t *sp = ptr; > struct srb_iocb *abt = &sp->u.iocb_cmd; > > - del_timer(&sp->u.iocb_cmd.timer); > - complete(&abt->u.abt.comp); > + if (del_timer(&sp->u.iocb_cmd.timer)) > + complete(&abt->u.abt.comp); > } > > int > -- > 2.15.0.rc0 > Looks good. Acked-by: Himanshu Madhani Thanks, - Himanshu
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Hi Ben, > On Mar 21, 2018, at 10:58 AM, Ben Hutchings > wrote: > > On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote: >> Hi Ben, >> >>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings >>> wrote: >>> >>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: Hi Ben, > On Mar 20, 2018, at 2:36 PM, Ben Hutchings > wrote: > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > which means the timeout function pointer and any data that the > function depends on must be initialised beforehand. > > Move this initialisation before each call to qla2x00_init_timer(). In > some cases qla2x00_init_timer() initialises a completion structure > needed by the timeout function, so move the call to add_timer() after > that. >>> >>> [...] What motivated this patch? are you debugging any crash which was helped by moving code around? >>> >>> I saw the recent fix that added a call to del_timer(), noticed the lack >>> of synchronisation and then kept looking further. >>> What I see from this patch is that its moving iocb.cmd.timeout field before qla2x00_init_timer(), which are completely different from each other. >>> >>> How are they "completely different"? qla2x00_init_timer() starts a >>> timer that will call qla2x00_sp_timeout(), which in turn uses the >>> iocb_cmd.timeout function pointer. We should not assume that any >>> initialisation code after the call to add_timer() will run before the >>> timer expires, since the scheduler might run other tasks for the whole >>> time. It's unlikely but not impossible. >>> >> >> I see. I used “completely different” due to the lack of better wording. >> >> I wanted to get context and understand if there was any issue that would >> have helped with these changes. >> >> your explanation does help and I agree that there is window where timer() >> will >> pop before timeout is initialized. >> >> Let me put this patch in our test cycle for couple days and will respond on >> this one > > Thanks. > > Ben. 
> > -- > Ben Hutchings > Software Developer, Codethink Ltd. Thanks again for the patch. Testing was successful. Acked-by: Himanshu Madhani Thanks, - Himanshu
Re: Recent kernels fail to boot on POWER8 with multipath SCSI
On Thu, Mar 29 2018 at 4:39am -0400, Paul Mackerras wrote: > Since commit 8d47e65948dd ("dm mpath: remove unnecessary NVMe > branching in favor of scsi_dh checks", 2018-03-05), upstream kernels > fail to boot on my POWER8 box which has multipath SCSI disks. The > host adapters are IPR and the userspace is CentOS 7. > > Before that commit, the system booted fine. After that, it fails with > a NULL pointer dereference until we get to commits c37366742baa ("dm > mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer", > 2018-03-12) and e8f74a0f0011 ("dm mpath: eliminate need to use > scsi_device_from_queue", 2018-03-12), both of which say they fix > 8d47e65948dd. From commit e8f74a0f0011 on, I see some warnings from > the workqueue code, and the system does not find its root disk. > > Here are some annotated logs. > > The crash from e8f74a0f0011 until c37366742baa looks like this: > After e8f74a0f0011, the logs look like this: > > [ 38.042638] device-mapper: multipath service-time: version 0.3.0 loaded > [ 38.049825] WARNING: CPU: 16 PID: 1414 at > /home/paulus/kernel/kvm/kernel/workqueue.c:1513 > __queue_delayed_work+0x10c/0x130 > [ 38.049936] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc > nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath drm > drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio > [ 38.050176] CPU: 16 PID: 1414 Comm: systemd-udevd Not tainted > 4.16.0-rc5-kvm+ #117 > [ 38.050264] NIP: c012bcac LR: c012bd94 CTR: > c012bcd0 > [ 38.050347] REGS: c00ff3df3800 TRAP: 0700 Not tainted > (4.16.0-rc5-kvm+) > [ 38.050422] MSR: 90029033 CR: 24022884 > XER: > [ 38.050475] tg3 0003:05:00.3 enP3p5s0f3: renamed from eth3 > [ 38.050504] CFAR: c012bbf8 SOFTE: 1 > [ 38.050504] GPR00: c012bd94 c00ff3df3a80 c149a100 > c00feef30c50 > [ 38.050504] GPR04: c00fdb660800 c00feef30c30 > > [ 38.050504] GPR08: 0001 c012bb50 > dbe25130 > [ 38.050504] 
GPR12: c012bcd0 cfd4b000 000112942aa0 > > [ 38.050504] GPR16: 000112940fe0 000112941048 000112980a00 > 000112942b00 > [ 38.050504] GPR20: 000112980030 01003cdc3059 01003cdc3199 > 0007 > [ 38.050504] GPR24: c00ff3df3c88 > 0400 > [ 38.050504] GPR28: 0400 c00fdb660800 > c00feef30c30 > [ 38.051216] NIP [c012bcac] __queue_delayed_work+0x10c/0x130 > [ 38.051284] LR [c012bd94] queue_delayed_work_on+0xc4/0x100 > [ 38.051350] Call Trace: > [ 38.051381] [c00ff3df3a80] [c00ff14e0180] 0xc00ff14e0180 > (unreliable) > [ 38.051462] [c00ff3df3ac0] [c012bd94] > queue_delayed_work_on+0xc4/0x100 > [ 38.051541] [c00ff3df3b10] [dbe20c98] > __pg_init_all_paths+0x108/0x190 [dm_multipath] > [ 38.051635] [c00ff3df3b50] [dbe20d68] > pg_init_all_paths+0x48/0x80 [dm_multipath] > [ 38.051729] [c00ff3df3b90] [dbe22b18] > multipath_prepare_ioctl+0x148/0x160 [dm_multipath] > [ 38.051825] [c00ff3df3bd0] [c0914e20] > dm_get_bdev_for_ioctl+0x120/0x1b0 > [ 38.051908] [c00ff3df3c20] [c091525c] dm_blk_ioctl+0x4c/0x110 > [ 38.051977] [c00ff3df3ca0] [c05a62c8] blkdev_ioctl+0x5f8/0xbd0 > [ 38.052046] [c00ff3df3d00] [c03e8b74] block_ioctl+0x64/0xd0 > [ 38.052115] [c00ff3df3d40] [c03a87b8] do_vfs_ioctl+0xd8/0x8c0 > [ 38.052184] [c00ff3df3de0] [c03a9074] SyS_ioctl+0xd4/0x130 > [ 38.052254] [c00ff3df3e30] [c000b8e0] system_call+0x58/0x6c > [ 38.052320] Instruction dump: > [ 38.052359] 38210040 e8010010 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 > 4bfff810 > [ 38.052438] 0fe0 4b78 0fe0 4b60 <0fe0> 4b4c > 0fe0 4b30 > [ 38.052538] ---[ end trace a130dfb7aec21e53 ]--- > [ 38.052662] WARNING: CPU: 16 PID: 1414 at > /home/paulus/kernel/kvm/kernel/workqueue.c:1515 > __queue_delayed_work+0xfc/0x130 > [ 38.052912] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc > nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath drm > drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio > [ 38.053563] CPU: 16 PID: 1414 
Comm: systemd-udevd Tainted: GW > 4.16.0-rc5-kvm+ #117 > [ 38.053788] NIP: c012bc9c LR: c012bd94 CTR: > c012bcd0 > [ 38.053939] REGS: c00ff3df3800 TRAP: 0700 Tainted: GW > (4.16.0-rc5-kvm+) > [ 38.054104] MSR: 90029033 CR: 24022884 > XER: > [ 38.05
Re: [PATCH] scsi: cxgb4i: potential array overflow in t4_uld_rx_handler()
On Wed, Mar 28, 2018 at 08:30:37PM +0300, Dan Carpenter wrote: > On Wed, Mar 28, 2018 at 09:14:25PM +0530, Varun Prakash wrote: > > On Wed, Mar 21, 2018 at 09:12:00PM -0400, Martin K. Petersen wrote: > > > > > > > > > > > On Wed, Nov 29, 2017 at 02:42:20PM +0300, Dan Carpenter wrote: > > > >> The story is that Smatch marks skb->data as untrusted and so it > > > >> complains about this code: > > > >> > > > >>drivers/scsi/cxgbi/cxgb4i/cxgb4i.c:2111 t4_uld_rx_handler() > > > >>error: buffer overflow 'cxgb4i_cplhandlers' 239 <= 255. > > > >> > > > >> I don't know the code very well, but it looks like a reasonable warning > > > >> message. Let's address it by adding a sanity check to make sure "opc" > > > >> is within bounds. > > > >> > > > >> Fixes: bbc02c7e9d34 ("cxgb4: Add register, message, and FW > > > >> definitions") > > > >> Signed-off-by: Dan Carpenter > > > >> > > > >> diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c > > > >> b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c > > > >> index 266eddf17a99..94b2d5660a07 100644 > > > >> --- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c > > > >> +++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c > > > >> @@ -2108,12 +2108,12 @@ static int t4_uld_rx_handler(void *handle, > > > >> const __be64 *rsp, > > > >>log_debug(1 << CXGBI_DBG_TOE, > > > >>"cdev %p, opcode 0x%x(0x%x,0x%x), skb %p.\n", > > > >> cdev, opc, rpl->ot.opcode_tid, > > > >> ntohl(rpl->ot.opcode_tid), skb); > > > >> - if (cxgb4i_cplhandlers[opc]) > > > >> - cxgb4i_cplhandlers[opc](cdev, skb); > > > >> - else { > > > >> + if (opc >= ARRAY_SIZE(cxgb4i_cplhandlers) || > > > >> !cxgb4i_cplhandlers[opc]) { > > > >>pr_err("No handler for opcode 0x%x.\n", opc); > > > >>__kfree_skb(skb); > > > >> + return 0; > > > >>} > > > >> + cxgb4i_cplhandlers[opc](cdev, skb); > > > >>return 0; > > > >> nomem: > > > >>log_debug(1 << CXGBI_DBG_TOE, "OOM bailing out.\n"); > > > > > > > > > > > > This check is not necessary but we can add it to avoid warning. 
> > Is the reason it's not necessary, because the skb->data comes from the > firmware and we trust it? Yes, opc is the opcode of a CPL command, which is generated by hardware or firmware; the driver should never get opc >= ARRAY_SIZE(cxgb4i_cplhandlers) or NUM_CPL_CMDS (239). > The v5 declares the array as having 256 > elements which also solves this warning. And cxgbit_uld_lro_rx_handler() > has a bounds check. So it seems pretty normal to prevent the array > overflow by force as well as by trust. > > > The commit mentioned in "Fixes" is not correct, this code was added in > > commit > > "7b36b6e [SCSI] cxgb4i v5: iscsi driver" > > Yeah. You're right. I can resend with an updated commit message. > > regards, > dan carpenter >
aacraid code passes GFP_DMA32 to kmalloc this will not work
Hi All, Since I made the same mistake myself, I did a quick grep for GFP_DMA32 in the kernel, and drivers/scsi/aacraid/commctrl.c came up as a result. It does: p = kmalloc(sg_count[i], GFP_KERNEL|GFP_DMA32); But kmalloc always returns memory from the normal memory zone. If you need memory from a specific zone such as DMA32, you must use the DMA allocation functions (which, from a quick glance at the code, seem appropriate here) or directly call alloc_page or __get_free_page. Regards, Hans
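A kernel-code sketch of the kind of replacement Hans is suggesting (not compilable standalone; the aacraid context is omitted and `dev`, `sg_count` are stand-ins for the surrounding code):

```c
/* Instead of kmalloc(sg_count[i], GFP_KERNEL | GFP_DMA32), which
 * does not honour the DMA32 zone modifier, allocate through the
 * DMA API so the device's addressing limit is respected: */
dma_addr_t addr;
void *p = dma_alloc_coherent(&dev->pdev->dev, sg_count[i], &addr, GFP_KERNEL);
if (!p)
        return -ENOMEM;
/* ... use p (CPU address) and addr (bus address) for the transfer ... */
dma_free_coherent(&dev->pdev->dev, sg_count[i], p, addr);
```

The DMA API route also hands back the bus address the hardware needs, which a bare kmalloc would require a separate dma_map_single() call to obtain.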
[PATCH scsi-next] scsi: mvumi: Use module_pci_driver
Remove boilerplate code by using the module_pci_driver macro. Signed-off-by: YueHaibing --- drivers/scsi/mvumi.c | 20 +--- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/drivers/scsi/mvumi.c b/drivers/scsi/mvumi.c index fe97401..2e6fd86 100644 --- a/drivers/scsi/mvumi.c +++ b/drivers/scsi/mvumi.c @@ -2693,22 +2693,4 @@ static struct pci_driver mvumi_pci_driver = { #endif }; -/** - * mvumi_init - Driver load entry point - */ -static int __init mvumi_init(void) -{ - return pci_register_driver(&mvumi_pci_driver); -} - -/** - * mvumi_exit - Driver unload entry point - */ -static void __exit mvumi_exit(void) -{ - - pci_unregister_driver(&mvumi_pci_driver); -} - -module_init(mvumi_init); -module_exit(mvumi_exit); +module_pci_driver(mvumi_pci_driver); -- 2.7.0
Re: General protection fault with use_blk_mq=1.
On 03/28/2018 10:13 PM, Paolo Valente wrote: > > >> Il giorno 29 mar 2018, alle ore 05:22, Jens Axboe ha >> scritto: >> >> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote: >>> On 03/28/2018 06:02 PM, Jens Axboe wrote: On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote: > I am not subscribed to any of the lists on the To list here, please CC > me on any replies. > > I am encountering a fairly consistent crash anywhere from 15 minutes to > 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1> > The crash looks like: > >>> > > Looking through the code, I'd guess that this is dying inside > blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP > is pointing at. Leaving the whole thing here for Paolo - it's crashing off insertion of a request coming out of SG_IO. Don't think we've seen this BFQ failure case before. You can mitigate this by switching the scsi-mq devices to mq-deadline instead. >>> >>> I'm thinking that I should also be able to mitigate it by disabling >>> CONFIG_DEBUG_BLK_CGROUP. >>> >>> That should remove that entire chunk of code. >>> >>> Of course, that won't help if this is actually a symptom of a bigger >>> problem. >> >> Yes, it's not a given that it will fully mask the issue at hand. But >> turning off BFQ has a much higher chance of working for you. >> >> This time actually CC'ing Paolo. >> > > Hi Zephaniah, > if you are actually interested in the benefits of BFQ (low latency, > high responsiveness, fairness, ...) then it may be worth to try what > you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP. Also because > this option activates the heavy computation of debug cgroup statistics, > which probably you don't use. I definitely am. 
> > In addition, the outcome of your attempt without > CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information: > - if no failure occurs, then the issue is likely to be confined in > that debugging code (which, on the bright side, is likely to be of > occasional interest, for only a handful of developers) > - if the issue still shows up, then we may have new hints on this odd > failure > > Finally, consider that this issue has been reported to disappear from > 4.16 [1], and, as a plus, that the service quality of BFQ had a > further boost exactly from 4.16. I look forward to that either way then. > > Looking forward to your feedback, in case you try BFQ without > CONFIG_DEBUG_BLK_CGROUP, I'm running that now, judging from the past if it survives until tomorrow evening then we're good, so I should hopefully know in the next day. Thank you, Zephaniah E. Loss-Cutler-Hull. > Paolo > > [1] https://www.spinics.net/lists/linux-block/msg21422.html > >> >> -- >> Jens Axboe >
Recent kernels fail to boot on POWER8 with multipath SCSI
Since commit 8d47e65948dd ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks", 2018-03-05), upstream kernels fail to boot on my POWER8 box which has multipath SCSI disks. The host adapters are IPR and the userspace is CentOS 7. Before that commit, the system booted fine. After that, it fails with a NULL pointer dereference until we get to commits c37366742baa ("dm mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer", 2018-03-12) and e8f74a0f0011 ("dm mpath: eliminate need to use scsi_device_from_queue", 2018-03-12), both of which say they fix 8d47e65948dd. From commit e8f74a0f0011 on, I see some warnings from the workqueue code, and the system does not find its root disk. Here are some annotated logs. The crash from e8f74a0f0011 until c37366742baa looks like this: [ 38.141575] device-mapper: multipath service-time: version 0.3.0 loaded [ 38.16] alua: device handler registered [ 38.144695] sd 0:2:4:0: alua: supports implicit TPGS [ 38.144808] sd 0:2:4:0: alua: device t10.IBM IPR-0 5EC28CA0 port group c4ab rel port c4ab [ 38.144988] sd 1:2:4:0: alua: supports implicit TPGS [ 38.145089] sd 1:2:4:0: alua: device t10.IBM IPR-0 5EC28CA0 port group c4a6 rel port c4a6 [ 38.159134] sd 0:2:4:0: alua: transition timeout set to 60 seconds [ 38.159244] sd 0:2:4:0: alua: port group c4ab state A preferred supports TOlUSNA [ 38.159321] sd 1:2:4:0: alua: transition timeout set to 60 seconds [ 38.159382] sd 1:2:4:0: alua: port group c4a6 state N non-preferred supports TOlUSNA [ 38.159521] sd 0:2:4:0: alua: port group c4ab state A preferred supports TOlUSNA [ 38.159598] Unable to handle kernel paging request for data at address 0x [ 38.159669] Faulting instruction address: 0xc016e9e0 [ 38.159731] Oops: Kernel access of bad area, sig: 11 [#1] [ 38.159779] LE SMP NR_CPUS=1024 NUMA PowerNV [ 38.159829] Modules linked in: scsi_dh_alua dm_service_time radeon(+) lpfc(+) nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath drm drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio [ 38.160067] CPU: 66 PID: 409 Comm: kworker/66:0 Not tainted 4.16.0-rc5-kvm #118 [ 38.160142] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] [ 38.160190] NIP: c016e9e0 LR: c016ec4c CTR: c016eca0 [ 38.160261] REGS: c00ff3597720 TRAP: 0300 Not tainted (4.16.0-rc5-kvm) [ 38.160332] MSR: 90009033 CR: 22008024 XER: 2000 [ 38.160408] CFAR: c000884c DAR: DSISR: 4000 SOFTE: 1 [ 38.160408] GPR00: c016ec4c c00ff35979a0 c149a100 c00fd9a10750 [ 38.160408] GPR04: 0003 0001 [ 38.160408] GPR08: c00ff3597a30 dbdc52a0 [ 38.160408] GPR12: 2200 cfd6d600 c00fe0ec31c0 c00fe0ec3000 [ 38.160408] GPR16: dd962be8 dd962c10 c00fe0ec2e00 [ 38.160408] GPR20: 0001 c00fe0ec2e38 c00ff3597a30 0001 [ 38.160408] GPR24: 0003 [ 38.160408] GPR28: c00fd9a10758 0001 ffe8 c00fd9a10750 [ 38.161035] NIP [c016e9e0] __wake_up_common+0x90/0x240 [ 38.161096] LR [c016ec4c] __wake_up_common_lock+0xbc/0x110 [ 38.161155] Call Trace: [ 38.161182] [c00ff35979a0] [c0e32570] num_spec.62125+0x1c6ed4/0x21e824 (unreliable) [ 38.161266] [c00ff3597a10] [c016ec4c] __wake_up_common_lock+0xbc/0x110 [ 38.161340] [c00ff3597aa0] [dbdc34d8] pg_init_done+0x1b8/0x330 [dm_multipath] [ 38.161411] [c00ff3597b40] [dd9604c4] alua_rtpg_work+0x194/0xeb0 [scsi_dh_alua] [ 38.161486] [c00ff3597c90] [c012f9d0] process_one_work+0x1a0/0x470 [ 38.161558] [c00ff3597d30] [c012fd38] worker_thread+0x98/0x560 [ 38.161618] [c00ff3597dc0] [c01387a4] kthread+0x164/0x1b0 [ 38.161680] [c00ff3597e30] [c000bca0] ret_from_kernel_thread+0x5c/0xbc [ 38.161751] Instruction dump: [ 38.161787] 6000 6000 41920010 8136 792af7e3 40820100 ebdc0008 3bdeffe8 [ 38.161861] 3b9c0008 393e0018 7fbc4840 419e00a4 3b60 3be8 481c [ 38.161992] ---[ end trace 17021c073d1d ]--- After e8f74a0f0011, the logs look like this: [ 38.042638] device-mapper: multipath service-time: version 0.3.0 loaded [ 38.049825] WARNING: CPU: 16 PID: 1414 at 
/home/paulus/kernel/kvm/kernel/workqueue.c:1513 __queue_delayed_work+0x10c/0x130 [ 38.049936] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_mult
Re: [PATCH v5 3/7] scsi_io_completion_nz_result function added
Looks good (as far as I can tell) Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v5 2/7] scsi_io_completion rename variables
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850