Re: General protection fault with use_blk_mq=1.

2018-03-29 Thread Zephaniah E. Loss-Cutler-Hull
On 03/29/2018 02:12 AM, Zephaniah E. Loss-Cutler-Hull wrote:
> On 03/28/2018 10:13 PM, Paolo Valente wrote:
>> In addition, the outcome of your attempt without
>> CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
>> - if no failure occurs, then the issue is likely to be confined in
>> that debugging code (which, on the bright side, is likely to be of
>> occasional interest, for only a handful of developers)
>> - if the issue still shows up, then we may have new hints on this odd
>> failure
>>
>> Finally, consider that this issue has been reported to disappear from
>> 4.16 [1], and, as a plus, that the service quality of BFQ had a
>> further boost exactly from 4.16.
> 
> I look forward to that either way then.
>>
>> Looking forward to your feedback, in case you try BFQ without
>> CONFIG_DEBUG_BLK_CGROUP,
> 
> I'm running that now, judging from the past if it survives until
> tomorrow evening then we're good, so I should hopefully know in the next
> day.

Alright, I now have an uptime of over 20 hours, with
scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1

I did upgrade from 4.15.13 to 4.15.14 in the process, but a quick look
at the changes doesn't have anything jump out at me as impacting this.

So I'm reasonably comfortable stating that disabling
CONFIG_DEBUG_BLK_CGROUP was sufficient to render this stable.

Regards,
Zephaniah E. Loss-Cutler-Hull.


Re: [PATCH] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk

2018-03-29 Thread Bart Van Assche
On Wed, 2018-03-28 at 16:43 +0800, Wen Yang wrote:
> There would be so many same lines printed by frequent prink if one 
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent prink may block the printk, and 
> if spin_lock was acquired before the printk like in scsi_request_fn,
> that could trigger the watchdog.
> 
> Relative disscussion can be found here,
  
  Related?

Anyway:

Reviewed-by: Bart Van Assche 







Re: [PATCH v5 7/7] scsi_io_completion convert BUGs to WARNs

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote:
> The scsi_io_completion function contains three BUG() and BUG_ON() calls.
> Replace them with WARN variants.

Reviewed-by: Bart Van Assche 







Re: [PATCH v5 6/7] scsi_io_completion hints on fastpatch

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote:
> - /* no bidi support for !blk_rq_is_passthrough yet */
> + /* no bidi support yet, other than in pass-through */
> 
> [ ... ]
>
> -  /* Kill remainder if no retries. */
> + /* Kill remainder if no retries. */

If you would repost this patch series, please fold these two changes into a
previous patch since these changes are not relately to branch optimizing.

Anyway:

Reviewed-by: Bart Van Assche 






Re: [PATCH v5 5/7] scsi_io_completion_reprep helper added

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:47 -0400, Douglas Gilbert wrote:
> Since the action "reprep" is called from two places, rather than repeat
> the code, make a new scsi_io_completion helper with "reprep" as its
> suffix.

Reviewed-by: Bart Van Assche 






Re: [PATCH v5 4/7] scsi_io_completion_action helper added

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote:
> + /*
> +  * Unprep the request and put it back at the head of the
> +  * queue. A new command will be prepared and issued.
> +  * This block is the same as case ACTION_REPREP in
> +  * scsi_io_completion_action() above.
>*/
>   if (q->mq_ops) {
>   scsi_mq_requeue_cmd(cmd);
> @@ -1061,16 +1090,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, 
> unsigned int good_bytes)
>   scsi_release_buffers(cmd);
>   scsi_requeue_command(q, cmd);
>   }

Although I'm not happy about this code duplication:

Reviewed-by: Bart Van Assche 






Re: [PATCH v5 3/7] scsi_io_completion_nz_result function added

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote:
> Break out several intertwined paths when cmd->result is non zero and
> place them in the scsi_io_completion_nz_result helper function. The
> logic is not changed.

Reviewed-by: Bart Van Assche 





Re: [PATCH v5 2/7] scsi_io_completion rename variables

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 01:46 -0400, Douglas Gilbert wrote:
> Change and add some variable names, adjust some associated comments
> for clarity. Correct some misleading comments.

Reviewed-by: Bart Van Assche 






Re: [PATCH v3 00/41] cxlflash: OCXL transport support and miscellaneous fixes

2018-03-29 Thread Uma Krishnan


> On Mar 28, 2018, at 4:34 PM, Martin K. Petersen  
> wrote:
> 
> 
> Uma,
> 
>> This patch series adds OCXL support to the cxlflash driver. With this
>> support, new devices using the OCXL transport will be supported by the
>> cxlflash driver along with the existing CXL devices. An effort is made
>> to keep this transport specific function independent of the existing
>> core driver that communicates with the AFU.
>> 
>> The first three patches contain a minor fix and staging improvements.
>> 
>> This series is intended for 4.17 and is bisectable.
> 
> Something this big really needs to be submitted around rc2/rc3. It's way
> too late in the cycle to ensure proper zeroday coverage for all these
> commits.
> 
> I have started a 4.18/scsi-queue branch to hold this series for now.
> The 4.18 branch will be rebased once 4.17rc1 is out in a few weeks. Your
> changes won't show up in for-next until then either.

I did send the first version of this series just after 4.16rc2 but the reviews 
took a while and the V3 is cutting close unless we do have an rc8 for 4.16 :)

I understand what you are saying and agree with you. Thanks for keeping me 
posted.


Vmbus series...

2018-03-29 Thread David Miller

Please feel free put it in via the scsi tree:

Acked-by: David S. Miller 


Re: [PATCH v2 net-next] qed*: Utilize FW 8.33.11.0

2018-03-29 Thread David Miller
From: Michal Kalderon 
Date: Wed, 28 Mar 2018 11:42:16 +0300

> This FW contains several fixes and features
> 
> RDMA Features
> - SRQ support
> - XRC support
> - Memory window support
> - RDMA low latency queue support
> - RDMA bonding support
> 
> RDMA bug fixes
> - RDMA remote invalidate during retransmit fix
> - iWARP MPA connect interop issue with RTR fix
> - iWARP Legacy DPM support
> - Fix MPA reject flow
> - iWARP error handling
> - RQ WQE validation checks
> 
> MISC
> - Fix some HSI types endianity
> - New Restriction: vlan insertion in core_tx_bd_data can't be set
>   for LB packets
> 
> ETH
> - HW QoS offload support
> - Fix vlan, dcb and sriov flow of VF sending a packet with
>   inband VLAN tag instead of default VLAN
> - Allow GRE version 1 offloads in RX flow
> - Allow VXLAN steering
> 
> iSCSI / FcoE
> - Fix bd availability checking flow
> - Support 256th sge proerly in iscsi/fcoe retransmit
> - Performance improvement
> - Fix handle iSCSI command arrival with AHS and with immediate
> - Fix ipv6 traffic class configuration
> 
> DEBUG
> - Update debug utilities
> 
> Signed-off-by: Michal Kalderon 
> Signed-off-by: Tomer Tayar 
> Signed-off-by: Manish Rangankar 
> Signed-off-by: Ariel Elior 
> ---
> Changes from v1
>   Remove version bump and qedr module version addition

Applied.


Re: Multi-Actuator SAS HDD First Look

2018-03-29 Thread Douglas Gilbert

On 2018-03-26 11:08 AM, Hannes Reinecke wrote:

On Fri, 23 Mar 2018 08:57:12 -0600
Tim Walker  wrote:


Seagate announced their split actuator SAS drive, which will probably
require some kernel changes for full support. It's targeted at cloud
provider JBODs and RAID.

Here are some of the drive's architectural points. Since the two LUNs
share many common components (e.g. spindle) Seagate allocated some
SCSI operations to be LUN specific and some to affect the entire
device, that is, both LUNs.

1. Two LUNs, 0 & 1, each with independent lba space, and each
connected to an independent read channel, actuator, and set of heads.
2. Each actuator addresses 1/2 of the media - no media is shared
across the actuators. They seek independently.
3. One World Wide Name (WWN) is assigned to the port for device
address. Each Logical Unit has a separate World Wide Name for
identification in VPD page.
4. 128 deep command queue, shared across both LUNs
5. Each LUN can pull commands from the queue independently, so they
can implement their own sorting and optimization.
6. Ordered tag attribute causes the command to be ordered across both
Logical Units
7. Head of Queue attribute causes the command to be ordered with
respect to a single Logical Unit
8. Mode pages are device-based (shared across both Logical Units)
9. Log pages are device-based.
10. Inquiry VPD pages (with minor exceptions) are device based.
11. Device health features (SMART, etc) are device based

Seagate wants the multi-actuator design to integrate into the stack as
painlessly as possible.The interface design is still in the early
stages, so I am gathering requirements and recommendations, and also
providing any information necessary to help scope integrating a
multi-LUN device into the MQ stack. So, I am soliciting any pertinent
feedback including:

1. Painful incompatibilities between the Seagate proposal and current
MQ architecture
2. Linux changes needed
3. Drive interface changes needed
4. Anything else I may have overlooked


So far it looks okay; just make sure to have VPD page 0x83
entries properly associated.
To all intents and purposes these devices seem to look like 'normal'
devices with two LUNs; nothing special with that.
Real question would be in which areas those devices differentiate from
the two indepdendent LUN scenario.

There might be issues with per-device informations like SMART etc;
ideally they are available from _both_ LUNs.
Otherwise they'll show up as blank from one LUN, causing consternation
with any management software.


Further to this point, some types of damage, such as to a head
or (one side of) a platter would degrade one LU, possibly making
it unusable for storage, while the other side (and the other LU)
would be fine.

I'm curious how you plan to implement the START STOP UNIT command.
If one side of the platter is in "start" state and the other side
in "stop" state, will the heads on the stopped side be parked (if
they can be parked)? And if both sides (LUs) are stopped I would
hope you really would spin down the disk, then if either is started
the disk would be spun up.

Getting T10 to add a bit to the Block Device Characteristics VPD page
might be helpful. It could be a "shares a spindle" bit with the other
LUs identified in the SCSI Ports VPD page. Such an indication would
help an enclosure find out if a Multi-Actuator disk was really spun down
and ready to be removed or replaced. I think SES and smartmontools may
need tweaks to handle this new device model sensibly.

Doug Gilbert




Re: [PATCH] scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure

2018-03-29 Thread Madhani, Himanshu
Hi Bill, 

> On Mar 23, 2018, at 7:37 AM, Bill Kuzeja  wrote:
> 
> The code that fixes the crashes in the following commit introduced a
> small memory leak:
> 
> commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on 
> probe failure")
> 
> Fixing this requires a bit of reworking, which I've explained. Also provide
> some code cleanup.
> 
> Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on 
> probe failure")
> Signed-off-by: Bill Kuzeja 
> 
> ---
> 
> There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
> fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
> respectively (the sizes of req and rsp).
> 
> I originally put in checks to test for this condition which were based on
> the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
> allocated, then rsp and req were allocated as well. This is incorrect.
> There is a window between these allocations:
> 
>   ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
>goto probe_hw_failed;
> 
> [if successful, both rsp and req allocated]
> 
>   base_vha = qla2x00_create_host(sht, ha);
>goto probe_hw_failed;
> 
>   ret = qla2x00_request_irqs(ha, rsp);
>goto probe_failed;
> 
>   if (qla2x00_alloc_queues(ha, req, rsp)) {
>goto probe_failed;
> 
> [if successful, now ha->rsp_q_map and ha->req_q_map allocated]
> 
> To simplify this, we should just set req and rsp to NULL after we free
> them. Sounds simple enough? The problem is that req and rsp are pointers
> defined in the qla2x00_probe_one and they are not always passed by
> reference to the routines that free them.
> 
> Here are paths which can free req and rsp:
> 
> PATH 1:
> qla2x00_probe_one
>   ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
>   [req and rsp are passed by reference, but if this fails, we currently
>do not NULL out req and rsp. Easily fixed]
> 
> PATH 2:
> qla2x00_probe_one
>   failing in qla2x00_request_irqs or qla2x00_alloc_queues
>  probe_failed:
> qla2x00_free_device(base_vha);
>qla2x00_free_req_que(ha, req)
>qla2x00_free_rsp_que(ha, rsp)
> 
> PATH 3:
> qla2x00_probe_one:
>   failing in qla2x00_mem_alloc or qla2x00_create_host
>  probe_hw_failed:
> qla2x00_free_req_que(ha, req)
> qla2x00_free_rsp_que(ha, rsp)
> 
> PATH 1: This should currently work, but it doesn't because rsp and rsp
> are not set to NULL in qla2x00_mem_alloc. Easily remedied.
> 
> PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
> derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up
> if qla2x00_alloc_queues succeeds.
> 
> In qla2x00_free_queues, we are protected from crashing if these don't
> exist because req_qid_map and rsp_qid_map are only set on their
> allocation. We are guarded in this way:
> 
>for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
>if (!test_bit(cnt, ha->req_qid_map))
>continue;
> 
> PATH 3: This works. We haven't freed req or rsp yet (or they were never
> allocated if qla2x00_mem_alloc failed), so we'll attempt to free them
> here.
> 
> To summarize, there are a few small changes to make this work correctly and
> (and for some cleanup):
> 
> 1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
> qla2x00_mem_alloc so these are correctly set to NULL back in
> qla2x00_probe_one
> 
> 2) After jumping to probe_failed: and calling qla2x00_free_device,
> explicitly set rsp and req to NULL so further calls with these pointers
> do not crash, i.e. the free queue calls in the probe_hw_failed section
> we fall through to.
> 
> 3) Fix return code check in the call to qla2x00_alloc_queues. We currently
> drop the return code on the floor. The probe fails but the caller of the
> probe doesn't have an error code, so it attaches to pci. This can result
> in a crash on module shutdown.
> 
> 4) Remove unnecessary NULL checks in qla2x00_free_req_que,
> qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and
> vfrees in qla2x00_mem_free.
> 
> I tested this out running a scenario where the card breaks at various
> times during initialization. I made sure I forced every error exit path
> in qla2x00_probe_one.
> 
> ---
> drivers/scsi/qla2xxx/qla_os.c | 44 +--
> 1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 5c5dcca4..f70d7eb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -471,9 +471,6 @@ static int qla2x00_alloc_queues(struct qla_hw_data *ha, 
> struct req_que *req,
> 
> static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req)
> {
> - if (!ha->req_q_map)
> - return;
> -
>   if (IS_QLAFX00(ha)) {
>   if (req && req->ring_fx00)
> 

Re: [PATCH] qla2xxx: Avoid double completion of abort command

2018-03-29 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 20, 2018, at 2:05 PM, Ben Hutchings  
> wrote:
> 
> qla2x00_tmf_sp_done() now deletes the timer that will run
> qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer
> already expired.  Check the return value from del_timer() to avoid
> calling complete() a second time.
> 
> Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous 
> ...")
> Fixes: 1514839b3664 ("scsi: qla2xxx: Fix NULL pointer crash due to active 
> ...")
> Signed-off-by: Ben Hutchings 
> ---
> drivers/scsi/qla2xxx/qla_init.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index 00329dda6179..b0aa8cc96f0f 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -1546,8 +1546,8 @@ qla24xx_abort_sp_done(void *ptr, int res)
>   srb_t *sp = ptr;
>   struct srb_iocb *abt = &sp->u.iocb_cmd;
> 
> - del_timer(&sp->u.iocb_cmd.timer);
> - complete(&abt->u.abt.comp);
> + if (del_timer(&sp->u.iocb_cmd.timer))
> + complete(&abt->u.abt.comp);
> }
> 
> int
> -- 
> 2.15.0.rc0
> 

Looks good.

Acked-by: Himanshu Madhani 

Thanks,
- Himanshu



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-29 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 21, 2018, at 10:58 AM, Ben Hutchings  
> wrote:
> 
> On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote:
>> Hi Ben, 
>> 
>>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
>>> wrote:
>>> 
>>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
 Hi Ben, 
 
> On Mar 20, 2018, at 2:36 PM, Ben Hutchings 
>  wrote:
> 
> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> which means the timeout function pointer and any data that the
> function depends on must be initialised beforehand.
> 
> Move this initialisation before each call to qla2x00_init_timer().  In
> some cases qla2x00_init_timer() initialises a completion structure
> needed by the timeout function, so move the call to add_timer() after
> that.
>>> 
>>> [...]
 What motivated this patch? are you debugging any crash which was
 helped by moving code around? 
>>> 
>>> I saw the recent fix that added a call to del_timer(), noticed the lack
>>> of synchronisation and then kept looking further.
>>> 
 What I see from this patch is that its moving iocb.cmd.timeout field
 before qla2x00_init_timer(),
 which are completely different from each other. 
>>> 
>>> How are they "completely different"?  qla2x00_init_timer() starts a
>>> timer that will call qla2x00_sp_timeout(), which in turn uses the
>>> iocb_cmd.timeout function pointer.  We should not assume that any
>>> initialisation code after the call to add_timer() will run before the
>>> timer expires, since the scheduler might run other tasks for the whole
>>> time.  It's unlikely but not impossible.
>>> 
>> 
>> I see. I used “completely different” due to the lack of better wording. 
>> 
>> I wanted to get context and understand if there was any issue that would
>> have helped with these changes. 
>> 
>> your explanation does help and I agree that there is window where timer() 
>> will 
>> pop before timeout is initialized. 
>> 
>> Let me put this patch in our test cycle for couple days and will respond on 
>> this one
> 
> Thanks.
> 
> Ben.
> 
> -- 
> Ben Hutchings
> Software Developer, Codethink Ltd.

Thanks again for the patch. Testing was successful. 

Acked-by: Himanshu Madhani 

Thanks,
- Himanshu



Re: Recent kernels fail to boot on POWER8 with multipath SCSI

2018-03-29 Thread Mike Snitzer
On Thu, Mar 29 2018 at  4:39am -0400,
Paul Mackerras  wrote:

> Since commit 8d47e65948dd ("dm mpath: remove unnecessary NVMe
> branching in favor of scsi_dh checks", 2018-03-05), upstream kernels
> fail to boot on my POWER8 box which has multipath SCSI disks.  The
> host adapters are IPR and the userspace is CentOS 7.
> 
> Before that commit, the system booted fine.  After that, it fails with
> a NULL pointer dereference until we get to commits c37366742baa ("dm
> mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer",
> 2018-03-12) and e8f74a0f0011 ("dm mpath: eliminate need to use
> scsi_device_from_queue", 2018-03-12), both of which say they fix
> 8d47e65948dd.  From commit e8f74a0f0011 on, I see some warnings from
> the workqueue code, and the system does not find its root disk.
> 
> Here are some annotated logs.
> 
> The crash from e8f74a0f0011 until c37366742baa looks like this:


 
> After e8f74a0f0011, the logs look like this:
> 
> [   38.042638] device-mapper: multipath service-time: version 0.3.0 loaded
> [   38.049825] WARNING: CPU: 16 PID: 1414 at 
> /home/paulus/kernel/kvm/kernel/workqueue.c:1513 
> __queue_delayed_work+0x10c/0x130
> [   38.049936] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc 
> nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath drm 
> drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio
> [   38.050176] CPU: 16 PID: 1414 Comm: systemd-udevd Not tainted 
> 4.16.0-rc5-kvm+ #117
> [   38.050264] NIP:  c012bcac LR: c012bd94 CTR: 
> c012bcd0
> [   38.050347] REGS: c00ff3df3800 TRAP: 0700   Not tainted  
> (4.16.0-rc5-kvm+)
> [   38.050422] MSR:  90029033   CR: 24022884 
>  XER: 
> [   38.050475] tg3 0003:05:00.3 enP3p5s0f3: renamed from eth3
> [   38.050504] CFAR: c012bbf8 SOFTE: 1 
> [   38.050504] GPR00: c012bd94 c00ff3df3a80 c149a100 
> c00feef30c50 
> [   38.050504] GPR04: c00fdb660800 c00feef30c30  
>  
> [   38.050504] GPR08: 0001 c012bb50  
> dbe25130 
> [   38.050504] GPR12: c012bcd0 cfd4b000 000112942aa0 
>  
> [   38.050504] GPR16: 000112940fe0 000112941048 000112980a00 
> 000112942b00 
> [   38.050504] GPR20: 000112980030 01003cdc3059 01003cdc3199 
> 0007 
> [   38.050504] GPR24:   c00ff3df3c88 
> 0400 
> [   38.050504] GPR28:  0400 c00fdb660800 
> c00feef30c30 
> [   38.051216] NIP [c012bcac] __queue_delayed_work+0x10c/0x130
> [   38.051284] LR [c012bd94] queue_delayed_work_on+0xc4/0x100
> [   38.051350] Call Trace:
> [   38.051381] [c00ff3df3a80] [c00ff14e0180] 0xc00ff14e0180 
> (unreliable)
> [   38.051462] [c00ff3df3ac0] [c012bd94] 
> queue_delayed_work_on+0xc4/0x100
> [   38.051541] [c00ff3df3b10] [dbe20c98] 
> __pg_init_all_paths+0x108/0x190 [dm_multipath]
> [   38.051635] [c00ff3df3b50] [dbe20d68] 
> pg_init_all_paths+0x48/0x80 [dm_multipath]
> [   38.051729] [c00ff3df3b90] [dbe22b18] 
> multipath_prepare_ioctl+0x148/0x160 [dm_multipath]
> [   38.051825] [c00ff3df3bd0] [c0914e20] 
> dm_get_bdev_for_ioctl+0x120/0x1b0
> [   38.051908] [c00ff3df3c20] [c091525c] dm_blk_ioctl+0x4c/0x110
> [   38.051977] [c00ff3df3ca0] [c05a62c8] blkdev_ioctl+0x5f8/0xbd0
> [   38.052046] [c00ff3df3d00] [c03e8b74] block_ioctl+0x64/0xd0
> [   38.052115] [c00ff3df3d40] [c03a87b8] do_vfs_ioctl+0xd8/0x8c0
> [   38.052184] [c00ff3df3de0] [c03a9074] SyS_ioctl+0xd4/0x130
> [   38.052254] [c00ff3df3e30] [c000b8e0] system_call+0x58/0x6c
> [   38.052320] Instruction dump:
> [   38.052359] 38210040 e8010010 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 
> 4bfff810 
> [   38.052438] 0fe0 4b78 0fe0 4b60 <0fe0> 4b4c 
> 0fe0 4b30 
> [   38.052538] ---[ end trace a130dfb7aec21e53 ]---
> [   38.052662] WARNING: CPU: 16 PID: 1414 at 
> /home/paulus/kernel/kvm/kernel/workqueue.c:1515 
> __queue_delayed_work+0xfc/0x130
> [   38.052912] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc 
> nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath drm 
> drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio
> [   38.053563] CPU: 16 PID: 1414 Comm: systemd-udevd Tainted: GW  
>   4.16.0-rc5-kvm+ #117
> [   38.053788] NIP:  c012bc9c LR: c012bd94 CTR: 
> c012bcd0
> [   38.053939] REGS: c00ff3df3800 TRAP: 0700   Tainted: GW
>  (4.16.0-rc5-kvm+)
> [   38.054104] MSR:  90029033   CR: 24022884 
>  XER: 
> [   38.05

Re: [PATCH] scsi: cxgb4i: potential array overflow in t4_uld_rx_handler()

2018-03-29 Thread Varun Prakash
On Wed, Mar 28, 2018 at 08:30:37PM +0300, Dan Carpenter wrote:
> On Wed, Mar 28, 2018 at 09:14:25PM +0530, Varun Prakash wrote:
> > On Wed, Mar 21, 2018 at 09:12:00PM -0400, Martin K. Petersen wrote:
> > > 

> > > >
> > > > On Wed, Nov 29, 2017 at 02:42:20PM +0300, Dan Carpenter wrote:
> > > >> The story is that Smatch marks skb->data as untrusted and so it
> > > >> complains about this code:
> > > >> 
> > > >>drivers/scsi/cxgbi/cxgb4i/cxgb4i.c:2111 t4_uld_rx_handler()
> > > >>error: buffer overflow 'cxgb4i_cplhandlers' 239 <= 255.
> > > >> 
> > > >> I don't know the code very well, but it looks like a reasonable warning
> > > >> message.  Let's address it by adding a sanity check to make sure "opc"
> > > >> is within bounds.
> > > >> 
> > > >> Fixes: bbc02c7e9d34 ("cxgb4: Add register, message, and FW 
> > > >> definitions")
> > > >> Signed-off-by: Dan Carpenter 
> > > >> 
> > > >> diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c 
> > > >> b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
> > > >> index 266eddf17a99..94b2d5660a07 100644
> > > >> --- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
> > > >> +++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
> > > >> @@ -2108,12 +2108,12 @@ static int t4_uld_rx_handler(void *handle, 
> > > >> const __be64 *rsp,
> > > >>log_debug(1 << CXGBI_DBG_TOE,
> > > >>"cdev %p, opcode 0x%x(0x%x,0x%x), skb %p.\n",
> > > >> cdev, opc, rpl->ot.opcode_tid, 
> > > >> ntohl(rpl->ot.opcode_tid), skb);
> > > >> -  if (cxgb4i_cplhandlers[opc])
> > > >> -  cxgb4i_cplhandlers[opc](cdev, skb);
> > > >> -  else {
> > > >> +  if (opc >= ARRAY_SIZE(cxgb4i_cplhandlers) || 
> > > >> !cxgb4i_cplhandlers[opc]) {
> > > >>pr_err("No handler for opcode 0x%x.\n", opc);
> > > >>__kfree_skb(skb);
> > > >> +  return 0;
> > > >>}
> > > >> +  cxgb4i_cplhandlers[opc](cdev, skb);
> > > >>return 0;
> > > >>  nomem:
> > > >>log_debug(1 << CXGBI_DBG_TOE, "OOM bailing out.\n");
> > > >
> > > >
> > 
> > This check is not necessary but we can add it to avoid warning.
> 
> Is the reason it's not necessary, because the skb->data comes from the
> firmware and we trust it?

Yes, opc is an opcode of a cpl cmd which is generated by hardware or firmware,
driver should never get opc >= ARRAY_SIZE(cxgb4i_cplhandler) or 
NUM_CPL_CMDS(239).

> The v5 declares the array as having 256
> elements which also solves this warning.  And cxgbit_uld_lro_rx_handler()
> has a bounds check.  So it seems pretty normal to prevent the array
> overflow by force as well as by trust.
> 
> > The commit mentioned in "Fixes" is not correct, this code was added in 
> > commit 
> > "7b36b6e [SCSI] cxgb4i v5: iscsi driver"
> 
> Yeah.  You're right.  I can resend with an updated commit message.
> 
> regards,
> dan carpenter
> 


aacraid code passes GFP_DMA32 to kmalloc this will not work

2018-03-29 Thread Hans de Goede

Hi All,

Since I made the same mistake myself I've done a quick grep for
GFP_DMA32 in the kernel and drivers/scsi/aacraid/commctrl.c
came up as a result of this grep, it does:

p = kmalloc(sg_count[i], GFP_KERNEL|GFP_DMA32);

But kmalloc always returns memory from the normal memory-zone,
if you need memory from a specific memory-zone like the
DMA32 zone, you must use the dma allocation functions (which
from a quick glance at the code seems appropriate here) or
directly call alloc_page or __get_free_page.

Regards,

Hans


[PATCH scsi-next] scsi: mvumi: Using module_pci_driver.

2018-03-29 Thread YueHaibing
Remove boilerplate code by using macro module_pci_driver.

Signed-off-by: YueHaibing 
---
 drivers/scsi/mvumi.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/scsi/mvumi.c b/drivers/scsi/mvumi.c
index fe97401..2e6fd86 100644
--- a/drivers/scsi/mvumi.c
+++ b/drivers/scsi/mvumi.c
@@ -2693,22 +2693,4 @@ static struct pci_driver mvumi_pci_driver = {
 #endif
 };
 
-/**
- * mvumi_init - Driver load entry point
- */
-static int __init mvumi_init(void)
-{
-   return pci_register_driver(&mvumi_pci_driver);
-}
-
-/**
- * mvumi_exit - Driver unload entry point
- */
-static void __exit mvumi_exit(void)
-{
-
-   pci_unregister_driver(&mvumi_pci_driver);
-}
-
-module_init(mvumi_init);
-module_exit(mvumi_exit);
+module_pci_driver(mvumi_pci_driver);
-- 
2.7.0




Re: General protection fault with use_blk_mq=1.

2018-03-29 Thread Zephaniah E. Loss-Cutler-Hull
On 03/28/2018 10:13 PM, Paolo Valente wrote:
> 
> 
>> Il giorno 29 mar 2018, alle ore 05:22, Jens Axboe  ha 
>> scritto:
>>
>> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>> On 03/28/2018 06:02 PM, Jens Axboe wrote:
 On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
> I am not subscribed to any of the lists on the To list here, please CC
> me on any replies.
>
> I am encountering a fairly consistent crash anywhere from 15 minutes to
> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1> 
> The crash looks like:
>
>>>
>
> Looking through the code, I'd guess that this is dying inside
> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
> is pointing at.

 Leaving the whole thing here for Paolo - it's crashing off insertion of
 a request coming out of SG_IO. Don't think we've seen this BFQ failure
 case before.

 You can mitigate this by switching the scsi-mq devices to mq-deadline
 instead.

>>>
>>> I'm thinking that I should also be able to mitigate it by disabling
>>> CONFIG_DEBUG_BLK_CGROUP.
>>>
>>> That should remove that entire chunk of code.
>>>
>>> Of course, that won't help if this is actually a symptom of a bigger
>>> problem.
>>
>> Yes, it's not a given that it will fully mask the issue at hand. But
>> turning off BFQ has a much higher chance of working for you.
>>
>> This time actually CC'ing Paolo.
>>
> 
> Hi Zephaniah,
> if you are actually interested in the benefits of BFQ (low latency,
> high responsiveness, fairness, ...) then it may be worth to try what
> you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP.  Also because
> this option activates the heavy computation of debug cgroup statistics,
> which probably you don't use.

I definitely am.
> 
> In addition, the outcome of your attempt without
> CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
> - if no failure occurs, then the issue is likely to be confined in
> that debugging code (which, on the bright side, is likely to be of
> occasional interest, for only a handful of developers)
> - if the issue still shows up, then we may have new hints on this odd
> failure
> 
> Finally, consider that this issue has been reported to disappear from
> 4.16 [1], and, as a plus, that the service quality of BFQ had a
> further boost exactly from 4.16.

I look forward to that either way then.
> 
> Looking forward to your feedback, in case you try BFQ without
> CONFIG_DEBUG_BLK_CGROUP,

I'm running that now, judging from the past if it survives until
tomorrow evening then we're good, so I should hopefully know in the next
day.

Thank you,
Zephaniah E. Loss-Cutler-Hull.

> Paolo
> 
> [1] https://www.spinics.net/lists/linux-block/msg21422.html
> 
>>
>> -- 
>> Jens Axboe
> 




signature.asc
Description: OpenPGP digital signature


Recent kernels fail to boot on POWER8 with multipath SCSI

2018-03-29 Thread Paul Mackerras
Since commit 8d47e65948dd ("dm mpath: remove unnecessary NVMe
branching in favor of scsi_dh checks", 2018-03-05), upstream kernels
fail to boot on my POWER8 box which has multipath SCSI disks.  The
host adapters are IPR and the userspace is CentOS 7.

Before that commit, the system booted fine.  After that, it fails with
a NULL pointer dereference until we get to commits c37366742baa ("dm
mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer",
2018-03-12) and e8f74a0f0011 ("dm mpath: eliminate need to use
scsi_device_from_queue", 2018-03-12), both of which say they fix
8d47e65948dd.  From commit e8f74a0f0011 on, I see some warnings from
the workqueue code, and the system does not find its root disk.

Here are some annotated logs.

The crash from e8f74a0f0011 until c37366742baa looks like this:

[   38.141575] device-mapper: multipath service-time: version 0.3.0 loaded
[   38.16] alua: device handler registered
[   38.144695] sd 0:2:4:0: alua: supports implicit TPGS
[   38.144808] sd 0:2:4:0: alua: device t10.IBM IPR-0   5EC28CA0 
port group c4ab rel port c4ab
[   38.144988] sd 1:2:4:0: alua: supports implicit TPGS
[   38.145089] sd 1:2:4:0: alua: device t10.IBM IPR-0   5EC28CA0 
port group c4a6 rel port c4a6
[   38.159134] sd 0:2:4:0: alua: transition timeout set to 60 seconds
[   38.159244] sd 0:2:4:0: alua: port group c4ab state A preferred supports 
TOlUSNA
[   38.159321] sd 1:2:4:0: alua: transition timeout set to 60 seconds
[   38.159382] sd 1:2:4:0: alua: port group c4a6 state N non-preferred supports 
TOlUSNA
[   38.159521] sd 0:2:4:0: alua: port group c4ab state A preferred supports 
TOlUSNA
[   38.159598] Unable to handle kernel paging request for data at address 
0x
[   38.159669] Faulting instruction address: 0xc016e9e0
[   38.159731] Oops: Kernel access of bad area, sig: 11 [#1]
[   38.159779] LE SMP NR_CPUS=1024 NUMA PowerNV
[   38.159829] Modules linked in: scsi_dh_alua dm_service_time radeon(+) 
lpfc(+) nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_multipath 
drm drm_panel_orientation_quirks i2c_core cxgb3(+) mlx5_core(+) mdio
[   38.160067] CPU: 66 PID: 409 Comm: kworker/66:0 Not tainted 4.16.0-rc5-kvm 
#118
[   38.160142] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
[   38.160190] NIP:  c016e9e0 LR: c016ec4c CTR: c016eca0
[   38.160261] REGS: c00ff3597720 TRAP: 0300   Not tainted  (4.16.0-rc5-kvm)
[   38.160332] MSR:  90009033   CR: 22008024  
XER: 2000
[   38.160408] CFAR: c000884c DAR:  DSISR: 4000 
SOFTE: 1 
[   38.160408] GPR00: c016ec4c c00ff35979a0 c149a100 
c00fd9a10750 
[   38.160408] GPR04: 0003 0001  
 
[   38.160408] GPR08: c00ff3597a30   
dbdc52a0 
[   38.160408] GPR12: 2200 cfd6d600 c00fe0ec31c0 
c00fe0ec3000 
[   38.160408] GPR16:  dd962be8 dd962c10 
c00fe0ec2e00 
[   38.160408] GPR20: 0001 c00fe0ec2e38 c00ff3597a30 
0001 
[   38.160408] GPR24:   0003 
 
[   38.160408] GPR28: c00fd9a10758 0001 ffe8 
c00fd9a10750 
[   38.161035] NIP [c016e9e0] __wake_up_common+0x90/0x240
[   38.161096] LR [c016ec4c] __wake_up_common_lock+0xbc/0x110
[   38.161155] Call Trace:
[   38.161182] [c00ff35979a0] [c0e32570] 
num_spec.62125+0x1c6ed4/0x21e824 (unreliable)
[   38.161266] [c00ff3597a10] [c016ec4c] 
__wake_up_common_lock+0xbc/0x110
[   38.161340] [c00ff3597aa0] [dbdc34d8] pg_init_done+0x1b8/0x330 
[dm_multipath]
[   38.161411] [c00ff3597b40] [dd9604c4] alua_rtpg_work+0x194/0xeb0 
[scsi_dh_alua]
[   38.161486] [c00ff3597c90] [c012f9d0] 
process_one_work+0x1a0/0x470
[   38.161558] [c00ff3597d30] [c012fd38] worker_thread+0x98/0x560
[   38.161618] [c00ff3597dc0] [c01387a4] kthread+0x164/0x1b0
[   38.161680] [c00ff3597e30] [c000bca0] 
ret_from_kernel_thread+0x5c/0xbc
[   38.161751] Instruction dump:
[   38.161787] 6000 6000 41920010 8136 792af7e3 40820100 ebdc0008 
3bdeffe8 
[   38.161861] 3b9c0008 393e0018 7fbc4840 419e00a4  3b60 3be8 
481c 
[   38.161992] ---[ end trace 17021c073d1d ]---

After e8f74a0f0011, the logs look like this:

[   38.042638] device-mapper: multipath service-time: version 0.3.0 loaded
[   38.049825] WARNING: CPU: 16 PID: 1414 at 
/home/paulus/kernel/kvm/kernel/workqueue.c:1513 __queue_delayed_work+0x10c/0x130
[   38.049936] Modules linked in: dm_service_time lpfc(+) radeon(+) nvme_fc 
nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit drm_kms_helper tg3 
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm dm_mult

Re: [PATCH v5 3/7] scsi_io_completion_nz_result function added

2018-03-29 Thread Johannes Thumshirn
Looks good (as far as I can tell)
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v5 2/7] scsi_io_completion rename variables

2018-03-29 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850