RE: [PATCH RESEND] scsi: qla2xxx: I/Os timing out on surprise removal of

2018-10-23 Thread Kuzeja, William
This is still a bug in 4.20/scsi-queue. I am sending a new patch that applies 
cleanly to 
4.20/scsi-queue as it stands now. I have tested it successfully.

Regards

  -Bill

-Original Message-
From: Martin K. Petersen [mailto:martin.peter...@oracle.com] 
Sent: Friday, October 19, 2018 6:24 PM
To: Kuzeja, William 
Cc: linux-scsi@vger.kernel.org; qla2xxx-upstr...@qlogic.com
Subject: Re: [PATCH RESEND] scsi: qla2xxx: I/Os timing out on surprise removal 
of


Bill,

> When doing a surprise removal of an adapter, some in flight I/Os can
> get stuck and take a while to complete (they actually timeout and are
> retried). We are not handling an early error exit from
> qla2xxx_eh_abort properly.

This doesn't apply to 4.20/scsi-queue and the surrounding code has
changed significantly.

-- 
Martin K. Petersen  Oracle Linux Engineering



RE: [PATCH 3/3] qla2x00: fix init error handling

2018-03-08 Thread Kuzeja, William
On Thu, Mar 08, 2018 at 08:45:25AM, Meelis Roos wrote:
> When firmware init fails, qla2x00_probe_one() does double free of req and rsp
> queues and possibly other structures allocated by qla2x00_mem_alloc().

> Fix it by pulling out qla2x00_mem_free() and qla2x00_free_queues() invocations
> from qla2x00_free_device() and call them manually where needed, and also zero
> the req and rsp pointers after freeing them once in the error handler of
> qla2x00_probe_one().

> This fixes memory corruption and further crashes in unrelated code when 
> qla2200
> init fails for some reason.

> Signed-off-by: Meelis Roos 

Hi Meelis,

This issue should already be addressed by a very recent commit:

6a2cf8d3663e13e1 scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe 
failure

Furthermore, the additions in qla2x00_remove_one of:

+   qla2x00_mem_free(ha);
+
+   qla2x00_free_queues(ha);
+

are unnecessary. These routines are already called by qla2x00_free_device just 
above
in qla2x00_remove_one.

Regards,

   -bk




RE: [PATCH] scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure

2018-02-27 Thread Kuzeja, William

Thanks for the quick reply and analysis Hannes! My apologies in advance, I'm 
stuck on 
Outlook here at work - I'll try to format this to be readable (hopefully it 
doesn't get
mangled).

On 02/27/2018 06:01 AM, Hannes Reinecke wrote:

> Hmm. Isn't is rather the case that the labels and gotos are mixed up?
> We have this order of labels:

> probe_init_failed
> probe_failed
> probe_hw_failed
> iospace_config_failed
> disable_device

> but this potential calling order:

> disable_device
> iospace_config_failed
> probe_hw_failed
> probe_hw_failed
> probe_init_failed
> probe_failed
> probe_failed
> probe_failed

> So it looks to me as if the 'probe_init_failed' and 'probe_failed'
> labels should be exchanged...
> But yeah, we need to fix this up.
> I've run into this issue myself recently :-(

So, that is partly the problem. But.if we switch the order, we still have 
problems.
Consider the code with order switched:

probe_failed:

qla2x00_free_device(base_vha);
scsi_host_put(base_vha->host);

probe_init_failed:
qla2x00_free_req_que(ha, req);
ha->req_q_map[0] = NULL; alloc_queues succeeds
clear_bit(0, ha->req_qid_map);
qla2x00_free_rsp_que(ha, rsp);
ha->rsp_q_map[0] = NULL;
clear_bit(0, ha->rsp_qid_map);
ha->max_req_queues = ha->max_rsp_queues = 0;

probe_hw_failed:
qla2x00_mem_free(ha);
qla2x00_free_req_que(ha, req);
qla2x00_free_rsp_que(ha, rsp);
qla2x00_clear_drv_active(ha);

1) We call probe_init_failed only if qla2x00_alloc_queues fails. But if this 
routine fails,
ha->req_q_map = NULL (as well as rsp_q_map), so we crash doing 
ha->req_q_map[0] = NULL. So, I can remove these assignments. But the
qla2x00_free_req_que calls are redundant (they are done below), so we're left 
with clearing out max_req_queues and max_rsp_queues for this label. Since 
we're freeing ha anyways, I'm not sure if this is worth having another label 
for.

2) We still have the problem where in probe_failed, qla2x00_free_device does 
frees which are repeated later on in probe_hw_failed. 

qla2x00_free_device -> qla2x00_mem_free
qla2x00_free_device -> qla2x00_free_queues -> qla2x00_free_req_que

3) Also, I still need another label for when qla2x00_mem_alloc fails. We can't 
just
jump to probe_hw_failed.

I suppose we could do something like this:

probe_failed:
probe_failed = 1;

qla2x00_free_device(base_vha);
scsi_host_put(base_vha->host);

probe_init_failed:
ha->max_req_queues = ha->max_rsp_queues = 0;

probe_hw_failed:
if (!probe_failed) {
  qla2x00_mem_free(ha);
  qla2x00_free_req_que(ha, req);
  qla2x00_free_rsp_que(ha, rsp);
}
 probe_hw_failed_no_mem:
qla2x00_clear_drv_active(ha);

This is essentially what I have, except my version does not have the 
probe_init_failed 
label. Thoughts??

Thanks again and regards.

---




RE: [PATCH] sd: Limit WRITE SAME / WRITE SAME(16) w/UNMAP length for certain devices

2017-09-19 Thread Kuzeja, William
Ewan,

I like it, more generic than my patch. I never saw the other cases, so I 
limited my 
patch to WS16.

Acked-by: Bill Kuzeja 

On Tue-09-19 at 12:14 Ewan D. Milne wrote: 
> Some devices do not support a WRITE SAME / WRITE SAME(16) with the
> UNMAP
> bit set up to the length specified in the MAXIMUM WRITE SAME LENGTH
> field
> in the block limits VPD page (or, the field is zero, indicating there is
> no limit).  Limit the length by the MAXIMUM UNMAP LBA COUNT value.
> Otherwise the command might be rejected.
> 
> Signed-off-by: Ewan D. Milne 
> ---
>  drivers/scsi/sd.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 6549e5c..fa07ac2 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -693,6 +693,7 @@ static void sd_config_discard(struct scsi_disk *sdkp,
> unsigned int mode)
>   struct request_queue *q = sdkp->disk->queue;
>   unsigned int logical_block_size = sdkp->device->sector_size;
>   unsigned int max_blocks = 0;
> + u32 max_ws_blocks = sdkp->max_ws_blocks;
> 
>   q->limits.discard_alignment =
>   sdkp->unmap_alignment * logical_block_size;
> @@ -701,6 +702,13 @@ static void sd_config_discard(struct scsi_disk *sdkp,
> unsigned int mode)
>   sdkp->unmap_granularity * logical_block_size);
>   sdkp->provisioning_mode = mode;
> 
> + /*
> +  * Some devices limit WRITE SAME / WRITE SAME(16) w/UNMAP
> +  * by MAXIMUM UNMAP LBA COUNT instead of MAXIMUM WRITE
> SAME LENGTH.
> +  * Use the smaller of the two values to avoid ILLEGAL REQUEST
> errors.
> +  */
> + max_ws_blocks = min_not_zero(max_ws_blocks, sdkp-
> >max_unmap_blocks);
> +
>   switch (mode) {
> 
>   case SD_LBP_FULL:
> @@ -715,17 +723,17 @@ static void sd_config_discard(struct scsi_disk *sdkp,
> unsigned int mode)
>   break;
> 
>   case SD_LBP_WS16:
> - max_blocks = min_not_zero(sdkp->max_ws_blocks,
> + max_blocks = min_not_zero(max_ws_blocks,
> (u32)SD_MAX_WS16_BLOCKS);
>   break;
> 
>   case SD_LBP_WS10:
> - max_blocks = min_not_zero(sdkp->max_ws_blocks,
> + max_blocks = min_not_zero(max_ws_blocks,
> (u32)SD_MAX_WS10_BLOCKS);
>   break;
> 
>   case SD_LBP_ZERO:
> - max_blocks = min_not_zero(sdkp->max_ws_blocks,
> + max_blocks = min_not_zero(max_ws_blocks,
> (u32)SD_MAX_WS10_BLOCKS);
>   break;
>   }
> --
> 2.1.0