RE: [BUG] hpsa: Controller lockup detected: 0x00150028

2015-05-22 Thread Handzik, Joe
No, the problem here (iirc) actually dealt with buffers in the firmware.

Don or Mark, agree?

Joe

-----Original Message-----
From: Peter Zijlstra [mailto:pet...@infradead.org] 
Sent: Friday, May 22, 2015 11:40 AM
To: Tomas Henzl
Cc: Oelke, Mark; don.br...@pmcs.com; ISS StorageDev; storage...@pmcs.com; 
linux-scsi@vger.kernel.org
Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028

On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> >> I've updated to 6.62 and it appears to be working now; or rather, it has

I've since gotten 6.64 from HP to test; which does not seem public yet.

6.64 actually fixes the issue for me.

> An older issue for mptsas seems to handle a similar case
> 2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
> that might be for hpsa -

> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
>  	if (sd != NULL)
>  		sdev->hostdata = sd;
>  	spin_unlock_irqrestore(&h->devlock, flags);
> +
> +	blk_queue_dma_alignment(sdev->request_queue, 512 - 1);
>  	return 0;
>  }

That does indeed seem _very_ similar; I'll have to defer to Mark Oelke
and/or Don Brace to say whether the above is a useful alternative, since
they now seem to know what the root cause was.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
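
For readers following the proposed hpsa_slave_alloc change: the second argument to blk_queue_dma_alignment() is, as far as I understand it, a mask rather than a size, so 512 - 1 asks that buffers be aligned to 512 bytes. The sketch below is a user-space model of that kind of mask check only. The helper name buffer_is_aligned is hypothetical and is not a kernel function, and nothing in the thread confirms that this change addresses the lockup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space helper (not kernel code): models an
 * alignment-mask check of the sort a mask such as 512 - 1 implies.
 */
static bool buffer_is_aligned(uintptr_t addr, size_t len, unsigned int mask)
{
	/* A valid mask is one less than a power of two, e.g. 511. */
	assert(((mask + 1) & mask) == 0);

	/* Neither the start address nor the length may have bits under the mask. */
	return ((addr | len) & mask) == 0;
}
```

With a mask of 511, a buffer at 0x1000 of length 512 passes, while one at 0x1004, or one of length 100, does not.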


RE: Bad tag value in scsi-mq.4

2014-08-19 Thread Handzik, Joe
I may take another look, but I was able to get this working on my RHEL7 box 
(with the kernel boot parameter present). So I either consistently used an 
earlier version of GRUB incorrectly (possibly I somehow reset to defaults when 
I double-checked my boot parameters in the earlier version) or found some 
strange bug. I'm not too concerned, since other members of the team haven't had 
problems and have been using the SCSI mq code more than I have. Sorry for the 
false alarm!

Joe
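
Since the false alarm came down to whether the boot parameter was really present, here is a minimal sketch of checking a kernel command line string for an exact parameter token. The helper name cmdline_has_param is hypothetical; on a running system the string would typically come from /proc/cmdline, which this sketch does not read itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `param` appears in `cmdline`
 * as a whole space-separated token (so "x=Y" does not match "x=YES").
 */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t plen = strlen(param);
	const char *p = cmdline;

	assert(plen > 0);

	while ((p = strstr(p, param)) != NULL) {
		bool starts_token = (p == cmdline) || p[-1] == ' ';
		char end = p[plen];
		bool ends_token = end == '\0' || end == ' ' || end == '\n';

		if (starts_token && ends_token)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}
```

Calling it with the parameter from this thread, "scsi_mod.use_blk_mq=Y", against the booted command line would show whether the option survived the GRUB edit.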

-----Original Message-----
From: h...@infradead.org [mailto:h...@infradead.org] 
Sent: Tuesday, August 19, 2014 1:05 PM
To: Handzik, Joe
Cc: h...@infradead.org; linux-scsi@vger.kernel.org; 
scame...@beardog.cce.hp.com; Scales, Webb; Teel, Scott Stacy
Subject: Re: Bad tag value in scsi-mq.4

On Tue, Aug 05, 2014 at 07:47:12PM +0000, Handzik, Joe wrote:
> Yeah, we thought about that one. We call scsi_activate_tcq if our scsi_device 
> has tagged_supported set within hpsa_change_queue_type (our 
> .change_queue_type entry into the scsi_host_template). Also made sure I was 
> booting with the "scsi_mod.use_blk_mq=Y" option, which makes no difference 
> either way.

Can you add some tracing to catch this?  On the non-mq path requests
start out with ->tag set to -1 and blk_queue_start_tag, which is called
from scsi_request_fn sets it up.  Adding printks in that area should
help you to find the culprit.
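
The behaviour described above can be modelled in user space as follows. This is a simplified sketch, not the block layer's code: every name in it (model_queue, model_request, model_start_tag) is hypothetical, and the fprintf calls stand in for the printks suggested above. It only illustrates why a request still reports -1 when tagging was never enabled on the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NO_TAG (-1)

/* Hypothetical model of a queue; not the kernel's struct request_queue. */
struct model_queue {
	bool tagging_enabled;	/* stands in for the effect of scsi_activate_tcq */
	int next_tag;
};

/* Hypothetical model of a request; not the kernel's struct request. */
struct model_request {
	int tag;
};

/* Requests start out untagged, i.e. with tag == -1. */
static void model_request_init(struct model_request *rq)
{
	rq->tag = NO_TAG;
}

/* Assign a tag only when tagging is enabled, tracing either outcome. */
static int model_start_tag(struct model_queue *q, struct model_request *rq)
{
	assert(rq->tag == NO_TAG);	/* a request is tagged at most once */

	if (!q->tagging_enabled) {
		fprintf(stderr, "trace: tagging disabled, tag stays %d\n", rq->tag);
		return rq->tag;
	}
	rq->tag = q->next_tag++;
	fprintf(stderr, "trace: assigned tag %d\n", rq->tag);
	return rq->tag;
}
```

In this model a driver that reads the tag from a queue with tagging disabled sees -1, which matches the symptom reported at the start of the thread.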



RE: Bad tag value in scsi-mq.4

2014-08-05 Thread Handzik, Joe
Yeah, we thought about that one. We call scsi_activate_tcq if our scsi_device 
has tagged_supported set within hpsa_change_queue_type (our .change_queue_type 
entry into the scsi_host_template). Also made sure I was booting with the 
"scsi_mod.use_blk_mq=Y" option, which makes no difference either way.

-----Original Message-----
From: h...@infradead.org [mailto:h...@infradead.org] 
Sent: Tuesday, August 05, 2014 2:28 PM
To: Handzik, Joe
Cc: h...@infradead.org; linux-scsi@vger.kernel.org; 
scame...@beardog.cce.hp.com; Scales, Webb; Teel, Scott Stacy
Subject: Re: Bad tag value in scsi-mq.4

On Tue, Aug 05, 2014 at 06:04:07PM +0000, Handzik, Joe wrote:
> Hey Christoph,
> 
> Using the scsi-mq.4 branch from git://git.infradead.org/users/hch/scsi.git, 
> I'm getting a -1 returned from scsi_cmnd->request->tag...very unsure why 
> that would be. It happens without any drives attached to the controller, not 
> sure if that's relevant in any way.

You're using the non-mq code path, in which tagging needs to be enabled,
does your driver call scsi_activate_tcq?



RE: [PATCH] hpsa: refine the pci enble/disable handling

2014-07-07 Thread Handzik, Joe
I'll take a look when I get a chance, Steve's out on vacation all week. At 
first glance it looks good, but like you said...gotta test it.

Joe

-----Original Message-----
From: Tomas Henzl [mailto:the...@redhat.com] 
Sent: Monday, July 07, 2014 9:13 AM
To: 'linux-scsi@vger.kernel.org'
Cc: stephenmcame...@gmail.com; michael.mil...@canonical.com; Handzik, Joe
Subject: Re: [PATCH] hpsa: refine the pci enble/disable handling

Steve, Joe,
any chance you could review this patch and verify the sw reset case?
Thanks, Tomas

On 06/12/2014 05:29 PM, Tomas Henzl wrote:
> When a second (kdump) kernel starts and the hard reset method is used, 
> the driver calls pci_disable_device without previously enabling the 
> device, so the kernel shows a warning -
> [   16.876248] WARNING: at drivers/pci/pci.c:1431 pci_disable_device+0x84/0x90()
> [   16.882686] Device hpsa
> disabling already-disabled device
> ...
> This patch fixes that. In addition, I tried to balance some other 
> enable/disable pairs in the driver.
> Unfortunately, I wasn't able to verify the sw reset case because I 
> lack the proper hw.
>
> Signed-off-by: Tomas Henzl 
> ---
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 5858600..67c41b9 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -5983,7 +5983,6 @@ static int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
>  	/* Turn the board off.  This is so that later pci_restore_state()
>  	 * won't turn the board on before the rest of config space is ready.
>  	 */
> -	pci_disable_device(pdev);
>  	pci_save_state(pdev);
>  
>  	/* find the first memory BAR, so we can find the cfg table */
> @@ -6031,11 +6030,6 @@ static int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
>  		goto unmap_cfgtable;
>  
>  	pci_restore_state(pdev);
> -	rc = pci_enable_device(pdev);
> -	if (rc) {
> -		dev_warn(&pdev->dev, "failed to enable device.\n");
> -		goto unmap_cfgtable;
> -	}
>  	pci_write_config_word(pdev, 4, command_register);
>  
>  	/* Some devices (notably the HP Smart Array 5i Controller)
> @@ -6548,6 +6542,12 @@ static int hpsa_init_reset_devices(struct pci_dev *pdev)
>  	if (!reset_devices)
>  		return 0;
>  
> +	rc = pci_enable_device(pdev);
> +	if (rc) {
> +		dev_warn(&pdev->dev, "failed to enable device.\n");
> +		return -ENODEV;
> +	}
> +
>  	/* Reset the controller with a PCI power-cycle or via doorbell */
>  	rc = hpsa_kdump_hard_reset_controller(pdev);
>  
> @@ -6556,10 +6556,11 @@ static int hpsa_init_reset_devices(struct pci_dev *pdev)
>  	 * "performant mode".  Or, it might be 640x, which can't reset
>  	 * due to concerns about shared bbwc between 6402/6404 pair.
>  	 */
> -	if (rc == -ENOTSUPP)
> -		return rc; /* just try to do the kdump anyhow. */
> -	if (rc)
> -		return -ENODEV;
> +	if (rc) {
> +		if (rc != -ENOTSUPP) /* just try to do the kdump anyhow. */
> +			rc = -ENODEV;
> +		goto out_disable;
> +	}
>  
>  	/* Now try to get the controller to respond to a no-op */
>  	dev_warn(&pdev->dev, "Waiting for controller to respond to no-op\n");
> @@ -6570,7 +6571,11 @@ static int hpsa_init_reset_devices(struct pci_dev *pdev)
>  			dev_warn(&pdev->dev, "no-op failed%s\n",
>  					(i < 11 ? "; re-trying" : ""));
>  	}
> -	return 0;
> +
> +out_disable:
> +
> +	pci_disable_device(pdev);
> +	return rc;
>  }
>  
>  static int hpsa_allocate_cmd_pool(struct ctlr_info *h)
> @@ -6722,6 +6727,7 @@ static void hpsa_undo_allocations_after_kdump_soft_reset(struct ctlr_info *h)
>  		iounmap(h->transtable);
>  	if (h->cfgtable)
>  		iounmap(h->cfgtable);
> +	pci_disable_device(h->pdev);
>  	pci_release_regions(h->pdev);
>  	kfree(h);
>  }
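
The enable/disable balancing that the patch aims for can be illustrated with a small user-space model. This is a simplified sketch under stated assumptions, not the kernel's PCI code: model_dev, model_enable, model_disable, model_hard_reset and model_init_reset are all hypothetical names, and the model skips the disable on an unbalanced call instead of reproducing the kernel's exact counter behaviour. It shows an enable paired with a single disable on every exit path through an out_disable label, and the warning that an unpaired disable would trigger.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI device's enable count. */
struct model_dev {
	int enable_cnt;
	int warnings;	/* number of unbalanced disable calls seen */
};

static int model_enable(struct model_dev *d)
{
	d->enable_cnt++;
	return 0;
}

static void model_disable(struct model_dev *d)
{
	if (d->enable_cnt <= 0) {
		/* Mirrors the text of the warning quoted in the patch description. */
		fprintf(stderr, "disabling already-disabled device\n");
		d->warnings++;
		return;
	}
	d->enable_cnt--;
}

/* Stand-in for the hard reset; `result` is the error code to simulate. */
static int model_hard_reset(struct model_dev *d, int result)
{
	assert(d->enable_cnt > 0);	/* the reset expects an enabled device */
	return result;
}

/* Enable once, and disable once on every path after that point. */
static int model_init_reset(struct model_dev *d, int reset_result)
{
	int rc = model_enable(d);

	if (rc)
		return rc;	/* nothing to undo yet */

	rc = model_hard_reset(d, reset_result);
	if (rc)
		goto out_disable;

	/* polling the controller with a no-op would go here */

out_disable:
	model_disable(d);
	return rc;
}
```

In this model both the success path and the failure path leave the count at zero with no warning, whereas a stray extra model_disable call is what produces the "already-disabled" message.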
