Re: [RFC 0/2] target refcounting infrastructure fixes for usb
Hi, On 01/03/2014 01:45 AM, Sarah Sharp wrote: On Sat, Dec 21, 2013 at 03:51:25PM -0500, Alan Stern wrote: On Fri, 20 Dec 2013, Sarah Sharp wrote: On Thu, Dec 19, 2013 at 11:48:47AM -0800, James Bottomley wrote: It should apply incrementally on top of the previous two. If it actually works, I'll fold it into the first patch. Well, it doesn't oops anymore, but it does generate a pile of warnings: This was a simple oversight. scsi_target_reap() was called at the start of __scsi_remove_device(), but it should be called at the end. The patch below, applied on top of James's three patches, will fix the warnings. Alan Stern P.S.: The comment isn't entirely correct any more... Ok, Alan's additional patch fixed the warnings I was seeing on UAS device unplug. James, can you Cc me on the finished patch when you send it in? Hans, I don't want to send the UAS patches off to Greg until James' patches get into mainline. I believe Greg's usb-next tree is frozen at this point, so they'll have to wait until 3.15. Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give time to shake things out was fine. But since the start of the 3.13 cycle, there have been no issues found in the xhci / uas code. Yes it triggered an existing problem in another subsys, but the code itself has been issue free all this time. If Greg's tree is indeed already frozen I would rather have us asking an exception for this. Alternatively we could add all of it to 3.14 except for the patch removing the BROKEN marking from uas. Or at least all the non uas bits, which are useful to have by themselves. James, what is the status of getting the fix for the refcount issue into mainline ? Regards, Hans -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of block-integrity
On 12/22/2013 08:21 PM, Christoph Hellwig wrote: We have the block integrity code to support DIF/DIX in the the tree for about 5 and a half years, and we still don't have a single consumer of it. By normal kernel rules it should never have been merged, or at least the bitrot long removed. Given that we'll have a lot of work to do in this area with block multiqueue I think it's time to either kill it off for good or make sure we can actually use and test it. Which would make an ideal topic for LSF, wouldn't it? Personally, I doubt it's a good idea to kill it off, but a proper (userland) API for it has been a long time missing. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mpt2sas SATA spinup behavior
Thanks for the reply, Rob. The spinup count and spinup interval were 2 targets and 2 seconds. I changed the values to 8 targets and 0 seconds. The change did not alter the behavior I'm seeing. To spin down the drives I'm sending an ATA STANDBY IMMEDIATE command via ATA Pass-Through SG_IO. The controller is definitely aware of this because LSI's sas2ircu utility reports the drive status as Available when in standby as opposed to Ready when spinning. I'm running an IT firmware and these drives are JBOD as far as the controller is concerned. --Larkin On 1/2/2014 5:02 PM, Elliott, Robert (Server Storage) wrote: In SAS systems, it's common for controllers and expanders to rotate spinup permission across phys to avoid overloading the power supply (e.g., if the system is sized to support 8 x 18 W drives, it might not work if 8 x 45 W are briefly consumed). The most common algorithm allows n phys to spinup at a time, waiting m seconds for the next set. It would be prudent for a controller in an unknown server to be very cautious and use large values. A web search shows the LSI WebBIOS Configuration Utility offers these controls: Spinup Drive Count Controls the number of drives that spin up simultaneously. The default is 2 drives. Spinup Delay Controls the interval (in seconds) between the spinup of drives connected to this controller. The delay prevents a drain on the system power supply that would occur if all drives spun up at the same time. The default is 12 seconds. For SAS drives, spinup is controlled with the NOTIFY (ENABLE SPINUP) primitive. This occurs at power on and during any subsequent power draws due to changing back to the active power condition using START STOP UNIT commands (from stopped) or media access commands (from standby). For SATA drives, spinup is controlled at power on by delaying the initial OOB signal exchange. There is no easy way to control spinup thereafter if changing back to the active power condition with a media access command (from standby). How are you spinning down these drives - with the SCSI START STOP UNIT command being translated by the LSI controller's SCSI-to-ATA translation layer into an ATA STANDBY IMMEDIATE command? LSI might be keeping track of the drive's power condition and delaying media access commands per the spinup rotation timing, while Marvell might be just passing the commands through. --- Rob ElliottHP Server Storage -Original Message- From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- ow...@vger.kernel.org] On Behalf Of Larkin Lowrey Sent: Tuesday, 31 December, 2013 5:42 PM To: linux-scsi@vger.kernel.org Subject: mpt2sas SATA spinup behavior I'm seeing odd behavior while spinning up SATA drives on my LSI SAS 2008 controller. I have 8 drives I keep spun down (most of the time). I wrote a tool to spin them all up at the same time by reading a sector from each drive (one thread per drive). Four of the drives are connected to a Marvell controller (mvsas) and the other four to an LSI 2008 (mpt2sas). The four on the mvsas controller all finish spinning up after ~13s. The four on the mpt2sas controller finish after 40s. The mpt2sas drives, when spun up individually, will complete spin up in ~9s (except one at 13s). It appears that each of the four drives are being accessed sequentially instead of in parallel and that they must all complete their spin up before any can complete their I/O. The mvsas drives, on the other hand, perform their spin-up I/O in parallel (different brand drive, 13s spin-up). Is there something unique to the LSI 2008 that requires SATA spin-up to be handled this way (sequentially)? I see no errors in dmesg/syslog. Are there any debug facilities that might shed light on what's going on? Can anyone recommend areas in the source code where I might start hunting for a root cause? Here's some additional detail. My tool watches for activity on any member drive and when there is activity on one it will spin up the remaining drives. In this first case I kicked an mpt2sas drive so the remaining 3 would be spun up along with the 4 mvsas drives. The three large numbers in brackets are milliseconds since the beginning of time. The first is the timestamp right before the block device is opened (O_DIRECT), the second is after open but before read(), and the third is after the read has completed. Interestingly, the open() does not complete for the 3 mpt2sas drives until 9s after the trigger drive was kicked. So, it appears that all I/O for the remaining drives was blocked while the controller waited for the first drive to respond. That seems bad. Dec 31 16:26:14 fubar hdpwr[5941]: Now spinning: /dev/sdf ST4000DM000-1F2168 s/n:#NRE Dec 31 16:26:14 fubar hdpwr[5941]: Spinning up /dev/sdg ST4000DM000-1F2168 s/n:#C85 Dec 31 16:26:14 fubar hdpwr[5941]: Spinning up /dev/sdh ST4000DM000-1F2168 s/n:#C8M Dec 31
Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On 01/02/2014 11:15 AM, Sarah Sharp wrote: On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote: On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote: 3.12-stable review patch. If anyone has any objections, please let me know. -- From: David Laight david.lai...@aculab.com commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream. Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB can only occur at a boundary between underlying USB frames (512 bytes for high speed devices). If this isn't done the USB frames aren't formatted correctly and, for example, the USB3 ethernet ax88179_178a card will stop sending... Unfortunately this patch causes a regression when copying large files to my outboard USB3 drive. (Nothing at all to do with networking.) Do you have CONFIG_USB_DEBUG turned on for 3.13? If so, you should see dmesg output from this statement shortly before your drive fails: if (num_trbs = TRBS_PER_SEGMENT) { xhci_err(xhci, Too many fragments %d, max %d\n, num_trbs, TRBS_PER_SEGMENT - 1); return -ENOMEM; } Well, the answers depend on whether the usb3 drive uses logical volumes or not (lvm2), which I can't explain. What I've described so far is with lvm2. When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect -- the console prints two or three lines stating that the ext4 journal has quit and the drive is remounted ro. That particular drive stays wedged until the next reboot, but no other ill effects to the system. OTOH, when I put a disk with just an ordinary ext4 partition in the usb3 dock, (no logical volumes) the copy failure becomes catastrophic, with kernel panic messages, leaving the system unresponsive and needing a hard reset to recover. I also tried your other suggestion: diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 4265b48..1a6a43d 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4714,7 +4714,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks) int retval; /* Accept arbitrarily long scatter-gather lists */ - hcd-self.sg_tablesize = ~0; + hcd-self.sg_tablesize = 31; /* support to build packet from discontinuous buffers */ hcd-self.no_sg_constraint = 1; Sadly it didn't fix the problem. Did I get the patch right? Thanks for your help, and I'm happy to try more ideas, as always. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries
On Friday 03 January 2014, Loc Ho wrote: + sata23clk: sata23clk@1f22c000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + clock-names = sata23clk; + }; + + sata45clk: sata45clk@1f23c000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + clock-names = sata45clk; Something is wrong here: You have two devices with the same compatible string but using different clock-names strings. The binding document lists this as an optional property with the description shall be the name of the device clock. If missing, use the device name, which doesn't seem to make any sense. Please fix the binding and the existing users of this, and don't introduce any more broken instances. Since each device clock is documented to have only one parent anyway, please just make it an anonymous clock. Arnd -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/4] ata: Add APM X-Gene SoC SATA host controller driver
On Friday 03 January 2014, Loc Ho wrote: + +/* Controller who PHY shared with SGMII Ethernet PHY */ +#define XGENE_AHCI_SGMII_DTS apm,xgene-ahci-sgmii + +/* Controller who PHY (internal reference clock macro) shared with PCIe */ +#define XGENE_AHCI_PCIE_DTS apm,xgene-ahci-pcie Kill these macros. I've commented on this in the past. Also, there really shouldn't be any difference to the SATA driver based on what PHY is used, so the strings shouldn't be different either. +/* SATA host controller CSR */ +#define SLVRDERRATTRIBUTES_ADDR 0x +#define SLVWRERRATTRIBUTES_ADDR 0x0004 +#define MSTRDERRATTRIBUTES_ADDR 0x0008 +#define MSTWRERRATTRIBUTES_ADDR 0x000c +#define BUSCTLREG_ADDR 0x0014 Are these strings taken literally from the data sheet? You can probably drop the _ADDR part. +#define MSTAWAUX_COHERENT_BYPASS_SET(dst, src) \ + (((dst) ~0x0002) | (((u32)(src)1) 0x0002)) +#define MSTARAUX_COHERENT_BYPASS_SET(dst, src) \ + (((dst) ~0x0001) | (((u32)(src)) 0x0001)) The macros are only used once and don't really help readability. Just open-code them. If you have complex macros like these and use them multiple times, it may be better to use an inline function. +static void xgene_ahci_enable_phy(struct xgene_ahci_context *ctx, + int channel, int enable) +{ + void *mmio = ctx-mmio_base; + u32 val; + + xgene_rd(mmio + PORTCFG_ADDR, val); + val = PORTADDR_SET(val, channel == 0 ? 2 : 3); + xgene_wr_flush(mmio + PORTCFG_ADDR, val); + xgene_rd(mmio + PORTPHY1CFG_ADDR, val); + val = PORTPHY1CFG_FRCPHYRDY_SET(val, enable); + xgene_wr(mmio + PORTPHY1CFG_ADDR, val); +} + +static void xgene_ahci_set_phy_cfg(struct xgene_ahci_context *ctx, int channel) +{ + void *mmio = ctx-mmio_base; + u32 val; + + dev_dbg(ctx-dev, port configure mmio 0x%p channel %d\n, + mmio, channel); + xgene_rd(mmio + PORTCFG_ADDR, val); + val = PORTADDR_SET(val, channel == 0 ? 2 : 3); + xgene_wr_flush(mmio + PORTCFG_ADDR, val); + /* Disable fix rate */ + xgene_wr_flush(mmio + PORTPHY1CFG_ADDR, 0x0001fffe); + xgene_wr_flush(mmio + PORTPHY2CFG_ADDR, 0x5018461c); + xgene_wr_flush(mmio + PORTPHY3CFG_ADDR, 0x1c081907); + xgene_wr_flush(mmio + PORTPHY4CFG_ADDR, 0x1c080815); + xgene_rd(mmio + PORTPHY5CFG_ADDR, val); + /* Window negotiation 0x800 to 0x400 */ + val = PORTPHY5CFG_RTCHG_SET(val, 0x300); + xgene_wr_flush(mmio + PORTPHY5CFG_ADDR, val); + xgene_rd(mmio + PORTAXICFG_ADDR, val); + val = PORTAXICFG_EN_CONTEXT_SET(val, 0x1); /* enable context mgmt */ + val = PORTAXICFG_OUTTRANS_SET(val, 0xe); /* Outstanding */ + xgene_wr_flush(mmio + PORTAXICFG_ADDR, val); +} This looks like it should be part of the PHY driver? + /* + * When both ACPI and DTS are enabled, custom ACPI built-in ACPI + * table, and booting via DTS, we need to skip the probe of the + * built-in ACPI table probe. + */ + if (!efi_enabled(EFI_BOOT) dev-of_node == NULL) + return -ENODEV; I think I've commented on this one before too. The comment talks about ACPI, but the code is about EFI, which is completely unrelated. Neither the code nor the comment seems to make sense here and you wouldn't be in this function in the first place if the device is not defined in ACPI or in DT. + /* Can't use devm_ioremap_resource due to overlapping region */ + hpriv-csr_base = devm_ioremap(dev, res-start, resource_size(res)); + if (!hpriv-csr_base) { + dev_err(dev, can't map %pR\n, res); + return -ENOMEM; + } Why are the regions overlapping? That should not happen! + /* Select ATA */ + if (of_device_is_compatible(pdev-dev.of_node, + XGENE_AHCI_SGMII_DTS)) { + if (xgene_ahci_mux_select(hpriv)) { + dev_err(dev, SATA mux selection failed\n); + return -ENODEV; + } + } What kind of mux is this? Why does the SATA driver need to care about it? It sounds like this should be part of the pinctrl driver. + /* Setup DMA mask */ + rc = dma_set_mask(dev, DMA_BIT_MASK(64)); + if (rc) { + dev_err(dev, Unable to set dma mask\n); + goto error; + } + rc = dma_set_coherent_mask(dev, DMA_BIT_MASK(64)); + if (rc) { + dev_err(dev, Unable to set dma coherent mask\n); + goto error; + } dma_set_mask_and_coherent? + +static const struct acpi_device_id xgene_ahci_acpi_match[] = { + {APMC0D00, 0}, + {}, +}; +MODULE_DEVICE_TABLE(acpi, xgene_ahci_acpi_match); Just drop the ACPI part for now. It's clear that the driver won't work with ACPI in this state, since it would need to
Re: [RFC 2/2] dual scan thread bug fix
On Thu, 2 Jan 2014, James Bottomley wrote: In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* + * release routine already fired. Target is dead, but + * STARGET_DEL may not yet be set (set in the release + * routine), so set here as well, just in case + */ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* + * if we get here and the target is still in the CREATED state that + * means it was allocated but never made visible (because a scan + * turned up no LUNs), so don't call device_del() on it. + */ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Also, it's a little odd that the comment talks about CREATED but the code really checks for RUNNING. They should be consistent. @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* + * serious problem if this triggers: STARGET_DEL is only set in the + * kref release routine, so we're doing another final put on an + * already released kref + */ BUG_ON(starget-state == STARGET_DEL); Here the code is okay but the comment is wrong: STARGET_DEL is set in _two_ places (but neither of them runs until reap_ref has reached 0). Would it be better in scsi_alloc_target() to behave as though the state were STARGET_DEL without actually setting it? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries
Hi, + sata23clk: sata23clk@1f22c000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + clock-names = sata23clk; + }; + + sata45clk: sata45clk@1f23c000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + clock-names = sata45clk; Something is wrong here: You have two devices with the same compatible string but using different clock-names strings. The binding document lists this as an optional property with the description shall be the name of the device clock. If missing, use the device name, which doesn't seem to make any sense. Please fix the binding and the existing users of this, and don't introduce any more broken instances. Since each device clock is documented to have only one parent anyway, please just make it an anonymous clock. Okay I miss understood this... I will need to fix the X-Gene device parent clocks as well in an separate patch to the clock driver owner. -Loc -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries
On Friday 03 January 2014, Loc Ho wrote: Okay I miss understood this... I will need to fix the X-Gene device parent clocks as well in an separate patch to the clock driver owner. Ok, thanks! Arnd -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On Fri, Jan 03, 2014 at 07:40:33AM -0800, walt wrote: On 01/02/2014 11:15 AM, Sarah Sharp wrote: On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote: On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote: 3.12-stable review patch. If anyone has any objections, please let me know. -- From: David Laight david.lai...@aculab.com commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream. Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB can only occur at a boundary between underlying USB frames (512 bytes for high speed devices). If this isn't done the USB frames aren't formatted correctly and, for example, the USB3 ethernet ax88179_178a card will stop sending... Unfortunately this patch causes a regression when copying large files to my outboard USB3 drive. (Nothing at all to do with networking.) Do you have CONFIG_USB_DEBUG turned on for 3.13? If so, you should see dmesg output from this statement shortly before your drive fails: if (num_trbs = TRBS_PER_SEGMENT) { xhci_err(xhci, Too many fragments %d, max %d\n, num_trbs, TRBS_PER_SEGMENT - 1); return -ENOMEM; } Well, the answers depend on whether the usb3 drive uses logical volumes or not (lvm2), which I can't explain. What I've described so far is with lvm2. When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect -- the console prints two or three lines stating that the ext4 journal has quit and the drive is remounted ro. That particular drive stays wedged until the next reboot, but no other ill effects to the system. Odd. In 3.12 xHCI has dynamic debugging, and turning on CONFIG_USB_DEBUG should turn on debugging by default, so it's confusing that you didn't see any messages. Can I see your .config from /boot/? Also, did you try capturing dmesg with `tail -f /var/log/kern.log` or just dmesg? Perhaps you need to run `sudo dmesg -n 7`? OTOH, when I put a disk with just an ordinary ext4 partition in the usb3 dock, (no logical volumes) the copy failure becomes catastrophic, with kernel panic messages, leaving the system unresponsive and needing a hard reset to recover. I also tried your other suggestion: diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 4265b48..1a6a43d 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4714,7 +4714,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks) int retval; /* Accept arbitrarily long scatter-gather lists */ - hcd-self.sg_tablesize = ~0; + hcd-self.sg_tablesize = 31; /* support to build packet from discontinuous buffers */ hcd-self.no_sg_constraint = 1; Sadly it didn't fix the problem. Did I get the patch right? Yes, you did. So perhaps the patch triggers a different bug. I can't tell until I see xHCI debugging output. Thanks for your help, and I'm happy to try more ideas, as always. Thanks for your patience. :) Sarah Sharp -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/4] ata: Add APM X-Gene SoC SATA host controller driver
Hi, +/* Controller who PHY shared with SGMII Ethernet PHY */ +#define XGENE_AHCI_SGMII_DTS apm,xgene-ahci-sgmii + +/* Controller who PHY (internal reference clock macro) shared with PCIe */ +#define XGENE_AHCI_PCIE_DTS apm,xgene-ahci-pcie Kill these macros. I've commented on this in the past. Also, there really shouldn't be any difference to the SATA driver based on what PHY is used, so the strings shouldn't be different either. The only need for this is the MUX when the PHY is shared between SGMII and SATA. If we get rip of the MUX code, we can drop this. MUX discussion is below. +/* SATA host controller CSR */ +#define SLVRDERRATTRIBUTES_ADDR 0x +#define SLVWRERRATTRIBUTES_ADDR 0x0004 +#define MSTRDERRATTRIBUTES_ADDR 0x0008 +#define MSTWRERRATTRIBUTES_ADDR 0x000c +#define BUSCTLREG_ADDR 0x0014 Are these strings taken literally from the data sheet? You can probably drop the _ADDR part. These names are directly from the programming reference manual. I will get rip of the _ADDR for both drivers - host and PHY. +#define MSTAWAUX_COHERENT_BYPASS_SET(dst, src) \ + (((dst) ~0x0002) | (((u32)(src)1) 0x0002)) +#define MSTARAUX_COHERENT_BYPASS_SET(dst, src) \ + (((dst) ~0x0001) | (((u32)(src)) 0x0001)) The macros are only used once and don't really help readability. Just open-code them. If you have complex macros like these and use them multiple times, it may be better to use an inline function. For these particular one, I will get rip of them. +static void xgene_ahci_enable_phy(struct xgene_ahci_context *ctx, + int channel, int enable) +{ + void *mmio = ctx-mmio_base; + u32 val; + + xgene_rd(mmio + PORTCFG_ADDR, val); + val = PORTADDR_SET(val, channel == 0 ? 2 : 3); + xgene_wr_flush(mmio + PORTCFG_ADDR, val); + xgene_rd(mmio + PORTPHY1CFG_ADDR, val); + val = PORTPHY1CFG_FRCPHYRDY_SET(val, enable); + xgene_wr(mmio + PORTPHY1CFG_ADDR, val); +} + +static void xgene_ahci_set_phy_cfg(struct xgene_ahci_context *ctx, int channel) +{ + void *mmio = ctx-mmio_base; + u32 val; + + dev_dbg(ctx-dev, port configure mmio 0x%p channel %d\n, + mmio, channel); + xgene_rd(mmio + PORTCFG_ADDR, val); + val = PORTADDR_SET(val, channel == 0 ? 2 : 3); + xgene_wr_flush(mmio + PORTCFG_ADDR, val); + /* Disable fix rate */ + xgene_wr_flush(mmio + PORTPHY1CFG_ADDR, 0x0001fffe); + xgene_wr_flush(mmio + PORTPHY2CFG_ADDR, 0x5018461c); + xgene_wr_flush(mmio + PORTPHY3CFG_ADDR, 0x1c081907); + xgene_wr_flush(mmio + PORTPHY4CFG_ADDR, 0x1c080815); + xgene_rd(mmio + PORTPHY5CFG_ADDR, val); + /* Window negotiation 0x800 to 0x400 */ + val = PORTPHY5CFG_RTCHG_SET(val, 0x300); + xgene_wr_flush(mmio + PORTPHY5CFG_ADDR, val); + xgene_rd(mmio + PORTAXICFG_ADDR, val); + val = PORTAXICFG_EN_CONTEXT_SET(val, 0x1); /* enable context mgmt */ + val = PORTAXICFG_OUTTRANS_SET(val, 0xe); /* Outstanding */ + xgene_wr_flush(mmio + PORTAXICFG_ADDR, val); +} This looks like it should be part of the PHY driver? There are the corresponding configure on the host controller side. It is not part of the PHY. Each host such as SATA, SGMII, or PCIe has its own version of these registers. It can be completely different between SATA and PCIe for example. + /* + * When both ACPI and DTS are enabled, custom ACPI built-in ACPI + * table, and booting via DTS, we need to skip the probe of the + * built-in ACPI table probe. + */ + if (!efi_enabled(EFI_BOOT) dev-of_node == NULL) + return -ENODEV; I think I've commented on this one before too. The comment talks about ACPI, but the code is about EFI, which is completely unrelated. Neither the code nor the comment seems to make sense here and you wouldn't be in this function in the first place if the device is not defined in ACPI or in DT. Let remove them for the time being. Unless one has an custom ACPI table built in the kernel, it isn't a problem. + /* Can't use devm_ioremap_resource due to overlapping region */ + hpriv-csr_base = devm_ioremap(dev, res-start, resource_size(res)); + if (!hpriv-csr_base) { + dev_err(dev, can't map %pR\n, res); + return -ENOMEM; + } Why are the regions overlapping? That should not happen! Each host controller/PHY includes the following resource: 0x. for the host core register 0x.7000 for the mux select if shared with SGMII 0x.A000 for the PHY indirect access 0x.C000 for the host/PHY clocks 0x.D000 for the RAM shutdown removal As I only used one resource register of size 64K, it overlapped with the 0x.A000 mapped by the PHY already. If you believe it is better that I have two resources
Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)
On Thu, Jan 2, 2014 at 2:31 AM, Suresh Thiagarajan suresh.thiagara...@pmcs.com wrote: On Fri, Dec 27, 2013 at 9:48 PM, Oleg Nesterov o...@redhat.com wrote: On 12/24, Suresh Thiagarajan wrote: Below is a small pseudo code on protecting/serializing the flag for global access. struct temp { ... spinlock_t lock; unsigned long lock_flags; }; void my_lock(struct temp *t) { unsigned long flag; // thread-private variable as suggested spin_lock_irqsave(t-lock, flag); t-lock_flags = flag; //updating inside critical section now to serialize the access to flag } void my_unlock(struct temp *t) { unsigned long flag = t-lock_flags; t-lock_flags = 0; //clearing it before getting out of critical section spin_unlock_irqrestore(t-lock, flag); } Yes, this should work as a quick fix. And you do not need to clear -lock_flags in my_unlock(). But when I look at original patch again, I no longer understand why do you need pm8001_ha-lock_flags at all. Of course I do not understand this code, I am sure I missed something, but at first glance it seems that only this sequence spin_unlock_irq(pm8001_ha-lock); t-task_done(t); spin_lock_irq(pm8001_ha-lock); should be fixed? If yes, why you can't simply do spin_unlock() + spin_lock() around t-task_done() ? This won't enable irqs, but spin_unlock_irqrestore() doesn't necessarily enables irqs too, so -task_done() can run with irqs disabled? And note that the pattern above has a lot of users, perhaps it makes sense to start with something like the patch below? Thanks James, Oleg and all for your inputs. Will start with review and testing this patch and then work/investigate to keep shortest and clearest critical section so that we can have the lock and unlock within the same routine. Fwiw we solved this in libsas a while back with a similar pattern proposed by Oleg: unsigned long flags; local_irq_save(flags); spin_unlock(lock); ... spin_lock_lock(lock); local_irq_restore(flags); See commit 312d3e56119a [SCSI] libsas: remove ata_port.lock management duties from lldds -- Dan -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of block-integrity
Hannes == Hannes Reinecke h...@suse.de writes: Hannes Personally, I doubt it's a good idea to kill it off, but a Hannes proper (userland) API for it has been a long time missing. Before we throw the baby out with the bath water, maybe Darrick can fill us in on the progress of the aio passthrough interface? -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bio_integrity_verify() bug causing READ verify to be silently skipped
nab == Nicholas A Bellinger n...@linux-iscsi.org writes: nab Given that bio_integrity_verify() is using bio_for_each_segment(), nab the loop starts from the updated bio-bi_idx, and not a zero value, nab which ends up skipping individual bio segment calls to nab bi-verify_fn(). That's botched. Verify is meant to be called with the completed bytes before the index is tampered with. nab The following patch changes bio_integrity_verify() to use nab bio_for_each_segment_all() instead of bio_for_each_segment() to nab ensure that the segment walk always starts from zero, regardless of nab the current bio-bi_idx value after bio_advance(). That breaks partial completion, though. I'll take a look at Kent's changes... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#733565: SIX messages per on boot console should be TWO
Ben == Ben Hutchings b...@decadent.org.uk writes: Ben I can see that it is emitted by sd_read_cache_type(), which is Ben called by sd_revalidate_disk(), and that is apparently now called 3 Ben times during probe. Which is quite ridiculous. We have to discover the basics of the disk before we can create the gendisk/block device/request queue. And some of the subsequent parameters we need can't be stored or acted upon until everything has been set up. So some questions we have to ask several times. Ben And I wonder whether this situation (no caching mode page) is Ben really serious enough to deserve logging at ERR severity? I guess we could extend the first_scan logic to cover the error case scenarios. But it is quite unusual for a device to not implement the caching mode page... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] isci: fix reset timeout handling
Remove an erroneous BUG_ON() in the case of a hard reset timeout. The reset timeout handler puts the port into the awaiting link-up state. The timeout causes the device to be disconnected and we need to be in the awaiting link-up state to re-connect the port. The BUG_ON() made the incorrect assumption that resets never timeout and we always complete the reset in the resetting state. Testing this patch also uncovered that libata continues to attempt to reset the port long after the driver has torn down the context. Once the driver has committed to abandoning the link it must indicate to libata that recovery ends by returning -ENODEV from -lldd_I_T_nexus_reset(). Cc: sta...@vger.kernel.org Cc: Maciej Patelczyk maciej.patelc...@intel.com Cc: Dave Jiang dave.ji...@intel.com Acked-by: Lukasz Dorau lukasz.do...@intel.com Reported-by: David Milburn dmilb...@redhat.com Reported-by: Xun Ni xun...@intel.com Tested-by: Xun Ni xun...@intel.com Signed-off-by: Dan Williams dan.j.willi...@intel.com --- v2: Dave reported the build warning regression the last patch caused, and I clarified the changelog a bit. drivers/scsi/isci/port_config.c |7 --- drivers/scsi/isci/task.c|2 +- 2 files changed, 1 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/isci/port_config.c b/drivers/scsi/isci/port_config.c index 85c77f6b802b..ac879745ef80 100644 --- a/drivers/scsi/isci/port_config.c +++ b/drivers/scsi/isci/port_config.c @@ -615,13 +615,6 @@ static void sci_apc_agent_link_up(struct isci_host *ihost, SCIC_SDS_APC_WAIT_LINK_UP_NOTIFICATION); } else { /* the phy is already the part of the port */ - u32 port_state = iport-sm.current_state_id; - - /* if the PORT'S state is resetting then the link up is from -* port hard reset in this case, we need to tell the port -* that link up is recieved -*/ - BUG_ON(port_state != SCI_PORT_RESETTING); port_agent-phy_ready_mask |= 1 phy_index; sci_port_link_up(iport, iphy); } diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c index 0d30ca849e8f..5d6fda72d659 100644 --- a/drivers/scsi/isci/task.c +++ b/drivers/scsi/isci/task.c @@ -801,7 +801,7 @@ int isci_task_I_T_nexus_reset(struct domain_device *dev) /* XXX: need to cleanup any ireqs targeting this * domain_device */ - ret = TMF_RESP_FUNC_COMPLETE; + ret = -ENODEV; goto out; } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi-sg: pass flag to inhibit setting LUN
Dne 3.1.2014 21:21, Martin K. Petersen napsal(a): Jiří == Jiří Pinkava j...@seznam.cz writes: Jiří This patch implements support for inhibiting setting LUN number Jiří for SCSI custom command send via /dev/sgX with ioctl(.., SG_IO, Jiří ...) call. Is it an absolute requirement that this is a per-command option or could we have a scsi_device quirk flag so we didn't have to plumb this through the entire I/O stack? It should be enough to limit it to open file descriptor or sg subsystem. This magic is expected to happen every time device is accessed trough sg subsystem, but not if it is accessed other way (eg. it is still some kind of common disk). My understanding is that sg subsystem allows generic access to device, in extreme case I might want to send command to LUN which is not the active one. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On 01/03/2014 11:54 AM, Sarah Sharp wrote: On Fri, Jan 03, 2014 at 07:40:33AM -0800, walt wrote: On 01/02/2014 11:15 AM, Sarah Sharp wrote: On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote: On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote: 3.12-stable review patch. If anyone has any objections, please let me know. -- From: David Laight david.lai...@aculab.com commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream. Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB can only occur at a boundary between underlying USB frames (512 bytes for high speed devices). If this isn't done the USB frames aren't formatted correctly and, for example, the USB3 ethernet ax88179_178a card will stop sending... Unfortunately this patch causes a regression when copying large files to my outboard USB3 drive. (Nothing at all to do with networking.) Do you have CONFIG_USB_DEBUG turned on for 3.13? If so, you should see dmesg output from this statement shortly before your drive fails: if (num_trbs = TRBS_PER_SEGMENT) { xhci_err(xhci, Too many fragments %d, max %d\n, num_trbs, TRBS_PER_SEGMENT - 1); return -ENOMEM; } Well, the answers depend on whether the usb3 drive uses logical volumes or not (lvm2), which I can't explain. What I've described so far is with lvm2. When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect I'm so sorry Sarah, that was another mistake. The mistake is so stupid I'm not going to publish it here :( Once I finally ran the kernel with debugging actually compiled in, dmesg contains xhci debugging messages. Wow :) It's a big file so I zipped and attached it, which I hope is acceptable in lkml. BTW, this dmesg is from a kernel with sg_tablesize = 31, which as I said before doesn't fix the problem. The cp stopped around 7GB just as before. Sorry for the noise... xhci.dmesg.gz Description: application/gzip
Re: Bug#733565: SIX messages per on boot console should be TWO
MKP == Martin K Petersen martin.peter...@oracle.com writes: MKP We have to discover the basics of the disk before we can create the MKP gendisk/block device/request queue. And some of the subsequent MKP parameters we need can't be stored or acted upon until everything has MKP been set up. So some questions we have to ask several times. Perhaps the messages could be each differentiated so the user doesn't see them as something that looks like a bug and needs to be reported. E.g., prefix/suffix with PHASE I CHECK, PHASE II CHECK, PHASE III CHECK. or FIRST CHECK, INTERMEDIATE CHECK, FINAL CHECK, or CHECK 1, CHECK 2... or INTERMEDIATE PROBE:, FINAL PROBE:, etc. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/2] target refcounting infrastructure fixes for usb
On Fri, Jan 03, 2014 at 09:56:45AM +0100, Hans de Goede wrote: Hi, On 01/03/2014 01:45 AM, Sarah Sharp wrote: Ok, Alan's additional patch fixed the warnings I was seeing on UAS device unplug. James, can you Cc me on the finished patch when you send it in? Hans, I don't want to send the UAS patches off to Greg until James' patches get into mainline. I believe Greg's usb-next tree is frozen at this point, so they'll have to wait until 3.15. Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give time to shake things out was fine. But since the start of the 3.13 cycle, there have been no issues found in the xhci / uas code. I completely understand why you're unhappy. I agree that the pull request you sent me on Nov 18th is fine (aside from not being a GPG signed git tag). [1] I also agree that the UAS and xHCI driver changes have been stable during my testing, other than triggering the SCSI oops on UAS disconnect. Yes it triggered an existing problem in another subsys, but the code itself has been issue free all this time. If Greg's tree is indeed already frozen I would rather have us asking an exception for this. Detach yourself emotionally from your code and look at this request from a maintainer's perspective. You're asking me to push 69 patches to Greg after -rc6 is out, for a driver that's been marked broken for several kernel releases (uas). The patches enable a feature that's been basically untested across all xHCI host controllers (streams), and they add new userspace API for usbfs to expose streams. In the meantime, there's a big push to get code into linux-next at least a week or two before the merge window opens. I sent my last pull request on Dec 20th, and Felipe closed his USB gadget tree on Dec 26th. I had hoped to get the SCSI issue settled so I could send in the UAS patches on the 20th, but that didn't happen. My tree is closed. These fixes should get merged (and will!), but I will not ask Greg for an exception to get these patches into 3.14. Alternatively we could add all of it to 3.14 except for the patch removing the BROKEN marking from uas. Or at least all the non uas bits, which are useful to have by themselves. I think the best way to proceed with this is for me to queue all the xHCI patches in that pull request for usb-next once 3.14-rc1 is out, and then you send a separate pull request to Greg. You're going to need to be able to send him pull requests with a signed git tag in the future anyway. James, what is the status of getting the fix for the refcount issue into mainline ? (Greg, James will be queuing the SCSI fix for the 3.14 merge window.) Sarah Sharp [1] http://marc.info/?l=linux-usbm=138478555324055w=2 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#733565: SIX messages per on boot console should be TWO
Martin == Martin K Petersen martin.peter...@oracle.com writes: Martin I guess we could extend the first_scan logic to cover the error Martin case scenarios. [SCSI] sd: Quiesce mode sense error messages Messages about discovered disk properties are only printed once unless they are found to have changed. Errors encountered during mode sense, however, are printed every time we revalidate. Quiesce mode sense errors so they are only printed during the first scan. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 69725f7c32c1..14d601ed1956 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2283,7 +2283,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer) set_disk_ro(sdkp-disk, 0); if (sdp-skip_ms_page_3f) { - sd_printk(KERN_NOTICE, sdkp, Assuming Write Enabled\n); + sd_first_printk(KERN_NOTICE, sdkp, Assuming Write Enabled\n); return; } @@ -2315,7 +2315,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer) } if (!scsi_status_is_good(res)) { - sd_printk(KERN_WARNING, sdkp, + sd_first_printk(KERN_WARNING, sdkp, Test WP failed, assume Write Enabled\n); } else { sdkp-write_prot = ((data.device_specific 0x80) != 0); @@ -2383,7 +2383,8 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer) if (!data.header_length) { modepage = 6; first_len = 0; - sd_printk(KERN_ERR, sdkp, Missing header in MODE_SENSE response\n); + sd_first_printk(KERN_ERR, sdkp, + Missing header in MODE_SENSE response\n); } /* that went OK, now ask for the proper length */ @@ -2396,7 +2397,7 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer) if (len 3) goto bad_sense; else if (len SD_BUF_SIZE) { - sd_printk(KERN_NOTICE, sdkp, Truncating mode parameter + sd_first_printk(KERN_NOTICE, sdkp, Truncating mode parameter data from %d to %d bytes\n, len, SD_BUF_SIZE); len = SD_BUF_SIZE; } @@ -2419,8 +2420,9 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer) /* We're interested only in the first 3 bytes. */ if (len - offset = 2) { - sd_printk(KERN_ERR, sdkp, Incomplete - mode parameter data\n); + sd_first_printk(KERN_ERR, sdkp, + Incomplete mode parameter + data\n); goto defaults; } else { modepage = page_code; @@ -2434,14 +2436,15 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer) else if (!spf len - offset 1) offset += 2 + buffer[offset+1]; else { - sd_printk(KERN_ERR, sdkp, Incomplete - mode parameter data\n); + sd_first_printk(KERN_ERR, sdkp, + Incomplete mode + parameter data\n); goto defaults; } } } - sd_printk(KERN_ERR, sdkp, No Caching mode page found\n); + sd_first_printk(KERN_ERR, sdkp, No Caching mode page found\n); goto defaults; Page_found: @@ -2455,7 +2458,7 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer) sdkp-DPOFUA = (data.device_specific 0x10) != 0; if (sdkp-DPOFUA !sdkp-device-use_10_for_rw) { - sd_printk(KERN_NOTICE, sdkp, + sd_first_printk(KERN_NOTICE, sdkp, Uses READ/WRITE(6), disabling FUA\n); sdkp-DPOFUA = 0; } @@ -2477,16 +2480,19 @@ bad_sense: sshdr.sense_key == ILLEGAL_REQUEST sshdr.asc == 0x24 sshdr.ascq == 0x0) /* Invalid field in CDB */ - sd_printk(KERN_NOTICE, sdkp, Cache data unavailable\n); + sd_first_printk(KERN_NOTICE, sdkp, Cache data unavailable\n); else - sd_printk(KERN_ERR, sdkp, Asking for cache data failed\n); +
Re: [RFC 0/2] target refcounting infrastructure fixes for usb
Hi, On 01/03/2014 11:00 PM, Sarah Sharp wrote: On Fri, Jan 03, 2014 at 09:56:45AM +0100, Hans de Goede wrote: Hi, On 01/03/2014 01:45 AM, Sarah Sharp wrote: Ok, Alan's additional patch fixed the warnings I was seeing on UAS device unplug. James, can you Cc me on the finished patch when you send it in? Hans, I don't want to send the UAS patches off to Greg until James' patches get into mainline. I believe Greg's usb-next tree is frozen at this point, so they'll have to wait until 3.15. Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give time to shake things out was fine. But since the start of the 3.13 cycle, there have been no issues found in the xhci / uas code. I completely understand why you're unhappy. I agree that the pull request you sent me on Nov 18th is fine (aside from not being a GPG signed git tag). [1] I also agree that the UAS and xHCI driver changes have been stable during my testing, other than triggering the SCSI oops on UAS disconnect. Yes it triggered an existing problem in another subsys, but the code itself has been issue free all this time. If Greg's tree is indeed already frozen I would rather have us asking an exception for this. Detach yourself emotionally from your code and look at this request from a maintainer's perspective. You're asking me to push 69 patches to Greg after -rc6 is out, for a driver that's been marked broken for several kernel releases (uas). The patches enable a feature that's been basically untested across all xHCI host controllers (streams), and they add new userspace API for usbfs to expose streams. Yes 69 patches which have been ready since November 18. Your argument is that they were not in linux-next before now, my unhappiness comes from that they could have been in linux-next for some time now already. In the meantime, there's a big push to get code into linux-next at least a week or two before the merge window opens. I sent my last pull request on Dec 20th, and Felipe closed his USB gadget tree on Dec 26th. I had hoped to get the SCSI issue settled so I could send in the UAS patches on the 20th, but that didn't happen. My tree is closed. These fixes should get merged (and will!), but I will not ask Greg for an exception to get these patches into 3.14. Alternatively we could add all of it to 3.14 except for the patch removing the BROKEN marking from uas. Or at least all the non uas bits, which are useful to have by themselves. I think the best way to proceed with this is for me to queue all the xHCI patches in that pull request for usb-next once 3.14-rc1 is out, and then you send a separate pull request to Greg. You're going to need to be able to send him pull requests with a signed git tag in the future anyway. Since you have all the patches in your tree now anyways I believe it would be best if you just send a pull-req for all of them. I see little value in me sending a separate pull-req for the uas patches. As discussed before I'm fine with picking up uas maintenance from then on. Regards, Hans -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mptNsas MSI-X fixes
We found a couple of MSI-X related problems with mptNsas on very large systems. The patches below contain fixes for mpt2sas and mpt3sas respectively. drivers/scsi/mpt2sas/mpt2sas_base.c | 64 -- drivers/scsi/mpt3sas/mpt3sas_base.c | 73 ++ 2 files changed, 44 insertions(+), 93 deletions(-) -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] [SCSI] mpt2sas: Rework the MSI-X grouping code
From: Martin K. Petersen martin.peter...@oracle.com On systems with a non power-of-two CPU count the existing MSI-X grouping code failed to distribute interrupts correctly. Rework the code to handle arbitrary processor counts. Also remove the hardcoded upper limit on the number of processors so we can boot on large systems. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com CC: Sreekanth Reddy sreekanth.re...@lsi.com --- drivers/scsi/mpt2sas/mpt2sas_base.c | 64 + 1 file changed, 23 insertions(+), 41 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index 3901edc35812..bfdbdf214169 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1332,53 +1332,35 @@ _base_request_irq(struct MPT2SAS_ADAPTER *ioc, u8 index, u32 vector) static void _base_assign_reply_queues(struct MPT2SAS_ADAPTER *ioc) { - struct adapter_reply_queue *reply_q; - int cpu_id; - int cpu_grouping, loop, grouping, grouping_mod; + unsigned int cpu, nr_cpus, nr_msix, index = 0; if (!_base_is_controller_msix_enabled(ioc)) return; memset(ioc-cpu_msix_table, 0, ioc-cpu_msix_table_sz); - /* when there are more cpus than available msix vectors, -* then group cpus togeather on same irq -*/ - if (ioc-cpu_count ioc-msix_vector_count) { - grouping = ioc-cpu_count / ioc-msix_vector_count; - grouping_mod = ioc-cpu_count % ioc-msix_vector_count; - if (grouping 2 || (grouping == 2 !grouping_mod)) - cpu_grouping = 2; - else if (grouping 4 || (grouping == 4 !grouping_mod)) - cpu_grouping = 4; - else if (grouping 8 || (grouping == 8 !grouping_mod)) - cpu_grouping = 8; - else - cpu_grouping = 16; - } else - cpu_grouping = 0; - - loop = 0; - reply_q = list_entry(ioc-reply_queue_list.next, -struct adapter_reply_queue, list); - for_each_online_cpu(cpu_id) { - if (!cpu_grouping) { - ioc-cpu_msix_table[cpu_id] = reply_q-msix_index; - reply_q = list_entry(reply_q-list.next, - struct adapter_reply_queue, list); - } else { - if (loop cpu_grouping) { - ioc-cpu_msix_table[cpu_id] = - reply_q-msix_index; - loop++; - } else { - reply_q = list_entry(reply_q-list.next, - struct adapter_reply_queue, list); - ioc-cpu_msix_table[cpu_id] = - reply_q-msix_index; - loop = 1; - } + + nr_cpus = num_online_cpus(); + nr_msix = ioc-reply_queue_count = min(ioc-reply_queue_count, + ioc-facts.MaxMSIxVectors); + if (!nr_msix) + return; + + cpu = cpumask_first(cpu_online_mask); + + do { + unsigned int i, group = nr_cpus / nr_msix; + + if (index nr_cpus % nr_msix) + group++; + + for (i = 0 ; i group ; i++) { + ioc-cpu_msix_table[cpu] = index; + cpu = cpumask_next(cpu, cpu_online_mask); } - } + + index++; + + } while (cpu nr_cpus); } /** -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] [SCSI] mpt3sas: Rework the MSI-X grouping code
From: Martin K. Petersen martin.peter...@oracle.com On systems with a non power-of-two CPU count the existing MSI-X grouping code failed to distribute interrupts correctly. Rework the code to handle arbitrary processor counts. Also remove the hardcoded upper limit on the number of processors so we can boot on large systems. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Cc: Sreekanth Reddy sreekanth.re...@lsi.com --- drivers/scsi/mpt3sas/mpt3sas_base.c | 73 +++-- 1 file changed, 21 insertions(+), 52 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index fa785062e97b..35b95e61acd5 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1624,66 +1624,35 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index, u32 vector) static void _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc) { - struct adapter_reply_queue *reply_q; - int cpu_id; - int cpu_grouping, loop, grouping, grouping_mod; - int reply_queue; + unsigned int cpu, nr_cpus, nr_msix, index = 0; if (!_base_is_controller_msix_enabled(ioc)) return; memset(ioc-cpu_msix_table, 0, ioc-cpu_msix_table_sz); - /* NUMA Hardware bug workaround - drop to less reply queues */ - if (ioc-reply_queue_count ioc-facts.MaxMSIxVectors) { - ioc-reply_queue_count = ioc-facts.MaxMSIxVectors; - reply_queue = 0; - list_for_each_entry(reply_q, ioc-reply_queue_list, list) { - reply_q-msix_index = reply_queue; - if (++reply_queue == ioc-reply_queue_count) - reply_queue = 0; - } - } + nr_cpus = num_online_cpus(); + nr_msix = ioc-reply_queue_count = min(ioc-reply_queue_count, + ioc-facts.MaxMSIxVectors); + if (!nr_msix) + return; - /* when there are more cpus than available msix vectors, -* then group cpus togeather on same irq -*/ - if (ioc-cpu_count ioc-msix_vector_count) { - grouping = ioc-cpu_count / ioc-msix_vector_count; - grouping_mod = ioc-cpu_count % ioc-msix_vector_count; - if (grouping 2 || (grouping == 2 !grouping_mod)) - cpu_grouping = 2; - else if (grouping 4 || (grouping == 4 !grouping_mod)) - cpu_grouping = 4; - else if (grouping 8 || (grouping == 8 !grouping_mod)) - cpu_grouping = 8; - else - cpu_grouping = 16; - } else - cpu_grouping = 0; - - loop = 0; - reply_q = list_entry(ioc-reply_queue_list.next, -struct adapter_reply_queue, list); - for_each_online_cpu(cpu_id) { - if (!cpu_grouping) { - ioc-cpu_msix_table[cpu_id] = reply_q-msix_index; - reply_q = list_entry(reply_q-list.next, - struct adapter_reply_queue, list); - } else { - if (loop cpu_grouping) { - ioc-cpu_msix_table[cpu_id] = - reply_q-msix_index; - loop++; - } else { - reply_q = list_entry(reply_q-list.next, - struct adapter_reply_queue, list); - ioc-cpu_msix_table[cpu_id] = - reply_q-msix_index; - loop = 1; - } + cpu = cpumask_first(cpu_online_mask); + + do { + unsigned int i, group = nr_cpus / nr_msix; + + if (index nr_cpus % nr_msix) + group++; + + for (i = 0 ; i group ; i++) { + ioc-cpu_msix_table[cpu] = index; + cpu = cpumask_next(cpu, cpu_online_mask); } - } + + index++; + + } while (cpu nr_cpus); } /** -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#733565: SIX messages per on boot console should be TWO
MKP == Martin K Petersen martin.peter...@oracle.com writes: MKP [SCSI] sd: Quiesce mode sense error messages Thanks! -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Fri, 2014-01-03 at 10:58 -0500, Alan Stern wrote: On Thu, 2 Jan 2014, James Bottomley wrote: In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* +* release routine already fired. Target is dead, but +* STARGET_DEL may not yet be set (set in the release +* routine), so set here as well, just in case +*/ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* +* if we get here and the target is still in the CREATED state that +* means it was allocated but never made visible (because a scan +* turned up no LUNs), so don't call device_del() on it. +*/ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Well, I agree with the theory. In practise, there are only a few machine instructions between the kref going to zero and us reaching that point, because kref_release will jump into this routine next, so the condition would be very hard to see. However, I suppose it's easy to close by checking for != STARGET_CREATED and there's no reason not to do that, so I'll change it. Also, it's a little odd that the comment talks about CREATED but the code really checks for RUNNING. They should be consistent. != STARGET_CREATED should solve this. @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* +* serious problem if this triggers: STARGET_DEL is only set in the +* kref release routine, so we're doing another final put on an +* already released kref +*/ BUG_ON(starget-state == STARGET_DEL); Here the code is okay but the comment is wrong: STARGET_DEL is set in _two_ places (but neither of them runs until reap_ref has reached 0). Would it be better in scsi_alloc_target() to behave as though the state were STARGET_DEL without actually setting it? Yes, I'll update the comment to it only goes to DEL after the kref goes to zero. How does the attached diff look? James --- diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 82cf902..2f7de33 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -390,7 +390,7 @@ static void scsi_target_reap_ref_release(struct kref *kref) * means it was allocated but never made visible (because a scan * turned up no LUNs), so don't call device_del() on it. */ - if (starget-state == STARGET_RUNNING) { + if (starget-state != STARGET_CREATED) { transport_remove_device(starget-dev); device_del(starget-dev); } @@ -514,9 +514,9 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, void scsi_target_reap(struct scsi_target *starget) { /* -* serious problem if this triggers: STARGET_DEL is only set in the -* kref release routine, so we're doing another final put on an -* already released kref +* serious problem if this triggers: STARGET_DEL is only set in the if +* the reap_ref drops to zero, so we're trying to do another final put +* on an already released kref */ BUG_ON(starget-state == STARGET_DEL); scsi_target_reap_ref_put(starget); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html