Re: [RFC 0/2] target refcounting infrastructure fixes for usb

2014-01-03 Thread Hans de Goede

Hi,

On 01/03/2014 01:45 AM, Sarah Sharp wrote:

On Sat, Dec 21, 2013 at 03:51:25PM -0500, Alan Stern wrote:

On Fri, 20 Dec 2013, Sarah Sharp wrote:


On Thu, Dec 19, 2013 at 11:48:47AM -0800, James Bottomley wrote:

It should apply incrementally on top of the previous two.  If it
actually works, I'll fold it into the first patch.


Well, it doesn't oops anymore, but it does generate a pile of warnings:


This was a simple oversight.  scsi_target_reap() was called at the
start of __scsi_remove_device(), but it should be called at the end.
The patch below, applied on top of James's three patches, will fix the
warnings.

Alan Stern

P.S.: The comment isn't entirely correct any more...


Ok, Alan's additional patch fixed the warnings I was seeing on UAS
device unplug.  James, can you Cc me on the finished patch when you send
it in?

Hans, I don't want to send the UAS patches off to Greg until James'
patches get into mainline.  I believe Greg's usb-next tree is frozen at
this point, so they'll have to wait until 3.15.


Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give
time to shake things out was fine. But since the start of the 3.13 cycle,
there have been no issues found in the xhci / uas code.

Yes it triggered an existing problem in another subsys, but the code itself
has been issue free all this time. If Greg's tree is indeed already frozen I
would rather have us asking an exception for this.

Alternatively we could add all of it to 3.14 except for the patch removing
the BROKEN marking from uas. Or at least all the non uas bits, which are
useful to have by themselves.

James, what is the status of getting the fix for the refcount issue into
mainline ?

Regards,

Hans
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: status of block-integrity

2014-01-03 Thread Hannes Reinecke

On 12/22/2013 08:21 PM, Christoph Hellwig wrote:

We have the block integrity code to support DIF/DIX in the the tree for
about 5 and a half years, and we still don't have a single consumer of
it.  By normal kernel rules it should never have been merged, or at
least the bitrot long removed.

Given that we'll have a lot of work to do in this area with block
multiqueue I think it's time to either kill it off for good or make sure
we can actually use and test it.


Which would make an ideal topic for LSF, wouldn't it?

Personally, I doubt it's a good idea to kill it off, but
a proper (userland) API for it has been a long time missing.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mpt2sas SATA spinup behavior

2014-01-03 Thread Larkin Lowrey
Thanks for the reply, Rob.

The spinup count and spinup interval were 2 targets and 2 seconds. I
changed the values to 8 targets and 0 seconds.  The change did not alter
the behavior I'm seeing.

To spin down the drives I'm sending an ATA STANDBY IMMEDIATE command via
ATA Pass-Through SG_IO. The controller is definitely aware of this
because LSI's sas2ircu utility reports the drive status as Available
when in standby as opposed to Ready when spinning.

I'm running an IT firmware and these drives are JBOD as far as the
controller is concerned.

--Larkin

On 1/2/2014 5:02 PM, Elliott, Robert (Server Storage) wrote:
 In SAS systems, it's common for controllers and expanders to rotate spinup
 permission across phys to avoid overloading the power supply (e.g., if the
 system is sized to support 8 x 18 W drives, it might not work if 8 x 45 W
 are briefly consumed).  The most common algorithm allows n phys to spinup 
 at a time, waiting m seconds for the next set.  It would be prudent for a
 controller in an unknown server to be very cautious and use large values.

 A web search shows the LSI WebBIOS Configuration Utility offers 
 these controls:
 Spinup Drive Count 
  Controls the number of drives that spin up simultaneously.
 The default is 2 drives. 
  
  Spinup Delay 
  Controls the interval (in seconds) between the spinup of drives connected 
 to this controller. The delay prevents a drain on the system power supply
  that would occur if all drives spun up at the same time. 
 The default is 12 seconds. 

 For SAS drives, spinup is controlled with the NOTIFY (ENABLE SPINUP)
 primitive.  This occurs at power on and during any subsequent power draws 
 due to changing back to the active power condition using START STOP UNIT 
 commands (from stopped) or media access commands (from standby).

 For SATA drives, spinup is controlled at power on by delaying the initial
 OOB  signal exchange.  There is no easy way to control spinup thereafter if
 changing back to the active power condition with a media access command
 (from standby).

 How are you spinning down these drives - with the SCSI START STOP UNIT 
 command being translated by the LSI controller's SCSI-to-ATA translation
 layer into an ATA STANDBY IMMEDIATE command?  LSI might be keeping
 track of the drive's power condition and delaying media access commands 
 per the spinup rotation timing, while Marvell might be just passing the
 commands through.
 ---
 Rob ElliottHP Server Storage


 -Original Message-
 From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
 ow...@vger.kernel.org] On Behalf Of Larkin Lowrey
 Sent: Tuesday, 31 December, 2013 5:42 PM
 To: linux-scsi@vger.kernel.org
 Subject: mpt2sas SATA spinup behavior

 I'm seeing odd behavior while spinning up SATA drives on my LSI SAS 2008
 controller.

 I have 8 drives I keep spun down (most of the time).  I wrote a tool to
 spin them all up at the same time by reading a sector from each drive
 (one thread per drive). Four of the drives are connected to a Marvell
 controller (mvsas) and the other four to an LSI 2008 (mpt2sas).

 The four on the mvsas controller all finish spinning up after ~13s. The
 four on the mpt2sas controller finish after 40s. The mpt2sas drives,
 when spun up individually, will complete spin up in ~9s (except one at
 13s). It appears that each of the four drives are being accessed
 sequentially instead of in parallel and that they must all complete
 their spin up before any can complete their I/O. The mvsas drives, on
 the other hand, perform their spin-up I/O in parallel (different brand
 drive, 13s spin-up).

 Is there something unique to the LSI 2008 that requires SATA spin-up to
 be handled this way (sequentially)?

 I see no errors in dmesg/syslog. Are there any debug facilities that
 might shed light on what's going on?

 Can anyone recommend areas in the source code where I might start
 hunting for a root cause?

 Here's some additional detail. My tool watches for activity on any
 member drive and when there is activity on one it will spin up the
 remaining drives. In this first case I kicked an mpt2sas drive so the
 remaining 3 would be spun up along with the 4 mvsas drives. The three
 large numbers in brackets are milliseconds since the beginning of time.
 The first is the timestamp right before the block device is opened
 (O_DIRECT), the second is after open but before read(), and the third is
 after the read has completed.

 Interestingly, the open() does not complete for the 3 mpt2sas drives
 until 9s after the trigger drive was kicked. So, it appears that all I/O
 for the remaining drives was blocked while the controller waited for the
 first drive to respond. That seems bad.

 Dec 31 16:26:14 fubar hdpwr[5941]: Now spinning: /dev/sdf
 ST4000DM000-1F2168 s/n:#NRE
 Dec 31 16:26:14 fubar hdpwr[5941]: Spinning up /dev/sdg
 ST4000DM000-1F2168 s/n:#C85
 Dec 31 16:26:14 fubar hdpwr[5941]: Spinning up /dev/sdh
 ST4000DM000-1F2168 s/n:#C8M
 Dec 31 

Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-03 Thread walt
On 01/02/2014 11:15 AM, Sarah Sharp wrote:
 On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote:
 On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote:
 3.12-stable review patch.  If anyone has any objections, please let me know.

 --

 From: David Laight david.lai...@aculab.com

 commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream.

 Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB
 can only occur at a boundary between underlying USB frames (512 bytes for
 high speed devices).

 If this isn't done the USB frames aren't formatted correctly and, for 
 example,
 the USB3 ethernet ax88179_178a card will stop sending...


 Unfortunately this patch causes a regression when copying large files to my
 outboard USB3 drive. (Nothing at all to do with networking.)

 Do you have CONFIG_USB_DEBUG turned on for 3.13?  If so, you should see
 dmesg output from this statement shortly before your drive fails:
 
 if (num_trbs = TRBS_PER_SEGMENT) {
   xhci_err(xhci, Too many fragments %d, max %d\n,
   num_trbs, TRBS_PER_SEGMENT - 1);
   return -ENOMEM;
 }

Well, the answers depend on whether the usb3 drive uses logical volumes or not
(lvm2), which I can't explain.  What I've described so far is with lvm2.

When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect -- the
console prints two or three lines stating that the ext4 journal has quit and
the drive is remounted ro.  That particular drive stays wedged until the next
reboot, but no other ill effects to the system.

OTOH, when I put a disk with just an ordinary ext4 partition in the usb3 dock,
(no logical volumes) the copy failure becomes catastrophic, with kernel panic
messages, leaving the system unresponsive and needing a hard reset to recover.

I also tried your other suggestion:

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 4265b48..1a6a43d 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4714,7 +4714,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t 
get_quirks)
int retval;
 
/* Accept arbitrarily long scatter-gather lists */
-   hcd-self.sg_tablesize = ~0;
+   hcd-self.sg_tablesize = 31;
 
/* support to build packet from discontinuous buffers */
hcd-self.no_sg_constraint = 1;

Sadly it didn't fix the problem.  Did I get the patch right?

Thanks for your help, and I'm happy to try more ideas, as always.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries

2014-01-03 Thread Arnd Bergmann
On Friday 03 January 2014, Loc Ho wrote:
 +   sata23clk: sata23clk@1f22c000 {
 +   compatible = apm,xgene-device-clock;
 +   #clock-cells = 1;
 +   clocks = socplldiv2 0;
 +   clock-names = sata23clk;

 +   };
 +
 +   sata45clk: sata45clk@1f23c000 {
 +   compatible = apm,xgene-device-clock;
 +   #clock-cells = 1;
 +   clocks = socplldiv2 0;
 +   clock-names = sata45clk;

Something is wrong here: You have two devices with the same compatible
string but using different clock-names strings. The binding document
lists this as an optional property with the description  shall be
the name of the device clock. If missing, use the device name, which
doesn't seem to make any sense.

Please fix the binding and the existing users of this, and don't introduce
any more broken instances. Since each device clock is documented to
have only one parent anyway, please just make it an anonymous clock.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/4] ata: Add APM X-Gene SoC SATA host controller driver

2014-01-03 Thread Arnd Bergmann
On Friday 03 January 2014, Loc Ho wrote:
 +
 +/* Controller who PHY shared with SGMII Ethernet PHY */
 +#define XGENE_AHCI_SGMII_DTS apm,xgene-ahci-sgmii
 +
 +/* Controller who PHY (internal reference clock macro) shared with PCIe */
 +#define XGENE_AHCI_PCIE_DTS  apm,xgene-ahci-pcie

Kill these macros. I've commented on this in the past.

Also, there really shouldn't be any difference to the SATA driver
based on what PHY is used, so the strings shouldn't be different
either.

 +/* SATA host controller CSR */
 +#define SLVRDERRATTRIBUTES_ADDR  0x
 +#define SLVWRERRATTRIBUTES_ADDR  0x0004
 +#define MSTRDERRATTRIBUTES_ADDR  0x0008
 +#define MSTWRERRATTRIBUTES_ADDR  0x000c
 +#define BUSCTLREG_ADDR   0x0014

Are these strings taken literally from the data sheet? You can probably
drop the _ADDR part.

 +#define  MSTAWAUX_COHERENT_BYPASS_SET(dst, src) \
 + (((dst)  ~0x0002) | (((u32)(src)1)  0x0002))
 +#define  MSTARAUX_COHERENT_BYPASS_SET(dst, src) \
 + (((dst)  ~0x0001) | (((u32)(src))  0x0001))

The macros are only used once and don't really help readability.
Just open-code them. If you have complex macros like these
and use them multiple times, it may be better to use an inline
function.

 +static void xgene_ahci_enable_phy(struct xgene_ahci_context *ctx,
 +   int channel, int enable)
 +{
 + void *mmio = ctx-mmio_base;
 + u32 val;
 +
 + xgene_rd(mmio + PORTCFG_ADDR, val);
 + val = PORTADDR_SET(val, channel == 0 ? 2 : 3);
 + xgene_wr_flush(mmio + PORTCFG_ADDR, val);
 + xgene_rd(mmio + PORTPHY1CFG_ADDR, val);
 + val = PORTPHY1CFG_FRCPHYRDY_SET(val, enable);
 + xgene_wr(mmio + PORTPHY1CFG_ADDR, val);
 +}
 +
 +static void xgene_ahci_set_phy_cfg(struct xgene_ahci_context *ctx, int 
 channel)
 +{
 + void *mmio = ctx-mmio_base;
 + u32 val;
 +
 + dev_dbg(ctx-dev, port configure mmio 0x%p channel %d\n,
 + mmio, channel);
 + xgene_rd(mmio + PORTCFG_ADDR, val);
 + val = PORTADDR_SET(val, channel == 0 ? 2 : 3);
 + xgene_wr_flush(mmio + PORTCFG_ADDR, val);
 + /* Disable fix rate */
 + xgene_wr_flush(mmio + PORTPHY1CFG_ADDR, 0x0001fffe);
 + xgene_wr_flush(mmio + PORTPHY2CFG_ADDR, 0x5018461c);
 + xgene_wr_flush(mmio + PORTPHY3CFG_ADDR, 0x1c081907);
 + xgene_wr_flush(mmio + PORTPHY4CFG_ADDR, 0x1c080815);
 + xgene_rd(mmio + PORTPHY5CFG_ADDR, val);
 + /* Window negotiation 0x800 to 0x400 */
 + val = PORTPHY5CFG_RTCHG_SET(val, 0x300);
 + xgene_wr_flush(mmio + PORTPHY5CFG_ADDR, val);
 + xgene_rd(mmio + PORTAXICFG_ADDR, val);
 + val = PORTAXICFG_EN_CONTEXT_SET(val, 0x1); /* enable context mgmt */
 + val = PORTAXICFG_OUTTRANS_SET(val, 0xe); /* Outstanding */
 + xgene_wr_flush(mmio + PORTAXICFG_ADDR, val);
 +}

This looks like it should be part of the PHY driver?

 + /*
 +  * When both ACPI and DTS are enabled, custom ACPI built-in ACPI
 +  * table, and booting via DTS, we need to skip the probe of the
 +  * built-in ACPI table probe.
 +  */
 + if (!efi_enabled(EFI_BOOT)  dev-of_node == NULL)
 + return -ENODEV;

I think I've commented on this one before too. The comment talks
about ACPI, but the code is about EFI, which is completely unrelated.
Neither the code nor the comment seems to make sense here and
you wouldn't be in this function in the first place if the device
is not defined in ACPI or in DT.

 + /* Can't use devm_ioremap_resource due to overlapping region */
 + hpriv-csr_base = devm_ioremap(dev, res-start, resource_size(res));
 + if (!hpriv-csr_base) {
 + dev_err(dev, can't map %pR\n, res);
 + return -ENOMEM;
 + }

Why are the regions overlapping? That should not happen!

 + /* Select ATA */
 + if (of_device_is_compatible(pdev-dev.of_node,
 + XGENE_AHCI_SGMII_DTS)) {
 + if (xgene_ahci_mux_select(hpriv)) {
 + dev_err(dev, SATA mux selection failed\n);
 + return -ENODEV;
 + }
 + }

What kind of mux is this? Why does the SATA driver need to care about
it? It sounds like this should be part of the pinctrl driver.

 + /* Setup DMA mask */
 + rc = dma_set_mask(dev, DMA_BIT_MASK(64));
 + if (rc) {
 + dev_err(dev, Unable to set dma mask\n);
 + goto error;
 + }
 + rc = dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
 + if (rc) {
 + dev_err(dev, Unable to set dma coherent mask\n);
 + goto error;
 + }

dma_set_mask_and_coherent?

 +
 +static const struct acpi_device_id xgene_ahci_acpi_match[] = {
 + {APMC0D00, 0},
 + {},
 +};
 +MODULE_DEVICE_TABLE(acpi, xgene_ahci_acpi_match);

Just drop the ACPI part for now. It's clear that the driver won't
work with ACPI in this state, since it would need to 

Re: [RFC 2/2] dual scan thread bug fix

2014-01-03 Thread Alan Stern
On Thu, 2 Jan 2014, James Bottomley wrote:

 In the highly unusual case where two threads are running concurrently through
 the scanning code scanning the same target, we run into the situation where
 one may allocate the target while the other is still using it.  In this case,
 because the reap checks for STARGET_CREATED and kills the target without
 reference counting, the second thread will do the wrong thing on reap.
 
 Fix this by reference counting even creates and doing the STARGET_CREATED
 check in the final put.

I'm still concerned about one thing.  The previous patch does this in
scsi_alloc_target():

   found:
 - found_target-reap_ref++;
 + if (!kref_get_unless_zero(found_target-reap_ref))
 + /*
 +  * release routine already fired.  Target is dead, but
 +  * STARGET_DEL may not yet be set (set in the release
 +  * routine), so set here as well, just in case
 +  */
 + found_target-state = STARGET_DEL;
   spin_unlock_irqrestore(shost-host_lock, flags);

As a result, the two comments in this patch aren't right:

 @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref 
 *kref)
   struct scsi_target *starget
   = container_of(kref, struct scsi_target, reap_ref);
  
 - transport_remove_device(starget-dev);
 - device_del(starget-dev);
 - starget-state = STARGET_DEL;
 + /*
 +  * if we get here and the target is still in the CREATED state that
 +  * means it was allocated but never made visible (because a scan
 +  * turned up no LUNs), so don't call device_del() on it.
 +  */
 + if (starget-state == STARGET_RUNNING) {
 + transport_remove_device(starget-dev);
 + device_del(starget-dev);
 + }

Here the state could already be STARGET_DEL, even though the target is
still visible.

Also, it's a little odd that the comment talks about CREATED but the 
code really checks for RUNNING.  They should be consistent.

 @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct 
 device *parent,
   */
  void scsi_target_reap(struct scsi_target *starget)
  {
 + /*
 +  * serious problem if this triggers: STARGET_DEL is only set in the
 +  * kref release routine, so we're doing another final put on an
 +  * already released kref
 +  */
   BUG_ON(starget-state == STARGET_DEL);

Here the code is okay but the comment is wrong: STARGET_DEL is set in 
_two_ places (but neither of them runs until reap_ref has reached 0).

Would it be better in scsi_alloc_target() to behave as though the state 
were STARGET_DEL without actually setting it?

Alan Stern


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries

2014-01-03 Thread Loc Ho
Hi,

 +   sata23clk: sata23clk@1f22c000 {
 +   compatible = apm,xgene-device-clock;
 +   #clock-cells = 1;
 +   clocks = socplldiv2 0;
 +   clock-names = sata23clk;

 +   };
 +
 +   sata45clk: sata45clk@1f23c000 {
 +   compatible = apm,xgene-device-clock;
 +   #clock-cells = 1;
 +   clocks = socplldiv2 0;
 +   clock-names = sata45clk;

 Something is wrong here: You have two devices with the same compatible
 string but using different clock-names strings. The binding document
 lists this as an optional property with the description  shall be
 the name of the device clock. If missing, use the device name, which
 doesn't seem to make any sense.

 Please fix the binding and the existing users of this, and don't introduce
 any more broken instances. Since each device clock is documented to
 have only one parent anyway, please just make it an anonymous clock.

Okay I miss understood this... I will need to fix the X-Gene device
parent clocks as well in an separate patch to the clock driver owner.

-Loc
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/4] arm64: Add APM X-Gene SoC SATA host controller DTS entries

2014-01-03 Thread Arnd Bergmann
On Friday 03 January 2014, Loc Ho wrote:
 Okay I miss understood this... I will need to fix the X-Gene device
 parent clocks as well in an separate patch to the clock driver owner.

Ok, thanks!

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-03 Thread Sarah Sharp
On Fri, Jan 03, 2014 at 07:40:33AM -0800, walt wrote:
 On 01/02/2014 11:15 AM, Sarah Sharp wrote:
  On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote:
  On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote:
  3.12-stable review patch.  If anyone has any objections, please let me 
  know.
 
  --
 
  From: David Laight david.lai...@aculab.com
 
  commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream.
 
  Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link 
  TRB
  can only occur at a boundary between underlying USB frames (512 bytes for
  high speed devices).
 
  If this isn't done the USB frames aren't formatted correctly and, for 
  example,
  the USB3 ethernet ax88179_178a card will stop sending...
 
 
  Unfortunately this patch causes a regression when copying large files to my
  outboard USB3 drive. (Nothing at all to do with networking.)
 
  Do you have CONFIG_USB_DEBUG turned on for 3.13?  If so, you should see
  dmesg output from this statement shortly before your drive fails:
  
  if (num_trbs = TRBS_PER_SEGMENT) {
  xhci_err(xhci, Too many fragments %d, max %d\n,
  num_trbs, TRBS_PER_SEGMENT - 1);
  return -ENOMEM;
  }
 
 Well, the answers depend on whether the usb3 drive uses logical volumes or not
 (lvm2), which I can't explain.  What I've described so far is with lvm2.
 
 When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect -- the
 console prints two or three lines stating that the ext4 journal has quit and
 the drive is remounted ro.  That particular drive stays wedged until the next
 reboot, but no other ill effects to the system.

Odd. In 3.12 xHCI has dynamic debugging, and turning on CONFIG_USB_DEBUG
should turn on debugging by default, so it's confusing that you didn't
see any messages.

Can I see your .config from /boot/?  Also, did you try capturing dmesg
with `tail -f /var/log/kern.log` or just dmesg?  Perhaps you need to run
`sudo dmesg -n 7`?

 OTOH, when I put a disk with just an ordinary ext4 partition in the usb3 dock,
 (no logical volumes) the copy failure becomes catastrophic, with kernel panic
 messages, leaving the system unresponsive and needing a hard reset to recover.
 
 I also tried your other suggestion:
 
 diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
 index 4265b48..1a6a43d 100644
 --- a/drivers/usb/host/xhci.c
 +++ b/drivers/usb/host/xhci.c
 @@ -4714,7 +4714,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, 
 xhci_get_quirks_t get_quirks)
 int retval;
  
 /* Accept arbitrarily long scatter-gather lists */
 -   hcd-self.sg_tablesize = ~0;
 +   hcd-self.sg_tablesize = 31;
  
 /* support to build packet from discontinuous buffers */
 hcd-self.no_sg_constraint = 1;
 
 Sadly it didn't fix the problem.  Did I get the patch right?

Yes, you did.  So perhaps the patch triggers a different bug.  I can't
tell until I see xHCI debugging output.

 Thanks for your help, and I'm happy to try more ideas, as always.

Thanks for your patience. :)

Sarah Sharp
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/4] ata: Add APM X-Gene SoC SATA host controller driver

2014-01-03 Thread Loc Ho
Hi,

 +/* Controller who PHY shared with SGMII Ethernet PHY */
 +#define XGENE_AHCI_SGMII_DTS apm,xgene-ahci-sgmii
 +
 +/* Controller who PHY (internal reference clock macro) shared with PCIe */
 +#define XGENE_AHCI_PCIE_DTS  apm,xgene-ahci-pcie

 Kill these macros. I've commented on this in the past.

 Also, there really shouldn't be any difference to the SATA driver
 based on what PHY is used, so the strings shouldn't be different
 either.

The only need for this is the MUX when the PHY is shared between SGMII
and SATA. If we get rip of the MUX code, we can drop this. MUX
discussion is below.


 +/* SATA host controller CSR */
 +#define SLVRDERRATTRIBUTES_ADDR  0x
 +#define SLVWRERRATTRIBUTES_ADDR  0x0004
 +#define MSTRDERRATTRIBUTES_ADDR  0x0008
 +#define MSTWRERRATTRIBUTES_ADDR  0x000c
 +#define BUSCTLREG_ADDR   0x0014

 Are these strings taken literally from the data sheet? You can probably
 drop the _ADDR part.

These names are directly from the programming reference manual. I will
get rip of the _ADDR for both drivers - host and PHY.


 +#define  MSTAWAUX_COHERENT_BYPASS_SET(dst, src) \
 + (((dst)  ~0x0002) | (((u32)(src)1)  0x0002))
 +#define  MSTARAUX_COHERENT_BYPASS_SET(dst, src) \
 + (((dst)  ~0x0001) | (((u32)(src))  0x0001))

 The macros are only used once and don't really help readability.
 Just open-code them. If you have complex macros like these
 and use them multiple times, it may be better to use an inline
 function.

For these particular one, I will get rip of them.


 +static void xgene_ahci_enable_phy(struct xgene_ahci_context *ctx,
 +   int channel, int enable)
 +{
 + void *mmio = ctx-mmio_base;
 + u32 val;
 +
 + xgene_rd(mmio + PORTCFG_ADDR, val);
 + val = PORTADDR_SET(val, channel == 0 ? 2 : 3);
 + xgene_wr_flush(mmio + PORTCFG_ADDR, val);
 + xgene_rd(mmio + PORTPHY1CFG_ADDR, val);
 + val = PORTPHY1CFG_FRCPHYRDY_SET(val, enable);
 + xgene_wr(mmio + PORTPHY1CFG_ADDR, val);
 +}
 +
 +static void xgene_ahci_set_phy_cfg(struct xgene_ahci_context *ctx, int 
 channel)
 +{
 + void *mmio = ctx-mmio_base;
 + u32 val;
 +
 + dev_dbg(ctx-dev, port configure mmio 0x%p channel %d\n,
 + mmio, channel);
 + xgene_rd(mmio + PORTCFG_ADDR, val);
 + val = PORTADDR_SET(val, channel == 0 ? 2 : 3);
 + xgene_wr_flush(mmio + PORTCFG_ADDR, val);
 + /* Disable fix rate */
 + xgene_wr_flush(mmio + PORTPHY1CFG_ADDR, 0x0001fffe);
 + xgene_wr_flush(mmio + PORTPHY2CFG_ADDR, 0x5018461c);
 + xgene_wr_flush(mmio + PORTPHY3CFG_ADDR, 0x1c081907);
 + xgene_wr_flush(mmio + PORTPHY4CFG_ADDR, 0x1c080815);
 + xgene_rd(mmio + PORTPHY5CFG_ADDR, val);
 + /* Window negotiation 0x800 to 0x400 */
 + val = PORTPHY5CFG_RTCHG_SET(val, 0x300);
 + xgene_wr_flush(mmio + PORTPHY5CFG_ADDR, val);
 + xgene_rd(mmio + PORTAXICFG_ADDR, val);
 + val = PORTAXICFG_EN_CONTEXT_SET(val, 0x1); /* enable context mgmt */
 + val = PORTAXICFG_OUTTRANS_SET(val, 0xe); /* Outstanding */
 + xgene_wr_flush(mmio + PORTAXICFG_ADDR, val);
 +}

 This looks like it should be part of the PHY driver?

There are the corresponding configure on the host controller side. It
is not part of the PHY. Each host such as SATA, SGMII, or PCIe has its
own version of these registers. It can be completely different between
SATA and PCIe for example.


 + /*
 +  * When both ACPI and DTS are enabled, custom ACPI built-in ACPI
 +  * table, and booting via DTS, we need to skip the probe of the
 +  * built-in ACPI table probe.
 +  */
 + if (!efi_enabled(EFI_BOOT)  dev-of_node == NULL)
 + return -ENODEV;

 I think I've commented on this one before too. The comment talks
 about ACPI, but the code is about EFI, which is completely unrelated.
 Neither the code nor the comment seems to make sense here and
 you wouldn't be in this function in the first place if the device
 is not defined in ACPI or in DT.

Let remove them for the time being. Unless one has an custom ACPI
table built in the kernel, it isn't a problem.


 + /* Can't use devm_ioremap_resource due to overlapping region */
 + hpriv-csr_base = devm_ioremap(dev, res-start, resource_size(res));
 + if (!hpriv-csr_base) {
 + dev_err(dev, can't map %pR\n, res);
 + return -ENOMEM;
 + }

 Why are the regions overlapping? That should not happen!

Each host controller/PHY includes the following resource:

0x. for the host core register
0x.7000 for the mux select if shared with SGMII
0x.A000 for the PHY indirect access
0x.C000 for the host/PHY clocks
0x.D000 for the RAM shutdown removal

As I only used one resource register of size 64K, it overlapped with
the 0x.A000 mapped by the PHY already. If you believe it is better
that I have two resources 

Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)

2014-01-03 Thread Dan Williams
On Thu, Jan 2, 2014 at 2:31 AM, Suresh Thiagarajan
suresh.thiagara...@pmcs.com wrote:


 On Fri, Dec 27, 2013 at 9:48 PM, Oleg Nesterov o...@redhat.com wrote:
 On 12/24, Suresh Thiagarajan wrote:

 Below is a small pseudo code on protecting/serializing the flag for global 
 access.
 struct temp
 {
   ...
   spinlock_t lock;
   unsigned long lock_flags;
 };
 void my_lock(struct temp *t)
 {
unsigned long flag; // thread-private variable as suggested
spin_lock_irqsave(t-lock, flag);
t-lock_flags = flag; //updating inside critical section now 
 to serialize the access to flag
 }

 void my_unlock(struct temp *t)
 {
unsigned long flag = t-lock_flags;
t-lock_flags = 0;  //clearing it before getting out of 
 critical section
spin_unlock_irqrestore(t-lock, flag);
 }

 Yes, this should work as a quick fix. And you do not need to clear 
 -lock_flags
 in my_unlock().

 But when I look at original patch again, I no longer understand why do
 you need pm8001_ha-lock_flags at all. Of course I do not understand this
 code, I am sure I missed something, but at first glance it seems that only
 this sequence

 spin_unlock_irq(pm8001_ha-lock);
 t-task_done(t);
 spin_lock_irq(pm8001_ha-lock);

 should be fixed?

 If yes, why you can't simply do spin_unlock() + spin_lock() around
 t-task_done() ? This won't enable irqs, but spin_unlock_irqrestore()
 doesn't necessarily enables irqs too, so -task_done() can run with
 irqs disabled?

 And note that the pattern above has a lot of users, perhaps it makes
 sense to start with something like the patch below?

 Thanks James, Oleg and all for your inputs.
 Will start with review and testing this patch and then work/investigate to 
 keep shortest and clearest critical
 section so that we can have the lock and unlock within the same routine.


Fwiw we solved this in libsas a while back with a similar pattern
proposed by Oleg:

unsigned long flags;

local_irq_save(flags);
spin_unlock(lock);
...
spin_lock_lock(lock);
local_irq_restore(flags);

See commit 312d3e56119a [SCSI] libsas: remove ata_port.lock
management duties from lldds

--
Dan
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: status of block-integrity

2014-01-03 Thread Martin K. Petersen
 Hannes == Hannes Reinecke h...@suse.de writes:

Hannes Personally, I doubt it's a good idea to kill it off, but a
Hannes proper (userland) API for it has been a long time missing.

Before we throw the baby out with the bath water, maybe Darrick can fill
us in on the progress of the aio passthrough interface?

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bio_integrity_verify() bug causing READ verify to be silently skipped

2014-01-03 Thread Martin K. Petersen
 nab == Nicholas A Bellinger n...@linux-iscsi.org writes:

nab Given that bio_integrity_verify() is using bio_for_each_segment(),
nab the loop starts from the updated bio-bi_idx, and not a zero value,
nab which ends up skipping individual bio segment calls to
nab bi-verify_fn().

That's botched. Verify is meant to be called with the completed bytes
before the index is tampered with.


nab The following patch changes bio_integrity_verify() to use
nab bio_for_each_segment_all() instead of bio_for_each_segment() to
nab ensure that the segment walk always starts from zero, regardless of
nab the current bio-bi_idx value after bio_advance().

That breaks partial completion, though. I'll take a look at Kent's
changes...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug#733565: SIX messages per on boot console should be TWO

2014-01-03 Thread Martin K. Petersen
 Ben == Ben Hutchings b...@decadent.org.uk writes:

Ben I can see that it is emitted by sd_read_cache_type(), which is
Ben called by sd_revalidate_disk(), and that is apparently now called 3
Ben times during probe.  Which is quite ridiculous.

We have to discover the basics of the disk before we can create the
gendisk/block device/request queue. And some of the subsequent
parameters we need can't be stored or acted upon until everything has
been set up. So some questions we have to ask several times.


Ben And I wonder whether this situation (no caching mode page) is
Ben really serious enough to deserve logging at ERR severity?

I guess we could extend the first_scan logic to cover the error case
scenarios. But it is quite unusual for a device to not implement the
caching mode page...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] isci: fix reset timeout handling

2014-01-03 Thread Dan Williams
Remove an erroneous BUG_ON() in the case of a hard reset timeout.  The
reset timeout handler puts the port into the awaiting link-up state.
The timeout causes the device to be disconnected and we need to be in
the awaiting link-up state to re-connect the port.  The BUG_ON() made
the incorrect assumption that resets never timeout and we always
complete the reset in the resetting state.

Testing this patch also uncovered that libata continues to attempt to
reset the port long after the driver has torn down the context.  Once
the driver has committed to abandoning the link it must indicate to
libata that recovery ends by returning -ENODEV from
-lldd_I_T_nexus_reset().

Cc: sta...@vger.kernel.org
Cc: Maciej Patelczyk maciej.patelc...@intel.com
Cc: Dave Jiang dave.ji...@intel.com
Acked-by: Lukasz Dorau lukasz.do...@intel.com
Reported-by: David Milburn dmilb...@redhat.com
Reported-by: Xun Ni xun...@intel.com
Tested-by: Xun Ni xun...@intel.com
Signed-off-by: Dan Williams dan.j.willi...@intel.com
---
v2: Dave reported the build warning regression the last patch caused,
and I clarified the changelog a bit.

 drivers/scsi/isci/port_config.c |7 ---
 drivers/scsi/isci/task.c|2 +-
 2 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/isci/port_config.c b/drivers/scsi/isci/port_config.c
index 85c77f6b802b..ac879745ef80 100644
--- a/drivers/scsi/isci/port_config.c
+++ b/drivers/scsi/isci/port_config.c
@@ -615,13 +615,6 @@ static void sci_apc_agent_link_up(struct isci_host *ihost,
  
SCIC_SDS_APC_WAIT_LINK_UP_NOTIFICATION);
} else {
/* the phy is already the part of the port */
-   u32 port_state = iport-sm.current_state_id;
-
-   /* if the PORT'S state is resetting then the link up is from
-* port hard reset in this case, we need to tell the port
-* that link up is recieved
-*/
-   BUG_ON(port_state != SCI_PORT_RESETTING);
port_agent-phy_ready_mask |= 1  phy_index;
sci_port_link_up(iport, iphy);
}
diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
index 0d30ca849e8f..5d6fda72d659 100644
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -801,7 +801,7 @@ int isci_task_I_T_nexus_reset(struct domain_device *dev)
/* XXX: need to cleanup any ireqs targeting this
 * domain_device
 */
-   ret = TMF_RESP_FUNC_COMPLETE;
+   ret = -ENODEV;
goto out;
}
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi-sg: pass flag to inhibit setting LUN

2014-01-03 Thread Jiří Pinkava
Dne 3.1.2014 21:21, Martin K. Petersen napsal(a):
 Jiří == Jiří Pinkava j...@seznam.cz writes:
 Jiří This patch implements support for inhibiting setting LUN number
 Jiří for SCSI custom command send via /dev/sgX with ioctl(.., SG_IO,
 Jiří ...) call.

 Is it an absolute requirement that this is a per-command option or could
 we have a scsi_device quirk flag so we didn't have to plumb this through
 the entire I/O stack?

It should be enough to limit it to open file descriptor or sg subsystem.

This magic is expected to happen every time device is accessed
trough sg subsystem, but not if it is accessed other way
(eg. it is still some kind of common disk).

My understanding is that sg subsystem allows generic access to device,
in extreme case I might want to send command to LUN which is not the
active one.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-03 Thread walt
On 01/03/2014 11:54 AM, Sarah Sharp wrote:
 On Fri, Jan 03, 2014 at 07:40:33AM -0800, walt wrote:
 On 01/02/2014 11:15 AM, Sarah Sharp wrote:
 On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote:
 On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote:
 3.12-stable review patch.  If anyone has any objections, please let me 
 know.

 --

 From: David Laight david.lai...@aculab.com

 commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream.

 Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link 
 TRB
 can only occur at a boundary between underlying USB frames (512 bytes for
 high speed devices).

 If this isn't done the USB frames aren't formatted correctly and, for 
 example,
 the USB3 ethernet ax88179_178a card will stop sending...


 Unfortunately this patch causes a regression when copying large files to my
 outboard USB3 drive. (Nothing at all to do with networking.)

 Do you have CONFIG_USB_DEBUG turned on for 3.13?  If so, you should see
 dmesg output from this statement shortly before your drive fails:

 if (num_trbs = TRBS_PER_SEGMENT) {
 xhci_err(xhci, Too many fragments %d, max %d\n,
 num_trbs, TRBS_PER_SEGMENT - 1);
 return -ENOMEM;
 }

 Well, the answers depend on whether the usb3 drive uses logical volumes or 
 not
 (lvm2), which I can't explain.  What I've described so far is with lvm2.

 When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect

I'm so sorry Sarah, that was another mistake.  The mistake is so stupid I'm not
going to publish it here :(

Once I finally ran the kernel with debugging actually compiled in, dmesg 
contains
xhci debugging messages.  Wow :)

It's a big file so I zipped and attached it, which I hope is acceptable in lkml.

BTW, this dmesg is from a kernel with sg_tablesize = 31, which as I said before
doesn't fix the problem.  The cp stopped around 7GB just as before.

Sorry for the noise...


xhci.dmesg.gz
Description: application/gzip


Re: Bug#733565: SIX messages per on boot console should be TWO

2014-01-03 Thread jidanni
 MKP == Martin K Petersen martin.peter...@oracle.com writes:

MKP We have to discover the basics of the disk before we can create the
MKP gendisk/block device/request queue. And some of the subsequent
MKP parameters we need can't be stored or acted upon until everything has
MKP been set up. So some questions we have to ask several times.

Perhaps the messages could be each differentiated so the user doesn't see
them as something that looks like a bug and needs to be reported.

E.g., prefix/suffix with PHASE I CHECK, PHASE II CHECK, PHASE III CHECK.
or FIRST CHECK, INTERMEDIATE CHECK, FINAL CHECK,
or CHECK 1, CHECK 2...
or INTERMEDIATE PROBE:, FINAL PROBE:, etc.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/2] target refcounting infrastructure fixes for usb

2014-01-03 Thread Sarah Sharp
On Fri, Jan 03, 2014 at 09:56:45AM +0100, Hans de Goede wrote:
 Hi,
 
 On 01/03/2014 01:45 AM, Sarah Sharp wrote:
 Ok, Alan's additional patch fixed the warnings I was seeing on UAS
 device unplug.  James, can you Cc me on the finished patch when you send
 it in?
 
 Hans, I don't want to send the UAS patches off to Greg until James'
 patches get into mainline.  I believe Greg's usb-next tree is frozen at
 this point, so they'll have to wait until 3.15.
 
 Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give
 time to shake things out was fine. But since the start of the 3.13 cycle,
 there have been no issues found in the xhci / uas code.

I completely understand why you're unhappy.  I agree that the pull
request you sent me on Nov 18th is fine (aside from not being a GPG
signed git tag). [1]  I also agree that the UAS and xHCI driver changes
have been stable during my testing, other than triggering the SCSI oops
on UAS disconnect.

 Yes it triggered an existing problem in another subsys, but the code itself
 has been issue free all this time. If Greg's tree is indeed already frozen I
 would rather have us asking an exception for this.

Detach yourself emotionally from your code and look at this request from
a maintainer's perspective.

You're asking me to push 69 patches to Greg after -rc6 is out, for a
driver that's been marked broken for several kernel releases (uas).  The
patches enable a feature that's been basically untested across all xHCI
host controllers (streams), and they add new userspace API for usbfs to
expose streams.

In the meantime, there's a big push to get code into linux-next at least
a week or two before the merge window opens.  I sent my last pull request
on Dec 20th, and Felipe closed his USB gadget tree on Dec 26th.  I had
hoped to get the SCSI issue settled so I could send in the UAS patches
on the 20th, but that didn't happen.  My tree is closed.

These fixes should get merged (and will!), but I will not ask Greg for
an exception to get these patches into 3.14.

 Alternatively we could add all of it to 3.14 except for the patch removing
 the BROKEN marking from uas. Or at least all the non uas bits, which are
 useful to have by themselves.

I think the best way to proceed with this is for me to queue all the
xHCI patches in that pull request for usb-next once 3.14-rc1 is out, and
then you send a separate pull request to Greg.  You're going to need to
be able to send him pull requests with a signed git tag in the future
anyway.

 James, what is the status of getting the fix for the refcount issue into
 mainline ?

(Greg, James will be queuing the SCSI fix for the 3.14 merge window.)

Sarah Sharp

[1] http://marc.info/?l=linux-usbm=138478555324055w=2
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug#733565: SIX messages per on boot console should be TWO

2014-01-03 Thread Martin K. Petersen
 Martin == Martin K Petersen martin.peter...@oracle.com writes:

Martin I guess we could extend the first_scan logic to cover the error
Martin case scenarios.

[SCSI] sd: Quiesce mode sense error messages

Messages about discovered disk properties are only printed once unless
they are found to have changed. Errors encountered during mode sense,
however, are printed every time we revalidate.

Quiesce mode sense errors so they are only printed during the first
scan.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 69725f7c32c1..14d601ed1956 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2283,7 +2283,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, 
unsigned char *buffer)
 
set_disk_ro(sdkp-disk, 0);
if (sdp-skip_ms_page_3f) {
-   sd_printk(KERN_NOTICE, sdkp, Assuming Write Enabled\n);
+   sd_first_printk(KERN_NOTICE, sdkp, Assuming Write Enabled\n);
return;
}
 
@@ -2315,7 +2315,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, 
unsigned char *buffer)
}
 
if (!scsi_status_is_good(res)) {
-   sd_printk(KERN_WARNING, sdkp,
+   sd_first_printk(KERN_WARNING, sdkp,
  Test WP failed, assume Write Enabled\n);
} else {
sdkp-write_prot = ((data.device_specific  0x80) != 0);
@@ -2383,7 +2383,8 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char 
*buffer)
if (!data.header_length) {
modepage = 6;
first_len = 0;
-   sd_printk(KERN_ERR, sdkp, Missing header in MODE_SENSE 
response\n);
+   sd_first_printk(KERN_ERR, sdkp,
+   Missing header in MODE_SENSE response\n);
}
 
/* that went OK, now ask for the proper length */
@@ -2396,7 +2397,7 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char 
*buffer)
if (len  3)
goto bad_sense;
else if (len  SD_BUF_SIZE) {
-   sd_printk(KERN_NOTICE, sdkp, Truncating mode parameter 
+   sd_first_printk(KERN_NOTICE, sdkp, Truncating mode parameter 
  data from %d to %d bytes\n, len, SD_BUF_SIZE);
len = SD_BUF_SIZE;
}
@@ -2419,8 +2420,9 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char 
*buffer)
/* We're interested only in the first 3 bytes.
 */
if (len - offset = 2) {
-   sd_printk(KERN_ERR, sdkp, Incomplete 
- mode parameter data\n);
+   sd_first_printk(KERN_ERR, sdkp,
+   Incomplete mode parameter 
+   data\n);
goto defaults;
} else {
modepage = page_code;
@@ -2434,14 +2436,15 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned 
char *buffer)
else if (!spf  len - offset  1)
offset += 2 + buffer[offset+1];
else {
-   sd_printk(KERN_ERR, sdkp, Incomplete 
- mode parameter data\n);
+   sd_first_printk(KERN_ERR, sdkp,
+   Incomplete mode 
+   parameter data\n);
goto defaults;
}
}
}
 
-   sd_printk(KERN_ERR, sdkp, No Caching mode page found\n);
+   sd_first_printk(KERN_ERR, sdkp, No Caching mode page found\n);
goto defaults;
 
Page_found:
@@ -2455,7 +2458,7 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char 
*buffer)
 
sdkp-DPOFUA = (data.device_specific  0x10) != 0;
if (sdkp-DPOFUA  !sdkp-device-use_10_for_rw) {
-   sd_printk(KERN_NOTICE, sdkp,
+   sd_first_printk(KERN_NOTICE, sdkp,
  Uses READ/WRITE(6), disabling FUA\n);
sdkp-DPOFUA = 0;
}
@@ -2477,16 +2480,19 @@ bad_sense:
sshdr.sense_key == ILLEGAL_REQUEST 
sshdr.asc == 0x24  sshdr.ascq == 0x0)
/* Invalid field in CDB */
-   sd_printk(KERN_NOTICE, sdkp, Cache data unavailable\n);
+   sd_first_printk(KERN_NOTICE, sdkp, Cache data unavailable\n);
else
-   sd_printk(KERN_ERR, sdkp, Asking for cache data failed\n);
+   

Re: [RFC 0/2] target refcounting infrastructure fixes for usb

2014-01-03 Thread Hans de Goede

Hi,

On 01/03/2014 11:00 PM, Sarah Sharp wrote:

On Fri, Jan 03, 2014 at 09:56:45AM +0100, Hans de Goede wrote:

Hi,

On 01/03/2014 01:45 AM, Sarah Sharp wrote:

Ok, Alan's additional patch fixed the warnings I was seeing on UAS
device unplug.  James, can you Cc me on the finished patch when you send
it in?

Hans, I don't want to send the UAS patches off to Greg until James'
patches get into mainline.  I believe Greg's usb-next tree is frozen at
this point, so they'll have to wait until 3.15.


Ugh, I must say I'm rather unhappy about this, waiting till 3.14 to give
time to shake things out was fine. But since the start of the 3.13 cycle,
there have been no issues found in the xhci / uas code.


I completely understand why you're unhappy.  I agree that the pull
request you sent me on Nov 18th is fine (aside from not being a GPG
signed git tag). [1]  I also agree that the UAS and xHCI driver changes
have been stable during my testing, other than triggering the SCSI oops
on UAS disconnect.


Yes it triggered an existing problem in another subsys, but the code itself
has been issue free all this time. If Greg's tree is indeed already frozen I
would rather have us asking an exception for this.


Detach yourself emotionally from your code and look at this request from
a maintainer's perspective.

You're asking me to push 69 patches to Greg after -rc6 is out, for a
driver that's been marked broken for several kernel releases (uas).  The
patches enable a feature that's been basically untested across all xHCI
host controllers (streams), and they add new userspace API for usbfs to
expose streams.


Yes 69 patches which have been ready since November 18. Your argument is
that they were not in linux-next before now, my unhappiness comes from
that they could have been in linux-next for some time now already.


In the meantime, there's a big push to get code into linux-next at least
a week or two before the merge window opens.  I sent my last pull request
on Dec 20th, and Felipe closed his USB gadget tree on Dec 26th.  I had
hoped to get the SCSI issue settled so I could send in the UAS patches
on the 20th, but that didn't happen.  My tree is closed.

These fixes should get merged (and will!), but I will not ask Greg for
an exception to get these patches into 3.14.


Alternatively we could add all of it to 3.14 except for the patch removing
the BROKEN marking from uas. Or at least all the non uas bits, which are
useful to have by themselves.


I think the best way to proceed with this is for me to queue all the
xHCI patches in that pull request for usb-next once 3.14-rc1 is out, and
then you send a separate pull request to Greg.  You're going to need to
be able to send him pull requests with a signed git tag in the future
anyway.


Since you have all the patches in your tree now anyways I believe it would
be best if you just send a pull-req for all of them. I see little value
in me sending a separate pull-req for the uas patches. As discussed before
I'm fine with picking up uas maintenance from then on.

Regards,

Hans
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mptNsas MSI-X fixes

2014-01-03 Thread Martin K. Petersen
We found a couple of MSI-X related problems with mptNsas on very large
systems. The patches below contain fixes for mpt2sas and mpt3sas
respectively.

 drivers/scsi/mpt2sas/mpt2sas_base.c | 64 
--
 drivers/scsi/mpt3sas/mpt3sas_base.c | 73 
++
 2 files changed, 44 insertions(+), 93 deletions(-)

-- 
Martin K. Petersen  Oracle Linux Engineering

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] [SCSI] mpt2sas: Rework the MSI-X grouping code

2014-01-03 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

On systems with a non power-of-two CPU count the existing MSI-X grouping
code failed to distribute interrupts correctly. Rework the code to
handle arbitrary processor counts.

Also remove the hardcoded upper limit on the number of processors so we
can boot on large systems.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
CC: Sreekanth Reddy sreekanth.re...@lsi.com
---
 drivers/scsi/mpt2sas/mpt2sas_base.c | 64 +
 1 file changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c 
b/drivers/scsi/mpt2sas/mpt2sas_base.c
index 3901edc35812..bfdbdf214169 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -1332,53 +1332,35 @@ _base_request_irq(struct MPT2SAS_ADAPTER *ioc, u8 
index, u32 vector)
 static void
 _base_assign_reply_queues(struct MPT2SAS_ADAPTER *ioc)
 {
-   struct adapter_reply_queue *reply_q;
-   int cpu_id;
-   int cpu_grouping, loop, grouping, grouping_mod;
+   unsigned int cpu, nr_cpus, nr_msix, index = 0;
 
if (!_base_is_controller_msix_enabled(ioc))
return;
 
memset(ioc-cpu_msix_table, 0, ioc-cpu_msix_table_sz);
-   /* when there are more cpus than available msix vectors,
-* then group cpus togeather on same irq
-*/
-   if (ioc-cpu_count  ioc-msix_vector_count) {
-   grouping = ioc-cpu_count / ioc-msix_vector_count;
-   grouping_mod = ioc-cpu_count % ioc-msix_vector_count;
-   if (grouping  2 || (grouping == 2  !grouping_mod))
-   cpu_grouping = 2;
-   else if (grouping  4 || (grouping == 4  !grouping_mod))
-   cpu_grouping = 4;
-   else if (grouping  8 || (grouping == 8  !grouping_mod))
-   cpu_grouping = 8;
-   else
-   cpu_grouping = 16;
-   } else
-   cpu_grouping = 0;
-
-   loop = 0;
-   reply_q = list_entry(ioc-reply_queue_list.next,
-struct adapter_reply_queue, list);
-   for_each_online_cpu(cpu_id) {
-   if (!cpu_grouping) {
-   ioc-cpu_msix_table[cpu_id] = reply_q-msix_index;
-   reply_q = list_entry(reply_q-list.next,
-   struct adapter_reply_queue, list);
-   } else {
-   if (loop  cpu_grouping) {
-   ioc-cpu_msix_table[cpu_id] =
-   reply_q-msix_index;
-   loop++;
-   } else {
-   reply_q = list_entry(reply_q-list.next,
-   struct adapter_reply_queue, list);
-   ioc-cpu_msix_table[cpu_id] =
-   reply_q-msix_index;
-   loop = 1;
-   }
+
+   nr_cpus = num_online_cpus();
+   nr_msix = ioc-reply_queue_count = min(ioc-reply_queue_count,
+  ioc-facts.MaxMSIxVectors);
+   if (!nr_msix)
+   return;
+
+   cpu = cpumask_first(cpu_online_mask);
+
+   do {
+   unsigned int i, group = nr_cpus / nr_msix;
+
+   if (index  nr_cpus % nr_msix)
+   group++;
+
+   for (i = 0 ; i  group ; i++) {
+   ioc-cpu_msix_table[cpu] = index;
+   cpu = cpumask_next(cpu, cpu_online_mask);
}
-   }
+
+   index++;
+
+   } while (cpu  nr_cpus);
 }
 
 /**
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] [SCSI] mpt3sas: Rework the MSI-X grouping code

2014-01-03 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

On systems with a non power-of-two CPU count the existing MSI-X grouping
code failed to distribute interrupts correctly. Rework the code to
handle arbitrary processor counts.

Also remove the hardcoded upper limit on the number of processors so we
can boot on large systems.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Cc: Sreekanth Reddy sreekanth.re...@lsi.com
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 73 +++--
 1 file changed, 21 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index fa785062e97b..35b95e61acd5 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1624,66 +1624,35 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 
index, u32 vector)
 static void
 _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
 {
-   struct adapter_reply_queue *reply_q;
-   int cpu_id;
-   int cpu_grouping, loop, grouping, grouping_mod;
-   int reply_queue;
+   unsigned int cpu, nr_cpus, nr_msix, index = 0;
 
if (!_base_is_controller_msix_enabled(ioc))
return;
 
memset(ioc-cpu_msix_table, 0, ioc-cpu_msix_table_sz);
 
-   /* NUMA Hardware bug workaround - drop to less reply queues */
-   if (ioc-reply_queue_count  ioc-facts.MaxMSIxVectors) {
-   ioc-reply_queue_count = ioc-facts.MaxMSIxVectors;
-   reply_queue = 0;
-   list_for_each_entry(reply_q, ioc-reply_queue_list, list) {
-   reply_q-msix_index = reply_queue;
-   if (++reply_queue == ioc-reply_queue_count)
-   reply_queue = 0;
-   }
-   }
+   nr_cpus = num_online_cpus();
+   nr_msix = ioc-reply_queue_count = min(ioc-reply_queue_count,
+  ioc-facts.MaxMSIxVectors);
+   if (!nr_msix)
+   return;
 
-   /* when there are more cpus than available msix vectors,
-* then group cpus togeather on same irq
-*/
-   if (ioc-cpu_count  ioc-msix_vector_count) {
-   grouping = ioc-cpu_count / ioc-msix_vector_count;
-   grouping_mod = ioc-cpu_count % ioc-msix_vector_count;
-   if (grouping  2 || (grouping == 2  !grouping_mod))
-   cpu_grouping = 2;
-   else if (grouping  4 || (grouping == 4  !grouping_mod))
-   cpu_grouping = 4;
-   else if (grouping  8 || (grouping == 8  !grouping_mod))
-   cpu_grouping = 8;
-   else
-   cpu_grouping = 16;
-   } else
-   cpu_grouping = 0;
-
-   loop = 0;
-   reply_q = list_entry(ioc-reply_queue_list.next,
-struct adapter_reply_queue, list);
-   for_each_online_cpu(cpu_id) {
-   if (!cpu_grouping) {
-   ioc-cpu_msix_table[cpu_id] = reply_q-msix_index;
-   reply_q = list_entry(reply_q-list.next,
-   struct adapter_reply_queue, list);
-   } else {
-   if (loop  cpu_grouping) {
-   ioc-cpu_msix_table[cpu_id] =
-   reply_q-msix_index;
-   loop++;
-   } else {
-   reply_q = list_entry(reply_q-list.next,
-   struct adapter_reply_queue, list);
-   ioc-cpu_msix_table[cpu_id] =
-   reply_q-msix_index;
-   loop = 1;
-   }
+   cpu = cpumask_first(cpu_online_mask);
+
+   do {
+   unsigned int i, group = nr_cpus / nr_msix;
+
+   if (index  nr_cpus % nr_msix)
+   group++;
+
+   for (i = 0 ; i  group ; i++) {
+   ioc-cpu_msix_table[cpu] = index;
+   cpu = cpumask_next(cpu, cpu_online_mask);
}
-   }
+
+   index++;
+
+   } while (cpu  nr_cpus);
 }
 
 /**
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug#733565: SIX messages per on boot console should be TWO

2014-01-03 Thread jidanni
 MKP == Martin K Petersen martin.peter...@oracle.com writes:
MKP [SCSI] sd: Quiesce mode sense error messages
Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] dual scan thread bug fix

2014-01-03 Thread James Bottomley

On Fri, 2014-01-03 at 10:58 -0500, Alan Stern wrote:
 On Thu, 2 Jan 2014, James Bottomley wrote:
 
  In the highly unusual case where two threads are running concurrently 
  through
  the scanning code scanning the same target, we run into the situation where
  one may allocate the target while the other is still using it.  In this 
  case,
  because the reap checks for STARGET_CREATED and kills the target without
  reference counting, the second thread will do the wrong thing on reap.
  
  Fix this by reference counting even creates and doing the STARGET_CREATED
  check in the final put.
 
 I'm still concerned about one thing.  The previous patch does this in
 scsi_alloc_target():
 
found:
  -   found_target-reap_ref++;
  +   if (!kref_get_unless_zero(found_target-reap_ref))
  +   /*
  +* release routine already fired.  Target is dead, but
  +* STARGET_DEL may not yet be set (set in the release
  +* routine), so set here as well, just in case
  +*/
  +   found_target-state = STARGET_DEL;
  spin_unlock_irqrestore(shost-host_lock, flags);
 
 As a result, the two comments in this patch aren't right:
 
  @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref 
  *kref)
  struct scsi_target *starget
  = container_of(kref, struct scsi_target, reap_ref);
   
  -   transport_remove_device(starget-dev);
  -   device_del(starget-dev);
  -   starget-state = STARGET_DEL;
  +   /*
  +* if we get here and the target is still in the CREATED state that
  +* means it was allocated but never made visible (because a scan
  +* turned up no LUNs), so don't call device_del() on it.
  +*/
  +   if (starget-state == STARGET_RUNNING) {
  +   transport_remove_device(starget-dev);
  +   device_del(starget-dev);
  +   }
 
 Here the state could already be STARGET_DEL, even though the target is
 still visible.

Well, I agree with the theory.  In practise, there are only a few
machine instructions between the kref going to zero and us reaching that
point, because kref_release will jump into this routine next, so the
condition would be very hard to see.  However, I suppose it's easy to
close by checking for != STARGET_CREATED and there's no reason not to do
that, so I'll change it.

 Also, it's a little odd that the comment talks about CREATED but the 
 code really checks for RUNNING.  They should be consistent.

!= STARGET_CREATED should solve this.

  @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct 
  device *parent,
*/
   void scsi_target_reap(struct scsi_target *starget)
   {
  +   /*
  +* serious problem if this triggers: STARGET_DEL is only set in the
  +* kref release routine, so we're doing another final put on an
  +* already released kref
  +*/
  BUG_ON(starget-state == STARGET_DEL);
 
 Here the code is okay but the comment is wrong: STARGET_DEL is set in 
 _two_ places (but neither of them runs until reap_ref has reached 0).
 
 Would it be better in scsi_alloc_target() to behave as though the state 
 were STARGET_DEL without actually setting it?

Yes, I'll update the comment to it only goes to DEL after the kref goes
to zero.

How does the attached diff look?

James

---

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 82cf902..2f7de33 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -390,7 +390,7 @@ static void scsi_target_reap_ref_release(struct kref *kref)
 * means it was allocated but never made visible (because a scan
 * turned up no LUNs), so don't call device_del() on it.
 */
-   if (starget-state == STARGET_RUNNING) {
+   if (starget-state != STARGET_CREATED) {
transport_remove_device(starget-dev);
device_del(starget-dev);
}
@@ -514,9 +514,9 @@ static struct scsi_target *scsi_alloc_target(struct device 
*parent,
 void scsi_target_reap(struct scsi_target *starget)
 {
/*
-* serious problem if this triggers: STARGET_DEL is only set in the
-* kref release routine, so we're doing another final put on an
-* already released kref
+* serious problem if this triggers: STARGET_DEL is only set in the if
+* the reap_ref drops to zero, so we're trying to do another final put
+* on an already released kref
 */
BUG_ON(starget-state == STARGET_DEL);
scsi_target_reap_ref_put(starget);




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html