Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Hello linux-scsi and linux-raid,

I did some further research regarding my problem.
It appears to me the fault does not lie with the mpt2sas driver (not
that I can definitely exclude it), but with the md implementation.

I reproduced what I think is the same issue on a different machine (also
running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different
set of hard-drives of the same model. Using systemrescuecd
(2.8.1-beta003) and booting the 64bit 3.4.4 kernel, I issued the
following commands:

1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror
--raid-devices=2 --name=test1 /dev/sda missing'
3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md
is being created as md127)

2) gave me this feedback:
--
mdadm: super1.x cannot open /dev/sda: Device or resource busy
mdadm: /dev/sda is not suitable for this array.
mdadm: create aborted
---
Even though it says creating aborted it still created md127.

And 3) led to these lines in dmesg:
---
[  604.838640] sd 2:0:0:0: [sda] Device not ready
[  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.838680] end_request: I/O error, dev sda, sector 2048
[  604.838688] Buffer I/O error on device md127, logical block 0
[  604.838695] Buffer I/O error on device md127, logical block 1
[  604.838699] Buffer I/O error on device md127, logical block 2
[  604.838702] Buffer I/O error on device md127, logical block 3
[  604.838783] sd 2:0:0:0: [sda] Device not ready
[  604.838785] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838789] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838793] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.838805] end_request: I/O error, dev sda, sector 2048
[  604.838808] Buffer I/O error on device md127, logical block 0
[  604.838983] sd 2:0:0:0: [sda] Device not ready
[  604.838986] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838989] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838993] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839006] end_request: I/O error, dev sda, sector 146514
[  604.839009] Buffer I/O error on device md127, logical block 183143355
[  604.839087] sd 2:0:0:0: [sda] Device not ready
[  604.839090] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839093] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839097] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839110] end_request: I/O error, dev sda, sector 146514
[  604.839113] Buffer I/O error on device md127, logical block 183143355
[  604.839271] sd 2:0:0:0: [sda] Device not ready
[  604.839274] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839278] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839282] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839286] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.839321] end_request: I/O error, dev sda, sector 2048
[  604.839324] Buffer I/O error on device md127, logical block 0
[  604.839330] Buffer I/O error on device md127, logical block 1
[  604.840494] sd 2:0:0:0: [sda] Device not ready
[  604.840497] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.840504] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.840512] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.840516] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.840526] end_request: I/O error, dev sda, sector 2048
--
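
For readers following the log, the READ(10) CDBs above can be decoded by
hand: byte 0 is the opcode (0x28), bytes 2-5 hold the big-endian logical
block address, and bytes 7-8 the transfer length in blocks. A minimal
Python sketch (the function name decode_read10 is made up for
illustration and is not part of any tool mentioned in this thread):

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


if __name__ == "__main__":
    for cdb in ("28 00 00 00 08 00 00 00 20 00",
                "28 00 57 54 65 d8 00 00 08 00"):
        lba, blocks = decode_read10(cdb)
        print(f"{cdb} -> LBA {lba}, {blocks} blocks")
```

The first CDB decodes to LBA 2048 with a length of 32 blocks, which
matches the "sector 2048" reported by end_request in the log.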

This rules out hardware errors (different physical machine and devices)
as the cause, and also ext4, which the other system was using as its
filesystem.
Maybe Neil Brown (who scripts/get_maintainer.pl identified as the
maintainer of the md-code) can make bits and pieces of this. It may well
be this is the same problem but a different error-path - I don't know.

I will try to make the scenario more generic, but I don't have a
non-virtual machine to spare atm. Also please do let me know if I'm
posting this to the wrong lists (linux-scsi and linux-raid) or if there
is anything which might not be helpful with the way I'm reporting this.

Regards,
Matthias Prager

Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Robert Trace
> I did some further research regarding my problem.
> It appears to me the fault does not lie with the mpt2sas driver (not
> that I can definitely exclude it), but with the md implementation.

I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
disks), but I've come to a slightly different conclusion.

I noticed that when my SATA disks are on a SATA controller and they spin
down (or are spun down via hdparm -y), then they respond to TUR (TEST
UNIT READY) commands with an OK.  Any I/O sent to these disks simply
waits while the disks spin up and then completes as usual.

However, my SATA disks on the SAS controller respond to TUR with the
sense error "Not Ready/Initializing command required".  Any I/O sent to
these disks immediately fails.  You saw this in your logging:

> [  604.838640] sd 2:0:0:0: [sda] Device not ready
> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 20 00
> [  604.838680] end_request: I/O error, dev sda, sector 2048
> [  604.838688] Buffer I/O error on device md127, logical block 0
> [  604.838695] Buffer I/O error on device md127, logical block 1
> [  604.838699] Buffer I/O error on device md127, logical block 2
> [  604.838702] Buffer I/O error on device md127, logical block 3

Sending an explicit START UNIT command to these sleeping disks will wake
them up and then they behave normally.  (BTW, you can issue TURs and
START UNITs via the sg_turs and sg_start commands).
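
The manual wake-up described above could be scripted roughly as follows.
This is only a sketch: it assumes sg3_utils is installed, that sg_turs
exits with status 0 when the device is ready, and that `sg_start --start`
sends the START UNIT. The helper name wake_if_not_ready and its `run`
parameter are invented here for illustration.

```python
import subprocess


def wake_if_not_ready(dev, run=subprocess.run):
    """Send TEST UNIT READY; if not ready, send START UNIT and re-test.

    Returns True if the device reports ready at the end.
    """
    if run(["sg_turs", dev]).returncode == 0:
        return True                      # already ready, nothing to do
    run(["sg_start", "--start", dev])    # explicit START UNIT
    return run(["sg_turs", dev]).returncode == 0
```

The `run` argument exists only so the logic can be exercised without
real hardware; called as wake_if_not_ready("/dev/sdX") it shells out to
the sg3_utils binaries.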

I've reproduced this behavior on the raw disks themselves, no MD layer
involved (although the freak-out by my MD layer is what alerted me to
this issue too... Having your entire array punted the first time you
access it is a little scary :-).  I'm also on raw hardware and I've seen
this behavior on kernels 3.0.33 through 3.4.4.

So, SATA disks respond differently depending on the controller they're
on.  I don't know if this is a SCSI thing, a SAS thing or a
firmware/driver thing for the 9211.

Now, whether or not the MD layer should be assembling arrays from
"failed" disks is, I think, a separate issue.

-- Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Darrick J. Wong
On Mon, Jul 09, 2012 at 03:37:09PM -0400, Robert Trace wrote:
> > I did some further research regarding my problem.
> > It appears to me the fault does not lie with the mpt2sas driver (not
> > that I can definitely exclude it), but with the md implementation.
> 
> I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
> disks), but I've come to a slightly different conclusion.
> 
> I noticed that when my SATA disks are on a SATA controller and they spin
> down (or are spun down via hdparm -y), then they respond to TUR (TEST
> UNIT READY) commands with an OK.  Any I/O sent to these disks simply
> waits while the disks spin up and then completes as usual.
> 
> However, my SATA disks on the SAS controller respond to TUR with the
> sense error "Not Ready/Initializing command required".  Any I/O sent to
> these disks immediately fails.  You saw this in your logging:
> 
> > [  604.838640] sd 2:0:0:0: [sda] Device not ready
> > [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> > driverbyte=DRIVER_SENSE
> > [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> > [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> > initializing command required
> > [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> > 20 00
> > [  604.838680] end_request: I/O error, dev sda, sector 2048
> > [  604.838688] Buffer I/O error on device md127, logical block 0
> > [  604.838695] Buffer I/O error on device md127, logical block 1
> > [  604.838699] Buffer I/O error on device md127, logical block 2
> > [  604.838702] Buffer I/O error on device md127, logical block 3
> 
> Sending an explicit START UNIT command to these sleeping disks will wake
> them up and then they behave normally.  (BTW, you can issue TURs and
> START UNITs via the sg_turs and sg_start commands).
> 
> I've reproduced this behavior on the raw disks themselves, no MD layer
> involved (although the freak-out by my MD layer is what alerted me to
> this issue too... Having your entire array punted the first time you
> access it is a little scary :-).  I'm also on raw hardware and I've seen
> this behavior on kernels 3.0.33 through 3.4.4.
> 
> So, SATA disks respond differently depending on the controller they're
> on.  I don't know if this is a SCSI thing, a SAS thing or a
> firmware/driver thing for the 9211.

I suspect that /sys/devices//manage_start_stop = 0
for the SATA devices hanging off the SAS controller.  Setting that sysfs
attribute to 1 is supposed to enable the SCSI layer to send TUR when it sees
"LU not ready", as well as spin down the drives at suspend/poweroff time.

--D
> 
> Now, whether or not the MD layer should be assembling arrays from
> "failed" disks is, I think, a separate issue.
> 
> -- Rob
> 



Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread NeilBrown
On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager wrote:

> Hello linux-scsi and linux-raid,
> 
> I did some further research regarding my problem.
> It appears to me the fault does not lie with the mpt2sas driver (not
> that I can definitely exclude it), but with the md implementation.
> 
> I reproduced what I think is the same issue on a different machine (also
> running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different
> set of hard-drives of the same model. Using systemrescuecd
> (2.8.1-beta003) and booting the 64bit 3.4.4 kernel, I issued the
> following commands:
> 
> 1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
> 2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror
> --raid-devices=2 --name=test1 /dev/sda missing'
> 3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md
> is being created as md127)
> 
> 2) gave me this feedback:
> --
> mdadm: super1.x cannot open /dev/sda: Device or resource busy
> mdadm: /dev/sda is not suitable for this array.
> mdadm: create aborted
> ---
> Even though it says creating aborted it still created md127.

One of my pet peeves is when people interpret the observations wrongly and
then report their interpretation instead of their observation.  However
sometimes it is very hard to separate the two.  Your comment above looks
perfectly reasonable and looks like a clean observation and not an
interpretation.  Yet it is an interpretation :-)

The observation would be
   "Even though it says creating abort, md127 was still created".

You see, it wasn't this mdadm that created md127 - it certainly shouldn't
have as you asked it to create md1.

I don't know the exact sequence of events, but something - possibly relating
to the error messages reported below - caused udev to notice /dev/sda.
udev then ran "mdadm -I /dev/sda" and as it had some metadata on it, it
created an array with it.  As the name information in that metadata was
probably "test1" or similar, rather than "1", mdadm didn't know what number
was wanted for the array, so it chose a free high number - 127.

This metadata must have been left over from an earlier experiment.

So it might have been something like.

- you run mdadm (call this mdadm-1).
- mdadm tries to open sda
- driver notices that device is asleep, and wakes it up
- the waking up of the device causes a CHANGE uevent to udev
- this causes udev to run a new mdadm - mdadm-2
- mdadm-2 reads the metadata, sees old metadata, and assembles sda into a new md127
- mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails, 
  because sda is now part of md127

Clearly undesirable behaviour.  I'm not sure which bit is "wrong".

NeilBrown


> 
> And 3) led to these lines in dmesg:
> ---
> [  604.838640] sd 2:0:0:0: [sda] Device not ready
> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 20 00
> [  604.838680] end_request: I/O error, dev sda, sector 2048
> [  604.838688] Buffer I/O error on device md127, logical block 0
> [  604.838695] Buffer I/O error on device md127, logical block 1
> [  604.838699] Buffer I/O error on device md127, logical block 2
> [  604.838702] Buffer I/O error on device md127, logical block 3
> [  604.838783] sd 2:0:0:0: [sda] Device not ready
> [  604.838785] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838789] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838793] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 08 00
> [  604.838805] end_request: I/O error, dev sda, sector 2048
> [  604.838808] Buffer I/O error on device md127, logical block 0
> [  604.838983] sd 2:0:0:0: [sda] Device not ready
> [  604.838986] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838989] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838993] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
> 08 00
> [  604.839006] end_request: I/O error, dev sda, sector 146514
> [  604.839009] Buffer I/O error on device md127, logical block 183143355
> [  604.839087] sd 2:0:0:0: [sda] Device not ready
> [  604.839090] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.839093] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.839097] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
> 08 00
> [  604.839110] end_request: I/O error, dev sda, sector 146514
> [  604.8391

Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Robert Trace
On 07/09/2012 04:45 PM, Darrick J. Wong wrote:
>
> I suspect that /sys/devices//manage_start_stop = 0
> for the SATA devices hanging off the SAS controller.

Yep, looks like you're right.  For my system:

# cat /sys/block/sd?/device/scsi_disk/*/manage_start_stop
1
1
1
1
1
0
0
0
0
0
0
0
0

Those first 5 disks are SATA disks on SATA controllers.  The last 8
disks are SATA disks on the SAS controller.
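
The same check can be done per disk with a small script, so each value
is labelled with its device name. A minimal sketch, assuming the sysfs
layout shown in the `cat` command above; the function name
read_manage_start_stop and its sys_block parameter are made up here.

```python
import glob
import os


def read_manage_start_stop(sys_block="/sys/block"):
    """Map each sd* disk name to its manage_start_stop value (as int)."""
    pattern = os.path.join(sys_block, "sd*", "device", "scsi_disk", "*",
                           "manage_start_stop")
    result = {}
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/<disk>/device/scsi_disk/<h:c:t:l>/manage_start_stop
        disk = os.path.relpath(path, sys_block).split(os.sep)[0]
        with open(path) as f:
            result[disk] = int(f.read().strip())
    return result


if __name__ == "__main__":
    for disk, value in read_manage_start_stop().items():
        print(disk, value)
```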

> Setting that sysfs
> attribute to 1 is supposed to enable the SCSI layer to send TUR when it sees
> "LU not ready", as well as spin down the drives at suspend/poweroff time.

Setting it to 1 doesn't seem to have made any difference, however.

# cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
0
# echo 1 > /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
# cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
1
# hdparm -y /dev/sdm

/dev/sdm:
 issuing standby command
# hdparm -C /dev/sdm

/dev/sdm:
 drive state is:  standby
# dd if=/dev/sdm of=/dev/null bs=512 count=1
dd: reading `/dev/sdm': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00117802 s, 0.0 kB/s

... and on the scsi logging side, I see the read(10) to the disk which
immediately returns "Not Ready" and the I/O failure bubbles up the
chain.  And afterwards, the disk is still asleep.

# hdparm -C /dev/sdm

/dev/sdm:
 drive state is:  standby

Also, TURs don't appear to actually wake the disk up (should they?).
The only thing I've found that'll wake the disk up is an explicit START
UNIT command.

-- Rob


Re: [PATCH] fcoe: Remove redundant 'less than zero' check

2012-07-09 Thread Andrew Morton
On Thu, 05 Jul 2012 07:52:25 -0700
Robert Love wrote:

> strtoul returns an 'unsigned long' so there is no
> reason to check if the value is less than zero.
> 
> strtoul already checks for the '-' character deep
> in its bowels. It will return an error if the user
> has provided a negative value and fcoe_str_to_dev_loss
> will return that error to its caller.

huh, I never knew that.  So if we feed -1 to kstrtoul() it gets treated
as an error?  That seems a bit surprising.  You're sure about that?

> This patch fixes the following Coverity reported warning:
> 
> CID 703581 -  NO_EFFECT Unsigned compared against 0 - This
> less-than-zero comparison of an unsigned value is never true. "*val < 0UL".
> drivers/scsi/fcoe/fcoe_sysfs.c:105
> 
> Signed-off-by: Robert Love 
> ---
>  drivers/scsi/fcoe/fcoe_sysfs.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/fcoe/fcoe_sysfs.c b/drivers/scsi/fcoe/fcoe_sysfs.c
> index 2bc1631..5e75168 100644
> --- a/drivers/scsi/fcoe/fcoe_sysfs.c
> +++ b/drivers/scsi/fcoe/fcoe_sysfs.c
> @@ -102,7 +102,7 @@ static int fcoe_str_to_dev_loss(const char *buf, unsigned long *val)
>   int ret;
>  
>   ret = kstrtoul(buf, 0, val);
> - if (ret || *val < 0)
> + if (ret)
>   return -EINVAL;
>   /*
>* Check for overflow; dev_loss_tmo is u32


Re: [PATCH] fcoe: Remove redundant 'less than zero' check

2012-07-09 Thread Love, Robert W
On 12-07-09 04:29 PM, Andrew Morton wrote:
> On Thu, 05 Jul 2012 07:52:25 -0700
> Robert Love  wrote:
>
>> strtoul returns an 'unsigned long' so there is no
>> reason to check if the value is less than zero.
>>
>> strtoul already checks for the '-' character deep
>> in its bowels. It will return an error if the user
>> has provided a negative value and fcoe_str_to_dev_loss
>> will return that error to its caller.
> huh, I never knew that.  So if we feed -1 to kstrtoul() it gets treated
> as an error?  That seems a bit surprising.  You're sure about that?
>
>
I believe so.

kstrtoul->kstrtoull->_kstrtoull->_parse_integer

When the call chain ultimately hits _parse_integer it breaks out of 
parsing if it hits a non-numeric or alphabetic character outside of the 
'a' to 'f' range. _kstrtoull notices that the buffer wasn't completely 
parsed and returns an error. I think the error will be -EINVAL.
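
The parsing behaviour described above can be modelled in a few lines of
Python. This is a simplified sketch based on a reading of
lib/kstrtox.c, not the kernel code itself; the name model_kstrtoull is
invented for illustration. In this model an input such as "-1" fails
because no digits are consumed at all, which also yields -EINVAL.

```python
import errno

U64_MAX = 2**64 - 1
HEX_DIGITS = "0123456789abcdef"


def model_kstrtoull(s, base=0):
    """Simplified model of kstrtoull(); returns (error, value)."""
    if s[:1] == "+":                       # a single leading '+' is accepted
        s = s[1:]
    if base == 0:                          # auto-detect radix from the prefix
        if s[:1] == "0":
            third = s[2:3].lower()
            is_hex = s[1:2].lower() == "x" and third != "" and third in HEX_DIGITS
            base = 16 if is_hex else 8
        else:
            base = 10
    if base == 16 and s[:2].lower() == "0x":
        s = s[2:]
    value, used = 0, 0
    for ch in s:                           # models the _parse_integer loop
        digit = HEX_DIGITS.find(ch.lower())
        if digit < 0 or digit >= base:
            break                          # a '-' stops parsing right here
        value = value * base + digit
        used += 1
    if value > U64_MAX:
        return -errno.ERANGE, None         # overflow
    if used == 0:
        return -errno.EINVAL, None         # no digits consumed, e.g. "-1"
    rest = s[used:]
    if rest.startswith("\n"):              # one trailing newline is tolerated
        rest = rest[1:]
    if rest:
        return -errno.EINVAL, None         # unparsed trailing characters
    return 0, value
```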

//Rob


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 10.07.2012 00:08, schrieb NeilBrown:
> On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager 
> wrote:
> 
>> Even though it says creating aborted it still created md127.
> 
> One of my pet peeves is when people interpret the observations wrongly and
> then report their interpretation instead of their observation.  However
> sometimes it is very hard to separate the two.  Your comment above looks
> perfectly reasonable and looks like a clean observation and not an
> interpretation.  Yet it is an interpretation :-)
> 
> The observation would be
>    "Even though it says creating abort, md127 was still created".
> 
> You see, it wasn't this mdadm that created md127 - it certainly shouldn't
> have as you asked it to create md1.
Sorry - I jumped to conclusions without knowing what was actually going on.

> 
> I don't know the exact sequence of events, but something - possibly relating
> to the error messages reported below - caused udev to notice /dev/sda.
> udev then ran "mdadm -I /dev/sda" and as it had some metadata on it, it
> created an array with it.  As the name information in that metadata was
> probably "test1" or similar, rather than "1", mdadm didn't know what number
> was wanted for the array, so it chose a free high number - 127.
> 
> This metadata must have been left over from an earlier experiment.
That is correct (as I am just realizing now). There is metadata of a
raid1 array left on the disk even though it was used (for a short time)
with ZFS on FreeBSD before doing these experiments.

> 
> So it might have been something like.
> 
> - you run mdadm (call this mdadm-1).
> - mdadm tries to open sda
> - driver notices that device is asleep, and wakes it up
> - the waking up of the device causes a CHANGE uevent to udev
> - this causes udev to run a new mdadm - mdadm-2
> - mdadm-2 reads the metadata, sees old metadata, and assembles sda into a new md127
> - mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails, 
>   because sda is now part of md127
> 
> Clearly undesirable behaviour.  I'm not sure which bit is "wrong".
As it turns out mdadm is doing everything right. md127 is actually
already present (though inactive) at boot-time. So mdadm is absolutely
correct in saying sda is busy and refusing to do anything further.

> 
> NeilBrown
> 

The real problem seems to be located in some layer below md, which is
not waking up the disk for any i/o (at all - not even for fdisk -l).

Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 09.07.2012 21:37, schrieb Robert Trace:
>> I did some further research regarding my problem.
>> It appears to me the fault does not lie with the mpt2sas driver (not
>> that I can definitely exclude it), but with the md implementation.
> 
> I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
> disks), but I've come to a slightly different conclusion.
> 
> I noticed that when my SATA disks are on a SATA controller and they spin
> down (or are spun down via hdparm -y), then they respond to TUR (TEST
> UNIT READY) commands with an OK.  Any I/O sent to these disks simply
> waits while the disks spin up and then completes as usual.
> 
> However, my SATA disks on the SAS controller respond to TUR with the
> sense error "Not Ready/Initializing command required".  Any I/O sent to
> these disks immediately fails.  You saw this in your logging:
> 
>> [  604.838640] sd 2:0:0:0: [sda] Device not ready
>> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE
>> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
>> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
>> initializing command required
>> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
>> 20 00
>> [  604.838680] end_request: I/O error, dev sda, sector 2048
>> [  604.838688] Buffer I/O error on device md127, logical block 0
>> [  604.838695] Buffer I/O error on device md127, logical block 1
>> [  604.838699] Buffer I/O error on device md127, logical block 2
>> [  604.838702] Buffer I/O error on device md127, logical block 3
> 
> Sending an explicit START UNIT command to these sleeping disks will wake
> them up and then they behave normally.  (BTW, you can issue TURs and
> START UNITs via the sg_turs and sg_start commands).
Thanks for these pointers.

> 
> I've reproduced this behavior on the raw disks themselves, no MD layer
> involved (although the freak-out by my MD layer is what alerted me to
> this issue too... Having your entire array punted the first time you
> access it is a little scary :-).  I'm also on raw hardware and I've seen
> this behavior on kernels 3.0.33 through 3.4.4.
This is interesting - are you sure about 3.0.33? I'm running this kernel
atm for it gives me no trouble (as opposed to >=3.1.10). The SATA disks
are spun up when I access data on them.

> 
> So, SATA disks respond differently depending on the controller they're
> on.  I don't know if this is a SCSI thing, a SAS thing or a
> firmware/driver thing for the 9211.
> 
> Now, whether or not the MD layer should be assembling arrays from
> "failed" disks is, I think, a separate issue.
I realize now in my cases the MD layer behaved correctly.

> 
> -- Rob
> 




Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 10.07.2012 00:24, schrieb Robert Trace:
> 
> Also, TURs don't appear to actually wake the disk up (should they?).
> The only thing I've found that'll wake the disk up is an explicit START
> UNIT command.

I haven't checked the scsi logging side, but about the only commands
that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start'
(smartctl may be issuing a START UNIT command on its own).

Matthias


Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6

2012-07-09 Thread Nicholas A. Bellinger
Hi folks,

On Wed, 2012-07-04 at 18:52 -0700, Nicholas A. Bellinger wrote:
> 
> To give an idea of how things are looking on the performance side, here
> some initial numbers for small block (4k) mixed random IOPs using the
> following fio test setup:



> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> -------------------------------------------------------------------------------------
> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> 
> 

After checking the original benchmarks here again, I realized that for
virtio-scsi+tcm_vhost the results were actually switched.

So this should have been: heavier READ case (25 / 75) == 55K, and
heavier WRITE case (75 / 25) == 45K.

> In the first case, virtio-scsi+tcm_vhost is out performing by 3x
> compared to virtio-scsi-raw using QEMU SCSI emulation with the same raw
> flash backend device.  For the second case heavier WRITE case, tcm_vhost
> is nearing full bare-metal utilization (~55K vs. ~60K).
> 
> Also converting tcm_vhost to use proper cmwq process context I/O
> submission will help to get even closer to bare metal speeds for both
> work-loads.
> 

Here are initial follow-up virtio-scsi randrw 4k benchmarks with
tcm_vhost recently converted to run backend I/O dispatch via modern cmwq
primitives (kworkerd).

fio randrw 4k workload | virtio-scsi+tcm_vhost+cmwq
---------------------------------------------------
  25 Write / 75 Read   |  ~60K
  75 Write / 25 Read   |  ~45K

So aside from the minor performance improvement for the 25 / 75
workload, the other main improvement is lower CPU usage using the
iomemory_vsl backends.  This is attributed to cmwq providing process
context on the same core as the vhost thread pulling items off vq, which
ends up being on the order of 1/3 less host CPU usage (for both
workloads) primarily from positive cache effects.
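
As a rough check on how close these figures come to bare metal, the
approximate IOPS numbers quoted in this message can be turned into
percentages. The sketch below uses only those rounded figures (in
thousands), with the corrected tcm_vhost values; the dictionary and
function names are made up for illustration.

```python
# Approximate 4k randrw IOPS (in thousands) as quoted in this message.
bare_metal = {"25W/75R": 70, "75W/25R": 60}
tcm_vhost = {"25W/75R": 55, "75W/25R": 45}        # corrected figures
tcm_vhost_cmwq = {"25W/75R": 60, "75W/25R": 45}


def percent_of_bare_metal(results):
    """Express each workload's IOPS as a rounded percentage of bare metal."""
    return {k: round(100 * v / bare_metal[k]) for k, v in results.items()}


if __name__ == "__main__":
    print("tcm_vhost:     ", percent_of_bare_metal(tcm_vhost))
    print("tcm_vhost+cmwq:", percent_of_bare_metal(tcm_vhost_cmwq))
```

Since the inputs are themselves rounded, the resulting percentages are
only indicative.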

This patch is now available in target-pending/tcm_vhost, and I'll be
respinning the initial merge series into for-next-merge over the next
days + another round of list review.

Please let us know if you have any concerns.

Thanks!

--nab



Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Robert Trace
[removed linux-raid since the md layer seems unrelated]

On 07/09/2012 08:12 PM, Matthias Prager wrote:
>>
>> I've reproduced this behavior on the raw disks themselves, no MD layer
>> involved (although the freak-out by my MD layer is what alerted me to
>> this issue too... Having your entire array punted the first time you
>> access it is a little scary :-).  I'm also on raw hardware and I've seen
>> this behavior on kernels 3.0.33 through 3.4.4.
> This is interesting - are you sure about 3.0.33? I'm running this kernel
> atm for it gives me no trouble (as opposed to >=3.1.10). The SATA disks
> are spun up when I access data on them.

Huh..  I just retested this and I'm seeing really random behavior.

I tried 3.0.33 a few days ago after I saw your initial e-mail to this
list.  At that time, the one disk I tried didn't wake up when I sent I/O
to it.

My first retest (just now), on 3.0.33 with four disks, showed the
behavior you initially reported.  Two of the disks woke up from the I/O,
but not all of them.

Repeating the test without rebooting made two disks wake up, but only
one of the same disks from the first test.  The second disk that woke up
was different.

After rebooting and running the test again, none of the disks woke up.

Rebooting again and all of the disks are waking up.

(FYI, here's the test I ran:

1.  hdparm -y /dev/sd[lmjk]
2.  hdparm -C /dev/sd[lmjk] (to verify disks in standby)
3.  for i in l m j k; do sg_turs -v /dev/sd${i}; done
(All disks reported "Not Ready")
4.  echo 3 > /proc/sys/vm/drop_caches
5.  for i in l m j k; do dd if=/dev/sd${i} of=/dev/null bs=512 count=1 skip=; done

I've been manually changing the skip= because I've seen the dd
command complete successfully without the disk waking up.  I think this
is because the disk is satisfying the read from its own cache.  Changing
where on the disk I'm reading should thwart this.
)
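
The idea of reading from a different location on each run can be
automated by picking the skip value at random. A minimal sketch that
only builds the dd command line; the helper name dd_read_command and
the max_sector parameter (an upper bound on the sector number, to be
chosen by the caller) are assumptions for illustration.

```python
import random


def dd_read_command(dev, max_sector, rng=random):
    """Build a one-sector dd read at a random offset below max_sector."""
    skip = rng.randrange(max_sector)       # different sector on most calls
    return ["dd", f"if={dev}", "of=/dev/null", "bs=512", "count=1",
            f"skip={skip}"]
```

The returned list can be passed to subprocess.run() as-is.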

I'm confused.  I'll try more recent kernels again and see if the
behavior becomes predictable.

-- Rob


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Robert Trace
On 07/09/2012 08:21 PM, Matthias Prager wrote:
>
> I haven't checked the scsi logging side, but about the only commands
> that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start'
> (smartctl may be issuing a START UNIT command on its own).

smartctl -a does appear to wake the disks.  The scsi log shows an
IDENTIFY and then several ATA passthrough commands (one of which takes
~10 seconds to complete).  So, I don't see an explicit START UNIT, but
one of those ATA commands which I didn't decode could certainly trigger
the wakeup.

-- Rob


[set4 resend PATCH 0/5] libsas, libata: suspend / resume and "reset once"

2012-07-09 Thread Dan Williams
Hi Jeff,

Let me know if any of these need reworking, otherwise I believe James is
waiting on your ack to take them (well, all except patch1) through scsi.git.

--
Dan


Original description:
Set4 of 5 patchsets to update scsi, libsas, and libata in
support of the isci driver.

Let libsas hook into the generic suspend resume infrastructure in
libata, and provide a common suspend/resume implementation for lldds to
reuse.

"Reset once" is not part of the suspend/resume work.  But it is relevant
to libsas users who need to wait for domain-wide ata error recovery and
want to limit the effort for known well-behaved devices.

These have been in -next for the past couple kernel cycles.

---

Artur Wojcik (1):
  isci: implement suspend/resume support

Dan Williams (4):
  libata: reset once
  libata: export ata_port suspend/resume infrastructure for sas
  libsas: suspend / resume support
  libsas, ipr: cleanup ata_host flags initialization via ata_host_init


 Documentation/kernel-parameters.txt |3 +
 drivers/ata/libata-core.c   |   69 +++
 drivers/ata/libata-eh.c |2 +
 drivers/scsi/ipr.c  |3 -
 drivers/scsi/isci/host.c|2 -
 drivers/scsi/isci/host.h|2 -
 drivers/scsi/isci/init.c|   58 ++
 drivers/scsi/libsas/sas_ata.c   |   91 +--
 drivers/scsi/libsas/sas_discover.c  |   69 +++
 drivers/scsi/libsas/sas_dump.c  |1 
 drivers/scsi/libsas/sas_event.c |4 +-
 drivers/scsi/libsas/sas_init.c  |   90 ++-
 drivers/scsi/libsas/sas_internal.h  |1 
 drivers/scsi/libsas/sas_phy.c   |   21 
 drivers/scsi/libsas/sas_port.c  |   52 
 include/linux/libata.h  |   15 +-
 include/scsi/libsas.h   |   20 ++--
 include/scsi/sas_ata.h  |   10 
 18 files changed, 463 insertions(+), 50 deletions(-)


[set4 resend PATCH 1/5] async: introduce 'async_domain' type

2012-07-09 Thread Dan Williams
This is in preparation for teaching async_synchronize_full() to sync all
pending async work, and not just on the async_running domain.  This
conversion is functionally equivalent, just embedding the existing list
in a new async_domain type.

The .registered attribute is used in a later patch to distinguish
between domains that want to be flushed by async_synchronize_full()
versus those that only expect async_synchronize_{full|cookie}_domain to
be used for flushing.

Cc: Liam Girdwood 
Cc: James Bottomley 
Acked-by: Arjan van de Ven 
Acked-by: Mark Brown 
Tested-by: Eldad Zack 
Signed-off-by: Dan Williams 
---
 drivers/regulator/core.c  |2 +-
 drivers/scsi/libsas/sas_ata.c |2 +-
 drivers/scsi/scsi.c   |3 ++-
 drivers/scsi/scsi_priv.h  |2 +-
 include/linux/async.h |   35 +++
 kernel/async.c|   35 +--
 sound/soc/soc-dapm.c  |2 +-
 7 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 09a737c..4293aae 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2744,7 +2744,7 @@ static void regulator_bulk_enable_async(void *data, async_cookie_t cookie)
 int regulator_bulk_enable(int num_consumers,
  struct regulator_bulk_data *consumers)
 {
-   LIST_HEAD(async_domain);
+   ASYNC_DOMAIN_EXCLUSIVE(async_domain);
int i;
int ret = 0;
 
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index bec3bc8..a59fcdc 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -742,7 +742,7 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
 void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {
struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(shost);
-   LIST_HEAD(async);
+   ASYNC_DOMAIN_EXCLUSIVE(async);
int i;
 
/* it's ok to defer revalidation events during ata eh, these
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index bbbc9c9..4cade88 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -91,7 +92,7 @@ EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
 /* sd, scsi core and power management need to coordinate flushing async actions */
-LIST_HEAD(scsi_sd_probe_domain);
+ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 13d74da..4bd25ec 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -163,7 +163,7 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
-extern struct list_head scsi_sd_probe_domain;
+extern struct async_domain scsi_sd_probe_domain;
 
 /* 
  * internal scsi timeout functions: for use by mid-layer and transport
diff --git a/include/linux/async.h b/include/linux/async.h
index 68a9530..364e7ff 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -9,19 +9,46 @@
  * as published by the Free Software Foundation; version 2
  * of the License.
  */
+#ifndef __ASYNC_H__
+#define __ASYNC_H__
 
 #include 
 #include 
 
 typedef u64 async_cookie_t;
 typedef void (async_func_ptr) (void *data, async_cookie_t cookie);
+struct async_domain {
+   struct list_head node;
+   struct list_head domain;
+   int count;
+   unsigned registered:1;
+};
+
+/*
+ * domain participates in global async_synchronize_full
+ */
+#define ASYNC_DOMAIN(_name) \
+   struct async_domain _name = { .node = LIST_HEAD_INIT(_name.node), \
+ .domain = LIST_HEAD_INIT(_name.domain), \
+ .count = 0, \
+ .registered = 1 }
+
+/*
+ * domain is free to go out of scope as soon as all pending work is
+ * complete, this domain does not participate in async_synchronize_full
+ */
+#define ASYNC_DOMAIN_EXCLUSIVE(_name) \
+   struct async_domain _name = { .node = LIST_HEAD_INIT(_name.node), \
+ .domain = LIST_HEAD_INIT(_name.domain), \
+ .count = 0, \
+ .registered = 0 }
 
 extern async_cookie_t async_schedule(async_func_ptr *ptr, void *data);
 extern async_cookie_t async_schedule_domain(async_func_ptr *ptr, void *data,
-   struct list_head *list);
+   struct async_domain *domain);
 extern void async_synchronize_full(void);
-extern void async_synchronize_full_domain(struct list_head *list);
+extern void async_synchronize_full_domain(struct async_domain *domain);
 extern void as

[set4 resend PATCH 2/5] async: make async_synchronize_full() flush all work regardless of domain

2012-07-09 Thread Dan Williams
In response to an async related regression James noted:

  "My theory is that this is an init problem: The assumption in a lot of
   our code is that async_synchronize_full() waits for everything ... even
   the domain specific async schedules, which isn't true."

...so make this assumption true.

Each domain, including the default one, registers itself on a global domain
list when work is scheduled.  Once all entries complete it exits that
list.  Waiting for the list to be empty syncs all in-flight work across
all domains.

Domains can opt-out of global syncing if they are declared as exclusive
ASYNC_DOMAIN_EXCLUSIVE().  All stack-based domains have been declared
exclusive since the domain may go out of scope as soon as the last work
item completes.

Statically declared domains are mostly ok, but async_unregister_domain()
is there to close any theoretical races with pending
async_synchronize_full waiters at module removal time.
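
The registration scheme described above can be modelled outside the
kernel.  What follows is a minimal userspace sketch, not the kernel
implementation: struct model_domain, model_schedule(), model_complete()
and model_global_idle() are hypothetical names, and a plain counter
stands in for the async_domains list and its locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct async_domain. */
struct model_domain {
    int count;            /* in-flight items, tracked only if registered */
    bool registered;      /* participates in the global sync */
    bool on_global_list;  /* stand-in for linkage through ->node */
};

/* Number of registered domains that currently have work in flight. */
static int model_active_domains;

/* Scheduling side: the first item of a registered domain enlists it. */
static void model_schedule(struct model_domain *d)
{
    if (d->registered && d->count++ == 0) {
        d->on_global_list = true;
        model_active_domains++;
    }
}

/* Completion side: the last item of a registered domain delists it. */
static void model_complete(struct model_domain *d)
{
    if (d->registered && --d->count == 0) {
        d->on_global_list = false;
        model_active_domains--;
    }
    assert(d->count >= 0);
}

/* True when a global sync would have nothing left to wait for. */
static bool model_global_idle(void)
{
    return model_active_domains == 0;
}
```

In this model an exclusive domain (registered == false) never touches
the global counter, which is why a global sync does not wait for it and
why such a domain may safely live on the stack.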

Cc: Len Brown 
Cc: Rafael J. Wysocki 
Cc: James Bottomley 
Acked-by: Arjan van de Ven 
Reported-by: Meelis Roos 
Reported-by: Eldad Zack 
Tested-by: Eldad Zack 
Signed-off-by: Dan Williams 
---
 drivers/scsi/scsi.c   |1 +
 include/linux/async.h |1 +
 kernel/async.c|   43 +--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 4cade88..2936b44 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1355,6 +1355,7 @@ static void __exit exit_scsi(void)
scsi_exit_devinfo();
scsi_exit_procfs();
scsi_exit_queue();
+   async_unregister_domain(&scsi_sd_probe_domain);
 }
 
 subsys_initcall(init_scsi);
diff --git a/include/linux/async.h b/include/linux/async.h
index 364e7ff..7a24fe9 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -46,6 +46,7 @@ struct async_domain {
 extern async_cookie_t async_schedule(async_func_ptr *ptr, void *data);
 extern async_cookie_t async_schedule_domain(async_func_ptr *ptr, void *data,
struct async_domain *domain);
+void async_unregister_domain(struct async_domain *domain);
 extern void async_synchronize_full(void);
 extern void async_synchronize_full_domain(struct async_domain *domain);
 extern void async_synchronize_cookie(async_cookie_t cookie);
diff --git a/kernel/async.c b/kernel/async.c
index ba5491d..9d31183 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -63,7 +63,9 @@ static async_cookie_t next_cookie = 1;
 
 static LIST_HEAD(async_pending);
 static ASYNC_DOMAIN(async_running);
+static LIST_HEAD(async_domains);
 static DEFINE_SPINLOCK(async_lock);
+static DEFINE_MUTEX(async_register_mutex);
 
 struct async_entry {
struct list_headlist;
@@ -145,6 +147,8 @@ static void async_run_entry_fn(struct work_struct *work)
/* 3) remove self from the running queue */
spin_lock_irqsave(&async_lock, flags);
list_del(&entry->list);
+   if (running->registered && --running->count == 0)
+   list_del_init(&running->node);
 
/* 4) free the entry */
kfree(entry);
@@ -187,6 +191,8 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct a
spin_lock_irqsave(&async_lock, flags);
newcookie = entry->cookie = next_cookie++;
list_add_tail(&entry->list, &async_pending);
+   if (running->registered && running->count++ == 0)
+   list_add_tail(&running->node, &async_domains);
atomic_inc(&entry_count);
spin_unlock_irqrestore(&async_lock, flags);
 
@@ -236,13 +242,43 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
  */
 void async_synchronize_full(void)
 {
+   mutex_lock(&async_register_mutex);
do {
-   async_synchronize_cookie(next_cookie);
-   } while (!list_empty(&async_running.domain) || !list_empty(&async_pending));
+   struct async_domain *domain = NULL;
+
+   spin_lock_irq(&async_lock);
+   if (!list_empty(&async_domains))
+   domain = list_first_entry(&async_domains, typeof(*domain), node);
+   spin_unlock_irq(&async_lock);
+
+   async_synchronize_cookie_domain(next_cookie, domain);
+   } while (!list_empty(&async_domains));
+   mutex_unlock(&async_register_mutex);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
 /**
+ * async_unregister_domain - ensure no more anonymous waiters on this domain
+ * @domain: idle domain to flush out of any async_synchronize_full instances
+ *
+ * async_synchronize_{cookie|full}_domain() are not flushed since callers
+ * of these routines should know the lifetime of @domain
+ *
+ * Prefer ASYNC_DOMAIN_EXCLUSIVE() declarations over flushing
+ */
+void async_unregister_domain(struct async_domain *domain)
+{
+   mutex_lock(&async_register_mutex);
+   spin_lock_irq(&async_lock);
+   WARN_ON(!domain->registered || !list_empty(&domain->node) ||
+   !lis

[set4 resend PATCH 3/5] scsi: queue async scan work to an async_schedule domain

2012-07-09 Thread Dan Williams
This is preparation to enable async_synchronize_full() to be used as a
replacement for scsi_complete_async_scans(), i.e. to stop leaking scsi
internal details where they are not needed.

Cc: Arjan van de Ven 
Cc: Len Brown 
Cc: Rafael J. Wysocki 
Cc: James Bottomley 
Tested-by: Eldad Zack 
Signed-off-by: Dan Williams 
---
 drivers/scsi/scsi_scan.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f55e5f1..dff17c1 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1845,14 +1845,13 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
}
 }
 
-static int do_scan_async(void *_data)
+static void do_scan_async(void *_data, async_cookie_t c)
 {
struct async_scan_data *data = _data;
struct Scsi_Host *shost = data->shost;
 
do_scsi_scan_host(shost);
scsi_finish_async_scan(data);
-   return 0;
 }
 
 /**
@@ -1861,7 +1860,6 @@ static int do_scan_async(void *_data)
  **/
 void scsi_scan_host(struct Scsi_Host *shost)
 {
-   struct task_struct *p;
struct async_scan_data *data;
 
if (strncmp(scsi_scan_type, "none", 4) == 0)
@@ -1876,9 +1874,11 @@ void scsi_scan_host(struct Scsi_Host *shost)
return;
}
 
-   p = kthread_run(do_scan_async, data, "scsi_scan_%d", shost->host_no);
-   if (IS_ERR(p))
-   do_scan_async(data);
+   /* register with the async subsystem so wait_for_device_probe()
+* will flush this work
+*/
+   async_schedule(do_scan_async, data);
+
/* scsi_autopm_put_host(shost) is called in scsi_finish_async_scan() */
 }
 EXPORT_SYMBOL(scsi_scan_host);



[set4 resend PATCH 4/5] scsi: cleanup usages of scsi_complete_async_scans

2012-07-09 Thread Dan Williams
Now that scsi registers its async scan work with the async subsystem,
wait_for_device_probe() is sufficient for ensuring all scanning is
complete.

Cc: Arjan van de Ven 
Cc: Len Brown 
Cc: Rafael J. Wysocki 
Cc: James Bottomley 
Tested-by: Eldad Zack 
Signed-off-by: Dan Williams 
---
 drivers/scsi/scsi_scan.c |   12 
 include/scsi/scsi_scan.h |   11 ---
 kernel/power/hibernate.c |8 
 kernel/power/user.c  |2 --
 4 files changed, 33 deletions(-)
 delete mode 100644 include/scsi/scsi_scan.h

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index dff17c1..a0bc663 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -187,18 +187,6 @@ int scsi_complete_async_scans(void)
return 0;
 }
 
-/* Only exported for the benefit of scsi_wait_scan */
-EXPORT_SYMBOL_GPL(scsi_complete_async_scans);
-
-#ifndef MODULE
-/*
- * For async scanning we need to wait for all the scans to complete before
- * trying to mount the root fs.  Otherwise non-modular drivers may not be ready
- * yet.
- */
-late_initcall(scsi_complete_async_scans);
-#endif
-
 /**
  * scsi_unlock_floptical - unlock device via a special MODE SENSE command
  * @sdev:  scsi device to send command to
diff --git a/include/scsi/scsi_scan.h b/include/scsi/scsi_scan.h
deleted file mode 100644
index 7889888..000
--- a/include/scsi/scsi_scan.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef _SCSI_SCSI_SCAN_H
-#define _SCSI_SCSI_SCAN_H
-
-#ifdef CONFIG_SCSI
-/* drivers/scsi/scsi_scan.c */
-extern int scsi_complete_async_scans(void);
-#else
-static inline int scsi_complete_async_scans(void) { return 0; }
-#endif
-
-#endif /* _SCSI_SCSI_SCAN_H */
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 8b53db3..238025f 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "power.h"
 
@@ -748,13 +747,6 @@ static int software_resume(void)
async_synchronize_full();
}
 
-   /*
-* We can't depend on SCSI devices being available after loading
-* one of their modules until scsi_complete_async_scans() is
-* called and the resume device usually is a SCSI one.
-*/
-   scsi_complete_async_scans();
-
swsusp_resume_device = name_to_dev_t(resume_file);
if (!swsusp_resume_device) {
error = -ENODEV;
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 91b0fd0..4ed81e7 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
@@ -84,7 +83,6 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 * appear.
 */
wait_for_device_probe();
-   scsi_complete_async_scans();
 
data->swap = -1;
data->mode = O_WRONLY;



[set4 resend PATCH 5/5] Revert "[SCSI] fix async probe regression"

2012-07-09 Thread Dan Williams
This reverts commit 43a8d39d0137612c336aa8bbb2cb886a79772ffb.

Commit 43a8d39d fixed the fact that wait_for_device_probe() was unable
to flush sd probe work.  Now that sd probe work is once again flushable
via wait_for_device_probe() this workaround is no longer needed.

Cc: Meelis Roos 
Tested-by: Eldad Zack 
Signed-off-by: Dan Williams 
---
 drivers/scsi/scsi_scan.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index a0bc663..56a9379 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -147,7 +147,7 @@ int scsi_complete_async_scans(void)
 
do {
if (list_empty(&scanning_hosts))
-   goto out;
+   return 0;
/* If we can't get memory immediately, that's OK.  Just
 * sleep a little.  Even if we never get memory, the async
 * scans will finish eventually.
@@ -179,11 +179,8 @@ int scsi_complete_async_scans(void)
}
  done:
spin_unlock(&async_scan_lock);
-   kfree(data);
-
- out:
-   async_synchronize_full_domain(&scsi_sd_probe_domain);
 
+   kfree(data);
return 0;
 }
 



Re: [set4 resend PATCH 0/5] libsas, libata: suspend / resume and "reset once"

2012-07-09 Thread Dan Williams
Sorry.  Sent the wrong cover letter with this patch set.

Should be:

Subject: [set5 PATCH v2 0/5] scsi, async: asynchronous probing rework

Changes since v1: http://marc.info/?l=linux-scsi&m=134034693629294&w=2
1/ rebased on scsi/for-next to pick up the scsi_wait_scan module deletion
2/ added Arjan's ack (spoke with Arjan offline)
3/ added Eldad's tested-by
4/ dropped the scsi async scan fix that was merged to scsi/misc

Original description:
Set5 of 5 patchsets to update scsi, libsas, and libata in
support of the isci driver.

Commit 43a8d39d "[SCSI] fix async probe regression" found that
async_synchronize_full() was missing async work that was scheduled to
its own domain.  This led James to note:

  "My theory is that this is an init problem: The assumption in a lot of
   our code is that async_synchronize_full() waits for everything ... even
   the domain specific async schedules, which isn't true."

...and this set aims to make that assumption true, but also with the
ability to opt-out for "private" async work.

---

Dan Williams (5):
  async: introduce 'async_domain' type
  async: make async_synchronize_full() flush all work regardless of domain
  scsi: queue async scan work to an async_schedule domain
  scsi: cleanup usages of scsi_complete_async_scans
  Revert "[SCSI] fix async probe regression"


 drivers/regulator/core.c  |2 +
 drivers/scsi/libsas/sas_ata.c |2 +
 drivers/scsi/scsi.c   |4 ++
 drivers/scsi/scsi_priv.h  |2 +
 drivers/scsi/scsi_scan.c  |   31 -
 include/linux/async.h |   36 +--
 include/scsi/scsi_scan.h  |   11 --
 kernel/async.c|   76 +++--
 kernel/power/hibernate.c  |8 
 kernel/power/user.c   |2 -
 sound/soc/soc-dapm.c  |2 +
 11 files changed, 104 insertions(+), 72 deletions(-)
 delete mode 100644 include/scsi/scsi_scan.h


[resend PATCH 0/5] libsas, libata: suspend / resume and "reset once"

2012-07-09 Thread Dan Williams
Hi Jeff,

Let me know if any of these need reworking, otherwise I believe James is
waiting on your ack to take them (well, all except patch1) through scsi.git.

--
Dan


Original description:
Set4 of 5 patchsets to update scsi, libsas, and libata in
support of the isci driver.

Let libsas hook into the generic suspend resume infrastructure in
libata, and provide a common suspend/resume implementation for lldds to
reuse.

"Reset once" is not part of the suspend/resume work.  But it is relevant
to libsas users who need to wait for domain-wide ata error recovery and
want to limit the effort for known well-behaved devices.

These have been in -next for the past couple kernel cycles.

---

Artur Wojcik (1):
  isci: implement suspend/resume support

Dan Williams (4):
  libata: reset once
  libata: export ata_port suspend/resume infrastructure for sas
  libsas: suspend / resume support
  libsas, ipr: cleanup ata_host flags initialization via ata_host_init


 Documentation/kernel-parameters.txt |3 +
 drivers/ata/libata-core.c   |   69 +++
 drivers/ata/libata-eh.c |2 +
 drivers/scsi/ipr.c  |3 -
 drivers/scsi/isci/host.c|2 -
 drivers/scsi/isci/host.h|2 -
 drivers/scsi/isci/init.c|   58 ++
 drivers/scsi/libsas/sas_ata.c   |   91 +--
 drivers/scsi/libsas/sas_discover.c  |   69 +++
 drivers/scsi/libsas/sas_dump.c  |1 
 drivers/scsi/libsas/sas_event.c |4 +-
 drivers/scsi/libsas/sas_init.c  |   90 ++-
 drivers/scsi/libsas/sas_internal.h  |1 
 drivers/scsi/libsas/sas_phy.c   |   21 
 drivers/scsi/libsas/sas_port.c  |   52 
 include/linux/libata.h  |   15 +-
 include/scsi/libsas.h   |   20 ++--
 include/scsi/sas_ata.h  |   10 
 18 files changed, 463 insertions(+), 50 deletions(-)


[resend PATCH 2/5] libata: export ata_port suspend/resume infrastructure for sas

2012-07-09 Thread Dan Williams
Reuse ata_port_{suspend|resume}_common for sas.  This path is chosen
over adding coordination between ata-transport and sas-transport because
libsas wants to revalidate the domain at resume-time at the host level.
It cannot validate that links have resumed properly until libata has had
a chance to perform its revalidation, and any sane placing of an ata_port
in the sas-transport model would delay its resumption until after the
host.

Export the common portion of port suspend/resume (bypass pm_runtime),
and allow sas to perform these operations asynchronously (similar to the
libsas async-ata probe implementation).  Async operation is determined by
having an external, rather than stack based, location for storing the
result of the operation.
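
The "external versus stack based" convention can be illustrated with a
small userspace model.  This is only a sketch under stated assumptions:
struct model_port, model_request_pm() and model_eh_complete() are
hypothetical stand-ins, and the error handler is simulated by a direct
function call rather than a separate thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ata_port, reduced to two fields. */
struct model_port {
    bool pm_pending;  /* stand-in for ATA_PFLAG_PM_PENDING */
    int *pm_result;   /* where the error handler reports the outcome */
};

/* Stand-in for the error handler finishing a PM request. */
static void model_eh_complete(struct model_port *ap, int result)
{
    if (ap->pm_result)
        *ap->pm_result = result;
    ap->pm_result = NULL;
    ap->pm_pending = false;
}

/*
 * async == NULL: synchronous; the result lands in a stack variable, so
 *                the call must wait for completion before returning.
 * async != NULL: the caller owns storage that outlives this call, so
 *                the request is queued and the call returns at once.
 */
static int model_request_pm(struct model_port *ap, int *async)
{
    int rc = 0;

    if (ap->pm_pending) {
        if (async) {
            *async = -EAGAIN;      /* caller is expected to retry */
            return 0;
        }
        model_eh_complete(ap, 0);  /* "wait" for the earlier request */
    }

    ap->pm_result = async ? async : &rc;
    ap->pm_pending = true;

    if (!async)
        model_eh_complete(ap, 0);  /* "wait" for this request inline */

    return rc;
}
```

An async caller would typically preset its result slot to an error
value before issuing the request and read it only after the error
handler has run; a slot holding -EAGAIN means the request was never
queued and should be issued again.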

Reviewed-by: Jacek Danecki 
Signed-off-by: Dan Williams 
---
 drivers/ata/libata-core.c |   58 -
 include/linux/libata.h|   11 +
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index efd2c72..da31691 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5248,16 +5248,20 @@ bool ata_link_offline(struct ata_link *link)
 #ifdef CONFIG_PM
 static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
   unsigned int action, unsigned int ehi_flags,
-  int wait)
+  int *async)
 {
struct ata_link *link;
unsigned long flags;
-   int rc;
+   int rc = 0;
 
/* Previous resume operation might still be in
 * progress.  Wait for PM_PENDING to clear.
 */
if (ap->pflags & ATA_PFLAG_PM_PENDING) {
+   if (async) {
+   *async = -EAGAIN;
+   return 0;
+   }
ata_port_wait_eh(ap);
WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
}
@@ -5266,10 +5270,10 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
spin_lock_irqsave(ap->lock, flags);
 
ap->pm_mesg = mesg;
-   if (wait) {
-   rc = 0;
+   if (async)
+   ap->pm_result = async;
+   else
ap->pm_result = &rc;
-   }
 
ap->pflags |= ATA_PFLAG_PM_PENDING;
ata_for_each_link(link, ap, HOST_FIRST) {
@@ -5282,7 +5286,7 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
spin_unlock_irqrestore(ap->lock, flags);
 
/* wait and check result */
-   if (wait) {
+   if (!async) {
ata_port_wait_eh(ap);
WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
}
@@ -5292,9 +5296,8 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 
 #define to_ata_port(d) container_of(d, struct ata_port, tdev)
 
-static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
+static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int *async)
 {
-   struct ata_port *ap = to_ata_port(dev);
unsigned int ehi_flags = ATA_EHI_QUIET;
int rc;
 
@@ -5309,10 +5312,17 @@ static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
if (mesg.event == PM_EVENT_SUSPEND)
ehi_flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_NO_RECOVERY;
 
-   rc = ata_port_request_pm(ap, mesg, 0, ehi_flags, 1);
+   rc = ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
return rc;
 }
 
+static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
+{
+   struct ata_port *ap = to_ata_port(dev);
+
+   return __ata_port_suspend_common(ap, mesg, NULL);
+}
+
 static int ata_port_suspend(struct device *dev)
 {
if (pm_runtime_suspended(dev))
@@ -5337,16 +5347,22 @@ static int ata_port_poweroff(struct device *dev)
return ata_port_suspend_common(dev, PMSG_HIBERNATE);
 }
 
-static int ata_port_resume_common(struct device *dev)
+static int __ata_port_resume_common(struct ata_port *ap, int *async)
 {
-   struct ata_port *ap = to_ata_port(dev);
int rc;
 
rc = ata_port_request_pm(ap, PMSG_ON, ATA_EH_RESET,
-   ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, 1);
+   ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async);
return rc;
 }
 
+static int ata_port_resume_common(struct device *dev)
+{
+   struct ata_port *ap = to_ata_port(dev);
+
+   return __ata_port_resume_common(ap, NULL);
+}
+
 static int ata_port_resume(struct device *dev)
 {
int rc;
@@ -5379,6 +5395,24 @@ static const struct dev_pm_ops ata_port_pm_ops = {
.runtime_idle = ata_port_runtime_idle,
 };
 
+/* sas ports don't participate in pm runtime management of ata_ports,
+ * and need to resume ata devices at the domain level, not the per-port
+ * level. sas suspend/resume is async to allow parallel port recovery
+ * since sas has multiple ata_port instances per Scsi_Host.
+ */
+int ata_sas

[resend PATCH 4/5] isci: implement suspend/resume support

2012-07-09 Thread Dan Williams
From: Artur Wojcik 

Provide a "simple-dev-pm-ops" implementation that shuts down the domain
and the device on suspend, and resumes the device and the domain on
resume.  All of the mechanics of restoring domain connectivity are
handled by libsas once isci has notified libsas that all links should be
back up.  libsas is in charge of handling links that did not resume, or
resumed out of order.

Signed-off-by: Artur Wojcik 
Signed-off-by: Jacek Danecki 
Signed-off-by: Dan Williams 
---
 drivers/scsi/isci/host.c |2 +-
 drivers/scsi/isci/host.h |2 +-
 drivers/scsi/isci/init.c |   58 +-
 3 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/isci/host.c b/drivers/scsi/isci/host.c
index 45385f5..bc8981e 100644
--- a/drivers/scsi/isci/host.c
+++ b/drivers/scsi/isci/host.c
@@ -1044,7 +1044,7 @@ static enum sci_status sci_controller_start(struct isci_host *ihost,
return SCI_SUCCESS;
 }
 
-void isci_host_scan_start(struct Scsi_Host *shost)
+void isci_host_start(struct Scsi_Host *shost)
 {
struct isci_host *ihost = SHOST_TO_SAS_HA(shost)->lldd_ha;
unsigned long tmo = sci_controller_get_suggested_start_timeout(ihost);
diff --git a/drivers/scsi/isci/host.h b/drivers/scsi/isci/host.h
index 9ab58e0..4911310 100644
--- a/drivers/scsi/isci/host.h
+++ b/drivers/scsi/isci/host.h
@@ -473,7 +473,7 @@ void sci_controller_remote_device_stopped(struct isci_host *ihost,
 
 enum sci_status sci_controller_continue_io(struct isci_request *ireq);
 int isci_host_scan_finished(struct Scsi_Host *, unsigned long);
-void isci_host_scan_start(struct Scsi_Host *);
+void isci_host_start(struct Scsi_Host *);
 u16 isci_alloc_tag(struct isci_host *ihost);
 enum sci_status isci_free_tag(struct isci_host *ihost, u16 io_tag);
 void isci_tci_free(struct isci_host *ihost, u16 tci);
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 92c1d86..da142a8 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -156,7 +156,7 @@ static struct scsi_host_template isci_sht = {
.target_alloc   = sas_target_alloc,
.slave_configure= sas_slave_configure,
.scan_finished  = isci_host_scan_finished,
-   .scan_start = isci_host_scan_start,
+   .scan_start = isci_host_start,
.change_queue_depth = sas_change_queue_depth,
.change_queue_type  = sas_change_queue_type,
.bios_param = sas_bios_param,
@@ -722,11 +722,67 @@ static void __devexit isci_pci_remove(struct pci_dev *pdev)
}
 }
 
+#ifdef CONFIG_PM
+static int isci_suspend(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct isci_host *ihost;
+   int i;
+
+   for_each_isci_host(i, ihost, pdev) {
+   sas_suspend_ha(&ihost->sas_ha);
+   isci_host_deinit(ihost);
+   }
+
+   pci_save_state(pdev);
+   pci_disable_device(pdev);
+   pci_set_power_state(pdev, PCI_D3hot);
+
+   return 0;
+}
+
+static int isci_resume(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct isci_host *ihost;
+   int rc, i;
+
+   pci_set_power_state(pdev, PCI_D0);
+   pci_restore_state(pdev);
+
+   rc = pcim_enable_device(pdev);
+   if (rc) {
+   dev_err(&pdev->dev,
+   "enabling device failure after resume(%d)\n", rc);
+   return rc;
+   }
+
+   pci_set_master(pdev);
+
+   for_each_isci_host(i, ihost, pdev) {
+   sas_prep_resume_ha(&ihost->sas_ha);
+
+   isci_host_init(ihost);
+   isci_host_start(ihost->sas_ha.core.shost);
+   wait_for_start(ihost);
+
+   sas_resume_ha(&ihost->sas_ha);
+   }
+
+   return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(isci_pm_ops, isci_suspend, isci_resume);
+#endif
+
 static struct pci_driver isci_pci_driver = {
.name   = DRV_NAME,
.id_table   = isci_id_table,
.probe  = isci_pci_probe,
.remove = __devexit_p(isci_pci_remove),
+#ifdef CONFIG_PM
+   .driver.pm  = &isci_pm_ops,
+#endif
 };
 
 static __init int isci_init(void)



[resend PATCH 3/5] libsas: suspend / resume support

2012-07-09 Thread Dan Williams
libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
 links without concern for causing hotplug events.
 Regardless of whether the lldd actually posts link down
 messages libsas notifies the lldd that all
 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
 training clean out any spurious events that were
 generated on the way down, and re-enable event
 processing

sas_resume_ha - after the lldd has started and decided that all phys
have posted link-up events this routine is called to let
libsas start its own timeout of any phys that did not
resume.  After the timeout an lldd can cancel the
phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) is changed
to int so they can be set to -1 to indicate 'invalidated'.
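
One way to read the rationale for widening the counters: with an 8-bit
cache, -1 wraps to 255 and is indistinguishable from a real count of
255, whereas an int cache holding -1 can never compare equal to an
8-bit count.  A minimal sketch of that point, with hypothetical helper
names changed_u8() and changed_int() that do not appear in libsas:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old layout: an 8-bit cache compared against an 8-bit reported count. */
static bool changed_u8(uint8_t cached, uint8_t reported)
{
    return cached != reported;
}

/*
 * New layout: the cache is an int, so it can hold the sentinel -1.
 * The reported count is promoted to int (0..255) for the comparison
 * and therefore never matches the sentinel.
 */
static bool changed_int(int cached, uint8_t reported)
{
    return cached != reported;
}
```

With these helpers, changed_u8((uint8_t)-1, 255) reports no change even
though the cache was meant to be invalid, while changed_int(-1, 255)
reports a change as intended.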

Cc: Alan Stern 
Reviewed-by: Jacek Danecki 
Tested-by: Maciej Patelczyk 
Signed-off-by: Dan Williams 
---
 drivers/scsi/libsas/sas_ata.c  |   86 ++
 drivers/scsi/libsas/sas_discover.c |   69 
 drivers/scsi/libsas/sas_dump.c |1 
 drivers/scsi/libsas/sas_event.c|4 +-
 drivers/scsi/libsas/sas_init.c |   90 
 drivers/scsi/libsas/sas_internal.h |1 
 drivers/scsi/libsas/sas_phy.c  |   21 
 drivers/scsi/libsas/sas_port.c |   52 -
 include/scsi/libsas.h  |   20 ++--
 include/scsi/sas_ata.h |   10 
 10 files changed, 335 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index bec3bc8..4208e16 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -700,6 +700,92 @@ void sas_probe_sata(struct asd_sas_port *port)
if (ata_dev_disabled(sas_to_ata_dev(dev)))
sas_fail_probe(dev, __func__, -ENODEV);
}
+
+}
+
+static bool sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
+{
+   struct domain_device *dev, *n;
+   bool retry = false;
+
+   list_for_each_entry_safe(dev, n, &port->dev_list, dev_list_node) {
+   int rc;
+
+   if (!dev_is_sata(dev))
+   continue;
+
+   sas_ata_wait_eh(dev);
+   rc = dev->sata_dev.pm_result;
+   if (rc == -EAGAIN)
+   retry = true;
+   else if (rc) {
+   /* since we don't have a
+* ->port_{suspend|resume} routine in our
+*  ata_port ops, and no entanglements with
+*  acpi, suspend should just be mechanical trip
+*  through eh, catch cases where these
+*  assumptions are invalidated
+*/
+   WARN_ONCE(1, "failed %s %s error: %d\n", func,
+dev_name(&dev->rphy->dev), rc);
+   }
+
+   /* if libata failed to power manage the device, tear it down */
+   if (ata_dev_disabled(sas_to_ata_dev(dev)))
+   sas_fail_probe(dev, func, -ENODEV);
+   }
+
+   return retry;
+}
+
+void sas_suspend_sata(struct asd_sas_port *port)
+{
+   struct domain_device *dev;
+
+ retry:
+   mutex_lock(&port->ha->disco_mutex);
+   list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+   struct sata_device *sata;
+
+   if (!dev_is_sata(dev))
+   continue;
+
+   sata = &dev->sata_dev;
+   if (sata->ap->pm_mesg.event == PM_EVENT_SUSPEND)
+   continue;
+
+   sata->pm_result = -EIO;
+   ata_sas_port_async_suspend(sata->ap, &sata->pm_result);
+   }
+   mutex_unlock(&port->ha->disco_mutex);
+
+   if (sas_ata_flush_pm_eh(port, __func__))
+   goto retry;
+}
+
+void sas_resume_sata(struct asd_sas_port *port)
+{
+   struct domain_device *dev;
+
+ retry:
+   mutex_lock(&port->ha->disco_mutex);
+   list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+   struct sata_device *sata;
+
+   if (!dev_is_sata(dev))
+   continue;
+
+   sata = &dev->sata_dev;
+   if (sata->ap->pm_mesg.event == PM_EVENT_ON)
+   continue;
+
+   sata->pm_result = -EIO;
+   ata_sas_port_async_resume(sata->ap, &sata->pm_result);
+   }
+   mutex_unlock(&port->ha->disco_mutex);
+
+   if (sas_ata_flush_pm_eh(port

[resend PATCH 5/5] libsas, ipr: cleanup ata_host flags initialization via ata_host_init

2012-07-09 Thread Dan Williams
libsas and ipr pass flags to ata_host_init that are meant for the port.

ata_host flags:
ATA_HOST_SIMPLEX        = (1 << 0), /* Host is simplex, one DMA channel per host only */
ATA_HOST_STARTED        = (1 << 1), /* Host started */
ATA_HOST_PARALLEL_SCAN  = (1 << 2), /* Ports on this host can be scanned in parallel */
ATA_HOST_IGNORE_ATA     = (1 << 3), /* Ignore ATA devices on this host. */

flags passed by libsas:
ATA_FLAG_SATA           = (1 << 1),
ATA_FLAG_PIO_DMA        = (1 << 7), /* PIO cmds via DMA */
ATA_FLAG_NCQ            = (1 << 10), /* host supports NCQ */

The only one that aliases is ATA_HOST_STARTED which is a 'don't care' in
the libsas and ipr cases since ata_hosts from these sources are not
registered with libata.
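
The aliasing claim can be checked mechanically. The sketch below copies the
bit values listed in this message into a standalone program; the
aliased_host_bits() and alias_selfcheck() helpers are hypothetical and exist
only for this illustration.

```c
#include <assert.h>

/* ata_host flags, values as listed in this message */
enum {
	ATA_HOST_SIMPLEX       = (1 << 0),
	ATA_HOST_STARTED       = (1 << 1),
	ATA_HOST_PARALLEL_SCAN = (1 << 2),
	ATA_HOST_IGNORE_ATA    = (1 << 3),
};

/* port flags passed by libsas, values as listed in this message */
enum {
	ATA_FLAG_SATA    = (1 << 1),
	ATA_FLAG_PIO_DMA = (1 << 7),
	ATA_FLAG_NCQ     = (1 << 10),
};

#define ALL_HOST_FLAGS (ATA_HOST_SIMPLEX | ATA_HOST_STARTED | \
			ATA_HOST_PARALLEL_SCAN | ATA_HOST_IGNORE_ATA)

/* Return the host-flag bits that a set of port flags would switch on. */
static unsigned long aliased_host_bits(unsigned long port_flags)
{
	return port_flags & ALL_HOST_FLAGS;
}

static void alias_selfcheck(void)
{
	unsigned long passed = ATA_FLAG_SATA | ATA_FLAG_PIO_DMA | ATA_FLAG_NCQ;

	/* Only ATA_FLAG_SATA lands on a host flag: ATA_HOST_STARTED. */
	assert(aliased_host_bits(passed) == ATA_HOST_STARTED);
}
```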

Cc: Brian King 
Reported-by: Hannes Reinecke 
Signed-off-by: Dan Williams 
---
 drivers/ata/libata-core.c |   10 ++
 drivers/scsi/ipr.c|3 +--
 drivers/scsi/libsas/sas_ata.c |5 +
 include/linux/libata.h|3 +--
 4 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index da31691..6aa72b8 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5958,24 +5958,18 @@ int ata_host_start(struct ata_host *host)
 }
 
 /**
- * ata_sas_host_init - Initialize a host struct
+ * ata_sas_host_init - Initialize a host struct for sas (ipr, libsas)
  * @host:  host to initialize
  * @dev:   device host is attached to
- * @flags: host flags
  * @ops:   port_ops
  *
- * LOCKING:
- * PCI/etc. bus probe sem.
- *
  */
-/* KILLME - the only user left is ipr */
 void ata_host_init(struct ata_host *host, struct device *dev,
-  unsigned long flags, struct ata_port_operations *ops)
+  struct ata_port_operations *ops)
 {
spin_lock_init(&host->lock);
mutex_init(&host->eh_mutex);
host->dev = dev;
-   host->flags = flags;
host->ops = ops;
 }
 
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 467dc38..dacc784 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -8775,8 +8775,7 @@ static int __devinit ipr_probe_ioa(struct pci_dev *pdev,
 
ioa_cfg = (struct ipr_ioa_cfg *)host->hostdata;
memset(ioa_cfg, 0, sizeof(struct ipr_ioa_cfg));
-   ata_host_init(&ioa_cfg->ata_host, &pdev->dev,
- sata_port_info.flags, &ipr_sata_ops);
+   ata_host_init(&ioa_cfg->ata_host, &pdev->dev, &ipr_sata_ops);
 
ioa_cfg->ipr_chip = ipr_get_chip_info(dev_id);
 
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 4208e16..5d10e4d 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -580,10 +580,7 @@ int sas_ata_init(struct domain_device *found_dev)
struct ata_port *ap;
int rc;
 
-   ata_host_init(&found_dev->sata_dev.ata_host,
- ha->dev,
- sata_port_info.flags,
- &sas_sata_ops);
+   ata_host_init(&found_dev->sata_dev.ata_host, ha->dev, &sas_sata_ops);
ap = ata_sas_port_alloc(&found_dev->sata_dev.ata_host,
&sata_port_info,
shost);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index af467d3..baf9f82 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -990,8 +990,7 @@ extern int ata_host_activate(struct ata_host *host, int irq,
 irq_handler_t irq_handler, unsigned long irq_flags,
 struct scsi_host_template *sht);
 extern void ata_host_detach(struct ata_host *host);
-extern void ata_host_init(struct ata_host *, struct device *,
- unsigned long, struct ata_port_operations *);
+extern void ata_host_init(struct ata_host *, struct device *, struct ata_port_operations *);
 extern int ata_scsi_detect(struct scsi_host_template *sht);
 extern int ata_scsi_ioctl(struct scsi_device *dev, int cmd, void __user *arg);
 extern int ata_scsi_queuecmd(struct Scsi_Host *h, struct scsi_cmnd *cmd);


[resend PATCH 1/5] libata: reset once

2012-07-09 Thread Dan Williams
Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up.  In the case where the user trusts the
response time of their devices, permit the recovery attempts to be
limited to one.
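
The effect of the new flag on the retry count can be sketched outside the
kernel. The loop and the flag test mirror the libata-eh.c hunk in this patch,
and the bit value (1 << 9) is taken from the libata.h hunk. The timeout table
and the toy_* names are assumptions made for this illustration and are not
claimed to match the kernel's table.

```c
#include <assert.h>
#include <limits.h>

#define TOY_LFLAG_RST_ONCE (1 << 9) /* bit value from the libata.h hunk */

/* Illustrative per-attempt timeouts in ms, terminated by ULONG_MAX. */
static const unsigned long toy_reset_timeouts[] = {
	10000, 10000, 35000, 5000, ULONG_MAX,
};

/* Count the table entries, then clamp to one attempt if the flag is set. */
static int toy_max_tries(unsigned int link_flags)
{
	int max_tries = 0;

	while (toy_reset_timeouts[max_tries] != ULONG_MAX)
		max_tries++;
	if (link_flags & TOY_LFLAG_RST_ONCE)
		max_tries = 1;
	return max_tries;
}

static void max_tries_selfcheck(void)
{
	assert(toy_max_tries(0) == 4);                  /* full retry ladder */
	assert(toy_max_tries(TOY_LFLAG_RST_ONCE) == 1); /* single attempt */
}
```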

Signed-off-by: Dan Williams 
---
 Documentation/kernel-parameters.txt |3 +++
 drivers/ata/libata-core.c   |1 +
 drivers/ata/libata-eh.c |2 ++
 include/linux/libata.h  |1 +
 4 files changed, 7 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a92c5eb..a896b25 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1351,6 +1351,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
* nohrst, nosrst, norst: suppress hard, soft
   and both resets.
 
+   * rstonce: only attempt one reset during
+ hot-unplug link recovery
+
* dump_id: dump IDENTIFY data.
 
If there are multiple matching configurations changing
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 3fe1202..efd2c72 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6388,6 +6388,7 @@ static int __init ata_parse_force_one(char **cur,
{ "nohrst", .lflags = ATA_LFLAG_NO_HRST },
{ "nosrst", .lflags = ATA_LFLAG_NO_SRST },
{ "norst",  .lflags = ATA_LFLAG_NO_HRST | ATA_LFLAG_NO_SRST },
+   { "rstonce",.lflags = ATA_LFLAG_RST_ONCE },
};
char *start = *cur, *p = *cur;
char *id, *val, *endp;
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 77fc806..a5d2aba 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2623,6 +2623,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
 */
while (ata_eh_reset_timeouts[max_tries] != ULONG_MAX)
max_tries++;
+   if (link->flags & ATA_LFLAG_RST_ONCE)
+   max_tries = 1;
if (link->flags & ATA_LFLAG_NO_HRST)
hardreset = NULL;
if (link->flags & ATA_LFLAG_NO_SRST)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 53da442..f777d30 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -182,6 +182,7 @@ enum {
ATA_LFLAG_DISABLED  = (1 << 6), /* link is disabled */
ATA_LFLAG_SW_ACTIVITY   = (1 << 7), /* keep activity stats */
ATA_LFLAG_NO_LPM= (1 << 8), /* disable LPM on this link */
+   ATA_LFLAG_RST_ONCE  = (1 << 9), /* limit recovery to one reset */
 
/* struct ata_port flags */
ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */
