Re: Debugging scsi abort handling ?

2014-08-29 Thread Hannes Reinecke

On 08/29/2014 06:39 AM, Finn Thain wrote:


On Thu, 28 Aug 2014, Hannes Reinecke wrote:


What might happen, though, is that the command is already dead and gone by
the time you're calling ->scsi_done() (if you call it after eh_abort).
So there might not _be_ a command upon which you can call ->scsi_done()
to start with.

Hence any LLDD needs to clear up any internal references after a call to
eh_XXX to ensure it doesn't call ->scsi_done() on an invalid command.

So even if the LLDD returns 'FAILED' upon a call to eh_XXX it _still_
needs to clear up the internal reference.


This is a question that has been bothering me too. If the host's
eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable to
re-issue the same command to the LLD (?)


No.
FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.
So the midlayer escalates to the next EH step.
The command will only ever be re-issued once EH completes.


Either that or return 'FAILED' for any later eh_XXX function until the
internal references can be cleared up.


So if a command may or may not exist after eh_abort_handler() returns
control to the mid-layer (regardless of SUCCESS or FAILURE), then the LLD
has to be careful about keeping track of which commands were aborted, if
those commands are still in the process of cleanup when eh_abort_handler()
returns.


Yes.


It's hard to see how that can work when command pointers are only unique
while a command exists.

Which is why we have the EH callbacks, to give the LLDD a chance to 
clean up internal references.



In effect, this would mean that EH functions cannot return at all, until
the relevant command(s) are completely forgotten by the LLD; and that
means the LLD itself may have to escalate abort - device reset - bus
reset - etc instead of simply returning FAILED.

More often than not the LLDD has its own internal command structure,
which references the midlayer SCSI command structure via a pointer.

Just clearing that pointer will do the trick.

Take eg. lpfc:
It'll construct its internal command here:

	lpfc_cmd = lpfc_get_scsi_buf(phba, ndlp);
	if (lpfc_cmd == NULL) {
		lpfc_rampdown_queue_depth(phba);

		lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP,
				 "0707 driver's buffer pool is empty, "
				 "IO busied\n");
		goto out_host_busy;
	}

	/*
	 * Store the midlayer's command structure for the
	 * completion phase
	 * and complete the command initialization.
	 */
	lpfc_cmd->pCmd  = cmnd;
	lpfc_cmd->rdata = rdata;
	lpfc_cmd->timeout = 0;
	lpfc_cmd->start_time = jiffies;
	cmnd->host_scribble = (unsigned char *)lpfc_cmd;

and then checks for the pointer upon command completion:

static void
lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn,
			struct lpfc_iocbq *pIocbOut)
{
	struct lpfc_scsi_buf *lpfc_cmd =
		(struct lpfc_scsi_buf *) pIocbIn->context1;

	[ .. ]
	/* Sanity check on return of outstanding command */
	if (!(lpfc_cmd->pCmd))
		return;
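
The pattern in the two lpfc excerpts can be reduced to a small user-space sketch. Everything in it (struct mid_cmd, struct lld_cmd, lld_queue(), lld_abort(), lld_complete()) is a hypothetical stand-in invented for illustration, not lpfc or midlayer code. The only point is that the completion path becomes a no-op once the back-pointer has been cleared.

```c
#include <stddef.h>

/* Hypothetical stand-in for the midlayer command (struct scsi_cmnd). */
struct mid_cmd {
	int done_calls;		/* how often "->scsi_done()" ran */
};

/* Hypothetical LLDD-private command, analogous to lpfc_scsi_buf. */
struct lld_cmd {
	struct mid_cmd *pCmd;	/* back-pointer; NULL once disowned */
};

/* Submission: remember the midlayer command for the completion phase. */
void lld_queue(struct lld_cmd *lc, struct mid_cmd *mc)
{
	lc->pCmd = mc;
}

/* EH callback: forget the midlayer command, whatever happens next. */
void lld_abort(struct lld_cmd *lc)
{
	lc->pCmd = NULL;
}

/* Completion: only touch the midlayer command if we still own it. */
void lld_complete(struct lld_cmd *lc)
{
	struct mid_cmd *mc = lc->pCmd;

	if (mc == NULL)		/* sanity check, as in the lpfc excerpt */
		return;
	lc->pCmd = NULL;
	mc->done_calls++;	/* stands in for calling ->scsi_done() */
}
```

A real driver would additionally need a lock (or an atomic exchange) around the pointer, since abort and completion can race; the sketch leaves that out.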

But indeed, 'FAILED' is not very meaningful here, leaving the 
midlayer with no information about what happened to the command.


Personally I would like to enforce this meaning on the eh_XXX callbacks:
- upon each eh_XXX callback the LLDD clears any internal references
  to the command / command scope (ie eh_abort_cmd clears the
  references to the command, eh_lun_reset clears all internal
  references to commands to this ITL nexus etc.)
  This happens irrespective of the return code.
- The eh_XXX callback shall return 'FAILED' if the respective
  TMF (or equivalent) could not be initiated.
- The eh_XXX callback shall return 'SUCCESS' if the respective
  TMF (or equivalent) could be initiated.
- After each eh_XXX callback control for this command / command
  scope is transferred back to the midlayer; the LLDD shall not
  assume the associated command structures to remain valid after
  that point.
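
As a rough illustration of the "command scope" idea in the list above, here is a user-space sketch. The slot table, struct lld_host, the can_send_tmf flag and the EH_SUCCESS/EH_FAILED macros are all assumptions made up for this example; they only mimic the kernel's SUCCESS/FAILED and are not taken from any driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define EH_SUCCESS 0	/* stands in for the kernel's SUCCESS */
#define EH_FAILED  1	/* stands in for the kernel's FAILED */
#define NR_SLOTS   4

/* Hypothetical LLDD-private command slot. */
struct slot {
	void *mid_cmd;		/* midlayer command, NULL when not owned */
	unsigned int lun;	/* LUN the command was sent to */
};

/* Hypothetical per-host state. */
struct lld_host {
	struct slot slots[NR_SLOTS];
	bool can_send_tmf;	/* false models "TMF could not be initiated" */
};

/* Abort scope: exactly one command. */
int lld_eh_abort(struct lld_host *h, void *mid_cmd)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		if (h->slots[i].mid_cmd == mid_cmd)
			h->slots[i].mid_cmd = NULL;	/* irrespective of result */
	return h->can_send_tmf ? EH_SUCCESS : EH_FAILED;
}

/* LUN reset scope: every command that was sent to this LUN. */
int lld_eh_lun_reset(struct lld_host *h, unsigned int lun)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		if (h->slots[i].lun == lun)
			h->slots[i].mid_cmd = NULL;
	return h->can_send_tmf ? EH_SUCCESS : EH_FAILED;
}
```

Note that the references are dropped before the return code is decided, so a 'FAILED' result leaves no stale pointers behind either.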

I'm tempted to enshrine this in the documentation;
that surely will help me during the EH cleanup.
And Hans will have some guidelines on how to design uas EH :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that may have been removed.

2014-08-29 Thread Hannes Reinecke

On 08/29/2014 04:42 AM, Mike Christie wrote:

On 08/27/2014 09:31 AM, Hannes Reinecke wrote:

On 08/19/2014 07:54 PM, Christoph Hellwig wrote:

On Sat, Aug 16, 2014 at 08:09:48PM -0700, K. Y. Srinivasan wrote:

The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.

Rather than waiting for an I/O to remove the device, force the
discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.


This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.


Not without proper testing.
Currently we cannot rescan existing devices; the inquiry string is
nailed to the sdev structure. The only way to really refresh the
information is to delete it and rescan it again.


How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?


Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tend to become very annoyed
when you do an automated rescan.
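
For reference, the sense triple in the question decodes as sense key 0x6 (UNIT ATTENTION), ASC 0x3f, ASCQ 0x0e, i.e. REPORTED LUNS DATA HAS CHANGED. A minimal sketch of classifying it, together with 0x5/0x25/0x00 (ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED), could look like this; the enum and function names are hypothetical and not kernel API.

```c
#include <stdint.h>

/* Sense keys as defined by SPC. */
#define SK_ILLEGAL_REQUEST 0x05
#define SK_UNIT_ATTENTION  0x06

/* Hypothetical classification used only in this sketch. */
enum lun_event {
	LUN_EVENT_NONE,
	LUN_EVENT_INVENTORY_CHANGED,	/* 0x6/0x3f/0x0e */
	LUN_EVENT_LUN_NOT_SUPPORTED,	/* 0x5/0x25/0x00 */
};

/* Map a (sense key, ASC, ASCQ) triple to a LUN-related event. */
enum lun_event classify_sense(uint8_t key, uint8_t asc, uint8_t ascq)
{
	if (key == SK_UNIT_ATTENTION && asc == 0x3f && ascq == 0x0e)
		return LUN_EVENT_INVENTORY_CHANGED;
	if (key == SK_ILLEGAL_REQUEST && asc == 0x25 && ascq == 0x00)
		return LUN_EVENT_LUN_NOT_SUPPORTED;
	return LUN_EVENT_NONE;
}
```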

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that may have been removed.

2014-08-29 Thread Bart Van Assche
On 08/29/14 08:19, Hannes Reinecke wrote:
 On 08/29/2014 04:42 AM, Mike Christie wrote:
 How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
 gets passed to userspace? Is everyone kicking off a new full (add and
 delete) scan to handle this or logging it? Is the driver returning this
 when the LUNs change?

 Currently it's logged to userspace and ignored.
 Doing an automated rescan has proven to be dangerous, as it
 might disconnect any LUNs which are still in use by applications.
 Especially HA or database setups tends to become very annoyed
 when you do an automated rescan.

Has it already been considered to add newly discovered LUNs
automatically and to leave it to the user to remove stale LUNs manually
? That would be similar to what the rescan-scsi-bus.sh script does
without option -r/--remove.
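
A minimal sketch of such an add-only policy, assuming the known and reported LUNs are available as plain arrays (the function names and the array representation are hypothetical, and this is not how rescan-scsi-bus.sh is implemented):

```c
#include <stdbool.h>
#include <stddef.h>

/* Linear search; good enough for a sketch. */
static bool contains(const unsigned int *set, size_t n, unsigned int lun)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == lun)
			return true;
	return false;
}

/*
 * Store in 'to_add' every LUN that is reported but not yet known and
 * return how many were stored. 'to_add' must have room for 'nreported'
 * entries. LUNs that are known but no longer reported are deliberately
 * left alone; removing them stays a manual decision.
 */
size_t luns_to_add(const unsigned int *known, size_t nknown,
		   const unsigned int *reported, size_t nreported,
		   unsigned int *to_add)
{
	size_t n = 0;

	for (size_t i = 0; i < nreported; i++)
		if (!contains(known, nknown, reported[i]))
			to_add[n++] = reported[i];
	return n;
}
```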

Bart.



Re: Debugging scsi abort handling ?

2014-08-29 Thread Paolo Bonzini
On 29/08/2014 08:08, Hannes Reinecke wrote:

 No.
 FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.
 So the midlayer escalates to the next EH step.
 The command will only ever be re-issued once EH completes.

Then the answer to Hans's question is yes.  It is legal to call
->scsi_done() after the eh_abort handler returns FAILED.

Paolo


Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that may have been removed.

2014-08-29 Thread Hannes Reinecke

On 08/29/2014 09:39 AM, Bart Van Assche wrote:

On 08/29/14 08:19, Hannes Reinecke wrote:

On 08/29/2014 04:42 AM, Mike Christie wrote:

How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?


Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tends to become very annoyed
when you do an automated rescan.


Has it already been considered to add newly discovered LUNs
automatically and to leave it to the user to remove stale LUNs manually
? That would be similar to what the rescan-scsi-bus.sh script does
without option -r/--remove.



As of now we're still missing an in-kernel infrastructure which
would allow us to react to any sense codes; currently we're relying
on the administrator to set up a udev rule here.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 00/22] scsi logging update

2014-08-29 Thread Hannes Reinecke

On 08/28/2014 09:24 PM, Douglas Gilbert wrote:

On 14-08-28 01:33 PM, Hannes Reinecke wrote:

Hi all,

here's my next round of scsi logging updates.
Main feature is the update to have all logging
statements in one line so that they won't be broken
up even under high load.
This will dramatically improve debugging.

Additionally all printk() statements are moved
to dev_printk() variants to ensure proper device
tagging and keep the systemd journal happy.


s/all/most/ ??


My, you are picky.


Surely there are situations where a dev cannot be
associated with a printk(). For example in transport
discovery before any devices are found (or after,
if none are found). LLDs often helpfully log their HBA's
firmware details prior to discovery (and may fail
before discovery).


Indeed there are some printks left, eg during
SCSI initialization where we don't have any device.
And I didn't modify the LLDDs, which have their
own logging.
(But most don't use dev_printk(), either).


And it is possible to write via sysfs to a driver
that has no devices attached. How does one log that?


Well, I haven't come across any logging messages here,
so the question has never arisen.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: Debugging scsi abort handling ?

2014-08-29 Thread Finn Thain

On Fri, 29 Aug 2014, Hannes Reinecke wrote:

 On 08/29/2014 06:39 AM, Finn Thain wrote:
 
  On Thu, 28 Aug 2014, Hannes Reinecke wrote:
 
   What might happen, though, is that the command is already dead and gone
   by the time you're calling ->scsi_done() (if you call it after
   eh_abort). So there might not _be_ a command upon which you can call
   ->scsi_done() to start with.
  
   Hence any LLDD needs to clear up any internal references after a call
   to eh_XXX to ensure it doesn't call ->scsi_done() on an invalid
   command.
  
   So even if the LLDD returns 'FAILED' upon a call to eh_XXX it 
   _still_ needs to clear up the internal reference.
 
  This is a question that has been bothering me too. If the host's 
  eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable 
  to re-issue the same command to the LLD (?)
 
 No.
 FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.

Makes sense, though it appears to contradict this advice about returning 
SUCCESS in some situations: 
http://marc.info/?l=linux-scsi&m=140923498632496&w=2

 The command will only ever be re-issued once EH completes.

...

 
 But indeed, 'FAILED' is not very meaningful here, leaving the midlayer 
 with no information about what happened to the command.
 
 Personally I would like to enforce this meaning on the eh_XXX callbacks:
 - upon each eh_XXX callback the LLDD clears any internal references
   to the command / command scope (ie eh_abort_cmd clears the
   references to the command, eh_lun_reset clears all internal
   references to commands to this ITL nexus etc.)
   This happens irrespective of the return code.
 - The eh_XXX callback shall return 'FAILED' if the respective
   TMF (or equivalent) could not be initiated.
 - The eh_XXX callback shall return 'SUCCESS' if the respective
   TMF (or equvalent) could be initiated.
 - After each eh_XXX callback control for this command / command
   scope is transferred back to the midlayer; the LLDD shall not
   assume the associated command structures to remain valid after
   that point.

Perhaps that last constraint should be relaxed to "After the final EH
callback (whether implemented or unimplemented by the host), command /
command scope is transferred back to the midlayer..."

A more severe TMF is probably mandatory (e.g. bus reset) but if the driver 
author later added a milder one (e.g. bus device reset), your rule would 
mean that the existing handler would then operate under new constraints, 
which might cause surprises.

 [...] I'm tempted to enshrine this in the documentation;

It is helpful, thanks.



Re: [PATCH 0/15] SCSI XCOPY support for the kernel and device mapper

2014-08-29 Thread Martin K. Petersen
>>>>> "Mike" == Mike Snitzer <snit...@redhat.com> writes:

Mike> It would be ideal for XCOPY support to make its way upstream for
Mike> 3.18.. but the window for staging this work in time is closing.

Mike> Any chance you might have some time to review Mikulas' revised
Mike> approach to your initial XCOPY support?

It is at the top of my list.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: Debugging scsi abort handling ?

2014-08-29 Thread Hannes Reinecke

On 08/29/2014 12:14 PM, Finn Thain wrote:


On Fri, 29 Aug 2014, Hannes Reinecke wrote:


On 08/29/2014 06:39 AM, Finn Thain wrote:


On Thu, 28 Aug 2014, Hannes Reinecke wrote:


What might happen, though, is that the command is already dead and gone
by the time you're calling ->scsi_done() (if you call it after
eh_abort). So there might not _be_ a command upon which you can call
->scsi_done() to start with.

Hence any LLDD needs to clear up any internal references after a call
to eh_XXX to ensure it doesn't call ->scsi_done() on an invalid
command.

So even if the LLDD returns 'FAILED' upon a call to eh_XXX it
_still_ needs to clear up the internal reference.


This is a question that has been bothering me too. If the host's
eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable
to re-issue the same command to the LLD (?)


No.
FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.


Makes sense, though it appears to contradict this advice about returning
SUCCESS in some situations:
http://marc.info/?l=linux-scsi&m=140923498632496&w=2

Well, if the LLDD detects an invalid command (ie if it cannot find 
any internal command matching the midlayer command) that's an 
automatic success, obviously.


So we should rephrase things to:

- The eh_XXX callback shall return 'SUCCESS' if the respective
  TMF (or equivalent) could be initiated or if the matching command
  reference has already been completed by the LLDD. Otherwise
  the eh_XXX callback shall return 'FAILED'.
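
Spelled out as a decision function, the rephrased rule might look like the following sketch. The enum and the two boolean inputs are hypothetical names chosen for this example; a real callback would return the kernel's SUCCESS or FAILED.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's SUCCESS and FAILED return codes. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/*
 * cmd_still_known: the LLDD still holds an internal command that
 *                  matches the midlayer command.
 * tmf_initiated:   the TMF (or equivalent) could be initiated.
 */
enum eh_result eh_return_code(bool cmd_still_known, bool tmf_initiated)
{
	if (!cmd_still_known)
		return EH_SUCCESS;	/* already completed: nothing to abort */
	return tmf_initiated ? EH_SUCCESS : EH_FAILED;
}
```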


The command will only ever be re-issued once EH completes.


...



But indeed, 'FAILED' is not very meaningful here, leaving the midlayer
with no information about what happened to the command.

Personally I would like to enforce this meaning on the eh_XXX callbacks:
- upon each eh_XXX callback the LLDD clears any internal references
   to the command / command scope (ie eh_abort_cmd clears the
   references to the command, eh_lun_reset clears all internal
   references to commands to this ITL nexus etc.)
   This happens irrespective of the return code.
- The eh_XXX callback shall return 'FAILED' if the respective
   TMF (or equivalent) could not be initiated.
- The eh_XXX callback shall return 'SUCCESS' if the respective
   TMF (or equvalent) could be initiated.
- After each eh_XXX callback control for this command / command
   scope is transferred back to the midlayer; the LLDD shall not
   assume the associated command structures to remain valid after
   that point.


Perhaps that last constraint should be relaxed to "After the final EH
callback (whether implemented or unimplemented by the host), command /
command scope is transferred back to the midlayer..."


No, that's wrong.

By the time any eh_XXX callbacks are triggered, control _is_ already
back at the midlayer. I.e. the command timeout triggered and the block
layer already set the REQ_ATOM_COMPLETE flag, short-circuiting any
attempts to call ->scsi_done().
So with the callbacks the midlayer actually informs the LLDD about a
certain fact; there is nothing the LLDD can do to change ownership
at that point.
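
The short-circuit can be pictured as a one-shot atomic flag that both the timeout path and the completion path try to claim. The sketch below uses C11 atomics in user space; struct request_sketch and all function names are assumptions for illustration and do not reproduce the block layer's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request with a one-shot "complete" bit. */
struct request_sketch {
	atomic_flag completed;	/* plays the role of the block layer's flag */
	int done_calls;		/* completions that actually went through */
};

void request_init(struct request_sketch *rq)
{
	atomic_flag_clear(&rq->completed);
	rq->done_calls = 0;
}

/* Returns true only for the first caller; every later caller loses. */
bool mark_completed(struct request_sketch *rq)
{
	return !atomic_flag_test_and_set(&rq->completed);
}

/* Timeout path: claim the request before error handling starts. */
bool timeout_fires(struct request_sketch *rq)
{
	return mark_completed(rq);
}

/* Completion path: a late completion is silently dropped. */
void complete_request(struct request_sketch *rq)
{
	if (!mark_completed(rq))
		return;
	rq->done_calls++;
}
```

Whichever path sets the flag first owns the request; the other one sees the flag already set and backs off.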


(Correction: During the call of any eh_XXX callbacks control _is_
back at the LLDD, otherwise the callbacks would be pointless. It's
just that the LLDD shouldn't assume the command is valid _after_
any of the eh_XXX callbacks has terminated.)


A more severe TMF is probably mandatory (e.g. bus reset) but if the driver
author later added a milder one (e.g. bus device reset), your rule would
mean that the existing handler would then operate under new constraints,
which might cause surprises.


Well, _if_ we were to adopt this rule we obviously have to audit
existing LLDDs to check whether the rule is followed, and tweak them if not.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: Debugging scsi abort handling ?

2014-08-29 Thread Hans de Goede
Hi,

On 08/29/2014 12:30 PM, Hannes Reinecke wrote:
 On 08/29/2014 12:14 PM, Finn Thain wrote:

 On Fri, 29 Aug 2014, Hannes Reinecke wrote:

 On 08/29/2014 06:39 AM, Finn Thain wrote:

 On Thu, 28 Aug 2014, Hannes Reinecke wrote:

 What might happen, though, is that the command is already dead and gone
 by the time you're calling ->scsi_done() (if you call it after
 eh_abort). So there might not _be_ a command upon which you can call
 ->scsi_done() to start with.

 Hence any LLDD needs to clear up any internal references after a call
 to eh_XXX to ensure it doesn't call ->scsi_done() on an invalid
 command.

 So even if the LLDD returns 'FAILED' upon a call to eh_XXX it
 _still_ needs to clear up the internal reference.

 This is a question that has been bothering me too. If the host's
 eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable
 to re-issue the same command to the LLD (?)

 No.
 FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.

 Makes sense, though it appears to contradict this advice about returning
 SUCCESS in some situations:
 http://marc.info/?l=linux-scsi&m=140923498632496&w=2

 Well, if the LLDD detects an invalid command (ie if it cannot find any 
 internal command matching the midlayer command) that's an automatic success, 
 obviously.
 
 So we should rephrase things to:
 
 - The eh_XXX callback shall return 'SUCCESS' if the respective
   TMF (or equvalent) could be initiated or if the matching command
   reference has already been completed by the LLDD. Otherwise
   the eh_XXX callback shall return 'FAILED'.

You're talking about "could be initiated", so that means that at this
point the abort does not yet have to be completed, do I get that
right? What should the LLDD then do when the abort finishes,
call eh_scsi_done on the cmnd ?

What about the abort never finishing (timeout), does the mid layer
track this, or should the LLDD do that?

Regards,

Hans


Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that may have been removed.

2014-08-29 Thread James Bottomley
On Fri, 2014-08-29 at 10:13 +0200, Hannes Reinecke wrote:
 On 08/29/2014 09:39 AM, Bart Van Assche wrote:
  On 08/29/14 08:19, Hannes Reinecke wrote:
  On 08/29/2014 04:42 AM, Mike Christie wrote:
  How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
  gets passed to userspace? Is everyone kicking off a new full (add and
  delete) scan to handle this or logging it? Is the driver returning this
  when the LUNs change?
 
  Currently it's logged to userspace and ignored.
  Doing an automated rescan has proven to be dangerous, as it
  might disconnect any LUNs which are still in use by applications.
  Especially HA or database setups tends to become very annoyed
  when you do an automated rescan.
 
  Has it already been considered to add newly discovered LUNs
  automatically and to leave it to the user to remove stale LUNs manually
  ? That would be similar to what the rescan-scsi-bus.sh script does
  without option -r/--remove.
 
 
 As of now we're still missing an in-kernel infrastructure which 
 would allow us to react on any sense codes; currently we're relying 
 on the administrator to setup a udev rule here.

Um, I thought this was supposed to solve that problem:

commit 279afdfe78a020b4b1a68bffd0009b961b12982e
Author: Ewan D. Milne <emi...@redhat.com>
Date:   Thu Aug 8 15:07:48 2013 -0400

    [SCSI] Generate uevents on certain unit attention codes

The idea was supposed to be that, as you say, log scrubbers are hard to
configure and break every time someone fixes a spelling error, so we
could now listen for a report luns data change uevent instead.

James



Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that may have been removed.

2014-08-29 Thread Ewan Milne
On Thu, 2014-08-28 at 21:42 -0500, Mike Christie wrote:
 On 08/27/2014 09:31 AM, Hannes Reinecke wrote:
  On 08/19/2014 07:54 PM, Christoph Hellwig wrote:
  On Sat, Aug 16, 2014 at 08:09:48PM -0700, K. Y. Srinivasan wrote:
  The host asks the guest to scan when a LUN is removed or added.
  The only way a guest can identify the removed LUN is when an I/O is
  attempted on a removed LUN - the SRB status code indicates that the LUN
  is invalid. We currently handle this SRB status and remove the device.
 
  Rather than waiting for an I/O to remove the device, force the
  discovery of
  LUNs that may have been removed prior to discovering LUNs that may have
  been added.
 
  This looks pretty reasonable to me, but I wonder if we should move this
  up to common code so that it happens for any host rescan triggered by
  sysfs or other drivers as well.
 
  Not without proper testing.
  Currently we cannot rescan existing devices; the inquiry string is
  nailed to the sdev structure. The only way to really refresh the
  information is to delete it and rescan it again.
 
 How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
 gets passed to userspace? Is everyone kicking off a new full (add and
 delete) scan to handle this or logging it? Is the driver returning this
 when the LUNs change?

Currently the udev rules we have to handle these events are installed
with a separate package, and only the REPORTED LUNS DATA HAS CHANGED
rule does anything; the others are commented out.  It turns out that e.g.
multipath stops using a path if it notices that the capacity has changed,
so we need to do some more work there; it is under discussion.

We do not delete LUNs that disappear from the REPORT LUNS inventory,
although someone could write their own udev rule to do that if desired.

Beware the case where a LUN is remapped to a different LUN number, or
where a LUN's WWID is reused for a device with different data (e.g. a LUN
is deleted and re-added and the WWID is the same, although I don't know
if this actually happens).

Consider that the UA just provides notification to userspace of a
change -- lack of notification does not prevent someone from deciding
to rescan for new LUNs via sysfs any time they feel like it.  So
you can't just change the storage configuration and hope that no-one
notices until you are done making changes.

-Ewan

 
 Also is the driver getting a 0x5/0x25/0 (invalid LUN) when the LUN does
 not exist, or is it just getting that SRB_STATUS_INVALID_LUN error code?


Re: [PATCH 2/5] kexec: Export kexec_in_progress

2014-08-29 Thread Brian King
On 08/04/2014 09:21 AM, Brian King wrote:
 On 07/28/2014 03:28 PM, Brian King wrote:

 Export kexec_in_progress for use by device drivers and other modules
 to optimize kexec boot.

 Signed-off-by: Brian King <brk...@linux.vnet.ibm.com>
 ---

  kernel/kexec.c |    2 ++
  1 file changed, 2 insertions(+)

 diff -puN kernel/kexec.c~kexec_export_in_prog kernel/kexec.c
 --- linux/kernel/kexec.c~kexec_export_in_prog	2014-07-23 17:05:24.851887935 -0500
 +++ linux-bjking1/kernel/kexec.c	2014-07-23 17:05:24.856887970 -0500
 @@ -1716,3 +1716,5 @@ int kernel_kexec(void)
  	mutex_unlock(&kexec_mutex);
  	return error;
  }
 +
 +EXPORT_SYMBOL_GPL(kexec_in_progress);
 
 Eric,
 
 Can I get an ack on this so we can take this entire series through the SCSI 
 tree?

Eric,

Any issues with this patch?

Thanks,

Brian




Re: Buffer I/O error after s2ram with usb storage

2014-08-29 Thread Matthieu CASTET
On Wed, 27 Aug 2014 10:54:53 -0400,
Alan Stern <st...@rowland.harvard.edu> wrote:

 On Wed, 27 Aug 2014, Matthieu CASTET wrote:
 
  Ping
  
  I have got also a problem with a usb sdcard reader (without power cut
  during suspend)
 
The usb storage driver calls scsi_report_bus_reset after device reset,
but because of commit dfcf7775 4, we don't ignore unit attention if
sshdr.asc == 0x28 && sshdr.ascq == 0x00 (Not-ready to ready).

If dfcf7775 is reverted there is no more Buffer I/O error.

Is it possible to find a way to restore the behavior before the dfcf7775
commit (no Buffer I/O error after device reset) after a suspend to ram ?
 
 Since that commit was written to fix a problem with certain cdrom
 drives, maybe we could work around the issue by modifying the commit.
 Have it go back to the original behavior if the device isn't a cdrom 
 drive.
 
 That's not a complete fix (it won't help when a CD drive is attached 
 via USB), but maybe it's better than nothing.
 
Ok,

Note that to handle all cases we also need to filter unit_attention in
scsi_test_unit_ready. Otherwise the DISK_EVENT_MEDIA_CHANGE event is set and
check_disk_change will invalidate the vfs cache.


Matthieu

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2bc0362..e994061 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2030,8 +2030,12 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
 		result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL, 0, sshdr,
 					  timeout, retries, NULL);
 		if (sdev->removable && scsi_sense_valid(sshdr) &&
-		    sshdr->sense_key == UNIT_ATTENTION)
-			sdev->changed = 1;
+		    sshdr->sense_key == UNIT_ATTENTION) {
+			if (sdev->expecting_cc_ua)
+				sdev->expecting_cc_ua = 0;
+			else
+				sdev->changed = 1;
+		}
 	} while (scsi_sense_valid(sshdr) &&
 		 sshdr->sense_key == UNIT_ATTENTION && --retries);
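
The intent of the hunk above can be shown in isolation with a small user-space sketch. struct sdev_sketch is a hypothetical cut-down stand-in for struct scsi_device, and note_unit_attention() is an invented helper, not a kernel function: the first UNIT ATTENTION after a reported reset is swallowed, and only later ones count as a media change.

```c
#include <stdbool.h>

/* Hypothetical subset of struct scsi_device. */
struct sdev_sketch {
	bool removable;
	bool expecting_cc_ua;	/* set when a bus/device reset was reported */
	bool changed;		/* media-change indication */
};

/* Called for each UNIT ATTENTION seen by TEST UNIT READY. */
void note_unit_attention(struct sdev_sketch *sdev)
{
	if (!sdev->removable)
		return;
	if (sdev->expecting_cc_ua)
		sdev->expecting_cc_ua = false;	/* swallow the expected one */
	else
		sdev->changed = true;		/* treat as media change */
}
```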
 


Re: [PATCH] bnx2fc: fix incorrect DMA memory mapping in bnx2fc_map_sg()

2014-08-29 Thread Christoph Hellwig
Chad,

can you send out your last version with a proper changelog, signoff,
and the ack from Eddie included?  Also can you prioritize getting
the shared skb patch tested?

Thanks,
Christoph


Re: Problem with USB-to-SATA adapters (was: AS2105-based enclosure size issues with 2TB HDDs)

2014-08-29 Thread Dale R. Worley
 From: Alan Stern st...@rowland.harvard.edu

 If you try to repartition the drive under Windows using the deficient 
 adapter, you'll see that the problem still exists.  It just doesn't 
 show up during normal use.

So in summary, the Windows workaround is icky, but it allows any use
but repartitioning to be done on the attached disk.

Dale


Re: Problem with USB-to-SATA adapters (was: AS2105-based enclosure size issues with 2TB HDDs)

2014-08-29 Thread Matthew Dharm
Is there an 'easy' way to override the detected size of a storage
device from userspace?  If we had that, someone could write a helper
application which looked for this particular fubar and tried to Do The
Right Thing(tm), or at least offer the user some options.

Matt

On Fri, Aug 29, 2014 at 2:07 PM, Dale R. Worley wor...@alum.mit.edu wrote:
 From: Alan Stern st...@rowland.harvard.edu

 If you try to repartition the drive under Windows using the deficient
 adapter, you'll see that the problem still exists.  It just doesn't
 show up during normal use.

 So in summary, the Windows workaround is icky, but it allows any use
 but repartitioning to be one on the attached disk.

 Dale



-- 
Matthew Dharm
Maintainer, USB Mass Storage driver for Linux