date:20140331

Re: [PATCH for-3.15] scsi/libiscsi: Fix static checker warning on bh locking

2014-03-31 Thread Mike Christie

On 03/30/2014 07:26 AM, Or Gerlitz wrote:
> From: Shlomo Pongratz 
> 
> Commit 659743b "[SCSI] libiscsi: Reduce locking contention in fast path" 
> introduced a
> new smatch warning on libiscsi.c "iscsi_xmit_task() warn: inconsistent returns
> bottom_half:: locked (1410 [(-61)]) unlocked (1425 [0], 1425 
> [s32min-(-1),1-s32max])",
> which we can eliminate by using non bh locking on the nested spin_lock call.
> 
> Reported-by: Dan Carpenter 
> Signed-off-by: Shlomo Pongratz 
> Signed-off-by: Or Gerlitz 
> ---
> 
>  drivers/scsi/libiscsi.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 5b8605c..5087957 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1411,9 +1411,9 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
>   conn->task = NULL;
>   }
>   /* regular RX path uses back_lock */
> - spin_lock_bh(&conn->session->back_lock);
> + spin_lock(&conn->session->back_lock);
>   __iscsi_put_task(task);
> - spin_unlock_bh(&conn->session->back_lock);
> + spin_unlock(&conn->session->back_lock);
>   return rc;

Thanks for cleaning this up Or and Shlomo.

Reviewed-by: Mike Christie 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] [SCSI] Fix spurious request sense in error handling

2014-03-31 Thread Hannes Reinecke

On 03/28/2014 06:50 PM, James Bottomley wrote:
> We unconditionally execute scsi_eh_get_sense() to make sure all failed
> commands that should have sense attached, do.  However, the routine forgets
> that some commands, because of the way they fail, will not have any sense code
> ... we should not bother them with a REQUEST_SENSE command.  Fix this by
> testing to see if we actually got a CHECK_CONDITION return and skip asking for
> sense if we don't.
> 
> Tested-by: Alan Stern 
> Signed-off-by: James Bottomley 
Acked-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [SCSI] Fix command result state propagation

2014-03-31 Thread Hannes Reinecke

On 03/28/2014 06:51 PM, James Bottomley wrote:
> From: Alan Stern 
> 
> We're seeing a case where the contents of scmd->result isn't being reset after
> a SCSI command encounters an error, is resubmitted, times out and then gets
> handled.  The error handler acts on the stale result of the previous error
> instead of the timeout.  Fix this by properly zeroing the scmd->status before
> the command is resubmitted.
> 
> Signed-off-by: Alan Stern 
> Signed-off-by: James Bottomley 



Acked-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Hannes Reinecke

On 03/28/2014 08:29 PM, Alan Stern wrote:
> On Fri, 28 Mar 2014, James Bottomley wrote:
> 
>> This is a set of three patches we agreed to a while ago to eliminate a
>> USB deadlock.  I did rewrite the first patch, if it could be reviewed
>> and tested.
> 
> I tested all three patches under the same conditions as before, and 
> they all worked correctly.
> 
> In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't
> entirely clear.  This boils down to two questions, which I don't 
> know the answers to:
> 
>   What should happen in scmd_eh_abort_handler() if
>   scsi_host_eh_past_deadline() returns true and thereby
>   prevents scsi_try_to_abort_cmd() from being called?
>   The flag wouldn't get cleared then.
> 
Ah. Correct. But that's due to the first patch being incorrect.
Cf my response to the original first patch.

>   What should happen if some other pathway manages to call
>   scsi_try_to_abort_cmd() while scmd->abort_work is still
>   sitting on the work queue?  The command would be aborted
>   and the flag would be cleared, but the queued work would
>   remain.  Can this ever happen?
> 
Not that I could see.
A command abort is only ever triggered by the request timeout from
the block layer. And the timer is _not_ rearmed once the timeout
function (here: scsi_times_out()) is called.
Hence I fail to see how it can be called concurrently.

> Maybe scmd_eh_abort_handler() should check the flag before doing
> anything.  Is there any sort of sychronization to prevent the same
> incarnation of a command from being aborted twice (or by two different
> threads at the same time)?  If there is, it isn't obvious.
> 
See above. scsi_times_out() will only ever called once.
What can happen, though, is that _theoretically_ the LLDD might
decide to call ->done() on a timed out command when
scsi_eh_abort_handler() is still pending.


> (Also, what's going on at the start of scsi_abort_command()?  Contrary
> to what one might expect, the first part of the function _cancels_ a
> scheduled abort.  And it does so without clearing the
> SCSI_EH_ABORT_SCHEDULED flag.)
> 
The original idea was this:

SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
Point is, any command abort is only ever send for a timed-out
command. And the main problem for a timed-out command is that we
simply _do not_ know what happened for that command. So _if_ a
command timed out, _and_ we've send an abort, _and_ the command
times out _again_ we'll be running into an endless loop between
timeout and aborting, and never returning the command at all.

So to prevent this we should set a marker on that command telling it
to _not_ try to abort the command again.
Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for:

- A command times out, sets 'SCSI_EH_ABORT_SCHEDULED'
- abort _succeeds_
- The same command times out a second time, notifies
  that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call
  scsi_eh_abort_command() but rather escalates directly.

_However_, I do feel that we've gotten the wrong track with all of
this. In you original mail you stated:

> Basically, usb-storage deadlocks when the SCSI error handler
> invokes the eh_device_reset_handler callback while a command
> is running.  The command has timed out and will never complete
> normally, because the device's firmware has crashed.  But
> usb-storage's device-reset routine waits for the current command
> to finish, which brings everything to a standstill.

Question now to you as the USB god:

A command abort is only _ever_ send after a command timeout.
And the main idea of the command abort is to remove any context
the LLDD or the target might have. So by the time the command
abort returns SUCCESS we _expect_ the firmware to have cleared that
command. If for some reason the firmware isn't capable of doing so,
it should return FAILED.

So:
- Has the command timeout fired?
- If so, why hasn't it returned FAILED?
- If it had returned SUCCESS, why did the device_reset_handler
  wait for the command to complete?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] m68k/atari - atari_scsi: use correct virt/phys translation for DMA buffer

2014-03-31 Thread Geert Uytterhoeven

Hi James,

If you don't object, I'd like to take this one through the m68k tree, as it
depends on a new m68k platform function.

On Sun, Mar 30, 2014 at 1:01 AM, Michael Schmitz  wrote:
> With new ST-RAM allocation to work also when the kernel is running
> from FastRAM, use dedicated virt/phys translation for this case.
>
> Signed-off-by: Michael Schmitz 
> ---
>  drivers/scsi/atari_scsi.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
> index 296c936..a8d721f 100644
> --- a/drivers/scsi/atari_scsi.c
> +++ b/drivers/scsi/atari_scsi.c
> @@ -639,7 +639,7 @@ static int __init atari_scsi_detect(struct 
> scsi_host_template *host)
> "double buffer\n");
> return 0;
> }
> -   atari_dma_phys_buffer = virt_to_phys(atari_dma_buffer);
> +   atari_dma_phys_buffer = atari_stram_to_phys(atari_dma_buffer);
> atari_dma_orig_addr = 0;
> }
>  #endif
> --
> 1.7.0.4

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 4/4] m68k/atari - atari_scsi: use correct virt/phys translation for DMA buffer

2014-03-31 Thread Michael Schmitz

With the kernel running from FastRAM instead of ST-RAM, none of ST-RAM is
mapped by mem_init, and DMA-addressable buffer must be mapped by ioremap.

Use platform specific virt/phys translation helpers for this case.

Signed-off-by: Michael Schmitz 
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/atari_scsi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index 296c936..a8d721f 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -639,7 +639,7 @@ static int __init atari_scsi_detect(struct 
scsi_host_template *host)
"double buffer\n");
return 0;
}
-   atari_dma_phys_buffer = virt_to_phys(atari_dma_buffer);
+   atari_dma_phys_buffer = atari_stram_to_phys(atari_dma_buffer);
atari_dma_orig_addr = 0;
}
 #endif
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command

2014-03-31 Thread Christoph Hellwig

On Mon, Mar 31, 2014 at 01:56:13AM -0500, Mike Christie wrote:
> The above call would free the cmnd->cmnd and set it to null. If then
> scsi_io_completion was going to do some error processing it looks like
> it could try to access the scsi_cmnd->cmnd field.
> 
> With the current code that would not be a problem because the blk unprep
> callback is not called until the block layer does its request cleanup in
> blk_finish_request which as you know is after
> scsi_io_completion/scsi_end_request is done with the cmnd.

Right, we should probabl call ->uninit_command at that place as well
instead of calling it internall in ->done.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 19/55] scsi: Mark function as static in isci/phy.c

2014-03-31 Thread Dorau, Lukasz

On Saturday, March 29, 2014 7:04 PM Rashika Kheria  
wrote:
> 
> Mark function as static in isci/phy.c because it is not used outside
> this file.
> 
> This eliminates the following warning in isci/phy.c:
> drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for
> ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes]
> 
> Signed-off-by: Rashika Kheria 
> Reviewed-by: Josh Triplett 

Acked-by: Lukasz Dorau 

> ---
>  drivers/scsi/isci/phy.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/isci/phy.c b/drivers/scsi/isci/phy.c
> index cb87b2e..8e21022 100644
> --- a/drivers/scsi/isci/phy.c
> +++ b/drivers/scsi/isci/phy.c
> @@ -669,7 +669,8 @@ static const char *phy_event_name(u32 event_code)
>   phy_state_name(state), phy_event_name(code), code)
> 
> 
> -void scu_link_layer_set_txcomsas_timeout(struct isci_phy *iphy, u32 timeout)
> +static void scu_link_layer_set_txcomsas_timeout(struct isci_phy *iphy,
> + u32 timeout)
>  {
>   u32 val;
> 
> --
> 1.7.9.5

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

RE: [PATCH 21/55] scsi: Mark function as static in isci/port.c

2014-03-31 Thread Dorau, Lukasz

On Saturday, March 29, 2014 7:07 PM Rashika Kheria  
wrote:
> Mark function as static in isci/port.c because they are not used outside
> this file.
> 
> This eliminates the following warning in isci/port.c:
> drivers/scsi/isci/port.c:65:13: warning: no previous prototype for
> ‘port_state_name’ [-Wmissing-prototypes]
> 
> Signed-off-by: Rashika Kheria 
> Reviewed-by: Josh Triplett 

Acked-by: Lukasz Dorau 

> ---
>  drivers/scsi/isci/port.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/isci/port.c b/drivers/scsi/isci/port.c
> index 13098b0..225d39f 100644
> --- a/drivers/scsi/isci/port.c
> +++ b/drivers/scsi/isci/port.c
> @@ -62,7 +62,7 @@
> 
>  #undef C
>  #define C(a) (#a)
> -const char *port_state_name(enum sci_port_states state)
> +static const char *port_state_name(enum sci_port_states state)
>  {
>   static const char * const strings[] = PORT_STATES;
> 
> --
> 1.7.9.5

RE: [PATCH 20/55] scsi: Mark function as static in isci/remote_device.c

2014-03-31 Thread Dorau, Lukasz

On Saturday, March 29, 2014 7:05 PM Rashika Kheria  
wrote:
> Mark function as static in isci/remote_device.c because it is not used
> outside this file.
> 
> This eliminates the following warning in isci/remote_device.c:
> drivers/scsi/isci/remote_device.c:1387:6: warning: no previous prototype for
> ‘isci_remote_device_wait_for_resume_from_abort’ [-Wmissing-prototypes]
> 
> Signed-off-by: Rashika Kheria 
> Reviewed-by: Josh Triplett 

Acked-by: Lukasz Dorau 

> ---
>  drivers/scsi/isci/remote_device.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/isci/remote_device.c 
> b/drivers/scsi/isci/remote_device.c
> index 96a26f4..33033fb 100644
> --- a/drivers/scsi/isci/remote_device.c
> +++ b/drivers/scsi/isci/remote_device.c
> @@ -1384,7 +1384,7 @@ static bool isci_remote_device_test_resume_done(
>   return done;
>  }
> 
> -void isci_remote_device_wait_for_resume_from_abort(
> +static void isci_remote_device_wait_for_resume_from_abort(
>   struct isci_host *ihost,
>   struct isci_remote_device *idev)
>  {
> --
> 1.7.9.5

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

Re: [PATCH 19/55] scsi: Mark function as static in isci/phy.c

2014-03-31 Thread Josh Triplett

On Mon, Mar 31, 2014 at 08:54:49AM +, Dorau, Lukasz wrote:
> On Saturday, March 29, 2014 7:04 PM Rashika Kheria  
> wrote:
> > 
> > Mark function as static in isci/phy.c because it is not used outside
> > this file.
> > 
> > This eliminates the following warning in isci/phy.c:
> > drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for
> > ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes]
> > 
> > Signed-off-by: Rashika Kheria 
> > Reviewed-by: Josh Triplett 
> 
> Acked-by: Lukasz Dorau 

Since you're a maintainer of the driver in question, can you accept the
three patches you acked, or would you suggest that they go through
another tree?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 19/55] scsi: Mark function as static in isci/phy.c

2014-03-31 Thread Dorau, Lukasz

On Monday, March 31, 2014 11:36 AM Josh Triplett  wrote:
> On Mon, Mar 31, 2014 at 08:54:49AM +, Dorau, Lukasz wrote:
> > On Saturday, March 29, 2014 7:04 PM Rashika Kheria
>  wrote:
> > >
> > > Mark function as static in isci/phy.c because it is not used outside
> > > this file.
> > >
> > > This eliminates the following warning in isci/phy.c:
> > > drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for
> > > ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes]
> > >
> > > Signed-off-by: Rashika Kheria 
> > > Reviewed-by: Josh Triplett 
> >
> > Acked-by: Lukasz Dorau 
> 
> Since you're a maintainer of the driver in question, can you accept the
> three patches you acked, or would you suggest that they go through
> another tree?
> 
> - Josh Triplett

Hi Josh,

The isci patches should be accepted by James Bottomley.

Lukasz

[Bug 71231] System unresponsable after a lot of LUNs have been added to the system

2014-03-31 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=71231

--- Comment #10 from Alex  ---
The new kernel 3.14 works fine too.
So 3.10 and 3.12 have this problem (and my problem is, that I must use the 3.10
line)

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command

2014-03-31 Thread Christoph Hellwig

> The above call would free the cmnd->cmnd and set it to null. If then
> scsi_io_completion was going to do some error processing it looks like
> it could try to access the scsi_cmnd->cmnd field.
> 
> With the current code that would not be a problem because the blk unprep
> callback is not called until the block layer does its request cleanup in
> blk_finish_request which as you know is after
> scsi_io_completion/scsi_end_request is done with the cmnd.

This incremental patches fixes the issue, and makes sure the uninit calls are
nicely paired like the rest of the I/O completion routines after patch 2:


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 48c5c77..8e79612 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -490,8 +490,6 @@ static void scsi_requeue_command(struct request_queue *q, 
struct scsi_cmnd *cmd)
struct request *req = cmd->request;
unsigned long flags;
 
-   scsi_uninit_command(cmd);
-
spin_lock_irqsave(q->queue_lock, flags);
blk_unprep_request(req);
req->special = NULL;
@@ -941,6 +939,7 @@ requeue:
/* Unprep the request and put it back at the head of the queue.
 * A new command will be prepared and issued.
 */
+   scsi_uninit_command(cmd);
scsi_release_buffers(cmd);
scsi_requeue_command(q, cmd);
break;
@@ -956,6 +955,7 @@ requeue:
return;
 
 next_command:
+   scsi_uninit_command(cmd);
scsi_release_buffers(cmd);
scsi_next_command(cmd);
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d95c4fd..d99cb3f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1652,8 +1652,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
unsigned char op = SCpnt->cmnd[0];
unsigned char unmap = SCpnt->cmnd[1] & 8;
 
-   sd_uninit_command(SCpnt);
-
if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME) {
if (!result) {
good_bytes = blk_rq_bytes(req);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] [SCSI] Fix abort state memory problem

2014-03-31 Thread Alan Stern

On Mon, 31 Mar 2014, Hannes Reinecke wrote:

> On 03/28/2014 06:49 PM, James Bottomley wrote:
> > The abort state is being stored persistently across commands, meaning if the
> > command times out a second time, the error handler thinks an abort is still
> > pending.  Fix this by properly resetting the abort flag after the abort is
> > finished.
> > 
> > Signed-off-by: James Bottomley 
> > ---
> >  drivers/scsi/scsi_error.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 771c16b..b9f3b07 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -869,10 +869,13 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd 
> > *scmd)
> >  
> >  static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct 
> > scsi_cmnd *scmd)
> >  {
> > -   if (!hostt->eh_abort_handler)
> > -   return FAILED;
> > +   int retval = FAILED;
> > +
> > +   if (hostt->eh_abort_handler)
> > +   retval = hostt->eh_abort_handler(scmd);
> >  
> > -   return hostt->eh_abort_handler(scmd);
> > +   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
> > +   return retval;
> >  }
> >  
> >  static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> > 
> Hmm. No, I don't think this is correct.
> 
> SCSI_EH_ABORT_SCHEDULED signifies whether scmd_eh_abort_handler()
> needs to run. As such it needs to be reset when the workqueue item
> is triggered.
> 
> Can you try with this instead?
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..9694803 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -120,6 +120,8 @@ scmd_eh_abort_handler(struct work_struct *work)
> struct scsi_device *sdev = scmd->device;
> int rtn;
> 
> +   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
> +
> if (scsi_host_eh_past_deadline(sdev->host)) {
> SCSI_LOG_ERROR_RECOVERY(3,
> scmd_printk(KERN_INFO, scmd,

This doesn't seem like a good idea either, because the flag is tested 
later on in scsi_decide_disposition().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Alan Stern

On Mon, 31 Mar 2014, Hannes Reinecke wrote:

> On 03/28/2014 08:29 PM, Alan Stern wrote:
> > On Fri, 28 Mar 2014, James Bottomley wrote:
> > 
> >> This is a set of three patches we agreed to a while ago to eliminate a
> >> USB deadlock.  I did rewrite the first patch, if it could be reviewed
> >> and tested.
> > 
> > I tested all three patches under the same conditions as before, and 
> > they all worked correctly.
> > 
> > In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't
> > entirely clear.  This boils down to two questions, which I don't 
> > know the answers to:
> > 
> > What should happen in scmd_eh_abort_handler() if
> > scsi_host_eh_past_deadline() returns true and thereby
> > prevents scsi_try_to_abort_cmd() from being called?
> > The flag wouldn't get cleared then.
> > 
> Ah. Correct. But that's due to the first patch being incorrect.
> Cf my response to the original first patch.

See my response to your response.  :-)

> > What should happen if some other pathway manages to call
> > scsi_try_to_abort_cmd() while scmd->abort_work is still
> > sitting on the work queue?  The command would be aborted
> > and the flag would be cleared, but the queued work would
> > remain.  Can this ever happen?
> > 
> Not that I could see.
> A command abort is only ever triggered by the request timeout from
> the block layer. And the timer is _not_ rearmed once the timeout
> function (here: scsi_times_out()) is called.
> Hence I fail to see how it can be called concurrently.

scsi_try_to_abort_cmd() is also called (via a different pathway) when a 
command sent by the error handler itself times out.  I haven't traced 
through all the different paths to make sure none of them can run 
concurrently.  But I'm willing to take your word for it.

> > Maybe scmd_eh_abort_handler() should check the flag before doing
> > anything.  Is there any sort of sychronization to prevent the same
> > incarnation of a command from being aborted twice (or by two different
> > threads at the same time)?  If there is, it isn't obvious.
> > 
> See above. scsi_times_out() will only ever called once.
> What can happen, though, is that _theoretically_ the LLDD might
> decide to call ->done() on a timed out command when
> scsi_eh_abort_handler() is still pending.

That's okay.  We can expect the LLDD to have sufficient locking to
handle that sort of thing without confusion (usb-storage does, for
example).

> > (Also, what's going on at the start of scsi_abort_command()?  Contrary
> > to what one might expect, the first part of the function _cancels_ a
> > scheduled abort.  And it does so without clearing the
> > SCSI_EH_ABORT_SCHEDULED flag.)
> > 
> The original idea was this:
> 
> SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
> Point is, any command abort is only ever send for a timed-out
> command. And the main problem for a timed-out command is that we
> simply _do not_ know what happened for that command. So _if_ a
> command timed out, _and_ we've send an abort, _and_ the command
> times out _again_ we'll be running into an endless loop between
> timeout and aborting, and never returning the command at all.
> 
> So to prevent this we should set a marker on that command telling it
> to _not_ try to abort the command again.

I disagree.  We _have_ to abort the command again -- how else can we
stop a running command?  To prevent the loop you described, we should
avoid _retrying_ the command after it is aborted the second time.

> Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for:
> 
> - A command times out, sets 'SCSI_EH_ABORT_SCHEDULED'
> - abort _succeeds_
> - The same command times out a second time, notifies
>   that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call
>   scsi_eh_abort_command() but rather escalates directly.

The proper time to escalate is after the command is aborted again, not
while the command is still running.  The only situation where you might
want to escalate while a command is still running would be if you were
unable to abort the command.

(Hmmm, maybe that's not true for SCSI devices in general.  It is true 
for USB mass-storage, however.  Perhaps the reset handlers in 
usb-storage should be changed so that they will automatically abort a 
running command before starting the reset.)

> _However_, I do feel that we've gotten the wrong track with all of
> this. In you original mail you stated:
> 
> > Basically, usb-storage deadlocks when the SCSI error handler
> > invokes the eh_device_reset_handler callback while a command
> > is running.  The command has timed out and will never complete
> > normally, because the device's firmware has crashed.  But
> > usb-storage's device-reset routine waits for the current command
> > to finish, which brings everything to a standstill.
> 
> Question now to you as the USB god:
> 
> A command abort is only _ever_ send after a command timeout.
> And the main idea of the command abort is to remove any context
> the LLDD or

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Hannes Reinecke

On 03/31/2014 03:33 PM, Alan Stern wrote:
> On Mon, 31 Mar 2014, Hannes Reinecke wrote:
> 
>> On 03/28/2014 08:29 PM, Alan Stern wrote:
>>> On Fri, 28 Mar 2014, James Bottomley wrote:
>>>
 This is a set of three patches we agreed to a while ago to eliminate a
 USB deadlock.  I did rewrite the first patch, if it could be reviewed
 and tested.
>>>
>>> I tested all three patches under the same conditions as before, and 
>>> they all worked correctly.
>>>
>>> In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't
>>> entirely clear.  This boils down to two questions, which I don't 
>>> know the answers to:
>>>
>>> What should happen in scmd_eh_abort_handler() if
>>> scsi_host_eh_past_deadline() returns true and thereby
>>> prevents scsi_try_to_abort_cmd() from being called?
>>> The flag wouldn't get cleared then.
>>>
>> Ah. Correct. But that's due to the first patch being incorrect.
>> Cf my response to the original first patch.
> 
> See my response to your response.  :-)
> 
Okay, So I probably should refrain from issueing a response to
your response to my response lest infinite recursion happens :-)

>>> What should happen if some other pathway manages to call
>>> scsi_try_to_abort_cmd() while scmd->abort_work is still
>>> sitting on the work queue?  The command would be aborted
>>> and the flag would be cleared, but the queued work would
>>> remain.  Can this ever happen?
>>>
>> Not that I could see.
>> A command abort is only ever triggered by the request timeout from
>> the block layer. And the timer is _not_ rearmed once the timeout
>> function (here: scsi_times_out()) is called.
>> Hence I fail to see how it can be called concurrently.
> 
> scsi_try_to_abort_cmd() is also called (via a different pathway) when a 
> command sent by the error handler itself times out.  I haven't traced 
> through all the different paths to make sure none of them can run 
> concurrently.  But I'm willing to take your word for it.
> 
Yes, but that's not calling scsi_abort_command(), but rather invokes
scsi_abort_eh_cmnd().

>>> Maybe scmd_eh_abort_handler() should check the flag before doing
>>> anything.  Is there any sort of sychronization to prevent the same
>>> incarnation of a command from being aborted twice (or by two different
>>> threads at the same time)?  If there is, it isn't obvious.
>>>
>> See above. scsi_times_out() will only ever called once.
>> What can happen, though, is that _theoretically_ the LLDD might
>> decide to call ->done() on a timed out command when
>> scsi_eh_abort_handler() is still pending.
> 
> That's okay.  We can expect the LLDD to have sufficient locking to
> handle that sort of thing without confusion (usb-storage does, for
> example).
> 
>>> (Also, what's going on at the start of scsi_abort_command()?  Contrary
>>> to what one might expect, the first part of the function _cancels_ a
>>> scheduled abort.  And it does so without clearing the
>>> SCSI_EH_ABORT_SCHEDULED flag.)
>>>
>> The original idea was this:
>>
>> SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
>> Point is, any command abort is only ever send for a timed-out
>> command. And the main problem for a timed-out command is that we
>> simply _do not_ know what happened for that command. So _if_ a
>> command timed out, _and_ we've send an abort, _and_ the command
>> times out _again_ we'll be running into an endless loop between
>> timeout and aborting, and never returning the command at all.
>>
>> So to prevent this we should set a marker on that command telling it
>> to _not_ try to abort the command again.
> 
> I disagree.  We _have_ to abort the command again -- how else can we
> stop a running command?  To prevent the loop you described, we should
> avoid _retrying_ the command after it is aborted the second time.
> 
The actual question is whether it's worth aborting the same command
a second time.
In principle any reset (like LUN reset etc) should clear the
command, too.
And the EH abort functionality is geared around this.
If, for some reason, the transport layer / device driver
requires a command abort to be send then sure, we need
to accommodate for that.

>> Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for:
>>
>> - A command times out, sets 'SCSI_EH_ABORT_SCHEDULED'
>> - abort _succeeds_
>> - The same command times out a second time, notifies
>>   that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call
>>   scsi_eh_abort_command() but rather escalates directly.
> 
> The proper time to escalate is after the command is aborted again, not
> while the command is still running.  The only situation where you might
> want to escalate while a command is still running would be if you were
> unable to abort the command.
> 
> (Hmmm, maybe that's not true for SCSI devices in general.  It is true 
> for USB mass-storage, however.  Perhaps the reset handlers in 
> usb-storage should be changed so that they will automatically abort a 
> running command before starting t

[PATCH 4/4] blk-mq: support shared tag maps

2014-03-31 Thread Christoph Hellwig

---
 block/blk-mq-tag.c |2 ++
 block/blk-mq.c |   83 +++-
 block/blk-mq.h |2 ++
 include/linux/blk-mq.h |   12 +++
 4 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 108f82b..a7b1888 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -121,6 +121,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int 
total_tags,
if (!tags)
return NULL;
 
+   kref_init(&tags->ref_count);
+
nr_tags = total_tags - reserved_tags;
nr_cache = nr_tags / num_possible_cpus();
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f1b5d52..3d63d71 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1051,8 +1051,10 @@ void blk_mq_free_commands(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_mq_free_commands);
 
-static void blk_mq_free_rq_map(struct blk_mq_tags *tags)
+static void blk_mq_free_rq_map(struct kref *kref)
 {
+   struct blk_mq_tags *tags =
+   container_of(kref, struct blk_mq_tags, ref_count);
struct page *page;
 
while (!list_empty(&tags->page_list)) {
@@ -1066,6 +1068,17 @@ static void blk_mq_free_rq_map(struct blk_mq_tags *tags)
blk_mq_free_tags(tags);
 }
 
+static void blk_mq_put_rq_map(struct blk_mq_tags *tags)
+{
+   kref_put(&tags->ref_count, blk_mq_free_rq_map);
+}
+
+static struct blk_mq_tags *blk_mq_get_rq_map(struct blk_mq_tags *tags)
+{
+   kref_get(&tags->ref_count);
+   return tags;
+}
+
 static size_t order_to_size(unsigned int order)
 {
size_t ret = PAGE_SIZE;
@@ -1144,7 +1157,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(unsigned 
int total_tags,
 
 fail:
pr_warn("%s: failed to allocate requests\n", __func__);
-   blk_mq_free_rq_map(tags);
+   blk_mq_put_rq_map(tags);
return NULL;
 }
 
@@ -1178,10 +1191,14 @@ static int blk_mq_init_hw_queues(struct request_queue 
*q,
blk_mq_hctx_notify, hctx);
blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
 
-   hctx->tags = blk_mq_init_rq_map(hctx->queue_depth,
-   reg->reserved_tags, reg->cmd_size, node);
-   if (!hctx->tags)
-   break;
+   if (reg->shared_tags) {
+   hctx->tags = 
blk_mq_get_rq_map(reg->shared_tags->tags[i]);
+   } else {
+   hctx->tags = blk_mq_init_rq_map(hctx->queue_depth,
+   reg->reserved_tags, reg->cmd_size, 
node);
+   if (!hctx->tags)
+   break;
+   }
 
/*
 * Allocate space for all possible cpus to avoid allocation in
@@ -1221,7 +1238,7 @@ static int blk_mq_init_hw_queues(struct request_queue *q,
 
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
if (hctx->tags)
-   blk_mq_free_rq_map(hctx->tags);
+   blk_mq_put_rq_map(hctx->tags);
kfree(hctx->ctxs);
}
 
@@ -1399,7 +1416,7 @@ void blk_mq_free_queue(struct request_queue *q)
kfree(hctx->ctx_map);
kfree(hctx->ctxs);
if (hctx->tags)
-   blk_mq_free_rq_map(hctx->tags);
+   blk_mq_put_rq_map(hctx->tags);
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
if (q->mq_ops->exit_hctx)
q->mq_ops->exit_hctx(hctx, i);
@@ -1459,6 +1476,56 @@ static int blk_mq_queue_reinit_notify(struct 
notifier_block *nb,
return NOTIFY_OK;
 }
 
+struct blk_mq_shared_tags *blk_mq_alloc_shared_tags(struct blk_mq_reg *reg,
+   int (*init)(void *, struct request *), void *data)
+{
+   struct blk_mq_shared_tags *shared_tags;
+   int i, j;
+
+   shared_tags = kmalloc_node(sizeof(*shared_tags) +
+   reg->nr_hw_queues * sizeof(struct blk_mq_tags),
+   GFP_KERNEL, reg->numa_node);
+   if (!shared_tags)
+   goto out;
+
+   shared_tags->nr_hw_queues = reg->nr_hw_queues;
+   shared_tags->queue_depth = reg->queue_depth;
+   for (i = 0; i < reg->nr_hw_queues; i++) {
+   shared_tags->tags[i] = blk_mq_init_rq_map(reg->queue_depth,
+   reg->reserved_tags, reg->cmd_size, 
reg->numa_node);
+   if (!shared_tags->tags[i])
+   goto out_unwind;
+
+   for (j = 0; j < reg->queue_depth; j++) {
+   struct request *rq = shared_tags->tags[i]->rqs[j];
+   int ret;
+
+   ret = init(data, rq);
+   BUG_ON(ret);
+   }
+   }
+
+   return shared_tags;
+
+out_unwind:
+   while (--i >= 0)
+   blk_mq_put_rq_map(shar

[PATCH 2/4] blk-mq: initialize request on allocation

2014-03-31 Thread Christoph Hellwig

If we want to share tag and request allocation between queues we cannot
initialize the request at init/free time, but need to initialize it
at allocation time as it might get used for different queues over its
lifetime.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 871acd6..ec0c276 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -82,6 +82,7 @@ static struct request *__blk_mq_alloc_request(struct 
blk_mq_hw_ctx *hctx,
tag = blk_mq_get_tag(hctx->tags, gfp, reserved);
if (tag != BLK_MQ_TAG_FAIL) {
rq = hctx->rqs[tag];
+   blk_rq_init(hctx->queue, rq);
rq->tag = tag;
 
return rq;
@@ -254,9 +255,7 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx 
*hctx,
const int tag = rq->tag;
struct request_queue *q = rq->q;
 
-   blk_rq_init(hctx->queue, rq);
blk_mq_put_tag(hctx->tags, tag);
-
blk_mq_queue_exit(q);
 }
 
@@ -1128,7 +1127,6 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
hctx->rqs[i] = p;
-   blk_rq_init(hctx->queue, hctx->rqs[i]);
p += rq_size;
i++;
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4] blk-mq: move request structures into struct blk_mq_tags

2014-03-31 Thread Christoph Hellwig

This is in preparation for allowing to share the tags, and thus request
allocation between multiple queues.

Also remove blk_mq_tag_to_rq, as it was unused and thus untestable.  If we
need it back it can easil be re-added as a non-inline function.

Note that we also now straight out fail queue initialization if we can't
allocate tags - keeping track of a reduced queue_depth over a more complex
call chain isn't easil possible and this shouldn't happen on an of todays
systems.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq-tag.c |   13 
 block/blk-mq.c |   84 +---
 block/blk-mq.h |   18 +++
 include/linux/blk-mq.h |8 -
 4 files changed, 61 insertions(+), 62 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 83ae96c..108f82b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -7,19 +7,6 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
-/*
- * Per tagged queue (tag address space) map
- */
-struct blk_mq_tags {
-   unsigned int nr_tags;
-   unsigned int nr_reserved_tags;
-   unsigned int nr_batch_move;
-   unsigned int nr_max_cache;
-
-   struct percpu_ida free_tags;
-   struct percpu_ida reserved_tags;
-};
-
 void blk_mq_wait_for_tags(struct blk_mq_tags *tags)
 {
int tag = blk_mq_get_tag(tags, __GFP_WAIT, false);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec0c276..f1b5d52 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -81,7 +81,7 @@ static struct request *__blk_mq_alloc_request(struct 
blk_mq_hw_ctx *hctx,
 
tag = blk_mq_get_tag(hctx->tags, gfp, reserved);
if (tag != BLK_MQ_TAG_FAIL) {
-   rq = hctx->rqs[tag];
+   rq = hctx->tags->rqs[tag];
blk_rq_init(hctx->queue, rq);
rq->tag = tag;
 
@@ -406,7 +406,9 @@ static void blk_mq_timeout_check(void *__data, unsigned 
long *free_tags)
if (tag >= hctx->queue_depth)
break;
 
-   rq = hctx->rqs[tag++];
+   rq = hctx->tags->rqs[tag++];
+   if (rq->q != hctx->queue)
+   continue;
 
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
continue;
@@ -993,7 +995,7 @@ static int blk_mq_init_hw_commands(struct blk_mq_hw_ctx 
*hctx,
int ret = 0;
 
for (i = 0; i < hctx->queue_depth; i++) {
-   struct request *rq = hctx->rqs[i];
+   struct request *rq = hctx->tags->rqs[i];
 
ret = init(data, hctx, rq, i);
if (ret)
@@ -1030,7 +1032,7 @@ static void blk_mq_free_hw_commands(struct blk_mq_hw_ctx 
*hctx,
unsigned int i;
 
for (i = 0; i < hctx->queue_depth; i++) {
-   struct request *rq = hctx->rqs[i];
+   struct request *rq = hctx->tags->rqs[i];
 
free(data, hctx, rq, i);
}
@@ -1049,20 +1051,19 @@ void blk_mq_free_commands(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_mq_free_commands);
 
-static void blk_mq_free_rq_map(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 {
struct page *page;
 
-   while (!list_empty(&hctx->page_list)) {
-   page = list_first_entry(&hctx->page_list, struct page, lru);
+   while (!list_empty(&tags->page_list)) {
+   page = list_first_entry(&tags->page_list, struct page, lru);
list_del_init(&page->lru);
__free_pages(page, page->private);
}
 
-   kfree(hctx->rqs);
+   kfree(tags->rqs);
 
-   if (hctx->tags)
-   blk_mq_free_tags(hctx->tags);
+   blk_mq_free_tags(tags);
 }
 
 static size_t order_to_size(unsigned int order)
@@ -1075,28 +1076,35 @@ static size_t order_to_size(unsigned int order)
return ret;
 }
 
-static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
- unsigned int reserved_tags, int node)
+static struct blk_mq_tags *blk_mq_init_rq_map(unsigned int total_tags,
+   unsigned int reserved_tags, unsigned int cmd_size, int node)
 {
+   struct blk_mq_tags *tags;
unsigned int i, j, entries_per_page, max_order = 4;
size_t rq_size, left;
 
-   INIT_LIST_HEAD(&hctx->page_list);
+   tags = blk_mq_init_tags(total_tags, reserved_tags, node);
+   if (!tags)
+   return NULL;
+
+   INIT_LIST_HEAD(&tags->page_list);
 
-   hctx->rqs = kmalloc_node(hctx->queue_depth * sizeof(struct request *),
+   tags->rqs = kmalloc_node(total_tags * sizeof(struct request *),
GFP_KERNEL, node);
-   if (!hctx->rqs)
-   return -ENOMEM;
+   if (!tags->rqs) {
+   blk_mq_free_tags(tags);
+   return NULL;
+   }
 
/*
 * rq_size is the size of the request plus driver payload, rounded
 * to the cacheline size
 */
-

[PATCH 1/4] blk->mq: stop pre-initializing req->special

2014-03-31 Thread Christoph Hellwig

We can get at the private data easil using pointer arithmetics.  Do so
instead of initializing req->special so that we don't rely on the
request state in various initialization functions and shave off another
few instructions in the fast path.

Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c  |   10 ++
 block/blk-mq.c |   15 ++-
 block/blk-mq.h |1 -
 drivers/block/null_blk.c   |4 ++--
 drivers/block/virtio_blk.c |6 +++---
 5 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 43e6b47..9a0c427 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -306,22 +306,16 @@ static bool blk_kick_flush(struct request_queue *q)
 */
q->flush_pending_idx ^= 1;
 
+   blk_rq_init(q, q->flush_rq);
if (q->mq_ops) {
-   struct blk_mq_ctx *ctx = first_rq->mq_ctx;
-   struct blk_mq_hw_ctx *hctx = q->mq_ops->map_queue(q, ctx->cpu);
-
-   blk_mq_rq_init(hctx, q->flush_rq);
-   q->flush_rq->mq_ctx = ctx;
-
/*
 * Reuse the tag value from the fist waiting request,
 * with blk-mq the tag is generated during request
 * allocation and drivers can rely on it being inside
 * the range they asked for.
 */
+   q->flush_rq->mq_ctx = first_rq->mq_ctx;
q->flush_rq->tag = first_rq->tag;
-   } else {
-   blk_rq_init(q, q->flush_rq);
}
 
q->flush_rq->cmd_type = REQ_TYPE_FS;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4274ee0..871acd6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -248,24 +248,13 @@ struct request *blk_mq_alloc_reserved_request(struct 
request_queue *q, int rw,
 }
 EXPORT_SYMBOL(blk_mq_alloc_reserved_request);
 
-/*
- * Re-init and set pdu, if we have it
- */
-void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq)
-{
-   blk_rq_init(hctx->queue, rq);
-
-   if (hctx->cmd_size)
-   rq->special = blk_mq_rq_to_pdu(rq);
-}
-
 static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
  struct blk_mq_ctx *ctx, struct request *rq)
 {
const int tag = rq->tag;
struct request_queue *q = rq->q;
 
-   blk_mq_rq_init(hctx, rq);
+   blk_rq_init(hctx->queue, rq);
blk_mq_put_tag(hctx->tags, tag);
 
blk_mq_queue_exit(q);
@@ -1139,7 +1128,7 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
hctx->rqs[i] = p;
-   blk_mq_rq_init(hctx, hctx->rqs[i]);
+   blk_rq_init(hctx->queue, hctx->rqs[i]);
p += rq_size;
i++;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ebbe6ba..238379a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool 
async);
 void blk_mq_init_flush(struct request_queue *q);
 void blk_mq_drain_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
-void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq);
 
 /*
  * CPU hotplug helpers
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 091b9ea..71df69d 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -226,7 +226,7 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd)
 
 static void null_softirq_done_fn(struct request *rq)
 {
-   end_cmd(rq->special);
+   end_cmd(blk_mq_rq_to_pdu(rq));
 }
 
 static inline void null_handle_cmd(struct nullb_cmd *cmd)
@@ -311,7 +311,7 @@ static void null_request_fn(struct request_queue *q)
 
 static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
-   struct nullb_cmd *cmd = rq->special;
+   struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
cmd->rq = rq;
cmd->nq = hctx->driver_data;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0eace43..11e8f4b 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -112,7 +112,7 @@ static int __virtblk_add_req(struct virtqueue *vq,
 
 static inline void virtblk_request_done(struct request *req)
 {
-   struct virtblk_req *vbr = req->special;
+   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
int error = virtblk_result(vbr);
 
if (req->cmd_type == REQ_TYPE_BLOCK_PC) {
@@ -154,7 +154,7 @@ static void virtblk_done(struct virtqueue *vq)
 static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 {
struct virtio_blk *vblk = hctx->queue->queuedata;
-   struct virtblk_req *vbr = req->special;
+   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
unsigned long flags;
unsigned int num;
const bool last = (r

[RFC] blk-mq: support for shared tags

2014-03-31 Thread Christoph Hellwig

This series adds support for sharing tags (and thus requests) between
multiple request_queues.  We'll need this for SCSI, and I think Martin
also wants something similar for nvme.

Besides the mess with request contructors/destructors the major RFC here
is how the blk_mq_alloc_shared_tags API should look like.  For now I've
been lazy and reused struct blk_mq_reg, but that feels a bit cumbersome.
Either a separate blk_mq_tags_reg or just passing the few arguments directly
would work fine for me.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread James Bottomley

[lets split the thread]
On Mon, 2014-03-31 at 16:37 +0200, Hannes Reinecke wrote:
> On 03/31/2014 03:33 PM, Alan Stern wrote:
> > On Mon, 31 Mar 2014, Hannes Reinecke wrote:
> >> On 03/28/2014 08:29 PM, Alan Stern wrote:
> >>> On Fri, 28 Mar 2014, James Bottomley wrote:
> >>> Maybe scmd_eh_abort_handler() should check the flag before doing
> >>> anything.  Is there any sort of sychronization to prevent the same
> >>> incarnation of a command from being aborted twice (or by two different
> >>> threads at the same time)?  If there is, it isn't obvious.
> >>>
> >> See above. scsi_times_out() will only ever called once.
> >> What can happen, though, is that _theoretically_ the LLDD might
> >> decide to call ->done() on a timed out command when
> >> scsi_eh_abort_handler() is still pending.
> > 
> > That's okay.  We can expect the LLDD to have sufficient locking to
> > handle that sort of thing without confusion (usb-storage does, for
> > example).
> > 
> >>> (Also, what's going on at the start of scsi_abort_command()?  Contrary
> >>> to what one might expect, the first part of the function _cancels_ a
> >>> scheduled abort.  And it does so without clearing the
> >>> SCSI_EH_ABORT_SCHEDULED flag.)
> >>>
> >> The original idea was this:
> >>
> >> SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
> >> Point is, any command abort is only ever send for a timed-out
> >> command. And the main problem for a timed-out command is that we
> >> simply _do not_ know what happened for that command. So _if_ a
> >> command timed out, _and_ we've send an abort, _and_ the command
> >> times out _again_ we'll be running into an endless loop between
> >> timeout and aborting, and never returning the command at all.
> >>
> >> So to prevent this we should set a marker on that command telling it
> >> to _not_ try to abort the command again.
> > 
> > I disagree.  We _have_ to abort the command again -- how else can we
> > stop a running command?  To prevent the loop you described, we should
> > avoid _retrying_ the command after it is aborted the second time.
> > 
> The actual question is whether it's worth aborting the same command
> a second time.
> In principle any reset (like LUN reset etc) should clear the
> command, too.
> And the EH abort functionality is geared around this.
> If, for some reason, the transport layer / device driver
> requires a command abort to be send then sure, we need
> to accommodate for that.

We already discussed this (that was my first response too).  USB needs
all outstanding commands aborted before proceeding to reset.  I'm
starting to think the actual way to fix this is to reset the abort
scheduled only if we send something else, so this might be a better fix.

It doesn't matter if we finish a command with abort scheduled because
the command then gets freed and the flags cleaned.

We can take our time with this because the other two patches, which I
can send separately fix the current deadlock (we no longer send an
unaborted request sense before the reset), and it can go via rc fixes.

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 771c16b..3cfd2bf 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1007,6 +1007,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
const unsigned long stall_for = msecs_to_jiffies(100);
int rtn;
 
+   /*
+* We're sending another command we'll need to abort, so reset the
+* abort scheduled flag on the real command before we save its state
+*/
+   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
+
 retry:
scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
shost->eh_action = &done;


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/4] tcm_qla2xxx: T10-Dif set harware capability

2014-03-31 Thread Quinn Tran


Regards,
Quinn Tran




On 3/28/14 5:12 PM, "sagi grimberg"  wrote:

>On 3/29/2014 2:05 AM, Quinn Tran wrote:
>> Set Protection Type(1,2,3) capabilities, Guarg type (CRC/IPchksm)
>> capabilities bits to let TCM core knows of HW/fabric capabilities.
>>
>> Signed-off-by: Nicholas Bellinger 
>> Signed-off-by: Giridhar Malavali 
>> ---
>>   drivers/scsi/qla2xxx/tcm_qla2xxx.c | 23 +++
>>   drivers/scsi/qla2xxx/tcm_qla2xxx.h |  1 +
>>   2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>> index b23a0ff..4d93081 100644
>> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>> @@ -910,12 +910,20 @@ DEF_QLA_TPG_ATTR_BOOL(demo_mode_login_only);
>>   DEF_QLA_TPG_ATTRIB(demo_mode_login_only);
>>   QLA_TPG_ATTR(demo_mode_login_only, S_IRUGO | S_IWUSR);
>>
>> +/*
>> + * Define tcm_qla2xxx_tpg_attrib_s_t10dif_force_on
>> + */
>> +DEF_QLA_TPG_ATTR_BOOL(t10dif_force_on);
>> +DEF_QLA_TPG_ATTRIB(t10dif_force_on);
>> +QLA_TPG_ATTR(t10dif_force_on, S_IRUGO | S_IWUSR);
>> +
>>   static struct configfs_attribute *tcm_qla2xxx_tpg_attrib_attrs[] = {
>>  &tcm_qla2xxx_tpg_attrib_generate_node_acls.attr,
>>  &tcm_qla2xxx_tpg_attrib_cache_dynamic_acls.attr,
>>  &tcm_qla2xxx_tpg_attrib_demo_mode_write_protect.attr,
>>  &tcm_qla2xxx_tpg_attrib_prod_mode_write_protect.attr,
>>  &tcm_qla2xxx_tpg_attrib_demo_mode_login_only.attr,
>> +&tcm_qla2xxx_tpg_attrib_t10dif_force_on.attr,
>>  NULL,
>>   };
>>
>> @@ -1049,6 +1057,18 @@ static struct se_portal_group
>>*tcm_qla2xxx_make_tpg(
>>  tpg->tpg_attrib.demo_mode_write_protect = 1;
>>  tpg->tpg_attrib.cache_dynamic_acls = 1;
>>  tpg->tpg_attrib.demo_mode_login_only = 1;
>> +tpg->tpg_attrib.t10dif_force_on = 0;
>> +tpg->se_tpg.fabric_sup_prot_type = 0;
>> +tpg->se_tpg.fabric_sup_guard_type = 0;
>
>You can lose guard_type - this is more relevant to the initiator side.

QT> OK

>
>> +
>> +if (scsi_host_get_prot(lport->qla_vha->host)) {
>> +tpg->se_tpg.fabric_sup_prot_type = (TARGET_DIF_TYPE0_PROT|
>> +TARGET_DIF_TYPE1_PROT|TARGET_DIF_TYPE2_PROT|
>> +TARGET_DIF_TYPE3_PROT);
>> +
>> +tpg->se_tpg.fabric_sup_guard_type = TARGET_GUARD_CRC|
>> +TARGET_GUARD_IP;
>> +}
>>
>>  ret = core_tpg_register(&tcm_qla2xxx_fabric_configfs->tf_ops, wwn,
>>  &tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
>> @@ -1127,6 +1147,8 @@ static ssize_t tcm_qla2xxx_npiv_tpg_store_enable(
>>  qlt_stop_phase1(vha->vha_tgt.qla_tgt);
>>  }
>>
>> +core_tpg_set_fabric_t10dif(se_tpg, tpg->tpg_attrib.t10dif_force_on);
>> +
>
>Any way we can get this logic to be shared also with iscsi, srp, etc...
>all fabrics should
>be set with t10dif right? so I would imagine it would be better to
>centralize it right?

QT> Not sure how you want this logic to be shared.  This patch is specific
to Qlogic driver registering its capabilities.


>
>>  return count;
>>   }
>>
>> @@ -1169,6 +1191,7 @@ static struct se_portal_group
>>*tcm_qla2xxx_npiv_make_tpg(
>>  tpg->tpg_attrib.demo_mode_write_protect = 1;
>>  tpg->tpg_attrib.cache_dynamic_acls = 1;
>>  tpg->tpg_attrib.demo_mode_login_only = 1;
>> +tpg->tpg_attrib.t10dif_force_on = 0;
>>
>>  ret = core_tpg_register(&tcm_qla2xxx_npiv_fabric_configfs->tf_ops,
>>wwn,
>>  &tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
>> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.h
>>b/drivers/scsi/qla2xxx/tcm_qla2xxx.h
>> index 33aaac8..62fdce3 100644
>> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.h
>> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.h
>> @@ -28,6 +28,7 @@ struct tcm_qla2xxx_tpg_attrib {
>>  int demo_mode_write_protect;
>>  int prod_mode_write_protect;
>>  int demo_mode_login_only;
>> +int t10dif_force_on;
>>   };
>>
>>   struct tcm_qla2xxx_tpg {




This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.
<>

Re: [PATCH 3/4] target/rd: T10-Dif: Add init/format support

2014-03-31 Thread Quinn Tran


Regards,
Quinn Tran




On 3/28/14 5:16 PM, "sagi grimberg"  wrote:

>On 3/29/2014 2:05 AM, Quinn Tran wrote:
>> This patch is borrow code from
>>
>> commit 0f5e2ec46dd64579c0770f3822764f94db4fa465
>> Author: Nicholas Bellinger 
>> Date:   Sat Jan 18 09:32:56 2014 +
>>
>>  target/file: Add DIF protection init/format support
>>
>>  This patch adds support for DIF protection init/format support into
>>  the FILEIO backend.
>>
>>  It involves using a seperate $FILE.protection for storing PI that
>>is
>>  opened via fd_init_prot() using the common pi_prot_type attribute.
>>  The actual formatting of the protection is done via
>>fd_format_prot()
>>  using the common pi_prot_format attribute, that will populate the
>>  initial PI data based upon the currently configured pi_prot_type.
>>
>>  Based on original FILEIO code from Sagi.
>>
>>  v1 changes:
>>- Fix sparse warnings in fd_init_format_buf (Fengguang)
>>
>>  Cc: Martin K. Petersen 
>>  Cc: Christoph Hellwig 
>>  Cc: Hannes Reinecke 
>>  Cc: Sagi Grimberg 
>>  Cc: Or Gerlitz 
>>  Signed-off-by: Nicholas Bellinger 
>> ---
>>   drivers/target/target_core_rd.c | 64
>>-
>>   1 file changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/target_core_rd.c
>>b/drivers/target/target_core_rd.c
>> index 66a5aba..01dda0b 100644
>> --- a/drivers/target/target_core_rd.c
>> +++ b/drivers/target/target_core_rd.c
>> @@ -603,7 +603,7 @@ static int rd_init_prot(struct se_device *dev)
>>   {
>>  struct rd_dev *rd_dev = RD_DEV(dev);
>>
>> -if (!dev->dev_attrib.pi_prot_type)
>> +if (!dev->dev_attrib.pi_prot_type)
>>  return 0;
>>
>>  return rd_build_prot_space(rd_dev, dev->prot_length);
>> @@ -616,6 +616,67 @@ static void rd_free_prot(struct se_device *dev)
>>  rd_release_prot_space(rd_dev);
>>   }
>>
>> +static void rd_init_format_buf(struct se_device *dev, unsigned char
>>*buf,
>> +   u32 unit_size, u32 *ref_tag, u16 app_tag,
>> +   bool inc_reftag)
>> +{
>> +unsigned char *p = buf;
>> +int i;
>> +
>> +for (i = 0; i < unit_size; i += dev->prot_length) {
>> +*((u16 *)&p[0]) = 0x;
>> +*((__be16 *)&p[2]) = cpu_to_be16(app_tag);
>> +*((__be32 *)&p[4]) = cpu_to_be32(*ref_tag);
>> +
>> +if (inc_reftag)
>> +(*ref_tag)++;
>> +
>> +p += dev->prot_length;
>> +}
>> +}
>> +
>> +static int rd_format_prot(struct se_device *dev)
>> +{
>> +struct rd_dev *rd_dev = RD_DEV(dev);
>> +u32 ref_tag = 0;
>> +int i,j;
>> +bool inc_reftag = false;
>> +struct rd_dev_sg_table *pt;
>> +struct scatterlist *sg;
>> +void *paddr;
>> +
>> +if (!dev->dev_attrib.pi_prot_type) {
>> +pr_err("Unable to format_prot while pi_prot_type == 0\n");
>> +return -ENODEV;
>> +}
>> +
>> +switch (dev->dev_attrib.pi_prot_type) {
>> +case TARGET_DIF_TYPE3_PROT:
>> +ref_tag = 0x;
>> +break;
>> +case TARGET_DIF_TYPE2_PROT:
>> +case TARGET_DIF_TYPE1_PROT:
>> +inc_reftag = true;
>> +break;
>> +default:
>> +break;
>> +}
>> +
>> +for (i=0; i < rd_dev->sg_prot_count; i++) {
>> +pt= &rd_dev->sg_prot_array[i];
>> +for_each_sg(pt->sg_table, sg, pt->rd_sg_count, j) {
>> +paddr = kmap(sg_page(sg)) + sg->offset;
>> +
>> +rd_init_format_buf(dev, paddr, sg->length, &ref_tag,
>> +0x, inc_reftag);
>> +kunmap(paddr);
>> +}
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +
>>   static struct sbc_ops rd_sbc_ops = {
>>  .execute_rw = rd_execute_rw,
>>   };
>> @@ -642,6 +703,7 @@ static struct se_subsystem_api rd_mcp_template = {
>>  .get_device_type= sbc_get_device_type,
>>  .get_blocks = rd_get_blocks,
>>  .init_prot  = rd_init_prot,
>> +.format_prot= rd_format_prot,
>>  .free_prot  = rd_free_prot,
>>   };
>>
>
>This patch is redundant altogether. rd_mcp already init protection with
>escape values:
>rc = rd_allocate_sgl_table(rd_dev, sg_table, total_sg_needed, 0xff);

QT> I see.  Will re-examine my code to live without this function.

>




This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.
<>

Re: [PATCH 4/4] target/rd: T10-Dif: RAM disk is allocating more space than required.

2014-03-31 Thread Quinn Tran


Regards,
Quinn Tran




On 3/28/14 5:22 PM, "sagi grimberg"  wrote:

>On 3/29/2014 2:05 AM, Quinn Tran wrote:
>> Ram disk is allocating 8x more space than required for diff data.
>> For large RAM disk test, there is small potential for memory
>> starvation.
>>
>> Signed-off-by: Nicholas Bellinger 
>> Signed-off-by: Giridhar Malavali 
>> ---
>>   drivers/target/target_core_rd.c | 6 +-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/target_core_rd.c
>>b/drivers/target/target_core_rd.c
>> index 01dda0b..6df177a 100644
>> --- a/drivers/target/target_core_rd.c
>> +++ b/drivers/target/target_core_rd.c
>> @@ -253,7 +253,11 @@ static int rd_build_prot_space(struct rd_dev
>>*rd_dev, int prot_length)
>>  if (rd_dev->rd_flags & RDF_NULLIO)
>>  return 0;
>>
>> -total_sg_needed = rd_dev->rd_page_count / prot_length;
>> +/* prot_length=8byte dif data
>> + * tot sg needed = rd_page_count * (PGSZ/512) * (prot_length/PGSZ) +
>>pad
>> + * PGSZ canceled each other.
>> + */
>> +total_sg_needed = (rd_dev->rd_page_count * prot_length / 512) +1;
>
>You probably will want to use block_size instead of hard-coding 512.
>Other then that this makes sense.

QT> agreed

>
>>
>>  sg_tables = (total_sg_needed / max_sg_per_table) + 1;
>>
>




This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.
<>

scsi_debug driver puzzle

2014-03-31 Thread Laurence Oberman

Hello

I have what to me is a puzzle but is likely a stupid question about
the queuecommand interface in the scsi_debug driver.

I see the host template set for  scsi_debug_queuecommand but in the
driver we have the function declared as int
scsi_debug_queuecommand_lck
So how is this working.

Egrep pattern: scsi_debug_queuecommand

  File Line
0 scsi_debug.c 3551 int scsi_debug_queuecommand_lck(struct scsi_cmnd
*SCpnt, done_funct_t done)
1 scsi_debug.c 3900 static DEF_SCSI_QCMD(scsi_debug_queuecommand)
2 scsi_debug.c 3912 .queuecommand =  scsi_debug_queuecommand,

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-03-31 Thread Quinn Tran


Regards,
Quinn Tran




On 3/28/14 6:24 PM, "sagi grimberg"  wrote:

>On 3/29/2014 3:53 AM, Quinn Tran wrote:
>> +
>> +if (dev->dev_attrib.pi_prot_type) {
>> +uint32_t cap[] = { 0,
>> +   TARGET_DIF_TYPE1_PROTECTION,
>> +   TARGET_DIF_TYPE2_PROTECTION,
>> +   TARGET_DIF_TYPE3_PROTECTION};
>> +uint32_t pt_bits = cap[dev->dev_attrib.pi_prot_type];
>> +pt_bits &= se_tpg->fabric_sup_prot_type;
>>> At what point should the fabric fill that? it can vary between portals
>>> right?
>> QT> Yes, protection mode can vary between portals. When user select the
>> physical function (via fabric_make_tpg), you know the specific portal
>> based on user input and its capability. This is where Qlogic register
>>its
>> capabilities based on specific hardware.
>>
>>
>>> I would prefer to do that as a function pointer to explicitly ask the
>>> fabric it's support.
>> QT> is it still require with previous answer ?
>>
>
>Well, I think it is nicer to explicitly ask the fabric at that point
>instead of checking what it previously set.
>
>By the way, I think this patch breaks existing iSER support as iSER
>doesn't set these bits.
>Thats why I think it would be a good idea to *explicitly* ask.

QT> I see.  No issue with converting to a callback.

>
>>
 +pr_err("dev[%p]: DIF protection mismatch with
fabric "
 +"(%s). Transport capability
bits[0x%x]\n",
 +dev,
se_tpg->se_tpg_wwn->wwn_group.cg_item.ci_name,
 +se_tpg->fabric_sup_prot_type);
 +return -EFAULT;
>>> Didn't we agree that this is allowed and the target core should
>>> compensate on the lack fabric support?
>>  My recollection was that it's not recommended to have different
>> portals with different supported feature.
>
>Well we seem to remember different things...
>Anyway I think it is better to compensate that in backstore/target-core
>level, that would be better
>than silently turn off protection. Martin? Nic? your takes?

QT> the error return above fail the binding (ln -sf  ) between the back disk and the frontend/fabric LUN representation.
The failure happens during configuration time.  The commented out code
imply a silent turn off. Sorry should have clean it out.


>
>Also I don't know what rats are hiding here if the backstore is handling
>IOs in this time...
>What about cleaning up all the protection resources the backstore driver
>might be managing?

QT> hmm.  It's a big hammer.  I'll let the other folks chime in on this
because it affect user's interaction.  Nicholas ? Martin?

>
>>   Meaning a SCSI Write without Dif
>> down a none-T10PI portal update the data.  The Guard on the disk is now
>> mismatch with the data. A SCSI Read down a T10PI path will run into a
>> Guard failure.
>>
>> By introducing this option, Disk vendor require additional logic to
>> compensate for this. I think it's cheaper to have supported hardware
>> rather than support nightmare.
>>
 +}
 +}
 +
   if (lun->lun_se_dev !=  NULL) {
   pr_err("Port Symlink already exists\n");
   return -EEXIST;
 diff --git a/drivers/target/target_core_tpg.c
 b/drivers/target/target_core_tpg.c
 index c036595..9279971 100644
 --- a/drivers/target/target_core_tpg.c
 +++ b/drivers/target/target_core_tpg.c
 @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag(
}
EXPORT_SYMBOL(core_tpg_set_initiator_node_tag);

 +void core_tpg_set_fabric_t10dif(
 +struct se_portal_group *tpg,
 +uint32_t fabric_t10dif_force_on)
 +{
 +tpg->fabric_t10dif_force_on = fabric_t10dif_force_on;
 +}
 +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif);
 +
>>> Is there a user for this function in this patch?
>> QT> I'm on the fence with this piece.  Just thinking of a case where DIX
>> is not available for initiator side, but user wants to turn on
>>protection
>> at the link layer.  Our test folks would like to cover this case in the
>> future.
>
>Not sure I understand. Initiator will send the target data+protection
>(DIX disabled HBA does INSERT), why does this help?
>Why should the target fabric care who generated the protection (OS or
>HBA)?

QT> Sorry for the confusion.  The case I'm trying to get at is "Data In
Insert" and "Data out strip".In this case, the protection starts from
front end target adapter to the back end storage.  In revisit your
previous patch, this case is not covered.


>
>Sagi.




This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission

Re: scsi_debug driver puzzle

2014-03-31 Thread Bryn M. Reeves

On 03/31/2014 06:32 PM, Laurence Oberman wrote:
>   File Line
> 0 scsi_debug.c 3551 int scsi_debug_queuecommand_lck(struct scsi_cmnd
> *SCpnt, done_funct_t done)
> 1 scsi_debug.c 3900 static DEF_SCSI_QCMD(scsi_debug_queuecommand)
> 2 scsi_debug.c 3912 .queuecommand =  scsi_debug_queuecommand,

Magical scsi_host.h macro as part of the effort to push host_lock down.
The macro creates a function named as its argument which takes and drops
the shost->host_lock around a call to the real queuecommand function:

spin_lock_irqsave(shost->host_lock, irq_flags);
scsi_cmd_get_serial(shost, cmd);
rc = func_name##_lck (cmd,cmd->scsi_done);
spin_unlock_irqrestore(shost->host_lock, irq_flags);

http://fpaste.org/90345/88875139/

Cheers,
Bryn.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4 v2] scsi: reintroduce scsi_driver.init_command

2014-03-31 Thread Christoph Hellwig

Move control of the prep_fn back from the ULDs into the scsi layer.  Besides
cleaning up the code this also prepares for usinng blk-mq, which doesn't
have equivalent functionality to the prep_fn method and requires the driver
to provide just a single I/O submission method.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c|   66 ++--
 drivers/scsi/sd.c  |   44 ++---
 drivers/scsi/sr.c  |   19 -
 include/scsi/scsi_driver.h |8 ++
 4 files changed, 61 insertions(+), 76 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a73751b..ed81fc3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1071,15 +1071,7 @@ static struct scsi_cmnd *scsi_get_cmd_from_req(struct 
scsi_device *sdev,
 
 int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 {
-   struct scsi_cmnd *cmd;
-   int ret = scsi_prep_state_check(sdev, req);
-
-   if (ret != BLKPREP_OK)
-   return ret;
-
-   cmd = scsi_get_cmd_from_req(sdev, req);
-   if (unlikely(!cmd))
-   return BLKPREP_DEFER;
+   struct scsi_cmnd *cmd = req->special;
 
/*
 * BLOCK_PC requests may transfer data, in which case they must
@@ -1123,15 +1115,11 @@ EXPORT_SYMBOL(scsi_setup_blk_pc_cmnd);
  */
 int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
 {
-   struct scsi_cmnd *cmd;
-   int ret = scsi_prep_state_check(sdev, req);
-
-   if (ret != BLKPREP_OK)
-   return ret;
+   struct scsi_cmnd *cmd = req->special;
 
if (unlikely(sdev->scsi_dh_data && sdev->scsi_dh_data->scsi_dh
 && sdev->scsi_dh_data->scsi_dh->prep_fn)) {
-   ret = sdev->scsi_dh_data->scsi_dh->prep_fn(sdev, req);
+   int ret = sdev->scsi_dh_data->scsi_dh->prep_fn(sdev, req);
if (ret != BLKPREP_OK)
return ret;
}
@@ -1141,16 +1129,13 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct 
request *req)
 */
BUG_ON(!req->nr_phys_segments);
 
-   cmd = scsi_get_cmd_from_req(sdev, req);
-   if (unlikely(!cmd))
-   return BLKPREP_DEFER;
-
memset(cmd->cmnd, 0, BLK_MAX_CDB);
return scsi_init_io(cmd, GFP_ATOMIC);
 }
 EXPORT_SYMBOL(scsi_setup_fs_cmnd);
 
-int scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
+static int
+scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 {
int ret = BLKPREP_OK;
 
@@ -1202,9 +1187,9 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
struct request *req)
}
return ret;
 }
-EXPORT_SYMBOL(scsi_prep_state_check);
 
-int scsi_prep_return(struct request_queue *q, struct request *req, int ret)
+static int
+scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 {
struct scsi_device *sdev = q->queuedata;
 
@@ -1235,18 +1220,44 @@ int scsi_prep_return(struct request_queue *q, struct 
request *req, int ret)
 
return ret;
 }
-EXPORT_SYMBOL(scsi_prep_return);
 
-int scsi_prep_fn(struct request_queue *q, struct request *req)
+static int scsi_prep_fn(struct request_queue *q, struct request *req)
 {
struct scsi_device *sdev = q->queuedata;
-   int ret = BLKPREP_KILL;
+   struct scsi_cmnd *cmd;
+   int ret;
 
-   if (req->cmd_type == REQ_TYPE_BLOCK_PC)
+   ret = scsi_prep_state_check(sdev, req);
+   if (ret != BLKPREP_OK)
+   goto out;
+
+   cmd = scsi_get_cmd_from_req(sdev, req);
+   if (unlikely(!cmd)) {
+   ret = BLKPREP_DEFER;
+   goto out;
+   }
+
+   if (req->cmd_type == REQ_TYPE_FS)
+   ret = scsi_cmd_to_driver(cmd)->init_command(cmd);
+   else if (req->cmd_type == REQ_TYPE_BLOCK_PC)
ret = scsi_setup_blk_pc_cmnd(sdev, req);
+   else
+   ret = BLKPREP_KILL;
+
+out:
return scsi_prep_return(q, req, ret);
 }
-EXPORT_SYMBOL(scsi_prep_fn);
+
+static void scsi_unprep_fn(struct request_queue *q, struct request *req)
+{
+   if (req->cmd_type == REQ_TYPE_FS) {
+   struct scsi_cmnd *cmd = req->special;
+   struct scsi_driver *drv = scsi_cmd_to_driver(cmd);
+
+   if (drv->uninit_command)
+   drv->uninit_command(cmd);
+   }
+}
 
 /*
  * scsi_dev_queue_ready: if we can send requests to sdev, return 1 else
@@ -1667,6 +1678,7 @@ struct request_queue *scsi_alloc_queue(struct scsi_device 
*sdev)
return NULL;
 
blk_queue_prep_rq(q, scsi_prep_fn);
+   blk_queue_unprep_rq(q, scsi_unprep_fn);
blk_queue_softirq_done(q, scsi_softirq_done);
blk_queue_rq_timed_out(q, scsi_times_out);
blk_queue_lld_busy(q, scsi_lld_busy);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 89e6c04..d99cb3f 100644
--- a/drivers/scsi/sd.c
+++ b/driv

Re: misc scsi midlayer updates

2014-03-31 Thread Mike Christie

On 03/27/2014 11:14 AM, Christoph Hellwig wrote:
> Various patches from the scsi multiqueue development that make sense on their
> own.


With the updated patch "[PATCH 3/4 v2] scsi: reintroduce
scsi_driver.init_command", they look ok to me.

Reviewed-by: Mike Christie 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Alan Stern

On Mon, 31 Mar 2014, Hannes Reinecke wrote:

> >> Ah. Correct. But that's due to the first patch being incorrect.
> >> Cf my response to the original first patch.
> > 
> > See my response to your response.  :-)
> > 
> Okay, So I probably should refrain from issueing a response to
> your response to my response lest infinite recursion happens :-)

Indeed.

> >>>   What should happen if some other pathway manages to call
> >>>   scsi_try_to_abort_cmd() while scmd->abort_work is still
> >>>   sitting on the work queue?  The command would be aborted
> >>>   and the flag would be cleared, but the queued work would
> >>>   remain.  Can this ever happen?
> >>>
> >> Not that I could see.
> >> A command abort is only ever triggered by the request timeout from
> >> the block layer. And the timer is _not_ rearmed once the timeout
> >> function (here: scsi_times_out()) is called.
> >> Hence I fail to see how it can be called concurrently.
> > 
> > scsi_try_to_abort_cmd() is also called (via a different pathway) when a 
> > command sent by the error handler itself times out.  I haven't traced 
> > through all the different paths to make sure none of them can run 
> > concurrently.  But I'm willing to take your word for it.
> > 
> Yes, but that's not calling scsi_abort_command(), but rather invokes
> scsi_abort_eh_cmnd().

Sure.  But either way, we end up in scsi_try_to_abort_cmd(), which
calls the LLDD's abort handler.  Thus leading to the possibility of
aborting the same command more than once.

> >> The original idea was this:
> >>
> >> SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
> >> Point is, any command abort is only ever send for a timed-out
> >> command. And the main problem for a timed-out command is that we
> >> simply _do not_ know what happened for that command. So _if_ a
> >> command timed out, _and_ we've send an abort, _and_ the command
> >> times out _again_ we'll be running into an endless loop between
> >> timeout and aborting, and never returning the command at all.
> >>
> >> So to prevent this we should set a marker on that command telling it
> >> to _not_ try to abort the command again.
> > 
> > I disagree.  We _have_ to abort the command again -- how else can we
> > stop a running command?  To prevent the loop you described, we should
> > avoid _retrying_ the command after it is aborted the second time.
> > 
> The actual question is whether it's worth aborting the same command
> a second time.
> In principle any reset (like LUN reset etc) should clear the
> command, too.
> And the EH abort functionality is geared around this.
> If, for some reason, the transport layer / device driver
> requires a command abort to be send then sure, we need
> to accommodate for that.

As James mentioned, with USB a reset does not abort a running command.  
Instead it waits for the command to finish.  (However, this could be
changed in usb-storage, if required.)

> As said, yes, in principle you are right. We should be aborting the
> command a second time, _and then_ starting the escalation.
> 
> So if the above reasoning is okay then this patch should be doing
> the trick:
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..0e72374 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -189,6 +189,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
> /*
>  * Retry after abort failed, escalate to next level.
>  */
> +   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
> SCSI_LOG_ERROR_RECOVERY(3,
> scmd_printk(KERN_INFO, scmd,
> "scmd %p previous abort
> failed\n", scmd));
> 
> (Beware of line
> breaks)
> 
> Can you test with it?

I don't understand.  This doesn't solve the fundamental problem (namely 
that you escalate before aborting a running command).  All it does is 
clear the SCSI_EH_ABORT_SCHEDULED flag before escalating.

What's needed is something like a 2-bit abort counter.  
scsi_try_to_abort_cmd() should increment the counter each time it runs, 
and if scmd_eh_abort_handler() sees that the counter is too high, it 
should avoid its retry pathway.  _Then_ you can move on to something 
more drastic.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Alan Stern

On Mon, 31 Mar 2014, James Bottomley wrote:

> > The actual question is whether it's worth aborting the same command
> > a second time.
> > In principle any reset (like LUN reset etc) should clear the
> > command, too.
> > And the EH abort functionality is geared around this.
> > If, for some reason, the transport layer / device driver
> > requires a command abort to be send then sure, we need
> > to accommodate for that.
> 
> We already discussed this (that was my first response too).  USB needs
> all outstanding commands aborted before proceeding to reset.  I'm
> starting to think the actual way to fix this is to reset the abort
> scheduled only if we send something else, so this might be a better fix.
> 
> It doesn't matter if we finish a command with abort scheduled because
> the command then gets freed and the flags cleaned.
> 
> We can take our time with this because the other two patches, which I
> can send separately fix the current deadlock (we no longer send an
> unaborted request sense before the reset), and it can go via rc fixes.
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..3cfd2bf 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1007,6 +1007,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
> unsigned char *cmnd,
>   const unsigned long stall_for = msecs_to_jiffies(100);
>   int rtn;
>  
> + /*
> +  * We're sending another command we'll need to abort, so reset the
> +  * abort scheduled flag on the real command before we save its state
> +  */
> + scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
> +
>  retry:
>   scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
>   shost->eh_action = &done;

I don't know if Hannes will like this, but I don't think it is right.  
This is not the retry pathway that's causing problems; that pathway 
begins where scmd_eh_abort_handler() calls scsi_queue_insert().

Now, that call is already guarded by this test:

if (... && (++scmd->retries <= scmd->allowed)) {

which would seem to prevent the infinite loop that Hannes was concerned 
about.  So maybe the 1/3 patch posted a couple of days ago is the best 
solution after all.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target/rd: T10-Dif: RAM disk is allocating more space than required.

2014-03-31 Thread Nicholas A. Bellinger

Hi Quinn & Co,

On Mon, 2014-03-31 at 16:15 +, Quinn Tran wrote:
> On 3/28/14 5:22 PM, "sagi grimberg"  wrote:
> 
> >On 3/29/2014 2:05 AM, Quinn Tran wrote:
> >> Ram disk is allocating 8x more space than required for diff data.
> >> For large RAM disk test, there is small potential for memory
> >> starvation.
> >>
> >> Signed-off-by: Nicholas Bellinger 
> >> Signed-off-by: Giridhar Malavali 
> >> ---
> >>   drivers/target/target_core_rd.c | 6 +-
> >>   1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/target/target_core_rd.c
> >>b/drivers/target/target_core_rd.c
> >> index 01dda0b..6df177a 100644
> >> --- a/drivers/target/target_core_rd.c
> >> +++ b/drivers/target/target_core_rd.c
> >> @@ -253,7 +253,11 @@ static int rd_build_prot_space(struct rd_dev
> >>*rd_dev, int prot_length)
> >>  if (rd_dev->rd_flags & RDF_NULLIO)
> >>  return 0;
> >>
> >> -total_sg_needed = rd_dev->rd_page_count / prot_length;
> >> +/* prot_length=8byte dif data
> >> + * tot sg needed = rd_page_count * (PGSZ/512) * (prot_length/PGSZ) +
> >>pad
> >> + * PGSZ canceled each other.
> >> + */
> >> +total_sg_needed = (rd_dev->rd_page_count * prot_length / 512) +1;
> >
> >You probably will want to use block_size instead of hard-coding 512.
> >Other then that this makes sense.
> 
> QT> agreed
> 

Applying the following updated patch to target-pending/for-next, along
with Sagi's comment wrt to block_size.

Also adding your missing Signed-off-by.  Please make sure to include
these in future patches.  ;)

Thanks!

--nab

commit 435b88ba4a38adc39842957610b27a8d0fb4584b
Author: Quinn Tran 
Date:   Fri Mar 28 19:05:27 2014 -0400

target/rd: T10-Dif: RAM disk is allocating more space than required.

Ram disk is allocating 8x more space than required for diff data.
For large RAM disk test, there is small potential for memory
starvation.

(Use block_size when calculating total_sg_needed - sagi + nab)

Signed-off-by: Giridhar Malavali 
Signed-off-by: Quinn Tran 
Cc:  #3.14+
Signed-off-by: Nicholas Bellinger 

diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
index 66a5aba..b920db3 100644
--- a/drivers/target/target_core_rd.c
+++ b/drivers/target/target_core_rd.c
@@ -242,7 +242,7 @@ static void rd_release_prot_space(struct rd_dev *rd_dev)
rd_dev->sg_prot_count = 0;
 }
 
-static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length)
+static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length, int 
block_size)
 {
struct rd_dev_sg_table *sg_table;
u32 total_sg_needed, sg_tables;
@@ -252,8 +252,13 @@ static int rd_build_prot_space(struct rd_dev *rd_dev, int 
prot_length)
 
if (rd_dev->rd_flags & RDF_NULLIO)
return 0;
-
-   total_sg_needed = rd_dev->rd_page_count / prot_length;
+   /*
+* prot_length=8byte dif data
+* tot sg needed = rd_page_count * (PGSZ/block_size) *
+* (prot_length/block_size) + pad
+* PGSZ canceled each other.
+*/
+   total_sg_needed = (rd_dev->rd_page_count * prot_length / block_size) + 
1;
 
sg_tables = (total_sg_needed / max_sg_per_table) + 1;
 
@@ -606,7 +611,8 @@ static int rd_init_prot(struct se_device *dev)
 if (!dev->dev_attrib.pi_prot_type)
return 0;
 
-   return rd_build_prot_space(rd_dev, dev->prot_length);
+   return rd_build_prot_space(rd_dev, dev->prot_length,
+  dev->dev_attrib.block_size);
 }
 
 static void rd_free_prot(struct se_device *dev)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-03-31 Thread Nicholas A. Bellinger

On Sat, 2014-03-29 at 04:24 +0300, sagi grimberg wrote:
> On 3/29/2014 3:53 AM, Quinn Tran wrote:
> > +
> > +if (dev->dev_attrib.pi_prot_type) {
> > +uint32_t cap[] = { 0,
> > +   TARGET_DIF_TYPE1_PROTECTION,
> > +   TARGET_DIF_TYPE2_PROTECTION,
> > +   TARGET_DIF_TYPE3_PROTECTION};
> > +uint32_t pt_bits = cap[dev->dev_attrib.pi_prot_type];
> > +pt_bits &= se_tpg->fabric_sup_prot_type;
> >> At what point should the fabric fill that? it can vary between portals
> >> right?
> > QT> Yes, protection mode can vary between portals. When user select the
> > physical function (via fabric_make_tpg), you know the specific portal
> > based on user input and its capability. This is where Qlogic register its
> > capabilities based on specific hardware.
> >
> >
> >> I would prefer to do that as a function pointer to explicitly ask the
> >> fabric it's support.
> > QT> is it still require with previous answer ?
> >
> 
> Well, I think it is nicer to explicitly ask the fabric at that point 
> instead of checking what it previously set.
> 

I'm currently working on a series that explicitly queries the fabric for
what PI modes are supported, as per our LSF discussion.

> By the way, I think this patch breaks existing iSER support as iSER 
> doesn't set these bits.
> Thats why I think it would be a good idea to *explicitly* ask.



> 
> >
> >>> +pr_err("dev[%p]: DIF protection mismatch with fabric 
> >>> "
> >>> +"(%s). Transport capability bits[0x%x]\n",
> >>> +dev, 
> >>> se_tpg->se_tpg_wwn->wwn_group.cg_item.ci_name,
> >>> +se_tpg->fabric_sup_prot_type);
> >>> +return -EFAULT;
> >> Didn't we agree that this is allowed and the target core should
> >> compensate on the lack fabric support?
> >  My recollection was that it's not recommended to have different
> > portals with different supported feature.
> 
> Well we seem to remember different things...
> Anyway I think it is better to compensate that in backstore/target-core 
> level, that would be better
> than silently turn off protection. Martin? Nic? your takes?
> 

At the BoF we concluded that when a backend has PI enabled, but the
fabric does not support PI, that target-core should strip off the
INQUIRY + other control bits that normally indicate protection, but only
on that particular path (eg: session).

This would allow different iSCSI network portals to set a se_session bit
at login time in order to indicate if/when protection is supported at
the fabric nexus level.

If it wasn't for iscsi-target / iser-target sharing the same endpoint
across different fabrics, the PI information could simply be queried on
a per /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT endpoint basis, but
alas it's not that simple..

> Also I don't know what rats are hiding here if the backstore is handling 
> IOs in this time...
> What about cleaning up all the protection resources the backstore driver 
> might be managing?
> 
> >   Meaning a SCSI Write without Dif
> > down a none-T10PI portal update the data.  The Guard on the disk is now
> > mismatch with the data. A SCSI Read down a T10PI path will run into a
> > Guard failure.
> >
> > By introducing this option, Disk vendor require additional logic to
> > compensate for this. I think it's cheaper to have supported hardware
> > rather than support nightmare.
> >
> >>> +}
> >>> +}
> >>> +
> >>>   if (lun->lun_se_dev !=  NULL) {
> >>>   pr_err("Port Symlink already exists\n");
> >>>   return -EEXIST;
> >>> diff --git a/drivers/target/target_core_tpg.c
> >>> b/drivers/target/target_core_tpg.c
> >>> index c036595..9279971 100644
> >>> --- a/drivers/target/target_core_tpg.c
> >>> +++ b/drivers/target/target_core_tpg.c
> >>> @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag(
> >>>}
> >>>EXPORT_SYMBOL(core_tpg_set_initiator_node_tag);
> >>>
> >>> +void core_tpg_set_fabric_t10dif(
> >>> +struct se_portal_group *tpg,
> >>> +uint32_t fabric_t10dif_force_on)
> >>> +{
> >>> +tpg->fabric_t10dif_force_on = fabric_t10dif_force_on;
> >>> +}
> >>> +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif);
> >>> +
> >> Is there a user for this function in this patch?
> > QT> I'm on the fence with this piece.  Just thinking of a case where DIX
> > is not available for initiator side, but user wants to turn on protection
> > at the link layer.  Our test folks would like to cover this case in the
> > future.
> 
> Not sure I understand. Initiator will send the target data+protection 
> (DIX disabled HBA does INSERT), why does this help?
> Why should the target fabric care who generated the protection (OS or HBA)?
> 

So this is the case where the fabric is responsible for doing the WRITE
INSERT + READ_STRIP (which could be in hardware or software), but the
INQUIRY

Re: [PATCH 2/4] tcm_qla2xxx: T10-Dif set harware capability

2014-03-31 Thread Nicholas A. Bellinger

 On Mon, 2014-03-31 at 15:38 +, Quinn Tran wrote:
> On 3/28/14 5:12 PM, "sagi grimberg"  wrote:
> 
> >On 3/29/2014 2:05 AM, Quinn Tran wrote:
> >> Set Protection Type(1,2,3) capabilities, Guarg type (CRC/IPchksm)
> >> capabilities bits to let TCM core knows of HW/fabric capabilities.
> >>
> >> Signed-off-by: Nicholas Bellinger 
> >> Signed-off-by: Giridhar Malavali 
> >> ---
> >>   drivers/scsi/qla2xxx/tcm_qla2xxx.c | 23 +++
> >>   drivers/scsi/qla2xxx/tcm_qla2xxx.h |  1 +
> >>   2 files changed, 24 insertions(+)
> >>
> >> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >>b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> index b23a0ff..4d93081 100644
> >> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c



> >> +
> >> +if (scsi_host_get_prot(lport->qla_vha->host)) {
> >> +tpg->se_tpg.fabric_sup_prot_type = (TARGET_DIF_TYPE0_PROT|
> >> +TARGET_DIF_TYPE1_PROT|TARGET_DIF_TYPE2_PROT|
> >> +TARGET_DIF_TYPE3_PROT);
> >> +
> >> +tpg->se_tpg.fabric_sup_guard_type = TARGET_GUARD_CRC|
> >> +TARGET_GUARD_IP;
> >> +}
> >>
> >>  ret = core_tpg_register(&tcm_qla2xxx_fabric_configfs->tf_ops, wwn,
> >>  &tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
> >> @@ -1127,6 +1147,8 @@ static ssize_t tcm_qla2xxx_npiv_tpg_store_enable(
> >>  qlt_stop_phase1(vha->vha_tgt.qla_tgt);
> >>  }
> >>
> >> +core_tpg_set_fabric_t10dif(se_tpg, tpg->tpg_attrib.t10dif_force_on);
> >> +
> >
> >Any way we can get this logic to be shared also with iscsi, srp, etc...
> >all fabrics should
> >be set with t10dif right? so I would imagine it would be better to
> >centralize it right?
> 
> QT> Not sure how you want this logic to be shared.  This patch is specific
> to Qlogic driver registering its capabilities.
> 

I think that Sagi was referring to a target_core_fabric_ops callback to
query protection information from the fabric..

As mentioned in the last response, this would work just fine on
a /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT context basis, if it
wasn't for iscsi-target / iser-target sharing the same endpoint while
still allowing different protection modes.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-03-31 Thread Nicholas A. Bellinger

On Mon, 2014-03-31 at 17:53 +, Quinn Tran wrote:
> On 3/28/14 6:24 PM, "sagi grimberg"  wrote:
> 
> >On 3/29/2014 3:53 AM, Quinn Tran wrote:



>  +}
>  +}
>  +
>    if (lun->lun_se_dev !=  NULL) {
>    pr_err("Port Symlink already exists\n");
>    return -EEXIST;
>  diff --git a/drivers/target/target_core_tpg.c
>  b/drivers/target/target_core_tpg.c
>  index c036595..9279971 100644
>  --- a/drivers/target/target_core_tpg.c
>  +++ b/drivers/target/target_core_tpg.c
>  @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag(
> }
> EXPORT_SYMBOL(core_tpg_set_initiator_node_tag);
> 
>  +void core_tpg_set_fabric_t10dif(
>  +struct se_portal_group *tpg,
>  +uint32_t fabric_t10dif_force_on)
>  +{
>  +tpg->fabric_t10dif_force_on = fabric_t10dif_force_on;
>  +}
>  +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif);
>  +
> >>> Is there a user for this function in this patch?
> >> QT> I'm on the fence with this piece.  Just thinking of a case where DIX
> >> is not available for initiator side, but user wants to turn on
> >>protection
> >> at the link layer.  Our test folks would like to cover this case in the
> >> future.
> >
> >Not sure I understand. Initiator will send the target data+protection
> >(DIX disabled HBA does INSERT), why does this help?
> >Why should the target fabric care who generated the protection (OS or
> >HBA)?
> 
> QT> Sorry for the confusion.  The case I'm trying to get at is "Data In
> Insert" and "Data out strip".In this case, the protection starts from
> front end target adapter to the back end storage.  In revisit your
> previous patch, this case is not covered.
> 
> 



So for the TARGET_PROT_DIN_INSERT + TARGET_PROT_DOUT_STRIP cases, the
target will need to expose INQUIRY PROTECT=1 + other PI related control
bits when the fabric supports these modes, regardless of what PI is
supported on the backend device.

Keeping this configuration in mind as well while coding up the
aforementioned series.  ;)

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

2014-03-31 Thread Hannes Reinecke

On 03/31/2014 05:03 PM, James Bottomley wrote:
> [lets split the thread]
> On Mon, 2014-03-31 at 16:37 +0200, Hannes Reinecke wrote:
>> On 03/31/2014 03:33 PM, Alan Stern wrote:
>>> On Mon, 31 Mar 2014, Hannes Reinecke wrote:
 On 03/28/2014 08:29 PM, Alan Stern wrote:
> On Fri, 28 Mar 2014, James Bottomley wrote:
> Maybe scmd_eh_abort_handler() should check the flag before doing
> anything.  Is there any sort of sychronization to prevent the same
> incarnation of a command from being aborted twice (or by two different
> threads at the same time)?  If there is, it isn't obvious.
>
 See above. scsi_times_out() will only ever called once.
 What can happen, though, is that _theoretically_ the LLDD might
 decide to call ->done() on a timed out command when
 scsi_eh_abort_handler() is still pending.
>>>
>>> That's okay.  We can expect the LLDD to have sufficient locking to
>>> handle that sort of thing without confusion (usb-storage does, for
>>> example).
>>>
> (Also, what's going on at the start of scsi_abort_command()?  Contrary
> to what one might expect, the first part of the function _cancels_ a
> scheduled abort.  And it does so without clearing the
> SCSI_EH_ABORT_SCHEDULED flag.)
>
 The original idea was this:

 SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
 Point is, any command abort is only ever send for a timed-out
 command. And the main problem for a timed-out command is that we
 simply _do not_ know what happened for that command. So _if_ a
 command timed out, _and_ we've send an abort, _and_ the command
 times out _again_ we'll be running into an endless loop between
 timeout and aborting, and never returning the command at all.

 So to prevent this we should set a marker on that command telling it
 to _not_ try to abort the command again.
>>>
>>> I disagree.  We _have_ to abort the command again -- how else can we
>>> stop a running command?  To prevent the loop you described, we should
>>> avoid _retrying_ the command after it is aborted the second time.
>>>
>> The actual question is whether it's worth aborting the same command
>> a second time.
>> In principle any reset (like LUN reset etc) should clear the
>> command, too.
>> And the EH abort functionality is geared around this.
>> If, for some reason, the transport layer / device driver
>> requires a command abort to be send then sure, we need
>> to accommodate for that.
> 
> We already discussed this (that was my first response too).  USB needs
> all outstanding commands aborted before proceeding to reset.  I'm
> starting to think the actual way to fix this is to reset the abort
> scheduled only if we send something else, so this might be a better fix.
> 
> It doesn't matter if we finish a command with abort scheduled because
> the command then gets freed and the flags cleaned.
> 
> We can take our time with this because the other two patches, which I
> can send separately fix the current deadlock (we no longer send an
> unaborted request sense before the reset), and it can go via rc fixes.
> 
Yes, agreed.

The USB case seems to be a bit more tricky, and at least I need some
more time to fully understand the details and implications of this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

38 matches

Mail list logo