[GIT PULL -v3] target fixes for v3.12-rc0 (was v3.11)

2013-09-02 Thread Nicholas A. Bellinger
Hi Linus,

Now with DKIM + SPF in place on linux-iscsi.org (thanks for the help Ted
& Co), here is another attempt at the target fixes PULL request for
v3.11 that was missed.

Given that the last PULL emails hit your spam folder, if/when you get
this please give me a quick response to confirm.

Please go ahead and pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master

The first patch is to address a long standing issue where INQUIRY vendor
+ model response data was not correctly padded with ASCII spaces,
causing MSFT and Falconstor multipath stacks to not function with our
LUNs.

The second -> forth patches are additional iscsi-target regression fixes
for the post >= v3.10 iser-target changes.  The second and third are
failure cases that have appeared during further testing, and the forth
is only reproducible with malformed NOP packets.

The fifth patch is a v3.11 specific regression caused by a recent
optimization that showed up during WRITE I/O failure testing.

I'll be sending Patch #1 and Patch #5 to Greg-KH separately for stable.

Thank you,

--nab

Nicholas Bellinger (5):
  target: Fix trailing ASCII space usage in INQUIRY vendor+model
  iscsi-target: Fix ImmediateData=Yes failure regression in >= v3.10
  iscsi-target: Fix iscsit_transport reference leak during NP thread
reset
  iscsi-target: Fix potential NULL pointer in solicited NOPOUT reject
  target: Fix se_cmd->state_list leak regression during WRITE failure

 drivers/target/iscsi/iscsi_target.c   |   17 ++---
 drivers/target/iscsi/iscsi_target_login.c |9 -
 drivers/target/target_core_spc.c  |9 ++---
 drivers/target/target_core_transport.c|   11 +++
 4 files changed, 31 insertions(+), 15 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 60758] module scsi_wait_scan not found kernel panic on boot

2013-09-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60758

--- Comment #22 from Jeff Zhou  ---
(In reply to zakrzewskim from comment #21)
> Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst

Yes then we see the scsi_scan_wait is not in 3.10.4 either, otherwise there
will be a line :"drivers/scsi/scsi_wait_scan.ko".

The rc.sysinit file in your system might need adjustment to remove the
reference to scsi_wait_scan module.

But that might not be the root cause, since 3.10.4 without scsi_wait_scan works
fine under this script.

There could be a regression between 3.10.4 to 3.10.5 cause your system hang,
might be or might not be scsi problem.

To find the root cause and fix it, probably need to roll back between 3.10.5 to
3.10.4, build and test the kernel for booting, git bisect could be helpful.
Then we can check the details to see why the commit crashing the specific
machine.

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ibmvscsi: Fix little endian issues

2013-09-02 Thread Anton Blanchard

The hypervisor is big endian, so little endian kernel builds need
to byteswap.

Signed-off-by: Anton Blanchard 
---

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index d0fa4b6..62bdbc9 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -241,7 +241,7 @@ static void gather_partition_info(void)
struct device_node *rootdn;
 
const char *ppartition_name;
-   const unsigned int *p_number_ptr;
+   const __be32 *p_number_ptr;
 
/* Retrieve information about this partition */
rootdn = of_find_node_by_path("/");
@@ -255,7 +255,7 @@ static void gather_partition_info(void)
sizeof(partition_name));
p_number_ptr = of_get_property(rootdn, "ibm,partition-no", NULL);
if (p_number_ptr)
-   partition_number = *p_number_ptr;
+   partition_number = of_read_number(p_number_ptr, 1);
of_node_put(rootdn);
 }
 
@@ -270,10 +270,11 @@ static void set_adapter_info(struct ibmvscsi_host_data 
*hostdata)
strncpy(hostdata->madapter_info.partition_name, partition_name,
sizeof(hostdata->madapter_info.partition_name));
 
-   hostdata->madapter_info.partition_number = partition_number;
+   hostdata->madapter_info.partition_number =
+   cpu_to_be32(partition_number);
 
-   hostdata->madapter_info.mad_version = 1;
-   hostdata->madapter_info.os_type = 2;
+   hostdata->madapter_info.mad_version = cpu_to_be32(1);
+   hostdata->madapter_info.os_type = cpu_to_be32(2);
 }
 
 /**
@@ -464,9 +465,9 @@ static int initialize_event_pool(struct event_pool *pool,
memset(&evt->crq, 0x00, sizeof(evt->crq));
atomic_set(&evt->free, 1);
evt->crq.valid = 0x80;
-   evt->crq.IU_length = sizeof(*evt->xfer_iu);
-   evt->crq.IU_data_ptr = pool->iu_token + 
-   sizeof(*evt->xfer_iu) * i;
+   evt->crq.IU_length = cpu_to_be16(sizeof(*evt->xfer_iu));
+   evt->crq.IU_data_ptr = cpu_to_be64(pool->iu_token + 
+   sizeof(*evt->xfer_iu) * i);
evt->xfer_iu = pool->iu_storage + i;
evt->hostdata = hostdata;
evt->ext_list = NULL;
@@ -588,7 +589,7 @@ static void init_event_struct(struct srp_event_struct 
*evt_struct,
evt_struct->cmnd_done = NULL;
evt_struct->sync_srp = NULL;
evt_struct->crq.format = format;
-   evt_struct->crq.timeout = timeout;
+   evt_struct->crq.timeout = cpu_to_be16(timeout);
evt_struct->done = done;
 }
 
@@ -659,8 +660,8 @@ static int map_sg_list(struct scsi_cmnd *cmd, int nseg,
 
scsi_for_each_sg(cmd, sg, nseg, i) {
struct srp_direct_buf *descr = md + i;
-   descr->va = sg_dma_address(sg);
-   descr->len = sg_dma_len(sg);
+   descr->va = cpu_to_be64(sg_dma_address(sg));
+   descr->len = cpu_to_be32(sg_dma_len(sg));
descr->key = 0;
total_length += sg_dma_len(sg);
}
@@ -703,13 +704,14 @@ static int map_sg_data(struct scsi_cmnd *cmd,
}
 
indirect->table_desc.va = 0;
-   indirect->table_desc.len = sg_mapped * sizeof(struct srp_direct_buf);
+   indirect->table_desc.len = cpu_to_be32(sg_mapped *
+  sizeof(struct srp_direct_buf));
indirect->table_desc.key = 0;
 
if (sg_mapped <= MAX_INDIRECT_BUFS) {
total_length = map_sg_list(cmd, sg_mapped,
   &indirect->desc_list[0]);
-   indirect->len = total_length;
+   indirect->len = cpu_to_be32(total_length);
return 1;
}
 
@@ -731,9 +733,10 @@ static int map_sg_data(struct scsi_cmnd *cmd,
 
total_length = map_sg_list(cmd, sg_mapped, evt_struct->ext_list);
 
-   indirect->len = total_length;
-   indirect->table_desc.va = evt_struct->ext_list_token;
-   indirect->table_desc.len = sg_mapped * sizeof(indirect->desc_list[0]);
+   indirect->len = cpu_to_be32(total_length);
+   indirect->table_desc.va = cpu_to_be64(evt_struct->ext_list_token);
+   indirect->table_desc.len = cpu_to_be32(sg_mapped *
+  sizeof(indirect->desc_list[0]));
memcpy(indirect->desc_list, evt_struct->ext_list,
   MAX_INDIRECT_BUFS * sizeof(struct srp_direct_buf));
return 1;
@@ -849,7 +852,7 @@ static int ibmvscsi_send_srp_event(struct srp_event_struct 
*evt_struct,
   struct ibmvscsi_host_data *hostdata,
   unsigned long timeout)
 {
-   u64 *crq_as_u64 = (u64 *) &evt_struct->crq;
+   __be64 *crq_as_u64 = (__be64 *)&evt_struct->crq;
int request_status = 0;
int rc;
int

[Bug 60758] module scsi_wait_scan not found kernel panic on boot

2013-09-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60758

--- Comment #21 from zakrzews...@wp.pl ---
Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 60758] module scsi_wait_scan not found kernel panic on boot

2013-09-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60758

--- Comment #20 from Jeff Zhou  ---
Yes, there could be extra bugs.

Since this bug is about "scsi_wait_scan not found" in v3.10.5 and above, we can
try to work on this one first.

As suggested by Alan, the lsinitrd is available, it can be used to list out the
modules of a initramfs file.
Would you like to run lsinitrd for a working version (v.3.10.4-1) and the first
version with issue (v.3.10.5-1) to show the modules in the initramfs file?

lsinitrd initrd_v.xxx.img > v.xxx.lst

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] Disallow changing the device state via sysfs into "deleted"

2013-09-02 Thread Christoph Hellwig
On Mon, Sep 02, 2013 at 09:12:15PM +0200, Bart Van Assche wrote:
> Hmm ... I think the second option would require to add an additional
> case in sdev_set_state() to avoid a compiler warning. This is
> something James objected against (on June 24, see also
> http://thread.gmane.org/gmane.linux.scsi/82572/focus=82576).

In that case the cast version seems fine to me.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] Disallow changing the device state via sysfs into "deleted"

2013-09-02 Thread Christoph Hellwig
On Mon, Sep 02, 2013 at 08:59:30PM +0200, Bart Van Assche wrote:
> Do you agree with changing switch (state) into switch ((int)state) ?
> Without that additional change gcc reports the following warning:
> 
> drivers/scsi/scsi_sysfs.c: In function ?store_state_field?:
> drivers/scsi/scsi_sysfs.c:640:2: warning: case value ?0? not in
> enumerated type ?enum scsi_device_state? [-Wswitch]

Either that, or add a SDEV_INVALID_STATE = 0 value to the enum.  That
variant seems a little more elegant.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] Disallow changing the device state via sysfs into "deleted"

2013-09-02 Thread Bart Van Assche

On 09/02/13 21:06, Christoph Hellwig wrote:

On Mon, Sep 02, 2013 at 08:59:30PM +0200, Bart Van Assche wrote:

Do you agree with changing switch (state) into switch ((int)state) ?
Without that additional change gcc reports the following warning:

drivers/scsi/scsi_sysfs.c: In function ?store_state_field?:
drivers/scsi/scsi_sysfs.c:640:2: warning: case value ?0? not in
enumerated type ?enum scsi_device_state? [-Wswitch]


Either that, or add a SDEV_INVALID_STATE = 0 value to the enum.  That
variant seems a little more elegant.


Hmm ... I think the second option would require to add an additional 
case in sdev_set_state() to avoid a compiler warning. This is something 
James objected against (on June 24, see also 
http://thread.gmane.org/gmane.linux.scsi/82572/focus=82576).


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] Disallow changing the device state via sysfs into "deleted"

2013-09-02 Thread Bart Van Assche

On 09/01/13 18:49, Christoph Hellwig wrote:

On Tue, Aug 20, 2013 at 02:08:39PM +0200, Bart Van Assche wrote:

Changing the state of a SCSI device via sysfs into "cancel",
"cancel-offline" or "deleted" prevents removal of these devices by
scsi_remove_host(). Hence do not allow this.


Looks good modulo a minor style nipick below.

Reviewed-by: Christoph Hellwig 


+   if (state == 0 || state == SDEV_CANCEL ||
+   state == SDEV_CANCEL_OFFLINE || state == SDEV_DEL ||
+   scsi_device_set_state(sdev, state) != 0)
return -EINVAL;


Doing a switch on the state would be a lot more readable here:

switch (state) {
case 0:
case SDEV_CANCEL:
case SDEV_CANCEL_OFFLINE:
case SDEV_DEL:
ret = -EINVAL;
break;
default:
ret = scsi_device_set_state(sdev, state);
}

if (ret)
return ret;

return count;
  }


Do you agree with changing switch (state) into switch ((int)state) ? 
Without that additional change gcc reports the following warning:


drivers/scsi/scsi_sysfs.c: In function ‘store_state_field’:
drivers/scsi/scsi_sysfs.c:640:2: warning: case value ‘0’ not in 
enumerated type ‘enum scsi_device_state’ [-Wswitch]


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-09-02 Thread Bart Van Assche

On 09/02/13 15:11, Hannes Reinecke wrote:

On 09/02/2013 02:45 PM, Bart Van Assche wrote:

On 09/02/13 09:12, Hannes Reinecke wrote:

@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
   list_del_init(&cmd->list);
   spin_unlock_irqrestore(&cmd->device->list_lock, flags);

+cancel_delayed_work(&cmd->abort_work);
+
   __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
   }
   EXPORT_SYMBOL(scsi_put_command);


Is this approach safe ? Is it e.g. possible that the abort work
starts just before the cancel_delayed_work() call and continues to
run after scsi_put_command() has finished ? In
drivers/scsi/libfc/fc_exch.c a similar issue is solved by holding an
additional reference as long as delayed work (fc_exch.timeout_work)
is queued.


I have been thinking of this, and in fact my original approach had
'cancel_delayed_work_sync' here. However, this didn't work as
scsi_put_command() might end up being called from an softirq context.


From my understanding the workqueue stuff guarantees that either

a) the workqueue item is still queued
-> cancel_delayed_work will be in fact synchronous, as it'll
   cancel queue item from the queue
b) the workqueue item is running
-> cancel_delayed_work is essentially a no-op, as the function
   is running and will be terminated anyway.

IE from the API perspective the transition between 'queued' and
'running' is atomic, and no in-between states are visible.

So case a) is obviously safe, and for case b) the abort function is
already running. But then the abort function has been called from
the block timeout handler, which did a blk_mark_rq_complete() prior
to calling the handler. So any completion coming in after that will
be ignored, and scsi_put_command() won't be called.

Hence we should be safe here.


Hello Hannes,

I think that you have just explained why that cancel_delayed_work() call 
in scsi_put_command() is not necessary at all ... In case of normal 
command completion, scsi_put_command() will be reached with no 
scmd_eh_abort_handler() call queued since there was no timeout. If the 
command timed out scsi_put_command() will be called after the abort_work 
has already been dequeued since work items are dequeued before being 
executed.


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hallo..........

2013-09-02 Thread Mutoni Williams

Hello,How are you,I hope you're well,my name is Mutoni,I'm medium height and 
fair in complexion,i love,caring and I decided to contact you.I really want to 
have a good relationship with you.Next I have a special something I want to 
discuss with you,and tell you more about my self.Hope hear from you soon.I know 
age will not be a bearier to our relationship,I will give my best. 
Yours,Miss.Mutoni.
 
Hallo,Wie geht es dir,ich hoffe,du bist gut,mein Name ist Mutoni, ich mittlerer 
Höhe und Messe in Teint bin,ich liebe,fürsorgliche und ich beschlossen,you.I 
kontaktieren wirklich wollen,eine gute Beziehung mit Ihnen haben.Weiter habe 
ich eine besondere etwas möchte ich mit Ihnen zu besprechen haben,und erfahren 
Sie mehr über meine self.Hope von Ihnen zu hören soon.i wissen Alter wird kein 
bearier,unsere Beziehung zu sein, werde ich mein Bestes geben.Yours,Miss.Mutoni.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 60758] module scsi_wait_scan not found kernel panic on boot

2013-09-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60758

--- Comment #19 from zakrzews...@wp.pl ---
1. All of them are separate. I just install kernel-ml via yum.

2. I can only show such log from kernel 3.10.4-1. I don't have KVM to see
what's going on. Since this is a production machine inside datacenter I can
only rent KVM for 2 hours.

3. I found another bug - server is freezing and load is getting higher than
100: 

BUG: unable to handle kernel paging request at 01060032
IP: [] SyS_epoll_ctl+0x145/0x420
PGD 0
Oops:  [#1] SMP
Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
xt_iprange iptable_filter ip_tables netconsole configfs nct6775 hwmon_vid ipv6
cpufreq_ondemand ppdev iTCO_wdt iTCO_vendor_support shpchp coretemp hwmon
acpi_cpufreq freq_table mperf kvm_intel kvm crc32_pclmul crc32c_intel
ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
aes_x86_64 microcode pcspkr i2c_i801 parport_pc parport r8169 mii sg lpc_ich
xhci_hcd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq
snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache
raid1 sd_mod crc_t10dif mxm_wmi video ahci libahci wmi dm_mirror dm_region_hash
dm_log dm_mod
CPU: 0 PID: 3133 Comm: nginx Not tainted 3.10.4-1.el6.elrepo.x86_64 #1
Hardware name: MSI MS-7816/H87-G43 (MS-7816), BIOS V2.3 06/07/2013
task: 8807f18c0ac0 ti: 880753374000 task.ti: 880753374000
RIP: 0010:[]  [] SyS_epoll_ctl+0x145/0x420
RSP: 0018:880753375f18  EFLAGS: 00010202
RAX: 01060002 RBX: 8807f23125c0 RCX: 0001
RDX:  RSI: 880650213bc0 RDI: 88076704b808
RBP: 880753375f78 R08: 0002 R09: 0101010101010101
R10: 7fff52ea2220 R11: 0202 R12: 880773f15c80
R13: 0001 R14: 0045 R15: 88076704b800
FS:  7f8451c707c0() GS:88081ea0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01060032 CR3: 000771b3b000 CR4: 001407f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Stack:
 8807f445c648 000152ea22b0 88076704b808 fff7810dfe46
 11b814408001 8807 01838b20 0001
 11b81440 11bd9288 11b6f8c0 7fff52ea22b0
Call Trace:
 [] system_call_fastpath+0x16/0x1b
Code: 00 00 c7 45 ac 00 00 00 00 83 f8 01 0f 86 4d 01 00 00 49 8d 47 08 48 89
c7 48 89 45 b0 e8 44 4c 41 00 49 8b 47 70 48 85 c0 74 1f <48> 3b 58 30 48 89 c6
77 0d 72 68 44 89 f2 2b 50 38 83 fa 00 7e
RIP  [] SyS_epoll_ctl+0x145/0x420
 RSP 
CR2: 01060032
---[ end trace 3c47cb214b7f3743 ]---

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd behavior of a "SAS-2" backplane with SGPIO commands

2013-09-02 Thread Pasi Kärkkäinen
On Sun, Sep 01, 2013 at 08:13:02PM +0300, Pasi Kärkkäinen wrote:
> On Fri, Aug 30, 2013 at 05:38:04PM -0400, Rich wrote:
> >Apparently, only about 4 months.
> >P17 firmware is out.
> >Tested on a card which was demonstrating the incorrect behavior before, 
> > of
> >model 9201-16i.
> >It does appear to contain the fixes for this problem.
> 
> This is great news, finally! 
> Thanks for testing and reporting!
>

Hmm, I checked the P17 firmware changelog PDFs for LSI 9211-8i, 9207-8i and 
9201-16i,
and I can't see anything about this issue being fixed.. 

But if you say it's now fixed, then I guess it means LSI doesn't list every 
change in the changelog..

-- Pasi
 
> 
> >- Rich
> > 
> >On Mon, Apr 22, 2013 at 8:39 PM, Pasi Kärkkäinen <[1]pa...@iki.fi> wrote:
> > 
> >  On Mon, Apr 22, 2013 at 08:28:58PM -0400, Rich wrote:
> >  >I'm told that the fix won't actually be in until P17, because it
> >  involved
> >  >touching a large number of codepaths, but that it will be in P17.
> >  >
> > 
> >  That's unfortunate.. so another 5-6 months.
> > 
> >  Thanks for the info!
> > 
> >  -- Pasi
> > 
> >  >- Rich
> >  >
> >  >On Mon, Apr 22, 2013 at 8:19 PM, Pasi KÃ*rkkÃ*inen
> >  <[1][2]pa...@iki.fi>
> >  >wrote:
> >  >
> >  >  On Wed, Dec 19, 2012 at 09:41:42PM +0200, Pasi KÃ*rkkÃ*inen
> >  wrote:
> >  >  > On Wed, Dec 19, 2012 at 02:35:02PM -0500, Rich wrote:
> >  >  > > Â  Â Nope, I'm wrong.
> >  >  >
> >  >  > :(
> >  >  >
> >  >  > > Â  Â I flashed P15 on a machine, and the same behavior
> >  persisted,
> >  >  right up to
> >  >  > > Â  Â and including the "groups of 3 devices light up when I
> >  do my
> >  >  own
> >  >  > > Â  Â smp_write_gpio".
> >  >  > > Â  Â Whereas if I flash the experimental blob I was handed 
> > by
> >  >  support to try
> >  >  > > Â  Â and confirm they had resolved the issue, it does the
> >  correct
> >  >  thing.
> >  >  > > Â  Â I wonder why this change didn't make it into P15.
> >  >  >
> >  >  > Hmm.. if you have open contact channel with LSI support please
> >  ask
> >  >  them..
> >  >  > it'd be nice to get the fix for wider audience..
> >  >  >
> >  >
> >  >  It seems LSI P16 firmware has been released.. in the FW 
> > changelog
> >  I can
> >  >  see at least this:
> >  >
> >  >  "Disable SGPIO Group ID support in Channel NVDATA XML's. This
> >  allows the
> >  >  use of manufacturing page 7 or default slot mapping setting for
> >  direct
> >  >  connected drive slots."
> >  >
> >  >  -- Pasi
> >  >
> >  >  >
> >  >  > > Â  Â - Rich
> >  >  > >
> >  >  > > Â  Â On Wed, Dec 19, 2012 at 2:00 PM, Rich
> >  >  <[1][2][3]rerc...@pha.jhu.edu> wrote:
> >  >  > >
> >  >  > > Â  Â  Â From the LSI P15 F/W release notes:
> >  >  > > Â  Â  Â SCGCQ00342805 (DFCT) Ã* - SlotStatus updates to SES
> >  managed
> >  >  Enclosure may
> >  >  > > Â  Â  Â update incorrect slots
> >  >  > > Â  Â  Â "Modified FW to use SES diag page 0Ah mapping only 
> > if
> >  SMP
> >  >  Discover
> >  >  > > Â  Â  Â DeviceSlotNum is used for Encl PhyÃ* Slot
> >  enumeration."
> >  >  > > Â  Â  Â SCGCQ00345867 (DFCT) Ã* - Channel NVDATA internal 
> > Phy
> >  >  reverse setting for
> >  >  > > Â  Â  Â 9207-4i4e 9207-8i 9217-4i4eÃ* 9217-8i and 9201-16i
> >  >  > > Â  Â  Â "ISSUE DESC:Ã* Internal PHY orders are reversed for
> >  the
> >  >  channel boards
> >  >  > > Â  Â  Â above."
> >  >  > > Â  Â  Â So I would submit this is likely fixed by this FW
> >  rev.
> >  >  > > Â  Â  Â - Rich
> >  >  > >
> >  >  > > Â  Â  Â On Fri, Dec 7, 2012 at 8:46 AM, Pasi
> >  KÃ*â*¬rkkÃ*â*¬inen
> >  >  <[2][3][4]pa...@iki.fi>
> >  >  > > Â  Â  Â wrote:
> >  >  > >
> >  >  > > Â  Â  Â  Â On Thu, Nov 01, 2012 at 11:55:25AM -0400, Rich
> >  wrote:
> >  >  > > Â  Â  Â  Â > On Sun, Oct 21, 2012 at 8:46 AM, Pasi
> >  >  KÃ*â*¬rkkÃ*â*¬inen <[3][4][5]pa...@iki.fi>
> >  >  > > Â  Â  Â  Â wrote:
> >  >  > > Â  Â  Â  Â > > On Mon, Sep 10, 2012 at 07:13:15PM +0300, 
> > Pasi
> >  >  KÃ*â*¬rkkÃ*â*¬inen wrote:
> >  >  > > Â  Â  Â  Â > >> On Mon, Sep 10, 2012 at 12:07:45PM -0400,
> >  Rich
> >  >  wrote:
> >  >  > > Â  Â  Â  Â > >> > On Mon, Sep 10, 2012 at 12:04 PM, Pasi
> >  >  KÃ*â*¬rkkÃ*â*¬inen
> >  >  > > Â  Â  Â  Â <[4][5][6]pa...@iki.fi> wrote:
> >  >  > > Â  Â  Â  Â > >> > > On Mon, Sep 10, 2012 at 06:01:42PM 
> > +0200,
> >  >  Emmanuel

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-09-02 Thread Hannes Reinecke
On 09/02/2013 02:45 PM, Bart Van Assche wrote:
> On 09/02/13 09:12, Hannes Reinecke wrote:
>> @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>>   list_del_init(&cmd->list);
>>   spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>>
>> +cancel_delayed_work(&cmd->abort_work);
>> +
>>   __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
>>   }
>>   EXPORT_SYMBOL(scsi_put_command);
> 
> Is this approach safe ? Is it e.g. possible that the abort work
> starts just before the cancel_delayed_work() call and continues to
> run after scsi_put_command() has finished ? In
> drivers/scsi/libfc/fc_exch.c a similar issue is solved by holding an
> additional reference as long as delayed work (fc_exch.timeout_work)
> is queued.
> 
I have been thinking of this, and in fact my original approach had
'cancel_delayed_work_sync' here. However, this didn't work as
scsi_put_command() might end up being called from an softirq context.

>From my understanding the workqueue stuff guarantees that either
a) the workqueue item is still queued
   -> cancel_delayed_work will be in fact synchronous, as it'll
  cancel queue item from the queue
b) the workqueue item is running
   -> cancel_delayed_work is essentially a no-op, as the function
  is running and will be terminated anyway.

IE from the API perspective the transition between 'queued' and
'running' is atomic, and no in-between states are visible.

So case a) is obviously safe, and for case b) the abort function is
already running. But then the abort function has been called from
the block timeout handler, which did a blk_mark_rq_complete() prior
to calling the handler. So any completion coming in after that will
be ignored, and scsi_put_command() won't be called.

Hence we should be safe here.

>> +void
>> +scmd_eh_abort_handler(struct work_struct *work)
>> +{
>> +struct scsi_cmnd *scmd =
>> +container_of(work, struct scsi_cmnd, abort_work.work);
>> +struct scsi_device *sdev = scmd->device;
>> +unsigned long flags;
>> +int rtn;
>> +
>> +spin_lock_irqsave(sdev->host->host_lock, flags);
>> +if (scsi_host_eh_past_deadline(sdev->host)) {
>> +spin_unlock_irqrestore(sdev->host->host_lock, flags);
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_INFO, scmd,
>> +"scmd %p eh timeout, not aborting\n", scmd));
>> +} else {
>> +spin_unlock_irqrestore(sdev->host->host_lock, flags);
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_INFO, scmd,
>> +"aborting command %p\n", scmd));
>> +rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
>> +if (rtn == SUCCESS) {
>> +scmd->result |= DID_TIME_OUT << 16;
>> +if (!scsi_noretry_cmd(scmd) &&
>> +(++scmd->retries <= scmd->allowed)) {
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_WARNING, scmd,
>> +"scmd %p retry "
>> +"aborted command\n", scmd));
>> +scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
>> +} else {
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_WARNING, scmd,
>> +"scmd %p finish "
>> +"aborted command\n", scmd));
>> +scsi_finish_command(scmd);
>> +}
>> +return;
>> +}
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_INFO, scmd,
>> +"scmd %p abort failed, rtn %d\n",
>> +scmd, rtn));
>> +}
>> +
>> +if (scsi_eh_scmd_add(scmd, 0)) {
>> +SCSI_LOG_ERROR_RECOVERY(3,
>> +scmd_printk(KERN_WARNING, scmd,
>> +"scmd %p terminate "
>> +"aborted command\n", scmd));
>> +scmd->result |= DID_TIME_OUT << 16;
>> +scsi_finish_command(scmd);
>> +}
>> +}
> 
> This patch adds several new calls to LLD EH handlers. Is it
> guaranteed that these will only be invoked before scsi_remove_host()
> has finished ? For more background information, see also "[PATCH]
> Make scsi_remove_host() wait until error handling finished"
> (http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).
> 
Well, that depends how scsi_remove_host() handles outstanding
commands. What happens if you call scsi_remove_host() and there is
still I/O in flight? I would assume that any HBA would have to kill
any outstanding I/O prior to calling scsi_remove_host() (FC most
certainly does this).
Which would mean that it'll have to wait for scsi_put_command() to
be called for all outstanding commands. And as scsi_put_command()
will be called only _after_ our routine runs (see the reasoning
above) we should be safe.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 7

Testing NPIV Feature with Qemu-KVM

2013-09-02 Thread cshastri


Hi All,

I am testing NPIV feature on upstream Qemu, I have configured the zone
and able to see the created vport on the storage array.

Since, I am learning on how to setup the NPIV, I haven't created the  
different zone for

the vport and the array, I just added in the existing zone.

Now, how do pass the LUN to the qemu, from Dr. Hannes Reineckei mail  
thread I got to know that lspci command on the host doesn't show the  
virtual HBA.


I didn't understand why there is limitation on that and if I specify
/usr/libexec/qemu-kvm -enable-kvm Fedora19 -m 3000 -smp 2 -net nic -net
\ user -vnc 127.0.0.1:0 -drive if=scsi,file=/dev/sdj

How do I make sure that qemu is using the virtual HBA or (vport)?

Thanks,
Shastri

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-09-02 Thread Bart Van Assche

On 09/02/13 09:12, Hannes Reinecke wrote:

@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(&cmd->list);
spin_unlock_irqrestore(&cmd->device->list_lock, flags);

+   cancel_delayed_work(&cmd->abort_work);
+
__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
  }
  EXPORT_SYMBOL(scsi_put_command);


Is this approach safe ? Is it e.g. possible that the abort work starts 
just before the cancel_delayed_work() call and continues to run after 
scsi_put_command() has finished ? In drivers/scsi/libfc/fc_exch.c a 
similar issue is solved by holding an additional reference as long as 
delayed work (fc_exch.timeout_work) is queued.



+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+   struct scsi_cmnd *scmd =
+   container_of(work, struct scsi_cmnd, abort_work.work);
+   struct scsi_device *sdev = scmd->device;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(sdev->host->host_lock, flags);
+   if (scsi_host_eh_past_deadline(sdev->host)) {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p eh timeout, not aborting\n", 
scmd));
+   } else {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "aborting command %p\n", scmd));
+   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+   if (rtn == SUCCESS) {
+   scmd->result |= DID_TIME_OUT << 16;
+   if (!scsi_noretry_cmd(scmd) &&
+   (++scmd->retries <= scmd->allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p retry "
+   "aborted command\n", scmd));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p finish "
+   "aborted command\n", scmd));
+   scsi_finish_command(scmd);
+   }
+   return;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p abort failed, rtn %d\n",
+   scmd, rtn));
+   }
+
+   if (scsi_eh_scmd_add(scmd, 0)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p terminate "
+   "aborted command\n", scmd));
+   scmd->result |= DID_TIME_OUT << 16;
+   scsi_finish_command(scmd);
+   }
+}


This patch adds several new calls to LLD EH handlers. Is it guaranteed 
that these will only be invoked before scsi_remove_host() has finished ? 
For more background information, see also "[PATCH] Make 
scsi_remove_host() wait until error handling finished" 
(http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).


Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] scsi: Fix erratic device offline during EH

2013-09-02 Thread Hannes Reinecke
Commit 18a4d0a22ed6c54b67af7718c305cd010f09ddf8
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de48ff84a5d98ca533685df8a3beaeb8
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.

So instead of checking the EH command in the ->eh_action
callback we should rather call ->eh_action when we're
about to finish the command _and_ have sent a TUR previously.
This should then set the device offline as advertised.

Cc: Martin K. Petersen 
Cc: Ewan Milne 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index abf0916..c88cb7e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -941,12 +941,6 @@ retry:
 
scsi_eh_restore_cmnd(scmd, &ses);
 
-   if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
-   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
-   if (sdrv->eh_action)
-   rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
-   }
-
return rtn;
 }
 
@@ -964,6 +958,18 @@ static int scsi_request_sense(struct scsi_cmnd *scmd)
return scsi_send_eh_cmnd(scmd, NULL, 0, scmd->device->eh_timeout, ~0);
 }
 
+static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
+{
+   static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0};
+
+   if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
+   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+   if (sdrv->eh_action)
+   rtn = sdrv->eh_action(scmd, tur_command, 6, rtn);
+   }
+   return rtn;
+}
+
 /**
  * scsi_eh_finish_cmd - Handle a cmd that eh is finished with.
  * @scmd:  Original SCSI cmd that eh has finished.
@@ -1142,7 +1148,9 @@ static int scsi_eh_test_devices(struct list_head 
*cmd_list,
 
list_for_each_entry_safe(scmd, next, cmd_list, eh_entry)
if (scmd->device == sdev) {
-   if (finish_cmds)
+   if (finish_cmds &&
+   (try_stu ||
+scsi_eh_action(scmd, SUCCESS) == SUCCESS))
scsi_eh_finish_cmd(scmd, done_q);
else
list_move_tail(&scmd->eh_entry, work_q);
@@ -1281,7 +1289,8 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
!scsi_eh_tur(stu_scmd)) {
list_for_each_entry_safe(scmd, next,
  work_q, eh_entry) {
-   if (scmd->device == sdev)
+   if (scmd->device == sdev &&
+   scsi_eh_action(scmd, SUCCESS) == 
SUCCESS)
scsi_eh_finish_cmd(scmd, 
done_q);
}
}
@@ -1348,7 +1357,8 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host 
*shost,
!scsi_eh_tur(bdr_scmd)) {
list_for_each_entry_safe(scmd, next,
 work_q, eh_entry) {
-   if (scmd->device == sdev)
+   if (scmd->device == sdev &&
+   scsi_eh_action(scmd, rtn) != FAILED)
scsi_eh_finish_cmd(scmd,
   done_q);
}
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] scsi_error: Update documentation

2013-09-02 Thread Hannes Reinecke
Update the documentation to cover asynchronous command aborts
and clarify sections referring to SCSI timeout handling.

Signed-off-by: Hannes Reinecke 
---
 Documentation/scsi/scsi_eh.txt  | 69 +++--
 Documentation/scsi/scsi_mid_low_api.txt |  9 -
 drivers/scsi/scsi.c |  6 +--
 3 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 6ff16b6..a0c8511 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -42,20 +42,14 @@ discussion.
 
  Once LLDD gets hold of a scmd, either the LLDD will complete the
 command by calling scsi_done callback passed from midlayer when
-invoking hostt->queuecommand() or SCSI midlayer will time it out.
+invoking hostt->queuecommand() or the block layer will time it out.
 
 
 [1-2-1] Completing a scmd w/ scsi_done
 
  For all non-EH commands, scsi_done() is the completion callback.  It
-does the following.
-
- 1. Delete timeout timer.  If it fails, it means that timeout timer
-has expired and is going to finish the command.  Just return.
-
- 2. Link scmd to per-cpu scsi_done_q using scmd->en_entry
-
- 3. Raise SCSI_SOFTIRQ
+just calls blk_complete_request() to delete the block layer timer and
+raise SCSI_SOFTIRQ
 
  SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition() to
 determine what to do with the command.  scsi_decide_disposition()
@@ -64,10 +58,12 @@ with the command.
 
  - SUCCESS
scsi_finish_command() is invoked for the command.  The
-   function does some maintenance choirs and notify completion by
-   calling scmd->done() callback, which, for fs requests, would
-   be HLD completion callback - sd:sd_rw_intr, sr:rw_intr,
-   st:st_intr.
+   function does some maintenance chores and then calls
+   scsi_io_completion() to finish the I/O.
+   scsi_io_completion() then notifies the block layer on
+   the completed request by calling blk_end_request and
+   friends or figures out what to do with the remainder
+   of the data in case of an error.
 
  - NEEDS_RETRY
  - ADD_TO_MLQUEUE
@@ -86,33 +82,45 @@ function
  1. invokes optional hostt->eh_timed_out() callback.  Return value can
 be one of
 
-- EH_HANDLED
-   This indicates that eh_timed_out() dealt with the timeout.  The
-   scmd is passed to __scsi_done() and thus linked into per-cpu
-   scsi_done_q.  Normal command completion described in [1-2-1]
-   follows.
+- BLK_EH_HANDLED
+   This indicates that eh_timed_out() dealt with the timeout.
+   The command is passed back to the block layer and completed
+   via __blk_complete_requests().
+
+   *NOTE* After returning BLK_EH_HANDLED the SCSI layer is
+   assumed to be finished with the command, and no other
+   functions from the SCSI layer will be called. So this
+   should typically only be returned if the eh_timed_out()
+   handler raced with normal completion.
 
-- EH_RESET_TIMER
+- BLK_EH_RESET_TIMER
This indicates that more time is required to finish the
command.  Timer is restarted.  This action is counted as a
retry and only allowed scmd->allowed + 1(!) times.  Once the
-   limit is reached, action for EH_NOT_HANDLED is taken instead.
+   limit is reached, action for BLK_EH_NOT_HANDLED is taken instead.
 
-   *NOTE* This action is racy as the LLDD could finish the scmd
-   after the timeout has expired but before it's added back.  In
-   such cases, scsi_done() would think that timeout has occurred
-   and return without doing anything.  We lose completion and the
-   command will time out again.
-
-- EH_NOT_HANDLED
-   This is the same as when eh_timed_out() callback doesn't exist.
+- BLK_EH_NOT_HANDLED
+eh_timed_out() callback did not handle the command.
Step #2 is taken.
 
+ 2. If the host supports asynchronous completion (as indicated by the
+no_async_abort setting in the host template) scsi_abort_command()
+is invoked to schedule an asynchrous abort. If that fails
+Step #3 is taken.
+
  2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
 command.  See [1-3] for more information.
 
+[1-3] Asynchronous command aborts
+
+ After a timeout occurs a command abort is scheduled from
+ scsi_abort_command(). If the abort is successful the command
+ will either be retried (if the number of retries is not exhausted)
+ or terminated with DID_TIME_OUT.
+ Otherwise scsi_eh_scmd_add() is invoked for the command.
+ See [1-4] for more information.
 
-[1-3] How EH takes over
+[1-4] How EH takes over
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
@@ -320,7 +328,8 @@ scmd->allowed.
 
 <>
 
-   This action is taken for each timed out command.
+   This action is taken for each timed out command when
+   no_async_abort is enabled in the host te

[PATCH 2/3] scsi: improved eh timeout handler

2013-09-02 Thread Hannes Reinecke
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.

If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.

Signed-off-by: Hannes Reinecke 
Cc: Ren Mingxin 
---
 drivers/scsi/scsi.c   |   3 +
 drivers/scsi/scsi_error.c | 149 ++
 drivers/scsi/scsi_priv.h  |   2 +
 include/scsi/scsi_cmnd.h  |   1 +
 include/scsi/scsi_host.h  |   5 ++
 5 files changed, 147 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index fe0bcb1..2b04a57 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, 
gfp_t gfp_mask)
 
cmd->device = dev;
INIT_LIST_HEAD(&cmd->list);
+   INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
spin_lock_irqsave(&dev->list_lock, flags);
list_add_tail(&cmd->list, &dev->cmd_list);
spin_unlock_irqrestore(&dev->list_lock, flags);
@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(&cmd->list);
spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
+   cancel_delayed_work(&cmd->abort_work);
+
__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c88cb7e..8eaf524 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -53,6 +53,7 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
 #define HOST_RESET_SETTLE_TIME  (10)
 
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+static int scsi_try_to_abort_cmd(struct scsi_host_template *, struct scsi_cmnd 
*);
 
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -100,6 +101,115 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scmd_eh_abort_handler - Handle command aborts
+ * @work:  command to be aborted.
+ */
+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+   struct scsi_cmnd *scmd =
+   container_of(work, struct scsi_cmnd, abort_work.work);
+   struct scsi_device *sdev = scmd->device;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(sdev->host->host_lock, flags);
+   if (scsi_host_eh_past_deadline(sdev->host)) {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p eh timeout, not aborting\n", 
scmd));
+   } else {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "aborting command %p\n", scmd));
+   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+   if (rtn == SUCCESS) {
+   scmd->result |= DID_TIME_OUT << 16;
+   if (!scsi_noretry_cmd(scmd) &&
+   (++scmd->retries <= scmd->allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p retry "
+   "aborted command\n", scmd));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p finish "
+   "aborted command\n", scmd));
+   scsi_finish_command(scmd);
+   }
+   return;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p abort failed, rtn %d\n",
+   scmd, rtn));
+   }
+
+   if (scsi_eh_scmd_add(scmd, 0)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+  

[PATCHv6 0/3] New EH command timeout handler

2013-09-02 Thread Hannes Reinecke
Hi all,

this patchset implements a new SCSI EH command timeout handler
which will be sending command aborts inline without actually
engaging SCSI EH.
SCSI EH will only be invoked if command abort fails.

In addition the commands will be returned directly
if the command abort succeeded, cutting down recovery
times dramatically.

With the original SCSI EH I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s

real2m22.657s
user0m0.013s
sys 0m0.145s

With this patchset I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s

real0m52.163s
user0m0.012s
sys 0m0.145s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.

Changes to the original version:
- Use a private list in scsi_eh_abort_handler to avoid
  list starvation (pointed out by Joern Engel)
- Terminate command aborts when the first abort fails
- Do not attempt command aborts if the host is already in recovery
  or if the device is removed.
- Flush abort workqueue if the device is removed.

Changes to v2:
- Removed eh_entry initialisation
- Convert to per-command workqueue

Changes to v3:
- Use delayed_work
- Enable new eh timeout handler for virtio, SAS, and FC
- Modify logging messages to include scmd pointer

Changes to v4:
- Remove stubs when enabling new eh timeout handler
  for other drivers

Changes to v5:
- Enable new eh timeout handler per default
- Update documentation

Hannes Reinecke (3):
  scsi: Fix erratic device offline during EH
  scsi: improved eh timeout handler
  scsi_error: Update documentation

 Documentation/scsi/scsi_eh.txt  |  69 +++--
 Documentation/scsi/scsi_mid_low_api.txt |   9 +-
 drivers/scsi/scsi.c |   9 +-
 drivers/scsi/scsi_error.c   | 177 
 drivers/scsi/scsi_priv.h|   2 +
 include/scsi/scsi_cmnd.h|   1 +
 include/scsi/scsi_host.h|   5 +
 7 files changed, 214 insertions(+), 58 deletions(-)

-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] uas: rename work list lock + list field

2013-09-02 Thread Gerd Hoffmann
This patch prepares for the addition of another list and renames the
work list lock and the list_head field in struct uas_cmd_info.

Signed-off-by: Gerd Hoffmann 
---
 drivers/usb/storage/uas.c | 50 +++
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index f89202f..a63972a 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -77,7 +77,7 @@ struct uas_cmd_info {
struct urb *cmd_urb;
struct urb *data_in_urb;
struct urb *data_out_urb;
-   struct list_head list;
+   struct list_head work;
 };
 
 /* I hate forward declarations, but I actually have a loop */
@@ -89,7 +89,7 @@ static void uas_configure_endpoints(struct uas_dev_info 
*devinfo);
 static void uas_free_streams(struct uas_dev_info *devinfo);
 
 static DECLARE_WORK(uas_work, uas_do_work);
-static DEFINE_SPINLOCK(uas_work_lock);
+static DEFINE_SPINLOCK(uas_lists_lock);
 static LIST_HEAD(uas_work_list);
 
 static void uas_unlink_data_urbs(struct uas_dev_info *devinfo,
@@ -124,11 +124,11 @@ static void uas_do_work(struct work_struct *work)
unsigned long flags;
int err;
 
-   spin_lock_irq(&uas_work_lock);
+   spin_lock_irq(&uas_lists_lock);
list_replace_init(&uas_work_list, &list);
-   spin_unlock_irq(&uas_work_lock);
+   spin_unlock_irq(&uas_lists_lock);
 
-   list_for_each_entry_safe(cmdinfo, temp, &list, list) {
+   list_for_each_entry_safe(cmdinfo, temp, &list, work) {
struct scsi_pointer *scp = (void *)cmdinfo;
struct scsi_cmnd *cmnd = container_of(scp,
struct scsi_cmnd, SCp);
@@ -139,10 +139,10 @@ static void uas_do_work(struct work_struct *work)
cmdinfo->state &= ~IS_IN_WORK_LIST;
spin_unlock_irqrestore(&devinfo->lock, flags);
if (err) {
-   list_del(&cmdinfo->list);
-   spin_lock_irq(&uas_work_lock);
-   list_add_tail(&cmdinfo->list, &uas_work_list);
-   spin_unlock_irq(&uas_work_lock);
+   list_del(&cmdinfo->work);
+   spin_lock_irq(&uas_lists_lock);
+   list_add_tail(&cmdinfo->work, &uas_work_list);
+   spin_unlock_irq(&uas_lists_lock);
schedule_work(&uas_work);
}
}
@@ -155,12 +155,12 @@ static void uas_abort_work(struct uas_dev_info *devinfo)
struct list_head list;
unsigned long flags;
 
-   spin_lock_irq(&uas_work_lock);
+   spin_lock_irq(&uas_lists_lock);
list_replace_init(&uas_work_list, &list);
-   spin_unlock_irq(&uas_work_lock);
+   spin_unlock_irq(&uas_lists_lock);
 
spin_lock_irqsave(&devinfo->lock, flags);
-   list_for_each_entry_safe(cmdinfo, temp, &list, list) {
+   list_for_each_entry_safe(cmdinfo, temp, &list, work) {
struct scsi_pointer *scp = (void *)cmdinfo;
struct scsi_cmnd *cmnd = container_of(scp,
struct scsi_cmnd, SCp);
@@ -178,10 +178,10 @@ static void uas_abort_work(struct uas_dev_info *devinfo)
uas_try_complete(cmnd, __func__);
} else {
/* not our uas device, relink into list */
-   list_del(&cmdinfo->list);
-   spin_lock_irq(&uas_work_lock);
-   list_add_tail(&cmdinfo->list, &uas_work_list);
-   spin_unlock_irq(&uas_work_lock);
+   list_del(&cmdinfo->work);
+   spin_lock_irq(&uas_lists_lock);
+   list_add_tail(&cmdinfo->work, &uas_work_list);
+   spin_unlock_irq(&uas_lists_lock);
}
}
spin_unlock_irqrestore(&devinfo->lock, flags);
@@ -288,10 +288,10 @@ static void uas_xfer_data(struct urb *urb, struct 
scsi_cmnd *cmnd,
cmdinfo->state |= direction | SUBMIT_STATUS_URB;
err = uas_submit_urbs(cmnd, cmnd->device->hostdata, GFP_ATOMIC);
if (err) {
-   spin_lock(&uas_work_lock);
-   list_add_tail(&cmdinfo->list, &uas_work_list);
+   spin_lock(&uas_lists_lock);
+   list_add_tail(&cmdinfo->work, &uas_work_list);
cmdinfo->state |= IS_IN_WORK_LIST;
-   spin_unlock(&uas_work_lock);
+   spin_unlock(&uas_lists_lock);
schedule_work(&uas_work);
}
 }
@@ -694,10 +694,10 @@ static int uas_queuecommand_lck(struct scsi_cmnd *cmnd,
spin_unlock_irqrestore(&devinfo->lock, flags);
return SCSI_MLQUEUE_DEVICE_BUSY;
}
-   spin_lock(&uas_work_lock);
-   list_add_tail(&cmdinfo->list

[PATCH 2/5] uas: properly reinitialize in uas_eh_bus_reset_handler

2013-09-02 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 drivers/usb/storage/uas.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index d966b59..f89202f 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -85,6 +85,8 @@ static int uas_submit_urbs(struct scsi_cmnd *cmnd,
struct uas_dev_info *devinfo, gfp_t gfp);
 static void uas_do_work(struct work_struct *work);
 static int uas_try_complete(struct scsi_cmnd *cmnd, const char *caller);
+static void uas_configure_endpoints(struct uas_dev_info *devinfo);
+static void uas_free_streams(struct uas_dev_info *devinfo);
 
 static DECLARE_WORK(uas_work, uas_do_work);
 static DEFINE_SPINLOCK(uas_work_lock);
@@ -800,7 +802,11 @@ static int uas_eh_bus_reset_handler(struct scsi_cmnd *cmnd)
usb_kill_anchored_urbs(&devinfo->cmd_urbs);
usb_kill_anchored_urbs(&devinfo->sense_urbs);
usb_kill_anchored_urbs(&devinfo->data_urbs);
+   uas_free_streams(devinfo);
err = usb_reset_device(udev);
+   if (!err) {
+   uas_configure_endpoints(devinfo);
+   }
devinfo->resetting = 0;
 
if (err) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] uas: add dead request list

2013-09-02 Thread Gerd Hoffmann
This patch adds a new list where all requests which are canceled are
added to, so we don't loose them.  Then, after killing all inflight
urbs on bus reset (and disconnect) we'll walk over the list and clean
them up.

Without this we can end up with aborted requests lingering around in
case of status pipe transfer errors.

Signed-off-by: Gerd Hoffmann 
---
 drivers/usb/storage/uas.c | 69 +--
 1 file changed, 61 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index a63972a..9dfb8f9 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -78,6 +78,7 @@ struct uas_cmd_info {
struct urb *data_in_urb;
struct urb *data_out_urb;
struct list_head work;
+   struct list_head dead;
 };
 
 /* I hate forward declarations, but I actually have a loop */
@@ -87,10 +88,12 @@ static void uas_do_work(struct work_struct *work);
 static int uas_try_complete(struct scsi_cmnd *cmnd, const char *caller);
 static void uas_configure_endpoints(struct uas_dev_info *devinfo);
 static void uas_free_streams(struct uas_dev_info *devinfo);
+static void uas_log_cmd_state(struct scsi_cmnd *cmnd, const char *caller);
 
 static DECLARE_WORK(uas_work, uas_do_work);
 static DEFINE_SPINLOCK(uas_lists_lock);
 static LIST_HEAD(uas_work_list);
+static LIST_HEAD(uas_dead_list);
 
 static void uas_unlink_data_urbs(struct uas_dev_info *devinfo,
 struct uas_cmd_info *cmdinfo)
@@ -167,15 +170,13 @@ static void uas_abort_work(struct uas_dev_info *devinfo)
struct uas_dev_info *di = (void *)cmnd->device->hostdata;
 
if (di == devinfo) {
+   uas_log_cmd_state(cmnd, __func__);
+   BUG_ON(cmdinfo->state & COMMAND_ABORTED);
cmdinfo->state |= COMMAND_ABORTED;
+   spin_lock_irq(&uas_lists_lock);
+   list_add_tail(&cmdinfo->dead, &uas_dead_list);
+   spin_unlock_irq(&uas_lists_lock);
cmdinfo->state &= ~IS_IN_WORK_LIST;
-   if (devinfo->resetting) {
-   /* uas_stat_cmplt() will not do that
-* when a device reset is in
-* progress */
-   cmdinfo->state &= ~COMMAND_INFLIGHT;
-   }
-   uas_try_complete(cmnd, __func__);
} else {
/* not our uas device, relink into list */
list_del(&cmdinfo->work);
@@ -187,6 +188,43 @@ static void uas_abort_work(struct uas_dev_info *devinfo)
spin_unlock_irqrestore(&devinfo->lock, flags);
 }
 
+static void uas_zap_dead(struct uas_dev_info *devinfo)
+{
+   struct uas_cmd_info *cmdinfo;
+   struct uas_cmd_info *temp;
+   struct list_head list;
+   unsigned long flags;
+
+   spin_lock_irq(&uas_lists_lock);
+   list_replace_init(&uas_dead_list, &list);
+   spin_unlock_irq(&uas_lists_lock);
+
+   spin_lock_irqsave(&devinfo->lock, flags);
+   list_for_each_entry_safe(cmdinfo, temp, &list, dead) {
+   struct scsi_pointer *scp = (void *)cmdinfo;
+   struct scsi_cmnd *cmnd = container_of(scp,
+   struct scsi_cmnd, SCp);
+   struct uas_dev_info *di = (void *)cmnd->device->hostdata;
+
+   if (di == devinfo) {
+   uas_log_cmd_state(cmnd, __func__);
+   BUG_ON(!(cmdinfo->state & COMMAND_ABORTED));
+   /* all urbs are killed, clear inflight bits */
+   cmdinfo->state &= ~(COMMAND_INFLIGHT |
+   DATA_IN_URB_INFLIGHT |
+   DATA_OUT_URB_INFLIGHT);
+   uas_try_complete(cmnd, __func__);
+   } else {
+   /* not our uas device, relink into list */
+   list_del(&cmdinfo->dead);
+   spin_lock_irq(&uas_lists_lock);
+   list_add_tail(&cmdinfo->dead, &uas_dead_list);
+   spin_unlock_irq(&uas_lists_lock);
+   }
+   }
+   spin_unlock_irqrestore(&devinfo->lock, flags);
+}
+
 static void uas_sense(struct urb *urb, struct scsi_cmnd *cmnd)
 {
struct sense_iu *sense_iu = urb->transfer_buffer;
@@ -274,6 +312,9 @@ static int uas_try_complete(struct scsi_cmnd *cmnd, const 
char *caller)
if (cmdinfo->state & COMMAND_ABORTED) {
scmd_printk(KERN_INFO, cmnd, "abort completed\n");
cmnd->result = DID_ABORT << 16;
+   spin_lock_irq(&uas_lists_lock);
+   list_del(&cmdinfo->dead);
+   spin_unlock_irq(&uas_lists_lock);
}
cmnd->scsi_done(cmnd);
re

Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-02 Thread Hannes Reinecke
On 09/02/2013 11:31 AM, Christoph Hellwig wrote:
> On Mon, Sep 02, 2013 at 10:54:13AM +0200, Hannes Reinecke wrote:
>> I don't mind. Having talked to the various SCSI folks everyone
>> agreed that calling abort asynchronously shouldn't do any harm.
>> At least as far as the SCSI spec goes.
>>
>> As for documentation: I didn't document it it currently as with my
>> implementation it's pretty much an optional thing.
>> But if we were to enable it globally it surely should be documented.
>>
>> So if there is a consensus I surely can enable it globally and
>> update the documentation.
> 
> I would defintively love enabling it globally.  As a fallback we should
> probably keep the old code for a while and prepare a way to disable
> the async code by a flag in the host template.  If no reason to enable
> it shows up after a year ore two we can remove the old code entirely.
> 
> I.e change the tail of scsi_times_out into something like:
> 
> 
>   if (rtn == BLK_EH_NOT_HANDLED && !shost->hostt->no_async_abort) {
>   if (scsi_abort_command(scmd))
>   return BLK_EH_NOT_HANDLED;
>   }
> 
>   scmd->result |= DID_TIME_OUT << 16;
>   ...
> 
> That way we don't even need the new BLK_EH_SCHEDULED return value and
> have all the handling in scsi_error.c
> 
Right. Will be doing it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-02 Thread Christoph Hellwig
On Mon, Sep 02, 2013 at 10:54:13AM +0200, Hannes Reinecke wrote:
> I don't mind. Having talked to the various SCSI folks everyone
> agreed that calling abort asynchronously shouldn't do any harm.
> At least as far as the SCSI spec goes.
> 
> As for documentation: I didn't document it it currently as with my
> implementation it's pretty much an optional thing.
> But if we were to enable it globally it surely should be documented.
> 
> So if there is a consensus I surely can enable it globally and
> update the documentation.

I would defintively love enabling it globally.  As a fallback we should
probably keep the old code for a while and prepare a way to disable
the async code by a flag in the host template.  If no reason to enable
it shows up after a year ore two we can remove the old code entirely.

I.e change the tail of scsi_times_out into something like:


if (rtn == BLK_EH_NOT_HANDLED && !shost->hostt->no_async_abort) {
if (scsi_abort_command(scmd))
return BLK_EH_NOT_HANDLED;
}

scmd->result |= DID_TIME_OUT << 16;
...

That way we don't even need the new BLK_EH_SCHEDULED return value and
have all the handling in scsi_error.c
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


re: [SCSI] qla2xxx: Add support for ISP8044.

2013-09-02 Thread Dan Carpenter
Hello Atul Deshmukh,

This is a semi-automatic email about new static checker warnings.

The patch ef4647420025: "[SCSI] qla2xxx: Add support for ISP8044." 
from Aug 27, 2013, leads to the following Smatch complaint:

drivers/scsi/qla2xxx/qla_os.c:2962 qla2x00_probe_one()
 error: we previously assumed 'base_vha' could be null (see line 2632)

drivers/scsi/qla2xxx/qla_os.c
  2631  base_vha = qla2x00_create_host(sht, ha);
  2632  if (!base_vha) {
 
Existing check.

  2633  ret = -ENOMEM;
  2634  qla2x00_mem_free(ha);
  2635  qla2x00_free_req_que(ha, req);
  2636  qla2x00_free_rsp_que(ha, rsp);
  2637  goto probe_hw_failed;
  2638  }

[ snip ]

  2954  probe_hw_failed:
  2955  if (IS_QLA82XX(ha)) {
  2956  qla82xx_idc_lock(ha);
  2957  qla82xx_clear_drv_active(ha);
  2958  qla82xx_idc_unlock(ha);
  2959  }
  2960  if (IS_QLA8044(ha)) {
  2961  qla8044_idc_lock(ha);
  2962  qla8044_clear_drv_active(base_vha);
 
Patch adds dereference.

  2963  qla8044_idc_unlock(ha);
  2964  }

regards,
dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-02 Thread Hannes Reinecke
On 09/02/2013 10:27 AM, Christoph Hellwig wrote:
> One thing I'm still wondering is why we can't enable this globally,
> and if there is a reason it should be documented.
> 
> As far as I can tell the actual calling context of the eh_abort_handler
> doesn't change, so an LLDD would have to rely on some very specific
> side effects of being in EH to break, and we never guaranteed such
> specifics.
> 
I don't mind. Having talked to the various SCSI folks everyone
agreed that calling abort asynchronously shouldn't do any harm.
At least as far as the SCSI spec goes.

As for documentation: I didn't document it it currently as with my
implementation it's pretty much an optional thing.
But if we were to enable it globally it surely should be documented.

So if there is a consensus I surely can enable it globally and
update the documentation.

James?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-02 Thread Christoph Hellwig
One thing I'm still wondering is why we can't enable this globally,
and if there is a reason it should be documented.

As far as I can tell the actual calling context of the eh_abort_handler
doesn't change, so an LLDD would have to rely on some very specific
side effects of being in EH to break, and we never guaranteed such
specifics.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/9] libsas: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_scsi_host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index da3aee1..6eb2f0c 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -864,7 +864,7 @@ enum blk_eh_timer_return sas_scsi_timed_out(struct 
scsi_cmnd *cmd)
 {
scmd_printk(KERN_DEBUG, cmd, "command %p timed out\n", cmd);
 
-   return BLK_EH_NOT_HANDLED;
+   return scsi_abort_command(cmd);
 }
 
 int sas_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 9/9] scsi_transport_fc: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_transport_fc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 4628fd5..012c801 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2079,7 +2079,7 @@ fc_timed_out(struct scsi_cmnd *scmd)
if (rport->port_state == FC_PORTSTATE_BLOCKED)
return BLK_EH_RESET_TIMER;
 
-   return BLK_EH_NOT_HANDLED;
+   return scsi_abort_command(scmd);
 }
 
 /*
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv5 0/9] New EH command timeout handler

2013-09-02 Thread Hannes Reinecke
Hi all,

this patchset implements a new SCSI EH command timeout handler
which will be sending command aborts inline without actually
engaging SCSI EH.
SCSI EH will only be invoked if command abort fails.

In addition the commands will be returned directly
if the command abort succeeded, cutting down recovery
times dramatically.

With the original SCSI EH I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s

real2m22.657s
user0m0.013s
sys 0m0.145s

With this patchset I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s

real0m52.163s
user0m0.012s
sys 0m0.145s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.

Changes to the original version:
- Use a private list in scsi_eh_abort_handler to avoid
  list starvation (pointed out by Joern Engel)
- Terminate command aborts when the first abort fails
- Do not attempt command aborts if the host is already in recovery
  or if the device is removed.
- Flush abort workqueue if the device is removed.

Changes to v2:
- Removed eh_entry initialisation
- Convert to per-command workqueue

Changes to v3:
- Use delayed_work
- Enable new eh timeout handler for virtio, SAS, and FC
- Modify logging messages to include scmd pointer

Changes to v4:
- Remove stubs when enabling new eh timeout handler
  for other drivers

Hannes Reinecke (9):
  scsi: Fix erratic device offline during EH
  blk-timeout: add BLK_EH_SCHEDULED return code
  scsi: improved eh timeout handler
  virtio_scsi: Enable new EH timeout handler
  libsas: Enable new EH timeout handler
  mptsas: Enable new EH timeout handler
  mpt2sas: Enable new EH timeout handler
  mpt3sas: Enable new EH timeout handler
  scsi_transport_fc: Enable new EH timeout handler

 drivers/message/fusion/mptsas.c  |   3 +-
 drivers/scsi/libsas/sas_scsi_host.c  |   2 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c |   3 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |   1 +
 drivers/scsi/scsi.c  |   3 +
 drivers/scsi/scsi_error.c| 178 ++-
 drivers/scsi/scsi_priv.h |   2 +
 drivers/scsi/scsi_transport_fc.c |   2 +-
 drivers/scsi/virtio_scsi.c   |   2 +
 include/linux/blkdev.h   |   1 +
 include/scsi/scsi_cmnd.h |   2 +
 11 files changed, 173 insertions(+), 26 deletions(-)

-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/9] blk-timeout: add BLK_EH_SCHEDULED return code

2013-09-02 Thread Hannes Reinecke
Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate
that a delayed error recovery has been initiated.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c | 4 
 include/linux/blkdev.h| 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c88cb7e..b7dd774 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -161,6 +161,10 @@ enum blk_eh_timer_return scsi_times_out(struct request 
*req)
else if (host->hostt->eh_timed_out)
rtn = host->hostt->eh_timed_out(scmd);
 
+   /* Check for asynchronous command aborts */
+   if (rtn == BLK_EH_SCHEDULED)
+   return BLK_EH_NOT_HANDLED;
+
scmd->result |= DID_TIME_OUT << 16;
 
if (unlikely(rtn == BLK_EH_NOT_HANDLED &&
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2fdb4a4..d846e2b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -238,6 +238,7 @@ enum blk_eh_timer_return {
BLK_EH_NOT_HANDLED,
BLK_EH_HANDLED,
BLK_EH_RESET_TIMER,
+   BLK_EH_SCHEDULED,
 };
 
 typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/9] virtio_scsi: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/virtio_scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 74b88ef..5fca1ca 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -683,6 +683,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.queuecommand = virtscsi_queuecommand_single,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
+   .eh_timed_out = scsi_abort_command,
 
.can_queue = 1024,
.dma_boundary = UINT_MAX,
@@ -699,6 +700,7 @@ static struct scsi_host_template 
virtscsi_host_template_multi = {
.queuecommand = virtscsi_queuecommand_multi,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
+   .eh_timed_out = scsi_abort_command,
 
.can_queue = 1024,
.dma_boundary = UINT_MAX,
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/9] scsi: Fix erratic device offline during EH

2013-09-02 Thread Hannes Reinecke
Commit 18a4d0a22ed6c54b67af7718c305cd010f09ddf8
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de48ff84a5d98ca533685df8a3beaeb8
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.

So instead of checking the EH command in the ->eh_action
callback we should rather call ->eh_action when we're
about to finish the command _and_ have sent a TUR previously.
This should then set the device offline as advertised.

Cc: Martin K. Petersen 
Cc: Ewan Milne 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index abf0916..c88cb7e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -941,12 +941,6 @@ retry:
 
scsi_eh_restore_cmnd(scmd, &ses);
 
-   if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
-   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
-   if (sdrv->eh_action)
-   rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
-   }
-
return rtn;
 }
 
@@ -964,6 +958,18 @@ static int scsi_request_sense(struct scsi_cmnd *scmd)
return scsi_send_eh_cmnd(scmd, NULL, 0, scmd->device->eh_timeout, ~0);
 }
 
+static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
+{
+   static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0};
+
+   if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
+   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+   if (sdrv->eh_action)
+   rtn = sdrv->eh_action(scmd, tur_command, 6, rtn);
+   }
+   return rtn;
+}
+
 /**
  * scsi_eh_finish_cmd - Handle a cmd that eh is finished with.
  * @scmd:  Original SCSI cmd that eh has finished.
@@ -1142,7 +1148,9 @@ static int scsi_eh_test_devices(struct list_head 
*cmd_list,
 
list_for_each_entry_safe(scmd, next, cmd_list, eh_entry)
if (scmd->device == sdev) {
-   if (finish_cmds)
+   if (finish_cmds &&
+   (try_stu ||
+scsi_eh_action(scmd, SUCCESS) == SUCCESS))
scsi_eh_finish_cmd(scmd, done_q);
else
list_move_tail(&scmd->eh_entry, work_q);
@@ -1281,7 +1289,8 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
!scsi_eh_tur(stu_scmd)) {
list_for_each_entry_safe(scmd, next,
  work_q, eh_entry) {
-   if (scmd->device == sdev)
+   if (scmd->device == sdev &&
+   scsi_eh_action(scmd, SUCCESS) == 
SUCCESS)
scsi_eh_finish_cmd(scmd, 
done_q);
}
}
@@ -1348,7 +1357,8 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host 
*shost,
!scsi_eh_tur(bdr_scmd)) {
list_for_each_entry_safe(scmd, next,
 work_q, eh_entry) {
-   if (scmd->device == sdev)
+   if (scmd->device == sdev &&
+   scsi_eh_action(scmd, rtn) != FAILED)
scsi_eh_finish_cmd(scmd,
   done_q);
}
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/9] mpt2sas: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 389d792..9ede2c2 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7628,8 +7628,9 @@ static struct scsi_host_template scsih_driver_template = {
.slave_destroy  = _scsih_slave_destroy,
.scan_finished  = _scsih_scan_finished,
.scan_start = _scsih_scan_start,
-   .change_queue_depth = _scsih_change_queue_depth,
+   .change_queue_depth = _scsih_change_queue_depth,
.change_queue_type  = _scsih_change_queue_type,
+   .eh_timed_out   = scsi_abort_command,
.eh_abort_handler   = _scsih_abort,
.eh_device_reset_handler= _scsih_dev_reset,
.eh_target_reset_handler= _scsih_target_reset,
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/9] mpt3sas: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index a961fe1..b893383 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7237,6 +7237,7 @@ static struct scsi_host_template scsih_driver_template = {
.scan_start = _scsih_scan_start,
.change_queue_depth = _scsih_change_queue_depth,
.change_queue_type  = _scsih_change_queue_type,
+   .eh_timed_out   = scsi_abort_command,
.eh_abort_handler   = _scsih_abort,
.eh_device_reset_handler= _scsih_dev_reset,
.eh_target_reset_handler= _scsih_target_reset,
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/9] scsi: improved eh timeout handler

2013-09-02 Thread Hannes Reinecke
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.

If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.

Signed-off-by: Hannes Reinecke 
Cc: Ren Mingxin 
---
 drivers/scsi/scsi.c   |   3 +
 drivers/scsi/scsi_error.c | 146 +-
 drivers/scsi/scsi_priv.h  |   2 +
 include/scsi/scsi_cmnd.h  |   2 +
 4 files changed, 140 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index fe0bcb1..2b04a57 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, 
gfp_t gfp_mask)
 
cmd->device = dev;
INIT_LIST_HEAD(&cmd->list);
+   INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
spin_lock_irqsave(&dev->list_lock, flags);
list_add_tail(&cmd->list, &dev->cmd_list);
spin_unlock_irqrestore(&dev->list_lock, flags);
@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(&cmd->list);
spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
+   cancel_delayed_work(&cmd->abort_work);
+
__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b7dd774..21c2465 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -53,6 +53,7 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
 #define HOST_RESET_SETTLE_TIME  (10)
 
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+static int scsi_try_to_abort_cmd(struct scsi_host_template *, struct scsi_cmnd 
*);
 
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -100,6 +101,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scmd_eh_abort_handler - Handle command aborts
+ * @work:  command to be aborted.
+ */
+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+   struct scsi_cmnd *scmd =
+   container_of(work, struct scsi_cmnd, abort_work.work);
+   struct scsi_device *sdev = scmd->device;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(sdev->host->host_lock, flags);
+   if (scsi_host_eh_past_deadline(sdev->host)) {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p eh timeout, not aborting\n", 
scmd));
+   } else {
+   spin_unlock_irqrestore(sdev->host->host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "aborting command %p\n", scmd));
+   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+   if (rtn == SUCCESS) {
+   scmd->result |= DID_TIME_OUT << 16;
+   if (!scsi_noretry_cmd(scmd) &&
+   (++scmd->retries <= scmd->allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p retry "
+   "aborted command\n", scmd));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p finish "
+   "aborted command\n", scmd));
+   scsi_finish_command(scmd);
+   }
+   return;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "scmd %p abort failed, rtn %d\n",
+   scmd, rtn));
+   }
+
+   if (scsi_eh_scmd_add(scmd, 0)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "scmd %p terminate "
+ 

[PATCH 6/9] mptsas: Enable new EH timeout handler

2013-09-02 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 drivers/message/fusion/mptsas.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index dd239bd..d0aed77 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -1986,7 +1986,8 @@ static struct scsi_host_template mptsas_driver_template = 
{
.slave_configure= mptsas_slave_configure,
.target_destroy = mptsas_target_destroy,
.slave_destroy  = mptscsih_slave_destroy,
-   .change_queue_depth = mptscsih_change_queue_depth,
+   .change_queue_depth = mptscsih_change_queue_depth,
+   .eh_timed_out   = scsi_abort_command,
.eh_abort_handler   = mptscsih_abort,
.eh_device_reset_handler= mptscsih_dev_reset,
.eh_host_reset_handler  = mptscsih_host_reset,
-- 
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html