Re: [PATCH v2 3/7] ibmvscsi: Replace magic values in set_adpater_info() with defines

2016-02-16 Thread Martin K. Petersen
> "Tyrel" == Tyrel Datwyler  writes:

>> Is there some reason you didn't carry the review tag over from this:
>> 
>> http://mid.gmane.org/20160204084459.gw27...@c203.arch.suse.de
>> 
>> ?
>> 
>> James

Tyrel> The patch is slightly changed from v1. A define for AIX os type
Tyrel> was added as mentioned in the cover letter v2 changes, and I
Tyrel> moved the defines to the mad_adapter_info_data structure around
Tyrel> the fields they apply.

Johannes: Mind checking this out?

https://patchwork.kernel.org/patch/8276101/

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ses: wrongly error out for Simple Subenclosures

2016-02-16 Thread Tom Yan
I have a WD My Passport that has a SES device with it:

[tom@localhost ~]$ lsscsi -g
...
[7:0:0:0]diskWD   My Passport 083A 1065  /dev/sdc   /dev/sg3
[7:0:0:1]enclosu WD   SES Device   1065  -  /dev/sg4

which is a Simple Subenclosure, as documented in SES-3 (Rev. 11a) Section 4.3.3.

Currently the kernel spam the following error for it:

scsi 7:0:0:1: Wrong diagnostic page; asked for 1 got 0
scsi 7:0:0:1: Failed to get diagnostic page 0xffea
scsi 7:0:0:1: Failed to bind enclosure -19

However, according to the the aforementioned documentation, Simple
Subenclosure is a device that does NOT support any SES diagnostic
page, except the Short Enclosure Status diagnostic page.

And my device is doing pretty much exactly the right thing:

[tom@localhost ~]$ sudo sg_ses /dev/sg4
  WDSES Device1065
Supported diagnostic pages:
  Supported Diagnostic Pages [sdp] [0x0]
  Short Enclosure Status (SES) [ses] [0x8]
   [0x80]
   [0x83]
   [0x84]
   [0x85]

[tom@localhost ~]$ sudo sg_ses -p 0x8 -vvv /dev/sg4
open /dev/sg4 with flags=0x802
inquiry cdb: 12 00 00 00 24 00
  duration=0 ms
  WDSES Device1065
enclosure services device
Receive diagnostic results cmd for Short Enclosure Status (SES) page
Receive diagnostic results cmd: 1c 01 08 ff fc 00
  duration=0 ms
receive diagnostic results: pass-through requested 65532 bytes
(data-in) but got 4 bytes
receive diagnostic results: response
08 00 00 00
Short enclosure status diagnostic page, status=0x0

The content of the Short Enclosure Status page (08 00 00 00) is
correct as documented in SES-3 (Rev. 11a) Section 6.1.11 too.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Why does sd "assume write through" instead of "write back"?

2016-02-16 Thread Tom Yan
Currently the scsi disk driver would "assume write through" (i.e. no
writeback cache on the disk) when the disk fail to report the
availablity of writeback cache (e.g. no caching mode page), then spam
an error through kernel messages.

Although there is a switch to change that "assumption", but it's only
available as a quirk for USB disks.

So why shouldn't we make "assume write back" as the general default
instead? Apparently all it triggers are SYNCHRONIZE CACHE commands,
which I don't suppose to be dangerous on any disks, no matter it
actually has writeback cache available/enabled or not? Probably not
even performance impact?

In case there are any drives that are "allegic" to SYNCHRONIZE CACHE
(I mean serious issue like the drive will be frozen once it receive
that command, not something like invalid opcode), shouldn't there be
quirk written for them, instead of generally falling back to what we
consider "dangerous" and spam error for every drives that doesn't have
a caching mode page (which are pretty common)?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fusion-mptbase: handle failed allocation for workqueue

2016-02-16 Thread Ewan Milne
On Tue, 2016-02-16 at 13:33 -0500, Insu Yun wrote:
> 
> 
> On Tue, Feb 16, 2016 at 1:18 PM, Ewan Milne  wrote:
> On Mon, 2016-02-15 at 21:50 -0500, Insu Yun wrote:
> > the failure of ioc->reset_work_q is checked,
> > but not ioc->fw_event_q.
> >
> > Signed-off-by: Insu Yun 
> > ---
> >  drivers/message/fusion/mptbase.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/message/fusion/mptbase.c
> b/drivers/message/fusion/mptbase.c
> > index 5dcc031..d4907a1 100644
> > --- a/drivers/message/fusion/mptbase.c
> > +++ b/drivers/message/fusion/mptbase.c
> > @@ -1996,6 +1996,13 @@ mpt_attach(struct pci_dev *pdev,
> const struct pci_device_id *id)
> >   snprintf(ioc->fw_event_q_name, MPT_KOBJ_NAME_LEN,
> "mpt/%d", ioc->id);
> >   ioc->fw_event_q =
> create_singlethread_workqueue(ioc->fw_event_q_name);
> >
> > + if (!ioc->fw_event_q) {
> > + destroy_workqueue(ioc->reset_work_q);
> > + pci_release_selected_regions(pdev, ioc->bars);
> > + kfree(ioc);
> > + return -ENOMEM;
> > + }
> > +
> >   if ((r = mpt_do_ioc_recovery(ioc,
> MPT_HOSTEVENT_IOC_BRINGUP,
> >   CAN_SLEEP)) != 0){
> >   printk(MYIOC_s_ERR_FMT "didn't initialize
> properly! (%d)\n",
> 
> This does not look correct to me.  The error path for the call
> to
> mpt_do_ioc_recovery() after create_singlethread_workqueue()
> for
> ioc->fw_event_q does other cleanup, including:
> 
> list_del(>list);
> if (ioc->alt_ioc)
> ioc->alt_ioc->alt_ioc = NULL;
> iounmap(ioc->memmap);
> 
> and
> 
> kfree(ioc);
> pci_set_drvdata(pdev, NULL);
> 
> 
> Oh yes. Right. 
> I just copied from above error handling code.
> 
> 
>  
> 
> Here I think you are kfree()ing ioc while it is still on the
> _list,
> which will cause a crash.
> 
> Note to Avago:  this code could use a symbolic return code
> identifier:
> 
> if (r != -5)
> pci_release_selected_regions(pdev,
> ioc->bars);
> 
> 
> What is -5? it seems strange for me.
> 
> Is it error code? then it is better to use macro.

The comments above mpt_do_ioc_recovery() say:

 *  Returns:

   
 *   0 for success  

   
 *  -1 if failed to get board READY 

   
 *  -2 if READY but IOCFacts Failed 

   
 *  -3 if READY but PrimeIOCFifos Failed

   
 *  -4 if READY but IOCInit Failed  

   
 *  -5 if failed to enable_device and/or request_selected_regions   

   
 *  -6 if failed to upload firmware

and yes, it would be better to use a macro (symbolic value) hence my
comment to Avago.   

   
> 
> 
> 
> In general, with sequential allocation of resources like this,
> error
> handling can be performed using a series of goto's to labels
> at the
> end of the function that release the resources in reverse
> order.  This
> avoids the duplication of code within the function, and
> reduces the
> chance for errors when the function is later modified.  See
> 

Re: [Bug 111441] New: iscsi fails to attach to targets

2016-02-16 Thread Hannes Reinecke

On 02/08/2016 09:01 AM, Nicholas A. Bellinger wrote:

On Tue, 2016-02-02 at 14:56 -0800, Nicholas A. Bellinger wrote:

On Mon, 2016-02-01 at 10:55 -0600, Mike Christie wrote:

On 01/30/2016 01:38 AM, Nicholas A. Bellinger wrote:

On Fri, 2016-01-29 at 17:32 -0600, Mike Christie wrote:

On 01/29/2016 04:21 PM, Serguei Bezverkhi (sbezverk) wrote:

HI Mike,

I tried your patch and it is has eliminated first traceback but I still do not 
see my remote targets.



That is sort of expected. Your target is not setup for ALUA properly. It
says it supports ALUA, but when scsi_dh_alua asks about the ports it is
reporting there are none. Ccing the people that made the patch that
added the issue and own the code.

Hey Christoph and Hannes,

The dh/alua changes that added this:

 error = scsi_dh_add_device(sdev);
 if (error) {
 sdev_printk(KERN_INFO, sdev,
 "failed to add device handler: %d\n",
error);
 return error;
 }

to scsi_sysfs_add_sdev are adding a regression.

1. If that fails, then we forget to do device_del before doing the
return. My patch in this thread added that back, so we do not see the
sysfs oopses anymore. But.

2. It looks like in older kernels, we would allow misconfigured targets
like this one to still setup devices. Do we want that old behavior back?
Should we just ignore the return value from scsi_dh_add_device above?
Note that in this case, it is LIO so it can be easily fixed on the
target side by just setting it up properly. I do not think other targets
would hit this type of issue.



Btw, what does misconfigured mean here wrt target ALUA..?


[   25.833195] sd 6:0:0:4: alua: supports implicit and explicit TPGS
[   25.833360] sd 6:0:0:4: alua: No target port descriptors found
[   25.833363] sd 6:0:0:4: alua: Attach failed (-22)
[   25.833365] sd 6:0:0:4: failed to add device handler: -22



Strange, this hasn't changed in forever on the target side..


He has LIO configured to report it supports implicit/explicit ALUA, but
the ports do not seem to be configured.

For the LIO config side, are his LUNs just not in a the default_lu_gp or
any other group?


So every non-PSCSI backend device becomes part of default_lu_gp +
default_tg_pt_gp and automatically shows up in EVPD=0x83, without user
needing to do any additional configuration.

Here's what the output looks like:

root@haakon3:/usr/src/target-pending.git# sg_inq -Hi /dev/sdb
VPD INQUIRY: Device Identification page
   
   Designation descriptor number 3, descriptor length: 8
 transport: Serial Attached SCSI Protocol (SPL-2)
 designator_type: Relative target port,  code_set: Binary
 associated with the target port
 designator header(hex): 61 94 00 04
 designator:
  00 00 00 00 02 
   Designation descriptor number 4, descriptor length: 8
 transport: Serial Attached SCSI Protocol (SPL-2)
 designator_type: Target port group,  code_set: Binary
 associated with the target port
 designator header(hex): 61 95 00 04
 designator:
  00 00 00 00 00 
   Designation descriptor number 5, descriptor length: 8
 designator_type: Logical unit group,  code_set: Binary
 associated with the addressed logical unit
 designator header(hex): 01 06 00 04
 designator:
  00 00 00 00 00 
  

So AFAICT, the relative target port, target port group, and logical unit
group being returned from target on v4.5-rc1 code looks correct.

Serguei, can you confirm with 'sg_inq -Hi /dev/sdX' output on your side
with the v3.10 based target..?

AFAICT the parsing in scsi_vpd_tpg_id() from commit a8aa3978 looks
correct too.

Hannes, any ideas..?


Ping.


Please try with my latest scsi_dh_alua patchset posted to linux-scsi.
That should solve the error attaching devices.

Cheers,

Hannes
--
Dr. Hannes Reinecke
Kirchenweg 47
D-90419 Nürnberg
Tel.: 0911 - 3 77 76 10
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use ida_simple for SCSI iSCSI transport session id

2016-02-16 Thread Mike Christie
On 02/15/2016 12:26 PM, Chris Leech wrote:
> On Fri, Feb 12, 2016 at 09:54:51AM -0800, James Bottomley wrote:
>> On Fri, 2016-02-12 at 09:38 -0800, Lee Duncan wrote:
>>> The scsi_transport_iscsi module already uses the ida_simple
>>> routines for managing the target ID, if requested to do
>>> so. This change replaces an ever-increasing atomic integer
>>> that tracks the session ID itself with the ida_simple
>>> family of routines. This means that the session ID
>>> will be reclaimed and can be reused when the session
>>> is freed.
>>
>> Is reusing session ID's really a good idea?  For sequential sessions it
>> means that the ID of the next session will be re-used, i.e. the same as
>> the previous sessions, which could lead to target confusion.  I think
>> local uniqueness of session IDs is more important than wrap around
>> because sessions are short lived entities and the chances of the same
>> session being alive by the time we've wrapped is pretty tiny.
> 
> I've got a few complaints about target resources being tied up because
> we don't reuse session IDs.  The ISID becomes a component in the
> I_T nexus identifier, so changing it invalidates persistent reservations.
> 
>> If you can demostrate a multi-target problem, perhaps we should rather
>> fix this by making the next session id a target local quantity?
> 
> Mike's got a good point that we don't really need to base the ISID off
> of our local session identifier (kobject name).  I think getting reuse
> right may be a bit trickier than being a target local value, because it
> needs to be unique across target portal groups.  Which probably furthers
> the argument that we should deal with that in the userspace tools.
> 
> If we plan to split the protocol ISID cleanly from the kobject name,
> I guess the question is if aggressive reuse of the local identifier is
> better than dealing with the unlikely collision on rollover?

I thought Lee's patch to convert the host_no from a atomic_t to ida
based was merged in Martin's tree. If that is going upstream, then I
thought you would want to fix the session id too.

Is the concern similar to /dev/sdX reuse and bad apps? In this case,
some app might do a logout and login, but not update the sysfs mapping.
You could then hit corruption due to the sysfs session id now mapping to
a different target.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] dpt_i2o: fix build warning

2016-02-16 Thread Sudip Mukherjee

On Tuesday 16 February 2016 02:47 PM, Johannes Thumshirn wrote:

On Tue, Feb 16, 2016 at 02:07:36PM +0530, Sudip Mukherjee wrote:

We were getting build warning about:
drivers/scsi/dpt_i2o.c:183:29: warning: ‘dptids’ defined but not used

dptids[] is only used in the MODULE_DEVICE_TABLE so when MODULE is not
defined then dptids[] becomes unused.


Nah. Care to make a proper pci driver from it instead of plastering yet
another #ifdef in?


Ok. I am on it. I hope someone has the hardware to check the changes.

regards
sudip
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fusion-mptbase: handle failed allocation for workqueue

2016-02-16 Thread Ewan Milne
On Mon, 2016-02-15 at 21:50 -0500, Insu Yun wrote:
> the failure of ioc->reset_work_q is checked,
> but not ioc->fw_event_q.
> 
> Signed-off-by: Insu Yun 
> ---
>  drivers/message/fusion/mptbase.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/message/fusion/mptbase.c 
> b/drivers/message/fusion/mptbase.c
> index 5dcc031..d4907a1 100644
> --- a/drivers/message/fusion/mptbase.c
> +++ b/drivers/message/fusion/mptbase.c
> @@ -1996,6 +1996,13 @@ mpt_attach(struct pci_dev *pdev, const struct 
> pci_device_id *id)
>   snprintf(ioc->fw_event_q_name, MPT_KOBJ_NAME_LEN, "mpt/%d", ioc->id);
>   ioc->fw_event_q = create_singlethread_workqueue(ioc->fw_event_q_name);
>  
> + if (!ioc->fw_event_q) {
> + destroy_workqueue(ioc->reset_work_q);
> + pci_release_selected_regions(pdev, ioc->bars);
> + kfree(ioc);
> + return -ENOMEM;
> + }
> +
>   if ((r = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_BRINGUP,
>   CAN_SLEEP)) != 0){
>   printk(MYIOC_s_ERR_FMT "didn't initialize properly! (%d)\n",

This does not look correct to me.  The error path for the call to
mpt_do_ioc_recovery() after create_singlethread_workqueue() for
ioc->fw_event_q does other cleanup, including:

list_del(>list);
if (ioc->alt_ioc)
ioc->alt_ioc->alt_ioc = NULL;
iounmap(ioc->memmap);

and

kfree(ioc);
pci_set_drvdata(pdev, NULL);

Here I think you are kfree()ing ioc while it is still on the _list,
which will cause a crash.

Note to Avago:  this code could use a symbolic return code identifier:

if (r != -5)
pci_release_selected_regions(pdev, ioc->bars);

In general, with sequential allocation of resources like this, error
handling can be performed using a series of goto's to labels at the
end of the function that release the resources in reverse order.  This
avoids the duplication of code within the function, and reduces the
chance for errors when the function is later modified.  See init_sd
in drivers/scsi/sd.c for an example.

Reviewed-by: Ewan D. Milne 





--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


aacraid: ioctl hang (4.5-rc4)

2016-02-16 Thread Roger Willcocks
Hi folks,

I've see this on multiple machines here with various kernels and
hardware configurations, the most canonical being:

hardware: HP Z840 + ASR8405 (4 x HGST raid 6 formatted as xfs)

Z840 firmware: v01.16
ASR8405 firmware: 7.8-0 (32730)

kernel: 4.5.4.rc4 tip with command line options 'ro single'

(Adaptec driver 1.2-1 (41024))

However the same thing happens with 2 and 4 x ASR81605ZQ on a Supermicro
motherboard with the Adaptec 1.2.1-40300 driver. The 30200 driver, fwiw,
causes a kernel panic (page fault in aac_intr_normal).

To reproduce, queue up a bunch of write requests:

# dd if=/dev/zero of=/file/on/raid &

then in a loop, run 'arcconf getversion' repeatedly. I'm using arcconf
1.7 (B21229)

Within thirty seconds or so (less than 100 repeats) arcconf hangs with
userspace stack:

#0  0x7f60103a9a47 in ioctl () from /lib64/libc.so.6
#1  0x0059f4bc in faos_SendReceiveFIB(FSAAPI_CONTEXT*, _FIB*,
unsigned int) ()
#2  0x0056de3f in FsaInternalSendReceiveFib(FSAAPI_CONTEXT*,
_FIB*, int, unsigned int) ()
#3  0x0058da76 in FsaGetAdapterSasPhyInfo ()
#4  0x004f97ea in ArcAdapter::initSasPhyInfo(Ret&) ()
#5  0x00507516 in ArcAdapter::buildChildren(Ret&) ()
#6  0x00533486 in ArcSystem::buildChildren(StorLib*, Ret&) ()
#7  0x00492170 in StorLib::getSystemConfig() ()
#8  0x004289c0 in main ()

The kernel's stuck in down_interruptible() called from aac_fib_send().
^C releases it (but you can repeat the experiment.)

I've done a small amount of digging and afaict the controller issues an
interrupt when the request completes, but the driver doesn't call up()
because the XFerState in the reply doesn't match to the request.

sent: 1010a1
rcvd: 804832ad

I don't really know enough about the controller protocol to debug this
further.

First things first I'd be grateful if somebody could confirm the hang.

--
Roger


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] hisi_sas: use slot abort in v2 hw

2016-02-16 Thread John Garry

On 16/02/2016 15:32, Hannes Reinecke wrote:

On 02/16/2016 01:22 PM, John Garry wrote:

When TRANS_TX_ERR_FRAME_TXED error occurs for
a command, the command should be re-attempted.

Signed-off-by: John Garry 
---
  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 58e1956..2bf93079b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, 
struct sas_task *task,
  /* by default, task resp is complete */
  static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
   struct sas_task *task,
-  struct hisi_sas_slot *slot)
+  struct hisi_sas_slot *slot,
+  int *abort_slot)
  {
struct task_status_struct *ts = >task_status;
struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
@@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
ts->stat = SAS_DATA_UNDERRUN;
break;
}
+   case TRANS_TX_ERR_FRAME_TXED:
+   {
+   /* This will request a retry */
+   ts->stat = SAS_QUEUE_FULL;
+   ++(*abort_slot);
+   break;
+   }
case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
case TRANS_TX_ERR_PHY_NOT_ENABLE:
case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
@@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct 
hisi_sas_slot *slot,

if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
+   int abort_slot = 0;
dev_dbg(dev, "%s slot %d has error info 0x%x\n",
__func__, slot->cmplt_queue_slot,
complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);

-   slot_err_v2_hw(hisi_hba, task, slot);
+   slot_err_v2_hw(hisi_hba, task, slot, _slot);
+   if (unlikely(abort_slot)) {
+   queue_work(hisi_hba->wq, >abort_slot);
+   sts = ts->stat;
+   goto out_1;
+   }
goto out;
}


Again this weird 'abort_slot' pointer reshuffling.
Care to clarify?

Cheers,

Hannes



Hopefully my explanation for patch #3 will help clarify here, as the 
code is functionally the same.


Thanks,
John



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()

2016-02-16 Thread John Garry

On 16/02/2016 15:33, Hannes Reinecke wrote:

On 02/16/2016 01:22 PM, John Garry wrote:

In high-datarate aging tests, it is found that
the SCSI framework can periodically
issue lu resets to the device. This is because scsi
commands begin to timeout. It is found that TASK SET
FULL may be returned many times for the same command,
causing the timeouts.
To overcome this, the queue depth for the device needs
to be reduced to 64 (from 256, set in
sas_slave_configure()).


Hmm. TASK SET FULL should cause the queue depth to be reduced
automatically, no?

Cheers,

Hannes



I need to double-check if Task set full reduces the depth, I don't think 
it does.


Regardless I found we were getting a combination of commands being 
retried due to Task Set Full and also SAS_QUEUE_FULL errors. For sure 
the SAS_QUEUE_FULL task errors reduce the queue depth in 
scsi_track_queue_full(). However I found it to be very slow in tracking, 
and we were getting commands timing out before the queue depth fell enough.


It would be nice to change default queue depth in sas_slave_configure() 
to a lower value so we can avoid this patch, but I am not sure if other 
vendor's HBA performance would be affected. From looking at the history 
of sas_slave_configure(), it would seem the value of 256 was inherited 
from mpt2sas driver, so I'm not sure.


Cheers,
John

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw

2016-02-16 Thread John Garry

On 16/02/2016 15:31, Hannes Reinecke wrote:

On 02/16/2016 01:22 PM, John Garry wrote:

When TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
command, the command should be re-attempted.

Signed-off-by: John Garry 
---
  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++
  1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index ce5f65d..34f71a1c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
  }

  /* by default, task resp is complete */
-static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
-  struct sas_task *task,
-  struct hisi_sas_slot *slot)
+static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
+  struct hisi_sas_slot *slot, int *abort_slot)
  {
struct task_status_struct *ts = >task_status;
struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
@@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
ts->stat = SAS_NAK_R_ERR;
break;
}
+   case TRANS_TX_CREDIT_TIMEOUT_ERR:
+   case TRANS_TX_CLOSE_NORMAL_ERR:
+   {
+   /* This will request a retry */
+   ts->stat = SAS_QUEUE_FULL;
+   ++(*abort_slot);
+   break;
+   }
default:
{
ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,

if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+   int abort_slot = 0;

-   slot_err_v1_hw(hisi_hba, task, slot);
+   slot_err_v1_hw(hisi_hba, task, slot,  _slot);
+   if (unlikely(abort_slot)) {
+   queue_work(hisi_hba->wq, >abort_slot);
+   sts = ts->stat;
+   goto out_1;
+   }
goto out;
}


What is the 'abort_slot' variable for?
Currently it's just a counter, no?
So why the weird pointer passing?

And it does feel weird. Apparently the driver does get a message,
but still has to abort the command. Why?
Isn't the message an indicator that the command has been aborted?

Cheers,

Hannes



I'll paste some more code for convenience and to help clarify:

static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
   struct hisi_sas_slot *slot, int abort)
{
...

if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
int abort_slot = 0;

slot_err_v1_hw(hisi_hba, task, slot,  _slot);
if (unlikely(abort_slot)) { /* check if we need to abort the 
task */

queue_work(hisi_hba->wq, >abort_slot);
sts = ts->stat;
goto out_1;
}
goto out;
}

 ...

out:
if (sas_dev && sas_dev->running_req)
sas_dev->running_req--;

hisi_sas_slot_task_free(hisi_hba, task, slot);
sts = ts->stat;

if (task->task_done)
task->task_done(task);
out_1:

return sts;
}

Variable abort_slot is really a boolean flag which can be set in 
slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or 
TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In this 
case we don't immediately complete the task (goto out and call 
hisi_sas_slot_task_free() and task->task_done()), but instead queue the 
task to be aborted in the device before completing (call queue_work() 
and then goto out_1).
When hisi_sas_slot_abort() [patch #2] runs in the workqueue for the 
task, it first aborts the task in the device with a TMF, and then 
completes the task. Finally the status (SAS_QUEUE_FULL) is passed back 
to SCSI framework, which will request a retry for the scsi command.


This is the method our hw people recommended to handle these types of 
errors.


Hope this explains,
Cheers,
John


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()

2016-02-16 Thread John Garry

On 16/02/2016 15:22, Hannes Reinecke wrote:

On 02/16/2016 01:22 PM, John Garry wrote:

Add a function to abort a slot (task) in the device
(if it is in the task set) and then cleanup and
complete the task.
The function is called from work queue context as
it cannot be called from the context where it is
triggered (interrupt).

Signed-off-by: John Garry 
---
  drivers/scsi/hisi_sas/hisi_sas.h  |  1 +
  drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++
  2 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 02da7e4..a05ce71 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -120,6 +120,7 @@ struct hisi_sas_slot {
dma_addr_t command_table_dma;
struct hisi_sas_sge_page *sge_page;
dma_addr_t sge_page_dma;
+   struct work_struct abort_slot;
  };

  struct hisi_sas_tmf_task {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index c600f5e..65509eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -15,6 +15,9 @@
  #define DEV_IS_GONE(dev) \
((!dev) || (dev->dev_type == SAS_PHY_UNUSED))

+static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
+   u8 *lun, struct hisi_sas_tmf_task *tmf);
+
  static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
  {
return device->port->ha->lldd_ha;
@@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba 
*hisi_hba,
return hisi_hba->hw->prep_stp(hisi_hba, slot);
  }

+/*
+ * This function will issue an abort TMF if a task is still in
+ * the target. Then it will do the task complete cleanup and
+ * callbacks.
+ */
+static void hisi_sas_slot_abort(struct work_struct *work)
+{
+   struct hisi_sas_slot *abort_slot =
+   container_of(work, struct hisi_sas_slot, abort_slot);
+   struct sas_task *task = abort_slot->task;
+   struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
+   struct scsi_cmnd *cmnd = task->uldd_task;
+   struct hisi_sas_tmf_task tmf_task;
+   struct domain_device *device = task->dev;
+   struct hisi_sas_device *sas_dev = device->lldd_dev;
+   struct scsi_lun lun;
+   int tag = abort_slot->idx, rc;
+
+   int_to_scsilun(cmnd->device->lun, );
+   tmf_task.tmf = TMF_QUERY_TASK;
+   tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+   rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
+
+   /* TMF_RESP_FUNC_SUCC means task is present in the task set */
+   if (rc != TMF_RESP_FUNC_SUCC)
+   goto out;
+   tmf_task.tmf = TMF_ABORT_TASK;
+   tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+   rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
+out:
+   hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
+   if (task->task_done)
+   task->task_done(task);
+   if (sas_dev && sas_dev->running_req)
+   sas_dev->running_req--;
+}
+
  static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba 
*hisi_hba,
  int is_tmf, struct hisi_sas_tmf_task *tmf,
  int *pass)

Do you really need to query the task first?
As per SAM a successful return from an ABORT TASK TMF has this meaning:

A response of FUNCTION COMPLETE shall indicate that the task was
aborted or was not in the task set.

Ie it doesn't matter if the task was present or not.

Cheers,

Hannes



I think that it should be ok to the abort without first querying. I'll 
just test this to double-check. (This would make the first patch in the 
series superflous)


Cheers,
John


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()

2016-02-16 Thread Hannes Reinecke
On 02/16/2016 01:22 PM, John Garry wrote:
> In high-datarate aging tests, it is found that
> the SCSI framework can periodically
> issue lu resets to the device. This is because scsi
> commands begin to timeout. It is found that TASK SET
> FULL may be returned many times for the same command,
> causing the timeouts.
> To overcome this, the queue depth for the device needs
> to be reduced to 64 (from 256, set in
> sas_slave_configure()).
> 
Hmm. TASK SET FULL should cause the queue depth to be reduced
automatically, no?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw

2016-02-16 Thread Hannes Reinecke
On 02/16/2016 01:22 PM, John Garry wrote:
> When TRANS_TX_CREDIT_TIMEOUT_ERR or
> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
> command, the command should be re-attempted.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
> b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> index ce5f65d..34f71a1c 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
>  }
>  
>  /* by default, task resp is complete */
> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
> -struct sas_task *task,
> -struct hisi_sas_slot *slot)
> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
> +struct hisi_sas_slot *slot, int *abort_slot)
>  {
>   struct task_status_struct *ts = >task_status;
>   struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>   ts->stat = SAS_NAK_R_ERR;
>   break;
>   }
> + case TRANS_TX_CREDIT_TIMEOUT_ERR:
> + case TRANS_TX_CLOSE_NORMAL_ERR:
> + {
> + /* This will request a retry */
> + ts->stat = SAS_QUEUE_FULL;
> + ++(*abort_slot);
> + break;
> + }
>   default:
>   {
>   ts->stat = SAM_STAT_CHECK_CONDITION;
> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba 
> *hisi_hba,
>  
>   if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>   !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
> + int abort_slot = 0;
>  
> - slot_err_v1_hw(hisi_hba, task, slot);
> + slot_err_v1_hw(hisi_hba, task, slot,  _slot);
> + if (unlikely(abort_slot)) {
> + queue_work(hisi_hba->wq, >abort_slot);
> + sts = ts->stat;
> + goto out_1;
> + }
>   goto out;
>   }
>  
What is the 'abort_slot' variable for?
Currently it's just a counter, no?
So why the weird pointer passing?

And it does feel weird. Apparently the driver does get a message,
but still has to abort the command. Why?
Isn't the message an indicator that the command has been aborted?

Cheers,

Hannes

-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] hisi_sas: use slot abort in v2 hw

2016-02-16 Thread Hannes Reinecke
On 02/16/2016 01:22 PM, John Garry wrote:
> When TRANS_TX_ERR_FRAME_TXED error occurs for
> a command, the command should be re-attempted.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
> b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> index 58e1956..2bf93079b 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> @@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, 
> struct sas_task *task,
>  /* by default, task resp is complete */
>  static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>  struct sas_task *task,
> -struct hisi_sas_slot *slot)
> +struct hisi_sas_slot *slot,
> +int *abort_slot)
>  {
>   struct task_status_struct *ts = >task_status;
>   struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
> @@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>   ts->stat = SAS_DATA_UNDERRUN;
>   break;
>   }
> + case TRANS_TX_ERR_FRAME_TXED:
> + {
> + /* This will request a retry */
> + ts->stat = SAS_QUEUE_FULL;
> + ++(*abort_slot);
> + break;
> + }
>   case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
>   case TRANS_TX_ERR_PHY_NOT_ENABLE:
>   case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
> @@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct 
> hisi_sas_slot *slot,
>  
>   if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
>   (!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
> + int abort_slot = 0;
>   dev_dbg(dev, "%s slot %d has error info 0x%x\n",
>   __func__, slot->cmplt_queue_slot,
>   complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
>  
> - slot_err_v2_hw(hisi_hba, task, slot);
> + slot_err_v2_hw(hisi_hba, task, slot, _slot);
> + if (unlikely(abort_slot)) {
> + queue_work(hisi_hba->wq, >abort_slot);
> + sts = ts->stat;
> + goto out_1;
> + }
>   goto out;
>   }
>  
Again this weird 'abort_slot' pointer reshuffling.
Care to clarify?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()

2016-02-16 Thread Hannes Reinecke
On 02/16/2016 01:22 PM, John Garry wrote:
> Add a function to abort a slot (task) in the device
> (if it is in the task set) and then cleanup and
> complete the task.
> The function is called from work queue context as
> it cannot be called from the context where it is
> triggered (interrupt).
> 
> Signed-off-by: John Garry 
> ---
>  drivers/scsi/hisi_sas/hisi_sas.h  |  1 +
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 43 
> +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
> b/drivers/scsi/hisi_sas/hisi_sas.h
> index 02da7e4..a05ce71 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>   dma_addr_t command_table_dma;
>   struct hisi_sas_sge_page *sge_page;
>   dma_addr_t sge_page_dma;
> + struct work_struct abort_slot;
>  };
>  
>  struct hisi_sas_tmf_task {
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index c600f5e..65509eb 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -15,6 +15,9 @@
>  #define DEV_IS_GONE(dev) \
>   ((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>  
> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
> + u8 *lun, struct hisi_sas_tmf_task *tmf);
> +
>  static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>  {
>   return device->port->ha->lldd_ha;
> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba 
> *hisi_hba,
>   return hisi_hba->hw->prep_stp(hisi_hba, slot);
>  }
>  
> +/*
> + * This function will issue an abort TMF if a task is still in
> + * the target. Then it will do the task complete cleanup and
> + * callbacks.
> + */
> +static void hisi_sas_slot_abort(struct work_struct *work)
> +{
> + struct hisi_sas_slot *abort_slot =
> + container_of(work, struct hisi_sas_slot, abort_slot);
> + struct sas_task *task = abort_slot->task;
> + struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
> + struct scsi_cmnd *cmnd = task->uldd_task;
> + struct hisi_sas_tmf_task tmf_task;
> + struct domain_device *device = task->dev;
> + struct hisi_sas_device *sas_dev = device->lldd_dev;
> + struct scsi_lun lun;
> + int tag = abort_slot->idx, rc;
> +
> + int_to_scsilun(cmnd->device->lun, );
> + tmf_task.tmf = TMF_QUERY_TASK;
> + tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
> +
> + rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
> +
> + /* TMF_RESP_FUNC_SUCC means task is present in the task set */
> + if (rc != TMF_RESP_FUNC_SUCC)
> + goto out;
> + tmf_task.tmf = TMF_ABORT_TASK;
> + tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
> +
> + rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
> +out:
> + hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
> + if (task->task_done)
> + task->task_done(task);
> + if (sas_dev && sas_dev->running_req)
> + sas_dev->running_req--;
> +}
> +
>  static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba 
> *hisi_hba,
> int is_tmf, struct hisi_sas_tmf_task *tmf,
> int *pass)
Do you really need to query the task first?
As per SAM a successful return from an ABORT TASK TMF has this meaning:

A response of FUNCTION COMPLETE shall indicate that the task was
aborted or was not in the task set.

Ie it doesn't matter if the task was present or not.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check

2016-02-16 Thread Hannes Reinecke
On 02/16/2016 01:22 PM, John Garry wrote:
> When a tmf is issued, various response codes can be
> returned from the target. For a query tmf the
> response may be TMF_RESP_FUNC_COMPLETE or
> TMF_RESP_FUNC_SUCC.
> Add a condition for TMF_RESP_FUNC_SUCC.
> Also, check for SAM_STAT_GOOD is replaced with
> TMF_RESP_FUNC_COMPLETE, which is a genuine tmf
> response code. SAM_STAT_GOOD and
> TMF_RESP_FUNC_COMPLETE have the same value, so
> this is why it worked before.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 2194917..c600f5e 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -661,12 +661,18 @@ static int hisi_sas_exec_internal_tmf_task(struct 
> domain_device *device,
>   }
>  
>   if (task->task_status.resp == SAS_TASK_COMPLETE &&
> - task->task_status.stat == SAM_STAT_GOOD) {
> +  task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
>   res = TMF_RESP_FUNC_COMPLETE;
>   break;
>   }
>  
>   if (task->task_status.resp == SAS_TASK_COMPLETE &&
> + task->task_status.stat == TMF_RESP_FUNC_SUCC) {
> + res = TMF_RESP_FUNC_SUCC;
> + break;
> + }
> +
> + if (task->task_status.resp == SAS_TASK_COMPLETE &&
> task->task_status.stat == SAS_DATA_UNDERRUN) {
>   /* no error, but return the number of bytes of
>* underrun
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] esas2r: Fix array overrun

2016-02-16 Thread Tomas Henzl
On 15.2.2016 20:01, Alan wrote:
> Check the array size *before* dereferencing it with a user provided offset
>
> Signed-off-by: Alan Cox 

Reviewed-by: Tomas Henzl 

Tomas

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled

2016-02-16 Thread jiangyiwen
On 2016/2/5 11:13, Martin K. Petersen wrote:
>> "Yiwen" == jiangyiwen   writes:
> 
> Yiwen,
> 
> Yiwen> First, I don't understand why blk_peek_request() return
> Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
> Yiwen> command without sending to device, and I think EREMOTEIO should
> Yiwen> be returned only when IO has already sent to device, maybe I
> Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
> Yiwen> the errno with EOPNOTSUPP?
> 
> DM currently has special handling for EREMOTEIO failures (because that's
> what we'd return when a device responds with ILLEGAL REQUEST).
> 
> I am not opposed to returning EOPNOTSUPP but it would require more
> changes and since this is a bugfix for stable I want to keep it as small
> as possible.
> 
> Yiwen> In addition, I still worried with whether there has other
> Yiwen> situations which will return EIO or other error. In this way,
> Yiwen> MD/DM still can happen this type of problem, so I think may be in
> Yiwen> multipath we still needs a protection to avoid it.
> 
> There are various error scenarios where we can end up bailing with a
> BLKPREP_KILL. But the general rule of thumb is that these conditions all
> demand a retry. The optional commands like WRITE SAME and UNMAP are
> special in that they are irrecoverable.
> 
> Yiwen> At last, I have a additional problem, I remember that you
> Yiwen> previously send a series of patches about XCOPY, why don't have
> Yiwen> any news latter later? I very much expect that I can see these
> Yiwen> patches which are merged into kernel.
> 
> I am working on a refresh of the series that includes token-based copy
> offload support in addition to EXTENDED COPY. The patches depend on Mike
> Christie's request flag patch series which has yet to be merged.
> 
Hi Martin,
I tested this code, but found a problem.

When called scsi_prep_fn return BLKPREP_INVALID, we should
use the same code with BLKPREP_KILL in scsi_prep_return. This
patch should add a line of code as follows:

---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..d8d2198 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1343,6 +1343,7 @@ scsi_prep_return(struct request_queue *q, struct request 
*req, int ret)

switch (ret) {
case BLKPREP_KILL:
+   case BLKPREP_INVALID:
req->errors = DID_NO_CONNECT << 16;
/* release the command and kill it */
if (req->special) {
-- 
1.8.4.3

Thanks,
Yiwen Jiang.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()

2016-02-16 Thread John Garry
Add a function to abort a slot (task) in the device
(if it is in the task set) and then cleanup and
complete the task.
The function is called from work queue context as
it cannot be called from the context where it is
triggered (interrupt).

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas.h  |  1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 02da7e4..a05ce71 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -120,6 +120,7 @@ struct hisi_sas_slot {
dma_addr_t command_table_dma;
struct hisi_sas_sge_page *sge_page;
dma_addr_t sge_page_dma;
+   struct work_struct abort_slot;
 };
 
 struct hisi_sas_tmf_task {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index c600f5e..65509eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -15,6 +15,9 @@
 #define DEV_IS_GONE(dev) \
((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
 
+static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
+   u8 *lun, struct hisi_sas_tmf_task *tmf);
+
 static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
 {
return device->port->ha->lldd_ha;
@@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba 
*hisi_hba,
return hisi_hba->hw->prep_stp(hisi_hba, slot);
 }
 
+/*
+ * This function will issue an abort TMF if a task is still in
+ * the target. Then it will do the task complete cleanup and
+ * callbacks.
+ */
+static void hisi_sas_slot_abort(struct work_struct *work)
+{
+   struct hisi_sas_slot *abort_slot =
+   container_of(work, struct hisi_sas_slot, abort_slot);
+   struct sas_task *task = abort_slot->task;
+   struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
+   struct scsi_cmnd *cmnd = task->uldd_task;
+   struct hisi_sas_tmf_task tmf_task;
+   struct domain_device *device = task->dev;
+   struct hisi_sas_device *sas_dev = device->lldd_dev;
+   struct scsi_lun lun;
+   int tag = abort_slot->idx, rc;
+
+   int_to_scsilun(cmnd->device->lun, );
+   tmf_task.tmf = TMF_QUERY_TASK;
+   tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+   rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
+
+   /* TMF_RESP_FUNC_SUCC means task is present in the task set */
+   if (rc != TMF_RESP_FUNC_SUCC)
+   goto out;
+   tmf_task.tmf = TMF_ABORT_TASK;
+   tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+   rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, _task);
+out:
+   hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
+   if (task->task_done)
+   task->task_done(task);
+   if (sas_dev && sas_dev->running_req)
+   sas_dev->running_req--;
+}
+
 static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
  int is_tmf, struct hisi_sas_tmf_task *tmf,
  int *pass)
@@ -206,6 +248,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct 
hisi_hba *hisi_hba,
slot->task = task;
slot->port = port;
task->lldd_task = slot;
+   INIT_WORK(>abort_slot, hisi_sas_slot_abort);
 
slot->status_buffer = dma_pool_alloc(hisi_hba->status_buffer_pool,
 GFP_ATOMIC,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] hisi_sas: add abort and retry feature

2016-02-16 Thread John Garry
This patchset introduces support to abort
certain commands which have failed and retry.
Certain errors require that the command be
retried, like TRANS_TX_CREDIT_TIMEOUT_ERR in
v1 hw.
However, when these errors occur the IO may
still be active, so the IO must first be
aborted, and then retried. The HiSilicon
SAS controller has no FW to do this work, so
it needs to be done manually.

John Garry (6):
  hisi_sas: add TMF_RESP_FUNC_SUCC check
  hisi_sas: add hisi_sas_slot_abort()
  hisi_sas: use slot abort in v1 hw
  hisi_sas: use slot abort in v2 hw
  hisi_sas: add hisi_sas_slave_configure()
  hisi_sas: update driver version to 1.3

 drivers/scsi/hisi_sas/hisi_sas.h   |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 66 --
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 +---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 --
 4 files changed, 101 insertions(+), 9 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] hisi_sas: use slot abort in v2 hw

2016-02-16 Thread John Garry
When TRANS_TX_ERR_FRAME_TXED error occurs for
a command, the command should be re-attempted.

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 58e1956..2bf93079b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, 
struct sas_task *task,
 /* by default, task resp is complete */
 static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
   struct sas_task *task,
-  struct hisi_sas_slot *slot)
+  struct hisi_sas_slot *slot,
+  int *abort_slot)
 {
struct task_status_struct *ts = >task_status;
struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
@@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
ts->stat = SAS_DATA_UNDERRUN;
break;
}
+   case TRANS_TX_ERR_FRAME_TXED:
+   {
+   /* This will request a retry */
+   ts->stat = SAS_QUEUE_FULL;
+   ++(*abort_slot);
+   break;
+   }
case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
case TRANS_TX_ERR_PHY_NOT_ENABLE:
case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
@@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct 
hisi_sas_slot *slot,
 
if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
+   int abort_slot = 0;
dev_dbg(dev, "%s slot %d has error info 0x%x\n",
__func__, slot->cmplt_queue_slot,
complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
 
-   slot_err_v2_hw(hisi_hba, task, slot);
+   slot_err_v2_hw(hisi_hba, task, slot, _slot);
+   if (unlikely(abort_slot)) {
+   queue_work(hisi_hba->wq, >abort_slot);
+   sts = ts->stat;
+   goto out_1;
+   }
goto out;
}
 
@@ -1555,6 +1569,7 @@ out:
 
if (task->task_done)
task->task_done(task);
+out_1:
 
return sts;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] hisi_sas: use slot abort in v1 hw

2016-02-16 Thread John Garry
When TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
command, the command should be re-attempted.

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index ce5f65d..34f71a1c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
 }
 
 /* by default, task resp is complete */
-static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
-  struct sas_task *task,
-  struct hisi_sas_slot *slot)
+static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
+  struct hisi_sas_slot *slot, int *abort_slot)
 {
struct task_status_struct *ts = >task_status;
struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
@@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
ts->stat = SAS_NAK_R_ERR;
break;
}
+   case TRANS_TX_CREDIT_TIMEOUT_ERR:
+   case TRANS_TX_CLOSE_NORMAL_ERR:
+   {
+   /* This will request a retry */
+   ts->stat = SAS_QUEUE_FULL;
+   ++(*abort_slot);
+   break;
+   }
default:
{
ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
 
if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+   int abort_slot = 0;
 
-   slot_err_v1_hw(hisi_hba, task, slot);
+   slot_err_v1_hw(hisi_hba, task, slot,  _slot);
+   if (unlikely(abort_slot)) {
+   queue_work(hisi_hba->wq, >abort_slot);
+   sts = ts->stat;
+   goto out_1;
+   }
goto out;
}
 
@@ -1375,6 +1388,7 @@ out:
 
if (task->task_done)
task->task_done(task);
+out_1:
 
return sts;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] hisi_sas: update driver version to 1.3

2016-02-16 Thread John Garry
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index a05ce71..beb7318 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -23,7 +23,7 @@
 #include 
 #include 
 
-#define DRV_VERSION "v1.2"
+#define DRV_VERSION "v1.3"
 
 #define HISI_SAS_MAX_PHYS  9
 #define HISI_SAS_MAX_QUEUES32
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check

2016-02-16 Thread John Garry
When a tmf is issued, various response codes can be
returned from the target. For a query tmf the
response may be TMF_RESP_FUNC_COMPLETE or
TMF_RESP_FUNC_SUCC.
Add a condition for TMF_RESP_FUNC_SUCC.
Also, check for SAM_STAT_GOOD is replaced with
TMF_RESP_FUNC_COMPLETE, which is a genuine tmf
response code. SAM_STAT_GOOD and
TMF_RESP_FUNC_COMPLETE have the same value, so
this is why it worked before.

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2194917..c600f5e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -661,12 +661,18 @@ static int hisi_sas_exec_internal_tmf_task(struct 
domain_device *device,
}
 
if (task->task_status.resp == SAS_TASK_COMPLETE &&
-   task->task_status.stat == SAM_STAT_GOOD) {
+task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
res = TMF_RESP_FUNC_COMPLETE;
break;
}
 
if (task->task_status.resp == SAS_TASK_COMPLETE &&
+   task->task_status.stat == TMF_RESP_FUNC_SUCC) {
+   res = TMF_RESP_FUNC_SUCC;
+   break;
+   }
+
+   if (task->task_status.resp == SAS_TASK_COMPLETE &&
  task->task_status.stat == SAS_DATA_UNDERRUN) {
/* no error, but return the number of bytes of
 * underrun
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()

2016-02-16 Thread John Garry
In high-datarate aging tests, it is found that
the SCSI framework can periodically
issue lu resets to the device. This is because scsi
commands begin to timeout. It is found that TASK SET
FULL may be returned many times for the same command,
causing the timeouts.
To overcome this, the queue depth for the device needs
to be reduced to 64 (from 256, set in
sas_slave_configure()).

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 65509eb..dde7c5a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -454,6 +454,19 @@ static int hisi_sas_dev_found(struct domain_device *device)
return 0;
 }
 
+static int hisi_sas_slave_configure(struct scsi_device *sdev)
+{
+   struct domain_device *dev = sdev_to_domain_dev(sdev);
+   int ret = sas_slave_configure(sdev);
+
+   if (ret)
+   return ret;
+   if (!dev_is_sata(dev))
+   sas_change_queue_depth(sdev, 64);
+
+   return 0;
+}
+
 static void hisi_sas_scan_start(struct Scsi_Host *shost)
 {
struct hisi_hba *hisi_hba = shost_priv(shost);
@@ -997,7 +1010,7 @@ static struct scsi_host_template hisi_sas_sht = {
.name   = DRV_NAME,
.queuecommand   = sas_queuecommand,
.target_alloc   = sas_target_alloc,
-   .slave_configure= sas_slave_configure,
+   .slave_configure= hisi_sas_slave_configure,
.scan_finished  = hisi_sas_scan_finished,
.scan_start = hisi_sas_scan_start,
.change_queue_depth = sas_change_queue_depth,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: fix soft lockup in scsi_remove_target() on module removal

2016-02-16 Thread Johannes Thumshirn
On Wed, Feb 10, 2016 at 08:03:26AM -0800, James Bottomley wrote:
> This softlockup is currently happening:
> 
> [  444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! 
> [kworker/1:1:29]
> [  444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) 
> scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd 
> grace nfs_acl auth_rpcgss sunrpc ed
> d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt 
> snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support 
> ppdev snd_hda_intel snd_hda_codec snd_hda
> _core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp 
> acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod 
> cdrom soundcore floppy hwmon sg 8250_
> fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt 
> sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan 
> ata_generic ata_piix libata th
> ermal
> [  444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G   O
> 4.4.0-rc5-2.g1e923a3-default #1
> [  444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E   /D2164-A1, 
> BIOS 5.00 R1.10.2164.A1   05/08/2006
> [  444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc]
> [  444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000
> [  444.088002] EIP: 0060:[] EFLAGS: 0286 CPU: 1
> [  444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20
> [  444.088002] EAX: 0286 EBX: f20d3800 ECX: 0002 EDX: 0286
> [  444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8
> [  444.088002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [  444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 06d0
> [  444.088002] Stack:
> [  444.088002]  f6269eec c066b0f7 0286 f2146848 f50ba808 f50ba800 
> f50ba800 f2146a90
> [  444.088002]  f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 
> f619fa00 0040
> [  444.088002]  f6269f40 c026cb25 0001 166c6392 0061 f6757140 
> f6136340 0004
> [  444.088002] Call Trace:
> [  444.088002]  [] scsi_remove_target+0x167/0x1c0
> [  444.088002]  [] fc_rport_final_delete+0x9d/0x1e0 
> [scsi_transport_fc]
> [  444.088002]  [] process_one_work+0x155/0x3e0
> [  444.088002]  [] worker_thread+0x37/0x490
> [  444.088002]  [] kthread+0x9b/0xb0
> [  444.088002]  [] ret_from_kernel_thread+0x21/0x40
> 
> What appears to be happening is that something has pinned the target
> so it can't go into STARGET_DEL via final release and the loop in
> scsi_remove_target spins endlessly until that happens.
> 
> The fix for this soft lockup is to not keep looping over a device that
> we've called remove on but which hasn't gone into DEL state.  This
> patch will retain a simplistic memory of the last target and not keep
> looping over it.
> 
> Reported-by: Sebastian Herbszt 
> Tested-by: Sebastian Herbszt 
> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
> Cc: sta...@vger.kernel.org
> Signed-off-by: James Bottomley 

Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] dpt_i2o: fix build warning

2016-02-16 Thread Johannes Thumshirn
On Tue, Feb 16, 2016 at 02:07:36PM +0530, Sudip Mukherjee wrote:
> We were getting build warning about:
> drivers/scsi/dpt_i2o.c:183:29: warning: ‘dptids’ defined but not used
> 
> dptids[] is only used in the MODULE_DEVICE_TABLE so when MODULE is not
> defined then dptids[] becomes unused.

Nah. Care to make a proper pci driver from it instead of plastering yet
another #ifdef in?

> 
> Signed-off-by: Sudip Mukherjee 
> ---
>  drivers/scsi/dpt_i2o.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
> index d4cda5e..21c8d21 100644
> --- a/drivers/scsi/dpt_i2o.c
> +++ b/drivers/scsi/dpt_i2o.c
> @@ -180,11 +180,14 @@ static u8 adpt_read_blink_led(adpt_hba* host)
>   
> *
>   */
>  
> +#ifdef MODULE
>  static struct pci_device_id dptids[] = {
>   { PCI_DPT_VENDOR_ID, PCI_DPT_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
>   { PCI_DPT_VENDOR_ID, PCI_DPT_RAPTOR_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
>   { 0, }
>  };
> +#endif
> +
>  MODULE_DEVICE_TABLE(pci,dptids);
>  
>  static int adpt_detect(struct scsi_host_template* sht)
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] dpt_i2o: fix build warning

2016-02-16 Thread Sudip Mukherjee
We were getting build warning about:
drivers/scsi/dpt_i2o.c:183:29: warning: ‘dptids’ defined but not used

dptids[] is only used in the MODULE_DEVICE_TABLE so when MODULE is not
defined then dptids[] becomes unused.

Signed-off-by: Sudip Mukherjee 
---
 drivers/scsi/dpt_i2o.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index d4cda5e..21c8d21 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -180,11 +180,14 @@ static u8 adpt_read_blink_led(adpt_hba* host)
  *
  */
 
+#ifdef MODULE
 static struct pci_device_id dptids[] = {
{ PCI_DPT_VENDOR_ID, PCI_DPT_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
{ PCI_DPT_VENDOR_ID, PCI_DPT_RAPTOR_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
{ 0, }
 };
+#endif
+
 MODULE_DEVICE_TABLE(pci,dptids);
 
 static int adpt_detect(struct scsi_host_template* sht)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html