date:20130424

Re: [PATCH v2] qla2xxx: Fix for locking issue between driver ISR and mailbox routines

2013-04-24 Thread Saurav Kashyap


Acked-by: Saurav Kashyap saurav.kash...@qlogic.com

Thanks,
~Saurav

The driver uses ha-mbx_cmd_flags variable to pass information between
its ISR and mailbox routines, however, it does so without the protection
of
any locks.  Under certain conditions, this can lead to multiple mailbox
command completions being signaled, which, in turn, leads to a false
mailbox timeout error for the subsequently issued mailbox command.

The issue occurs frequently but intermittenly with the Qlogic 8GFC mezz
card during card initialization, resulting in card initialization failure.

Signed-off-by: Gurinder (Sunny) Shergill gurinder.sherg...@hp.com
---
 drivers/scsi/qla2xxx/qla_inline.h | 11 +++
 drivers/scsi/qla2xxx/qla_isr.c| 27 ---
 drivers/scsi/qla2xxx/qla_mbx.c|  2 --
 drivers/scsi/qla2xxx/qla_mr.c | 10 ++
 drivers/scsi/qla2xxx/qla_nx.c | 26 ++
 5 files changed, 27 insertions(+), 49 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_inline.h
b/drivers/scsi/qla2xxx/qla_inline.h
index 98ab921..0a5c895 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -278,3 +278,14 @@ qla2x00_do_host_ramp_up(scsi_qla_host_t *vha)

   set_bit(HOST_RAMP_UP_QUEUE_DEPTH, vha-dpc_flags);
 }
+
+static inline void
+qla2x00_handle_mbx_completion(struct qla_hw_data *ha, int status)
+{
+  if (test_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags) 
+  (status  MBX_INTERRUPT)  ha-flags.mbox_int) {
+  set_bit(MBX_INTERRUPT, ha-mbx_cmd_flags);
+  clear_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags);
+  complete(ha-mbx_intr_comp);
+  }
+}
diff --git a/drivers/scsi/qla2xxx/qla_isr.c
b/drivers/scsi/qla2xxx/qla_isr.c
index 259d920..d2a4c75 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -104,14 +104,9 @@ qla2100_intr_handler(int irq, void *dev_id)
   RD_REG_WORD(reg-hccr);
   }
   }
+  qla2x00_handle_mbx_completion(ha, status);
   spin_unlock_irqrestore(ha-hardware_lock, flags);

-  if (test_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags) 
-  (status  MBX_INTERRUPT)  ha-flags.mbox_int) {
-  set_bit(MBX_INTERRUPT, ha-mbx_cmd_flags);
-  complete(ha-mbx_intr_comp);
-  }
-
   return (IRQ_HANDLED);
 }

@@ -221,14 +216,9 @@ qla2300_intr_handler(int irq, void *dev_id)
   WRT_REG_WORD(reg-hccr, HCCR_CLR_RISC_INT);
   RD_REG_WORD_RELAXED(reg-hccr);
   }
+  qla2x00_handle_mbx_completion(ha, status);
   spin_unlock_irqrestore(ha-hardware_lock, flags);

-  if (test_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags) 
-  (status  MBX_INTERRUPT)  ha-flags.mbox_int) {
-  set_bit(MBX_INTERRUPT, ha-mbx_cmd_flags);
-  complete(ha-mbx_intr_comp);
-  }
-
   return (IRQ_HANDLED);
 }

@@ -2613,14 +2603,9 @@ qla24xx_intr_handler(int irq, void *dev_id)
   if (unlikely(IS_QLA83XX(ha)  (ha-pdev-revision == 1)))
   ndelay(3500);
   }
+  qla2x00_handle_mbx_completion(ha, status);
   spin_unlock_irqrestore(ha-hardware_lock, flags);

-  if (test_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags) 
-  (status  MBX_INTERRUPT)  ha-flags.mbox_int) {
-  set_bit(MBX_INTERRUPT, ha-mbx_cmd_flags);
-  complete(ha-mbx_intr_comp);
-  }
-
   return IRQ_HANDLED;
 }

@@ -2763,13 +2748,9 @@ qla24xx_msix_default(int irq, void *dev_id)
   }
   WRT_REG_DWORD(reg-hccr, HCCRX_CLR_RISC_INT);
   } while (0);
+  qla2x00_handle_mbx_completion(ha, status);
   spin_unlock_irqrestore(ha-hardware_lock, flags);

-  if (test_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags) 
-  (status  MBX_INTERRUPT)  ha-flags.mbox_int) {
-  set_bit(MBX_INTERRUPT, ha-mbx_cmd_flags);
-  complete(ha-mbx_intr_comp);
-  }
   return IRQ_HANDLED;
 }

diff --git a/drivers/scsi/qla2xxx/qla_mbx.c
b/drivers/scsi/qla2xxx/qla_mbx.c
index 9e5d89d..3587ec2 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -179,8 +179,6 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha,
mbx_cmd_t *mcp)

   wait_for_completion_timeout(ha-mbx_intr_comp, mcp-tov * HZ);

-  clear_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags);
-
   } else {
   ql_dbg(ql_dbg_mbx, vha, 0x1011,
   Cmd=%x Polling Mode.\n, command);
diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index 729b743..5483da8 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -148,9 +148,6 @@ qlafx00_mailbox_command(scsi_qla_host_t *vha, struct
mbx_cmd_32 *mcp)
   spin_unlock_irqrestore(ha-hardware_lock, flags);

   wait_for_completion_timeout(ha-mbx_intr_comp, mcp-tov * HZ);
-
-  clear_bit(MBX_INTR_WAIT, ha-mbx_cmd_flags);
-
   } else {
   ql_dbg(ql_dbg_mbx,

sg_ses -j shows Transport protocol: Oxc not decoded

2013-04-24 Thread Chris Dunlop

Hi,

I have 3 boxes, each with an LSI 9211-8i and a mix of LSI expanders (Supermicro
SAS-846EL2, SAS-826EL2). For some of my expanders, 'sg_ses -j' (originally
sg3_utils 1.33, now 1.35) is showing:

Slot 24 [0,23]  Element type: Array device slot
  ...
  Additional Element Status:
Transport protocol: Oxc not decoded

...where the slot contains a SATA device. It's always Slot 24, and other
slots show up fine. E.g. on one of the expanders with SATA drives in
both Slot 23 and 24:

h3# sg_ses -j /dev/sg81
  LSI   SAS2X36   0e0b
  Primary enclosure logical identifier (hex): 500304800013453f
...
Slot 23 [0,22]  Element type: Array device slot
  Enclosure Status:
Predicted failure=0, Disabled=0, Swap=0, status: OK
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
Transport protocol: SAS
number of phys: 1, not all phys: 0, device slot number: 22
phy index: 0
  device type: no device attached
  initiator port for:
  target port for: SATA_device
  attached SAS address: 0x500304800013453f
  SAS address: 0x5003048000134522
  phy identifier: 0x0
Slot 24 [0,23]  Element type: Array device slot
  Enclosure Status:
Predicted failure=0, Disabled=0, Swap=0, status: OK
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
Transport protocol: Oxc not decoded
...

This may be unrelated, but 'sg_ses -j' is also coming up with the
following error on 3 of the 6 expanders identified as LSI SAS2X36 0e0b
(this doesn't include any of the expanders with the Slot 24 problem):

join_work: oi=6, ei=255 (broken_ei=0) not in join_arr


The expander types are:

--
$ for h in h1 h2 h3; do echo === $h ===
  ssh $h 'lsscsi | grep enclosu'
done
=== h1 ===
[0:0:24:0]   enclosu LSI CORP SAS2X36  0717  -
[0:0:27:0]   enclosu LSI  SAS2X36  0e0b  -
[0:0:38:0]   enclosu LSI CORP SAS2X28  0717  -
[0:0:62:0]   enclosu LSI  SAS2X36  0e0b  -
[0:0:85:0]   enclosu LSI  SAS2X36  0e0b  -
=== h2 ===
[0:0:25:0]   enclosu LSI CORP SAS2X36  0717  -
[0:0:29:0]   enclosu LSI CORP SAS2X28  0717  -
=== h3 ===
[0:0:23:0]   enclosu LSI CORP SAS2X36  0717  -
[0:0:45:0]   enclosu LSI  SAS2X36  0e0b  -
[0:0:57:0]   enclosu LSI CORP SAS2X28  0717  -
[0:0:81:0]   enclosu LSI  SAS2X36  0e0b  -
[0:0:88:0]   enclosu LSI  SAS2X36  0e0b  -
--

...and they're daisy-chained like this:

--
for h in b2 b4 b5; do echo === $h ===
  ssh $h 'find /sys/bus/scsi/devices/host0/ -name expander\* | egrep -v 
bsg|sas_(expander|device)' 
done
=== h1 ===
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1/port-0:1:25/expander-0:4
/sys/bus/scsi/devices/host0/port-0:1/expander-0:2
/sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3
=== h2 ===
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0
/sys/bus/scsi/devices/host0/port-0:1/expander-0:1
=== h3 ===
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0
/sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1
/sys/bus/scsi/devices/host0/port-0:1/expander-0:2
/sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3
/sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3/port-0:3:0/expander-0:4
--

(Sorry, I don't know how to relate the /sys/bus/scsi stuff to the scsi ids or
/dev/sgXX.)

The errors are showing up like:

--
$ for h in h1 h2 h3; do
  ssh $h '
for d in $(lsscsi -tg | awk \$2 == \enclosu\ { print \$5 }); do
  echo === $(hostname):$d ===
  sg_ses -j $d 21 
done
  '
done | egrep 'LSI|^=|^Slot 24|join_work|not decoded' | sed -r 's/^=/\n=/'

=== h1:/dev/sg24 ===
  LSI CORP  SAS2X36   0717
Slot 24 [0,23]  Element type: Array device slot

=== h1:/dev/sg27 ===
  LSI

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Ric Wheeler


Hi Rob,

Comments inline below.

On 04/24/2013 01:44 AM, Elliott, Robert (Server Storage) wrote:

If the writeback cache is enabled (per the WCE bit in the Caching mode page),
prudent software uses the FUA bit in WRITE commands when writing metadata
and/or sends the SYNCHRONIZE CACHE command at important checkpoints to
ensure the data is not going to be lost due to a power loss.  Some
database software is particularly prolific at sending these commands.

Around 2003, many RAID controllers with non-volatile writeback caches honored
the SYNCHRONIZE CACHE command, flushing the entire cache to the drives.  This
started causing timeouts as non-volatile write cache sizes grew.  Recently,
it's even causing trouble on individual disk drives with growing volatile
write caches.

The intent of software using these commands and bits was unclear - it could be:
a) ensure data is in non-volatile cache (and will eventually be flushed)
or on the medium; or
b) ensure data is on the medium (so the drives are ready for removal).




Linux issues SYNCHRONIZE_CACHE commands when we need to make sure that the data 
needs to be crash safe (after a transaction commit from a file system journal, 
an explicit fsync call or write system call with O_SYNC set).


If the cache is nonvolatile (i.e., the target will have it after a power outage 
or reboot), we are fine - pretty much your (a) clause above.


Not sure we have thought through (or can control) how an array would handle 
pulling a drive from behind a RAID controller that has not flushed its state.


As a short-term fix, many RAID controllers assumed intent (a) and started
interpreting the SYNCHRONIZE CACHE command as a NOP and ignoring the FUA bit.


We have seen problems with some RAID controllers that leave the write cache 
enabled on back end drives - their cache is battery backed, but the cache on 
those backend drives is exposed to certain data loss on a power outage.


It would be nice if they always disabled the write cache on the backend drives 
*or* advertised WCE and propagated the SYNCHRONIZE_CACHE commands to each drive 
when we send them down.


Surprise removal of a drive from a RAID controller is risky even if software
has run SYNCHRONIZE CACHE, since the RAID controller might be doing other
activity in the background. So, there are other reasons to justify assuming
that the user just won't do that.

Afraid of breaking software with intent (b) (which was more likely in the
days of floppy disks, Bournelli Boxes, and other removable block devices),
T10 chose to clarify that the original meaning was (b) and added new
FUA_NV and SYNC_NV bits to let software express intent (a).  The hope
was that devices would implement the bits and software would start using
them at appropriate times.

Unfortunately, the short-term fix worked well enough that it still prevails
today, and most standalone removable media block devices have disappeared.
There is not much software actually sending the FUA_NV and SYNC_NV bits
and few devices honoring the bits per the standard.

As an SBC-3 letter ballot comment, I recently submitted T10 proposal
13-050 (see http://www.t10.org/doc13.htm) to obsolete the SYNC_NV and
FUA_NV bits and change the meaning of the commands without those bits
to intent (a), reflecting what the industry has actually done.


This is definitely something that we should review and take into account going 
forward.


It does sound like we have a lot of confusion around WCE meaning in the storage 
industry today though, which leads me to think that we will need to allow raw 
block accessing applications to manually override our flush settings (reluctantly!).


Regards,

Ric







-Original Message-
From: linux-scsi-ow...@vger.kernel.org 
[mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Jeremy Linton
Sent: Tuesday, April 23, 2013 5:40 PM
To: James Bottomley
Cc: Ric Wheeler; linux-scsi@vger.kernel.org; Martin K. Petersen; Jeff Moyer; 
Tejun Heo; Mike Snitzer; dgilb...@interlog.com
Subject: Re: T10 WCE interpretation in Linux  device level access

On 4/23/2013 3:07 PM, James Bottomley wrote:


I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
which if unset (which it is in our implementation) means only sync your
non-NV cache.  For a device with all NV, that equates to nop.

Yes, linux leaves the SYNC_NV bit unset in scsi_setup_flush_cmnd().

The draft specs, and a couple others I have laying about says: says the device
shall sync cache to medium for both volatile and non volatile cache data if
SYNC_NV is _unset_.

With it set, the table could be more confusing!

For volatile cache blocks with SYNC_NV set If a non-volatile cache is present,
then the device server shall synchronize to non-volatile cache or to the medium.
If a non-volatile cache is not present, then the device server shall synchronize
to the medium.

And for Non-volatile cache with it set No Requirement


Which to me says, don't expect any particular behavior

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Paolo Bonzini

Il 23/04/2013 22:07, James Bottomley ha scritto:
 On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
 For many years, we have used WCE as an indication that a device has a 
 volatile 
 write cache (not just a write cache) and used this as a trigger to send down 
 SYNCHRONIZE_CACHE commands as needed.

 Some arrays with non-volatile cache seem to have WCE set and simply ignore 
 the 
 command.
 
 I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
 which if unset (which it is in our implementation) means only sync your
 non-NV cache.  For a device with all NV, that equates to nop.

Isn't it the other way round?

SYNC_NV = 0 means sync all your caches to the medium, and it's what we do.

SYNC_NV = 1 means sync volatile to non-volatile, and it's what Ric wants.

So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
is non-removable just to err on the safe side.

Ric, can you check what your arrays set in VPD page 0x86 byte 6 bit 1?

Paolo

 Some arrays with non-volatile cache seem to not set WCE.

 Others arrays with non-volatile cache - our problem arrays - set WCE and do 
 something horrible and slow when sent the SYNCHRONIZE_CACHE commands.
 
 These arrays sound to be out of spec, so we should probably just
 blacklist them.
 
 Note that for file systems, you can override this behavior by mounting with 
 our 
 barriers disabled (mount -o nobarrier .). There is currently no way do 
 disable this for anything using the device directly, not through the file 
 system.

 Some applications run against block devices - not through a file system - 
 and 
 want not to slow to a crawl when they have an array in my problem set.

 Giving them a hook to ignore WCE seems to be a hack, but one that would 
 resolve 
 issues with users who won't want to wait months (years?) for us to convince 
 the 
 array vendors.

 Is this a hook worth doing?
 
 We already have the ability to set the cache type in sysfs.  It tries to
 do a mode select back to the array, but the USB guys want it for the
 reverse problem (write back cache behind bridge declared as write
 through).
 
 Have we hashed this out in the T10 committee?
 
 SBC-3 contains everything one could wish for about handling devices with
 volatile and NV cache, I thought.
 
 James
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Hannes Reinecke

On 04/23/2013 10:07 PM, James Bottomley wrote:
 On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
 For many years, we have used WCE as an indication that a device has a 
 volatile 
 write cache (not just a write cache) and used this as a trigger to send down 
 SYNCHRONIZE_CACHE commands as needed.

 Some arrays with non-volatile cache seem to have WCE set and simply ignore 
 the 
 command.
 
 I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
 which if unset (which it is in our implementation) means only sync your
 non-NV cache.  For a device with all NV, that equates to nop.
 
 Some arrays with non-volatile cache seem to not set WCE.

 Others arrays with non-volatile cache - our problem arrays - set WCE and do 
 something horrible and slow when sent the SYNCHRONIZE_CACHE commands.
 
 These arrays sound to be out of spec, so we should probably just
 blacklist them.
 
Don't think so.
There is no time limit for the SYNCHRONIZE_CACHE command, so the
array might take all day to write out the cache.

In fact, I've recently had a rather heated discussion with a certain
storage vendor which reserved his right to take up to several
seconds when flushing the cache.
Might be that we're in fact talking about the same one ... are we on
a naming-and-shaming policy here ?
If so I could tell you some really _interesting_ details about their
behaviour ...

Also note that SYNCHRONIZE_CACHE was always problematic; and as
we're not even setting the LBA range we're even have cross-speak
when issuing it to partitioned devices.

 Note that for file systems, you can override this behavior by mounting with 
 our 
 barriers disabled (mount -o nobarrier .). There is currently no way do 
 disable this for anything using the device directly, not through the file 
 system.

 Some applications run against block devices - not through a file system - 
 and 
 want not to slow to a crawl when they have an array in my problem set.

 Giving them a hook to ignore WCE seems to be a hack, but one that would 
 resolve 
 issues with users who won't want to wait months (years?) for us to convince 
 the 
 array vendors.

 Is this a hook worth doing?
 
 We already have the ability to set the cache type in sysfs.  It tries to
 do a mode select back to the array, but the USB guys want it for the
 reverse problem (write back cache behind bridge declared as write
 through).
 
You can always set the 'IMMED' bit for these arrays :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][v3] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-04-24 Thread Hannes Reinecke

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
Cc: Brian King brk...@linux.vnet.ibm.com
Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..1fc6da94 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -791,22 +791,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
struct scsi_device *sdev = scmd-device;
struct Scsi_Host *shost = sdev-host;
DECLARE_COMPLETION_ONSTACK(done);
-   unsigned long timeleft;
+   unsigned long timeleft = timeout;
struct scsi_eh_save ses;
+   const int stall_for = min(HZ/10, 1);
int rtn;
 
+retry:
scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
shost-eh_action = done;
 
scsi_log_send(scmd);
scmd-scsi_done = scsi_eh_done;
-   shost-hostt-queuecommand(shost, scmd);
-
-   timeleft = wait_for_completion_timeout(done, timeout);
+   rtn = shost-hostt-queuecommand(shost, scmd);
+   if (rtn) {
+   if (timeleft) {
+   scsi_eh_restore_cmnd(scmd, ses);
+   timeleft -= stall_for;
+   msleep(stall_for);
+   goto retry;
+   }
+   rtn = NEEDS_RETRY;
+   } else
+   timeleft = wait_for_completion_timeout(done, timeout);
 
shost-eh_action = NULL;
 
-   scsi_log_completion(scmd, SUCCESS);
+   scsi_log_completion(scmd, rtn);
 
SCSI_LOG_ERROR_RECOVERY(3,
printk(%s: scmd: %p, timeleft: %ld\n,
@@ -837,7 +847,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
rtn = FAILED;
break;
}
-   } else {
+   } else if (!rtn) {
scsi_abort_eh_cmnd(scmd);
rtn = FAILED;
}
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Paolo Bonzini

Il 24/04/2013 14:07, Hannes Reinecke ha scritto:
 On 04/24/2013 01:17 PM, Paolo Bonzini wrote:
 Il 23/04/2013 22:07, James Bottomley ha scritto:
 On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
 For many years, we have used WCE as an indication that a device has a 
 volatile 
 write cache (not just a write cache) and used this as a trigger to send 
 down 
 SYNCHRONIZE_CACHE commands as needed.

 Some arrays with non-volatile cache seem to have WCE set and simply ignore 
 the 
 command.

 I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
 which if unset (which it is in our implementation) means only sync your
 non-NV cache.  For a device with all NV, that equates to nop.

 Isn't it the other way round?

 SYNC_NV = 0 means sync all your caches to the medium, and it's what we do.

 SYNC_NV = 1 means sync volatile to non-volatile, and it's what Ric wants.

 So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
 is non-removable just to err on the safe side.
 
 Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
 Plus it even has a guarantee about data consistency on the disk,
 which the normal WRITE command has not.

The point is to _avoid_ hitting the disk. :)

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Hannes Reinecke

On 04/24/2013 02:08 PM, Paolo Bonzini wrote:
 Il 24/04/2013 14:07, Hannes Reinecke ha scritto:
 On 04/24/2013 01:17 PM, Paolo Bonzini wrote:
 Il 23/04/2013 22:07, James Bottomley ha scritto:
 On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
 For many years, we have used WCE as an indication that a device has a 
 volatile 
 write cache (not just a write cache) and used this as a trigger to send 
 down 
 SYNCHRONIZE_CACHE commands as needed.

 Some arrays with non-volatile cache seem to have WCE set and simply 
 ignore the 
 command.

 I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
 which if unset (which it is in our implementation) means only sync your
 non-NV cache.  For a device with all NV, that equates to nop.

 Isn't it the other way round?

 SYNC_NV = 0 means sync all your caches to the medium, and it's what we do.

 SYNC_NV = 1 means sync volatile to non-volatile, and it's what Ric wants.

 So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
 is non-removable just to err on the safe side.

 Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
 Plus it even has a guarantee about data consistency on the disk,
 which the normal WRITE command has not.
 
 The point is to _avoid_ hitting the disk. :)
 
Ah. Really?

Why do we discuss SYNCHRONIZE CACHE then?
I was under the impression that we're talking various 'barriers'
(or rather 'flush' nowadays) implementations.
Which require that some data needs to be written to disk before
continuing.

Or did I somehow misread the thread?

Confused,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Paolo Bonzini

Il 24/04/2013 14:12, Hannes Reinecke ha scritto:
 On 04/24/2013 02:08 PM, Paolo Bonzini wrote:
 Il 24/04/2013 14:07, Hannes Reinecke ha scritto:
 On 04/24/2013 01:17 PM, Paolo Bonzini wrote:
 Il 23/04/2013 22:07, James Bottomley ha scritto:
 On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
 For many years, we have used WCE as an indication that a device has a 
 volatile 
 write cache (not just a write cache) and used this as a trigger to send 
 down 
 SYNCHRONIZE_CACHE commands as needed.

 Some arrays with non-volatile cache seem to have WCE set and simply 
 ignore the 
 command.

 I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
 which if unset (which it is in our implementation) means only sync your
 non-NV cache.  For a device with all NV, that equates to nop.

 Isn't it the other way round?

 SYNC_NV = 0 means sync all your caches to the medium, and it's what we 
 do.

 SYNC_NV = 1 means sync volatile to non-volatile, and it's what Ric wants.

 So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
 is non-removable just to err on the safe side.

 Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
 Plus it even has a guarantee about data consistency on the disk,
 which the normal WRITE command has not.

 The point is to _avoid_ hitting the disk. :)

 Ah. Really?
 
 Why do we discuss SYNCHRONIZE CACHE then?

Because we do want the data to hit the non-volatile cache, just in case
the disk has both a volatile and a non-volatile cache.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Ric Wheeler


On 04/24/2013 08:08 AM, Paolo Bonzini wrote:

Il 24/04/2013 14:07, Hannes Reinecke ha scritto:

On 04/24/2013 01:17 PM, Paolo Bonzini wrote:

Il 23/04/2013 22:07, James Bottomley ha scritto:

On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:

For many years, we have used WCE as an indication that a device has a volatile
write cache (not just a write cache) and used this as a trigger to send down
SYNCHRONIZE_CACHE commands as needed.

Some arrays with non-volatile cache seem to have WCE set and simply ignore the
command.

I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
which if unset (which it is in our implementation) means only sync your
non-NV cache.  For a device with all NV, that equates to nop.

Isn't it the other way round?

SYNC_NV = 0 means sync all your caches to the medium, and it's what we do.

SYNC_NV = 1 means sync volatile to non-volatile, and it's what Ric wants.

So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
is non-removable just to err on the safe side.

Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
Plus it even has a guarantee about data consistency on the disk,
which the normal WRITE command has not.

The point is to _avoid_ hitting the disk. :)

Paolo



The point is to have a crash-proof version of the data acknowledged by the 
target device while letting data sit in volatile state as long as possible. To 
be even clearer, we would love to do this for a sub-range of the device but 
currently use a big hammer to flush the entire cache (possibly for multiple 
file systems on one target storage device).


Once we use the SYNCHRONIZE_CACHE (or CACHE_FLUSH_EXT) command, we want the data 
on that target device to be there if someone loses power.


If the device can promise this, we don't care (and don't know) how it manages 
that promise. It can leave the data on battery backed DRAM, can archive it to 
flash or any other scheme that works.


Just as importantly, we don't want to destage data to the back end drives if 
that is not required since it is really, really slow.


The confusion here is that various storage devices have used the standard bits 
in arbitrary ways which makes it very hard to have one clear set of rules.


Even harder to explain to end users when to use a work around (like mount -o 
nobarrier) or the proposed ignore flushes block level call :)


Regards,

Ric

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Paolo Bonzini

Il 24/04/2013 14:27, Ric Wheeler ha scritto:
 The point is to _avoid_ hitting the disk. :)
 
 The point is to have a crash-proof version of the data acknowledged by
 the target device while letting data sit in volatile state as long as
 possible. To be even clearer, we would love to do this for a sub-range
 of the device but currently use a big hammer to flush the entire cache
 (possibly for multiple file systems on one target storage device).
 
 Once we use the SYNCHRONIZE_CACHE (or CACHE_FLUSH_EXT) command, we want
 the data on that target device to be there if someone loses power.
 
 If the device can promise this, we don't care (and don't know) how it
 manages that promise. It can leave the data on battery backed DRAM, can
 archive it to flash or any other scheme that works.

That's exactly the point of SYNC_NV=1.

 Just as importantly, we don't want to destage data to the back end
 drives if that is not required since it is really, really slow.
 
 The confusion here is that various storage devices have used the
 standard bits in arbitrary ways which makes it very hard to have one
 clear set of rules.

Also that we have ignored the problem for long, and it's worked
surprisingly well. :)

 Even harder to explain to end users when to use a work around (like
 mount -o nobarrier) or the proposed ignore flushes block level call :)

Hoping Thunderbird doesn't mangle the patch too badly, code might be
worth a thousand words... see after the sig, compile-tested only.  I have
no access to these controllers, neither the good ones nor the bad ones. :)

Paolo

- 8 -
From: Paolo Bonzini pbonz...@redhat.com
Subject: [PATCH] scsi: only make REQ_FLUSH flush to non-volatile cache

The point of REQ_FLUSH is to have a crash-proof version of the data
acknowledged by the target device.  We want the data on that target
device to be there if someone loses power, but we don't care (and don't
want to know) how it manages that promise.  It can leave the data on
battery backed DRAM, can archive it to flash or any other scheme that
works.

This is exactly what SYNC_NV=1 does.  Instead, SYNC_NV=0 should flush
the data to the medium, which is not desirable when we have a non-volatile
cache (except perhaps if the medium is removable).

NOT-Tested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..97ecfd9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -800,9 +800,17 @@ static int sd_setup_write_same_cmnd(struct scsi_device 
*sdp, struct request *rq)
 
 static int scsi_setup_flush_cmnd(struct scsi_device *sdp, struct request *rq)
 {
+   struct scsi_disk *sdkp = scsi_disk(rq-rq_disk);
+
rq-timeout = SD_FLUSH_TIMEOUT;
rq-retries = SD_MAX_RETRIES;
rq-cmd[0] = SYNCHRONIZE_CACHE;
+
+   /* No need to synchronize to medium if we have a non-volatile cache,
+* but be safe if the medium could just go away.
+*/
+   if (sdkp-nv_sup  !sdp-removable)
+   rq-cmd[1] |= 4; /* SYNC_NV */
rq-cmd_len = 10;
 
return scsi_setup_blk_pc_cmnd(sdp, rq);
@@ -2511,6 +2519,26 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, 
unsigned char *buffer)
 }
 
 /**
+ * sd_read_extended_inquiry - Query disk device for non-volatile cache.
+ * @disk: disk to query
+ */
+static void sd_read_extended_inquiry(struct scsi_disk *sdkp)
+{
+   const int vpd_len = 64;
+   unsigned char *buffer = kmalloc(vpd_len, GFP_KERNEL);
+
+   if (!buffer ||
+   /* Block Limits VPD */
+   scsi_get_vpd_page(sdkp-device, 0x86, buffer, vpd_len))
+   goto out;
+
+   sdkp-nv_sup = (buffer[6]  0x02) != 0;
+
+ out:
+   kfree(buffer);
+}
+
+/**
  * sd_read_block_limits - Query disk device for preferred I/O sizes.
  * @disk: disk to query
  */
@@ -2684,6 +2712,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_capacity(sdkp, buffer);
 
if (sd_try_extended_inquiry(sdp)) {
+   sd_read_extended_inquiry(sdkp);
sd_read_block_provisioning(sdkp);
sd_read_block_limits(sdkp);
sd_read_block_characteristics(sdkp);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 74a1e4c..6334dfe 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -84,6 +84,7 @@ struct scsi_disk {
unsignedlbpws10 : 1;
unsignedlbpvpd : 1;
unsignedws16 : 1;
+   unsignednv_sup : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 

 Regards,
 
 Ric
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] scsi: ufs: Add support for sending NOP OUT UPIU

2013-04-24 Thread Sujit Reddy Thumma

As part of device initialization sequence, sending NOP OUT UPIU and
waiting for NOP IN UPIU response is mandatory. This confirms that the
device UFS Transport (UTP) layer is functional and the host can configure
the device with further commands. Add support for sending NOP OUT UPIU to
check the device connection path and test whether the UTP layer on the
device side is functional during initialization.

Signed-off-by: Sujit Reddy Thumma sthu...@codeaurora.org
---
 drivers/scsi/ufs/ufshcd.c |  167 ++---
 drivers/scsi/ufs/ufshcd.h |4 +
 2 files changed, 162 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index ced421a..1a8cba3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -34,12 +34,21 @@
  */
 
 #include ufshcd.h
+#include linux/iopoll.h
+#include linux/async.h
 
 /* Timeout after 10 secs if UIC command hangs */
 #define UIC_COMMAND_TIMEOUT(10 * HZ)
 /* Retry for 3 times in case of UIC failures */
 #define UIC_COMMAND_RETRIES3
 
+/* NOP OUT retries waiting for NOP IN response */
+#define NOP_OUT_RETRIES10
+/* Timeout after 30 msecs if NOP OUT hangs without response */
+#define NOP_OUT_TIMEOUT30 /* msecs */
+/* Reserved tag for internal commands */
+#define INTERNAL_CMD_TAG   0
+
 enum {
UFSHCD_MAX_CHANNEL  = 0,
UFSHCD_MAX_ID   = 1,
@@ -607,7 +616,7 @@ static void ufshcd_prepare_req_desc(struct ufshcd_lrb 
*lrbp, u32 *upiu_flags)
 {
struct utp_transfer_req_desc *req_desc = lrbp-utr_descriptor_ptr;
enum dma_data_direction cmd_dir =
-   lrbp-cmd-sc_data_direction;
+   lrbp-cmd ? lrbp-cmd-sc_data_direction : DMA_NONE;
u32 data_direction;
u32 dword_0;
 
@@ -624,6 +633,8 @@ static void ufshcd_prepare_req_desc(struct ufshcd_lrb 
*lrbp, u32 *upiu_flags)
 
dword_0 = data_direction | (lrbp-command_type
 UPIU_COMMAND_TYPE_OFFSET);
+   if (lrbp-intr_cmd)
+   dword_0 |= UTP_REQ_DESC_INT_CMD;
 
/* Transfer request descriptor header fields */
req_desc-header.dword_0 = cpu_to_le32(dword_0);
@@ -712,6 +723,18 @@ static inline void 
ufshcd_prepare_utp_query_req_upiu(struct ufs_hba *hba,
 
 }
 
+static inline void ufshcd_prepare_utp_nop_upiu(struct ufshcd_lrb *lrbp)
+{
+   struct utp_upiu_req *ucd_req_ptr = lrbp-ucd_req_ptr;
+
+   memset(ucd_req_ptr, 0, sizeof(struct utp_upiu_req));
+
+   /* command descriptor fields */
+   ucd_req_ptr-header.dword_0 =
+   UPIU_HEADER_DWORD(
+   UPIU_TRANSACTION_NOP_OUT, 0, 0, lrbp-task_tag);
+}
+
 /**
  * ufshcd_compose_upiu - form UFS Protocol Information Unit(UPIU)
  * @hba - UFS hba
@@ -741,12 +764,14 @@ static int ufshcd_compose_upiu(struct ufs_hba *hba, 
struct ufshcd_lrb *lrbp)
case UTP_CMD_TYPE_SCSI:
case UTP_CMD_TYPE_DEV_MANAGE:
ufshcd_prepare_req_desc(lrbp, upiu_flags);
-   if (lrbp-cmd  lrbp-command_type == UTP_CMD_TYPE_SCSI)
+   if (lrbp-cmd  lrbp-command_type == UTP_CMD_TYPE_SCSI) {
ufshcd_prepare_utp_scsi_cmd_upiu(lrbp, upiu_flags);
-   else if (lrbp-cmd)
+   } else if (lrbp-cmd  ufshcd_is_query_req(lrbp)) {
ufshcd_prepare_utp_query_req_upiu(hba, lrbp,
upiu_flags);
-   else {
+   } else if (!lrbp-cmd) {
+   ufshcd_prepare_utp_nop_upiu(lrbp);
+   } else {
dev_err(hba-dev, %s: Invalid UPIU request\n,
__func__);
ret = -EINVAL;
@@ -799,6 +824,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *cmd)
lrbp-sense_buffer = cmd-sense_buffer;
lrbp-task_tag = tag;
lrbp-lun = cmd-device-lun;
+   lrbp-intr_cmd = false;
 
if (ufshcd_is_query_req(lrbp))
lrbp-command_type = UTP_CMD_TYPE_DEV_MANAGE;
@@ -1112,6 +1138,113 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
return err;
 }
 
+static inline int
+ufshcd_compose_nop_out_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
+{
+   lrbp-cmd = NULL;
+   lrbp-sense_bufflen = 0;
+   lrbp-sense_buffer = NULL;
+   lrbp-task_tag = INTERNAL_CMD_TAG;
+   lrbp-lun = 0; /* NOP OUT is not specific to any LUN */
+   lrbp-command_type = UTP_CMD_TYPE_DEV_MANAGE;
+   lrbp-intr_cmd = true; /* No interrupt aggregation */
+
+   return ufshcd_compose_upiu(hba, lrbp);
+}
+
+/**
+ * ufshcd_validate_dev_connection() - Check device connection status
+ * @hba: per-adapter instance
+ *
+ * Send NOP OUT UPIU and wait for NOP IN response to check whether the
+ * device Transport Protocol (UTP) layer is ready after a reset.
+ * If the UTP layer at the device side is not initialized, it

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Jeremy Linton

On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
 If the device can promise this, we don't care (and don't know) how it 
 manages that promise. It can leave the data on battery backed DRAM, can 
 archive it to flash or any other scheme that works.
 
 That's exactly the point of SYNC_NV=1.

Well its the point, but the specification is written such that the 
vendors can
choose to implement it any way they wish, especially for split cache
systems where there is both volatile and non volatile cache.

Flushing the NV cache to medium (as is the current behavior) may not be 
a bad
idea anyway.

Thats because I know of a large vendors array where the non-volatile 
cache
might be better described as the sometimes non-volatile cache. That is because
a failure to flush the volatile portions results in the non-volatile portions
being considered invalid when power is restored. This fences the volume, and the
usual method for recovering the array is to call support and have them
invalidate the NV portions of the cache. Thereby negating the whole reason for
having a NV cache. I'm sure they don't tell customers this fact when they sell
the array, when it happened in our lab I was in a state of shock for about a 
week.









--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][v3] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-04-24 Thread James Bottomley

On Wed, 2013-04-24 at 13:32 +0200, Hannes Reinecke wrote:
 scsi_send_eh_cmnd() is calling queuecommand() directly, so
 it needs to check the return value here.
 The only valid return codes for queuecommand() are 'busy'
 states, so we need to wait for a bit to allow the LLDD
 to recover.
 
 Based on an earlier patch from Wen Xiong.

This looks good, only one minor nitpick:

 Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
 Cc: Brian King brk...@linux.vnet.ibm.com
 Signed-off-by: Hannes Reinecke h...@suse.de
 
 diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
 index c1b05a8..1fc6da94 100644
 --- a/drivers/scsi/scsi_error.c
 +++ b/drivers/scsi/scsi_error.c
 @@ -791,22 +791,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
 unsigned char *cmnd,
   struct scsi_device *sdev = scmd-device;
   struct Scsi_Host *shost = sdev-host;
   DECLARE_COMPLETION_ONSTACK(done);
 - unsigned long timeleft;
 + unsigned long timeleft = timeout;
   struct scsi_eh_save ses;
 + const int stall_for = min(HZ/10, 1);
   int rtn;
  
 +retry:
   scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
   shost-eh_action = done;
  
   scsi_log_send(scmd);
   scmd-scsi_done = scsi_eh_done;
 - shost-hostt-queuecommand(shost, scmd);
 -
 - timeleft = wait_for_completion_timeout(done, timeout);
 + rtn = shost-hostt-queuecommand(shost, scmd);
 + if (rtn) {
 + if (timeleft) {
 + scsi_eh_restore_cmnd(scmd, ses);
 + timeleft -= stall_for;
 + msleep(stall_for);

Stall for is in HZ: need to convert to ms, so

msleep(stall_for * 1000/HZ);

I also don't see a need to restore and reprep the command each time, but
I don't see a problem with it either.

James


 + goto retry;
 + }
 + rtn = NEEDS_RETRY;
 + } else
 + timeleft = wait_for_completion_timeout(done, timeout);
  
   shost-eh_action = NULL;
  
 - scsi_log_completion(scmd, SUCCESS);
 + scsi_log_completion(scmd, rtn);
  
   SCSI_LOG_ERROR_RECOVERY(3,
   printk(%s: scmd: %p, timeleft: %ld\n,
 @@ -837,7 +847,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
 unsigned char *cmnd,
   rtn = FAILED;
   break;
   }
 - } else {
 + } else if (!rtn) {
   scsi_abort_eh_cmnd(scmd);
   rtn = FAILED;
   }
 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Douglas Gilbert


On 13-04-23 03:41 PM, Ric Wheeler wrote:


For many years, we have used WCE as an indication that a device has a volatile
write cache (not just a write cache) and used this as a trigger to send down
SYNCHRONIZE_CACHE commands as needed.

Some arrays with non-volatile cache seem to have WCE set and simply ignore the
command.

Some arrays with non-volatile cache seem to not set WCE.

Others arrays with non-volatile cache - our problem arrays - set WCE and do
something horrible and slow when sent the SYNCHRONIZE_CACHE commands.

Note that for file systems, you can override this behavior by mounting with our
barriers disabled (mount -o nobarrier .). There is currently no way do
disable this for anything using the device directly, not through the file 
system.

Some applications run against block devices - not through a file system - and
want not to slow to a crawl when they have an array in my problem set.

Giving them a hook to ignore WCE seems to be a hack, but one that would resolve
issues with users who won't want to wait months (years?) for us to convince the
array vendors.

Is this a hook worth doing?

Have we hashed this out in the T10 committee?


Naturally I'm biased, but I tend to think the user space
is usually smarter than the kernel. That assumes skilled
users.

So if the user space issues a SYNCHRONIZE_CACHE with the
IMMED bit set and for the whole disk then the user should
have a way of forcing that command to be issued. The
assumption here is that the skilled user is about to power
down that array or pull some disks or SSDs *.

The more questionable cases are when a file system or the
block layer is issuing a barrier or some such that
translates to a SYNCHRONIZE_CACHE. That should be ignored
in some cases already discussed in this thread.


While working with SoCs I have noticed an interesting
technique. Sub-system sized sections of the memory mapped
IO space (e.g. a bank of GPIOs) can be write protected by
a simple ASCII sequence **. Attempts to change configuration
registers after write protect are ignored and an error
is noted (if anyone cares). The same ACSII sequence can be
used to un-write protect those sub-system configuration
registers. Typically on a SoC if the GPIOs are randomly
re-configured, it's game over.

Back to the SCSI world: a better solution might be if an
LLD could be informed of the reason a SCSI control command
is being issued (a sort of come from field). Failing, or
it addition to that, a sysfs interface could be added to
filter out dangerous SCSI commands:
  echo SC  /sys/class/scsi_device/8:0:0:0/device/filter

  cat /sys/class/scsi_device/8:0:0:0/device/filter
FU SC

If, for whatever reason, we did ignore a SYNCHRONIZE_CACHE
command we could use vendor specific sense data (vendor=Linux)
to indicate that a command had been ignored. That could be
extended to all SCSI commands that are filtered out ***;
better that than EIO, EACCES etc.

Doug Gilbert

*   and if Linux doesn't permit this, then user might be
advised to run another, more obedient, host OS with
Linux running as a VM. A pass-by rather than a
pass-through ...

**  only the configuration registers are write protected, so
data can still be written to the GPIOs

*** like me, many pass-through users cannot see why SCSI
commands injected to the SCSI subsystem (e.g. via
sg or bsg) are filtered out silently by the block layer.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/5] scsi: ufs: move the ufshcd_hba_stop to ufshcd.c

2013-04-24 Thread Seungwon Jeon

Move the ufshcd_hba_stop from header file.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
 drivers/scsi/ufs/ufshcd.c |9 +
 drivers/scsi/ufs/ufshcd.h |9 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 60fd40c..41b9639 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -285,6 +285,15 @@ static inline void ufshcd_hba_start(struct ufs_hba *hba)
 }
 
 /**
+ * ufshcd_hba_stop - Send controller to reset state
+ * @hba: per adapter instance
+ */
+static inline void ufshcd_hba_stop(struct ufs_hba *hba)
+{
+   writel(CONTROLLER_DISABLE, (hba-mmio_base + REG_CONTROLLER_ENABLE));
+}
+
+/**
  * ufshcd_is_hba_active - Get controller state
  * @hba: per adapter instance
  *
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 6b99a42..1680394 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -190,13 +190,4 @@ int ufshcd_init(struct device *, struct ufs_hba ** , void 
__iomem * ,
unsigned int);
 void ufshcd_remove(struct ufs_hba *);
 
-/**
- * ufshcd_hba_stop - Send controller to reset state
- * @hba: per adapter instance
- */
-static inline void ufshcd_hba_stop(struct ufs_hba *hba)
-{
-   writel(CONTROLLER_DISABLE, (hba-mmio_base + REG_CONTROLLER_ENABLE));
-}
-
 #endif /* End of Header */
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] scsi: ufs: amend interrupt configuration

2013-04-24 Thread Seungwon Jeon

It makes interrupt setting more flexible especially
for disabling. And wrong bit mask is fixed for ver 1.0.
[17:16] is added for mask.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
 drivers/scsi/ufs/ufshcd.c |   86 -
 drivers/scsi/ufs/ufshcd.h |4 +-
 drivers/scsi/ufs/ufshci.h |5 ++-
 3 files changed, 66 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index b6c19b0..efe2256 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -35,6 +35,10 @@
 
 #include ufshcd.h
 
+#define UFSHCD_ENABLE_INTRS(UTP_TRANSFER_REQ_COMPL |\
+UTP_TASK_REQ_COMPL |\
+UFSHCD_ERROR_MASK)
+
 enum {
UFSHCD_MAX_CHANNEL  = 0,
UFSHCD_MAX_ID   = 1,
@@ -64,6 +68,20 @@ enum {
 };
 
 /**
+ * ufshcd_get_intr_mask - Get the interrupt bit mask
+ * @hba - Pointer to adapter instance
+ *
+ * Returns interrupt bit mask per version
+ */
+static inline u32 ufshcd_get_intr_mask(struct ufs_hba *hba)
+{
+   if (hba-ufs_version == UFSHCI_VERSION_10)
+   return INTERRUPT_MASK_ALL_VER_10;
+   else
+   return INTERRUPT_MASK_ALL_VER_11;
+}
+
+/**
  * ufshcd_get_ufs_version - Get the UFS version supported by the HBA
  * @hba - Pointer to adapter instance
  *
@@ -397,25 +415,45 @@ static int ufshcd_map_sg(struct ufshcd_lrb *lrbp)
 }
 
 /**
- * ufshcd_int_config - enable/disable interrupts
+ * ufshcd_enable_intr - enable interrupts
  * @hba: per adapter instance
- * @option: interrupt option
+ * @intrs: interrupt bits
  */
-static void ufshcd_int_config(struct ufs_hba *hba, u32 option)
+static void ufshcd_enable_intr(struct ufs_hba *hba, u32 intrs)
 {
-   switch (option) {
-   case UFSHCD_INT_ENABLE:
-   ufshcd_writel(hba, REG_INTERRUPT_ENABLE, hba-int_enable_mask);
-   break;
-   case UFSHCD_INT_DISABLE:
-   if (hba-ufs_version == UFSHCI_VERSION_10)
-   ufshcd_writel(hba, REG_INTERRUPT_ENABLE,
- INTERRUPT_DISABLE_MASK_10);
-   else
-   ufshcd_writel(hba, REG_INTERRUPT_ENABLE,
- INTERRUPT_DISABLE_MASK_11);
-   break;
+   u32 set = ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
+
+   if (hba-ufs_version == UFSHCI_VERSION_10) {
+   u32 rw;
+   rw = set  INTERRUPT_MASK_RW_VER_10;
+   set = rw | ((set ^ intrs)  intrs);
+   } else {
+   set |= intrs;
+   }
+
+   ufshcd_writel(hba, REG_INTERRUPT_ENABLE, set);
+}
+
+/**
+ * ufshcd_disable_intr - disable interrupts
+ * @hba: per adapter instance
+ * @intrs: interrupt bits
+ */
+static void ufshcd_disable_intr(struct ufs_hba *hba, u32 intrs)
+{
+   u32 set = ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
+
+   if (hba-ufs_version == UFSHCI_VERSION_10) {
+   u32 rw;
+   rw = (set  INTERRUPT_MASK_RW_VER_10) 
+   ~(intrs  INTERRUPT_MASK_RW_VER_10);
+   set = rw | ((set  intrs)  ~INTERRUPT_MASK_RW_VER_10);
+
+   } else {
+   set = ~intrs;
}
+
+   ufshcd_writel(hba, REG_INTERRUPT_ENABLE, set);
 }
 
 /**
@@ -717,8 +755,7 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
uic_cmd-argument3 = 0;
 
/* enable UIC related interrupts */
-   hba-int_enable_mask |= UIC_COMMAND_COMPL;
-   ufshcd_int_config(hba, UFSHCD_INT_ENABLE);
+   ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
 
/* sending UIC commands to controller */
ufshcd_send_uic_command(hba, uic_cmd);
@@ -765,13 +802,9 @@ static int ufshcd_make_hba_operational(struct ufs_hba *hba)
}
 
/* Enable required interrupts */
-   hba-int_enable_mask |= (UTP_TRANSFER_REQ_COMPL |
-UIC_ERROR |
-UTP_TASK_REQ_COMPL |
-DEVICE_FATAL_ERROR |
-CONTROLLER_FATAL_ERROR |
-SYSTEM_BUS_FATAL_ERROR);
-   ufshcd_int_config(hba, UFSHCD_INT_ENABLE);
+   ufshcd_enable_intr(hba, UTP_TRANSFER_REQ_COMPL | UIC_ERROR |
+  UTP_TASK_REQ_COMPL | DEVICE_FATAL_ERROR |
+  CONTROLLER_FATAL_ERROR | SYSTEM_BUS_FATAL_ERROR);
 
/* Configure interrupt aggregation */
ufshcd_config_int_aggr(hba, INT_AGGR_CONFIG);
@@ -1578,7 +1611,7 @@ static void ufshcd_hba_free(struct ufs_hba *hba)
 void ufshcd_remove(struct ufs_hba *hba)
 {
/* disable interrupts */
-   ufshcd_int_config(hba, UFSHCD_INT_DISABLE);
+   ufshcd_disable_intr(hba, hba-intr_mask);
 
ufshcd_hba_stop(hba);
ufshcd_hba_free(hba);
@@ -1636,6 +1669,9 @@ int ufshcd_init(struct device *dev, struct ufs_hba 
**hba_handle,
/* Get UFS version supported by

[PATCH 2/5] scsi: ufs: wrap the i/o access operations

2013-04-24 Thread Seungwon Jeon

Simplify operations with hiding mmio_base.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
 drivers/scsi/ufs/ufshcd.c |  106 +++--
 drivers/scsi/ufs/ufshcd.h |5 ++
 2 files changed, 49 insertions(+), 62 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 41b9639..b6c19b0 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -71,7 +71,7 @@ enum {
  */
 static inline u32 ufshcd_get_ufs_version(struct ufs_hba *hba)
 {
-   return readl(hba-mmio_base + REG_UFS_VERSION);
+   return ufshcd_readl(hba, REG_UFS_VERSION);
 }
 
 /**
@@ -130,8 +130,7 @@ static inline int ufshcd_get_tm_free_slot(struct ufs_hba 
*hba)
  */
 static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 pos)
 {
-   writel(~(1  pos),
-   (hba-mmio_base + REG_UTP_TRANSFER_REQ_LIST_CLEAR));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_CLEAR, ~(1  pos));
 }
 
 /**
@@ -165,7 +164,7 @@ static inline int ufshcd_get_lists_status(u32 reg)
  */
 static inline int ufshcd_get_uic_cmd_result(struct ufs_hba *hba)
 {
-   return readl(hba-mmio_base + REG_UIC_COMMAND_ARG_2) 
+   return ufshcd_readl(hba, REG_UIC_COMMAND_ARG_2) 
   MASK_UIC_COMMAND_RESULT;
 }
 
@@ -243,18 +242,14 @@ ufshcd_config_int_aggr(struct ufs_hba *hba, int option)
 {
switch (option) {
case INT_AGGR_RESET:
-   writel((INT_AGGR_ENABLE |
-   INT_AGGR_COUNTER_AND_TIMER_RESET),
-   (hba-mmio_base +
-REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL,
+ INT_AGGR_ENABLE | 
INT_AGGR_COUNTER_AND_TIMER_RESET);
break;
case INT_AGGR_CONFIG:
-   writel((INT_AGGR_ENABLE |
-   INT_AGGR_PARAM_WRITE |
-   INT_AGGR_COUNTER_THRESHOLD_VALUE |
-   INT_AGGR_TIMEOUT_VALUE),
-   (hba-mmio_base +
-REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL,
+ INT_AGGR_ENABLE | INT_AGGR_PARAM_WRITE |
+ INT_AGGR_COUNTER_THRESHOLD_VALUE |
+ INT_AGGR_TIMEOUT_VALUE);
break;
}
 }
@@ -267,12 +262,10 @@ ufshcd_config_int_aggr(struct ufs_hba *hba, int option)
  */
 static void ufshcd_enable_run_stop_reg(struct ufs_hba *hba)
 {
-   writel(UTP_TASK_REQ_LIST_RUN_STOP_BIT,
-  (hba-mmio_base +
-   REG_UTP_TASK_REQ_LIST_RUN_STOP));
-   writel(UTP_TRANSFER_REQ_LIST_RUN_STOP_BIT,
-  (hba-mmio_base +
-   REG_UTP_TRANSFER_REQ_LIST_RUN_STOP));
+   ufshcd_writel(hba, REG_UTP_TASK_REQ_LIST_RUN_STOP,
+ UTP_TASK_REQ_LIST_RUN_STOP_BIT);
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_RUN_STOP,
+ UTP_TRANSFER_REQ_LIST_RUN_STOP_BIT);
 }
 
 /**
@@ -281,7 +274,7 @@ static void ufshcd_enable_run_stop_reg(struct ufs_hba *hba)
  */
 static inline void ufshcd_hba_start(struct ufs_hba *hba)
 {
-   writel(CONTROLLER_ENABLE , (hba-mmio_base + REG_CONTROLLER_ENABLE));
+   ufshcd_writel(hba, REG_CONTROLLER_ENABLE, CONTROLLER_ENABLE);
 }
 
 /**
@@ -290,7 +283,7 @@ static inline void ufshcd_hba_start(struct ufs_hba *hba)
  */
 static inline void ufshcd_hba_stop(struct ufs_hba *hba)
 {
-   writel(CONTROLLER_DISABLE, (hba-mmio_base + REG_CONTROLLER_ENABLE));
+   ufshcd_writel(hba, REG_CONTROLLER_ENABLE, CONTROLLER_DISABLE);
 }
 
 /**
@@ -301,7 +294,7 @@ static inline void ufshcd_hba_stop(struct ufs_hba *hba)
  */
 static inline int ufshcd_is_hba_active(struct ufs_hba *hba)
 {
-   return (readl(hba-mmio_base + REG_CONTROLLER_ENABLE)  0x1) ? 0 : 1;
+   return (ufshcd_readl(hba, REG_CONTROLLER_ENABLE)  0x1) ? 0 : 1;
 }
 
 /**
@@ -313,8 +306,7 @@ static inline
 void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 {
__set_bit(task_tag, hba-outstanding_reqs);
-   writel((1  task_tag),
-  (hba-mmio_base + REG_UTP_TRANSFER_REQ_DOOR_BELL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL, 1  task_tag);
 }
 
 /**
@@ -338,8 +330,7 @@ static inline void ufshcd_copy_sense_data(struct ufshcd_lrb 
*lrbp)
  */
 static inline void ufshcd_hba_capabilities(struct ufs_hba *hba)
 {
-   hba-capabilities =
-   readl(hba-mmio_base + REG_CONTROLLER_CAPABILITIES);
+   hba-capabilities = ufshcd_readl(hba, REG_CONTROLLER_CAPABILITIES);
 
/* nutrs and nutmrs are 0 based values */
hba-nutrs = (hba-capabilities  MASK_TRANSFER_REQUESTS_SLOTS) + 1;
@@ -356,16 +347,13 @@ static inline void
 ufshcd_send_uic_command(struct ufs_hba *hba, struct uic_command *uic_cmnd)
 {
/* Write Args */
-   writel(uic_cmnd-argument1,
-

[PATCH 4/5] scsi: ufs: rework link start-up process

2013-04-24 Thread Seungwon Jeon

Link start-up requires long time with multiphase handshakes
between UFS host and device. This affects driver's probe time.
This patch let link start-up run asynchronously.
And completion time of uic command is defined to avoid a
permanent wait.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
 drivers/scsi/ufs/ufshcd.c |  114 +---
 drivers/scsi/ufs/ufshcd.h |6 ++-
 2 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index efe2256..76ff332 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -38,6 +38,7 @@
 #define UFSHCD_ENABLE_INTRS(UTP_TRANSFER_REQ_COMPL |\
 UTP_TASK_REQ_COMPL |\
 UFSHCD_ERROR_MASK)
+#define UIC_CMD_TIMEOUT100
 
 enum {
UFSHCD_MAX_CHANNEL  = 0,
@@ -357,13 +358,15 @@ static inline void ufshcd_hba_capabilities(struct ufs_hba 
*hba)
 }
 
 /**
- * ufshcd_send_uic_command - Send UIC commands to unipro layers
+ * ufshcd_dispatch_uic_cmd - Dispatch UIC commands to unipro layers
  * @hba: per adapter instance
  * @uic_command: UIC command
  */
 static inline void
-ufshcd_send_uic_command(struct ufs_hba *hba, struct uic_command *uic_cmnd)
+ufshcd_dispatch_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmnd)
 {
+   init_completion(uic_cmnd-done);
+
/* Write Args */
ufshcd_writel(hba, REG_UIC_COMMAND_ARG_1, uic_cmnd-argument1);
ufshcd_writel(hba, REG_UIC_COMMAND_ARG_2, uic_cmnd-argument2);
@@ -375,6 +378,45 @@ ufshcd_send_uic_command(struct ufs_hba *hba, struct 
uic_command *uic_cmnd)
 }
 
 /**
+ * ufshcd_wait_for_uic_cmd - Wait complectioin of UIC command
+ * @hba: per adapter instance
+ * @uic_command: UIC command
+ *
+ * Returns 0 only if success.
+ */
+static int ufshcd_wait_for_uic_cmd(struct ufs_hba *hba)
+{
+   struct uic_command *uic_cmd = hba-active_uic_cmd;
+   int ret;
+
+   if (wait_for_completion_timeout(uic_cmd-done,
+   msecs_to_jiffies(UIC_CMD_TIMEOUT)))
+   ret = ufshcd_get_uic_cmd_result(hba);
+   else
+   ret = -ETIMEDOUT;
+
+   return ret;
+}
+
+/**
+ * ufshcd_ready_uic_cmd - Check if controller is ready
+ *to accept UIC commands
+ * @hba: per adapter instance
+ * Return true on success, else false
+ */
+static inline bool ufshcd_ready_uic_cmd(struct ufs_hba *hba)
+{
+   if (ufshcd_readl(hba, REG_CONTROLLER_STATUS)  UIC_COMMAND_READY) {
+   return true;
+   } else {
+   dev_err(hba-dev,
+   Controller not ready
+to accept UIC commands\n);
+   return false;
+   }
+}
+
+/**
  * ufshcd_map_sg - Map scatter-gather list to prdt
  * @lrbp - pointer to local reference block
  *
@@ -735,15 +777,10 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
 {
struct uic_command *uic_cmd;
unsigned long flags;
+   int ret;
 
-   /* check if controller is ready to accept UIC commands */
-   if (((ufshcd_readl(hba, REG_CONTROLLER_STATUS)) 
-   UIC_COMMAND_READY) == 0x0) {
-   dev_err(hba-dev,
-   Controller not ready
-to accept UIC commands\n);
+   if (!ufshcd_ready_uic_cmd(hba))
return -EIO;
-   }
 
spin_lock_irqsave(hba-host-host_lock, flags);
 
@@ -754,13 +791,16 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
uic_cmd-argument2 = 0;
uic_cmd-argument3 = 0;
 
-   /* enable UIC related interrupts */
-   ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
+   /* Dispatching UIC commands to controller */
+   ufshcd_dispatch_uic_cmd(hba, uic_cmd);
 
-   /* sending UIC commands to controller */
-   ufshcd_send_uic_command(hba, uic_cmd);
spin_unlock_irqrestore(hba-host-host_lock, flags);
-   return 0;
+
+   ret = ufshcd_wait_for_uic_cmd(hba);
+   if (ret)
+   dev_err(hba-dev, link startup: error code %d returned\n, 
ret);
+
+   return ret;
 }
 
 /**
@@ -898,6 +938,9 @@ static int ufshcd_initialize_hba(struct ufs_hba *hba)
if (ufshcd_hba_enable(hba))
return -EIO;
 
+   /* enable UIC related interrupts */
+   ufshcd_enable_intr(hba, UIC_COMMAND_COMPL | UIC_ERROR);
+
/* Configure UTRL and UTMRL base address registers */
ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_BASE_L,
  lower_32_bits(hba-utrdl_dma_addr));
@@ -909,7 +952,9 @@ static int ufshcd_initialize_hba(struct ufs_hba *hba)
  upper_32_bits(hba-utmrdl_dma_addr));
 
/* Initialize unipro link startup procedure */
-   return ufshcd_dme_link_startup(hba);
+   schedule_work(hba-link_startup_wq);
+
+   return 0;
 }
 
 /**
@@ -1186,6 +1231,16 @@ ufshcd_transfer_rsp_status(struct

[PATCH 5/5] scsi: ufs: add dme operations

2013-04-24 Thread Seungwon Jeon

Add uic command operations including DME_XXX series.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
 drivers/scsi/ufs/ufs-attrs.h |  129 
 drivers/scsi/ufs/ufshcd.c|  220 +-
 drivers/scsi/ufs/ufshcd.h|   55 +++
 drivers/scsi/ufs/ufshci.h|   19 
 4 files changed, 422 insertions(+), 1 deletions(-)
 create mode 100644 drivers/scsi/ufs/ufs-attrs.h

diff --git a/drivers/scsi/ufs/ufs-attrs.h b/drivers/scsi/ufs/ufs-attrs.h
new file mode 100644
index 000..562bb49
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-attrs.h
@@ -0,0 +1,129 @@
+/*
+ * drivers/scsi/ufs/ufs-attrs.h
+ *
+ * Copyright (C) 2013 Samsung Electronics Co., Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _UFS_ATTRS_H_
+#define _UFS_ATTRS_H_
+
+/*
+ * PHY Adpater attributes
+ */
+#define PA_ACTIVETXDATALANES   0x1560
+#define PA_ACTIVERXDATALANES   0x1580
+#define PA_TXTRAILINGCLOCKS0x1564
+#define PA_PHY_TYPE0x1500
+#define PA_AVAILTXDATALANES0x1520
+#define PA_AVAILRXDATALANES0x1540
+#define PA_MINRXTRAILINGCLOCKS 0x1543
+#define PA_TXPWRSTATUS 0x1567
+#define PA_RXPWRSTATUS 0x1582
+#define PA_TXFORCECLOCK0x1562
+#define PA_TXPWRMODE   0x1563
+#define PA_LEGACYDPHYESCDL 0x1570
+#define PA_MAXTXSPEEDFAST  0x1521
+#define PA_MAXTXSPEEDSLOW  0x1522
+#define PA_MAXRXSPEEDFAST  0x1541
+#define PA_MAXRXSPEEDSLOW  0x1542
+#define PA_TXLINKSTARTUPHS 0x1544
+#define PA_TXSPEEDFAST 0x1565
+#define PA_TXSPEEDSLOW 0x1566
+#define PA_REMOTEVERINFO   0x15A0
+#define PA_TXGEAR  0x1568
+#define PA_TXTERMINATION   0x1569
+#define PA_HSSERIES0x156A
+#define PA_PWRMODE 0x1571
+#define PA_RXGEAR  0x1583
+#define PA_RXTERMINATION   0x1584
+#define PA_MAXRXPWMGEAR0x1586
+#define PA_MAXRXHSGEAR 0x1587
+#define PA_RXHSUNTERMCAP   0x15A5
+#define PA_RXLSTERMCAP 0x15A6
+#define PA_PACPREQTIMEOUT  0x1590
+#define PA_PACPREQEOBTIMEOUT   0x1591
+#define PA_LOCALVERINFO0x15A9
+#define PA_TACTIVATE   0x15A8
+#define PA_PACPFRAMECOUNT  0x15C0
+#define PA_PACPERRORCOUNT  0x15C1
+#define PA_PHYTESTCONTROL  0x15C2
+#define PA_PWRMODEUSERDATA00x15B0
+#define PA_PWRMODEUSERDATA10x15B1
+#define PA_PWRMODEUSERDATA20x15B2
+#define PA_PWRMODEUSERDATA30x15B3
+#define PA_PWRMODEUSERDATA40x15B4
+#define PA_PWRMODEUSERDATA50x15B5
+#define PA_PWRMODEUSERDATA60x15B6
+#define PA_PWRMODEUSERDATA70x15B7
+#define PA_PWRMODEUSERDATA80x15B8
+#define PA_PWRMODEUSERDATA90x15B9
+#define PA_PWRMODEUSERDATA10   0x15BA
+#define PA_PWRMODEUSERDATA11   0x15BB
+#define PA_CONNECTEDTXDATALANE 0x1561
+#define PA_CONNECTEDRXDATALANE 0x1581
+#define PA_LOGICALLANEMAP  0x15A1
+#define PA_SLEEPNOCONFIGTIME   0x15A2
+#define PA_STALLNOCONFIGTIME   0x15A3
+#define PA_SAVECONFIGTIME  0x15A4
+
+/*
+ * Data Link Layer Attributes
+ */
+#define DL_TC0TXFCTHRESHOLD0x2040
+#define DL_FC0PROTTIMEOUTVAL   0x2041
+#define DL_TC0REPLAYTIMEOUTVAL 0x2042
+#define DL_AFC0REQTIMEOUTVAL   0x2043
+#define DL_AFC0CREDITTHRESHOLD 0x2044
+#define DL_TC0OUTACKTHRESHOLD  0x2045
+#define DL_TC1TXFCTHRESHOLD0x2060
+#define DL_FC1PROTTIMEOUTVAL   0x2061
+#define DL_TC1REPLAYTIMEOUTVAL 0x2062
+#define DL_AFC1REQTIMEOUTVAL   0x2063
+#define DL_AFC1CREDITTHRESHOLD 0x2064
+#define DL_TC1OUTACKTHRESHOLD  0x2065
+#define DL_TXPREEMPTIONCAP 0x2000
+#define DL_TC0TXMAXSDUSIZE 0x2001
+#define DL_TC0RXINITCREDITVAL  0x2002
+#define DL_TC0TXBUFFERSIZE 0x2005
+#define DL_PEERTC0PRESENT  0x2046
+#define DL_PEERTC0RXINITCREVAL 0x2047
+#define DL_TC1TXMAXSDUSIZE 0x2003
+#define DL_TC1RXINITCREDITVAL  0x2004
+#define DL_TC1TXBUFFERSIZE 0x2006
+#define DL_PEERTC1PRESENT  0x2066
+#define DL_PEERTC1RXINITCREVAL 0x2067
+
+/*
+ * Network Layer Attributes
+ */
+#define N_DEVICEID 0x3000
+#define N_DEVICEID_VALID   0x3001
+#define N_TC0TXMAXSDUSIZE  0x3020
+#define N_TC1TXMAXSDUSIZE  0x3021
+
+/*
+ * Transport Layer Attributes
+ */
+#define T_NUMCPORTS0x4000
+#define T_NUMTESTFEATURES  0x4001
+#define T_CONNECTIONSTATE  0x4020
+#define T_PEERDEVICEID 0x4021
+#define T_PEERCPORTID  0x4022
+#define T_TRAFFICCLASS 0x4023
+#define T_PROTOCOLID   0x4024
+#define T_CPORTFLAGS   0x4025
+#define T_TXTOKENVALUE 0x4026
+#define T_RXTOKENVALUE 0x4027
+#define T_LOCALBUFFERSPACE 0x4028
+#define T_PEERBUFFERSPACE  0x4029
+#define T_CREDITSTOSEND0x402A
+#define T_CPORTMODE0x402B
+#define T_TC0TXMAXSDUSIZE

RE: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Black, David

Jeremy,

It looks like, you, Paolo and Ric have hit the nail on the head here - this is
a nice summary, IMHO:

 On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
 If the device can promise this, we don't care (and don't know) how it 
 manages that promise. It can leave the data on battery backed DRAM, can 
 archive it to flash or any other scheme that works.
 
 That's exactly the point of SYNC_NV=1.

   Well its the point, but the specification is written such that the 
 vendors can
 choose to implement it any way they wish, especially for split cache
 systems where there is both volatile and non volatile cache.

Independent of T10's best intentions at the time, the implementations aren't
doing what's needed or intended, and I'd guess that the SYNC_NV bit is not
being set to 1 by [other people's ;-) ] software that should be setting it
to 1 if it were paying attention to the standard.

This is further complicated by it being completely legitimate wrt the SCSI
standard to put non-volatile cache in a system and not have the SCSI interface
admit that the non-volatile cache exists (WCE=0, SYNCHRONIZE CACHE is a no-op
independent of the value of SYNC_NV).

I believe that Rob Elliot's 13-050 proposal to obsolete SYNC_NV and re-specify
SYNCHRONIZE CACHE to make all data non-volatile by whatever means the target
chooses is what T10 should do, and that matches Ric's summary:

 If the device can promise this, we don't care (and don't know) how it 
 manages that promise. It can leave the data on battery backed DRAM, can 
 archive it to flash or any other scheme that works.

Beyond that, attempting to manage drive removal from storage systems via the
SCSI interface with standard commands is a waste of time and effort, IMHO.
In a serious storage array (and even some fairly simple RAID controllers), some
vendor-specific magic is needed to get the array (or controller) to prepare
so that the drive can be removed cleanly.  To oversimplify, it's not enough to
flush data to the drive; the array or controller is stateful, and hence has
to be told to forget the drive, where forget involves things that are
rather implementation-specific.

Thanks,
--David

David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953 FAX: +1 (508) 293-7786
david.bl...@emc.com    Mobile: +1 (978) 394-7754


 -Original Message-
 From: Jeremy Linton [mailto:jlin...@tributary.com]
 Sent: Wednesday, April 24, 2013 10:36 AM
 To: Paolo Bonzini
 Cc: Ric Wheeler; Hannes Reinecke; James Bottomley; linux-scsi@vger.kernel.org;
 Martin K. Petersen; Jeff Moyer; Tejun Heo; Mike Snitzer; Black, David;
 Elliott, Robert (Server Storage); Knight, Frederick
 Subject: Re: T10 WCE interpretation in Linux  device level access
 
 On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
  If the device can promise this, we don't care (and don't know) how it
  manages that promise. It can leave the data on battery backed DRAM, can
  archive it to flash or any other scheme that works.
 
  That's exactly the point of SYNC_NV=1.
 
   Well its the point, but the specification is written such that the
 vendors can
 choose to implement it any way they wish, especially for split cache
 systems where there is both volatile and non volatile cache.
 
   Flushing the NV cache to medium (as is the current behavior) may not be
 a bad
 idea anyway.
 
   Thats because I know of a large vendors array where the non-volatile
 cache
 might be better described as the sometimes non-volatile cache. That is
 because
 a failure to flush the volatile portions results in the non-volatile portions
 being considered invalid when power is restored. This fences the volume, and
 the
 usual method for recovering the array is to call support and have them
 invalidate the NV portions of the cache. Thereby negating the whole reason for
 having a NV cache. I'm sure they don't tell customers this fact when they sell
 the array, when it happened in our lab I was in a state of shock for about a
 week.
 
 
 
 
 
 
 
 
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Ric Wheeler


On 04/24/2013 02:20 PM, Black, David wrote:

Jeremy,

It looks like, you, Paolo and Ric have hit the nail on the head here - this is
a nice summary, IMHO:


On 4/24/2013 7:57 AM, Paolo Bonzini wrote:

If the device can promise this, we don't care (and don't know) how it
manages that promise. It can leave the data on battery backed DRAM, can

archive it to flash or any other scheme that works.

That's exactly the point of SYNC_NV=1.

Well its the point, but the specification is written such that the 
vendors can
choose to implement it any way they wish, especially for split cache
systems where there is both volatile and non volatile cache.

Independent of T10's best intentions at the time, the implementations aren't
doing what's needed or intended, and I'd guess that the SYNC_NV bit is not
being set to 1 by [other people's ;-) ] software that should be setting it
to 1 if it were paying attention to the standard.

This is further complicated by it being completely legitimate wrt the SCSI
standard to put non-volatile cache in a system and not have the SCSI interface
admit that the non-volatile cache exists (WCE=0, SYNCHRONIZE CACHE is a no-op
independent of the value of SYNC_NV).

I believe that Rob Elliot's 13-050 proposal to obsolete SYNC_NV and re-specify
SYNCHRONIZE CACHE to make all data non-volatile by whatever means the target
chooses is what T10 should do, and that matches Ric's summary:


If the device can promise this, we don't care (and don't know) how it
manages that promise. It can leave the data on battery backed DRAM, can

archive it to flash or any other scheme that works.

Beyond that, attempting to manage drive removal from storage systems via the
SCSI interface with standard commands is a waste of time and effort, IMHO.
In a serious storage array (and even some fairly simple RAID controllers), some
vendor-specific magic is needed to get the array (or controller) to prepare
so that the drive can be removed cleanly.  To oversimplify, it's not enough to
flush data to the drive; the array or controller is stateful, and hence has
to be told to forget the drive, where forget involves things that are
rather implementation-specific.

Thanks,
--David



So I think that leaves us with some arrays that might benefit from Paolo's 
proposed patch, but almost certainly still will need to be able to ignore 
flushes for some block device accessing DB's, etc


Ric

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread James Bottomley

On Wed, 2013-04-24 at 16:41 -0400, Ric Wheeler wrote:
 On 04/24/2013 02:20 PM, Black, David wrote:
  Jeremy,
 
  It looks like, you, Paolo and Ric have hit the nail on the head here - this 
  is
  a nice summary, IMHO:
 
  On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
  If the device can promise this, we don't care (and don't know) how it
  manages that promise. It can leave the data on battery backed DRAM, can
  archive it to flash or any other scheme that works.
 
  That's exactly the point of SYNC_NV=1.
 Well its the point, but the specification is written such that the 
  vendors can
  choose to implement it any way they wish, especially for split cache
  systems where there is both volatile and non volatile cache.
  Independent of T10's best intentions at the time, the implementations aren't
  doing what's needed or intended, and I'd guess that the SYNC_NV bit is not
  being set to 1 by [other people's ;-) ] software that should be setting it
  to 1 if it were paying attention to the standard.
 
  This is further complicated by it being completely legitimate wrt the SCSI
  standard to put non-volatile cache in a system and not have the SCSI 
  interface
  admit that the non-volatile cache exists (WCE=0, SYNCHRONIZE CACHE is a 
  no-op
  independent of the value of SYNC_NV).
 
  I believe that Rob Elliot's 13-050 proposal to obsolete SYNC_NV and 
  re-specify
  SYNCHRONIZE CACHE to make all data non-volatile by whatever means the target
  chooses is what T10 should do, and that matches Ric's summary:
 
  If the device can promise this, we don't care (and don't know) how it
  manages that promise. It can leave the data on battery backed DRAM, can
  archive it to flash or any other scheme that works.
  Beyond that, attempting to manage drive removal from storage systems via the
  SCSI interface with standard commands is a waste of time and effort, IMHO.
  In a serious storage array (and even some fairly simple RAID controllers), 
  some
  vendor-specific magic is needed to get the array (or controller) to 
  prepare
  so that the drive can be removed cleanly.  To oversimplify, it's not enough 
  to
  flush data to the drive; the array or controller is stateful, and hence has
  to be told to forget the drive, where forget involves things that are
  rather implementation-specific.
 
  Thanks,
  --David
 
 
 So I think that leaves us with some arrays that might benefit from Paolo's 
 proposed patch, but almost certainly still will need to be able to ignore 
 flushes for some block device accessing DB's, etc

That just leaves us with random standards behaviour.  Lets permit the
deterministic thing instead for the distros.  It kills two birds with
one stone because we can set WCE for the stupid UAS devices that clear
it wrongly as well.

For those who don't read code well, you add a temporary prefix to the
cache set in

echo xxx  /sys/class/scsi_disk/disk/cache_type

and it will set the flags for the lifetime of the current kernel, but
won't try to do a mode select to make them permanent.

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..af16e88 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -142,6 +142,7 @@ sd_store_cache_type(struct device *dev, struct 
device_attribute *attr,
char *buffer_data;
struct scsi_mode_data data;
struct scsi_sense_hdr sshdr;
+   const char *temp = temporary ;
int len;
 
if (sdp-type != TYPE_DISK)
@@ -150,6 +151,13 @@ sd_store_cache_type(struct device *dev, struct 
device_attribute *attr,
 * it's not worth the risk */
return -EINVAL;
 
+   if (strncmp(buf, temp, sizeof(temp) - 1) == 0) {
+   buf += sizeof(temp) - 1;
+   sdkp-cache_override = 1;
+   } else {
+   sdkp-cache_override = 0;
+   }
+
for (i = 0; i  ARRAY_SIZE(sd_cache_types); i++) {
len = strlen(sd_cache_types[i]);
if (strncmp(sd_cache_types[i], buf, len) == 0 
@@ -162,6 +170,13 @@ sd_store_cache_type(struct device *dev, struct 
device_attribute *attr,
return -EINVAL;
rcd = ct  0x01 ? 1 : 0;
wce = ct  0x02 ? 1 : 0;
+
+   if (sdkp-cache_override) {
+   sdkp-WCE = wce;
+   sdkp-RCD = rcd;
+   return count;
+   }
+
if (scsi_mode_sense(sdp, 0x08, 8, buffer, sizeof(buffer), SD_TIMEOUT,
SD_MAX_RETRIES, data, NULL))
return -EINVAL;
@@ -2319,6 +2334,10 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char 
*buffer)
int old_rcd = sdkp-RCD;
int old_dpofua = sdkp-DPOFUA;
 
+
+   if (sdkp-cache_override)
+   return;
+   
first_len = 4;
if (sdp-skip_ms_page_8) {
if (sdp-type == TYPE_RBC)
@@ -2812,6 +2831,7 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
sdkp-capacity = 0;

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Paolo Bonzini

Il 24/04/2013 23:02, James Bottomley ha scritto:
 That just leaves us with random standards behaviour.  Lets permit the
 deterministic thing instead for the distros.  It kills two birds with
 one stone because we can set WCE for the stupid UAS devices that clear
 it wrongly as well.
 
 For those who don't read code well, you add a temporary prefix to the
 cache set in
 
 echo xxx  /sys/class/scsi_disk/disk/cache_type
 
 and it will set the flags for the lifetime of the current kernel, but
 won't try to do a mode select to make them permanent.

Having the knob is useful indeed.  I don't like the temporary name
though, because temporary write-through doesn't sound like it can eat
data on a power loss.  What about force or assume?

Also, this would be in addition to my patch (when tested), right?

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread James Bottomley

On Wed, 2013-04-24 at 23:54 +0200, Paolo Bonzini wrote:
 Il 24/04/2013 23:02, James Bottomley ha scritto:
  That just leaves us with random standards behaviour.  Lets permit the
  deterministic thing instead for the distros.  It kills two birds with
  one stone because we can set WCE for the stupid UAS devices that clear
  it wrongly as well.
  
  For those who don't read code well, you add a temporary prefix to the
  cache set in
  
  echo xxx  /sys/class/scsi_disk/disk/cache_type
  
  and it will set the flags for the lifetime of the current kernel, but
  won't try to do a mode select to make them permanent.
 
 Having the knob is useful indeed.  I don't like the temporary name
 though, because temporary write-through doesn't sound like it can eat
 data on a power loss.  What about force or assume?

I'm fairly ambivalent, except not force.  The default behaviour is to do
the mode select, so force seems to imply that as well, except it won't.
I don't see a difference between assume and temporary.

 Also, this would be in addition to my patch (when tested), right?

Not really ... given T10s deprecation I don't think we want to touch
anything to do with SYNC_NV because it just adds to the uncertainty
about what will actually happen.  Giving the ability to control WCE (and
RCD) fixes all the problems raised so far.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] qla2xxx: Patches for scsi misc branch.

2013-04-24 Thread Saurav Kashyap

Hi James,

This patchset fixes the sparse warnings. Please apply the following patches
to the scsi tree at your earliest convenience for inclusion in the next
mainline merge window.

Thanks,
~Saurav
 
Armen Baloyan (1):
  qla2xxx: fix sparse warning large integer implicitly truncated to
unsigned type

Fengguang Wu (1):
  qla2xxx: qla2x00_sp_compl can be static.

 drivers/scsi/qla2xxx/qla_mr.c |6 ++
 drivers/scsi/qla2xxx/qla_os.c |2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] qla2xxx: qla2x00_sp_compl can be static.

2013-04-24 Thread Saurav Kashyap

From: Fengguang Wu fengguang...@intel.com

Signed-off-by: Fengguang Wu fengguang...@intel.com
Signed-off-by: Saurav Kashyap saurav.kash...@qlogic.com
---
 drivers/scsi/qla2xxx/qla_os.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index a083715..7b621c2 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -644,7 +644,7 @@ qla2x00_sp_free_dma(void *vha, void *ptr)
qla2x00_rel_sp(sp-fcport-vha, sp);
 }
 
-void
+static void
 qla2x00_sp_compl(void *data, void *ptr, int res)
 {
struct qla_hw_data *ha = (struct qla_hw_data *)data;
-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] qla2xxx: fix sparse warning large integer implicitly truncated to unsigned type

2013-04-24 Thread Saurav Kashyap

From: Armen Baloyan armen.balo...@qlogic.com

Signed-off-by: Armen Baloyan armen.balo...@qlogic.com
Signed-off-by: Saurav Kashyap saurav.kash...@qlogic.com
---
 drivers/scsi/qla2xxx/qla_mr.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index 729b743..937fed8 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -3003,12 +3003,10 @@ qlafx00_build_scsi_iocbs(srb_t *sp, struct 
cmd_type_7_fx00 *cmd_pkt,
 
/* Set transfer direction */
if (cmd-sc_data_direction == DMA_TO_DEVICE) {
-   lcmd_pkt-cntrl_flags =
-   __constant_cpu_to_le16(TMF_WRITE_DATA);
+   lcmd_pkt-cntrl_flags = TMF_WRITE_DATA;
vha-qla_stats.output_bytes += scsi_bufflen(cmd);
} else if (cmd-sc_data_direction == DMA_FROM_DEVICE) {
-   lcmd_pkt-cntrl_flags =
-   __constant_cpu_to_le16(TMF_READ_DATA);
+   lcmd_pkt-cntrl_flags = TMF_READ_DATA;
vha-qla_stats.input_bytes += scsi_bufflen(cmd);
}
 
-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Ric Wheeler


On 04/24/2013 06:09 PM, James Bottomley wrote:

On Wed, 2013-04-24 at 23:54 +0200, Paolo Bonzini wrote:

Il 24/04/2013 23:02, James Bottomley ha scritto:

That just leaves us with random standards behaviour.  Lets permit the
deterministic thing instead for the distros.  It kills two birds with
one stone because we can set WCE for the stupid UAS devices that clear
it wrongly as well.

For those who don't read code well, you add a temporary prefix to the
cache set in

echo xxx  /sys/class/scsi_disk/disk/cache_type

and it will set the flags for the lifetime of the current kernel, but
won't try to do a mode select to make them permanent.

Having the knob is useful indeed.  I don't like the temporary name
though, because temporary write-through doesn't sound like it can eat
data on a power loss.  What about force or assume?

I'm fairly ambivalent, except not force.  The default behaviour is to do
the mode select, so force seems to imply that as well, except it won't.
I don't see a difference between assume and temporary.


Also, this would be in addition to my patch (when tested), right?

Not really ... given T10s deprecation I don't think we want to touch
anything to do with SYNC_NV because it just adds to the uncertainty
about what will actually happen.  Giving the ability to control WCE (and
RCD) fixes all the problems raised so far.

James




Why are we turning off the RCD bit in this? Not sure it matters, but we only 
should care about WCE (and the dirty write cache data)?


Thanks!

Ric

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread James Bottomley

On Wed, 2013-04-24 at 18:36 -0400, Ric Wheeler wrote:
 On 04/24/2013 06:09 PM, James Bottomley wrote:
  On Wed, 2013-04-24 at 23:54 +0200, Paolo Bonzini wrote:
  Il 24/04/2013 23:02, James Bottomley ha scritto:
  That just leaves us with random standards behaviour.  Lets permit the
  deterministic thing instead for the distros.  It kills two birds with
  one stone because we can set WCE for the stupid UAS devices that clear
  it wrongly as well.
 
  For those who don't read code well, you add a temporary prefix to the
  cache set in
 
  echo xxx  /sys/class/scsi_disk/disk/cache_type
 
  and it will set the flags for the lifetime of the current kernel, but
  won't try to do a mode select to make them permanent.
  Having the knob is useful indeed.  I don't like the temporary name
  though, because temporary write-through doesn't sound like it can eat
  data on a power loss.  What about force or assume?
  I'm fairly ambivalent, except not force.  The default behaviour is to do
  the mode select, so force seems to imply that as well, except it won't.
  I don't see a difference between assume and temporary.
 
  Also, this would be in addition to my patch (when tested), right?
  Not really ... given T10s deprecation I don't think we want to touch
  anything to do with SYNC_NV because it just adds to the uncertainty
  about what will actually happen.  Giving the ability to control WCE (and
  RCD) fixes all the problems raised so far.
 
 
 Why are we turning off the RCD bit in this? Not sure it matters, but we only 
 should care about WCE (and the dirty write cache data)?

Well, it's in the code.  Cache policy is a combination of those two
bits.  The cache type takes a cache policy string, ergo it must update
both.  We don't do anything with it because having a write back cache
and no cache at all is transparent to us.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/6] scsi_debug: fix logical block provisioning support

2013-04-24 Thread Douglas Gilbert


On 13-04-16 09:11 AM, Akinobu Mita wrote:

I tried testing the logical block provisioning support in scsi_debug,
but it didn't work as I expected.

For example, load scsi_debug module with UNMAP command supported
and fill the storage with random data.

# modprobe scsi_debug lbpu=1
# dd if=/dev/urandom of=/dev/sdb

Then, try to unmap LBA 0, but Get LBA status reports:

# sg_unmap --lba=0 --num=1 /dev/sdb
# sg_get_lba_status --lba=0 /dev/sdb
descriptor LBA: 0x  blocks: 16384  mapped

This is unexpected result.  Because UNMAP command to LBA 0 finished
without any errors, but Get LBA status shows that LBA 0 is still mapped.

I looked around the logical block provisioning support in scsi_debug,
and I found several problems there.  This patch series tries to fix
these problems and it is broken into small patches as much as possible
for ease of review.

Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: linux-scsi@vger.kernel.org
Cc: Douglas Gilbert dgilb...@interlog.com
Cc: Martin K. Petersen martin.peter...@oracle.com

Akinobu Mita (6):
   scsi_debug: call map_region() and unmap_region() only when needed
   scsi_debug: prohibit scsi_debug_unmap_granularity ==
 scsi_debug_unmap_alignment
   scsi_debug: clear correct memory region when LBPRZ is enabled
   scsi_debug: add translation functions between LBA and index of
 provisioning map
   scsi_debug: fix initialization of provisioning map
   scsi_debug: fix logical block provisioning support



I'd like to see some feedback from Martin Petersen on this
set of patches. For my part, for this patch series (1/6
fo 6/6):

Acked-by: Douglas Gilbert dgilb...@interlog.com

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] scsi_debug: fix data integrity support

2013-04-24 Thread Douglas Gilbert


On 13-04-21 05:17 AM, Akinobu Mita wrote:

When I tried testing the data integrity support in scsi_debug on x86_32,
I got CONFIG_DEBUG_HIGHMEM warnings and protection errors.  This was
triggered due to misused kmap_atomic/kunmap_atomic.

And then, while I was testing the fix of the above issue with several
combination with module parameters dix and dif, I found that doing
'modprobe scsi_debug dif=0 dix=1' causes kernel crash.

This patch set includes these fixes and cleanup which is related to data
integrity support.

Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: Douglas Gilbert dgilb...@interlog.com
Cc: Martin K. Petersen martin.peter...@oracle.com
Cc: linux-scsi@vger.kernel.org

Akinobu Mita (3):
   scsi_debug: fix data integrity support on highmem machine
   scsi_debug: fix NULL pointer dereference with parameters dif=0 dix=1
   scsi_debug: simplify offset calculation for dif_storep

  drivers/scsi/scsi_debug.c | 29 -
  1 file changed, 12 insertions(+), 17 deletions(-)


Again, I'd like to see some feedback from Martin Petersen
on this set of patches. For my part, for this patch series
(1/3 to 3/3):

Acked-by: Douglas Gilbert dgilb...@interlog.com

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] [SCSI] sd: workaround invalid OPTIMAL TRANSFER LENGTH

2013-04-24 Thread Martin K. Petersen

 Mike == Mike Snitzer snit...@redhat.com writes:

Mike,

Following up on our discussion at LSF/MM last week...

Mike Workaround disk firmware that improperly sets OPTIMAL TRANSFER
Mike LENGTH to 0x (aka UINT_MAX or 4294967295U) by assuming
Mike this _optional_ BLOCK LIMITS VPD field was not specified (0).

My only real objection is that UINT_MAX appears to be completely
coincidental. What if the next drive specifies UINT_MAX-1 or $RANDOM?

Lacking a decent heuristic I'd rather just blacklist the offending drive
using the skip_vpd_pages knob we already have in place.

-- 
Martin K. Petersen  Oracle Linux Engineering


[SCSI] Workaround for disks that report bad optimal transfer length

Not all disks fill out the VPD pages correctly. Add a blacklist flag
that allows us ignore the SBC-3 VPD pages for a given device. The
BLIST_SKIP_VPD_PAGES flag triggers our existing skip_vpd_pages
scsi_device parameter to bypass VPD scanning.

Also blacklist the offending Seagate drive model.

Reported-by: Mike Snitzer snit...@redhat.com
Signed-off-by: Martin K. Petersen martin.peter...@oracle.com


diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 43fca91..f969aca 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -228,6 +228,7 @@ static struct {
{SanDisk, ImageMate CF-SD1, NULL, BLIST_FORCELUN},
{SEAGATE, ST34555N, 0930, BLIST_NOTQ},/* Chokes on tagged 
INQUIRY */
{SEAGATE, ST3390N, 9546, BLIST_NOTQ},
+   {SEAGATE, ST900MM0006, NULL, BLIST_SKIP_VPD_PAGES},
{SGI, RAID3, *, BLIST_SPARSELUN},
{SGI, RAID5, *, BLIST_SPARSELUN},
{SGI, TP9100, *, BLIST_REPORTLUN2},
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..b9e39e0 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -924,6 +924,9 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
if (*bflags  BLIST_NO_DIF)
sdev-no_dif = 1;
 
+   if (*bflags  BLIST_SKIP_VPD_PAGES)
+   sdev-skip_vpd_pages = 1;
+
transport_configure_device(sdev-sdev_gendev);
 
if (sdev-host-hostt-slave_configure) {
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index cc1f3e7..447d2d7 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -31,4 +31,5 @@
 #define BLIST_MAX_512  0x80 /* maximum 512 sector cdb length */
 #define BLIST_ATTACH_PQ3   0x100 /* Scan: Attach to PQ3 devices */
 #define BLIST_NO_DIF   0x200 /* Disable T10 PI (DIF) */
+#define BLIST_SKIP_VPD_PAGES   0x400 /* Ignore SBC-3 VPD pages */
 #endif
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: T10 WCE interpretation in Linux device level access

2013-04-24 Thread Martin K. Petersen

 James == James Bottomley james.bottom...@hansenpartnership.com writes:

James I'm fairly ambivalent, except not force.  The default behaviour
James is to do the mode select, so force seems to imply that as well,
James except it won't.  I don't see a difference between assume and
James temporary.

I'm ok with your patch. And a strong believer in not altering the
SYNCHRONIZE CACHE behavior that's been rigorously tested in the field by
adding SYNC_NV to the mix.


James Not really ... given T10s deprecation I don't think we want to
James touch anything to do with SYNC_NV because it just adds to the
James uncertainty about what will actually happen.  

Yep.


James Giving the ability to control WCE (and RCD) fixes all the
James problems raised so far.

If there are devices that would truly benefit from SYNC_NV we could add
a sync_nv parameter to scsi_disk's sysfs that could be used to set that
bit when issuing flush_cmnd.

But it would be something we would do manually on a per-device basis and
not something that is automatically keyed off of NV_SUP (SYNC_NV doesn't
require NV_SUP, btw., so that's not even a valid check).

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/6] scsi_debug: prohibit scsi_debug_unmap_granularity == scsi_debug_unmap_alignment

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu scsi_debug prohibits setting scsi_debug_unmap_alignment to be
Akinobu greater than scsi_debug_unmap_granularity.  But setting them to
Akinobu be the same value is not prohibited.  In this case, the only
Akinobu difference with scsi_debug_unmap_alignment == 0 is the logical
Akinobu blocks from 0 to scsi_debug_unmap_alignment - 1 cannot be
Akinobu unmapped.  But the difference is not properly handled in the
Akinobu current code.

Akinobu So this prohibits such unusual setting.

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/6] scsi_debug: clear correct memory region when LBPRZ is enabled

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu The function unmap_region() clears memory region specified as
Akinobu the logical block address and the number of logical blocks in
Akinobu ramdisk storage (fake_storep) if lbpu and lbprz module
Akinobu parameters are enabled.

Akinobu In the while loop of unmap_region(), it advances optimal unmap
Akinobu granularity in logical blocks.  But it only clears one logical
Akinobu block at LBA 'block' per loop iteration.  And furthermore, the
Akinobu 'block' is not pointing to a logical block address which should
Akinobu be cleared, it is a index of probisioning map (map_storep).

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/6] scsi_debug: add translation functions between LBA and index of provisioning map

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu The translation from LBA to index of provisioning map
Akinobu (map_storep) is used in various places (map_state(),
Akinobu map_region(), and unmap_region()).  But it is not correctly
Akinobu calculated if scsi_debug_unmap_alignment is zero.

Akinobu This introduces correct translation functions between LBA and
Akinobu index of provisioning map:

Akinobu static unsigned long lba_to_map_index(sector_t lba);
Akinobu static sector_t map_index_to_lba(unsigned long index);

Akinobu Actual bug fixes with using these functions will be done by
Akinobu forthcoming patches.

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/6] scsi_debug: fix initialization of provisioning map

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu provisioning map (map_storep) is a bitmap accessed by bitops.
Akinobu So the allocation size should be a multiple of sizeof(unsigned
Akinobu long) and also the bitmap should be cleared by using
Akinobu bitmap_clear() instead of memset().

Akinobu Otherwise it will cause problem on big-endian architecture if
Akinobu the number of bits is not a multiple of BITS_PER_LONG.

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: sd: workaround invalid OPTIMAL TRANSFER LENGTH

2013-04-24 Thread Mike Snitzer

On Wed, Apr 24 2013 at  9:19pm -0400,
Martin K. Petersen martin.peter...@oracle.com wrote:

  Mike == Mike Snitzer snit...@redhat.com writes:
 
 Mike,
 
 Following up on our discussion at LSF/MM last week...
 
 Mike Workaround disk firmware that improperly sets OPTIMAL TRANSFER
 Mike LENGTH to 0x (aka UINT_MAX or 4294967295U) by assuming
 Mike this _optional_ BLOCK LIMITS VPD field was not specified (0).
 
 My only real objection is that UINT_MAX appears to be completely
 coincidental. What if the next drive specifies UINT_MAX-1 or $RANDOM?
 
 Lacking a decent heuristic I'd rather just blacklist the offending drive
 using the skip_vpd_pages knob we already have in place.

OK, sounds fine to me, thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] scsi_debug: fix logical block provisioning support

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu This problem is due to the wrong translation between LBA and
Akinobu index of provisioning map.  Fix it by using correct translation
Akinobu functions added previously.

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 1/1] sd_dif: problem with verify of type 1 protection information (PI)

2013-04-24 Thread Martin K. Petersen

 Jeremy == Jeremy Higdon jer...@sgi.com writes:

Jeremy It appears to me that there is a problem with handling of type 1
Jeremy protection information.  It is considering a logical block
Jeremy reference tag of 0x to be an error, but it is actually
Jeremy valid any time ((lba  0x) == 0x) [for example,
Jeremy 2TiB-1, 4TiB-1, 6TiB-1, etc.].

Yes, that's a bogus check. It's actually one I put in way back to debug
a faulty device and forgot to back out. My DIF test tooling does not
rely on the block layer generate/verify functions so I never noticed.

The same issue was reported recently by Keith Busch.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/6] scsi_debug: call map_region() and unmap_region() only when needed

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu If the logical block provisioning is not enabled, map_region()
Akinobu and unmap_region() have no effect and they don't need to be
Akinobu called.

Akinobu So this makes map_region() and unmap_region() to be called only
Akinobu when scsi_debug_lbp() returns true, i.e. logical block
Akinobu provisioning is enabled.

Acked-by: Martin K. Petersen martin.peter...@oracle.com


Akinobu Because scsi_debug_unmap_granularity cannot be zero with usual
Akinobu setting: scsi_debug_unmap_granularity is 1 by default, and it
Akinobu can be changed to zero with explicit module parameter setting
Akinobu only when the logical block provisioning is disabled.  But it
Akinobu is only meaningful module parameter when the logical block
Akinobu provisioning is enabled.

As you have found out, I generally allow module parameter values and
combinations thereof that do not make strict sense. That's because the
main reason for adding these parameters in the first place is to test
the block and SCSI layer code that queries them. scsi_debug is a test
vehicle and being able to simulate a device with broken reporting is
something I use a lot.

That said, I don't think we have any test scripts that actually depend
on this combination of unmap parameters so I'm ok with you cleaning them
up.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] scsi_debug: fix data integrity support on highmem machine

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu kmap_atomic() is now using stack based implementation and
Akinobu doesn't take the KM_type argument anymore.  So nesting
Akinobu kmap_atomic() calls must be properly stacked.

Akinobu This fixes nesting kmap_atomic() calls for scsi_sglist and
Akinobu scsi_prot_sglist in prot_verify_write().

I don't see how the prog_sglist is incorrectly nested? Also, with your
code you also end up mapping and unmapping the protection page for every
data segment. The two scatterlists have different cadence.


Akinobu This also fixes another issue that invalid kmap address is used
Akinobu for kunmap_atomic(): the kmap address 'daddr' is incremented in
Akinobu the loop for each data page, and it can reach the next page
Akinobu boundary.

That fix is fine.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] scsi_debug: simplify offset calculation for dif_storep

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

+   sector += len / sizeof(*dif_storep);

I'd rather see sizeof(struct scsi_dif_tuple) here. But that's just
personal preference. Otherwise ok.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] scsi_debug: fix NULL pointer dereference with parameters dif=0 dix=1

2013-04-24 Thread Martin K. Petersen

 Akinobu == Akinobu Mita akinobu.m...@gmail.com writes:

Akinobu The protection info dif_storep is allocated only when parameter
Akinobu dif is not zero.  But it will be accessed when reading or
Akinobu writing to the storage installed with parameter dix is not
Akinobu zero.

Akinobu So kernel crashes if scsi_debug module is loaded with
Akinobu parameters dix=1 and dif=0.

The full story is that scsi_debug does not support DIF and DIX correctly
by virtue of simultaneously being the HBA and the target. And since
there is no actual data transfer between the HBA and the target the
notion of DIF is weak at best.

I did look into making scsi_debug do the right thing but it's quite a
bit of code and I lost interest about halfway through the effort. If
you'd like to fix this properly I can probably find the patch to use as
baseline?

In this particular case I'm ok with your patch keying off of
scsi_debug_dix. But it's very important to me that I'm able to set
P_TYPE and the host protection mask as I see fit using the module
parameters. Even when the combination of options don't make strict
sense.

Acked-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/5] scsi: ufs: wrap the i/o access operations

2013-04-24 Thread Sujit Reddy Thumma


On 4/24/2013 9:36 PM, Seungwon Jeon wrote:

Simplify operations with hiding mmio_base.

Signed-off-by: Seungwon Jeon tgih@samsung.com
---
  drivers/scsi/ufs/ufshcd.c |  106 +++--
  drivers/scsi/ufs/ufshcd.h |5 ++
  2 files changed, 49 insertions(+), 62 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 41b9639..b6c19b0 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -71,7 +71,7 @@ enum {
   */
  static inline u32 ufshcd_get_ufs_version(struct ufs_hba *hba)
  {
-   return readl(hba-mmio_base + REG_UFS_VERSION);
+   return ufshcd_readl(hba, REG_UFS_VERSION);
  }

  /**
@@ -130,8 +130,7 @@ static inline int ufshcd_get_tm_free_slot(struct ufs_hba 
*hba)
   */
  static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 pos)
  {
-   writel(~(1  pos),
-   (hba-mmio_base + REG_UTP_TRANSFER_REQ_LIST_CLEAR));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_CLEAR, ~(1  pos));
  }

  /**
@@ -165,7 +164,7 @@ static inline int ufshcd_get_lists_status(u32 reg)
   */
  static inline int ufshcd_get_uic_cmd_result(struct ufs_hba *hba)
  {
-   return readl(hba-mmio_base + REG_UIC_COMMAND_ARG_2) 
+   return ufshcd_readl(hba, REG_UIC_COMMAND_ARG_2) 
   MASK_UIC_COMMAND_RESULT;
  }

@@ -243,18 +242,14 @@ ufshcd_config_int_aggr(struct ufs_hba *hba, int option)
  {
switch (option) {
case INT_AGGR_RESET:
-   writel((INT_AGGR_ENABLE |
-   INT_AGGR_COUNTER_AND_TIMER_RESET),
-   (hba-mmio_base +
-REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL,
+ INT_AGGR_ENABLE | 
INT_AGGR_COUNTER_AND_TIMER_RESET);
break;
case INT_AGGR_CONFIG:
-   writel((INT_AGGR_ENABLE |
-   INT_AGGR_PARAM_WRITE |
-   INT_AGGR_COUNTER_THRESHOLD_VALUE |
-   INT_AGGR_TIMEOUT_VALUE),
-   (hba-mmio_base +
-REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_INT_AGG_CONTROL,
+ INT_AGGR_ENABLE | INT_AGGR_PARAM_WRITE |
+ INT_AGGR_COUNTER_THRESHOLD_VALUE |
+ INT_AGGR_TIMEOUT_VALUE);
break;
}
  }
@@ -267,12 +262,10 @@ ufshcd_config_int_aggr(struct ufs_hba *hba, int option)
   */
  static void ufshcd_enable_run_stop_reg(struct ufs_hba *hba)
  {
-   writel(UTP_TASK_REQ_LIST_RUN_STOP_BIT,
-  (hba-mmio_base +
-   REG_UTP_TASK_REQ_LIST_RUN_STOP));
-   writel(UTP_TRANSFER_REQ_LIST_RUN_STOP_BIT,
-  (hba-mmio_base +
-   REG_UTP_TRANSFER_REQ_LIST_RUN_STOP));
+   ufshcd_writel(hba, REG_UTP_TASK_REQ_LIST_RUN_STOP,
+ UTP_TASK_REQ_LIST_RUN_STOP_BIT);
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_RUN_STOP,
+ UTP_TRANSFER_REQ_LIST_RUN_STOP_BIT);
  }

  /**
@@ -281,7 +274,7 @@ static void ufshcd_enable_run_stop_reg(struct ufs_hba *hba)
   */
  static inline void ufshcd_hba_start(struct ufs_hba *hba)
  {
-   writel(CONTROLLER_ENABLE , (hba-mmio_base + REG_CONTROLLER_ENABLE));
+   ufshcd_writel(hba, REG_CONTROLLER_ENABLE, CONTROLLER_ENABLE);
  }

  /**
@@ -290,7 +283,7 @@ static inline void ufshcd_hba_start(struct ufs_hba *hba)
   */
  static inline void ufshcd_hba_stop(struct ufs_hba *hba)
  {
-   writel(CONTROLLER_DISABLE, (hba-mmio_base + REG_CONTROLLER_ENABLE));
+   ufshcd_writel(hba, REG_CONTROLLER_ENABLE, CONTROLLER_DISABLE);
  }

  /**
@@ -301,7 +294,7 @@ static inline void ufshcd_hba_stop(struct ufs_hba *hba)
   */
  static inline int ufshcd_is_hba_active(struct ufs_hba *hba)
  {
-   return (readl(hba-mmio_base + REG_CONTROLLER_ENABLE)  0x1) ? 0 : 1;
+   return (ufshcd_readl(hba, REG_CONTROLLER_ENABLE)  0x1) ? 0 : 1;
  }

  /**
@@ -313,8 +306,7 @@ static inline
  void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
  {
__set_bit(task_tag, hba-outstanding_reqs);
-   writel((1  task_tag),
-  (hba-mmio_base + REG_UTP_TRANSFER_REQ_DOOR_BELL));
+   ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL, 1  task_tag);
  }

  /**
@@ -338,8 +330,7 @@ static inline void ufshcd_copy_sense_data(struct ufshcd_lrb 
*lrbp)
   */
  static inline void ufshcd_hba_capabilities(struct ufs_hba *hba)
  {
-   hba-capabilities =
-   readl(hba-mmio_base + REG_CONTROLLER_CAPABILITIES);
+   hba-capabilities = ufshcd_readl(hba, REG_CONTROLLER_CAPABILITIES);

/* nutrs and nutmrs are 0 based values */
hba-nutrs = (hba-capabilities  MASK_TRANSFER_REQUESTS_SLOTS) + 1;
@@ -356,16 +347,13 @@ static inline void
  ufshcd_send_uic_command(struct ufs_hba *hba, struct uic_command

Re: [PATCH 4/5] scsi: ufs: rework link start-up process

2013-04-24 Thread Sujit Reddy Thumma


On 4/24/2013 9:36 PM, Seungwon Jeon wrote:

Link start-up requires long time with multiphase handshakes
between UFS host and device. This affects driver's probe time.
This patch let link start-up run asynchronously.
And completion time of uic command is defined to avoid a
permanent wait.


I have similar patch posted few days back scsi: ufs: Generalize UFS 
Interconnect Layer (UIC) command support which does a bit more (mutex, 
error handling) than what is done here. Can that be used/improved?




Signed-off-by: Seungwon Jeon tgih@samsung.com
---
  drivers/scsi/ufs/ufshcd.c |  114 +---
  drivers/scsi/ufs/ufshcd.h |6 ++-
  2 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index efe2256..76ff332 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -38,6 +38,7 @@
  #define UFSHCD_ENABLE_INTRS   (UTP_TRANSFER_REQ_COMPL |\
 UTP_TASK_REQ_COMPL |\
 UFSHCD_ERROR_MASK)
+#define UIC_CMD_TIMEOUT100

  enum {
UFSHCD_MAX_CHANNEL  = 0,
@@ -357,13 +358,15 @@ static inline void ufshcd_hba_capabilities(struct ufs_hba 
*hba)
  }

  /**
- * ufshcd_send_uic_command - Send UIC commands to unipro layers
+ * ufshcd_dispatch_uic_cmd - Dispatch UIC commands to unipro layers
   * @hba: per adapter instance
   * @uic_command: UIC command
   */
  static inline void
-ufshcd_send_uic_command(struct ufs_hba *hba, struct uic_command *uic_cmnd)
+ufshcd_dispatch_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmnd)
  {
+   init_completion(uic_cmnd-done);
+
/* Write Args */
ufshcd_writel(hba, REG_UIC_COMMAND_ARG_1, uic_cmnd-argument1);
ufshcd_writel(hba, REG_UIC_COMMAND_ARG_2, uic_cmnd-argument2);
@@ -375,6 +378,45 @@ ufshcd_send_uic_command(struct ufs_hba *hba, struct 
uic_command *uic_cmnd)
  }

  /**
+ * ufshcd_wait_for_uic_cmd - Wait complectioin of UIC command
+ * @hba: per adapter instance
+ * @uic_command: UIC command
+ *
+ * Returns 0 only if success.
+ */
+static int ufshcd_wait_for_uic_cmd(struct ufs_hba *hba)
+{
+   struct uic_command *uic_cmd = hba-active_uic_cmd;
+   int ret;
+
+   if (wait_for_completion_timeout(uic_cmd-done,
+   msecs_to_jiffies(UIC_CMD_TIMEOUT)))
+   ret = ufshcd_get_uic_cmd_result(hba);
+   else
+   ret = -ETIMEDOUT;
+
+   return ret;
+}
+
+/**
+ * ufshcd_ready_uic_cmd - Check if controller is ready
+ *to accept UIC commands
+ * @hba: per adapter instance
+ * Return true on success, else false
+ */
+static inline bool ufshcd_ready_uic_cmd(struct ufs_hba *hba)
+{
+   if (ufshcd_readl(hba, REG_CONTROLLER_STATUS)  UIC_COMMAND_READY) {
+   return true;
+   } else {
+   dev_err(hba-dev,
+   Controller not ready
+to accept UIC commands\n);
+   return false;
+   }
+}
+
+/**
   * ufshcd_map_sg - Map scatter-gather list to prdt
   * @lrbp - pointer to local reference block
   *
@@ -735,15 +777,10 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
  {
struct uic_command *uic_cmd;
unsigned long flags;
+   int ret;

-   /* check if controller is ready to accept UIC commands */
-   if (((ufshcd_readl(hba, REG_CONTROLLER_STATUS)) 
-   UIC_COMMAND_READY) == 0x0) {
-   dev_err(hba-dev,
-   Controller not ready
-to accept UIC commands\n);
+   if (!ufshcd_ready_uic_cmd(hba))
return -EIO;
-   }

spin_lock_irqsave(hba-host-host_lock, flags);

@@ -754,13 +791,16 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba)
uic_cmd-argument2 = 0;
uic_cmd-argument3 = 0;

-   /* enable UIC related interrupts */
-   ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
+   /* Dispatching UIC commands to controller */
+   ufshcd_dispatch_uic_cmd(hba, uic_cmd);

-   /* sending UIC commands to controller */
-   ufshcd_send_uic_command(hba, uic_cmd);
spin_unlock_irqrestore(hba-host-host_lock, flags);
-   return 0;
+
+   ret = ufshcd_wait_for_uic_cmd(hba);
+   if (ret)
+   dev_err(hba-dev, link startup: error code %d returned\n, 
ret);
+
+   return ret;
  }

  /**
@@ -898,6 +938,9 @@ static int ufshcd_initialize_hba(struct ufs_hba *hba)
if (ufshcd_hba_enable(hba))
return -EIO;

+   /* enable UIC related interrupts */
+   ufshcd_enable_intr(hba, UIC_COMMAND_COMPL | UIC_ERROR);
+
/* Configure UTRL and UTMRL base address registers */
ufshcd_writel(hba, REG_UTP_TRANSFER_REQ_LIST_BASE_L,
  lower_32_bits(hba-utrdl_dma_addr));
@@ -909,7 +952,9 @@ static int ufshcd_initialize_hba(struct ufs_hba *hba)

Re: [PATCH][v3] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-04-24 Thread Hannes Reinecke

On 04/24/2013 04:48 PM, James Bottomley wrote:
 On Wed, 2013-04-24 at 13:32 +0200, Hannes Reinecke wrote:
 scsi_send_eh_cmnd() is calling queuecommand() directly, so
 it needs to check the return value here.
 The only valid return codes for queuecommand() are 'busy'
 states, so we need to wait for a bit to allow the LLDD
 to recover.

 Based on an earlier patch from Wen Xiong.
 
 This looks good, only one minor nitpick:
 
 Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
 Cc: Brian King brk...@linux.vnet.ibm.com
 Signed-off-by: Hannes Reinecke h...@suse.de

 diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
 index c1b05a8..1fc6da94 100644
 --- a/drivers/scsi/scsi_error.c
 +++ b/drivers/scsi/scsi_error.c
 @@ -791,22 +791,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
 unsigned char *cmnd,
  struct scsi_device *sdev = scmd-device;
  struct Scsi_Host *shost = sdev-host;
  DECLARE_COMPLETION_ONSTACK(done);
 -unsigned long timeleft;
 +unsigned long timeleft = timeout;
  struct scsi_eh_save ses;
 +const int stall_for = min(HZ/10, 1);
  int rtn;
  
 +retry:
  scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
  shost-eh_action = done;
  
  scsi_log_send(scmd);
  scmd-scsi_done = scsi_eh_done;
 -shost-hostt-queuecommand(shost, scmd);
 -
 -timeleft = wait_for_completion_timeout(done, timeout);
 +rtn = shost-hostt-queuecommand(shost, scmd);
 +if (rtn) {
 +if (timeleft) {
 +scsi_eh_restore_cmnd(scmd, ses);
 +timeleft -= stall_for;
 +msleep(stall_for);
 
 Stall for is in HZ: need to convert to ms, so
 
   msleep(stall_for * 1000/HZ);
 
Right. Will be converting it.

 I also don't see a need to restore and reprep the command each time, but
 I don't see a problem with it either.
 
This was mainly thought as a safeguard here; just in case someone
else might be fiddling with the command. But yeah, it's not required.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] qla2xxx: fix sparse warning large integer implicitly truncated to unsigned type

2013-04-24 Thread Saurav Kashyap

From: Armen Baloyan armen.balo...@qlogic.com

Signed-off-by: Armen Baloyan armen.balo...@qlogic.com
Signed-off-by: Saurav Kashyap saurav.kash...@qlogic.com
---
 drivers/scsi/qla2xxx/qla_mr.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index 729b743..937fed8 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -3003,12 +3003,10 @@ qlafx00_build_scsi_iocbs(srb_t *sp, struct 
cmd_type_7_fx00 *cmd_pkt,
 
/* Set transfer direction */
if (cmd-sc_data_direction == DMA_TO_DEVICE) {
-   lcmd_pkt-cntrl_flags =
-   __constant_cpu_to_le16(TMF_WRITE_DATA);
+   lcmd_pkt-cntrl_flags = TMF_WRITE_DATA;
vha-qla_stats.output_bytes += scsi_bufflen(cmd);
} else if (cmd-sc_data_direction == DMA_FROM_DEVICE) {
-   lcmd_pkt-cntrl_flags =
-   __constant_cpu_to_le16(TMF_READ_DATA);
+   lcmd_pkt-cntrl_flags = TMF_READ_DATA;
vha-qla_stats.input_bytes += scsi_bufflen(cmd);
}
 
-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] qla2xxx: Patches for scsi misc branch.

2013-04-24 Thread Saurav Kashyap

Hi James,

This patchset fixes the sparse warnings. Please apply the following patches
to the scsi tree at your earliest convenience for inclusion in the next
mainline merge window.

Thanks,
~Saurav
 
Armen Baloyan (1):
  qla2xxx: fix sparse warning large integer implicitly truncated to
unsigned type

Fengguang Wu (1):
  qla2xxx: qla2x00_sp_compl can be static.

 drivers/scsi/qla2xxx/qla_mr.c |6 ++
 drivers/scsi/qla2xxx/qla_os.c |2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] qla2xxx: qla2x00_sp_compl can be static.

2013-04-24 Thread Saurav Kashyap

From: Fengguang Wu fengguang...@intel.com

Signed-off-by: Fengguang Wu fengguang...@intel.com
Signed-off-by: Saurav Kashyap saurav.kash...@qlogic.com
---
 drivers/scsi/qla2xxx/qla_os.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index a083715..7b621c2 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -644,7 +644,7 @@ qla2x00_sp_free_dma(void *vha, void *ptr)
qla2x00_rel_sp(sp-fcport-vha, sp);
 }
 
-void
+static void
 qla2x00_sp_compl(void *data, void *ptr, int res)
 {
struct qla_hw_data *ha = (struct qla_hw_data *)data;
-- 
1.7.7


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

51 matches

Mail list logo