Re: [PATCH] block devices: validate block device capacity

2014-01-31 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

> > So, if you want 64-bit page offsets, you need to increase pgoff_t size,
> > and that will increase the limit for both files and block devices.
>
> No.  The point is the page cache mapping of the device uses a
> manufactured inode saved in the backing device. It looks fixable in the
> buffer code before the page cache gets involved.

So if you think you can support 16TiB devices and leave pgoff_t 32-bit, 
send a patch that does it.

Until you do, you should apply the patch that I sent, which prevents
kernel lockups or data corruption when the user uses a 16TiB device on
a 32-bit kernel.

Mikulas
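
Editorial note: the 16TiB figure follows from the width of the page index.
The sketch below works through the arithmetic in userspace C. It assumes
4 KiB pages and a 32-bit page index; the names `SKETCH_PAGE_SHIFT`,
`max_mapping_bytes()` and `offset_representable()` are made up for
illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* Number of bytes covered by a page index that is index_bits wide. */
static uint64_t max_mapping_bytes(unsigned index_bits)
{
    return ((uint64_t)1 << index_bits) << SKETCH_PAGE_SHIFT;
}

/* Nonzero if the page holding byte_offset has an index that fits in 32 bits. */
static int offset_representable(uint64_t byte_offset)
{
    return (byte_offset >> SKETCH_PAGE_SHIFT) <= UINT32_MAX;
}
```

Under these assumptions a 32-bit index covers 2^32 pages of 2^12 bytes,
that is 2^44 bytes or 16TiB; the first byte at or beyond that offset has
no representable page index.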
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] st: Do not rewind for SG_IO

2014-01-31 Thread Hannes Reinecke
Plain 'st' devices are defined to do a rewind on close.
This causes problems when trying to read the
VPD pages to figure out the WWN of the device.
For udev in particular, this means we either have to use
another (non-rewinding) device node for this
(thereby introducing race conditions) or not
generate a persistent device node at all
(breaking multi-tape setups).

This patch makes the tape always non-rewinding
when SG_IO is used, thus allowing udev to get
a proper device ID for tapes.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/st.c | 3 +++
 1 file changed, 3 insertions(+)
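
Editorial note (not part of the patch): a minimal userspace model of the
behaviour described above. Everything here is a hypothetical stand-in:
`sketch_tape`, `sketch_cmd` and `sketch_ioctl()` only mimic the shape of
the driver's switch statement, and the non-SG_IO path is simplified.

```c
/* Hypothetical command codes; the real driver uses ioctl numbers. */
enum sketch_cmd { SKETCH_GET_IDLUN, SKETCH_SG_IO, SKETCH_SEND_COMMAND };

struct sketch_tape {
    int rew_at_close;   /* nonzero: rewind when the device is closed */
    int passthrough;    /* commands handed to the generic path */
};

static void sketch_ioctl(struct sketch_tape *t, enum sketch_cmd cmd)
{
    switch (cmd) {
    case SKETCH_GET_IDLUN:
        break;
    case SKETCH_SG_IO:
        t->rew_at_close = 0;    /* an SG_IO user never triggers a rewind */
        /* fall through */
    default:
        t->passthrough++;
        break;
    }
}

/* Returns 1 if closing the device would rewind the tape. */
static int sketch_close_rewinds(const struct sketch_tape *t)
{
    return t->rew_at_close != 0;
}
```

The point of the fall-through is that SG_IO still reaches the generic
passthrough path; only the rewind-on-close flag is cleared on the way.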

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index dc4826c..da18ce5 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -3645,6 +3645,9 @@ static long st_ioctl(struct file *file, unsigned int 
cmd_in, unsigned long arg)
case SCSI_IOCTL_GET_IDLUN:
case SCSI_IOCTL_GET_BUS_NUMBER:
break;
+   case SG_IO:
+   STp->rew_at_close = 0;
+   /* Fallthrough */
default:
if ((cmd_in == SG_IO ||
 cmd_in == SCSI_IOCTL_SEND_COMMAND ||
-- 
1.7.12.4



[PATCHv2 00/16] scsi_dh_alua updates

2014-01-31 Thread Hannes Reinecke
Hi James,

here's an update for the ALUA device handler I've been hoarding
for quite some time. The major bit here is the asynchronous
RTPG handling. With the original design we would treat every
LUN independently, despite the fact that several LUNs might
in fact belong to the same target port group. So any
change on one LUN will affect the others, too.
And we now can treat LUNs in 'transitioning' ALUA mode
correctly, as now we'll be blocking any I/O in the prep_fn()
until the controller is in a working state again.

This is the second version of the patchset, containing
suggested changes from Mike Christie to move
GFP_ATOMIC to GFP_KERNEL allocations.

Hannes Reinecke (16):
  scsi_dh_alua: Improve error handling
  scsi_dh_alua: use flag for RTPG extended header
  scsi_dh_alua: Pass buffer as function argument
  scsi_dh_alua: Make stpg synchronous
  scsi_dh_alua: put sense buffer on stack
  scsi_dh_alua: use local buffer for VPD inquiry
  scsi_dh_alua: Use separate alua_port_group structure
  scsi_dh_alua: parse target device id
  scsi_dh_alua: simplify sense code handling
  scsi_dh_alua: Do not attach to management devices
  scsi_dh_alua: multipath failover fails with error 15
  scsi_dh: return individual errors in scsi_dh_activate()
  scsi_dh_alua: Clarify logging message
  scsi_dh: invoke callback if ->activate is not present
  scsi_dh_alua: revert commit a8e5a2d593cbfccf530c3382c2c328d2edaa7b66
  scsi_dh_alua: Use workqueue for RTPG

 drivers/scsi/device_handler/scsi_dh.c  |   18 +-
 drivers/scsi/device_handler/scsi_dh_alua.c | 1045 
 2 files changed, 750 insertions(+), 313 deletions(-)

-- 
1.7.12.4



[PATCH 02/16] scsi_dh_alua: use flag for RTPG extended header

2014-01-31 Thread Hannes Reinecke
We should use a flag to record when the RTPG extended header is not
supported; this saves us from sending RTPG twice for older arrays.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)
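
Editorial note (not part of the patch): the effect of keeping the
"extended header unsupported" state in a persistent flag can be modelled
in userspace. This is a sketch under stated assumptions: `sketch_handler`,
`sketch_submit_rtpg()` and `sketch_rtpg()` are invented names, and the
simulated failure stands in for the ILLEGAL REQUEST / ASC 0x24 response.

```c
#define SKETCH_RTPG_EXT_HDR_UNSUPP 0x2  /* mirrors ALUA_RTPG_EXT_HDR_UNSUPP */

struct sketch_handler {
    unsigned flags;
    int array_supports_ext_hdr;  /* simulated array capability */
    int commands_sent;           /* RTPG commands issued so far */
};

/* Simulated submit: fails if the extended header is requested but unsupported. */
static int sketch_submit_rtpg(struct sketch_handler *h)
{
    int want_ext = !(h->flags & SKETCH_RTPG_EXT_HDR_UNSUPP);

    h->commands_sent++;
    if (want_ext && !h->array_supports_ext_hdr)
        return -1;  /* stands in for ILLEGAL REQUEST, ASC 0x24 */
    return 0;
}

static int sketch_rtpg(struct sketch_handler *h)
{
    int ret;

retry:
    ret = sketch_submit_rtpg(h);
    if (ret && !(h->flags & SKETCH_RTPG_EXT_HDR_UNSUPP)) {
        h->flags |= SKETCH_RTPG_EXT_HDR_UNSUPP;  /* remembered for later calls */
        goto retry;
    }
    return ret;
}
```

With a local variable the failed first attempt would be repeated on every
call; with the flag stored in the handler data, only the first call pays
for the extra command.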

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index e4e5497..ece2255 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -61,6 +61,7 @@
 
 /* flags passed from user level */
 #define ALUA_OPTIMIZE_STPG 1
+#define ALUA_RTPG_EXT_HDR_UNSUPP   2
 
 struct alua_dh_data {
int group_id;
@@ -181,8 +182,7 @@ done:
  * submit_rtpg - Issue a REPORT TARGET GROUP STATES command
  * @sdev: sdev the command should be sent to
  */
-static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h,
-   bool rtpg_ext_hdr_req)
+static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 {
struct request *rq;
int err;
@@ -195,7 +195,7 @@ static unsigned submit_rtpg(struct scsi_device *sdev, 
struct alua_dh_data *h,
 
/* Prepare the command. */
rq->cmd[0] = MAINTENANCE_IN;
-   if (rtpg_ext_hdr_req)
+   if (!(h->flags & ALUA_RTPG_EXT_HDR_UNSUPP))
rq->cmd[1] = MI_REPORT_TARGET_PGS | MI_EXT_HDR_PARAM_FMT;
else
rq->cmd[1] = MI_REPORT_TARGET_PGS;
@@ -565,7 +565,6 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
alua_dh_data *h, int wait_
int len, k, off, valid_states = 0;
unsigned char *ucp;
unsigned err, retval;
-   bool rtpg_ext_hdr_req = 1;
unsigned long expiry, interval = 0;
unsigned int tpg_desc_tbl_off;
unsigned char orig_transition_tmo;
@@ -576,7 +575,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
alua_dh_data *h, int wait_
expiry = round_jiffies_up(jiffies + h->transition_tmo * HZ);
 
  retry:
-   retval = submit_rtpg(sdev, h, rtpg_ext_hdr_req);
+   retval = submit_rtpg(sdev, h);
 
if (retval) {
if (h->senselen == 0 ||
@@ -600,10 +599,10 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
alua_dh_data *h, int wait_
 * The retry without rtpg_ext_hdr_req set
 * handles this.
 */
-   if (rtpg_ext_hdr_req == 1 &&
+   if (!(h->flags & ALUA_RTPG_EXT_HDR_UNSUPP) &&
sense_hdr.sense_key == ILLEGAL_REQUEST &&
sense_hdr.asc == 0x24 && sense_hdr.ascq == 0) {
-   rtpg_ext_hdr_req = 0;
+   h->flags |= ALUA_RTPG_EXT_HDR_UNSUPP;
goto retry;
}
 
-- 
1.7.12.4



[PATCH 06/16] scsi_dh_alua: use local buffer for VPD inquiry

2014-01-31 Thread Hannes Reinecke
The VPD inquiry needs to be done only once, so we can use
a local buffer here.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 45 ++
 1 file changed, 27 insertions(+), 18 deletions(-)
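
Editorial note (not part of the patch): the "does the page exceed the
initial buffer" check comes down to reading the big-endian page length
from bytes 2 and 3 of the VPD header and adding the 4 header bytes. A
small userspace sketch, with invented helper names:

```c
#include <stddef.h>

/* Total VPD page size: 4-byte header plus the big-endian length in bytes 2..3. */
static size_t sketch_vpd_total_len(const unsigned char *buff)
{
    return ((size_t)buff[2] << 8) + buff[3] + 4;
}

/* Nonzero if the inquiry has to be reissued with a larger buffer. */
static int sketch_vpd_needs_retry(const unsigned char *buff, size_t bufflen)
{
    return sketch_vpd_total_len(buff) > bufflen;
}
```

The sketch deliberately keeps the length in a `size_t`: a page length
above 255 does not fit in an `unsigned char`, so the type chosen for the
buffer length matters once the retry path assigns `len` to it.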

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index adc77ef..88f98e0 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -322,16 +322,27 @@ static int alua_check_tpgs(struct scsi_device *sdev, 
struct alua_dh_data *h)
  */
 static int alua_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
 {
+   unsigned char *buff;
+   unsigned char bufflen = 36;
int len, timeout = ALUA_FAILOVER_TIMEOUT;
unsigned char sense[SCSI_SENSE_BUFFERSIZE];
struct scsi_sense_hdr sense_hdr;
unsigned retval;
unsigned char *d;
unsigned long expiry;
+   int err;
 
expiry = round_jiffies_up(jiffies + timeout);
  retry:
-   retval = submit_vpd_inquiry(sdev, h->buff, h->bufflen, sense);
+   buff = kmalloc(bufflen, GFP_KERNEL);
+   if (!buff) {
+   sdev_printk(KERN_WARNING, sdev,
+   "%s: kmalloc buffer failed\n",
+   ALUA_DH_NAME);
+   /* Temporary failure, bypass */
+   return SCSI_DH_DEV_TEMP_BUSY;
+   }
+   retval = submit_vpd_inquiry(sdev, buff, bufflen, sense);
if (retval) {
unsigned err;
 
@@ -345,6 +356,7 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
err = SCSI_DH_DEV_TEMP_BUSY;
else
err = SCSI_DH_IO;
+   kfree(buff);
return err;
}
err = alua_check_sense(sdev, &sense_hdr);
@@ -362,24 +374,19 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
}
 
/* Check if vpd page exceeds initial buffer */
-   len = (h->buff[2] << 8) + h->buff[3] + 4;
-   if (len > h->bufflen) {
+   len = (buff[2] << 8) + buff[3] + 4;
+   if (len > bufflen) {
/* Resubmit with the correct length */
-   if (realloc_buffer(h, len)) {
-   sdev_printk(KERN_WARNING, sdev,
-   "%s: kmalloc buffer failed\n",
-   ALUA_DH_NAME);
-   /* Temporary failure, bypass */
-   return SCSI_DH_DEV_TEMP_BUSY;
-   }
+   kfree(buff);
+   bufflen = len;
goto retry;
}
 
/*
 * Now look for the correct descriptor.
 */
-   d = h->buff + 4;
-   while (d < h->buff + len) {
+   d = buff + 4;
+   while (d < buff + len) {
switch (d[1] & 0xf) {
case 0x4:
/* Relative target port */
@@ -406,13 +413,15 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
ALUA_DH_NAME);
h->state = TPGS_STATE_OPTIMIZED;
h->tpgs = TPGS_MODE_NONE;
-   return SCSI_DH_DEV_UNSUPP;
+   err = SCSI_DH_DEV_UNSUPP;
+   } else {
+   sdev_printk(KERN_INFO, sdev,
+   "%s: port group %02x rel port %02x\n",
+   ALUA_DH_NAME, h->group_id, h->rel_port);
+   err = SCSI_DH_OK;
}
-   sdev_printk(KERN_INFO, sdev,
-   "%s: port group %02x rel port %02x\n",
-   ALUA_DH_NAME, h->group_id, h->rel_port);
-
-   return SCSI_DH_OK;
+   kfree(buff);
+   return err;
 }
 
 static char print_alua_state(int state)
-- 
1.7.12.4



[PATCH 07/16] scsi_dh_alua: Use separate alua_port_group structure

2014-01-31 Thread Hannes Reinecke
The port group needs to be a separate structure as several
LUNs might belong to the same group.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 220 ++---
 1 file changed, 139 insertions(+), 81 deletions(-)
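
Editorial note (not part of the patch): the idea of one reference-counted
port group object shared by several LUNs can be sketched in userspace.
This is a single-threaded model with a plain integer refcount and no
locking, so it only illustrates the lifetime rules; the kernel code uses
a kref and a spinlock-protected list. All names are hypothetical.

```c
#include <stdlib.h>

struct sketch_port_group {
    int refcount;
    int group_id;
    int state;
    struct sketch_port_group *next;
};

static struct sketch_port_group *sketch_pg_list;

/* Look up the group by id and take a reference, creating it if needed. */
static struct sketch_port_group *sketch_pg_get(int group_id)
{
    struct sketch_port_group *pg;

    for (pg = sketch_pg_list; pg; pg = pg->next) {
        if (pg->group_id == group_id) {
            pg->refcount++;
            return pg;
        }
    }
    pg = calloc(1, sizeof(*pg));
    if (!pg)
        return NULL;
    pg->refcount = 1;
    pg->group_id = group_id;
    pg->next = sketch_pg_list;
    sketch_pg_list = pg;
    return pg;
}

/* Drop a reference; unlink and free the group when the last user is gone. */
static void sketch_pg_put(struct sketch_port_group *pg)
{
    struct sketch_port_group **pp;

    if (--pg->refcount > 0)
        return;
    for (pp = &sketch_pg_list; *pp; pp = &(*pp)->next) {
        if (*pp == pg) {
            *pp = pg->next;
            break;
        }
    }
    free(pg);
}
```

Two LUNs that report the same group id end up pointing at the same
object, so a state change recorded through one of them is visible to the
other.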

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 88f98e0..0af6866 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -63,9 +63,13 @@
 #define ALUA_OPTIMIZE_STPG 1
 #define ALUA_RTPG_EXT_HDR_UNSUPP   2
 
-struct alua_dh_data {
+static LIST_HEAD(port_group_list);
+static DEFINE_SPINLOCK(port_group_lock);
+
+struct alua_port_group {
+   struct kref kref;
+   struct list_head node;
int group_id;
-   int rel_port;
int tpgs;
int state;
int pref;
@@ -74,6 +78,13 @@ struct alua_dh_data {
unsigned char   *buff;
int bufflen;
unsigned char   transition_tmo;
+};
+
+struct alua_dh_data {
+   struct alua_port_group  *pg;
+   int rel_port;
+   int tpgs;
+   unsigned flags; /* used for optimizing STPG */
struct scsi_device  *sdev;
activate_complete   callback_fn;
void *callback_data;
@@ -92,18 +103,18 @@ static inline struct alua_dh_data *get_alua_data(struct 
scsi_device *sdev)
return ((struct alua_dh_data *) scsi_dh_data->buf);
 }
 
-static int realloc_buffer(struct alua_dh_data *h, unsigned len)
+static int realloc_buffer(struct alua_port_group *pg, unsigned len)
 {
-   if (h->buff && h->buff != h->inq)
-   kfree(h->buff);
+   if (pg->buff && pg->buff != pg->inq)
+   kfree(pg->buff);
 
-   h->buff = kmalloc(len, GFP_NOIO);
-   if (!h->buff) {
-   h->buff = h->inq;
-   h->bufflen = ALUA_INQUIRY_SIZE;
+   pg->buff = kmalloc(len, GFP_NOIO);
+   if (!pg->buff) {
+   pg->buff = pg->inq;
+   pg->bufflen = ALUA_INQUIRY_SIZE;
return 1;
}
-   h->bufflen = len;
+   pg->bufflen = len;
return 0;
 }
 
@@ -137,6 +148,20 @@ static struct request *get_alua_req(struct scsi_device 
*sdev,
return rq;
 }
 
+static void release_port_group(struct kref *kref)
+{
+   struct alua_port_group *pg;
+
+   pg = container_of(kref, struct alua_port_group, kref);
+   printk(KERN_WARNING "alua: release port group %d\n", pg->group_id);
+   spin_lock(&port_group_lock);
+   list_del(&pg->node);
+   spin_unlock(&port_group_lock);
+   if (pg->buff && pg->inq != pg->buff)
+   kfree(pg->buff);
+   kfree(pg);
+}
+
 /*
  * submit_vpd_inquiry - Issue an INQUIRY VPD page 0x83 command
  * @sdev: sdev the command should be sent to
@@ -327,10 +352,11 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
int len, timeout = ALUA_FAILOVER_TIMEOUT;
unsigned char sense[SCSI_SENSE_BUFFERSIZE];
struct scsi_sense_hdr sense_hdr;
-   unsigned retval;
+   unsigned retval, err;
+   int group_id = -1;
unsigned char *d;
unsigned long expiry;
-   int err;
+   struct alua_port_group *pg = NULL;
 
expiry = round_jiffies_up(jiffies + timeout);
  retry:
@@ -344,8 +370,6 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
}
retval = submit_vpd_inquiry(sdev, buff, bufflen, sense);
if (retval) {
-   unsigned err;
-
if (!(driver_byte(retval) & DRIVER_SENSE) ||
!scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE,
  &sense_hdr)) {
@@ -356,8 +380,7 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
err = SCSI_DH_DEV_TEMP_BUSY;
else
err = SCSI_DH_IO;
-   kfree(buff);
-   return err;
+   goto out;
}
err = alua_check_sense(sdev, &sense_hdr);
if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry))
@@ -369,7 +392,8 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
sdev_printk(KERN_INFO, sdev,
"%s: evpd inquiry failed, ", ALUA_DH_NAME);
scsi_show_extd_sense(sense_hdr.asc, sense_hdr.ascq);
-   return SCSI_DH_IO;
+   err = SCSI_DH_IO;
+   goto out;
}
}
 
@@ -394,7 +418,7 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
 

[PATCH 10/16] scsi_dh_alua: Do not attach to management devices

2014-01-31 Thread Hannes Reinecke
Management devices should be ignored when
detecting ALUA capabilities.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 17 +
 1 file changed, 17 insertions(+)
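
Editorial note (not part of the patch): the filter added here can be
expressed as a small predicate. The constants below are written from
memory of the kernel's SCSI headers and the SPC device type codes, so
treat them as assumptions; `sketch_is_wlun()` and
`sketch_alua_candidate()` are invented names.

```c
/* Assumed values; check them against the headers of your kernel. */
#define SKETCH_W_LUN_BASE       0xc100  /* well-known LUN range */
#define SKETCH_TYPE_RAID        0x0c    /* storage array controller */
#define SKETCH_TYPE_ENCLOSURE   0x0d    /* enclosure services device */
#define SKETCH_TYPE_UNKNOWN     0x1f    /* unknown or no device type */

/* Nonzero if the LUN falls into the well-known LUN range. */
static int sketch_is_wlun(unsigned int lun)
{
    return (lun & 0xff00) == SKETCH_W_LUN_BASE;
}

/* Nonzero if the device should be probed for ALUA support at all. */
static int sketch_alua_candidate(unsigned int lun, int type)
{
    if (sketch_is_wlun(lun))
        return 0;
    if (type == SKETCH_TYPE_RAID ||
        type == SKETCH_TYPE_ENCLOSURE ||
        type == SKETCH_TYPE_UNKNOWN)
        return 0;
    return 1;
}
```

An ordinary disk LUN passes the check, while well-known LUNs and
controller or enclosure devices are rejected before any TPGS probing
takes place.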

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 174ff45..a1c69bb 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -315,6 +315,23 @@ static int alua_check_tpgs(struct scsi_device *sdev, 
struct alua_dh_data *h)
 {
int err = SCSI_DH_OK;
 
+   if (scsi_is_wlun(sdev->lun)) {
+   h->tpgs = TPGS_MODE_NONE;
+   sdev_printk(KERN_INFO, sdev,
+   "%s: disable for WLUN\n",
+   ALUA_DH_NAME);
+   return SCSI_DH_DEV_UNSUPP;
+   }
+   if (sdev->type == TYPE_RAID ||
+   sdev->type == TYPE_ENCLOSURE ||
+   sdev->type == 0x1F) {
+   h->tpgs = TPGS_MODE_NONE;
+   sdev_printk(KERN_INFO, sdev,
+   "%s: disable for enclosure devices\n",
+   ALUA_DH_NAME);
+   return SCSI_DH_DEV_UNSUPP;
+   }
+
h->tpgs = scsi_device_tpgs(sdev);
switch (h->tpgs) {
case TPGS_MODE_EXPLICIT|TPGS_MODE_IMPLICIT:
-- 
1.7.12.4



[PATCH 14/16] scsi_dh: invoke callback if ->activate is not present

2014-01-31 Thread Hannes Reinecke
When ->activate isn't present we still need to invoke the
callbacks, otherwise the system might stall.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
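
Editorial note (not part of the patch): why a missing ->activate hook has
to complete the request can be seen in a small userspace model. The
caller is waiting for its completion callback; if nobody calls it, the
caller waits forever. All types and names below are hypothetical
stand-ins for the scsi_dh structures.

```c
#include <stddef.h>

typedef void (*sketch_complete_fn)(void *data, int err);

struct sketch_dh {
    /* Optional hook; NULL when the handler has no activate method. */
    int (*activate)(void *data, sketch_complete_fn fn);
};

enum { SKETCH_DH_OK = 0 };

/* Example completion callback: counts how often it was invoked. */
static void sketch_count_completion(void *data, int err)
{
    (void)err;
    (*(int *)data)++;
}

static int sketch_activate(const struct sketch_dh *dh, int err,
                           sketch_complete_fn fn, void *data)
{
    /* Complete immediately on error or when there is nothing to activate. */
    if (err != SKETCH_DH_OK || !dh->activate) {
        if (fn)
            fn(data, err);
        return err;
    }
    return dh->activate(data, fn);
}
```

In this model a handler without an activate hook still reports
completion exactly once, with the unchanged status code.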

diff --git a/drivers/scsi/device_handler/scsi_dh.c 
b/drivers/scsi/device_handler/scsi_dh.c
index ae7f399..a90380f 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -411,7 +411,7 @@ int scsi_dh_activate(struct request_queue *q, 
activate_complete fn, void *data)
err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
 
-   if (err != SCSI_DH_OK) {
+   if (err != SCSI_DH_OK || !scsi_dh->activate) {
if (fn)
fn(data, err);
goto out;
-- 
1.7.12.4



[PATCH 08/16] scsi_dh_alua: parse target device id

2014-01-31 Thread Hannes Reinecke
Identification descriptors with association 0x2 in VPD page 0x83
can be used to identify the array / target device.
Special handling is needed for EMC and HP, which put the
array identification into the standard inquiry.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 123 -
 1 file changed, 119 insertions(+), 4 deletions(-)
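
Editorial note (not part of the patch): the descriptor walk relies on the
layout of VPD page 0x83, where each designation descriptor has a 4-byte
header with the association in bits 4..5 of byte 1, the designator type
in the low nibble of byte 1, and the designator length in byte 3. The
following userspace sketch finds the first descriptor with association
0x2 (target device). The struct and function names are invented, and the
bounds checks are an addition of the sketch.

```c
#include <stddef.h>

struct sketch_designator {
    int type;                   /* low nibble of byte 1, e.g. 0x3 for NAA */
    size_t len;                 /* designator length from byte 3 */
    const unsigned char *id;    /* points into the page buffer */
};

/*
 * Find the first designator with association 0x2 (target device).
 * Returns 1 and fills *out if one is found, 0 otherwise.
 */
static int sketch_find_target_designator(const unsigned char *page,
                                         size_t page_len,
                                         struct sketch_designator *out)
{
    size_t off = 4;     /* skip the 4-byte page header */

    while (off + 4 <= page_len) {
        const unsigned char *d = page + off;
        size_t dlen = d[3];

        if (off + 4 + dlen > page_len)
            break;      /* truncated descriptor, stop parsing */
        if ((d[1] & 0x30) == 0x20) {
            out->type = d[1] & 0x0f;
            out->len = dlen;
            out->id = d + 4;
            return 1;
        }
        off += 4 + dlen;
    }
    return 0;
}
```

Descriptors with other associations (logical unit, target port) are
skipped, which is why the relative target port and port group
descriptors do not interfere with the array identification.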

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 0af6866..857a999 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -69,6 +69,9 @@ static DEFINE_SPINLOCK(port_group_lock);
 struct alua_port_group {
struct kref kref;
struct list_headnode;
+   unsigned char   target_id[256];
+   unsigned char   target_id_str[256];
+   int target_id_size;
int group_id;
int tpgs;
int state;
@@ -351,12 +354,14 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
unsigned char bufflen = 36;
int len, timeout = ALUA_FAILOVER_TIMEOUT;
unsigned char sense[SCSI_SENSE_BUFFERSIZE];
+   char target_id_str[256], *target_id = NULL;
+   int target_id_size;
struct scsi_sense_hdr sense_hdr;
unsigned retval, err;
int group_id = -1;
unsigned char *d;
unsigned long expiry;
-   struct alua_port_group *pg = NULL;
+   struct alua_port_group *tmp_pg, *pg = NULL;
 
expiry = round_jiffies_up(jiffies + timeout);
  retry:
@@ -409,9 +414,54 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
/*
 * Now look for the correct descriptor.
 */
+   memset(target_id_str, 0, 256);
+   target_id_size = 0;
d = buff + 4;
while (d < buff + len) {
switch (d[1] & 0xf) {
+   case 0x2:
+   /* EUI-64 */
+   if ((d[1] & 0x30) == 0x20) {
+   target_id_size = d[3];
+   target_id = d + 4;
+   switch (target_id_size) {
+   case 8:
+   sprintf(target_id_str,
+   "eui.%8phN", d + 4);
+   break;
+   case 12:
+   sprintf(target_id_str,
+   "eui.%12phN", d + 4);
+   break;
+   case 16:
+   sprintf(target_id_str,
+   "eui.%16phN", d + 4);
+   break;
+   default:
+   target_id_size = 0;
+   break;
+   }
+   }
+   break;
+   case 0x3:
+   /* NAA */
+   if ((d[1] & 0x30) == 0x20) {
+   target_id_size = d[3];
+   target_id = d + 4;
+   switch (target_id_size) {
+   case 8:
+   sprintf(target_id_str,
+   "naa.%8phN", d + 4);
+   break;
+   case 16:
+   sprintf(target_id_str,
+   "naa.%16phN", d + 4);
+   break;
+   default:
+   target_id_size = 0;
+   break;
+   }
+   }
case 0x4:
/* Relative target port */
h->rel_port = (d[6] << 8) + d[7];
@@ -420,6 +470,18 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
/* Target port group */
group_id = (d[6] << 8) + d[7];
break;
+   case 0x8:
+   /* SCSI name string */
+   if ((d[1] & 0x30) == 0x20) {
+   /* SCSI name */
+   target_id_size = d[3];
+   target_id = d + 4;
+   strncpy(target_id_str, d + 4, 256);
+   if (target_id_size > 255)
+   target_id_size = 255;
+   

[PATCH 11/16] scsi_dh_alua: multipath failover fails with error 15

2014-01-31 Thread Hannes Reinecke
When a path is already optimized, multipath failover will fail
with the message
Could not failover device X:Y: Handler scsi_dh_alua Error 15
Return SCSI_DH_OK in that case.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
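
Editorial note (not part of the patch): the state switch can be reduced
to a small decision function. This is a simplified sketch; it leaves out
the ALUA_OPTIMIZE_STPG and preferred-path checks, and the enum names and
values are invented rather than taken from the driver.

```c
enum sketch_state {
    SKETCH_STATE_OPTIMIZED,
    SKETCH_STATE_NONOPTIMIZED,
    SKETCH_STATE_STANDBY,
    SKETCH_STATE_TRANSITIONING,
    SKETCH_STATE_OTHER
};

enum sketch_result {
    SKETCH_RES_OK,          /* nothing to do, report success */
    SKETCH_RES_RETRY,       /* try again later */
    SKETCH_RES_NOSYS,       /* state not handled */
    SKETCH_RES_SEND_STPG    /* go on and issue the STPG command */
};

static enum sketch_result sketch_stpg_decision(enum sketch_state state)
{
    switch (state) {
    case SKETCH_STATE_OPTIMIZED:
        return SKETCH_RES_OK;       /* already where we want to be */
    case SKETCH_STATE_NONOPTIMIZED:
    case SKETCH_STATE_STANDBY:
        return SKETCH_RES_SEND_STPG;
    case SKETCH_STATE_TRANSITIONING:
        return SKETCH_RES_RETRY;
    default:
        return SKETCH_RES_NOSYS;
    }
}
```

Without the explicit case for the optimized state, that state would fall
into the default branch and be reported as unhandled, which is the
failure the commit message describes.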

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index a1c69bb..8ea35a9 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -851,6 +851,8 @@ static unsigned alua_stpg(struct scsi_device *sdev, struct 
alua_port_group *pg)
return SCSI_DH_RETRY;
}
switch (pg->state) {
+   case TPGS_STATE_OPTIMIZED:
+   return SCSI_DH_OK;
case TPGS_STATE_NONOPTIMIZED:
if ((pg->flags & ALUA_OPTIMIZE_STPG) &&
(!pg->pref) &&
@@ -865,10 +867,11 @@ static unsigned alua_stpg(struct scsi_device *sdev, 
struct alua_port_group *pg)
break;
case TPGS_STATE_TRANSITIONING:
return SCSI_DH_RETRY;
-   break;
default:
+   sdev_printk(KERN_INFO, sdev,
+   "%s: stpg failed, unhandled TPGS state %d",
+   ALUA_DH_NAME, pg->state);
return SCSI_DH_NOSYS;
-   break;
}
/* Set state to transitioning */
pg->state = TPGS_STATE_TRANSITIONING;
-- 
1.7.12.4



[PATCH 16/16] scsi_dh_alua: Use workqueue for RTPG

2014-01-31 Thread Hannes Reinecke
The current ALUA device_handler has two drawbacks:
- We're sending a 'SET TARGET PORT GROUPS' command to every LUN,
  disregarding the fact that several LUNs might be in a port group
  and will be automatically switched whenever _any_ LUN within
  that port group receives the command.
- Whenever a LUN is in 'transitioning' mode we cannot block I/O
  to that LUN; instead the controller has to abort the command.
  This leads to increased traffic across the wire and heavy load
  on the controller during switchover.

With this patch the RTPG handling is moved to a workqueue, which
is run once per port group. This reduces the number of
'REPORT TARGET PORT GROUPS' and 'SET TARGET PORT GROUPS' commands
sent to the controller. It also allows us to block
I/O to any LUN / port group found to be in 'transitioning' ALUA
mode, as the workqueue item will be requeued until the controller
moves out of transitioning.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 389 ++---
 1 file changed, 304 insertions(+), 85 deletions(-)
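
Editorial note (not part of the patch): the "one work item per port
group" idea can be modelled without any kernel machinery. In the sketch
below, requests from several LUNs are queued on the shared group, only
the first one schedules work, and a single simulated RTPG run completes
all waiters. It is single-threaded and uses a fixed-size array instead
of a list and locks; every name is hypothetical.

```c
#define SKETCH_MAX_WAITERS 8

typedef void (*sketch_done_fn)(void *data, int err);

struct sketch_pg {
    int rtpg_pending;   /* a work item is already scheduled */
    int rtpg_runs;      /* number of simulated RTPG commands */
    int nwaiters;
    struct {
        sketch_done_fn fn;
        void *data;
    } waiters[SKETCH_MAX_WAITERS];
};

/* Example completion callback: counts completed requests. */
static void sketch_count_done(void *data, int err)
{
    (void)err;
    (*(int *)data)++;
}

/*
 * Queue a request on the port group. Returns 1 if this call scheduled
 * new work, 0 if it piggybacks on pending work, -1 if the queue is full.
 */
static int sketch_queue_rtpg(struct sketch_pg *pg, sketch_done_fn fn, void *data)
{
    if (pg->nwaiters == SKETCH_MAX_WAITERS)
        return -1;
    pg->waiters[pg->nwaiters].fn = fn;
    pg->waiters[pg->nwaiters].data = data;
    pg->nwaiters++;
    if (pg->rtpg_pending)
        return 0;
    pg->rtpg_pending = 1;
    return 1;
}

/* The work function: one RTPG for the whole group, then complete everyone. */
static void sketch_rtpg_work(struct sketch_pg *pg)
{
    int i;

    pg->rtpg_runs++;
    for (i = 0; i < pg->nwaiters; i++)
        pg->waiters[i].fn(pg->waiters[i].data, 0);
    pg->nwaiters = 0;
    pg->rtpg_pending = 0;
}
```

Three LUNs in the same group thus cost one command on the wire instead
of three, which is the traffic reduction the commit message refers to.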

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 6591ac1..99649b9 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -22,6 +22,8 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/rcupdate.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_dbg.h>
 #include <scsi/scsi_eh.h>
@@ -58,13 +60,20 @@
 #define ALUA_INQUIRY_SIZE  36
 #define ALUA_FAILOVER_TIMEOUT  60
 #define ALUA_FAILOVER_RETRIES  5
+#define ALUA_RTPG_DELAY_MSECS  5
 
 /* flags passed from user level */
-#define ALUA_OPTIMIZE_STPG 1
-#define ALUA_RTPG_EXT_HDR_UNSUPP   2
+#define ALUA_OPTIMIZE_STPG 0x01
+#define ALUA_RTPG_EXT_HDR_UNSUPP   0x02
+/* State machine flags */
+#define ALUA_PG_RUN_RTPG   0x10
+#define ALUA_PG_RUN_STPG   0x20
+#define ALUA_PG_STPG_DONE  0x40
+
 
 static LIST_HEAD(port_group_list);
 static DEFINE_SPINLOCK(port_group_lock);
+static struct workqueue_struct *kmpath_aluad;
 
 struct alua_port_group {
struct kref kref;
@@ -81,14 +90,26 @@ struct alua_port_group {
unsigned char   *buff;
int bufflen;
unsigned char   transition_tmo;
+   unsigned long   expiry;
+   unsigned long   interval;
+   struct delayed_work rtpg_work;
+   spinlock_t  rtpg_lock;
+   struct list_head rtpg_list;
+   struct scsi_device  *rtpg_sdev;
 };
 
 struct alua_dh_data {
struct alua_port_group  *pg;
+   spinlock_t  pg_lock;
int rel_port;
int tpgs;
+   int error;
unsigned flags; /* used for optimizing STPG */
-   struct scsi_device  *sdev;
+   struct completion   init_complete;
+};
+
+struct alua_queue_data {
+   struct list_head entry;
activate_complete   callback_fn;
void *callback_data;
 };
@@ -98,11 +119,13 @@ struct alua_dh_data {
 
 static char print_alua_state(int);
 static int alua_check_sense(struct scsi_device *, struct scsi_sense_hdr *);
+static void alua_rtpg_work(struct work_struct *work);
+static void alua_check(struct scsi_device *sdev);
 
 static inline struct alua_dh_data *get_alua_data(struct scsi_device *sdev)
 {
struct scsi_dh_data *scsi_dh_data = sdev->scsi_dh_data;
-   BUG_ON(scsi_dh_data == NULL);
+
return ((struct alua_dh_data *) scsi_dh_data->buf);
 }
 
@@ -570,9 +593,12 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
ALUA_DH_NAME, group_id, h->rel_port);
}
if (pg) {
-   h->pg = pg;
kref_get(&pg->kref);
spin_unlock(&port_group_lock);
+   spin_lock(&h->pg_lock);
+   rcu_assign_pointer(h->pg, pg);
+   spin_unlock(&h->pg_lock);
+   synchronize_rcu();
err = SCSI_DH_OK;
goto out;
}
@@ -601,9 +627,17 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
pg->state = TPGS_STATE_OPTIMIZED;
pg->flags = h->flags;
kref_init(&pg->kref);
+   INIT_DELAYED_WORK(&pg->rtpg_work, alua_rtpg_work);
+   INIT_LIST_HEAD(&pg->rtpg_list);
+   spin_lock_init(&pg->rtpg_lock);
list_add(&pg->node, &port_group_list);
-   h->pg = pg;
+   kref_get(&pg->kref);
spin_unlock(&port_group_lock);
+   spin_lock(&h->pg_lock);
+   rcu_assign_pointer(h->pg, pg);
+   spin_unlock(&h->pg_lock);
+   kref_put(&pg->kref, release_port_group);
+   synchronize_rcu();
err = SCSI_DH_OK;
 out:
   

[PATCH 04/16] scsi_dh_alua: Make stpg synchronous

2014-01-31 Thread Hannes Reinecke
We should be issuing STPG synchronously as we need to
evaluate the return code on failure.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 192 +
 1 file changed, 89 insertions(+), 103 deletions(-)
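
Editorial note (not part of the patch): the 8-byte STPG parameter buffer
that is now built on the stack has a simple layout, shown here as a
userspace helper. The helper name is invented, and the value 0x0 for the
active/optimized state is an assumption based on the ALUA state
encoding; check it against the driver's TPGS_STATE_OPTIMIZED.

```c
#include <string.h>

#define SKETCH_TPGS_STATE_OPTIMIZED 0x0  /* assumed encoding of active/optimized */

/* Fill the 8-byte STPG parameter data for a single target port group. */
static void sketch_build_stpg_data(unsigned char data[8], int group_id)
{
    memset(data, 0, 8);
    data[4] = SKETCH_TPGS_STATE_OPTIMIZED & 0x0f;  /* requested state */
    data[6] = (group_id >> 8) & 0xff;              /* group id, high byte */
    data[7] = group_id & 0xff;                     /* group id, low byte */
}
```

Because the buffer no longer lives in the per-device handler data, the
function only needs the group id, which fits the move towards shared
port group state.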

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 5358c2f..ef92008 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -227,82 +227,27 @@ done:
 }
 
 /*
- * stpg_endio - Evaluate SET TARGET GROUP STATES
- * @sdev: the device to be evaluated
- * @state: the new target group state
- *
- * Evaluate a SET TARGET GROUP STATES command response.
- */
-static void stpg_endio(struct request *req, int error)
-{
-   struct alua_dh_data *h = req->end_io_data;
-   struct scsi_sense_hdr sense_hdr;
-   unsigned err = SCSI_DH_OK;
-
-   if (host_byte(req->errors) != DID_OK ||
-   msg_byte(req->errors) != COMMAND_COMPLETE) {
-   err = SCSI_DH_IO;
-   goto done;
-   }
-
-   if (req->sense_len > 0) {
-   if (!scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
- &sense_hdr)) {
-   err = SCSI_DH_IO;
-   goto done;
-   }
-   err = alua_check_sense(h->sdev, &sense_hdr);
-   if (err == ADD_TO_MLQUEUE) {
-   err = SCSI_DH_RETRY;
-   goto done;
-   }
-   sdev_printk(KERN_INFO, h->sdev, "%s: stpg failed, ",
-   ALUA_DH_NAME);
-   scsi_show_sense_hdr(&sense_hdr);
-   sdev_printk(KERN_INFO, h->sdev, "%s: stpg failed, ",
-   ALUA_DH_NAME);
-   scsi_show_extd_sense(sense_hdr.asc, sense_hdr.ascq);
-   err = SCSI_DH_IO;
-   } else if (error)
-   err = SCSI_DH_IO;
-
-   if (err == SCSI_DH_OK) {
-   h->state = TPGS_STATE_OPTIMIZED;
-   sdev_printk(KERN_INFO, h->sdev,
-   "%s: port group %02x switched to state %c\n",
-   ALUA_DH_NAME, h->group_id,
-   print_alua_state(h->state));
-   }
-done:
-   req->end_io_data = NULL;
-   __blk_put_request(req->q, req);
-   if (h->callback_fn) {
-   h->callback_fn(h->callback_data, err);
-   h->callback_fn = h->callback_data = NULL;
-   }
-   return;
-}
-
-/*
  * submit_stpg - Issue a SET TARGET GROUP STATES command
  *
  * Currently we're only setting the current target port group state
  * to 'active/optimized' and let the array firmware figure out
  * the states of the remaining groups.
  */
-static unsigned submit_stpg(struct alua_dh_data *h)
+static unsigned submit_stpg(struct scsi_device *sdev, int group_id,
+   unsigned char *sense)
 {
struct request *rq;
+   unsigned char stpg_data[8];
int stpg_len = 8;
-   struct scsi_device *sdev = h-sdev;
+   int err;
 
/* Prepare the data buffer */
-   memset(h->buff, 0, stpg_len);
-   h->buff[4] = TPGS_STATE_OPTIMIZED & 0x0f;
-   h->buff[6] = (h->group_id >> 8) & 0xff;
-   h->buff[7] = h->group_id & 0xff;
+   memset(stpg_data, 0, stpg_len);
+   stpg_data[4] = TPGS_STATE_OPTIMIZED & 0x0f;
+   stpg_data[6] = (group_id >> 8) & 0xff;
+   stpg_data[7] = group_id & 0xff;
 
-   rq = get_alua_req(sdev, h->buff, stpg_len, WRITE);
+   rq = get_alua_req(sdev, stpg_data, stpg_len, WRITE);
if (!rq)
return SCSI_DH_RES_TEMP_UNAVAIL;
 
@@ -315,13 +260,22 @@ static unsigned submit_stpg(struct alua_dh_data *h)
rq->cmd[9] = stpg_len & 0xff;
rq->cmd_len = COMMAND_SIZE(MAINTENANCE_OUT);
 
-   rq->sense = h->sense;
+   rq->sense = sense;
memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = h->senselen = 0;
-   rq->end_io_data = h;
+   rq->sense_len = 0;
 
-   blk_execute_rq_nowait(rq->q, NULL, rq, 1, stpg_endio);
-   return SCSI_DH_OK;
+   err = blk_execute_rq(rq->q, NULL, rq, 1);
+   if (err < 0) {
+   if (!rq->errors)
+   err = DID_ERROR << 16;
+   else
+   err = rq->errors;
+   if (rq->sense_len)
+   err |= (DRIVER_SENSE << 24);
+   }
+   blk_put_request(rq);
+
+   return err;
 }
 
 /*
@@ -715,6 +669,69 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
alua_dh_data *h, int wait_
 }
 
 /*
+ * alua_stpg - Issue a SET TARGET GROUP STATES command
+ *
+ * Issue a SET TARGET GROUP STATES command and evaluate the
+ * response. Returns SCSI_DH_RETRY per default to trigger
+ * a re-evaluation of the target group state.
+ */
+static unsigned alua_stpg(struct scsi_device *sdev, struct alua_dh_data *h)
+{
+   int retval, err = SCSI_DH_RETRY;
+   unsigned char 

[PATCH 09/16] scsi_dh_alua: simplify sense code handling

2014-01-31 Thread Hannes Reinecke
Most sense codes are already handled in the generic
code, so we shouldn't add special cases here.
However, with the special cases removed we need to check for
unit attention whenever we're sending an internal
command.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 33 +-
 1 file changed, 5 insertions(+), 28 deletions(-)
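
Editorial note (not part of the patch): the rule added for internal
commands is "any unit attention means retry, regardless of what the
generic sense check decided". A minimal userspace sketch with invented
names follows; the sense key value 0x06 for UNIT ATTENTION is the SPC
encoding, while the disposition enum is hypothetical.

```c
#define SKETCH_UNIT_ATTENTION 0x06  /* SPC sense key for UNIT ATTENTION */

enum sketch_disposition {
    SKETCH_SUCCESS,         /* hand the result to the caller as is */
    SKETCH_ADD_TO_MLQUEUE   /* requeue, i.e. retry the command */
};

/*
 * Decide what to do with an internal command. generic_result is whatever
 * the common sense handling returned for this sense data.
 */
static enum sketch_disposition
sketch_internal_cmd_disposition(int sense_key,
                                enum sketch_disposition generic_result)
{
    if (sense_key == SKETCH_UNIT_ATTENTION)
        return SKETCH_ADD_TO_MLQUEUE;
    return generic_result;
}
```

This replaces a list of individual ASC/ASCQ pairs with one check on the
sense key, which is why the removed special cases do not need a
one-to-one replacement.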

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 857a999..174ff45 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -388,6 +388,8 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, 
struct alua_dh_data *h)
goto out;
}
err = alua_check_sense(sdev, &sense_hdr);
+   if (sense_hdr.sense_key == UNIT_ATTENTION)
+   err = ADD_TO_MLQUEUE;
if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry))
goto retry;
if (err != SUCCESS) {
@@ -617,21 +619,6 @@ static int alua_check_sense(struct scsi_device *sdev,
 * LUN Not Accessible - ALUA state transition
 */
return ADD_TO_MLQUEUE;
-   if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0b)
-   /*
-* LUN Not Accessible -- Target port in standby state
-*/
-   return SUCCESS;
-   if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0c)
-   /*
-* LUN Not Accessible -- Target port in unavailable state
-*/
-   return SUCCESS;
-   if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x12)
-   /*
-* LUN Not Ready -- Offline
-*/
-   return SUCCESS;
break;
case UNIT_ATTENTION:
if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00)
@@ -646,7 +633,7 @@ static int alua_check_sense(struct scsi_device *sdev,
return ADD_TO_MLQUEUE;
if (sense_hdr->asc == 0x2a && sense_hdr->ascq == 0x01)
/*
-* Mode Parameters Changed
+* Mode parameter changed
 */
return ADD_TO_MLQUEUE;
if (sense_hdr->asc == 0x2a && sense_hdr->ascq == 0x06)
@@ -659,18 +646,6 @@ static int alua_check_sense(struct scsi_device *sdev,
 * Implicit ALUA state transition failed
 */
return ADD_TO_MLQUEUE;
-   if (sense_hdr->asc == 0x3f && sense_hdr->ascq == 0x03)
-   /*
-* Inquiry data has changed
-*/
-   return ADD_TO_MLQUEUE;
-   if (sense_hdr->asc == 0x3f && sense_hdr->ascq == 0x0e)
-   /*
-* REPORTED_LUNS_DATA_HAS_CHANGED is reported
-* when switching controllers on targets like
-* Intel Multi-Flex. We can just retry.
-*/
-   return ADD_TO_MLQUEUE;
break;
}
 
@@ -735,6 +710,8 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg, int w
}
 
err = alua_check_sense(sdev, &sense_hdr);
+   if (sense_hdr.sense_key == UNIT_ATTENTION)
+   err = ADD_TO_MLQUEUE;
if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry)) {
sdev_printk(KERN_ERR, sdev, "%s: rtpg retry, ",
ALUA_DH_NAME);
-- 
1.7.12.4



[PATCH 03/16] scsi_dh_alua: Pass buffer as function argument

2014-01-31 Thread Hannes Reinecke
Pass in the buffer as a function argument for submit_vpd() and
submit_rtpg().

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 44 --
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index ece2255..5358c2f 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -143,12 +143,13 @@ static struct request *get_alua_req(struct scsi_device *sdev,
  * submit_vpd_inquiry - Issue an INQUIRY VPD page 0x83 command
  * @sdev: sdev the command should be sent to
  */
-static int submit_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
+static int submit_vpd_inquiry(struct scsi_device *sdev, unsigned char *buff,
+ int bufflen, unsigned char *sense)
 {
struct request *rq;
int err;
 
-   rq = get_alua_req(sdev, h->buff, h->bufflen, READ);
+   rq = get_alua_req(sdev, buff, bufflen, READ);
if (!rq) {
err = DRIVER_BUSY << 24;
goto done;
@@ -158,12 +159,12 @@ static int submit_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
rq->cmd[0] = INQUIRY;
rq->cmd[1] = 1;
rq->cmd[2] = 0x83;
-   rq->cmd[4] = h->bufflen;
+   rq->cmd[4] = bufflen;
rq->cmd_len = COMMAND_SIZE(INQUIRY);
 
-   rq->sense = h->sense;
+   rq->sense = sense;
memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = h->senselen = 0;
+   rq->sense_len = 0;
 
err = blk_execute_rq(rq->q, NULL, rq, 1);
if (err < 0) {
@@ -171,7 +172,8 @@ static int submit_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
err = DID_ERROR << 16;
else
err = rq->errors;
-   h->senselen = rq->sense_len;
+   if (rq->sense_len)
+   err |= (DRIVER_SENSE << 24);
}
blk_put_request(rq);
 done:
@@ -182,12 +184,13 @@ done:
  * submit_rtpg - Issue a REPORT TARGET GROUP STATES command
  * @sdev: sdev the command should be sent to
  */
-static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
+static unsigned submit_rtpg(struct scsi_device *sdev, unsigned char *buff,
+   int bufflen, unsigned char *sense, int flags)
 {
struct request *rq;
int err;
 
-   rq = get_alua_req(sdev, h->buff, h->bufflen, READ);
+   rq = get_alua_req(sdev, buff, bufflen, READ);
if (!rq) {
err = DRIVER_BUSY << 24;
goto done;
@@ -195,19 +198,19 @@ static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 
/* Prepare the command. */
rq->cmd[0] = MAINTENANCE_IN;
-   if (!(h->flags & ALUA_RTPG_EXT_HDR_UNSUPP))
+   if (!(flags & ALUA_RTPG_EXT_HDR_UNSUPP))
rq->cmd[1] = MI_REPORT_TARGET_PGS | MI_EXT_HDR_PARAM_FMT;
else
rq->cmd[1] = MI_REPORT_TARGET_PGS;
-   rq->cmd[6] = (h->bufflen >> 24) & 0xff;
-   rq->cmd[7] = (h->bufflen >> 16) & 0xff;
-   rq->cmd[8] = (h->bufflen >>  8) & 0xff;
-   rq->cmd[9] = h->bufflen & 0xff;
+   rq->cmd[6] = (bufflen >> 24) & 0xff;
+   rq->cmd[7] = (bufflen >> 16) & 0xff;
+   rq->cmd[8] = (bufflen >>  8) & 0xff;
+   rq->cmd[9] = bufflen & 0xff;
rq->cmd_len = COMMAND_SIZE(MAINTENANCE_IN);
 
-   rq->sense = h->sense;
+   rq->sense = sense;
memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = h->senselen = 0;
+   rq->sense_len = 0;
 
err = blk_execute_rq(rq->q, NULL, rq, 1);
if (err < 0) {
@@ -215,7 +218,8 @@ static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
err = DID_ERROR << 16;
else
err = rq->errors;
-   h->senselen = rq->sense_len;
+   if (rq->sense_len)
+   err |= (DRIVER_SENSE << 24);
}
blk_put_request(rq);
 done:
@@ -374,11 +378,11 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
 
expiry = round_jiffies_up(jiffies + timeout);
  retry:
-   retval = submit_vpd_inquiry(sdev, h);
+   retval = submit_vpd_inquiry(sdev, h->buff, h->bufflen, h->sense);
if (retval) {
unsigned err;
 
-   if (h->senselen == 0 ||
+   if (!(driver_byte(retval) & DRIVER_SENSE) ||
!scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
  &sense_hdr)) {
sdev_printk(KERN_INFO, sdev,
@@ -575,10 +579,10 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h, int wait_
expiry = round_jiffies_up(jiffies + h->transition_tmo * HZ);
 
  retry:
-   retval = submit_rtpg(sdev, h);
+   retval = submit_rtpg(sdev, h->buff, 

[PATCH 05/16] scsi_dh_alua: put sense buffer on stack

2014-01-31 Thread Hannes Reinecke
There is no need to have the sense buffer as part of the per-device
structure, we can put the sense buffer on the stack.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index ef92008..adc77ef 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -74,8 +74,6 @@ struct alua_dh_data {
unsigned char   *buff;
int bufflen;
unsigned char   transition_tmo;
-   unsigned char   sense[SCSI_SENSE_BUFFERSIZE];
-   int senselen;
struct scsi_device  *sdev;
activate_complete   callback_fn;
void*callback_data;
@@ -325,6 +323,7 @@ static int alua_check_tpgs(struct scsi_device *sdev, struct 
alua_dh_data *h)
 static int alua_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
 {
int len, timeout = ALUA_FAILOVER_TIMEOUT;
+   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
struct scsi_sense_hdr sense_hdr;
unsigned retval;
unsigned char *d;
@@ -332,12 +331,12 @@ static int alua_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
 
expiry = round_jiffies_up(jiffies + timeout);
  retry:
-   retval = submit_vpd_inquiry(sdev, h->buff, h->bufflen, h->sense);
+   retval = submit_vpd_inquiry(sdev, h->buff, h->bufflen, sense);
if (retval) {
unsigned err;
 
if (!(driver_byte(retval) & DRIVER_SENSE) ||
-   !scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
+   !scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE,
  &sense_hdr)) {
sdev_printk(KERN_INFO, sdev,
"%s: evpd inquiry failed, ", ALUA_DH_NAME);
@@ -519,6 +518,7 @@ static int alua_check_sense(struct scsi_device *sdev,
  */
 static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h, int wait_for_transition)
 {
+   unsigned char sense[SCSI_SENSE_BUFFERSIZE];
struct scsi_sense_hdr sense_hdr;
int len, k, off, valid_states = 0;
unsigned char *ucp;
@@ -533,11 +533,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h, int wait_
expiry = round_jiffies_up(jiffies + h->transition_tmo * HZ);
 
  retry:
-   retval = submit_rtpg(sdev, h->buff, h->bufflen, h->sense, h->flags);
+   retval = submit_rtpg(sdev, h->buff, h->bufflen, sense, h->flags);
 
if (retval) {
if (!(driver_byte(retval) & DRIVER_SENSE) ||
-   !scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
+   !scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE,
  &sense_hdr)) {
sdev_printk(KERN_INFO, sdev, "%s: rtpg failed, ",
ALUA_DH_NAME);
-- 
1.7.12.4
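
Editorial sketch, not part of the patch: patches 03/16 and 05/16 stop tracking a separate senselen field and instead test the driver byte of the returned result word for DRIVER_SENSE. The userspace model below shows how that encoding works. All SKETCH_ and sketch_ names are hypothetical; the constant values are assumed to mirror include/scsi/scsi.h of that era, and sketch_fold_result() models only the failure path of submit_vpd_inquiry() and submit_rtpg() (the branch taken when blk_execute_rq() reports an error).

```c
#include <stdint.h>

/* Assumed to match the kernel's DRIVER_BUSY, DRIVER_SENSE and DID_ERROR. */
#define SKETCH_DRIVER_BUSY  0x01u
#define SKETCH_DRIVER_SENSE 0x08u
#define SKETCH_DID_ERROR    0x07u

/* Bits 31..24 of the result word carry the driver byte. */
static unsigned sketch_driver_byte(unsigned result)
{
	return (result >> 24) & 0xff;
}

/* Bits 23..16 of the result word carry the host byte. */
static unsigned sketch_host_byte(unsigned result)
{
	return (result >> 16) & 0xff;
}

/*
 * Model of the error path: no request at all maps to DRIVER_BUSY,
 * a failed request without a recorded error maps to DID_ERROR, and
 * the presence of sense data is flagged in the driver byte.
 */
static unsigned sketch_fold_result(int have_request, int rq_errors,
				   unsigned sense_len)
{
	unsigned err;

	if (!have_request)
		return SKETCH_DRIVER_BUSY << 24;

	err = rq_errors ? (unsigned)rq_errors : (SKETCH_DID_ERROR << 16);
	if (sense_len)
		err |= SKETCH_DRIVER_SENSE << 24;
	return err;
}
```

A caller can then decide whether the sense buffer is worth normalizing with a single test, sketch_driver_byte(result) & SKETCH_DRIVER_SENSE, which is why the sense buffer no longer needs a length stored next to it.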



[PATCH 15/16] scsi_dh_alua: revert commit a8e5a2d593cbfccf530c3382c2c328d2edaa7b66

2014-01-31 Thread Hannes Reinecke
Obsoleted by the next patch.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 7f03417..6591ac1 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -678,13 +678,12 @@ static int alua_check_sense(struct scsi_device *sdev,
 /*
  * alua_rtpg - Evaluate REPORT TARGET GROUP STATES
  * @sdev: the device to be evaluated.
- * @wait_for_transition: if nonzero, wait ALUA_FAILOVER_TIMEOUT seconds for device to exit transitioning state
  *
  * Evaluate the Target Port Group State.
  * Returns SCSI_DH_DEV_OFFLINED if the path is
  * found to be unusable.
  */
-static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg, int wait_for_transition)
+static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 {
unsigned char sense[SCSI_SENSE_BUFFERSIZE];
struct scsi_sense_hdr sense_hdr;
@@ -774,7 +773,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg, int w
else
pg->transition_tmo = ALUA_FAILOVER_TIMEOUT;
 
-   if (wait_for_transition && (orig_transition_tmo != pg->transition_tmo)) {
+   if (orig_transition_tmo != pg->transition_tmo) {
sdev_printk(KERN_INFO, sdev,
"%s: transition timeout set to %d seconds\n",
ALUA_DH_NAME, pg->transition_tmo);
@@ -812,19 +811,14 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg, int w
 
switch (pg->state) {
case TPGS_STATE_TRANSITIONING:
-   if (wait_for_transition) {
-   if (time_before(jiffies, expiry)) {
-   /* State transition, retry */
-   interval += 2000;
-   msleep(interval);
-   goto retry;
-   }
-   err = SCSI_DH_RETRY;
-   } else {
-   err = SCSI_DH_OK;
+   if (time_before(jiffies, expiry)) {
+   /* State transition, retry */
+   interval += 2000;
+   msleep(interval);
+   goto retry;
}
-
/* Transitioning time exceeded, set port to standby */
+   err = SCSI_DH_RETRY;
pg->state = TPGS_STATE_STANDBY;
break;
case TPGS_STATE_OFFLINE:
-- 
1.7.12.4



[PATCH 12/16] scsi_dh: return individual errors in scsi_dh_activate()

2014-01-31 Thread Hannes Reinecke
When calling scsi_dh_activate() we should be returning
individual errors and not lumping all into one.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 33e422e..ae7f399 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -381,7 +381,7 @@ EXPORT_SYMBOL_GPL(scsi_unregister_device_handler);
  */
 int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
 {
-   int err = 0;
+   int err = SCSI_DH_OK;
unsigned long flags;
struct scsi_device *sdev;
struct scsi_device_handler *scsi_dh = NULL;
@@ -400,15 +400,18 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
if (sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
dev = get_device(&sdev->sdev_gendev);
-   if (!scsi_dh || !dev ||
-   sdev->sdev_state == SDEV_CANCEL ||
-   sdev->sdev_state == SDEV_DEL)
+   if (!scsi_dh)
err = SCSI_DH_NOSYS;
-   if (sdev->sdev_state == SDEV_OFFLINE)
+   else if (!dev)
+   err = SCSI_DH_DEV_OFFLINED;
+   else if (sdev->sdev_state == SDEV_CANCEL ||
+sdev->sdev_state == SDEV_DEL)
+   err = SCSI_DH_NOTCONN;
+   else if (sdev->sdev_state == SDEV_OFFLINE)
err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
 
-   if (err) {
+   if (err != SCSI_DH_OK) {
if (fn)
fn(data, err);
goto out;
@@ -417,7 +420,8 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
 out:
-   put_device(dev);
+   if (dev)
+   put_device(dev);
return err;
 }
 EXPORT_SYMBOL_GPL(scsi_dh_activate);
-- 
1.7.12.4
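
Editorial sketch, not part of the patch: the pre-check that the patch above introduces in scsi_dh_activate() can be read as a small decision function in which the first failing condition determines the error. The model below restates that ordering in userspace C. The sketch_ and SKETCH_ names are hypothetical, and the enum values are illustrative only, not the numeric values of the kernel's SCSI_DH_* or SDEV_* constants.

```c
/* Illustrative stand-ins for the SCSI_DH_* return codes. */
enum sketch_dh_err {
	SKETCH_DH_OK,
	SKETCH_DH_NOSYS,
	SKETCH_DH_NOTCONN,
	SKETCH_DH_DEV_OFFLINED
};

/* Illustrative stand-ins for the SDEV_* device states. */
enum sketch_sdev_state {
	SKETCH_SDEV_RUNNING,
	SKETCH_SDEV_CANCEL,
	SKETCH_SDEV_DEL,
	SKETCH_SDEV_OFFLINE
};

/*
 * Same ordering as the if/else chain in the patch: a missing handler
 * wins over a missing device reference, which wins over the device
 * state checks.
 */
static enum sketch_dh_err
sketch_activate_precheck(int have_handler, int have_dev_ref,
			 enum sketch_sdev_state state)
{
	if (!have_handler)
		return SKETCH_DH_NOSYS;
	if (!have_dev_ref)
		return SKETCH_DH_DEV_OFFLINED;
	if (state == SKETCH_SDEV_CANCEL || state == SKETCH_SDEV_DEL)
		return SKETCH_DH_NOTCONN;
	if (state == SKETCH_SDEV_OFFLINE)
		return SKETCH_DH_DEV_OFFLINED;
	return SKETCH_DH_OK;
}
```

Before the patch, a missing handler, a missing device reference and a device being torn down all collapsed into SCSI_DH_NOSYS; with the chain above a caller can tell those cases apart.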



[PATCH 01/16] scsi_dh_alua: Improve error handling

2014-01-31 Thread Hannes Reinecke
Improve error handling and use standard logging functions
instead of hand-crafted ones.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 150 +++--
 1 file changed, 98 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 5248c88..e4e5497 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -23,6 +23,7 @@
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <scsi/scsi.h>
+#include <scsi/scsi_dbg.h>
 #include <scsi/scsi_eh.h>
 #include <scsi/scsi_dh.h>
 
@@ -144,11 +145,13 @@ static struct request *get_alua_req(struct scsi_device *sdev,
 static int submit_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
 {
struct request *rq;
-   int err = SCSI_DH_RES_TEMP_UNAVAIL;
+   int err;
 
rq = get_alua_req(sdev, h->buff, h->bufflen, READ);
-   if (!rq)
+   if (!rq) {
+   err = DRIVER_BUSY << 24;
goto done;
+   }
 
/* Prepare the command. */
rq->cmd[0] = INQUIRY;
@@ -162,12 +165,12 @@ static int submit_vpd_inquiry(struct scsi_device *sdev, struct alua_dh_data *h)
rq->sense_len = h->senselen = 0;
 
err = blk_execute_rq(rq->q, NULL, rq, 1);
-   if (err == -EIO) {
-   sdev_printk(KERN_INFO, sdev,
-   "%s: evpd inquiry failed with %x\n",
-   ALUA_DH_NAME, rq->errors);
+   if (err < 0) {
+   if (!rq->errors)
+   err = DID_ERROR << 16;
+   else
+   err = rq->errors;
h->senselen = rq->sense_len;
-   err = SCSI_DH_IO;
}
blk_put_request(rq);
 done:
@@ -182,11 +185,13 @@ static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h,
bool rtpg_ext_hdr_req)
 {
struct request *rq;
-   int err = SCSI_DH_RES_TEMP_UNAVAIL;
+   int err;
 
rq = get_alua_req(sdev, h->buff, h->bufflen, READ);
-   if (!rq)
+   if (!rq) {
+   err = DRIVER_BUSY << 24;
goto done;
+   }
 
/* Prepare the command. */
rq->cmd[0] = MAINTENANCE_IN;
@@ -205,12 +210,12 @@ static unsigned submit_rtpg(struct scsi_device *sdev, struct alua_dh_data *h,
rq->sense_len = h->senselen = 0;
 
err = blk_execute_rq(rq->q, NULL, rq, 1);
-   if (err == -EIO) {
-   sdev_printk(KERN_INFO, sdev,
-   "%s: rtpg failed with %x\n",
-   ALUA_DH_NAME, rq->errors);
+   if (err < 0) {
+   if (!rq->errors)
+   err = DID_ERROR << 16;
+   else
+   err = rq->errors;
h->senselen = rq->sense_len;
-   err = SCSI_DH_IO;
}
blk_put_request(rq);
 done:
@@ -218,13 +223,11 @@ done:
 }
 
 /*
- * alua_stpg - Evaluate SET TARGET GROUP STATES
+ * stpg_endio - Evaluate SET TARGET GROUP STATES
  * @sdev: the device to be evaluated
  * @state: the new target group state
  *
- * Send a SET TARGET GROUP STATES command to the device.
- * We only have to test here if we should resubmit the command;
- * any other error is assumed as a failure.
+ * Evaluate a SET TARGET GROUP STATES command response.
  */
 static void stpg_endio(struct request *req, int error)
 {
@@ -239,9 +242,8 @@ static void stpg_endio(struct request *req, int error)
}
 
if (req->sense_len > 0) {
-   err = scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
-  &sense_hdr);
-   if (!err) {
+   if (!scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
+ &sense_hdr)) {
err = SCSI_DH_IO;
goto done;
}
@@ -250,10 +252,12 @@ static void stpg_endio(struct request *req, int error)
err = SCSI_DH_RETRY;
goto done;
}
-   sdev_printk(KERN_INFO, h->sdev,
-   "%s: stpg sense code: %02x/%02x/%02x\n",
-   ALUA_DH_NAME, sense_hdr.sense_key,
-   sense_hdr.asc, sense_hdr.ascq);
+   sdev_printk(KERN_INFO, h->sdev, "%s: stpg failed, ",
+   ALUA_DH_NAME);
+   scsi_show_sense_hdr(&sense_hdr);
+   sdev_printk(KERN_INFO, h->sdev, "%s: stpg failed, ",
+   ALUA_DH_NAME);
+   scsi_show_extd_sense(sense_hdr.asc, sense_hdr.ascq);
err = SCSI_DH_IO;
} else if (error)
err = SCSI_DH_IO;
@@ -362,15 +366,43 @@ static int alua_check_tpgs(struct scsi_device *sdev, struct alua_dh_data *h)
  */
 static int alua_vpd_inquiry(struct scsi_device *sdev, struct 

[PATCH 13/16] scsi_dh_alua: Clarify logging message

2014-01-31 Thread Hannes Reinecke
We should be differentiating between an invalid TPGS setting
and an unsupported one.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 8ea35a9..7f03417 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -347,12 +347,18 @@ static int alua_check_tpgs(struct scsi_device *sdev, struct alua_dh_data *h)
sdev_printk(KERN_INFO, sdev, "%s: supports implicit TPGS\n",
ALUA_DH_NAME);
break;
-   default:
-   h->tpgs = TPGS_MODE_NONE;
+   case 0:
sdev_printk(KERN_INFO, sdev, "%s: not supported\n",
ALUA_DH_NAME);
err = SCSI_DH_DEV_UNSUPP;
break;
+   default:
+   sdev_printk(KERN_INFO, sdev,
+   "%s: unsupported TPGS setting %d\n",
+   ALUA_DH_NAME, h->tpgs);
+   h->tpgs = TPGS_MODE_NONE;
+   err = SCSI_DH_DEV_UNSUPP;
+   break;
+   break;
}
 
return err;
-- 
1.7.12.4



Re: [PATCH] st: Do not rewind for SG_IO

2014-01-31 Thread Jeremy Linton

On 1/31/2014 2:46 AM, Hannes Reinecke wrote:

 This patch make the tape always non-rewinding when SG_IO is used, thus
 allowing udev to get a proper device id for tapes.

This is wholly bad. Just because someone fires a SG ioctl at the device
(usually to perform an operation that cannot be done with the st_ops) doesn't
mean they don't want the tape rewound on close.





Re: [PATCH] st: Do not rewind for SG_IO

2014-01-31 Thread Jeremy Linton

On 1/31/2014 2:46 AM, Hannes Reinecke wrote:

 This patch make the tape always non-rewinding when SG_IO is used, thus
 allowing udev to get a proper device id for tapes.

Maybe instead of silently changing the behavior, if you just _HAVE_ to open
the st device, add an ioctl or st/mt_op that disables the rewind on close.
That way applications have to explicitly disable the rewind on close.
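
Editorial sketch of the alternative suggested above, as a userspace model only: pass-through commands leave the close behaviour alone, and a dedicated operation is the only thing that clears the rewind-on-close flag. Every name here is hypothetical (no such ioctl or mt_op is claimed to exist in the st driver); rew_at_close merely echoes the field touched by the original patch.

```c
#include <stdbool.h>

/* Toy model of an open tape device. */
struct sketch_tape {
	bool rew_at_close;	/* rewind when the device is closed */
	long position;		/* 0 means beginning of tape */
};

/* Hypothetical operations an application could issue. */
enum sketch_op {
	SKETCH_OP_PASSTHROUGH,	/* stands in for an SG_IO style command */
	SKETCH_OP_NO_REWIND	/* explicit request to skip the rewind */
};

/* A rewinding device node starts with rewind-on-close enabled. */
static void sketch_open(struct sketch_tape *t, long position)
{
	t->rew_at_close = true;
	t->position = position;
}

/* Only the dedicated operation changes what close will do. */
static void sketch_ioctl(struct sketch_tape *t, enum sketch_op op)
{
	if (op == SKETCH_OP_NO_REWIND)
		t->rew_at_close = false;
}

/* Close rewinds unless the application opted out explicitly. */
static void sketch_close(struct sketch_tape *t)
{
	if (t->rew_at_close)
		t->position = 0;
}
```

In this model a tool such as udev would issue the opt-out once before its inquiry, while an application that sends a pass-through command for other reasons keeps the rewind it expects.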




Re: [LSF/MM TOPIC][ATTEND] Plumbing I/O Cache / Tiered Storage Hints

2014-01-31 Thread Dan Williams
On Tue, Jan 28, 2014 at 2:57 PM, Dan Williams dan.j.willi...@intel.com wrote:
 In addition to disk drives with internally tiered-storage (solid-state
 + magnetic media) the kernel has also grown native I/O-caching /
 tiered-storage implementations in bcache and dm-cache.

 Currently, all these solutions depend on heuristics to determine what
 tier the data referenced in an I/O belongs.  However, the presence of
 hinting proposals from SCSI [1], ATA [2], and bcache [3] indicate that
 these devices (hardware or virtual) want to consume explicit hints
 indicating the value of caching data in a higher performing tier.

 At LSF I want to discuss options and opportunities for plumbing cache
 hints from userspace, through the I/O stack to devices.  My colleague,
 Jason Akers, is also interested in this discussion as he is
 investigating how to exploit these hints from userspace.  I
 participated in the LSF 2012 discussion of this topic, see that it was
 raised again at LSF 2013, and note that we have not settled on an
 enabling path.  What's new for this year is an effort to set aside,
 for now, the deeper complexities of the device specification proposals
 and focus on a minimal set of hints that can be specified per-process
 (ionice) and maybe per-file (fadvise).

 My suspicion is that AIO attributes is useful for applications that
 want access to the full range of access hints exposed by a device.
 However, for the general buffered-I/O / tiered-storage case, a small
 set of hints achieves the bulk of the value.

 I am also interested in:
 Volatile ranges
 SMR Enabling
 Integrity passthrough

 --
 Dan

 [1] T10 LBA Access Hints
 [2] T13 Hybrid Information Feature

Correction, the hinting scheme is defined by SATA-IO in SATA 3.2

 [3] AIO Attributes: http://marc.info/?l=linux-aio&m=136580574523674&w=2


Re: [LSF/MM TOPIC][ATTEND] Plumbing I/O Cache / Tiered Storage Hints

2014-01-31 Thread Marc C
Hi,

 [2] T13 Hybrid Information Feature

 Correction, the hinting scheme is defined by SATA-IO in SATA 3.2

I, too, would also like to know of the direction for supporting hybrid
hinting, since we already have support for SEND/RECV FPDMA queued in the
kernel. The new commands were used initially for queued TRIMs, but would
also be very useful on SSHD devices, especially since the commands
allows initiators to provide hints on whether data is considered hot
or cold, without requiring I/O.

-Marc

On 01/31/2014 10:18 AM, Dan Williams wrote:
 On Tue, Jan 28, 2014 at 2:57 PM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 In addition to disk drives with internally tiered-storage (solid-state
 + magnetic media) the kernel has also grown native I/O-caching /
 tiered-storage implementations in bcache and dm-cache.

 Currently, all these solutions depend on heuristics to determine what
 tier the data referenced in an I/O belongs.  However, the presence of
 hinting proposals from SCSI [1], ATA [2], and bcache [3] indicate that
 these devices (hardware or virtual) want to consume explicit hints
 indicating the value of caching data in a higher performing tier.

 At LSF I want to discuss options and opportunities for plumbing cache
 hints from userspace, through the I/O stack to devices.  My colleague,
 Jason Akers, is also interested in this discussion as he is
 investigating how to exploit these hints from userspace.  I
 participated in the LSF 2012 discussion of this topic, see that it was
 raised again at LSF 2013, and note that we have not settled on an
 enabling path.  What's new for this year is an effort to set aside,
 for now, the deeper complexities of the device specification proposals
 and focus on a minimal set of hints that can be specified per-process
 (ionice) and maybe per-file (fadvise).

 My suspicion is that AIO attributes is useful for applications that
 want access to the full range of access hints exposed by a device.
 However, for the general buffered-I/O / tiered-storage case, a small
 set of hints achieves the bulk of the value.

 I am also interested in:
 Volatile ranges
 SMR Enabling
 Integrity passthrough

 --
 Dan

 [1] T10 LBA Access Hints
 [2] T13 Hybrid Information Feature
 
 Correction, the hinting scheme is defined by SATA-IO in SATA 3.2
 
 [3] AIO Attributes: http://marc.info/?l=linux-aio&m=136580574523674&w=2
 --
 To unsubscribe from this list: send the line unsubscribe linux-ide in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 



[LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread James Bottomley
It has been reported:

http://marc.info/?t=13911144726

That large block devices (specifically devices > 16TB) crash when
mounted on 32 bit systems.  The problem specifically is that although
CONFIG_LBDAF extends the size of sector_t within the block and storage
layers to 64 bits, the buffer cache isn't big enough.  Specifically,
buffers are mapped through a single page cache mapping on the backing
device inode.  The size of the allowed offset in the page cache radix
tree is pgoff_t which is 32 bits, so once the size of device goes beyond
16TB, this offset wraps and all hell breaks loose.
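
Editorial note: the 16TB figure follows directly from the index width. A minimal sketch of the arithmetic, assuming 4 KiB pages (a page shift of 12); the sketch_ names are hypothetical and the functions only model the truncation, they are not kernel code.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Page index as seen through a 32-bit pgoff_t: high bits are lost. */
static uint32_t sketch_page_index_32(uint64_t byte_offset)
{
	return (uint32_t)(byte_offset >> SKETCH_PAGE_SHIFT);
}

/* Page index with a 64-bit pgoff_t: nothing is truncated. */
static uint64_t sketch_page_index_64(uint64_t byte_offset)
{
	return byte_offset >> SKETCH_PAGE_SHIFT;
}

/* Bytes addressable through a 32-bit page index: 2^32 pages of 2^12 bytes. */
static uint64_t sketch_max_addressable_bytes(void)
{
	return (UINT64_C(1) << 32) << SKETCH_PAGE_SHIFT;
}
```

With these assumptions the limit is 2^44 bytes, i.e. 16 TiB, and the byte at offset 2^44 maps to the same 32-bit page index (0) as the first byte of the device, which is the wrap described above.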

The problem is that although the current single drive limit is about
4TB, it will only be a couple of years before 16TB devices are
available.  By then, I bet that most arm (and other exotic CPU) Linux
based personal file servers are still going to be 32 bit, so they're not
going to be able to take this generation (or beyond) of drives.  The
thing I'd like to discuss is how to fix this.  There are several options
I see, but there might be others.

 1. Try to pretend that CONFIG_LBDAF is supposed to cap out at 16TB
and there's nothing we can do about it ... this won't be at all
popular with arm based file server manufacturers.
 2. Slyly make sure that the buffer cache won't go over 16TB by
keeping filesystem metadata below that limit ... the horse has
probably already bolted on this one.
 3. Increase pgoff_t and the radix tree indexes to u64 for
CONFIG_LBDAF.  This will blow out the size of struct page on 32
bits by 4 bytes and may have other knock on effects, but at
least it will be transparent.
 4. add an additional radix tree lookup within the buffer cache, so
instead of a single inode for the buffer cache, we have a radix
tree of them which are added and removed at the granularity of
16TB offsets as entries are requested.
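
Editorial sketch of option 4, under the same 4 KiB page assumption: a 64-bit device offset is split into a window number (which would select one of the per-16TB inodes in the outer radix tree) and a 32-bit page index inside that window. The struct and function names are hypothetical; the outer lookup itself is not modelled.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_INDEX_BITS 32	/* width of a 32-bit pgoff_t */

/* Key for a two-level lookup: window first, then page index. */
struct sketch_buf_key {
	uint32_t window;	/* which 16 TiB slice of the device */
	uint32_t index;		/* page index inside that slice */
};

/* Split a byte offset on the device into (window, index). */
static struct sketch_buf_key sketch_split(uint64_t byte_offset)
{
	uint64_t page = byte_offset >> SKETCH_PAGE_SHIFT;
	struct sketch_buf_key key = {
		.window = (uint32_t)(page >> SKETCH_INDEX_BITS),
		.index = (uint32_t)page,
	};
	return key;
}

/* Inverse of sketch_split() for page-aligned offsets. */
static uint64_t sketch_join(struct sketch_buf_key key)
{
	uint64_t page = ((uint64_t)key.window << SKETCH_INDEX_BITS) | key.index;
	return page << SKETCH_PAGE_SHIFT;
}
```

Devices below 16 TiB only ever use window 0, so under this scheme they would keep a single mapping and see no change.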

James




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread Dave Jones
On Fri, Jan 31, 2014 at 11:02:58AM -0800, James Bottomley wrote:
 
  it will only be a couple of years before 16TB devices are
  available.  By then, I bet that most arm (and other exotic CPU) Linux
  based personal file servers are still going to be 32 bit, so they're not
  going to be able to take this generation (or beyond) of drives. 
  
   1. Try to pretend that CONFIG_LBDAF is supposed to cap out at 16TB
  and there's nothing we can do about it ... this won't be at all
  popular with arm based file server manufacturers.

Some of the higher-end home NASes have already moved from arm/ppc -> x86_64 [1].
Unless ARM64 starts appearing at a low enough price point, I wouldn't be
surprised to see the smaller vendors do a similar move just to stay competitive
(probably while keeping 'legacy' product lines for a while at a cheaper
price point that won't take bigger disks).

Dave

[1] http://forum.synology.com/wiki/index.php/What_kind_of_CPU_does_my_NAS_have



[LSF/MM ATTEND]

2014-01-31 Thread Christopher Voltz

I would like to attend LSF/MM 2014. I am interested in discussions regarding 
scsi-mq, scsi-eh, scsi-over-pcie, I/O caching, and enabling of persistent 
memory. I work on LLDD for the HP Smart Array.

Christopher


[PATCH] qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list

2014-01-31 Thread Roland Dreier
From: Roland Dreier rol...@purestorage.com

The only place this struct member is touched is in one INIT_LIST_HEAD.

Signed-off-by: Roland Dreier rol...@purestorage.com
---
 drivers/scsi/qla2xxx/qla_target.c | 2 --
 drivers/scsi/qla2xxx/qla_target.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 38a1257e76e1..2f42b650367c 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -2593,8 +2593,6 @@ static int qlt_handle_cmd_for_atio(struct scsi_qla_host *vha,
return -ENOMEM;
}
 
-   INIT_LIST_HEAD(&cmd->cmd_list);
-
memcpy(&cmd->atio, atio, sizeof(*atio));
cmd->state = QLA_TGT_STATE_NEW;
cmd->tgt = ha->tgt.qla_tgt;
diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
index b33e411f28a0..f4a4beee2b96 100644
--- a/drivers/scsi/qla2xxx/qla_target.h
+++ b/drivers/scsi/qla2xxx/qla_target.h
@@ -855,7 +855,6 @@ struct qla_tgt_cmd {
uint16_t loop_id;   /* to save extra sess dereferences */
struct qla_tgt *tgt;/* to save extra sess dereferences */
struct scsi_qla_host *vha;
-   struct list_head cmd_list;
 
struct atio_from_isp atio;
 };
-- 
1.9.rc1



Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread Chris Mason

On 01/31/2014 02:02 PM, James Bottomley wrote:

It has been reported:

http://marc.info/?t=13911144726

That large block devices (specifically devices > 16TB) crash when
mounted on 32 bit systems.  The problem specifically is that although
CONFIG_LBDAF extends the size of sector_t within the block and storage
layers to 64 bits, the buffer cache isn't big enough.  Specifically,
buffers are mapped through a single page cache mapping on the backing
device inode.  The size of the allowed offset in the page cache radix
tree is pgoff_t which is 32 bits, so once the size of device goes beyond
16TB, this offset wraps and all hell breaks loose.

The problem is that although the current single drive limit is about
4TB, it will only be a couple of years before 16TB devices are
available.  By then, I bet that most arm (and other exotic CPU) Linux
based personal file servers are still going to be 32 bit, so they're not
going to be able to take this generation (or beyond) of drives.  The
thing I'd like to discuss is how to fix this.  There are several options
I see, but there might be others.

  1. Try to pretend that CONFIG_LBDAF is supposed to cap out at 16TB
 and there's nothing we can do about it ... this won't be at all
 popular with arm based file server manufacturers.
  2. Slyly make sure that the buffer cache won't go over 16TB by
 keeping filesystem metadata below that limit ... the horse has
 probably already bolted on this one.
  3. Increase pgoff_t and the radix tree indexes to u64 for
 CONFIG_LBDAF.  This will blow out the size of struct page on 32
 bits by 4 bytes and may have other knock on effects, but at
 least it will be transparent.
  4. add an additional radix tree lookup within the buffer cache, so
 instead of a single inode for the buffer cache, we have a radix
 tree of them which are added and removed at the granularity of
 16TB offsets as entries are requested.



I started typing up that #3 is going to cause problems with RCU radix,
but it looks ok.  I think we'll find a really scary number of places
that interchange pgoff_t with unsigned long though.


I prefer #4, but it means each FS needs to add code too.  We assume 
page_offset(page) maps to the disk in more than a few places.


-chris
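
The wrap described in the quoted report can be sketched in userspace C.
This is only an illustration under assumptions: 4 KiB pages (which is
what makes a 32-bit index top out at 16TB), and hypothetical names
(sketch_pgoff32_t, page_index_32, page_index_64) standing in for the
kernel's real types and helpers.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for a 32-bit pgoff_t. */
typedef uint32_t sketch_pgoff32_t;

/* Page index as a 32-bit index type stores it: the high bits are lost. */
static sketch_pgoff32_t page_index_32(uint64_t byte_offset)
{
	return (sketch_pgoff32_t)(byte_offset >> SKETCH_PAGE_SHIFT);
}

/* Page index with a 64-bit index type, in the spirit of option #3. */
static uint64_t page_index_64(uint64_t byte_offset)
{
	return byte_offset >> SKETCH_PAGE_SHIFT;
}

/* First byte offset that no longer fits: 2^32 pages * 4 KiB = 16 TiB. */
static uint64_t first_unaddressable_byte(void)
{
	return ((uint64_t)UINT32_MAX + 1) << SKETCH_PAGE_SHIFT;
}
```

With these definitions, the page at exactly 16 TiB gets 32-bit index 0,
so it aliases the first page of the device, while the 64-bit variant
keeps the two apart.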




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread Dave Hansen
On 01/31/2014 11:02 AM, James Bottomley wrote:
>   3. Increase pgoff_t and the radix tree indexes to u64 for
>      CONFIG_LBDAF.  This will blow out the size of struct page on 32
>      bits by 4 bytes and may have other knock on effects, but at
>      least it will be transparent.

I'm not sure how many acrobatics we want to go through for 32-bit, but...

Between page->mapping and page->index, we have 64 bits of space, which
*should* be plenty to uniquely identify a block.  We could easily add a
second-level lookup somewhere so that we store some cookie for the
address_space instead of a direct pointer.  How many devices would need,
practically?  8 bits worth?


Re: [PATCH] qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list

2014-01-31 Thread Nicholas A. Bellinger
On Fri, 2014-01-31 at 13:11 -0800, Roland Dreier wrote:
> From: Roland Dreier rol...@purestorage.com
>
> The only place this struct member is touched is in one INIT_LIST_HEAD.
>
> Signed-off-by: Roland Dreier rol...@purestorage.com
> ---

Applied to target-pending/queue.

Thanks Roland!

--nab

>  drivers/scsi/qla2xxx/qla_target.c | 2 --
>  drivers/scsi/qla2xxx/qla_target.h | 1 -
>  2 files changed, 3 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index 38a1257e76e1..2f42b650367c 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -2593,8 +2593,6 @@ static int qlt_handle_cmd_for_atio(struct scsi_qla_host *vha,
>  		return -ENOMEM;
>  	}
>  
> -	INIT_LIST_HEAD(&cmd->cmd_list);
> -
>  	memcpy(&cmd->atio, atio, sizeof(*atio));
>  	cmd->state = QLA_TGT_STATE_NEW;
>  	cmd->tgt = ha->tgt.qla_tgt;
> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
> index b33e411f28a0..f4a4beee2b96 100644
> --- a/drivers/scsi/qla2xxx/qla_target.h
> +++ b/drivers/scsi/qla2xxx/qla_target.h
> @@ -855,7 +855,6 @@ struct qla_tgt_cmd {
>  	uint16_t loop_id;   /* to save extra sess dereferences */
>  	struct qla_tgt *tgt;	/* to save extra sess dereferences */
>  	struct scsi_qla_host *vha;
> -	struct list_head cmd_list;
>  
>  	struct atio_from_isp atio;
>  };




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread James Bottomley
On Fri, 2014-01-31 at 16:20 -0500, Chris Mason wrote:
> On 01/31/2014 02:02 PM, James Bottomley wrote:
> > It has been reported:
> >
> > http://marc.info/?t=13911144726
> >
> > That large block devices (specifically devices > 16TB) crash when
> > mounted on 32 bit systems.  The problem specifically is that although
> > CONFIG_LBDAF extends the size of sector_t within the block and storage
> > layers to 64 bits, the buffer cache isn't big enough.  Specifically,
> > buffers are mapped through a single page cache mapping on the backing
> > device inode.  The size of the allowed offset in the page cache radix
> > tree is pgoff_t which is 32 bits, so once the size of device goes beyond
> > 16TB, this offset wraps and all hell breaks loose.
> >
> > The problem is that although the current single drive limit is about
> > 4TB, it will only be a couple of years before 16TB devices are
> > available.  By then, I bet that most arm (and other exotic CPU) Linux
> > based personal file servers are still going to be 32 bit, so they're not
> > going to be able to take this generation (or beyond) of drives.  The
> > thing I'd like to discuss is how to fix this.  There are several options
> > I see, but there might be others.
> >
> >   1. Try to pretend that CONFIG_LBDAF is supposed to cap out at 16TB
> >      and there's nothing we can do about it ... this won't be at all
> >      popular with arm based file server manufacturers.
> >   2. Slyly make sure that the buffer cache won't go over 16TB by
> >      keeping filesystem metadata below that limit ... the horse has
> >      probably already bolted on this one.
> >   3. Increase pgoff_t and the radix tree indexes to u64 for
> >      CONFIG_LBDAF.  This will blow out the size of struct page on 32
> >      bits by 4 bytes and may have other knock on effects, but at
> >      least it will be transparent.
> >   4. add an additional radix tree lookup within the buffer cache, so
> >      instead of a single inode for the buffer cache, we have a radix
> >      tree of them which are added and removed at the granularity of
> >      16TB offsets as entries are requested.
>
>
> I started typing up that #3 is going to cause problems with RCU radix,
> but it looks ok.  I think we'll find a really scary number of places
> that interchange pgoff_t with unsigned long though.

Yes, beyond the performance issues of doing 64 bits in the radix tree,
it does look reasonably safe.

> I prefer #4, but it means each FS needs to add code too.  We assume
> page_offset(page) maps to the disk in more than a few places.

Hmm, yes, that's just a few cases of the readahead code, though, isn't
it?  The necessary fixes look fairly small per filesystem.

James
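
One way to picture option #4 is as a split of a 64-bit page index into
a slice number (which would select one of the per-16TB mappings) and a
32-bit index inside that slice.  The sketch below is only that
arithmetic, not buffer cache code; the struct and function names are
hypothetical and 4 KiB pages are assumed, so one slice of 2^32 pages
covers 16TB.

```c
#include <stdint.h>

/* Hypothetical key: one mapping per 16 TiB slice of the device. */
struct sketch_bdev_key {
	uint32_t chunk;	/* which 16 TiB slice, i.e. which mapping to use */
	uint32_t index;	/* 32-bit page index inside that slice */
};

/* Split a 64-bit page index into (slice, 32-bit index). */
static struct sketch_bdev_key split_page_index(uint64_t page_index)
{
	struct sketch_bdev_key key;

	key.chunk = (uint32_t)(page_index >> 32);
	key.index = (uint32_t)(page_index & 0xffffffffu);
	return key;
}

/* Inverse of split_page_index(); recovers the device-wide page index. */
static uint64_t join_page_index(struct sketch_bdev_key key)
{
	return ((uint64_t)key.chunk << 32) | key.index;
}
```

Anything that today derives a disk location from the 32-bit index alone
would also need the slice number, which is where the per-filesystem
changes mentioned above would come in.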




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread James Bottomley
On Fri, 2014-01-31 at 14:26 -0500, Dave Jones wrote:
> On Fri, Jan 31, 2014 at 11:02:58AM -0800, James Bottomley wrote:
>
> > it will only be a couple of years before 16TB devices are
> > available.  By then, I bet that most arm (and other exotic CPU) Linux
> > based personal file servers are still going to be 32 bit, so they're not
> > going to be able to take this generation (or beyond) of drives.
> >
> >   1. Try to pretend that CONFIG_LBDAF is supposed to cap out at 16TB
> >      and there's nothing we can do about it ... this won't be at all
> >      popular with arm based file server manufacturers.
>
> Some of the higher end home-NAS's have already moved from arm/ppc -> x86_64[1]
> Unless ARM64 starts appearing at a low enough price point, I wouldn't be
> surprised to see the smaller vendors do a similar move just to stay
> competitive.
> (probably while keeping 'legacy' product lines for a while at a cheaper
> pricepoint that won't take bigger disks).

So would you bet on the problem solving itself *before* we get 16TB
disks?  Because if we ignore it, that's the bet we're making.

James




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread James Bottomley
On Fri, 2014-01-31 at 13:47 -0800, Dave Hansen wrote:
> On 01/31/2014 11:02 AM, James Bottomley wrote:
> >   3. Increase pgoff_t and the radix tree indexes to u64 for
> >      CONFIG_LBDAF.  This will blow out the size of struct page on 32
> >      bits by 4 bytes and may have other knock on effects, but at
> >      least it will be transparent.
>
> I'm not sure how many acrobatics we want to go through for 32-bit, but...

That's partly the question: 32 bits was dying in the x86 space (at least
until quark), but it's still predominant in embedded.

> Between page->mapping and page->index, we have 64 bits of space, which
> *should* be plenty to uniquely identify a block.  We could easily add a
> second-level lookup somewhere so that we store some cookie for the
> address_space instead of a direct pointer.  How many devices would need,
> practically?  8 bits worth?

That might work.  8 bits would get us up to 4PB, which is looking a bit
high for single disk spinning rust.  However, how would the cookie work
efficiently?  Remember we'll be doing this lookup every time we pull a
page out of the page cache.  And the problem is that most of our lookups
will be on file inodes, which won't be > 16TB, so it's a lot of overhead
in the generic machinery for a problem that only occurs on buffer
related page cache lookups.

James
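
The 4PB figure can be checked with a few lines of C.  This is a worked
calculation only, assuming 4 KiB pages and a 32-bit page index; the
function name is hypothetical.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages and a 32-bit page index. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_INDEX_BITS 32u

/*
 * Bytes addressable when cookie_bits extra bits sit on top of the
 * 32-bit page index.  Returns 0 if the result does not fit in 64 bits.
 */
static uint64_t addressable_bytes(unsigned int cookie_bits)
{
	if (cookie_bits >= 64u - SKETCH_INDEX_BITS - SKETCH_PAGE_SHIFT)
		return 0;
	return 1ULL << (cookie_bits + SKETCH_INDEX_BITS + SKETCH_PAGE_SHIFT);
}
```

With no cookie bits this gives 2^44 bytes (16TB); with 8 it gives 2^52
bytes, i.e. 256 slices of 16TB, which is the 4PB mentioned above.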




Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread Dave Hansen
On 01/31/2014 04:25 PM, Kirill A. Shutemov wrote:
> > I think all we have to do is set a low bit in page->mapping
> It's already in use to say page->mapping is anon_vma. ;)

I weasel-worded that by not saying *THE* low bit. ;)

We find *some* discriminator whether it be a page flag or an actual bit
in page->mapping, or a magic value that doesn't collide with the
existing PAGE_MAPPING_* flags.

Poor 'struct page'.  It's the doormat of data structures.
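
As a rough illustration of the "discriminator in the low bits" idea,
here is a userspace sketch of tagging a pointer-sized word so it can
hold either a tagged pointer or a small cookie.  The tag values and all
names are made up for the example and are not the kernel's
PAGE_MAPPING_* definitions; it also assumes the two low bits are free,
which is exactly the part under debate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tag layout only; NOT the kernel's PAGE_MAPPING_* values. */
#define SKETCH_TAG_MASK   ((uintptr_t)0x3)
#define SKETCH_TAG_ANON   ((uintptr_t)0x1)
#define SKETCH_TAG_COOKIE ((uintptr_t)0x2)
#define SKETCH_TAG_SHIFT  2

/* Store an 8-bit cookie above the tag bits instead of a pointer. */
static uintptr_t make_cookie_mapping(uint8_t cookie)
{
	return ((uintptr_t)cookie << SKETCH_TAG_SHIFT) | SKETCH_TAG_COOKIE;
}

/* True if the word carries a cookie rather than a pointer. */
static bool mapping_is_cookie(uintptr_t mapping)
{
	return (mapping & SKETCH_TAG_MASK) == SKETCH_TAG_COOKIE;
}

/* True if the word carries the (illustrative) anon tag. */
static bool mapping_is_anon(uintptr_t mapping)
{
	return (mapping & SKETCH_TAG_MASK) == SKETCH_TAG_ANON;
}

/* Recover the cookie; only meaningful when mapping_is_cookie() is true. */
static uint8_t mapping_cookie(uintptr_t mapping)
{
	return (uint8_t)(mapping >> SKETCH_TAG_SHIFT);
}
```

The check itself is one mask and compare; whether a spare bit or value
can actually be found in struct page is the open question.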


Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

2014-01-31 Thread Kirill A. Shutemov
On Fri, Jan 31, 2014 at 04:19:43PM -0800, Dave Hansen wrote:
> On 01/31/2014 03:27 PM, James Bottomley wrote:
> > On Fri, 2014-01-31 at 13:47 -0800, Dave Hansen wrote:
> > > On 01/31/2014 11:02 AM, James Bottomley wrote:
> > > >   3. Increase pgoff_t and the radix tree indexes to u64 for
> > > >      CONFIG_LBDAF.  This will blow out the size of struct page on 32
> > > >      bits by 4 bytes and may have other knock on effects, but at
> > > >      least it will be transparent.
> > >
> > > I'm not sure how many acrobatics we want to go through for 32-bit, but...
> >
> > That's partly the question: 32 bits was dying in the x86 space (at least
> > until quark), but it's still predominant in embedded.
> >
> > > Between page->mapping and page->index, we have 64 bits of space, which
> > > *should* be plenty to uniquely identify a block.  We could easily add a
> > > second-level lookup somewhere so that we store some cookie for the
> > > address_space instead of a direct pointer.  How many devices would need,
> > > practically?  8 bits worth?
> >
> > That might work.  8 bits would get us up to 4PB, which is looking a bit
> > high for single disk spinning rust.  However, how would the cookie work
> > efficiently? remember we'll be doing this lookup every time we pull a
> > page out of the page cache.  And the problem is that most of our lookups
> > will be on file inodes, which won't be > 16TB, so it's a lot of overhead
> > in the generic machinery for a problem that only occurs on buffer
> > related page cache lookups.
>
> I think all we have to do is set a low bit in page->mapping

It's already in use to say page->mapping is anon_vma. ;)

-- 
 Kirill A. Shutemov


[LSF/MM TOPIC] SMR: Disrupting recording technology meriting a new class of storage device

2014-01-31 Thread Albert Chen

Shingled Magnetic Recording (SMR) is a disruptive technology that delivers
the next areal density gain for the HDD industry by partially overlapping
tracks.  Shingling requires physical writes to be sequential, and opens the
question of how to address this behavior at a system level.  Two general
approaches are contemplated: do the block management either in the device,
or in the host storage stack/file system through Zoned Block Commands (ZBC).

The use of ZBC to handle SMR block management yields several benefits such as:
- Predictable performance and latency
- Faster development time
- Access to application and system level semantic information
- Scalability / Fewer Drive Resources
- Higher reliability

Essential to a host managed approach (ZBC) is the openness of Linux, and its
community is a good place for WD to validate and seek feedback on our
thinking: where in the Linux system stack is the best place to add ZBC
handling?  At the Device Mapper layer?  Or somewhere else in the storage
stack?  New ideas and comments are appreciated.

For more information about ZBC, please refer to Ted's <ty...@mit.edu> email to
linux-fsde...@vger.kernel.org with the subject "[RFC] Draft Linux kernel
interfaces for ZBC drives".
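
To make the sequential-write requirement concrete, here is a minimal
host-side sketch of a zone with a write pointer.  It is an illustration
only: the struct, field, and function names are hypothetical and do not
come from the ZBC draft or from any existing kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side view of one sequential-write zone. */
struct sketch_zone {
	uint64_t start_lba;	/* first block of the zone */
	uint64_t nr_blocks;	/* zone length in blocks */
	uint64_t write_pointer;	/* next block that may be written */
};

/*
 * Accept a write only if it starts exactly at the write pointer and
 * stays inside the zone; on success the write pointer advances.
 */
static bool zone_write(struct sketch_zone *zone, uint64_t lba,
		       uint64_t nr_blocks)
{
	uint64_t end = zone->start_lba + zone->nr_blocks;

	if (lba != zone->write_pointer)
		return false;	/* not sequential: a rewrite or a gap */
	if (nr_blocks > end - zone->write_pointer)
		return false;	/* would run past the end of the zone */
	zone->write_pointer += nr_blocks;
	return true;
}

/* Rewind the zone so it can be written from the start again. */
static void zone_reset(struct sketch_zone *zone)
{
	zone->write_pointer = zone->start_lba;
}
```

Whichever layer ends up owning this bookkeeping (Device Mapper, the
file system, or something else) is the layer that has to turn random
writes into this append-only pattern.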


Re: [PATCH] cxgb4i: Use cxgb4_select_ntuple to correctly calculate ntuple fields

2014-01-31 Thread Mike Christie

On 1/28/14 7:01 PM, k...@chelsio.com wrote:

> [PATCH] cxgb4i: Use cxgb4_select_ntuple to correctly calculate ntuple fields
>
> From: Karen Xie k...@chelsio.com
>
> Fix the wrong tuple values calculated on T5 adapters: switch to using the
> exported API cxgb4_select_ntuple() from the cxgb4 base driver.
>
> Signed-off-by: Karen Xie k...@chelsio.com


Patch looks ok to me.

Reviewed-by: Mike Christie micha...@cs.wisc.edu
