[PATCH] bnx2fc: do not add shared skbs to the fcoe_rx_list

2014-07-25 Thread Maurizio Lombardi
In some cases, the fcoe_rx_list may contains multiple instances
of the same skb (the so called shared skbs).

the bnx2fc_l2_rcv thread is a loop that extracts a skb from the list,
modifies (and destroys) its content and the proceed to the next one.
The problem is that if the skb is shared, the remaining instances will
be corrupted.

The solution is to use skb_share_check() before adding the skb to the
fcoe_rx_list.

[ 6286.808725] [ cut here ]
[ 6286.808729] WARNING: at include/scsi/fc_frame.h:173 
bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]()
[ 6286.808748] Modules linked in: bnx2x(-) mdio dm_service_time bnx2fc cnic uio 
fcoe libfcoe 8021q garp stp mrp libfc llc scsi_transport_fc scsi_tgt sg 
iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul 
crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel aesni_intel lrw gf128mul 
glue_helper ablk_helper ptp cryptd hpilo serio_raw hpwdt lpc_ich pps_core 
ipmi_si pcspkr mfd_core ipmi_msghandler shpchp pcc_cpufreq mperf nfsd 
auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c ata_generic 
pata_acpi sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect 
sysimgblt i2c_algo_bit ata_piix drm_kms_helper ttm drm libata i2c_core hpsa 
dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mdio]
[ 6286.808750] CPU: 3 PID: 1304 Comm: bnx2fc_l2_threa Not tainted 
3.10.0-121.el7.x86_64 #1
[ 6286.808750] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 6286.808752]   0b36e715 8800deba1e00 
815ec0ba
[ 6286.808753]  8800deba1e38 8105dee1 a05618c0 
8801e4c81888
[ 6286.808754]  e8663868 8801f402b180 8801f56bc000 
8800deba1e48
[ 6286.808754] Call Trace:
[ 6286.808759]  [815ec0ba] dump_stack+0x19/0x1b
[ 6286.808762]  [8105dee1] warn_slowpath_common+0x61/0x80
[ 6286.808763]  [8105e00a] warn_slowpath_null+0x1a/0x20
[ 6286.808765]  [a054f415] bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]
[ 6286.808767]  [a054eff0] ? bnx2fc_disable+0x90/0x90 [bnx2fc]
[ 6286.808769]  [81085aef] kthread+0xcf/0xe0
[ 6286.808770]  [81085a20] ? kthread_create_on_node+0x140/0x140
[ 6286.808772]  [815fc76c] ret_from_fork+0x7c/0xb0
[ 6286.808773]  [81085a20] ? kthread_create_on_node+0x140/0x140
[ 6286.808774] ---[ end trace c6cdb939184ccb4e ]---

Signed-off-by: Maurizio Lombardi mlomb...@redhat.com
---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c 
b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index 785d0d7..a190ab6 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -411,6 +411,7 @@ static int bnx2fc_rcv(struct sk_buff *skb, struct 
net_device *dev,
struct fc_frame_header *fh;
struct fcoe_rcv_info *fr;
struct fcoe_percpu_s *bg;
+   struct sk_buff *tmp_skb;
unsigned short oxid;
 
interface = container_of(ptype, struct bnx2fc_interface,
@@ -423,6 +424,12 @@ static int bnx2fc_rcv(struct sk_buff *skb, struct 
net_device *dev,
goto err;
}
 
+   tmp_skb = skb_share_check(skb, GFP_ATOMIC);
+   if (!tmp_skb)
+   goto err;
+
+   skb = tmp_skb;
+
if (unlikely(eth_hdr(skb)-h_proto != htons(ETH_P_FCOE))) {
printk(KERN_ERR PFX bnx2fc_rcv: Wrong FC type frame\n);
goto err;
-- 
Maurizio Lombardi

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bnx2fc: do not add shared skbs to the fcoe_rx_list

2014-07-25 Thread Maurizio Lombardi
Hi Eddie,

just want to add you to the CC list.

Some days ago the bnx2fc's maintainer email address has been updated,
this should be the new one: qlogic-storage-upstr...@qlogic.com

I tried to send this patch to the new address but I received the following
delivery failure notification:


Delivery has failed to these recipients or groups:

dept_linux...@qlogic.commailto:dept_linux...@qlogic.com
Your message can't be delivered because delivery to this address is restricted.


On 07/25/2014 10:02 AM, Maurizio Lombardi wrote:
 In some cases, the fcoe_rx_list may contains multiple instances
 of the same skb (the so called shared skbs).
 
 the bnx2fc_l2_rcv thread is a loop that extracts a skb from the list,
 modifies (and destroys) its content and the proceed to the next one.
 The problem is that if the skb is shared, the remaining instances will
 be corrupted.
 
 The solution is to use skb_share_check() before adding the skb to the
 fcoe_rx_list.
 
 [ 6286.808725] [ cut here ]
 [ 6286.808729] WARNING: at include/scsi/fc_frame.h:173 
 bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]()
 [ 6286.808748] Modules linked in: bnx2x(-) mdio dm_service_time bnx2fc cnic 
 uio fcoe libfcoe 8021q garp stp mrp libfc llc scsi_transport_fc scsi_tgt sg 
 iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul 
 crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel aesni_intel lrw gf128mul 
 glue_helper ablk_helper ptp cryptd hpilo serio_raw hpwdt lpc_ich pps_core 
 ipmi_si pcspkr mfd_core ipmi_msghandler shpchp pcc_cpufreq mperf nfsd 
 auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c ata_generic 
 pata_acpi sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect 
 sysimgblt i2c_algo_bit ata_piix drm_kms_helper ttm drm libata i2c_core hpsa 
 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mdio]
 [ 6286.808750] CPU: 3 PID: 1304 Comm: bnx2fc_l2_threa Not tainted 
 3.10.0-121.el7.x86_64 #1
 [ 6286.808750] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
 [ 6286.808752]   0b36e715 8800deba1e00 
 815ec0ba
 [ 6286.808753]  8800deba1e38 8105dee1 a05618c0 
 8801e4c81888
 [ 6286.808754]  e8663868 8801f402b180 8801f56bc000 
 8800deba1e48
 [ 6286.808754] Call Trace:
 [ 6286.808759]  [815ec0ba] dump_stack+0x19/0x1b
 [ 6286.808762]  [8105dee1] warn_slowpath_common+0x61/0x80
 [ 6286.808763]  [8105e00a] warn_slowpath_null+0x1a/0x20
 [ 6286.808765]  [a054f415] bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]
 [ 6286.808767]  [a054eff0] ? bnx2fc_disable+0x90/0x90 [bnx2fc]
 [ 6286.808769]  [81085aef] kthread+0xcf/0xe0
 [ 6286.808770]  [81085a20] ? kthread_create_on_node+0x140/0x140
 [ 6286.808772]  [815fc76c] ret_from_fork+0x7c/0xb0
 [ 6286.808773]  [81085a20] ? kthread_create_on_node+0x140/0x140
 [ 6286.808774] ---[ end trace c6cdb939184ccb4e ]---
 
 Signed-off-by: Maurizio Lombardi mlomb...@redhat.com
 ---
  drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c 
 b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
 index 785d0d7..a190ab6 100644
 --- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
 +++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
 @@ -411,6 +411,7 @@ static int bnx2fc_rcv(struct sk_buff *skb, struct 
 net_device *dev,
   struct fc_frame_header *fh;
   struct fcoe_rcv_info *fr;
   struct fcoe_percpu_s *bg;
 + struct sk_buff *tmp_skb;
   unsigned short oxid;
  
   interface = container_of(ptype, struct bnx2fc_interface,
 @@ -423,6 +424,12 @@ static int bnx2fc_rcv(struct sk_buff *skb, struct 
 net_device *dev,
   goto err;
   }
  
 + tmp_skb = skb_share_check(skb, GFP_ATOMIC);
 + if (!tmp_skb)
 + goto err;
 +
 + skb = tmp_skb;
 +
   if (unlikely(eth_hdr(skb)-h_proto != htons(ETH_P_FCOE))) {
   printk(KERN_ERR PFX bnx2fc_rcv: Wrong FC type frame\n);
   goto err;
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 1/4] Add XEN pvSCSI protocol description

2014-07-25 Thread jgross
From: Juergen Gross jgr...@suse.com

Add the definition of pvSCSI protocol used between the pvSCSI frontend in a
XEN domU and the pvSCSI backend in a XEN driver domain (usually Dom0).

This header was originally provided by Fujitsu for XEN based on Linux 2.6.18.
Changes are:
- added comment
- adapt to Linux style guide
- add support for larger SG-lists by putting them in an own granted page
- remove stale definitions

Signed-off-by: Juergen Gross jgr...@suse.com
---
 include/xen/interface/io/vscsiif.h | 102 +
 1 file changed, 102 insertions(+)
 create mode 100644 include/xen/interface/io/vscsiif.h

diff --git a/include/xen/interface/io/vscsiif.h 
b/include/xen/interface/io/vscsiif.h
new file mode 100644
index 000..8bc747d
--- /dev/null
+++ b/include/xen/interface/io/vscsiif.h
@@ -0,0 +1,102 @@
+/**
+ * vscsiif.h
+ *
+ * Based on the blkif.h code.
+ *
+ * This interface is to be regarded as a stable API between XEN domains
+ * running potentially different Linux kernel versions.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright(c) FUJITSU Limited 2008.
+ */
+
+#ifndef __XEN__PUBLIC_IO_SCSI_H__
+#define __XEN__PUBLIC_IO_SCSI_H__
+
+#include ring.h
+#include ../grant_table.h
+
+/* commands between backend and frontend */
+#define VSCSIIF_ACT_SCSI_CDB   1   /* SCSI CDB command */
+#define VSCSIIF_ACT_SCSI_ABORT 2   /* SCSI Device(Lun) Abort*/
+#define VSCSIIF_ACT_SCSI_RESET 3   /* SCSI Device(Lun) Reset*/
+
+/*
+ * Maximum scatter/gather segments per request.
+ *
+ * Considering balance between allocating at least 16 vscsiif_request
+ * structures on one page (4096 bytes) and the number of scatter/gather
+ * elements needed, we decided to use 26 as a magic number.
+ */
+#define VSCSIIF_SG_TABLESIZE   26
+
+/*
+ * based on Linux kernel 2.6.18
+ * Changing these values requires support of multiple protocols via the rings
+ * as old clients will blindly use these values and the resulting structure
+ * sizes.
+ */
+#define VSCSIIF_MAX_COMMAND_SIZE   16
+#define VSCSIIF_SENSE_BUFFERSIZE   96
+
+struct scsiif_request_segment {
+   grant_ref_t gref;
+   uint16_t offset;
+   uint16_t length;
+};
+
+/* Size of one request is 252 bytes */
+struct vscsiif_request {
+   uint16_t rqid;  /* private guest value, echoed in resp  */
+   uint8_t act;/* command between backend and frontend */
+   uint8_t cmd_len;
+
+   uint8_t cmnd[VSCSIIF_MAX_COMMAND_SIZE];
+   uint16_t timeout_per_command;   /* The command is issued by twice
+  the value in Backend. */
+   uint16_t channel, id, lun;
+   uint16_t padding;
+   uint8_t sc_data_direction;  /* for DMA_TO_DEVICE(1)
+  DMA_FROM_DEVICE(2)
+  DMA_NONE(3) requests */
+   uint8_t nr_segments;/* Number of pieces of scatter-gather */
+#define VSCSIIF_SG_GRANT   0x80/* flag: SG elements via grant page */
+   /* nr_segments counts grant pages with
+  SG elements
+  usable if feature-sg-grant set */
+
+   struct scsiif_request_segment seg[VSCSIIF_SG_TABLESIZE];
+   uint32_t reserved[3];
+};
+
+struct vscsiif_response {
+   uint16_t rqid;  /* identifies request */
+   uint8_t padding;
+   uint8_t sense_len;
+   uint8_t sense_buffer[VSCSIIF_SENSE_BUFFERSIZE];
+   int32_t rslt;
+   uint32_t residual_len;  /* request bufflen -
+  return the value from physical device */
+   uint32_t reserved[36];
+};
+
+DEFINE_RING_TYPES(vscsiif, struct vscsiif_request, struct vscsiif_response);
+
+#endif 

[PATCH V2 2/4] Introduce xen-scsifront module

2014-07-25 Thread jgross
From: Juergen Gross jgr...@suse.com

Introduces the XEN pvSCSI frontend. With pvSCSI it is possible for a XEN domU
to issue SCSI commands to a SCSI LUN assigned to that domU. The SCSI commands
are passed to the pvSCSI backend in a driver domain (usually Dom0) which is
owner of the physical device. This allows e.g. to use SCSI tape drives in a
XEN domU.

The code is taken from the pvSCSI implementation in XEN done by Fujitsu based
on Linux kernel 2.6.18.

Changes from the original version are:
- port to upstream kernel
- put all code in just one source file
- move module to appropriate location in kernel tree
- adapt to Linux style guide
- some minor code simplifications
- replace constants with defines
- remove not used defines
- add support for larger SG lists by putting them in a granted page

Signed-off-by: Juergen Gross jgr...@suse.com
---
 drivers/scsi/Kconfig |9 +
 drivers/scsi/Makefile|1 +
 drivers/scsi/xen-scsifront.c | 1006 ++
 3 files changed, 1016 insertions(+)
 create mode 100644 drivers/scsi/xen-scsifront.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index baca589..e860c16 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -611,6 +611,15 @@ config VMWARE_PVSCSI
  To compile this driver as a module, choose M here: the
  module will be called vmw_pvscsi.
 
+config XEN_SCSI_FRONTEND
+   tristate XEN SCSI frontend driver
+   depends on SCSI  XEN
+   help
+ The XEN SCSI frontend driver allows the kernel to access SCSI Devices
+ within another guest OS (usually Dom0).
+ Only needed if the kernel is running in a XEN guest and generic
+ SCSI access to a device is needed.
+
 config HYPERV_STORAGE
tristate Microsoft Hyper-V virtual storage driver
depends on SCSI  HYPERV
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index e172d4f..a4ee9c5 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -144,6 +144,7 @@ obj-$(CONFIG_SCSI_ESAS2R)   += esas2r/
 obj-$(CONFIG_SCSI_PMCRAID) += pmcraid.o
 obj-$(CONFIG_SCSI_VIRTIO)  += virtio_scsi.o
 obj-$(CONFIG_VMWARE_PVSCSI)+= vmw_pvscsi.o
+obj-$(CONFIG_XEN_SCSI_FRONTEND)+= xen-scsifront.o
 obj-$(CONFIG_HYPERV_STORAGE)   += hv_storvsc.o
 
 obj-$(CONFIG_ARM)  += arm/
diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
new file mode 100644
index 000..469fc54
--- /dev/null
+++ b/drivers/scsi/xen-scsifront.c
@@ -0,0 +1,1006 @@
+/*
+ * Xen SCSI frontend driver
+ *
+ * Copyright (c) 2008, FUJITSU Limited
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the Software), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/*
+ * Patched to support 2TB drives
+ * 2010, Samuel Kvasnica, IMS Nanofabrication AG
+ */
+
+#include linux/version.h
+#include linux/module.h
+#include linux/kernel.h
+#include linux/device.h
+#include linux/kthread.h
+#include linux/wait.h
+#include linux/interrupt.h
+#include linux/spinlock.h
+#include linux/sched.h
+#include linux/blkdev.h
+#include linux/pfn.h
+#include linux/slab.h
+
+#include scsi/scsi_cmnd.h
+#include scsi/scsi_device.h
+#include scsi/scsi.h
+#include scsi/scsi_host.h
+
+#include xen/xen.h
+#include xen/xenbus.h
+#include xen/grant_table.h
+#include xen/events.h
+#include xen/page.h
+
+#include xen/interface/grant_table.h
+#include xen/interface/io/vscsiif.h
+#include xen/interface/io/protocols.h
+
+#include asm/xen/hypervisor.h
+
+
+#define GRANT_INVALID_REF  0
+
+#define VSCSIFRONT_OP_ADD_LUN  1
+#define VSCSIFRONT_OP_DEL_LUN  2
+
+#define DEFAULT_TASK_COMM_LEN  TASK_COMM_LEN
+
+/* tuning point*/
+#define 

Re: [PATCH 07/14] scsi: convert host_busy to atomic_t

2014-07-25 Thread Christoph Hellwig
On Tue, Jul 22, 2014 at 12:18:19AM -0400, Martin K. Petersen wrote:
 More nitpicking. In the two previous atomic conversion patches you kept
 %hu for the atomic_t. Here it's %d.

I'll fix it to use %d everywhere, given that's it's not an unsigned short
anymore.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 3/4] Introduce XEN scsiback module

2014-07-25 Thread jgross
From: Juergen Gross jgr...@suse.com

Introduces the XEN pvSCSI backend. With pvSCSI it is possible for a XEN domU
to issue SCSI commands to a SCSI LUN assigned to that domU. The SCSI commands
are passed to the pvSCSI backend in a driver domain (usually Dom0) which is
owner of the physical device. This allows e.g. to use SCSI tape drives in a
XEN domU.

The code is taken from the pvSCSI implementation in XEN done by Fujitsu based
on Linux kernel 2.6.18.

Changes from the original version are:
- port to upstream kernel
- put all code in just one source file
- adapt to Linux style guide
- use target core infrastructure instead doing pure pass-through
- enable module unloading
- support SG-list in grant page(s)
- support task abort
- remove redundant struct backend
- allocate resources dynamically
- correct minor error in scsiback_fast_flush_area
- free allocated resources in case of error during I/O preparation
- initialize SCSI emulation table statically
- fix emulation of report LUNs to not needing retry

Signed-off-by: Juergen Gross jgr...@suse.com
---
 drivers/xen/Kconfig|9 +
 drivers/xen/Makefile   |1 +
 drivers/xen/xen-scsiback.c | 2655 
 3 files changed, 2665 insertions(+)
 create mode 100644 drivers/xen/xen-scsiback.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 38fb36e..fc8f420 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -172,6 +172,15 @@ config XEN_PCIDEV_BACKEND
 
  If in doubt, say m.
 
+config XEN_SCSI_BACKEND
+   tristate XEN SCSI backend driver
+   depends on SCSI  XEN  XEN_BACKEND  TARGET_CORE
+   help
+ The SCSI backend driver allows the kernel to export its SCSI Devices
+ to other guests via a high-performance shared-memory interface.
+ Only needed for systems running as XEN driver domains (e.g. Dom0) and
+ if guests need generic access to SCSI devices.
+
 config XEN_PRIVCMD
tristate
depends on XEN
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 45e00af..b42ee75 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_XEN_STUB)+= xen-stub.o
 obj-$(CONFIG_XEN_ACPI_HOTPLUG_MEMORY)  += xen-acpi-memhotplug.o
 obj-$(CONFIG_XEN_ACPI_HOTPLUG_CPU) += xen-acpi-cpuhotplug.o
 obj-$(CONFIG_XEN_ACPI_PROCESSOR)   += xen-acpi-processor.o
+obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
new file mode 100644
index 000..6eed255
--- /dev/null
+++ b/drivers/xen/xen-scsiback.c
@@ -0,0 +1,2655 @@
+/*
+ * Xen SCSI backend driver
+ *
+ * Copyright (c) 2008, FUJITSU Limited
+ *
+ * Based on the blkback driver code.
+ * Adaption to kernel taget core infrastructure taken from vhost/scsi.c
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the Software), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/*
+ * Patched to support 2TB drives + allow tape  autoloader operations
+ * 2010, Samuel Kvasnica, IMS Nanofabrication AG
+ */
+
+#include stdarg.h
+
+#include linux/module.h
+#include linux/utsname.h
+#include linux/interrupt.h
+#include linux/slab.h
+#include linux/wait.h
+#include linux/sched.h
+#include linux/blkdev.h
+#include linux/list.h
+#include linux/gfp.h
+#include linux/kthread.h
+#include linux/delay.h
+#include linux/vmalloc.h
+#include linux/spinlock.h
+#include linux/configfs.h
+
+#include generated/utsrelease.h
+
+#include scsi/scsi.h
+#include scsi/scsi_cmnd.h

Re: [RESEND][PATCH 07/10][SCSI]mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2014-07-25 Thread Sreekanth Reddy
Hi Martin,

Following are the changes that I have done in this patch over the
first RDPQ support patch,

1. Reduced the redundancy in the function
_base_release_memory_pools(), _base_allocate_memory_pools().
2. Set pci_set_consistent_dma_mask() to DMA_BIT_MASK(32). still I am
analysing whether this change may affect on any other things or not?

Signed-off-by: Sreekanth Reddy sreekanth.re...@avagotech.com

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c
b/drivers/scsi/mpt2sas/mpt2sas_base.c
index 6b2a79e..cf69e61 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -1186,7 +1186,7 @@ _base_config_dma_addressing(struct
MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev)
 dma_get_required_mask(pdev-dev);
 if ((required_mask  DMA_BIT_MASK(32))  !pci_set_dma_mask(pdev,
 DMA_BIT_MASK(64))  !pci_set_consistent_dma_mask(pdev,
-DMA_BIT_MASK(64))) {
+DMA_BIT_MASK(32))) {
 ioc-base_add_sg_single = _base_add_sg_single_64;
 ioc-sge_size = sizeof(Mpi2SGESimple64_t);
 desc = 64;
@@ -1424,6 +1424,9 @@ _base_enable_msix(struct MPT2SAS_ADAPTER *ioc)
 ioc-reply_queue_count = min_t(int, ioc-cpu_count,
 ioc-msix_vector_count);

+if (!ioc-rdpq_array_enable  max_msix_vectors == -1)
+max_msix_vectors = 8;
+
 if (max_msix_vectors  0) {
 ioc-reply_queue_count = min_t(int, max_msix_vectors,
 ioc-reply_queue_count);
@@ -1477,6 +1480,335 @@ _base_enable_msix(struct MPT2SAS_ADAPTER *ioc)
 }

 /**
+ * _base_wait_for_doorbell_int - waiting for controller interrupt(generated by
+ * a write to the doorbell)
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ * Notes: MPI2_HIS_IOC2SYS_DB_STATUS - set to one when IOC writes to doorbell.
+ */
+static int
+_base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout,
+int sleep_flag)
+{
+u32 cntdn, count;
+u32 int_status;
+
+count = 0;
+cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+do {
+int_status = readl(ioc-chip-HostInterruptStatus);
+if (int_status  MPI2_HIS_IOC2SYS_DB_STATUS) {
+dhsprintk(ioc, printk(MPT2SAS_INFO_FMT
+%s: successful count(%d), timeout(%d)\n,
+ioc-name, __func__, count, timeout));
+return 0;
+}
+if (sleep_flag == CAN_SLEEP)
+msleep(1);
+else
+udelay(500);
+count++;
+} while (--cntdn);
+
+printk(MPT2SAS_ERR_FMT
+%s: failed due to timeout count(%d), int_status(%x)!\n,
+ioc-name, __func__, count, int_status);
+return -EFAULT;
+}
+
+/**
+ * _base_wait_for_doorbell_ack - waiting for controller to read the doorbell.
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ * Notes: MPI2_HIS_SYS2IOC_DB_STATUS - set to one when host writes to
+ * doorbell.
+ */
+static int
+_base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout,
+int sleep_flag)
+{
+u32 cntdn, count;
+u32 int_status;
+u32 doorbell;
+
+count = 0;
+cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+do {
+int_status = readl(ioc-chip-HostInterruptStatus);
+if (!(int_status  MPI2_HIS_SYS2IOC_DB_STATUS)) {
+dhsprintk(ioc, printk(MPT2SAS_INFO_FMT
+%s: successful count(%d), timeout(%d)\n,
+ioc-name, __func__, count, timeout));
+return 0;
+} else if (int_status  MPI2_HIS_IOC2SYS_DB_STATUS) {
+doorbell = readl(ioc-chip-Doorbell);
+if ((doorbell  MPI2_IOC_STATE_MASK) ==
+MPI2_IOC_STATE_FAULT) {
+mpt2sas_base_fault_info(ioc , doorbell);
+return -EFAULT;
+}
+} else if (int_status == 0x)
+goto out;
+
+if (sleep_flag == CAN_SLEEP)
+msleep(1);
+else
+udelay(500);
+count++;
+} while (--cntdn);
+
+ out:
+printk(MPT2SAS_ERR_FMT
+%s: failed due to timeout count(%d), int_status(%x)!\n,
+ioc-name, __func__, count, int_status);
+return -EFAULT;
+}
+
+/**
+ * _base_wait_for_doorbell_not_used - waiting for doorbell to not be in use
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ */
+static int
+_base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout,
+int sleep_flag)
+{
+u32 cntdn, count;
+u32 doorbell_reg;
+
+count = 0;
+cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+do {
+doorbell_reg = readl(ioc-chip-Doorbell);
+if (!(doorbell_reg  MPI2_DOORBELL_USED)) 

Re: [PATCH 4/5] [SCSI] Do not use platform_bus as a parent

2014-07-25 Thread James Bottomley
On Fri, 2014-07-25 at 15:23 +0100, Pawel Moll wrote:
 The host devices without a parent were forcefully adopted
 by platform bus. This patch removes this assignment. In
 effect the dev_dev may be NULL now, which means ISA.
 
 Cc: James E.J. Bottomley jbottom...@parallels.com
 Cc: linux-scsi@vger.kernel.org
 Signed-off-by: Pawel Moll pawel.m...@arm.com
 ---
 
 This patch is a part of effort to remove references to platform_bus
 and make it static.
 
 James, could you please have a look and advice if the change is
 correct? Would you happen to know the real reasons behind
 using the root platform_bus device a parent?

Yes, for DMA purposes, the parent cannot now be NULL; we'll get a panic
in the DMA transfers if it is.  A lot of the legacy ISA device on x86
and I thought some ARM SOC devices don't pass in the parent device, so
we hang them off a known parent.

You can grep for it; these are the devices that will begin to panic if
you apply this patch:

arch/ia64/hp/sim/simscsi.c: error = scsi_add_host(host, NULL);
drivers/scsi/a2091.c:   error = scsi_add_host(instance, NULL);
drivers/scsi/a3000.c:   error = scsi_add_host(instance, NULL);
drivers/scsi/aha152x.c: if( scsi_add_host(shpnt, NULL) ) {
drivers/scsi/gdth.c:error = scsi_add_host(shp, NULL);
drivers/scsi/gdth.c:error = scsi_add_host(shp, NULL);
drivers/scsi/gvp11.c:   error = scsi_add_host(instance, NULL);
drivers/scsi/imm.c: err = scsi_add_host(host, NULL);
drivers/scsi/pcmcia/fdomain_stub.c:if (scsi_add_host(host, NULL))
drivers/scsi/pcmcia/nsp_cs.c:   ret = scsi_add_host (host, NULL);
drivers/scsi/pcmcia/qlogic_stub.c:  if (scsi_add_host(shost, NULL))
drivers/scsi/pcmcia/sym53c500_cs.c: if (scsi_add_host(host, NULL))
drivers/scsi/ppa.c: err = scsi_add_host(host, NULL);
drivers/scsi/qlogicfas.c:   if (scsi_add_host(hreg, NULL))
drivers/scsi/scsi_module.c: error = scsi_add_host(shost, NULL);
drivers/scsi/sgiwd93.c: err = scsi_add_host(host, NULL);

Note I've picked up scsi_module, so anything that uses the SCSI module
interface also has this problem.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] qla2xxx: Prevent removal and board_disable race

2014-07-25 Thread Chad Dupuis



On Wed, 18 Jun 2014, Joe Lawrence wrote:


Introduce mutual exclusion between the qla2xxx_remove_one PCI driver
callback and qla2x00_disable_board_on_pci_error, which is scheduled as
board_disable work by qla2x00_check_reg{32,16}_for_disconnect:

* Leave the driver-specific data attached to the underlying PCI device
intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one
has enough breadcrumbs to determine that any board_disable work has been
completed.

* In qla2xxx_remove_one, set a bit to prevent any subsequent
board_disable work from scheduling, then cancel and wait until pending
work has completed.

* Reuse the PCI device enable count check in qla2x00_remove_one to
determine if board_disable has occured.  The original purpose of this
check was unnecessary since the driver remove function wasn't called
when the probe fails.

Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
---
drivers/scsi/qla2xxx/qla_def.h |1 +
drivers/scsi/qla2xxx/qla_isr.c |3 ++-
drivers/scsi/qla2xxx/qla_os.c  |   31 +++
3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 1267b11..7c441c9 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host {

unsigned long   pci_flags;
#define PFLG_DISCONNECTED   0   /* PCI device removed */
+#define PFLG_DRIVER_REMOVING   1   /* PCI driver .remove */

uint32_tdevice_flags;
#define SWITCH_FOUNDBIT_0
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 2741ad8..ee5eef4 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, 
uint32_t reg)
{
/* Check for PCI disconnection */
if (reg == 0x) {
-   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags)) {
+   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags) 
+   !test_bit(PFLG_DRIVER_REMOVING, vha-pci_flags)) {


In our remove function we set a bit that we are unloading:

set_bit (UNLOADING, base_vha-dpc_flags);

could this be used instead?


/*
 * Schedule this (only once) on the default system
 * workqueue so that all the adapter workqueues and the
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 39c9953..51cba37 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev)
scsi_qla_host_t *base_vha;
struct qla_hw_data  *ha;

+   base_vha = pci_get_drvdata(pdev);
+   ha = base_vha-hw;
+
+   /* Indicate device removal to prevent future board_disable and wait
+* until any pending board_disable has completed. */
+   set_bit(PFLG_DRIVER_REMOVING, base_vha-pci_flags);
+   cancel_work_sync(ha-board_disable);
+
/*
-* If the PCI device is disabled that means that probe failed and any
-* resources should be have cleaned up on probe exit.
+* If the PCI device is disabled then there was a PCI-disconnect and
+* qla2x00_disable_board_on_pci_error has taken care of most of the
+* resources.
 */
-   if (!atomic_read(pdev-enable_cnt))
+   if (!atomic_read(pdev-enable_cnt)) {


Should we also add a check here that this is a disconnection before 
freeing these structs?  The original intent of the check for 
pdev-enable_cnt is to make sure we don't try to dereference an already 
freed struct if probe failed.



+   scsi_host_put(base_vha-host);
+   kfree(ha);
+   pci_set_drvdata(pdev, NULL);
return;
-
-   base_vha = pci_get_drvdata(pdev);
-   ha = base_vha-hw;
+   }

qla2x00_wait_for_hba_ready(base_vha);

@@ -4791,18 +4801,15 @@ qla2x00_disable_board_on_pci_error(struct work_struct 
*work)
qla82xx_md_free(base_vha);
qla2x00_free_queues(ha);

-   scsi_host_put(base_vha-host);
-
qla2x00_unmap_iobases(ha);

pci_release_selected_regions(ha-pdev, ha-bars);
-   kfree(ha);
-   ha = NULL;
-
pci_disable_pcie_error_reporting(pdev);
pci_disable_device(pdev);
-   pci_set_drvdata(pdev, NULL);

+   /*
+* Let qla2x00_remove_one cleanup qla_hw_data on device removal.
+*/
}

/**


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] [SCSI] Do not use platform_bus as a parent

2014-07-25 Thread Pawel Moll
On Fri, 2014-07-25 at 15:46 +0100, James Bottomley wrote:
 On Fri, 2014-07-25 at 15:23 +0100, Pawel Moll wrote:
  The host devices without a parent were forcefully adopted
  by platform bus. This patch removes this assignment. In
  effect the dev_dev may be NULL now, which means ISA.
  
  Cc: James E.J. Bottomley jbottom...@parallels.com
  Cc: linux-scsi@vger.kernel.org
  Signed-off-by: Pawel Moll pawel.m...@arm.com
  ---
  
  This patch is a part of effort to remove references to platform_bus
  and make it static.
  
  James, could you please have a look and advice if the change is
  correct? Would you happen to know the real reasons behind
  using the root platform_bus device a parent?
 
 Yes, for DMA purposes, the parent cannot now be NULL; we'll get a panic
 in the DMA transfers if it is.  

That's what I though at the beginning as well, but then I crawled
through get_dma_ops(struct device *dev) and it seems that on most
architectures (all but two, if I remember correctly) it will return a
default set of DMA ops if the dev == NULL, eg.

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
#ifndef CONFIG_X86_DEV_DMA_OPS
return dma_ops;
#else
if (unlikely(!dev) || !dev-archdata.dma_ops)
return dma_ops;
else
return dev-archdata.dma_ops;
#endif 
}

in arch/x86/include/asm/dma-mapping.h. Now, there seem to be only a
handful of places where dev_dma is dereferenced: scsi_dma_map and
scsi_dma_unmap in drivers/scsi/scsi_lib_dma.c, where all the calls seem
to be NULL resistant and __scsi_alloc_queue in __scsi_alloc_queue
which will oops as you said.

Anyway, if you are saying that dev_dma must not be NULL at any
circumstances, I'll either have to find some kind of replacement for
platform_bus or convince Greg that platform_bus must not be made
static ;-)

 A lot of the legacy ISA device on x86
 and I thought some ARM SOC devices don't pass in the parent device, so
 we hang them off a known parent.

That's another thing I'm not sure - once assigned, is the dma_dev
related to the parent in any way? Even more - is the shost_gendev.parent
used at all? Doesn't seem to be. If it's only about a place in the
device model hierarchy, leaving parent as NULL will make such device a
virtual one, which it probably should be...

Paweł

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/3] [SCSI] Make LBP quirk skip lbpme checks tests

2014-07-25 Thread KY Srinivasan


 -Original Message-
 From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
 Sent: Thursday, July 24, 2014 8:54 AM
 To: Sitsofe Wheeler
 Cc: Martin K. Petersen; Christoph Hellwig; KY Srinivasan;
 gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
 de...@linuxdriverproject.org; oher...@suse.com; a...@canonical.com;
 jasow...@redhat.com; jbottom...@parallels.com; linux-
 s...@vger.kernel.org
 Subject: Re: [PATCH v2 3/3] [SCSI] Make LBP quirk skip lbpme checks tests
 
  Sitsofe == Sitsofe Wheeler sits...@gmail.com writes:
 
 Sitsofe So we can see it is really a SATA device that announces discard
 Sitsofe correctly and supports discard through WRITE_SAME(16).
 
 No, that's the SATA device that announces support for DSM TRIM, and as a
 result the Linux SATL reports support for WRITE SAME(16) w. the UNMAP bit
 set and LBPME.
 
 Sitsofe It is the act of passing it through Hyper-V that turned it into
 Sitsofe a SCSI device that supports UNMAP (but not WRITE_SAME(16)),
 Sitsofe doesn't announce its SCSI conformance number and doesn't
 Sitsofe correctly announce which features it supports. Surely in this
 Sitsofe case it's reasonable to quirk our way around the problem?
 
 No. That's an issue in Hyper-V that'll you'll have to take up with Microsoft. 
 I
 don't know what their passthrough limitations are for SCSI-ATA translation.
 Maybe K. Y. has some insight into this?

For the pass through case, the host validates the request and passes the 
request to the device.
However, not all scsi commands are passed through even though the device it is 
being passed through
may support the command. WRITE_SAME is one such command. Consequently, in the 
EVPD page,
we will set state indicating that WRITE_SAME is not supported (even if the 
device supports it).

Hope this helps.

K. Y 
 
 There must be a reason why the VPD page was added and yet the device not
 flagged as LBPME=1.
 
 Many vendors do not support UNMAP/WRITE SAME to DSM TRIM
 translation.
 Additionally, many vendors explicitly only whitelist drives that are known to
 be working correctly. Your drive is an ADATA and therefore very likely to be
 blacklisted by default by a vendor SATL.
 
 --
 Martin K. PetersenOracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] [SCSI] Make LBP quirk skip lbpme checks tests

2014-07-25 Thread Martin K. Petersen
 KY == KY Srinivasan k...@microsoft.com writes:

KY For the pass through case, the host validates the request and passes
KY the request to the device.  However, not all scsi commands are
KY passed through even though the device it is being passed through may
KY support the command. WRITE_SAME is one such command. Consequently,
KY in the EVPD page, we will set state indicating that WRITE_SAME is
KY not supported (even if the device supports it).

The LBP VPD page flags UNMAP as being supported. Do you actually support
UNMAP to DSM TRIM SCSI-ATA translation?

One challenge in that department is that a single UNMAP command may turn
into many, many, many DSM TRIM commands on the underlying SATA device.
That's why we went with WRITE SAME for the internal Linux SATL, capping
the maximum number of blocks to what we can fit in a single DSM TRIM
command.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] [SCSI] Make LBP quirk skip lbpme checks tests

2014-07-25 Thread James Bottomley
On Fri, 2014-07-25 at 16:47 +, KY Srinivasan wrote:
 
  -Original Message-
  From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
  Sent: Thursday, July 24, 2014 8:54 AM
  To: Sitsofe Wheeler
  Cc: Martin K. Petersen; Christoph Hellwig; KY Srinivasan;
  gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
  de...@linuxdriverproject.org; oher...@suse.com; a...@canonical.com;
  jasow...@redhat.com; jbottom...@parallels.com; linux-
  s...@vger.kernel.org
  Subject: Re: [PATCH v2 3/3] [SCSI] Make LBP quirk skip lbpme checks tests
  
   Sitsofe == Sitsofe Wheeler sits...@gmail.com writes:
  
  Sitsofe So we can see it is really a SATA device that announces discard
  Sitsofe correctly and supports discard through WRITE_SAME(16).
  
  No, that's the SATA device that announces support for DSM TRIM, and as a
  result the Linux SATL reports support for WRITE SAME(16) w. the UNMAP bit
  set and LBPME.
  
  Sitsofe It is the act of passing it through Hyper-V that turned it into
  Sitsofe a SCSI device that supports UNMAP (but not WRITE_SAME(16)),
  Sitsofe doesn't announce its SCSI conformance number and doesn't
  Sitsofe correctly announce which features it supports. Surely in this
  Sitsofe case it's reasonable to quirk our way around the problem?
  
  No. That's an issue in Hyper-V that'll you'll have to take up with 
  Microsoft. I
  don't know what their passthrough limitations are for SCSI-ATA translation.
  Maybe K. Y. has some insight into this?
 
 For the pass through case, the host validates the request and passes
 the request to the device.
 However, not all scsi commands are passed through even though the
 device it is being passed through
 may support the command. WRITE_SAME is one such command. Consequently,
 in the EVPD page,
 we will set state indicating that WRITE_SAME is not supported (even if
 the device supports it).

I think you haven't appreciated the problem: He's passing a SATA SSD via
the SCSI hyper-v interface.  That means that the windows host is doing
SCSI-ATA translation.  The problem is that the Windows translation
layer (SATL) looks to be incomplete and it's not correctly translating
the IDENTIFY bit that corresponds to TRIM to the correct VPD pages so
consequently, Linux  won't send UNMAP commands to the device (to be
translated back to TRIM).

We already know this is a bug in the Windows SATL which needs fixing (if
you could report it and get a fix, that would be great) and that we're
not going to be able to work around this automatically in Linux because
the proposed patch would have us unconditionally try UNMAP for all
Hyper-V devices.  The current proposed fix is to enable UNMAP manually
via sysfs in the guest boot scripts, but obviously that means that
Hyper-V guests with direct pass through of SSDs need operator
intervention to turn on TRIM.

James



Re: [PATCH V2 0/4] Add XEN pvSCSI support

2014-07-25 Thread Konrad Rzeszutek Wilk
On Fri, Jul 25, 2014 at 01:37:29PM +0200, jgr...@suse.com wrote:
 This series adds XEN pvSCSI support. With pvSCSI it is possible to use 
 physical
 SCSI devices from a XEN domain.
 
 The support consists of a backend in the privileged Domain-0 doing the real
 I/O and a frontend in the unprivileged domU passing I/O-requests to the 
 backend.
 

About the question that Christopher Hellwig asked - was that ever answered?

 The code is taken (and adapted) from the original pvSCSI implementation done
 for Linux 2.6 in 2008 by Fujitsu.
 
 [PATCH V2 1/4] Add XEN pvSCSI protocol description
 [PATCH V2 2/4] Introduce xen-scsifront module
 [PATCH V2 3/4] Introduce XEN scsiback module
 [PATCH V2 4/4] add xen pvscsi maintainer
 
 Changes in V2:
 - use core target infrastructure by backend instead of pure SCSI passthrough
 - add support for larger SG lists by putting them in grant page(s)
 - add command abort capability
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] qla2xxx: Prevent removal and board_disable race

2014-07-25 Thread Joe Lawrence
On Fri, 25 Jul 2014 11:23:02 -0400
Chad Dupuis chad.dup...@qlogic.com wrote:

 
 
 On Wed, 18 Jun 2014, Joe Lawrence wrote:
 
  Introduce mutual exclusion between the qla2xxx_remove_one PCI driver
  callback and qla2x00_disable_board_on_pci_error, which is scheduled as
  board_disable work by qla2x00_check_reg{32,16}_for_disconnect:
 
  * Leave the driver-specific data attached to the underlying PCI device
  intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one
  has enough breadcrumbs to determine that any board_disable work has been
  completed.
 
  * In qla2xxx_remove_one, set a bit to prevent any subsequent
  board_disable work from scheduling, then cancel and wait until pending
  work has completed.
 
  * Reuse the PCI device enable count check in qla2x00_remove_one to
  determine if board_disable has occured.  The original purpose of this
  check was unnecessary since the driver remove function wasn't called
  when the probe fails.
 
  Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
  ---
  drivers/scsi/qla2xxx/qla_def.h |1 +
  drivers/scsi/qla2xxx/qla_isr.c |3 ++-
  drivers/scsi/qla2xxx/qla_os.c  |   31 +++
  3 files changed, 22 insertions(+), 13 deletions(-)
 
  diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
  index 1267b11..7c441c9 100644
  --- a/drivers/scsi/qla2xxx/qla_def.h
  +++ b/drivers/scsi/qla2xxx/qla_def.h
  @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host {
 
  unsigned long   pci_flags;
  #define PFLG_DISCONNECTED   0   /* PCI device removed */
  +#define PFLG_DRIVER_REMOVING   1   /* PCI driver .remove */
 
  uint32_tdevice_flags;
  #define SWITCH_FOUNDBIT_0
  diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
  index 2741ad8..ee5eef4 100644
  --- a/drivers/scsi/qla2xxx/qla_isr.c
  +++ b/drivers/scsi/qla2xxx/qla_isr.c
  @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t 
  *vha, uint32_t reg)
  {
  /* Check for PCI disconnection */
  if (reg == 0x) {
  -   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags)) {
  +   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags) 
  +   !test_bit(PFLG_DRIVER_REMOVING, vha-pci_flags)) {
 
 In our remove function we set a bit that we are unloading:
 
 set_bit (UNLOADING, base_vha-dpc_flags);
 
 could this be used instead?

Hi Chad,

Thanks for the comments.

I think with a little bit of code shuffling this could be accomplished.
My goal with the patchset was to try and present the problem/fix as
plain as possible.  It was easiest to collect all the atomic bits I
needed inside a single variable.  Should I be tacking on such flags to
'dpc_flags' ?

 
  /*
   * Schedule this (only once) on the default system
   * workqueue so that all the adapter workqueues and the
  diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
  index 39c9953..51cba37 100644
  --- a/drivers/scsi/qla2xxx/qla_os.c
  +++ b/drivers/scsi/qla2xxx/qla_os.c
  @@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev)
  scsi_qla_host_t *base_vha;
  struct qla_hw_data  *ha;
 
  +   base_vha = pci_get_drvdata(pdev);
  +   ha = base_vha-hw;
  +
  +   /* Indicate device removal to prevent future board_disable and wait
  +* until any pending board_disable has completed. */
  +   set_bit(PFLG_DRIVER_REMOVING, base_vha-pci_flags);
  +   cancel_work_sync(ha-board_disable);
  +
  /*
  -* If the PCI device is disabled that means that probe failed and any
  -* resources should be have cleaned up on probe exit.
  +* If the PCI device is disabled then there was a PCI-disconnect and
  +* qla2x00_disable_board_on_pci_error has taken care of most of the
  +* resources.
   */
  -   if (!atomic_read(pdev-enable_cnt))
  +   if (!atomic_read(pdev-enable_cnt)) {
 
 Should we also add a check here that this is a disconnection before 
 freeing these structs?  The original intent of the check for 
 pdev-enable_cnt is to make sure we don't try to dereference an already 
 freed struct if probe failed.

I'm not exactly sure what you're asking here.  In my tests, when .probe
return -ERRNO, .remove was not called.  Is there another call path into
qla2x00_remove_one?

The reason I didn't completely cleanup qla_hw_data in
qla2x00_disable_board_on_pci_error and re-purposed the PCI enable count
check, was that I needed some way of determining that any
board_disable work was out of the way before proceeding with
qla2x00_remove_one.

The patch's set_bit / cancel_work_sync above (along with the test_bit
before board_disable schedule) should be ensuring that the
board_disable won't be running or rescheduling in the future.

If qla2x00_disable_board_on_pci_error got as far as actually disabling
the PCI device, then qla2x00_remove_one's check on its enable count
would 

Re: [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph Seems like these counters are missing any sort of
Christoph synchronization for updates, as a over 10 year old comment
Christoph from me noted.  Fix this by using atomic counters, and while
Christoph we're at it also make sure they are in the same cacheline as
Christoph the _busy counters and not needlessly stored to in every I/O
Christoph completion.

Christoph With the new model the _busy counters can temporarily go
Christoph negative, so all the readers are updated to check for  0
Christoph values.  Longer term every successful I/O completion will
Christoph reset the counters to zero, so the temporarily negative
Christoph values will not cause any harm.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph This saves us an atomic operation for each I/O submission and
Christoph completion for the usual case where the driver doesn't set a
Christoph per-target can_queue value.  Only a few iscsi hardware
Christoph offload drivers set the per-target can_queue value at the
Christoph moment.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph Replace the calls to the various blk_end_request variants
Christoph with opencode equivalents.  Blk-mq is using a model that
Christoph gives the driver control between the bio updates and the
Christoph actual completion, and making the old code follow that same
Christoph model allows us to keep the code more similar for both paths.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/14] scatterlist: allow chaining to preallocated chunks

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph Blk-mq drivers usually preallocate their S/G list as part of
Christoph the request, but if we want to support the very large S/G
Christoph lists currently supported by the SCSI code that would tie up
Christoph a lot of memory in the preallocated request pool.  Add
Christoph support to the scatterlist code so that it can initialize a
Christoph S/G list that uses a preallocated first chunks and
Christoph dynamically allocated additional chunks.  That way the
Christoph scsi-mq code can preallocate a first page worth of S/G
Christoph entries as part of the request, and dynamically extend the
Christoph S/G list when needed.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph This patch adds support for an alternate I/O path in the scsi
Christoph midlayer which uses the blk-mq infrastructure instead of the
Christoph legacy request code.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] hpsa: work in progress lockless monster patches

2014-07-25 Thread scameron

hpsa: Work In Progress: lockless monster patches

To be clear, I am not suggesting that these patches be merged at this time.

This patchset is vs. Christoph Hellwig's scsi-mq.4 branch which
may be found here: git://git.infradead.org/users/hch/scsi.git

We've been working for a long time on a patchset for hpsa to remove
all the locks from the main i/o path in pursuit of high IOPS.  Some
of that work is already upstream, but a lot more of it is not quite
yet ready to be merged.  However, I think we've gone dark for a bit
too long on this, and even though the patches aren't really ready to
be merged just yet, I thought I should let other people who might be
interested have a look anyway, as things are starting to be at least
more stable than unstable.  Be warned though, there are still some
problems, esp. around error recovery.

That being said, with the right hardware (fast SAS SSDs, recent Smart
Arrays e.g. P430, with up-to-date firmware, attached to recent disk enclosures)
with these patches and the scsi-mq patches, it is possible to get around
~970k IOPS from a single logical drive on a single controller.

There are about 150 patches in this set.  Rather than bomb the list
with that, here is a link to a tarball of the patches in the form of
an stgit patch series:

https://github.com/smcameron/hpsa-lockless-patches-work-in-progress/blob/master/hpsa-lockless-vs-hch-scsi-mq.4-2014-07-25-1415CDT.tar.bz2?raw=true

In some cases, I have probably erred on the side of having too many too
small patches, on the theory that it is easier to bake a cake than to 
unbake a cake.  Before these are submitted for reals, there will be
some squashing of patches, along with other cleaning up.

There are some patches in this set which are already upstream in
James's tree which do not happen to be in Christoph's tree
(most of which are named *_sent_to_james).  There are also
quite a few patches which are strictly for debugging and are not
ever intended to be merged. 


Here is a list of all the patches sorted by author:

Arnd Bergmann (1):
  [SCSI] hpsa: fix non-x86 builds

Christoph Hellwig (1):
  reserve block tags in scsi host

Joe Handzik (9):
  hpsa: add masked physical devices into h-dev[] array
  hpsa: add hpsa_bmic_id_physical_device function
  hpsa: get queue depth for physical devices
  hpsa: use ioaccel2 path to submit IOs to physical drives in HBA mode.
  hpsa: Get queue depth from identify physical bmic for physical disks.
  hpsa: add ioaccel sg chaining for the ioaccel2 path
  Set the phys_disk value for a CommandList structure that is
submitted. Squash
  hpsa: unmap ioaccel2 commands before, not after adding to resubmit
workqueue
  hpsa: add more ioaccel2 error handling, including underrun statuses.

Nicholas Bellinger (1):
  hpsa: Convert SCSI LLD -queuecommand() for host_lock less operation

Robert Elliott (44):
  Change scsi.c scsi_log_completion() to print strings for QUEUED,
  Set scsi_logging_level to be more verbose to get better messages
  hpsa: do not unconditionally copy sense data
  hpsa: optimize cmd_alloc function by remembering last allocation
  From a154642aeed291c7cfe4b9ea9da932156030f7d1 Mon Sep 17 00:00:00
2001
  hpsa: Clean up host, channel, target, lun prints
  hpsa: do not check cmd_alloc return value - it cannnot return NULL
  hpsa: return -1 rather than -ENOMEM in hpsa_get_raid_map if fill_cmd
fails
  hpsa: return -ENOMEM not 1 from ioaccel2_alloc_cmds_and_bft on
allocation failure
  hpsa: return -ENOMEM not 1 from hpsa_alloc_ioaccel_cmd_and_bft on
allocation failure
  hpsa: return -ENOMEM not -EFAULT from hpsa_passthru_ioctl on kmalloc
failure
  hpsa: pass error from pci_set_consistent_dma_mask intact from
hpsa_message
  hpsa: detect and report failures changing controller transport modes
  hpsa: add hpsa_disable_interrupt_mode
  hpsa: rename free_irqs to hpsa_free_irqs and move before
hpsa_request_irq
  hpsa: rename hpsa_request_irq to hpsa_request_irqs
  hpsa: on failure of request_irq, free the irqs that we did get
  hpsa: make hpsa_pci_init disable interrupts and pci_disable_device on
critical failures
  hpsa: fix a string constant broken into two lines
  hpsa: avoid unneccesary calls to resource freeing functions in
hpsa_init_one
  hpsa: add hpsa_free_cfgtables function to undo what
hpsa_find_cfgtables does
  hpsa: report failure to ioremap config table
  hpsa: add hpsa_free_pci_init function
  hpsa: clean up error handling in hpsa_pci_init
  hpsa: make hpsa_remove_one use hpsa_pci_free_init
  hpsa: rename hpsa_alloc_ioaccel_cmd_and_bft to
hpsa_alloc_ioaccel1_cmd_and_bft
  hpsa: rename hpsa_allocate_cmd_pool to hpsa_alloc_cmd_pool
  hpsa: rename ioaccel2_alloc_cmds_and_bft to
hpsa_alloc_ioaccel2_cmd_and_bft
  hpsa: fix memory leak in hpsa_alloc_cmd_pool
  hpsa: return status not void from hpsa_put_ctlr_into_performant_mode
  hpsa: clean up error handling in hpsa_init_one
  hpsa: report allocation failures while allocating SG 

Re: [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph Either someone who can actually test the hardware will have
Christoph to come up with a similar hack for the blk-mq case, or we'll
Christoph have to bite the bullet and fix the way the EH ioctls work
Christoph for real, but until that happens we fail these explicit
Christoph requests here.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND][PATCH 07/10][SCSI]mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2014-07-25 Thread Martin K. Petersen
 Sreekanth == Sreekanth Reddy sreekanth.re...@avagotech.com writes:

Sreekanth,

Sreekanth Following are the changes that I have done in this patch over
Sreekanth the first RDPQ support patch,

Please, please do the function moves in a different patch. Or use a
simple prototype declaration like I did to avoid moving the code around
in the first place.

Sreekanth 2. Set pci_set_consistent_dma_mask() to
SreekanthDMA_BIT_MASK(32). still I am analysing whether this change
Sreekanthmay affect on any other things or not?

I think you're done allocating things by the time this is called. But to
be sure you can call _base_config_dma_addressing() after you're done
with the RDPQ allocations (or rearrange the code so you allocate RDPQ
before the DMA mask is upped in general).

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] qla2xxx: Prevent removal and board_disable race

2014-07-25 Thread Chad Dupuis


On Fri, 25 Jul 2014, Joe Lawrence wrote:


On Fri, 25 Jul 2014 11:23:02 -0400
Chad Dupuis chad.dup...@qlogic.com wrote:




On Wed, 18 Jun 2014, Joe Lawrence wrote:


Introduce mutual exclusion between the qla2xxx_remove_one PCI driver
callback and qla2x00_disable_board_on_pci_error, which is scheduled as
board_disable work by qla2x00_check_reg{32,16}_for_disconnect:

* Leave the driver-specific data attached to the underlying PCI device
intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one
has enough breadcrumbs to determine that any board_disable work has been
completed.

* In qla2xxx_remove_one, set a bit to prevent any subsequent
board_disable work from scheduling, then cancel and wait until pending
work has completed.

* Reuse the PCI device enable count check in qla2x00_remove_one to
determine if board_disable has occured.  The original purpose of this
check was unnecessary since the driver remove function wasn't called
when the probe fails.

Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
---
drivers/scsi/qla2xxx/qla_def.h |1 +
drivers/scsi/qla2xxx/qla_isr.c |3 ++-
drivers/scsi/qla2xxx/qla_os.c  |   31 +++
3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 1267b11..7c441c9 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host {

unsigned long   pci_flags;
#define PFLG_DISCONNECTED   0   /* PCI device removed */
+#define PFLG_DRIVER_REMOVING   1   /* PCI driver .remove */

uint32_tdevice_flags;
#define SWITCH_FOUNDBIT_0
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 2741ad8..ee5eef4 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, 
uint32_t reg)
{
/* Check for PCI disconnection */
if (reg == 0x) {
-   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags)) {
+   if (!test_and_set_bit(PFLG_DISCONNECTED, vha-pci_flags) 
+   !test_bit(PFLG_DRIVER_REMOVING, vha-pci_flags)) {


In our remove function we set a bit that we are unloading:

set_bit (UNLOADING, base_vha-dpc_flags);

could this be used instead?


Hi Chad,

Thanks for the comments.

I think with a little bit of code shuffling this could be accomplished.
My goal with the patchset was to try and present the problem/fix as
plain as possible.  It was easiest to collect all the atomic bits I
needed inside a single variable.  Should I be tacking on such flags to
'dpc_flags' ?


Now that I see your explanation, it makes sense how you did it.  My 
concern had been that we introduce more flags which could theoretcially 
introduce more points for race conditions but dpc_flags in pretty 
overloaded as it is.







/*
 * Schedule this (only once) on the default system
 * workqueue so that all the adapter workqueues and the
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 39c9953..51cba37 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev)
scsi_qla_host_t *base_vha;
struct qla_hw_data  *ha;

+   base_vha = pci_get_drvdata(pdev);
+   ha = base_vha-hw;
+
+   /* Indicate device removal to prevent future board_disable and wait
+* until any pending board_disable has completed. */
+   set_bit(PFLG_DRIVER_REMOVING, base_vha-pci_flags);
+   cancel_work_sync(ha-board_disable);
+
/*
-* If the PCI device is disabled that means that probe failed and any
-* resources should be have cleaned up on probe exit.
+* If the PCI device is disabled then there was a PCI-disconnect and
+* qla2x00_disable_board_on_pci_error has taken care of most of the
+* resources.
 */
-   if (!atomic_read(pdev-enable_cnt))
+   if (!atomic_read(pdev-enable_cnt)) {


Should we also add a check here that this is a disconnection before
freeing these structs?  The original intent of the check for
pdev-enable_cnt is to make sure we don't try to dereference an already
freed struct if probe failed.


I'm not exactly sure what you're asking here.  In my tests, when .probe
return -ERRNO, .remove was not called.  Is there another call path into
qla2x00_remove_one?

The reason I didn't completely cleanup qla_hw_data in
qla2x00_disable_board_on_pci_error and re-purposed the PCI enable count
check, was that I needed some way of determining that any
board_disable work was out of the way before proceeding with
qla2x00_remove_one.

The patch's set_bit / cancel_work_sync above (along with the test_bit
before board_disable schedule) should be 

[PATCH] [SCSI] Make scsi_transfer_length take a scsi_data_buffer argument

2014-07-25 Thread Martin K. Petersen
For bidirectional commands we need to be able to distinguish between the
in and out scsi_data_buffers when calculating the wire transfer length.
Make scsi_transfer_length() take a scsi_data_buffer argument so the
caller can choose which I/O direction the calculation should apply to.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Christoph Hellwig h...@infradead.org
---
 drivers/scsi/libiscsi.c| 2 +-
 drivers/target/loopback/tcm_loop.c | 2 +-
 include/scsi/scsi_cmnd.h   | 5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index f2db82beb646..fdea8c1527d4 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -391,7 +391,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
task-protected = true;
 
-   transfer_length = scsi_transfer_length(sc);
+   transfer_length = scsi_transfer_length(sc, scsi_out(sc));
hdr-data_length = cpu_to_be32(transfer_length);
if (sc-sc_data_direction == DMA_TO_DEVICE) {
struct iscsi_r2t_info *r2t = task-unsol_r2t;
diff --git a/drivers/target/loopback/tcm_loop.c 
b/drivers/target/loopback/tcm_loop.c
index 340de9d92b15..c50453df555a 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -213,7 +213,7 @@ static void tcm_loop_submission_work(struct work_struct 
*work)
 
}
 
-   transfer_length = scsi_transfer_length(sc);
+   transfer_length = scsi_transfer_length(sc, scsi_out(sc));
if (!scsi_prot_sg_count(sc) 
scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) {
se_cmd-prot_pto = true;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 73f349044941..42e62cfff515 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -313,9 +313,10 @@ static inline void set_driver_byte(struct scsi_cmnd *cmd, 
char status)
cmd-result = (cmd-result  0x00ff) | (status  24);
 }
 
-static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
+static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd,
+   struct scsi_data_buffer *sdb)
 {
-   unsigned int xfer_len = scsi_out(scmd)-length;
+   unsigned int xfer_len = sdb-length;
unsigned int prot_op = scsi_get_prot_op(scmd);
unsigned int sector_size = scmd-device-sector_size;
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Block/SCSI data integrity update v2

2014-07-25 Thread Martin K. Petersen
This is the data integrity patch series originally submitted for 3.16.
It has been rebased on top of the current 3.17 SCSI queue tree. I
believe I have addressed all the changes requested in the reviews. Aside
from a code cleanup in the sd prot_op code there are no functional
changes.

 - bi_special is now an anonymous union similar to how we do it in
   struct request [02/14]

 - Split integrity calculation cleanup into a separate patch [06/14]

 - The disk flag has been renamed device_is_integrity_capable
   [09/14]

 - Dropped bio integrity flag accessor functions [10/14]

 - Augmented patch description for the integrity checksum flag [11/14]

 - T10 library relocated to block/ [13/14]

 - A couple of tweaks in the sd patch to make things work on top of the
   current SCSI queue tree [14/14]

-- 
Martin K. Petersen  Oracle Linux Engineering

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/14] block: Replace bi_integrity with bi_special

2014-07-25 Thread Martin K. Petersen
For commands like REQ_COPY we need a way to pass extra information along
with each bio. Like integrity metadata this information must be
available at the bottom of the stack so bi_private does not suffice.

Rename the existing bi_integrity field to bi_special and make it a union
so we can have different bio extensions for each class of command.

We previously used bi_integrity != NULL as a way to identify whether a
bio had integrity metadata or not. Introduce a REQ_INTEGRITY to be the
indicator now that bi_special can contain different things.

In addition, bio_integrity(bio) will now return a pointer to the
integrity payload (when applicable).

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 Documentation/block/data-integrity.txt | 10 +-
 block/bio-integrity.c  | 19 ++-
 drivers/scsi/sd_dif.c  |  8 
 include/linux/bio.h| 11 +--
 include/linux/blk_types.h  |  8 ++--
 include/linux/blkdev.h |  7 ++-
 6 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/Documentation/block/data-integrity.txt 
b/Documentation/block/data-integrity.txt
index b4eacf48053c..4d4de8b09530 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -129,11 +129,11 @@ interface for this is being worked on.
 4.1 BIO
 
 The data integrity patches add a new field to struct bio when
-CONFIG_BLK_DEV_INTEGRITY is enabled.  bio-bi_integrity is a pointer
-to a struct bip which contains the bio integrity payload.  Essentially
-a bip is a trimmed down struct bio which holds a bio_vec containing
-the integrity metadata and the required housekeeping information (bvec
-pool, vector count, etc.)
+CONFIG_BLK_DEV_INTEGRITY is enabled.  bio_integrity(bio) returns a
+pointer to a struct bip which contains the bio integrity payload.
+Essentially a bip is a trimmed down struct bio which holds a bio_vec
+containing the integrity metadata and the required housekeeping
+information (bvec pool, vector count, etc.)
 
 A kernel subsystem can enable data integrity protection on a bio by
 calling bio_integrity_alloc(bio).  This will allocate and attach the
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 6818c251e937..78d8312a6528 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -77,6 +77,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio 
*bio,
bip-bip_slab = idx;
bip-bip_bio = bio;
bio-bi_integrity = bip;
+   bio-bi_rw |= REQ_INTEGRITY;
 
return bip;
 err:
@@ -94,7 +95,7 @@ EXPORT_SYMBOL(bio_integrity_alloc);
  */
 void bio_integrity_free(struct bio *bio)
 {
-   struct bio_integrity_payload *bip = bio-bi_integrity;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_set *bs = bio-bi_pool;
 
if (bip-bip_owns_buf)
@@ -134,7 +135,7 @@ static inline unsigned int bip_integrity_vecs(struct 
bio_integrity_payload *bip)
 int bio_integrity_add_page(struct bio *bio, struct page *page,
   unsigned int len, unsigned int offset)
 {
-   struct bio_integrity_payload *bip = bio-bi_integrity;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_vec *iv;
 
if (bip-bip_vcnt = bip_integrity_vecs(bip)) {
@@ -235,7 +236,7 @@ EXPORT_SYMBOL(bio_integrity_tag_size);
 static int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len,
 int set)
 {
-   struct bio_integrity_payload *bip = bio-bi_integrity;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
unsigned int nr_sectors;
 
@@ -310,12 +311,12 @@ static int bio_integrity_generate_verify(struct bio *bio, 
int operate)
struct bio_vec *bv;
sector_t sector;
unsigned int sectors, ret = 0, i;
-   void *prot_buf = bio-bi_integrity-bip_buf;
+   void *prot_buf = bio_integrity(bio)-bip_buf;
 
if (operate)
sector = bio-bi_iter.bi_sector;
else
-   sector = bio-bi_integrity-bip_iter.bi_sector;
+   sector = bio_integrity(bio)-bip_iter.bi_sector;
 
bix.disk_name = bio-bi_bdev-bd_disk-disk_name;
bix.sector_size = bi-sector_size;
@@ -511,7 +512,7 @@ static void bio_integrity_verify_fn(struct work_struct 
*work)
  */
 void bio_integrity_endio(struct bio *bio, int error)
 {
-   struct bio_integrity_payload *bip = bio-bi_integrity;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
 
BUG_ON(bip-bip_bio != bio);
 
@@ -542,7 +543,7 @@ EXPORT_SYMBOL(bio_integrity_endio);
  */
 void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
 {
-   struct bio_integrity_payload *bip = bio-bi_integrity;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = 

[PATCH 03/14] block: Remove integrity tagging functions

2014-07-25 Thread Martin K. Petersen
None of the filesystems appear interested in using the integrity tagging
feature. Potentially because very few storage devices actually permit
using the application tag space.

Remove the tagging functions.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Christoph Hellwig h...@lst.de
---
 Documentation/block/data-integrity.txt | 34 
 block/bio-integrity.c  | 94 +-
 block/blk-integrity.c  |  2 -
 drivers/scsi/sd_dif.c  | 65 ---
 include/linux/bio.h|  3 --
 include/linux/blkdev.h |  4 --
 6 files changed, 1 insertion(+), 201 deletions(-)

diff --git a/Documentation/block/data-integrity.txt 
b/Documentation/block/data-integrity.txt
index 4d4de8b09530..f56ec97f0d14 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -206,36 +206,6 @@ will require extra work due to the application tag.
   bio_integrity_enabled() returned 1.
 
 
-int bio_integrity_tag_size(bio);
-
-  If the filesystem wants to use the application tag space it will
-  first have to find out how much storage space is available.
-  Because tag space is generally limited (usually 2 bytes per
-  sector regardless of sector size), the integrity framework
-  supports interleaving the information between the sectors in an
-  I/O.
-
-  Filesystems can call bio_integrity_tag_size(bio) to find out how
-  many bytes of storage are available for that particular bio.
-
-  Another option is bdev_get_tag_size(block_device) which will
-  return the number of available bytes per hardware sector.
-
-
-int bio_integrity_set_tag(bio, void *tag_buf, len);
-
-  After a successful return from bio_integrity_prep(),
-  bio_integrity_set_tag() can be used to attach an opaque tag
-  buffer to a bio.  Obviously this only makes sense if the I/O is
-  a WRITE.
-
-
-int bio_integrity_get_tag(bio, void *tag_buf, len);
-
-  Similarly, at READ I/O completion time the filesystem can
-  retrieve the tag buffer using bio_integrity_get_tag().
-
-
 5.3 PASSING EXISTING INTEGRITY METADATA
 
 Filesystems that either generate their own integrity metadata or
@@ -288,8 +258,6 @@ will require extra work due to the application tag.
 .name   = STANDARDSBODY-TYPE-VARIANT-CSUM,
 .generate_fn= my_generate_fn,
.verify_fn  = my_verify_fn,
-   .get_tag_fn = my_get_tag_fn,
-   .set_tag_fn = my_set_tag_fn,
.tuple_size = sizeof(struct my_tuple_size),
.tag_size   = tag bytes per hw sector,
 };
@@ -311,7 +279,5 @@ will require extra work due to the application tag.
   are available per hardware sector.  For DIF this is either 2 or
   0 depending on the value of the Control Mode Page ATO bit.
 
-  See 6.2 for a description of get_tag_fn and set_tag_fn.
-
 --
 2007-12-24 Martin K. Petersen martin.peter...@oracle.com
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 78d8312a6528..2c4bbc0aaa93 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -216,90 +216,6 @@ static inline unsigned int bio_integrity_bytes(struct 
blk_integrity *bi,
 }
 
 /**
- * bio_integrity_tag_size - Retrieve integrity tag space
- * @bio:   bio to inspect
- *
- * Description: Returns the maximum number of tag bytes that can be
- * attached to this bio. Filesystems can use this to determine how
- * much metadata to attach to an I/O.
- */
-unsigned int bio_integrity_tag_size(struct bio *bio)
-{
-   struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
-
-   BUG_ON(bio-bi_iter.bi_size == 0);
-
-   return bi-tag_size * (bio-bi_iter.bi_size / bi-sector_size);
-}
-EXPORT_SYMBOL(bio_integrity_tag_size);
-
-static int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len,
-int set)
-{
-   struct bio_integrity_payload *bip = bio_integrity(bio);
-   struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
-   unsigned int nr_sectors;
-
-   BUG_ON(bip-bip_buf == NULL);
-
-   if (bi-tag_size == 0)
-   return -1;
-
-   nr_sectors = bio_integrity_hw_sectors(bi,
-   DIV_ROUND_UP(len, bi-tag_size));
-
-   if (nr_sectors * bi-tuple_size  bip-bip_iter.bi_size) {
-   printk(KERN_ERR %s: tag too big for bio: %u  %u\n, __func__,
-  nr_sectors * bi-tuple_size, bip-bip_iter.bi_size);
-   return -1;
-   }
-
-   if (set)
-   bi-set_tag_fn(bip-bip_buf, tag_buf, nr_sectors);
-   else
-   bi-get_tag_fn(bip-bip_buf, tag_buf, nr_sectors);
-
-

[PATCH 06/14] block: Make protection interval calculation generic

2014-07-25 Thread Martin K. Petersen
Now that the protection interval has been detached from the sector size
we need to be able to handle sizes that are different from 4K and
512. Make the interval calculation generic.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 block/bio-integrity.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 8c03c78281e9..320ce7df1a5f 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -203,11 +203,7 @@ EXPORT_SYMBOL(bio_integrity_enabled);
 static inline unsigned int bio_integrity_intervals(struct blk_integrity *bi,
   unsigned int sectors)
 {
-   /* At this point there are only 512b or 4096b DIF/EPP devices */
-   if (bi-interval == 4096)
-   return sectors = 3;
-
-   return sectors;
+   return sectors  (ilog2(bi-interval) - 9);
 }
 
 static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/14] block: Don't merge requests if integrity flags differ

2014-07-25 Thread Martin K. Petersen
We'd occasionally merge requests with conflicting integrity flags.
Introduce a merge helper which checks that the requests have compatible
integrity payloads.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Christoph Hellwig h...@lst.de
Reviewed-by: Sagi Grimberg sa...@mellanox.com
---
 block/blk-integrity.c  | 36 ++--
 block/blk-merge.c  |  6 +++---
 include/linux/blkdev.h | 20 ++--
 3 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 1c6ba442cd91..79ffb4855af0 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -186,37 +186,53 @@ int blk_integrity_compare(struct gendisk *gd1, struct 
gendisk *gd2)
 }
 EXPORT_SYMBOL(blk_integrity_compare);
 
-int blk_integrity_merge_rq(struct request_queue *q, struct request *req,
-  struct request *next)
+bool blk_integrity_merge_rq(struct request_queue *q, struct request *req,
+   struct request *next)
 {
-   if (blk_integrity_rq(req) != blk_integrity_rq(next))
-   return -1;
+   if (blk_integrity_rq(req) == 0  blk_integrity_rq(next) == 0)
+   return true;
+
+   if (blk_integrity_rq(req) == 0 || blk_integrity_rq(next) == 0)
+   return false;
+
+   if (bio_integrity(req-bio)-bip_flags !=
+   bio_integrity(next-bio)-bip_flags)
+   return false;
 
if (req-nr_integrity_segments + next-nr_integrity_segments 
q-limits.max_integrity_segments)
-   return -1;
+   return false;
 
-   return 0;
+   return true;
 }
 EXPORT_SYMBOL(blk_integrity_merge_rq);
 
-int blk_integrity_merge_bio(struct request_queue *q, struct request *req,
-   struct bio *bio)
+bool blk_integrity_merge_bio(struct request_queue *q, struct request *req,
+struct bio *bio)
 {
int nr_integrity_segs;
struct bio *next = bio-bi_next;
 
+   if (blk_integrity_rq(req) == 0  bio_integrity(bio) == NULL)
+   return true;
+
+   if (blk_integrity_rq(req) == 0 || bio_integrity(bio) == NULL)
+   return false;
+
+   if (bio_integrity(req-bio)-bip_flags != bio_integrity(bio)-bip_flags)
+   return false;
+
bio-bi_next = NULL;
nr_integrity_segs = blk_rq_count_integrity_sg(q, bio);
bio-bi_next = next;
 
if (req-nr_integrity_segments + nr_integrity_segs 
q-limits.max_integrity_segments)
-   return -1;
+   return false;
 
req-nr_integrity_segments += nr_integrity_segs;
 
-   return 0;
+   return true;
 }
 EXPORT_SYMBOL(blk_integrity_merge_bio);
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 54535831f1e1..20dcc7d0aa8f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -308,7 +308,7 @@ static inline int ll_new_hw_segment(struct request_queue *q,
if (req-nr_phys_segments + nr_phys_segs  queue_max_segments(q))
goto no_merge;
 
-   if (bio_integrity(bio)  blk_integrity_merge_bio(q, req, bio))
+   if (blk_integrity_merge_bio(q, req, bio) == false)
goto no_merge;
 
/*
@@ -405,7 +405,7 @@ static int ll_merge_requests_fn(struct request_queue *q, 
struct request *req,
if (total_phys_segments  queue_max_segments(q))
return 0;
 
-   if (blk_integrity_rq(req)  blk_integrity_merge_rq(q, req, next))
+   if (blk_integrity_merge_rq(q, req, next) == false)
return 0;
 
/* Merge is OK... */
@@ -585,7 +585,7 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
return false;
 
/* only merge integrity protected bio into ditto rq */
-   if (bio_integrity(bio) != blk_integrity_rq(rq))
+   if (blk_integrity_merge_bio(rq-q, rq, bio) == false)
return false;
 
/* must be using the same buffer */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 68570d6cfec9..d2a0ba9a0487 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1501,10 +1501,10 @@ extern int blk_integrity_compare(struct gendisk *, 
struct gendisk *);
 extern int blk_rq_map_integrity_sg(struct request_queue *, struct bio *,
   struct scatterlist *);
 extern int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
-extern int blk_integrity_merge_rq(struct request_queue *, struct request *,
- struct request *);
-extern int blk_integrity_merge_bio(struct request_queue *, struct request *,
-  struct bio *);
+extern bool blk_integrity_merge_rq(struct request_queue *, struct request *,
+  struct request *);
+extern bool blk_integrity_merge_bio(struct request_queue *, struct request *,
+   

[PATCH 01/14] block: Get rid of bdev_integrity_enabled()

2014-07-25 Thread Martin K. Petersen
bdev_integrity_enabled() is only used by bio_integrity_enabled().
Combine these two functions.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Christoph Hellwig h...@lst.de
Reviewed-by: Sagi Grimberg sa...@mellanox.com
---
 Documentation/block/data-integrity.txt | 10 -
 block/bio-integrity.c  | 39 +++---
 include/linux/bio.h|  6 +++---
 3 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/Documentation/block/data-integrity.txt 
b/Documentation/block/data-integrity.txt
index 2d735b0ae383..b4eacf48053c 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -192,16 +192,6 @@ will require extra work due to the application tag.
 supported by the block device.
 
 
-int bdev_integrity_enabled(block_device, int rw);
-
-  bdev_integrity_enabled() will return 1 if the block device
-  supports integrity metadata transfer for the data direction
-  specified in 'rw'.
-
-  bdev_integrity_enabled() honors the write_generate and
-  read_verify flags in sysfs and will respond accordingly.
-
-
 int bio_integrity_prep(bio);
 
   To generate IMD for WRITE and to set up buffers for READ, the
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 9e241063a616..6818c251e937 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -153,24 +153,6 @@ int bio_integrity_add_page(struct bio *bio, struct page 
*page,
 }
 EXPORT_SYMBOL(bio_integrity_add_page);
 
-static int bdev_integrity_enabled(struct block_device *bdev, int rw)
-{
-   struct blk_integrity *bi = bdev_get_integrity(bdev);
-
-   if (bi == NULL)
-   return 0;
-
-   if (rw == READ  bi-verify_fn != NULL 
-   (bi-flags  INTEGRITY_FLAG_READ))
-   return 1;
-
-   if (rw == WRITE  bi-generate_fn != NULL 
-   (bi-flags  INTEGRITY_FLAG_WRITE))
-   return 1;
-
-   return 0;
-}
-
 /**
  * bio_integrity_enabled - Check whether integrity can be passed
  * @bio:   bio to check
@@ -180,16 +162,29 @@ static int bdev_integrity_enabled(struct block_device 
*bdev, int rw)
  * set prior to calling.  The functions honors the write_generate and
  * read_verify flags in sysfs.
  */
-int bio_integrity_enabled(struct bio *bio)
+bool bio_integrity_enabled(struct bio *bio)
 {
+   struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
+
if (!bio_is_rw(bio))
-   return 0;
+   return false;
 
/* Already protected? */
if (bio_integrity(bio))
-   return 0;
+   return false;
+
+   if (bi == NULL)
+   return false;
+
+   if (bio_data_dir(bio) == READ  bi-verify_fn != NULL 
+   (bi-flags  INTEGRITY_FLAG_READ))
+   return true;
+
+   if (bio_data_dir(bio) == WRITE  bi-generate_fn != NULL 
+   (bi-flags  INTEGRITY_FLAG_WRITE))
+   return true;
 
-   return bdev_integrity_enabled(bio-bi_bdev, bio_data_dir(bio));
+   return false;
 }
 EXPORT_SYMBOL(bio_integrity_enabled);
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d2633ee099d9..f11dd9efd3d9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -665,7 +665,7 @@ struct biovec_slab {
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, 
unsigned int);
 extern void bio_integrity_free(struct bio *);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, 
unsigned int);
-extern int bio_integrity_enabled(struct bio *bio);
+extern bool bio_integrity_enabled(struct bio *bio);
 extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
 extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
 extern int bio_integrity_prep(struct bio *);
@@ -684,9 +684,9 @@ static inline int bio_integrity(struct bio *bio)
return 0;
 }
 
-static inline int bio_integrity_enabled(struct bio *bio)
+static inline bool bio_integrity_enabled(struct bio *bio)
 {
-   return 0;
+   return false;
 }
 
 static inline int bioset_integrity_create(struct bio_set *bs, int pool_size)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/14] block: Clean up the code used to generate and verify integrity metadata

2014-07-25 Thread Martin K. Petersen
Instead of the operate parameter we pass in a seed value and a pointer
to a function that can be used to process the integrity metadata. The
generation function is changed to have a return value to fit into this
scheme.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Sagi Grimberg sa...@mellanox.com
---
 block/bio-integrity.c  |  82 ++
 drivers/scsi/sd_dif.c  | 106 ++---
 include/linux/bio.h|  12 ++
 include/linux/blkdev.h |   9 ++---
 4 files changed, 94 insertions(+), 115 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 320ce7df1a5f..c4dd7f6d4f1d 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -213,49 +213,37 @@ static inline unsigned int bio_integrity_bytes(struct 
blk_integrity *bi,
 }
 
 /**
- * bio_integrity_generate_verify - Generate/verify integrity metadata for a bio
+ * bio_integrity_process - Process integrity metadata for a bio
  * @bio:   bio to generate/verify integrity metadata for
- * @operate:   operate number, 1 for generate, 0 for verify
+ * @proc_fn:   Pointer to the relevant processing function
  */
-static int bio_integrity_generate_verify(struct bio *bio, int operate)
+static int bio_integrity_process(struct bio *bio,
+integrity_processing_fn *proc_fn)
 {
struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
-   struct blk_integrity_exchg bix;
+   struct blk_integrity_iter iter;
struct bio_vec *bv;
struct bio_integrity_payload *bip = bio_integrity(bio);
-   sector_t seed;
-   unsigned int intervals, ret = 0, i;
+   unsigned int i, ret = 0;
void *prot_buf = page_address(bip-bip_vec-bv_page) +
bip-bip_vec-bv_offset;
 
-   if (operate)
-   seed = bio-bi_iter.bi_sector;
-   else
-   seed = bip-bip_iter.bi_sector;
-
-   bix.disk_name = bio-bi_bdev-bd_disk-disk_name;
-   bix.interval = bi-interval;
+   iter.disk_name = bio-bi_bdev-bd_disk-disk_name;
+   iter.interval = bi-interval;
+   iter.seed = bip_get_seed(bip);
+   iter.prot_buf = prot_buf;
 
bio_for_each_segment_all(bv, bio, i) {
void *kaddr = kmap_atomic(bv-bv_page);
-   bix.data_buf = kaddr + bv-bv_offset;
-   bix.data_size = bv-bv_len;
-   bix.prot_buf = prot_buf;
-   bix.seed = seed;
-
-   if (operate)
-   bi-generate_fn(bix);
-   else {
-   ret = bi-verify_fn(bix);
-   if (ret) {
-   kunmap_atomic(kaddr);
-   return ret;
-   }
-   }
 
-   intervals = bv-bv_len / bi-interval;
-   seed += intervals;
-   prot_buf += intervals * bi-tuple_size;
+   iter.data_buf = kaddr + bv-bv_offset;
+   iter.data_size = bv-bv_len;
+
+   ret = proc_fn(iter);
+   if (ret) {
+   kunmap_atomic(kaddr);
+   return ret;
+   }
 
kunmap_atomic(kaddr);
}
@@ -263,20 +251,6 @@ static int bio_integrity_generate_verify(struct bio *bio, 
int operate)
 }
 
 /**
- * bio_integrity_generate - Generate integrity metadata for a bio
- * @bio:   bio to generate integrity metadata for
- *
- * Description: Generates integrity metadata for a bio by calling the
- * block device's generation callback function.  The bio must have a
- * bip attached with enough room to accommodate the generated
- * integrity metadata.
- */
-static void bio_integrity_generate(struct bio *bio)
-{
-   bio_integrity_generate_verify(bio, 1);
-}
-
-/**
  * bio_integrity_prep - Prepare bio for integrity I/O
  * @bio:   bio to prepare
  *
@@ -327,7 +301,7 @@ int bio_integrity_prep(struct bio *bio)
 
bip-bip_owns_buf = 1;
bip-bip_iter.bi_size = len;
-   bip-bip_iter.bi_sector = bio-bi_iter.bi_sector;
+   bip_set_seed(bip, bio-bi_iter.bi_sector);
 
/* Map it */
offset = offset_in_page(buf);
@@ -363,26 +337,13 @@ int bio_integrity_prep(struct bio *bio)
 
/* Auto-generate integrity metadata if this is a write */
if (bio_data_dir(bio) == WRITE)
-   bio_integrity_generate(bio);
+   bio_integrity_process(bio, bi-generate_fn);
 
return 0;
 }
 EXPORT_SYMBOL(bio_integrity_prep);
 
 /**
- * bio_integrity_verify - Verify integrity metadata for a bio
- * @bio:   bio to verify
- *
- * Description: This function is called to verify the integrity of a
- * bio. The data in the bio io_vec is compared to the integrity
- * metadata returned by the HBA.
- */
-static int bio_integrity_verify(struct bio *bio)
-{
-   return bio_integrity_generate_verify(bio, 0);
-}
-
-/**
  * 

[PATCH 10/14] block: Relocate bio integrity flags

2014-07-25 Thread Martin K. Petersen
Move flags affecting the integrity code out of the bio bi_flags and into
the block integrity payload.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Sagi Grimberg sa...@mellanox.com
---
 block/bio-integrity.c | 4 ++--
 drivers/scsi/sd_dif.c | 4 ++--
 include/linux/bio.h   | 9 -
 include/linux/blk_types.h | 6 ++
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3a54e695b445..63460cc34abe 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -98,7 +98,7 @@ void bio_integrity_free(struct bio *bio)
struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_set *bs = bio-bi_pool;
 
-   if (bip-bip_owns_buf)
+   if (bip-bip_flags  BIP_BLOCK_INTEGRITY)
kfree(page_address(bip-bip_vec-bv_page) +
  bip-bip_vec-bv_offset);
 
@@ -299,7 +299,7 @@ int bio_integrity_prep(struct bio *bio)
return -EIO;
}
 
-   bip-bip_owns_buf = 1;
+   bip-bip_flags |= BIP_BLOCK_INTEGRITY;
bip-bip_iter.bi_size = len;
bip_set_seed(bip, bio-bi_iter.bi_sector);
 
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 1e971c6f8c2b..4ce636fdc15f 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -326,7 +326,7 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
unsigned int j;
 
/* Already remapped? */
-   if (bio_flagged(bio, BIO_MAPPED_INTEGRITY))
+   if (bip-bip_flags  BIP_MAPPED_INTEGRITY)
break;
 
virt = bip_get_seed(bip)  0x;
@@ -347,7 +347,7 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
kunmap_atomic(sdt);
}
 
-   bio-bi_flags |= (1  BIO_MAPPED_INTEGRITY);
+   bip-bip_flags |= BIP_MAPPED_INTEGRITY;
}
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b43f8acc845e..2657425629a8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -314,7 +314,7 @@ struct bio_integrity_payload {
 
unsigned short  bip_slab;   /* slab the bip came from */
unsigned short  bip_vcnt;   /* # of integrity bio_vecs */
-   unsignedbip_owns_buf:1; /* should free bip_buf */
+   unsigned short  bip_flags;  /* control flags */
 
struct work_struct  bip_work;   /* I/O completion */
 
@@ -322,6 +322,13 @@ struct bio_integrity_payload {
struct bio_vec  bip_inline_vecs[0];/* embedded bvec array */
 };
 
+enum bip_flags {
+   BIP_BLOCK_INTEGRITY = 1  0, /* block layer owns integrity data */
+   BIP_MAPPED_INTEGRITY= 1  1, /* ref tag has been remapped */
+   BIP_CTRL_NOCHECK= 1  2, /* disable HBA integrity checking */
+   BIP_DISK_NOCHECK= 1  3, /* disable disk integrity checking */
+};
+
 static inline sector_t bip_get_seed(struct bio_integrity_payload *bip)
 {
return bip-bip_iter.bi_sector;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0b7084f7c136..c96939d488ed 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -120,10 +120,8 @@ struct bio {
 #define BIO_USER_MAPPED 6  /* contains user pages */
 #define BIO_EOPNOTSUPP 7   /* not supported */
 #define BIO_NULL_MAPPED 8  /* contains invalid user pages */
-#define BIO_FS_INTEGRITY 9 /* fs owns integrity data, not block layer */
-#define BIO_QUIET  10  /* Make BIO Quiet */
-#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
-#define BIO_SNAP_STABLE12  /* bio data must be snapshotted during 
write */
+#define BIO_QUIET  9   /* Make BIO Quiet */
+#define BIO_SNAP_STABLE10  /* bio data must be snapshotted during 
write */
 
 /*
  * Flags starting here get preserved by bio_reset() - this includes
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/14] block: Remove bip_buf

2014-07-25 Thread Martin K. Petersen
bip_buf is not really needed so we can remove it.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Christoph Hellwig h...@lst.de
Reviewed-by: Sagi Grimberg sa...@mellanox.com
---
 block/bio-integrity.c | 10 ++
 include/linux/bio.h   |  3 ---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 2c4bbc0aaa93..cea93a9553d6 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -99,7 +99,8 @@ void bio_integrity_free(struct bio *bio)
struct bio_set *bs = bio-bi_pool;
 
if (bip-bip_owns_buf)
-   kfree(bip-bip_buf);
+   kfree(page_address(bip-bip_vec-bv_page) +
+ bip-bip_vec-bv_offset);
 
if (bs) {
if (bip-bip_slab != BIO_POOL_NONE)
@@ -225,14 +226,16 @@ static int bio_integrity_generate_verify(struct bio *bio, 
int operate)
struct blk_integrity *bi = bdev_get_integrity(bio-bi_bdev);
struct blk_integrity_exchg bix;
struct bio_vec *bv;
+   struct bio_integrity_payload *bip = bio_integrity(bio);
sector_t sector;
unsigned int sectors, ret = 0, i;
-   void *prot_buf = bio_integrity(bio)-bip_buf;
+   void *prot_buf = page_address(bip-bip_vec-bv_page) +
+   bip-bip_vec-bv_offset;
 
if (operate)
sector = bio-bi_iter.bi_sector;
else
-   sector = bio_integrity(bio)-bip_iter.bi_sector;
+   sector = bip-bip_iter.bi_sector;
 
bix.disk_name = bio-bi_bdev-bd_disk-disk_name;
bix.sector_size = bi-sector_size;
@@ -327,7 +330,6 @@ int bio_integrity_prep(struct bio *bio)
}
 
bip-bip_owns_buf = 1;
-   bip-bip_buf = buf;
bip-bip_iter.bi_size = len;
bip-bip_iter.bi_sector = bio-bi_iter.bi_sector;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index f8b09d7020ef..38bcec55c4c6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -310,9 +310,6 @@ struct bio_integrity_payload {
 
struct bvec_iterbip_iter;
 
-   /* kill - should just use bip_vec */
-   void*bip_buf;   /* generated integrity data */
-
bio_end_io_t*bip_end_io;/* saved I/O completion fn */
 
unsigned short  bip_slab;   /* slab the bip came from */
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/14] block: Integrity checksum flag

2014-07-25 Thread Martin K. Petersen
Make the choice of checksum a per-I/O property by introducing a flag
that can be inspected by the SCSI layer. There are several reasons for
this:

 1. It allows us to switch choice of checksum without unloading and
reloading the HBA driver.

 2. During error recovery we need to be able to tell the HBA that
checksums read from disk should not be verified and converted to IP
checksums.

 3. For error injection purposes we need to be able to write a bad guard
tag to storage. Since the storage device only supports T10 CRC we
need to be able to disable IP checksum conversion on the HBA.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 block/bio-integrity.c  | 3 +++
 drivers/scsi/sd_dif.c  | 6 --
 include/linux/bio.h| 1 +
 include/linux/blkdev.h | 1 +
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 63460cc34abe..31795b05339e 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -303,6 +303,9 @@ int bio_integrity_prep(struct bio *bio)
bip-bip_iter.bi_size = len;
bip_set_seed(bip, bio-bi_iter.bi_sector);
 
+   if (bi-flags  BLK_INTEGRITY_IP_CHECKSUM)
+   bip-bip_flags |= BIP_IP_CHECKSUM;
+
/* Map it */
offset = offset_in_page(buf);
for (i = 0 ; i  nr_pages ; i++) {
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 4ce636fdc15f..2198abee619e 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -255,12 +255,14 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
return;
 
/* Enable DMA of protection information */
-   if (scsi_host_get_guard(sdkp-device-host)  SHOST_DIX_GUARD_IP)
+   if (scsi_host_get_guard(sdkp-device-host)  SHOST_DIX_GUARD_IP) {
if (type == SD_DIF_TYPE3_PROTECTION)
blk_integrity_register(disk, dif_type3_integrity_ip);
else
blk_integrity_register(disk, dif_type1_integrity_ip);
-   else
+
+   disk-integrity-flags |= BLK_INTEGRITY_IP_CHECKSUM;
+   } else
if (type == SD_DIF_TYPE3_PROTECTION)
blk_integrity_register(disk, dif_type3_integrity_crc);
else
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2657425629a8..a60195e07cff 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,6 +327,7 @@ enum bip_flags {
BIP_MAPPED_INTEGRITY= 1  1, /* ref tag has been remapped */
BIP_CTRL_NOCHECK= 1  2, /* disable HBA integrity checking */
BIP_DISK_NOCHECK= 1  3, /* disable disk integrity checking */
+   BIP_IP_CHECKSUM = 1  4, /* IP checksum */
 };
 
 static inline sector_t bip_get_seed(struct bio_integrity_payload *bip)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 57ac257c6872..68570d6cfec9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1466,6 +1466,7 @@ enum blk_integrity_flags {
BLK_INTEGRITY_VERIFY= 1  0,
BLK_INTEGRITY_GENERATE  = 1  1,
BLK_INTEGRITY_DEVICE_CAPABLE= 1  2,
+   BLK_INTEGRITY_IP_CHECKSUM   = 1  3,
 };
 
 struct blk_integrity_iter {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/14] block: Add T10 Protection Information functions

2014-07-25 Thread Martin K. Petersen
The T10 Protection Information format is also used by some devices that
do not go through the SCSI layer (virtual block devices, NVMe). Relocate
the relevant functions to a block layer library that can be used without
involving SCSI.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Christoph Hellwig h...@lst.de
---
 block/Kconfig  |   1 +
 block/Makefile |   4 +-
 block/t10-pi.c | 164 ++
 drivers/scsi/Kconfig   |   1 -
 drivers/scsi/sd_dif.c  | 241 -
 include/linux/crc-t10dif.h |   5 +-
 include/linux/t10-pi.h |  60 +++
 7 files changed, 250 insertions(+), 226 deletions(-)
 create mode 100644 block/t10-pi.c
 create mode 100644 include/linux/t10-pi.h

diff --git a/block/Kconfig b/block/Kconfig
index 2429515c05c2..1427106f30c9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -77,6 +77,7 @@ config BLK_DEV_BSGLIB
 
 config BLK_DEV_INTEGRITY
bool Block layer data integrity support
+   select CRC_T10_PI if BLK_DEV_INTEGRITY
---help---
Some storage devices allow extra information to be
stored/retrieved to help protect the data.  The block layer
diff --git a/block/Makefile b/block/Makefile
index a2ce6ac935ec..00ecc97629db 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -20,6 +20,6 @@ obj-$(CONFIG_IOSCHED_DEADLINE)+= deadline-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)  += cfq-iosched.o
 
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
-obj-$(CONFIG_BLK_DEV_INTEGRITY)+= blk-integrity.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
-obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o
+obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+
diff --git a/block/t10-pi.c b/block/t10-pi.c
new file mode 100644
index ..f45da5d55bd8
--- /dev/null
+++ b/block/t10-pi.c
@@ -0,0 +1,164 @@
+/*
+ * t10_pi.c - Functions for generating and verifying T10 Protection
+ *   Information.
+ *
+ * Copyright (C) 2007, 2008, 2014 Oracle Corporation
+ * Written by: Martin K. Petersen martin.peter...@oracle.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include linux/t10-pi.h
+#include linux/blkdev.h
+#include linux/crc-t10dif.h
+#include net/checksum.h
+
+typedef __u16 (csum_fn) (void *, unsigned int);
+
+static __u16 t10_pi_crc_fn(void *data, unsigned int len)
+{
+   return cpu_to_be16(crc_t10dif(data, len));
+}
+
+static __u16 t10_pi_ip_fn(void *data, unsigned int len)
+{
+   return ip_compute_csum(data, len);
+}
+
+/*
+ * Type 1 and Type 2 protection use the same format: 16 bit guard tag,
+ * 16 bit app tag, 32 bit reference tag. Type 3 does not define the ref
+ * tag.
+ */
+static int t10_pi_generate(struct blk_integrity_iter *iter, csum_fn *fn,
+  unsigned int type)
+{
+   unsigned int i;
+
+   for (i = 0 ; i  iter-data_size ; i += iter-interval) {
+   struct t10_pi_tuple *pi = iter-prot_buf;
+
+   pi-guard_tag = fn(iter-data_buf, iter-interval);
+   pi-app_tag = 0;
+
+   if (type == 1)
+   pi-ref_tag = cpu_to_be32(iter-seed  0x);
+   else
+   pi-ref_tag = 0;
+
+   iter-data_buf += iter-interval;
+   iter-prot_buf += sizeof(struct t10_pi_tuple);
+   iter-seed++;
+   }
+
+   return 0;
+}
+
+static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
+   unsigned int type)
+{
+   unsigned int i;
+
+   for (i = 0 ; i  iter-data_size ; i += iter-interval) {
+   struct t10_pi_tuple *pi = iter-prot_buf;
+   __u16 csum;
+
+   switch (type) {
+   case 1:
+   case 2:
+   if (pi-app_tag == 0x)
+   goto next;
+
+   if (be32_to_cpu(pi-ref_tag) !=
+   (iter-seed  0x)) {
+   pr_err(%s: ref tag error at location %lu  \
+  (rcvd %u)\n, iter-disk_name,
+  iter-seed, be32_to_cpu(pi-ref_tag));
+   return -EILSEQ;
+   }
+

[PATCH 14/14] sd: Honor block layer integrity handling flags

2014-07-25 Thread Martin K. Petersen
A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.

Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 drivers/scsi/sd.c| 73 
 drivers/scsi/sd.h| 66 +--
 drivers/scsi/sd_dif.c| 23 ++-
 include/scsi/scsi_cmnd.h | 36 +---
 4 files changed, 142 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3663e38ba4df..f068a22ef680 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -610,29 +610,44 @@ static void scsi_disk_put(struct scsi_disk *sdkp)
mutex_unlock(sd_ref_mutex);
 }
 
-static void sd_prot_op(struct scsi_cmnd *scmd, unsigned int dif)
-{
-   unsigned int prot_op = SCSI_PROT_NORMAL;
-   unsigned int dix = scsi_prot_sg_count(scmd);
-
-   if (scmd-sc_data_direction == DMA_FROM_DEVICE) {
-   if (dif  dix)
-   prot_op = SCSI_PROT_READ_PASS;
-   else if (dif  !dix)
-   prot_op = SCSI_PROT_READ_STRIP;
-   else if (!dif  dix)
-   prot_op = SCSI_PROT_READ_INSERT;
-   } else {
-   if (dif  dix)
-   prot_op = SCSI_PROT_WRITE_PASS;
-   else if (dif  !dix)
-   prot_op = SCSI_PROT_WRITE_INSERT;
-   else if (!dif  dix)
-   prot_op = SCSI_PROT_WRITE_STRIP;
+
+
+static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
+  unsigned int dix, unsigned int dif)
+{
+   struct bio_integrity_payload *bip = bio_integrity(scmd-request-bio);
+   unsigned int prot_op = sd_prot_op(rq_data_dir(scmd-request), dix, dif);
+   unsigned int protect = 0;
+
+   if (dix) {  /* DIX Type 0, 1, 2, 3 */
+   if (bip-bip_flags  BIP_IP_CHECKSUM)
+   scmd-prot_flags |= SCSI_PROT_IP_CHECKSUM;
+
+   if ((bip-bip_flags  BIP_CTRL_NOCHECK) == 0)
+   scmd-prot_flags |= SCSI_PROT_GUARD_CHECK;
+   }
+
+   if (dif != SD_DIF_TYPE3_PROTECTION) {   /* DIX/DIF Type 0, 1, 2 */
+   scmd-prot_flags |= SCSI_PROT_REF_INCREMENT;
+
+   if ((bip  bip-bip_flags  BIP_CTRL_NOCHECK) == 0)
+   scmd-prot_flags |= SCSI_PROT_REF_CHECK;
+   }
+
+   if (dif) {  /* DIX/DIF Type 1, 2, 3 */
+   scmd-prot_flags |= SCSI_PROT_TRANSFER_PI;
+
+   if (bip  bip-bip_flags  BIP_DISK_NOCHECK)
+   protect = 3  5;   /* Disable target PI checking */
+   else
+   protect = 1  5;   /* Enable target PI checking */
}
 
scsi_set_prot_op(scmd, prot_op);
scsi_set_prot_type(scmd, dif);
+   scmd-prot_flags = sd_prot_flag_mask(prot_op);
+
+   return protect;
 }
 
 static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
@@ -893,7 +908,8 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
sector_t block = blk_rq_pos(rq);
sector_t threshold;
unsigned int this_count = blk_rq_sectors(rq);
-   int ret, host_dif;
+   unsigned int dif, dix;
+   int ret;
unsigned char protect;
 
ret = scsi_init_io(SCpnt, GFP_ATOMIC);
@@ -998,7 +1014,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd 
*SCpnt)
SCpnt-cmnd[0] = WRITE_6;
 
if (blk_integrity_rq(rq))
-   sd_dif_prepare(rq, block, sdp-sector_size);
+   sd_dif_prepare(SCpnt);
 
} else if (rq_data_dir(rq) == READ) {
SCpnt-cmnd[0] = READ_6;
@@ -1013,14 +1029,15 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd 
*SCpnt)
writing : reading, this_count,
blk_rq_sectors(rq)));
 
-   /* Set RDPROTECT/WRPROTECT if disk is formatted with DIF */
-   host_dif = scsi_host_dif_capable(sdp-host, sdkp-protection_type);
-   if (host_dif)
-   protect = 1  5;
+   dix = scsi_prot_sg_count(SCpnt);
+   dif = scsi_host_dif_capable(SCpnt-device-host, sdkp-protection_type);
+
+   if (dif || dix)
+   protect = sd_setup_protect_cmnd(SCpnt, dix, dif);
else
protect = 0;
 
-   if (host_dif == SD_DIF_TYPE2_PROTECTION) {
+   if (protect  sdkp-protection_type == SD_DIF_TYPE2_PROTECTION) {
SCpnt-cmnd = 

[PATCH 05/14] block: Deprecate the use of the term sector in the context of block integrity

2014-07-25 Thread Martin K. Petersen
The protection interval is not necessarily tied to the logical block
size of a block device. Stop using the terms sector and sectors.

Going forward we will use the term seed to describe the initial
reference tag value for a given I/O. Interval will be used to describe
the portion of the data buffer that a given piece of protection
information is associated with.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 block/bio-integrity.c  | 42 +-
 block/blk-integrity.c  | 10 +-
 drivers/scsi/sd_dif.c  | 46 +++---
 include/linux/blkdev.h |  6 +++---
 4 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index cea93a9553d6..8c03c78281e9 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -191,20 +191,20 @@ bool bio_integrity_enabled(struct bio *bio)
 EXPORT_SYMBOL(bio_integrity_enabled);
 
 /**
- * bio_integrity_hw_sectors - Convert 512b sectors to hardware ditto
+ * bio_integrity_intervals - Return number of integrity intervals for a bio
  * @bi:blk_integrity profile for device
- * @sectors:   Number of 512 sectors to convert
+ * @sectors:   Size of the bio in 512-byte sectors
  *
  * Description: The block layer calculates everything in 512 byte
- * sectors but integrity metadata is done in terms of the hardware
- * sector size of the storage device.  Convert the block layer sectors
- * to physical sectors.
+ * sectors but integrity metadata is done in terms of the data integrity
+ * interval size of the storage device.  Convert the block layer sectors
+ * to the appropriate number of integrity intervals.
  */
-static inline unsigned int bio_integrity_hw_sectors(struct blk_integrity *bi,
-   unsigned int sectors)
+static inline unsigned int bio_integrity_intervals(struct blk_integrity *bi,
+  unsigned int sectors)
 {
/* At this point there are only 512b or 4096b DIF/EPP devices */
-   if (bi-sector_size == 4096)
+   if (bi-interval == 4096)
return sectors = 3;
 
return sectors;
@@ -213,7 +213,7 @@ static inline unsigned int bio_integrity_hw_sectors(struct 
blk_integrity *bi,
 static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
   unsigned int sectors)
 {
-   return bio_integrity_hw_sectors(bi, sectors) * bi-tuple_size;
+   return bio_integrity_intervals(bi, sectors) * bi-tuple_size;
 }
 
 /**
@@ -227,25 +227,25 @@ static int bio_integrity_generate_verify(struct bio *bio, 
int operate)
struct blk_integrity_exchg bix;
struct bio_vec *bv;
struct bio_integrity_payload *bip = bio_integrity(bio);
-   sector_t sector;
-   unsigned int sectors, ret = 0, i;
+   sector_t seed;
+   unsigned int intervals, ret = 0, i;
void *prot_buf = page_address(bip-bip_vec-bv_page) +
bip-bip_vec-bv_offset;
 
if (operate)
-   sector = bio-bi_iter.bi_sector;
+   seed = bio-bi_iter.bi_sector;
else
-   sector = bip-bip_iter.bi_sector;
+   seed = bip-bip_iter.bi_sector;
 
bix.disk_name = bio-bi_bdev-bd_disk-disk_name;
-   bix.sector_size = bi-sector_size;
+   bix.interval = bi-interval;
 
bio_for_each_segment_all(bv, bio, i) {
void *kaddr = kmap_atomic(bv-bv_page);
bix.data_buf = kaddr + bv-bv_offset;
bix.data_size = bv-bv_len;
bix.prot_buf = prot_buf;
-   bix.sector = sector;
+   bix.seed = seed;
 
if (operate)
bi-generate_fn(bix);
@@ -257,9 +257,9 @@ static int bio_integrity_generate_verify(struct bio *bio, 
int operate)
}
}
 
-   sectors = bv-bv_len / bi-sector_size;
-   sector += sectors;
-   prot_buf += sectors * bi-tuple_size;
+   intervals = bv-bv_len / bi-interval;
+   seed += intervals;
+   prot_buf += intervals * bi-tuple_size;
 
kunmap_atomic(kaddr);
}
@@ -300,17 +300,17 @@ int bio_integrity_prep(struct bio *bio)
unsigned long start, end;
unsigned int len, nr_pages;
unsigned int bytes, offset, i;
-   unsigned int sectors;
+   unsigned int intervals;
 
bi = bdev_get_integrity(bio-bi_bdev);
q = bdev_get_queue(bio-bi_bdev);
BUG_ON(bi == NULL);
BUG_ON(bio_integrity(bio));
 
-   sectors = bio_integrity_hw_sectors(bi, bio_sectors(bio));
+   intervals = bio_integrity_intervals(bi, bio_sectors(bio));
 
/* Allocate kernel buffer for protection data */
-   len = sectors * bi-tuple_size;
+   len = intervals * bi-tuple_size;
buf = kmalloc(len, 

[ANNOUNCE] scsi patch queue tree updated

2014-07-25 Thread Christoph Hellwig
I've pushed out new version of the for-3.17 core and drivers trees:

  git://git.infradead.org/users/hch/scsi-queue.git core-for-3.17
  git://git.infradead.org/users/hch/scsi-queue.git drivers-for-3.17

In the core tree the biggest update is the merge of the blk-mq
support, but various smaller updates also made it.  On the drivers
side I have only merged minor updates for the ufs and hyperv drivers.

We're fairly late in the cycle but I really would like to not miss the
scsi_debug and megaraid updates that have been out for a while but need
another review.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] Make scsi_transfer_length take a scsi_data_buffer argument

2014-07-25 Thread Christoph Hellwig
On Fri, Jul 25, 2014 at 04:00:19PM -0400, Martin K. Petersen wrote:
 For bidirectional commands we need to be able to distinguish between the
 in and out scsi_data_buffers when calculating the wire transfer length.
 Make scsi_transfer_length() take a scsi_data_buffer argument so the
 caller can choose which I/O direction the calculation should apply to.
 
 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com

Thanks, this looks good to me,

Reviewed-by: Christoph Hellwig h...@lst.de
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html