Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support

2014-06-10 Thread Paolo Bonzini

Il 10/06/2014 09:07, Nicholas A. Bellinger ha scritto:

 OK, finally went over this, looks good to me:

 Acked-by: Michael S. Tsirkin m...@redhat.com

 However, we really should stop making more changes
 before fixing ANY_LAYOUT in this driver.

 virtio 1.0 should be out soon and that makes ANY_LAYOUT
 a required feature.


What else is required to complete the ANY_LAYOUT conversion..?


Basically, you need to stop expecting the request and response headers 
to be in a separate iov, and also vhost-scsi should not expect pi to be 
in a separate iov than the main data.  A layout that has everything in 
the same iov would be fine, and similarly for a layout that has the CDB 
in a separate iov than the rest of the header.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-10 Thread Paolo Bonzini

Il 10/06/2014 10:04, Nicholas A. Bellinger ha scritto:


That said, there is one other small qla2xxx change to enable per-session
PI that is currently missing in Quinn's patch in scsi/for-next code.

Sooo, I'll go ahead and include Sagi's patches with the vhost-scsi
change below if there are no objections from MKP and/or driver
maintainers, and post an -rc2 series after the merge window closes to
address the two remaining qla2xxx specific items.


Indeed the patch looks like a tcm bugfix.  There is no reason to modify 
the LLD.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support

2014-06-09 Thread Paolo Bonzini

Il 08/06/2014 18:05, Michael S. Tsirkin ha scritto:


OK, finally went over this, looks good to me:

Acked-by: Michael S. Tsirkin m...@redhat.com

However, we really should stop making more changes
before fixing ANY_LAYOUT in this driver.

virtio 1.0 should be out soon and that makes ANY_LAYOUT
a required feature.


Agreed.  virtio-blk btw is getting ANY_LAYOUT support in QEMU 2.1, and 
virtio-scsi will follow suit.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: ibmvscsi: protect abort handler from done-scmd in flight

2014-06-05 Thread Paolo Bonzini

Il 05/06/2014 08:16, Liu Ping Fan ha scritto:

Take the following scene in guest:
seqA: scsi_done() - gapX (before taking REQ_ATOM_COMPLETE)
seqB: scmd_eh_abort_handler()- ...- ibmvscsi_eh_abort_handler()-
  ...-scsi_put_command(scmd)

If seqA is scheduled at gapX, and seqB reclaims scmd. Then when seqA
comes back, it tries to access the scmd when is turned back to mempool.

This patch fixes the race by ensuring when ibmvscsi_eh_abort_handler()
returns, no scsi_done is in flight

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
---
When trying to figure the scsi_cmnd in flight issue, I learned from Paolo 
(thanks).
He showed me the way how virtscsi resolves the race between abort-handler
and scsi_done in flight. And I think that this method is also needed by 
ibmvscsi.
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index fa76440..325cef6 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1828,16 +1828,19 @@ static void ibmvscsi_handle_crq(struct viosrp_crq *crq,

if ((crq-status != VIOSRP_OK  crq-status != VIOSRP_OK2)  
evt_struct-cmnd)
evt_struct-cmnd-result = DID_ERROR  16;
-   if (evt_struct-done)
-   evt_struct-done(evt_struct);
-   else
-   dev_err(hostdata-dev, returned done() is NULL; not running 
it!\n);

/*
 * Lock the host_lock before messing with these structures, since we
 * are running in a task context
+* Also, this lock helps ibmvscsi_eh_abort_handler() to shield the
+* scsi_done() in flight.
 */
spin_lock_irqsave(evt_struct-hostdata-host-host_lock, flags);
+   if (evt_struct-done)
+   evt_struct-done(evt_struct);
+   else
+   dev_err(hostdata-dev, returned done() is NULL; not running 
it!\n);
+
list_del(evt_struct-list);
free_event_struct(evt_struct-hostdata-pool, evt_struct);
spin_unlock_irqrestore(evt_struct-hostdata-host-host_lock, flags);



I think this is not necessary because ibmvscsi places TMFs and commands 
in the same queue; virtio-scsi instead uses two different queues.


So ibmvscsi_handle_crq processes all completed requests, which naturally 
serializes the processing of the TMF and the command.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/6] virtio-scsi patches for 3.16 + midlayer fix

2014-06-04 Thread Paolo Bonzini
As mentioned in the v1 submission, these patches were delayed a bit
by the discussion and testing of patch 5.  Debugging the problem also
led to the discovery of a midlayer bug fixed by patch 4.

Paolo

v1-v2: fix all occurrences of scmd-result |= DID_TIME_OUT  16
in patch 4

Ming Lei (2):
  virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
  virtio-scsi: replace target spinlock with seqcount

Paolo Bonzini (2):
  virtio-scsi: avoid cancelling uninitialized work items
  virtio-scsi: fix various bad behavior on aborted requests

Ulrich Obergfell (1):
  scsi_error: fix invalid setting of host byte

Venkatesh Srinivas (1):
  virtio-scsi: Implement change_queue_depth for virtscsi targets

 drivers/scsi/scsi_error.c  |   6 +-
 drivers/scsi/virtio_scsi.c | 148 +++--
 2 files changed, 93 insertions(+), 61 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/6] scsi_error: fix invalid setting of host byte

2014-06-04 Thread Paolo Bonzini
From: Ulrich Obergfell uober...@redhat.com

After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT).  When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.

Fix this by using set_host_byte instead, following the model of
commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31.

Cc: sta...@vger.kernel.org
Signed-off-by: Ulrich Obergfell uober...@redhat.com
[Fix all instances according to review comments. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
v1-v2: fix all occurrences [Bart]

 drivers/scsi/scsi_error.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f17aa7aa7879..8e7a0f8d4f52 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work)
aborting command %p\n, scmd));
rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd);
if (rtn == SUCCESS) {
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
if (scsi_host_eh_past_deadline(sdev-host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
@@ -167,7 +167,7 @@ scmd_eh_abort_handler(struct work_struct *work)
scmd_printk(KERN_WARNING, scmd,
scmd %p terminate 
aborted command\n, scmd));
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
scsi_finish_command(scmd);
}
 }
@@ -291,7 +291,7 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
if (scsi_abort_command(scmd) == SUCCESS)
return BLK_EH_NOT_HANDLED;
 
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
 
if (unlikely(rtn == BLK_EH_NOT_HANDLED 
 !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)))
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets

2014-06-04 Thread Paolo Bonzini
From: Venkatesh Srinivas venkate...@google.com

change_queue_depth allows changing per-target queue depth via sysfs.

It also allows the SCSI midlayer to ramp down the number of concurrent
inflight requests in response to a SCSI BUSY status response and allows
the midlayer to ramp the count back up to the device maximum when the
BUSY condition has resolved.

Signed-off-by: Venkatesh Srinivas venkate...@google.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fda9fb35..de0f8d9b3682 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include scsi/scsi_tcq.h
 #include linux/seqlock.h
 
 #define VIRTIO_SCSI_MEMPOOL_SZ 64
@@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return virtscsi_tmf(vscsi, cmd);
 }
 
+/**
+ * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth
+ * @sdev:  Virtscsi target whose queue depth to change
+ * @qdepth:New queue depth
+ * @reason:Reason for the queue depth change.
+ */
+static int virtscsi_change_queue_depth(struct scsi_device *sdev,
+  int qdepth,
+  int reason)
+{
+   struct Scsi_Host *shost = sdev-host;
+   int max_depth = shost-cmd_per_lun;
+
+   switch (reason) {
+   case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */
+   scsi_track_queue_full(sdev, qdepth);
+   break;
+   case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */
+   case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */
+   scsi_adjust_queue_depth(sdev,
+   scsi_get_tag_type(sdev),
+   min(max_depth, qdepth));
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return sdev-queue_depth;
+}
+
 static int virtscsi_abort(struct scsi_cmnd *sc)
 {
struct virtio_scsi *vscsi = shost_priv(sc-device-host);
@@ -683,6 +714,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_single,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
@@ -699,6 +731,7 @@ static struct scsi_host_template 
virtscsi_host_template_multi = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_multi,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/6] virtio-scsi: fix various bad behavior on aborted requests

2014-06-04 Thread Paolo Bonzini
Even though the virtio-scsi spec guarantees that all requests related
to the TMF will have been completed by the time the TMF itself completes,
the request queue's callback might not have run yet.  This causes requests
to be completed more than once, and as a result triggers a variety of
BUGs or oopses.

Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index d66c4ee2c774..fda9fb35 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
+static void virtscsi_poll_requests(struct virtio_scsi *vscsi)
+{
+   int i, num_vqs;
+
+   num_vqs = vscsi-num_queues;
+   for (i = 0; i  num_vqs; i++)
+   virtscsi_vq_done(vscsi, vscsi-req_vqs[i],
+virtscsi_complete_cmd);
+}
+
 static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 {
struct virtio_scsi_cmd *cmd = buf;
@@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct 
virtio_scsi_cmd *cmd)
cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
ret = SUCCESS;
 
+   /*
+* The spec guarantees that all requests related to the TMF have
+* been completed, but the callback might not have run yet if
+* we're using independent interrupts (e.g. MSI).  Poll the
+* virtqueues once.
+*
+* In the abort case, sc-scsi_done will do nothing, because
+* the block layer must have detected a timeout and as a result
+* REQ_ATOM_COMPLETE has been set.
+*/
+   virtscsi_poll_requests(vscsi);
+
 out:
mempool_free(cmd, virtscsi_cmd_pool);
return ret;
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/6] virtio-scsi: avoid cancelling uninitialized work items

2014-06-04 Thread Paolo Bonzini
Calling the workqueue interface on uninitialized work items isn't a
good idea even if they're zeroed. It's not failing catastrophically only
through happy accidents.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index f0b4cdbfceb0..d66c4ee2c774 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -253,6 +253,8 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
virtscsi_vq_done(vscsi, vscsi-ctrl_vq, virtscsi_complete_free);
 };
 
+static void virtscsi_handle_event(struct work_struct *work);
+
 static int virtscsi_kick_event(struct virtio_scsi *vscsi,
   struct virtio_scsi_event_node *event_node)
 {
@@ -260,6 +262,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
struct scatterlist sg;
unsigned long flags;
 
+   INIT_WORK(event_node-work, virtscsi_handle_event);
sg_init_one(sg, event_node-event, sizeof(struct virtio_scsi_event));
 
spin_lock_irqsave(vscsi-event_vq.vq_lock, flags);
@@ -377,7 +380,6 @@ static void virtscsi_complete_event(struct virtio_scsi 
*vscsi, void *buf)
 {
struct virtio_scsi_event_node *event_node = buf;
 
-   INIT_WORK(event_node-work, virtscsi_handle_event);
schedule_work(event_node-work);
 }
 
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-06-04 Thread Paolo Bonzini
From: Ming Lei tom.leim...@gmail.com

Access to tgt-req_vq is strictly serialized by spin_lock
of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary.

smp_read_barrier_depends() in virtscsi_req_done was introduced
to order reading req_vq and decreasing tgt-reqs, but it isn't
needed now because req_vq is read from
scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of
tgt-req_vq, so remove the unnecessary barrier.

Also remove related comment about the barrier.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei tom.leim...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 53 ++
 1 file changed, 6 insertions(+), 47 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index db3b494e5926..fc054935eb1f 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -73,17 +73,12 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * An interesting effect of this policy is that only writes to req_vq need to
- * take the tgt_lock.  Read can be done outside the lock because:
+ * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
+ * could be done locklessly, but we do not do it yet.
  *
- * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
- *   In that case, no other CPU is reading req_vq: even if they were in
- *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
- *
- * - reads of req_vq only occur when the target is not idle (reqs != 0).
- *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
- *
- * Similarly, decrements of reqs are never concurrent with writes of req_vq.
+ * Decrements of reqs are never concurrent with writes of req_vq: before the
+ * decrement reqs will be != 0; after the decrement the virtqueue completion
+ * routine will not use the req_vq so it can be changed by a new request.
  * Thus they can happen outside the tgt_lock, provided of course we make reqs
  * an atomic_t.
  */
@@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq)
int index = vq-index - VIRTIO_SCSI_VQ_BASE;
struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index];
 
-   /*
-* Read req_vq before decrementing the reqs field in
-* virtscsi_complete_cmd.
-*
-* With barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read req_vq
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*
-* Possible reordering without barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*  read (wrong) req_vq
-*
-* We do not need a full smp_rmb, because req_vq is required to get
-* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored
-* in the virtqueue as the user token.
-*/
-   smp_read_barrier_depends();
-
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
@@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
 
spin_lock_irqsave(tgt-tgt_lock, flags);
 
-   /*
-* The memory barrier after atomic_inc_return matches
-* the smp_read_barrier_depends() in virtscsi_req_done.
-*/
if (atomic_inc_return(tgt-reqs)  1)
-   vq = ACCESS_ONCE(tgt-req_vq);
+   vq = tgt-req_vq;
else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/6] virtio-scsi: replace target spinlock with seqcount

2014-06-04 Thread Paolo Bonzini
From: Ming Lei ming@canonical.com

The spinlock of tgt_lock is only for serializing read and write
req_vq, one lockless seqcount is enough for the purpose.

On one 16core VM with vhost-scsi backend, the patch can improve
IOPS with 3% on random read test.

Signed-off-by: Ming Lei ming@canonical.com
[Add initialization in virtscsi_target_alloc. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fc054935eb1f..f0b4cdbfceb0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include linux/seqlock.h
 
 #define VIRTIO_SCSI_MEMPOOL_SZ 64
 #define VIRTIO_SCSI_EVENT_LEN 8
@@ -73,18 +74,16 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
- * could be done locklessly, but we do not do it yet.
+ * tgt_seq is held to serialize reading and writing req_vq.
  *
  * Decrements of reqs are never concurrent with writes of req_vq: before the
  * decrement reqs will be != 0; after the decrement the virtqueue completion
  * routine will not use the req_vq so it can be changed by a new request.
- * Thus they can happen outside the tgt_lock, provided of course we make reqs
+ * Thus they can happen outside the tgt_seq, provided of course we make reqs
  * an atomic_t.
  */
 struct virtio_scsi_target_state {
-   /* This spinlock never held at the same time as vq_lock. */
-   spinlock_t tgt_lock;
+   seqcount_t tgt_seq;
 
/* Count of outstanding requests. */
atomic_t reqs;
@@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
unsigned long flags;
u32 queue_num;
 
-   spin_lock_irqsave(tgt-tgt_lock, flags);
+   local_irq_save(flags);
+   if (atomic_inc_return(tgt-reqs)  1) {
+   unsigned long seq;
+
+   do {
+   seq = read_seqcount_begin(tgt-tgt_seq);
+   vq = tgt-req_vq;
+   } while (read_seqcount_retry(tgt-tgt_seq, seq));
+   } else {
+   /* no writes can be concurrent because of atomic_t */
+   write_seqcount_begin(tgt-tgt_seq);
+
+   /* keep previous req_vq if there is reader found */
+   if (unlikely(atomic_read(tgt-reqs)  1)) {
+   vq = tgt-req_vq;
+   goto unlock;
+   }
 
-   if (atomic_inc_return(tgt-reqs)  1)
-   vq = tgt-req_vq;
-   else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
queue_num -= vscsi-num_queues;
-
tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+ unlock:
+   write_seqcount_end(tgt-tgt_seq);
}
+   local_irq_restore(flags);
 
-   spin_unlock_irqrestore(tgt-tgt_lock, flags);
return vq;
 }
 
@@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 
 static int virtscsi_target_alloc(struct scsi_target *starget)
 {
+   struct Scsi_Host *sh = dev_to_shost(starget-dev.parent);
+   struct virtio_scsi *vscsi = shost_priv(sh);
+
struct virtio_scsi_target_state *tgt =
kmalloc(sizeof(*tgt), GFP_KERNEL);
if (!tgt)
return -ENOMEM;
 
-   spin_lock_init(tgt-tgt_lock);
+   seqcount_init(tgt-tgt_seq);
atomic_set(tgt-reqs, 0);
-   tgt-req_vq = NULL;
+   tgt-req_vq = vscsi-req_vqs[0];
 
starget-hostdata = tgt;
return 0;
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/6] virtio-scsi patches for 3.16 + midlayer fix

2014-06-04 Thread Paolo Bonzini

Il 04/06/2014 13:21, Bart Van Assche ha scritto:


Thanks for the quick respin. However, since you are mentioning that in
v2 all occurrences of scmd-result |= DID_TIME_OUT  16 have been
addressed, this made me wonder whether you had noticed the following
code in scsi_decide_disposition() ?

switch (host_byte(scmd-result)) {
[ ... ]
case DID_ABORT:
if (scmd-eh_eflags  SCSI_EH_ABORT_SCHEDULED) {
scmd-result |= DID_TIME_OUT  16;
return SUCCESS;
}


Nope, I miscounted how many occurrences were left.  v3 on the way.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets

2014-06-04 Thread Paolo Bonzini
From: Venkatesh Srinivas venkate...@google.com

change_queue_depth allows changing per-target queue depth via sysfs.

It also allows the SCSI midlayer to ramp down the number of concurrent
inflight requests in response to a SCSI BUSY status response and allows
the midlayer to ramp the count back up to the device maximum when the
BUSY condition has resolved.

Signed-off-by: Venkatesh Srinivas venkate...@google.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fda9fb35..de0f8d9b3682 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include scsi/scsi_tcq.h
 #include linux/seqlock.h
 
 #define VIRTIO_SCSI_MEMPOOL_SZ 64
@@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return virtscsi_tmf(vscsi, cmd);
 }
 
+/**
+ * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth
+ * @sdev:  Virtscsi target whose queue depth to change
+ * @qdepth:New queue depth
+ * @reason:Reason for the queue depth change.
+ */
+static int virtscsi_change_queue_depth(struct scsi_device *sdev,
+  int qdepth,
+  int reason)
+{
+   struct Scsi_Host *shost = sdev-host;
+   int max_depth = shost-cmd_per_lun;
+
+   switch (reason) {
+   case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */
+   scsi_track_queue_full(sdev, qdepth);
+   break;
+   case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */
+   case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */
+   scsi_adjust_queue_depth(sdev,
+   scsi_get_tag_type(sdev),
+   min(max_depth, qdepth));
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return sdev-queue_depth;
+}
+
 static int virtscsi_abort(struct scsi_cmnd *sc)
 {
struct virtio_scsi *vscsi = shost_priv(sc-device-host);
@@ -683,6 +714,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_single,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
@@ -699,6 +731,7 @@ static struct scsi_host_template 
virtscsi_host_template_multi = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_multi,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests

2014-06-04 Thread Paolo Bonzini
Even though the virtio-scsi spec guarantees that all requests related
to the TMF will have been completed by the time the TMF itself completes,
the request queue's callback might not have run yet.  This causes requests
to be completed more than once, and as a result triggers a variety of
BUGs or oopses.

Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index d66c4ee2c774..fda9fb35 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
+static void virtscsi_poll_requests(struct virtio_scsi *vscsi)
+{
+   int i, num_vqs;
+
+   num_vqs = vscsi-num_queues;
+   for (i = 0; i  num_vqs; i++)
+   virtscsi_vq_done(vscsi, vscsi-req_vqs[i],
+virtscsi_complete_cmd);
+}
+
 static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 {
struct virtio_scsi_cmd *cmd = buf;
@@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct 
virtio_scsi_cmd *cmd)
cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
ret = SUCCESS;
 
+   /*
+* The spec guarantees that all requests related to the TMF have
+* been completed, but the callback might not have run yet if
+* we're using independent interrupts (e.g. MSI).  Poll the
+* virtqueues once.
+*
+* In the abort case, sc-scsi_done will do nothing, because
+* the block layer must have detected a timeout and as a result
+* REQ_ATOM_COMPLETE has been set.
+*/
+   virtscsi_poll_requests(vscsi);
+
 out:
mempool_free(cmd, virtscsi_cmd_pool);
return ret;
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/6] virtio-scsi: replace target spinlock with seqcount

2014-06-04 Thread Paolo Bonzini
From: Ming Lei ming@canonical.com

The spinlock of tgt_lock is only for serializing read and write
req_vq, one lockless seqcount is enough for the purpose.

On one 16core VM with vhost-scsi backend, the patch can improve
IOPS with 3% on random read test.

Signed-off-by: Ming Lei ming@canonical.com
[Add initialization in virtscsi_target_alloc. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fc054935eb1f..f0b4cdbfceb0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include linux/seqlock.h
 
 #define VIRTIO_SCSI_MEMPOOL_SZ 64
 #define VIRTIO_SCSI_EVENT_LEN 8
@@ -73,18 +74,16 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
- * could be done locklessly, but we do not do it yet.
+ * tgt_seq is held to serialize reading and writing req_vq.
  *
  * Decrements of reqs are never concurrent with writes of req_vq: before the
  * decrement reqs will be != 0; after the decrement the virtqueue completion
  * routine will not use the req_vq so it can be changed by a new request.
- * Thus they can happen outside the tgt_lock, provided of course we make reqs
+ * Thus they can happen outside the tgt_seq, provided of course we make reqs
  * an atomic_t.
  */
 struct virtio_scsi_target_state {
-   /* This spinlock never held at the same time as vq_lock. */
-   spinlock_t tgt_lock;
+   seqcount_t tgt_seq;
 
/* Count of outstanding requests. */
atomic_t reqs;
@@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
unsigned long flags;
u32 queue_num;
 
-   spin_lock_irqsave(tgt-tgt_lock, flags);
+   local_irq_save(flags);
+   if (atomic_inc_return(tgt-reqs)  1) {
+   unsigned long seq;
+
+   do {
+   seq = read_seqcount_begin(tgt-tgt_seq);
+   vq = tgt-req_vq;
+   } while (read_seqcount_retry(tgt-tgt_seq, seq));
+   } else {
+   /* no writes can be concurrent because of atomic_t */
+   write_seqcount_begin(tgt-tgt_seq);
+
+   /* keep previous req_vq if there is reader found */
+   if (unlikely(atomic_read(tgt-reqs)  1)) {
+   vq = tgt-req_vq;
+   goto unlock;
+   }
 
-   if (atomic_inc_return(tgt-reqs)  1)
-   vq = tgt-req_vq;
-   else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
queue_num -= vscsi-num_queues;
-
tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+ unlock:
+   write_seqcount_end(tgt-tgt_seq);
}
+   local_irq_restore(flags);
 
-   spin_unlock_irqrestore(tgt-tgt_lock, flags);
return vq;
 }
 
@@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 
 static int virtscsi_target_alloc(struct scsi_target *starget)
 {
+   struct Scsi_Host *sh = dev_to_shost(starget-dev.parent);
+   struct virtio_scsi *vscsi = shost_priv(sh);
+
struct virtio_scsi_target_state *tgt =
kmalloc(sizeof(*tgt), GFP_KERNEL);
if (!tgt)
return -ENOMEM;
 
-   spin_lock_init(tgt-tgt_lock);
+   seqcount_init(tgt-tgt_seq);
atomic_set(tgt-reqs, 0);
-   tgt-req_vq = NULL;
+   tgt-req_vq = vscsi-req_vqs[0];
 
starget-hostdata = tgt;
return 0;
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/6] scsi_error: fix invalid setting of host byte

2014-06-04 Thread Paolo Bonzini
From: Ulrich Obergfell uober...@redhat.com

After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT).  When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.

Fix this by using set_host_byte instead, following the model of
commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31.

Cc: sta...@vger.kernel.org
Signed-off-by: Ulrich Obergfell uober...@redhat.com
[Fix all instances according to review comments. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
v1-v2: fix all occurrences [Bart] except one
v2-v3: really fix all occurrences [Bart]

 drivers/scsi/scsi_error.c | 6 +++---
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f17aa7aa7879..8e7a0f8d4f52 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work)
aborting command %p\n, scmd));
rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd);
if (rtn == SUCCESS) {
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
if (scsi_host_eh_past_deadline(sdev-host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
@@ -167,7 +167,7 @@ scmd_eh_abort_handler(struct work_struct *work)
scmd_printk(KERN_WARNING, scmd,
scmd %p terminate 
aborted command\n, scmd));
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
scsi_finish_command(scmd);
}
 }
@@ -291,7 +291,7 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
if (scsi_abort_command(scmd) == SUCCESS)
return BLK_EH_NOT_HANDLED;
 
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
 
if (unlikely(rtn == BLK_EH_NOT_HANDLED 
 !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)))
@@ -1776,7 +1776,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
break;
case DID_ABORT:
if (scmd-eh_eflags  SCSI_EH_ABORT_SCHEDULED) {
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
return SUCCESS;
}
case DID_NO_CONNECT:
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/6] virtio-scsi: avoid cancelling uninitialized work items

2014-06-04 Thread Paolo Bonzini
Calling the workqueue interface on uninitialized work items isn't a
good idea even if they're zeroed. It's not failing catastrophically only
through happy accidents.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index f0b4cdbfceb0..d66c4ee2c774 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -253,6 +253,8 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
virtscsi_vq_done(vscsi, vscsi-ctrl_vq, virtscsi_complete_free);
 };
 
+static void virtscsi_handle_event(struct work_struct *work);
+
 static int virtscsi_kick_event(struct virtio_scsi *vscsi,
   struct virtio_scsi_event_node *event_node)
 {
@@ -260,6 +262,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
struct scatterlist sg;
unsigned long flags;
 
+   INIT_WORK(event_node-work, virtscsi_handle_event);
sg_init_one(sg, event_node-event, sizeof(struct virtio_scsi_event));
 
spin_lock_irqsave(vscsi-event_vq.vq_lock, flags);
@@ -377,7 +380,6 @@ static void virtscsi_complete_event(struct virtio_scsi 
*vscsi, void *buf)
 {
struct virtio_scsi_event_node *event_node = buf;
 
-   INIT_WORK(event_node-work, virtscsi_handle_event);
schedule_work(event_node-work);
 }
 
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-06-04 Thread Paolo Bonzini
From: Ming Lei tom.leim...@gmail.com

Access to tgt-req_vq is strictly serialized by spin_lock
of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary.

smp_read_barrier_depends() in virtscsi_req_done was introduced
to order reading req_vq and decreasing tgt-reqs, but it isn't
needed now because req_vq is read from
scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of
tgt-req_vq, so remove the unnecessary barrier.

Also remove related comment about the barrier.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei tom.leim...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 53 ++
 1 file changed, 6 insertions(+), 47 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index db3b494e5926..fc054935eb1f 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -73,17 +73,12 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * An interesting effect of this policy is that only writes to req_vq need to
- * take the tgt_lock.  Read can be done outside the lock because:
+ * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
+ * could be done locklessly, but we do not do it yet.
  *
- * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
- *   In that case, no other CPU is reading req_vq: even if they were in
- *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
- *
- * - reads of req_vq only occur when the target is not idle (reqs != 0).
- *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
- *
- * Similarly, decrements of reqs are never concurrent with writes of req_vq.
+ * Decrements of reqs are never concurrent with writes of req_vq: before the
+ * decrement reqs will be != 0; after the decrement the virtqueue completion
+ * routine will not use the req_vq so it can be changed by a new request.
  * Thus they can happen outside the tgt_lock, provided of course we make reqs
  * an atomic_t.
  */
@@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq)
int index = vq-index - VIRTIO_SCSI_VQ_BASE;
struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index];
 
-   /*
-* Read req_vq before decrementing the reqs field in
-* virtscsi_complete_cmd.
-*
-* With barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read req_vq
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*
-* Possible reordering without barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*  read (wrong) req_vq
-*
-* We do not need a full smp_rmb, because req_vq is required to get
-* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored
-* in the virtqueue as the user token.
-*/
-   smp_read_barrier_depends();
-
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
@@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
 
spin_lock_irqsave(tgt-tgt_lock, flags);
 
-   /*
-* The memory barrier after atomic_inc_return matches
-* the smp_read_barrier_depends() in virtscsi_req_done.
-*/
if (atomic_inc_return(tgt-reqs)  1)
-   vq = ACCESS_ONCE(tgt-req_vq);
+   vq = tgt-req_vq;
else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/6] virtio-scsi patches for 3.16 + midlayer fix

2014-06-04 Thread Paolo Bonzini
As mentioned in the v1 submission, these patches were delayed a bit
by the discussion and testing of patch 5.  Debugging the problem also
led to the discovery of a midlayer bug fixed by patch 4.

Paolo

v1-v2: fix all occurrences of scmd-result |= DID_TIME_OUT  16
in patch 4

v2-v3: really fix all occurrences in patch 4

Ming Lei (2):
  virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
  virtio-scsi: replace target spinlock with seqcount

Paolo Bonzini (2):
  virtio-scsi: avoid cancelling uninitialized work items
  virtio-scsi: fix various bad behavior on aborted requests

Ulrich Obergfell (1):
  scsi_error: fix invalid setting of host byte

Venkatesh Srinivas (1):
  virtio-scsi: Implement change_queue_depth for virtscsi targets

 drivers/scsi/scsi_error.c  |   6 +-
 drivers/scsi/virtio_scsi.c | 148 +++--
 2 files changed, 93 insertions(+), 61 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests

2014-06-04 Thread Paolo Bonzini

Il 04/06/2014 19:29, Venkatesh Srinivas ha scritto:

Do you really want to poll the request VQs for completions if the TMF
was rejected?


I wasn't sure, but bugs in this path are hard enough that I preferred 
the safer patch.



TMF ABORT may return FUNCTION REJECTED if the command to abort
completed before the device saw the TMF ABORT message, for example. In
such cases, this would
unnecessarily lengthen the EH path.


The cost of virtscsi_poll_requests should be nothing compared to the 
delay between the timeout and the invocation of the delayed_work, no?


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present

2014-06-03 Thread Paolo Bonzini

Il 03/06/2014 03:00, Martin K. Petersen ha scritto:

Paolo == Paolo Bonzini pbonz...@redhat.com writes:



+ sdev_printk(KERN_ERR, sdev,
+ %s: VPD page 0x83 NAA descriptor not found\n, __func__);
+
+ return;


Paolo I suspect this error will be relatively common.

You're right. But we would like to see it for devices that actually
implement copy offload. So I suggest the following tweak...


[SCSI] Look up and store NAA if VPD page 0x83 is present

Copy offloading requires us to know the NAA descriptor for both source
target device. This descriptor is mandatory in the Device Identification
VPD page. Locate this descriptor in the returned VPD data so we don't
have to do lookups for every copy command.


Reviewed-by: Paolo Bonzini pbonz...@redhat.com


Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Reviewed-by: Nicholas Bellinger n...@linux-iscsi.org

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe6bf98..190dca4a8494 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1024,6 +1024,62 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, 
unsigned char *buf,
 EXPORT_SYMBOL_GPL(scsi_get_vpd_page);

 /**
+ * scsi_lookup_naa - Lookup NAA descriptor in VPD page 0x83
+ * @sdev: The device to ask
+ *
+ * Copy offloading requires us to know the NAA descriptor for both
+ * source and target device. This descriptor is mandatory in the Device
+ * Identification VPD page. Locate this descriptor in the returned VPD
+ * data so we don't have to do lookups for every copy command.
+ */
+static void scsi_lookup_naa(struct scsi_device *sdev)
+{
+   unsigned char *buf = sdev-vpd_pg83;
+   unsigned int len = sdev-vpd_pg83_len;
+
+   if (buf[1] != 0x83 || get_unaligned_be16(buf[2]) == 0) {
+   sdev_printk(KERN_ERR, sdev,
+   %s: VPD page 0x83 contains no descriptors\n,
+   __func__);
+   return;
+   }
+
+   buf += 4;
+   len -= 4;
+
+   do {
+   unsigned int desig_len = buf[3] + 4;
+
+   /* Binary code set */
+   if ((buf[0]  0xf) != 1)
+   goto skip;
+
+   /* Target association */
+   if ((buf[1]  4)  0x3)
+   goto skip;
+
+   /* NAA designator */
+   if ((buf[1]  0xf) != 0x3)
+   goto skip;
+
+   sdev-naa = buf;
+   sdev-naa_len = desig_len;
+
+   return;
+
+skip:
+   buf += desig_len;
+   len -= desig_len;
+
+   } while (len  0);
+
+   sdev_printk(KERN_ERR, sdev,
+   %s: VPD page 0x83 NAA descriptor not found\n, __func__);
+
+   return;
+}
+
+/**
  * scsi_attach_vpd - Attach Vital Product Data to a SCSI device structure
  * @sdev: The device to ask
  *
@@ -1107,6 +1163,10 @@ retry_pg83:
}
sdev-vpd_pg83_len = result;
sdev-vpd_pg83 = vpd_buf;
+
+   /* Lookup NAA if 3PC set in INQUIRY response */
+   if (sdev-inquiry_len = 6  sdev-inquiry[5]  (1  3))
+   scsi_lookup_naa(sdev);
}
 }

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5853c913d2b0..67bb70012802 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -119,6 +119,8 @@ struct scsi_device {
unsigned char *vpd_pg83;
int vpd_pg80_len;
unsigned char *vpd_pg80;
+   unsigned char naa_len;
+   unsigned char *naa;
unsigned char current_tag;  /* current tag */
struct scsi_target  *sdev_target;   /* used only for single_lun */




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] virtio-scsi: fix various bad behavior on aborted requests

2014-06-03 Thread Paolo Bonzini
Even though the virtio-scsi spec guarantees that all requests related
to the TMF will have been completed by the time the TMF itself completes,
the request queue's callback might not have run yet.  This causes requests
to be completed more than once, and as a result triggers a variety of
BUGs or oopses.

Cc: sta...@vger.kernel.org
Cc: Ulrich Obergfell uober...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index d66c4ee2c774..fda9fb35 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
+static void virtscsi_poll_requests(struct virtio_scsi *vscsi)
+{
+   int i, num_vqs;
+
+   num_vqs = vscsi-num_queues;
+   for (i = 0; i  num_vqs; i++)
+   virtscsi_vq_done(vscsi, vscsi-req_vqs[i],
+virtscsi_complete_cmd);
+}
+
 static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 {
struct virtio_scsi_cmd *cmd = buf;
@@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct 
virtio_scsi_cmd *cmd)
cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
ret = SUCCESS;
 
+   /*
+* The spec guarantees that all requests related to the TMF have
+* been completed, but the callback might not have run yet if
+* we're using independent interrupts (e.g. MSI).  Poll the
+* virtqueues once.
+*
+* In the abort case, sc-scsi_done will do nothing, because
+* the block layer must have detected a timeout and as a result
+* REQ_ATOM_COMPLETE has been set.
+*/
+   virtscsi_poll_requests(vscsi);
+
 out:
mempool_free(cmd, virtscsi_cmd_pool);
return ret;
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets

2014-06-03 Thread Paolo Bonzini
From: Venkatesh Srinivas venkate...@google.com

change_queue_depth allows changing per-target queue depth via sysfs.

It also allows the SCSI midlayer to ramp down the number of concurrent
inflight requests in response to a SCSI BUSY status response and allows
the midlayer to ramp the count back up to the device maximum when the
BUSY condition has resolved.

Signed-off-by: Venkatesh Srinivas venkate...@google.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fda9fb35..de0f8d9b3682 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include scsi/scsi_tcq.h
 #include linux/seqlock.h
 
 #define VIRTIO_SCSI_MEMPOOL_SZ 64
@@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return virtscsi_tmf(vscsi, cmd);
 }
 
+/**
+ * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth
+ * @sdev:  Virtscsi target whose queue depth to change
+ * @qdepth:New queue depth
+ * @reason:Reason for the queue depth change.
+ */
+static int virtscsi_change_queue_depth(struct scsi_device *sdev,
+  int qdepth,
+  int reason)
+{
+   struct Scsi_Host *shost = sdev-host;
+   int max_depth = shost-cmd_per_lun;
+
+   switch (reason) {
+   case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */
+   scsi_track_queue_full(sdev, qdepth);
+   break;
+   case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */
+   case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */
+   scsi_adjust_queue_depth(sdev,
+   scsi_get_tag_type(sdev),
+   min(max_depth, qdepth));
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return sdev-queue_depth;
+}
+
 static int virtscsi_abort(struct scsi_cmnd *sc)
 {
struct virtio_scsi *vscsi = shost_priv(sc-device-host);
@@ -683,6 +714,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_single,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
@@ -699,6 +731,7 @@ static struct scsi_host_template 
virtscsi_host_template_multi = {
.proc_name = virtio_scsi,
.this_id = -1,
.queuecommand = virtscsi_queuecommand_multi,
+   .change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] scsi_error: fix invalid setting of host byte

2014-06-03 Thread Paolo Bonzini
From: Ulrich Obergfell uober...@redhat.com

After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT).  When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.

Fix this by using set_host_byte instead, following the model of
commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31.

Cc: sta...@vger.kernel.org
Signed-off-by: Ulrich Obergfell uober...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f17aa7aa7879..ab06ef6fd726 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work)
aborting command %p\n, scmd));
rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd);
if (rtn == SUCCESS) {
-   scmd-result |= DID_TIME_OUT  16;
+   set_host_byte(scmd, DID_TIME_OUT);
if (scsi_host_eh_past_deadline(sdev-host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-06-03 Thread Paolo Bonzini
From: Ming Lei tom.leim...@gmail.com

Access to tgt-req_vq is strictly serialized by spin_lock
of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary.

smp_read_barrier_depends() in virtscsi_req_done was introduced
to order reading req_vq and decreasing tgt-reqs, but it isn't
needed now because req_vq is read from
scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of
tgt-req_vq, so remove the unnecessary barrier.

Also remove related comment about the barrier.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei tom.leim...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 53 ++
 1 file changed, 6 insertions(+), 47 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index db3b494e5926..fc054935eb1f 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -73,17 +73,12 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * An interesting effect of this policy is that only writes to req_vq need to
- * take the tgt_lock.  Read can be done outside the lock because:
+ * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
+ * could be done locklessly, but we do not do it yet.
  *
- * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
- *   In that case, no other CPU is reading req_vq: even if they were in
- *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
- *
- * - reads of req_vq only occur when the target is not idle (reqs != 0).
- *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
- *
- * Similarly, decrements of reqs are never concurrent with writes of req_vq.
+ * Decrements of reqs are never concurrent with writes of req_vq: before the
+ * decrement reqs will be != 0; after the decrement the virtqueue completion
+ * routine will not use the req_vq so it can be changed by a new request.
  * Thus they can happen outside the tgt_lock, provided of course we make reqs
  * an atomic_t.
  */
@@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq)
int index = vq-index - VIRTIO_SCSI_VQ_BASE;
struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index];
 
-   /*
-* Read req_vq before decrementing the reqs field in
-* virtscsi_complete_cmd.
-*
-* With barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read req_vq
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*
-* Possible reordering without barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*  read (wrong) req_vq
-*
-* We do not need a full smp_rmb, because req_vq is required to get
-* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored
-* in the virtqueue as the user token.
-*/
-   smp_read_barrier_depends();
-
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };
 
@@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
 
spin_lock_irqsave(tgt-tgt_lock, flags);
 
-   /*
-* The memory barrier after atomic_inc_return matches
-* the smp_read_barrier_depends() in virtscsi_req_done.
-*/
if (atomic_inc_return(tgt-reqs)  1)
-   vq = ACCESS_ONCE(tgt-req_vq);
+   vq = tgt-req_vq;
else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH resend 0/6] virtio-scsi patches for 3.16 + a midlayer one-liner

2014-06-03 Thread Paolo Bonzini
Sorry for the delay, mostly caused by the discussion and testing of
patch 5.  That's the most important of the bunch, and debugging the
problem also led to the discovery of a midlayer bug fixed by patch 4.

I hope these can make their way into 3.16.  Thanks!

resend: forgot linux-scsi. :(

Paolo


Ming Lei (2):
  virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
  virtio-scsi: replace target spinlock with seqcount

Paolo Bonzini (2):
  virtio-scsi: avoid cancelling uninitialized work items
  virtio-scsi: fix various bad behavior on aborted requests

Ulrich Obergfell (1):
  scsi_error: fix invalid setting of host byte

Venkatesh Srinivas (1):
  virtio-scsi: Implement change_queue_depth for virtscsi targets

 drivers/scsi/scsi_error.c  |   2 +-
 drivers/scsi/virtio_scsi.c | 148 +++--
 2 files changed, 91 insertions(+), 59 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present

2014-06-02 Thread Paolo Bonzini
Il 29/05/2014 05:52, Martin K. Petersen ha scritto:
 + sdev_printk(KERN_ERR, sdev,
 + %s: VPD page 0x83 NAA descriptor not found\n, __func__);
 +
 + return;

I suspect this error will be relatively common.

libata for example has

if (ata_id_has_wwn(args-id)) {
/* SAT defined lu world wide name */
/* piv=0, assoc=lu, code_set=binary, designator=NAA */
rbuf[num + 0] = 1;
rbuf[num + 1] = 3;
rbuf[num + 3] = ATA_ID_WWN_LEN;
num += 4;
ata_id_string(args-id, (unsigned char *) rbuf + num,
  ATA_ID_WWN, ATA_ID_WWN_LEN);
num += ATA_ID_WWN_LEN;
}
rbuf[3] = num - 4;/* page len (assume less than 256 bytes) */

and most of the time IDE disks in a virtual machine are configured without
a WWN.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd

2014-05-30 Thread Paolo Bonzini

Il 30/05/2014 10:15, Liu Ping Fan ha scritto:

When running io stress test on large latency scsi-disk, e.g guest with virtscsi
on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, 
req-atomic_flags));
in blk_start_request().
Since there is a race between latency finishing scmd and the re-allocated 
scmd.
I.e a request is still referred by a scmd, but we had turn it back to
slab.

This series introduces the ref on scmd to exclude this issue, and the following 
is ref rules.

  inc ref rules is like the following:
When setup a scmd, inited as 1. When add a timer inc 1.

  dec ref rules is like the following:
-for the normal ref
   scsi_done() will drop the ref when fail to acquire REQ_ATOM_COMPLETE 
immediately
   or drop the ref by scsi_end_request()
   or drop by return SUCCESS_REMOVE
-for a timer ref
   when deleting timer, if !list_empty(timeout_list), then there is a timer 
ref, and
   drop it.


patch1-2: fix the current potential bug
patch3~6: prepare for the mechanism for the ref
patch7:   the ref rules core
patch8-9:  e.g and test-issue for the new mechanism. Since lack of many 
virtscsi background,
   patch8 may be poor and need to be improved :)


Note: all the patches are based on rhel7, whose kernel version is linux-3.10.
  I will rebase them onto the latest commit if my method is practical.


This series is not necessary, this is a bug in the virtscsi driver.  I 
have a ten line patch to fix it, but I haven't yet tested it properly.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: replace target spinlock with seqcount

2014-05-28 Thread Paolo Bonzini

Il 28/05/2014 03:48, Ming Lei ha scritto:

On Wed, May 28, 2014 at 12:57 AM, Paolo Bonzini pbonz...@redhat.com wrote:

Il 27/05/2014 18:50, Venkatesh Srinivas ha scritto:


Hi,

I think this patch has a small race involving just two commands:

1. The first command to a target is in virtscsi_pick_vq(), after
   atomic_inc_return(tgt-tgt_lock)) but before write_seqcount_begin()


Good catch, thanks Venkatesh

And it doesn't happen in my test, so looks it won't be easy to trigger, :-)


No, it's basically impossible because the race window is very small and 
the first command (INQUIRY or REPORT LUNS) is likely to be synchronous 
anyway.  But it's there anyway.



Specifically:

@@ -508,19 +507,33 @@ static struct virtio_scsi_vq
*virtscsi_pick_vq(struct virtio_scsi *vscsi,
 unsigned long flags;
 u32 queue_num;

-spin_lock_irqsave(tgt-tgt_lock, flags);
+local_irq_save(flags);
+if (atomic_inc_return(tgt-reqs)  1) {
+unsigned long seq;
+
+do {
+seq = read_seqcount_begin(tgt-tgt_seq);
+vq = tgt-req_vq;
+} while (read_seqcount_retry(tgt-tgt_seq, seq));
+} else {

A second virtscsi_pick_vq() here will read a stale or NULL tgt-req_vq.



The NULL case is true, indeed I was going to send a patch to initialize
tgt-req_vq but I have not tested it.

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fc054935eb1f..f0b4cdbfceb0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc)

 static int virtscsi_target_alloc(struct scsi_target *starget)
 {
+   struct Scsi_Host *sh = dev_to_shost(starget-dev.parent);
+   struct virtio_scsi *vscsi = shost_priv(sh);
+
struct virtio_scsi_target_state *tgt =
kmalloc(sizeof(*tgt), GFP_KERNEL);
if (!tgt)
return -ENOMEM;


seqcount_init(tgt-tgt_seq);
atomic_set(tgt-reqs, 0);
-   tgt-req_vq = NULL;
+   tgt-req_vq = vscsi-req_vqs[0];

starget-hostdata = tgt;
return 0;


Paolo, do you need me to integrate this one into the patch? or
you will make it as one standalone?


I will integrate it myself after testing it.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [target:for-next 20/20] drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete type

2014-05-28 Thread Paolo Bonzini

Il 28/05/2014 00:21, Nicholas A. Bellinger ha scritto:

On Mon, 2014-05-26 at 13:30 -0400, Martin K. Petersen wrote:

Nic == Nicholas A Bellinger n...@linux-iscsi.org writes:



What about #ifdef'ing VIRTIO_SCSI_F_T10_PI support out if
!CONFIG_BLK_DEV_INTEGRITY?


Nic I figured it was slightly cleaner to enable BLK_DEV_INTEGRITY by
Nic default when referencing struct blk_integrity (following what
Nic IBLOCK does), than adding the equivalent #ifdef's..

Nic MKP, do you have a preference on this..?

Well, I guess how important the virtio stuff is in the embedded space?

In the block layer we have all these wrappers that allow things to Do
The Right Thing when the BLK_DEV_INTEGRITY is disabled. I'm not entirely
sure it worth the hassle to do the same for target and virtio. The
memory savings aren't that big to begin with.

And besides, the bip pointer field in struct bio is about to become
generic with my impending copy offload patches anyway,



In that case, I'll leave as is unless Paolo has an objection.


No, that's fine.

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 10:56, James Bottomley ha scritto:

The whole reason BUG_ON doesn't leave a log trace is to try to prevent
corruption propagating to the data storage devices.  What you propose
would be inviting that corruption in the name of getting a log entry.


Even this is not entirely correct.  kdump can be used to capture the 
full log of the BUG_ON.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 12:21, James Bottomley ha scritto:

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.


Actually the assertion would remain valid, and that's exactly what Bart 
wants to document with this assertion.


Completed from the point of view of the block layer really means do 
not run the softirq.  If you wanted to abort any running command, you 
still would mark the request as completed for the block layer before 
issuing the TMF.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 12:59, James Bottomley ha scritto:

On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:

Il 27/05/2014 12:21, James Bottomley ha scritto:

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.


Actually the assertion would remain valid, and that's exactly what Bart
wants to document with this assertion.


No, it wouldn't: if we abort a running command by definition the command
hadn't timed out and might not be completed.  This is required by TMF
handling because now you have an abort racing with a completion.  Either
the command completes normally because it misses the abort or the abort
gets to it and its returned status is set to TASK_ABORTED.  That's the
only way you can tell if the abort was successful or not.

If you're thinking we would tell block to ignore returning commands
before issuing the abort, we'd never be able to tell if the abort were
successful, so we have to allow the race to collect the status.


You could use a different mechanism than a softirq to tell the abort 
were successful, for example by overriding scsi_done.  But with respect 
to the block layer, the mechanics of avoiding the race and double-free 
would probably be the same.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 13:26, James Bottomley ha scritto:

You could use a different mechanism than a softirq to tell the abort
were successful, for example by overriding scsi_done.  But with respect
to the block layer, the mechanics of avoiding the race and double-free
would probably be the same.


I think there's some confusion about what the race and double free is:
It only occurs with timeouts.  In a timeout situation, the host had
decided it's not waiting any longer for the target to respond and
proceeds to error recovery.  At any time between the host making this
decision up to the point it kicks the target hard enough to clear all
in-flight commands, the target may return the command.  If we didn't
have some ignore function on command completions while we're handling
errors, this would lead to double completion.

If we decided to allow arbitrary aborts of running commands, we would
send a TMF in during the normal (i.e. un timed out) command period.
Because there's no timeout involved, there's no double free problem.
The race in this case is whether the abort catches the command or not
and to mediate that race we need the normal status return.


I'm not sure why no timeout implies no double free.  There would 
still be a race between the interrupt handler and softirq on one side, 
and the abort handler on the other.  The interrupt handler's call to 
cmd-scsi_done ends up triggering the softirq and thus freeing the 
command with scsi_put_command.  The abort handler, as you mentioned, 
wants the status return so it needs the interrupt handler to run---but 
not the softirq.


A simple way to avoid this could be to skip the softirq processing, by 
marking the request block-layer-complete, and do everything in the abort 
handler.  The interrupt handler would still run and fill in the status.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: replace target spinlock with seqcount

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 18:50, Venkatesh Srinivas ha scritto:

Hi,

I think this patch has a small race involving just two commands:

1. The first command to a target is in virtscsi_pick_vq(), after
   atomic_inc_return(tgt-tgt_lock)) but before write_seqcount_begin()
2. A second command to the same target enters virtscsi_pick_vq(). It
   will hit the same atomic inc, take the upper branch of the
   conditional, and read out a stale (or NULL) tgt-req_vq.

Specifically:

@@ -508,19 +507,33 @@ static struct virtio_scsi_vq
*virtscsi_pick_vq(struct virtio_scsi *vscsi,
 unsigned long flags;
 u32 queue_num;

-spin_lock_irqsave(tgt-tgt_lock, flags);
+local_irq_save(flags);
+if (atomic_inc_return(tgt-reqs)  1) {
+unsigned long seq;
+
+do {
+seq = read_seqcount_begin(tgt-tgt_seq);
+vq = tgt-req_vq;
+} while (read_seqcount_retry(tgt-tgt_seq, seq));
+} else {

A second virtscsi_pick_vq() here will read a stale or NULL tgt-req_vq.


The NULL case is true, indeed I was going to send a patch to initialize
tgt-req_vq but I have not tested it.

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index fc054935eb1f..f0b4cdbfceb0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 static int virtscsi_target_alloc(struct scsi_target *starget)
 {
+   struct Scsi_Host *sh = dev_to_shost(starget-dev.parent);
+   struct virtio_scsi *vscsi = shost_priv(sh);
+
struct virtio_scsi_target_state *tgt =
kmalloc(sizeof(*tgt), GFP_KERNEL);
if (!tgt)
return -ENOMEM;

seqcount_init(tgt-tgt_seq);
atomic_set(tgt-reqs, 0);
-   tgt-req_vq = NULL;
+   tgt-req_vq = vscsi-req_vqs[0];

starget-hostdata = tgt;
return 0;


The case of a stale req_vq is okay and is the (IMO reasonable) price to
pay for allowing more concurrency.  If you have concurrent requests
from multiple VCPUs, multiqueue is not great for your workload anyway,
at least with the current steering algorithm.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler

2014-05-26 Thread Paolo Bonzini

Il 26/05/2014 17:14, Bart Van Assche ha scritto:

scsi_put_command() is either invoked before a command is queued or
after a command has completed. scsi_cmnd.abort_work is scheduled
after a command has timed out and before it is finished. The block
layer guarantees that either the softirq_done_fn() or the
rq_timed_out_fn() function is invoked but not both. This means that
scsi_put_command() is never invoked while abort_work is scheduled.
Hence remove the cancel_delayed_work() call from scsi_put_command().

Similarly, scsi_abort_command() is only invoked from the SCSI
timeout handler. If scsi_abort_command() is invoked for a SCSI
command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
hence that scsi_cmnd.abort_work is no longer pending. Hence also
remove the cancel_delayed_work() call from scsi_abort_command().

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Hannes Reinecke h...@suse.de
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Jens Axboe ax...@fb.com
Cc: Joe Lawrence joe.lawre...@stratus.com
---
 drivers/scsi/scsi.c   | 2 +-
 drivers/scsi/scsi_error.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..c972eab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(cmd-list);
spin_unlock_irqrestore(cmd-device-list_lock, flags);

-   cancel_delayed_work(cmd-abort_work);
+   WARN_ON_ONCE(delayed_work_pending(cmd-abort_work));

__scsi_put_command(cmd-device-host, cmd);
 }
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f17aa7a..5232583 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
scmd %p previous abort failed\n, scmd));
-   cancel_delayed_work(scmd-abort_work);
+   WARN_ON_ONCE(delayed_work_pending(scmd-abort_work));
return FAILED;
}




I still prefer a BUG_ON in scsi_put_command, but anyway: series

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [target:for-next 20/20] drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete type

2014-05-23 Thread Paolo Bonzini

Il 22/05/2014 22:46, Nicholas A. Bellinger ha scritto:

Hi Fengguang,

On Thu, 2014-05-22 at 11:13 +0800, kbuild test robot wrote:

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 
for-next
head:   4baaa7d589e24bfe87dfd6c7a1229108be404a28
commit: 4baaa7d589e24bfe87dfd6c7a1229108be404a28 [20/20] virtio-scsi: Enable 
DIF/DIX modes in SCSI host LLD
config: x86_64-randconfig-x000 (attached as .config)

All error/warnings:

   drivers/scsi/virtio_scsi.c: In function 'virtio_scsi_init_hdr_pi':

drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete 
type

  cmd_pi-pi_bytesout = blk_rq_sectors(rq) * bi-tuple_size;
   ^

drivers/scsi/virtio_scsi.c:533:47: error: dereferencing pointer to incomplete 
type

  cmd_pi-pi_bytesin = blk_rq_sectors(rq) * bi-tuple_size;
  ^

vim +531 drivers/scsi/virtio_scsi.c

   525  if (!rq || !scsi_prot_sg_count(sc))
   526  return;
   527  
   528  bi = blk_get_integrity(rq-rq_disk);
   529  
   530  if (sc-sc_data_direction == DMA_TO_DEVICE)
  531   cmd_pi-pi_bytesout = blk_rq_sectors(rq) * 
bi-tuple_size;
   532  else if (sc-sc_data_direction == DMA_FROM_DEVICE)
  533   cmd_pi-pi_bytesin = blk_rq_sectors(rq) * 
bi-tuple_size;
   534  }
   535  
   536  static int virtscsi_queuecommand(struct virtio_scsi *vscsi,



Squashing the following into the original commit to enable blk-integrity
for virtio-scsi to address the randconfig build failure above.

Thanks!

--nab

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 02832d6..baca589 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1773,6 +1773,7 @@ config SCSI_BFA_FC
 config SCSI_VIRTIO
tristate virtio-scsi support
depends on VIRTIO
+   select BLK_DEV_INTEGRITY
help
   This is the virtual HBA driver for virtio.  If the kernel will
   be used in a virtual machine, say Y or M.


What about #ifdef'ing VIRTIO_SCSI_F_T10_PI support out if 
!CONFIG_BLK_DEV_INTEGRITY?


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-23 Thread Paolo Bonzini

Il 23/05/2014 03:28, Elliott, Robert (Server Storage) ha scritto:

Depending on the transport, there may be a race condition between
the command status and the ABORT TASK response; the ABORT TASK
response might get back first.  I think that means
scsi_eh_abort_handler's call to scsi_finish_command could
happen during or after scsi_softirq_done has called
scsi_finish_command.


That was my doubt, in fact.  But actually Bart explained that this is 
not the case.  Either scsi_eh_abort_handler will be called via the work 
item, or the command will never be sent to the softirq.


Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()

2014-05-23 Thread Paolo Bonzini

Il 23/05/2014 08:09, Hannes Reinecke ha scritto:


And when freeing a command we absolutely need to make sure that
the workqueue is empty.
So calling cancel_delayed_work() was the obvious thing to do.


You would need cancel_delayed_work_sync, but if it really happened that 
the work item is running, it would cause a double free.



I'd be fine with adding a WARN_ON(!list_empty(cmd-abort_work))
here, however. This will clear up the intent of this statement.


BUG_ON even, since you'd get badness from the double free anyway.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()

2014-05-23 Thread Paolo Bonzini

Il 23/05/2014 12:37, Bart Van Assche ha scritto:

On 05/23/14 11:24, Paolo Bonzini wrote:

Il 23/05/2014 08:09, Hannes Reinecke ha scritto:


And when freeing a command we absolutely need to make sure that
the workqueue is empty.
So calling cancel_delayed_work() was the obvious thing to do.


You would need cancel_delayed_work_sync, but if it really happened that
the work item is running, it would cause a double free.


I'd be fine with adding a WARN_ON(!list_empty(cmd-abort_work))
here, however. This will clear up the intent of this statement.


BUG_ON even, since you'd get badness from the double free anyway.


Hello Paolo,

Are you aware that Linus strongly prefers WARN_ON_ONCE() over BUG_ON() ?
See e.g. https://lkml.org/lkml/2012/9/27/461 or
https://lkml.org/lkml/2014/4/28/657.


Yes, I am and I even downgraded some KVM BUG_ONs recently.

But in this case I think that memory corruption is going to happen 
anyway unless you consciously leak the Scsi_Cmnd * (because if you use 
WARN_ON, you also need to return early as Linus suggested in the second 
email).


So the WARN_ON/BUG_ON choice here should not just consider what makes 
the problem easier to debug; hanging the machine before guaranteed 
badness seems to me like a good use for BUG_ON.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2 6/6] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD

2014-05-22 Thread Paolo Bonzini

Il 22/05/2014 04:26, Nicholas A. Bellinger ha scritto:

From: Nicholas Bellinger n...@linux-iscsi.org

This patch updates virtscsi_probe() to setup necessary Scsi_Host
level protection resources. (currently hardcoded to 1)

It changes virtscsi_add_cmd() to attach outgoing / incoming
protection SGLs preceeding the data payload, and is using the
new virtio_scsi_cmd_req_pi-d[oi],pi_niv field to signal


Wrong field name in the commit message...


to signal to vhost/scsi how many prot_sgs to expect.


s/prot_sgs to expect/bytes to expect for protection data/

Apart from this, which can be fixed in the pull request,

Acked-by: Paolo Bonzini pbonz...@redhat.com

for getting this in via the target tree.

Paolo


v4 changes:
  - Convert virtio_scsi_init_hdr_pi to use pi_bytesout + pi_bytesin
(mst + paolo + nab)
  - Use blk_integrity-tuple_size to calculate pi bytes (nab)

v3 changes:
  - Use VIRTIO_SCSI_F_T10_PI to determine PI or non PI header (Paolo)

v2 changes:
  - Make protection buffer come before data buffer (Paolo)
  - Enable virtio_scsi_cmd_req_pi usage (Paolo)

Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Martin K. Petersen martin.peter...@oracle.com
Cc: Christoph Hellwig h...@lst.de
Cc: Hannes Reinecke h...@suse.de
Cc: Sagi Grimberg sa...@dev.mellanox.co.il
Cc: H. Peter Anvin h...@zytor.com
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/scsi/virtio_scsi.c |   86 ++--
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..cc634b0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -37,6 +37,7 @@ struct virtio_scsi_cmd {
struct completion *comp;
union {
struct virtio_scsi_cmd_req   cmd;
+   struct virtio_scsi_cmd_req_picmd_pi;
struct virtio_scsi_ctrl_tmf_req  tmf;
struct virtio_scsi_ctrl_an_req   an;
} req;
@@ -440,7 +441,7 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
size_t req_size, size_t resp_size, gfp_t gfp)
 {
struct scsi_cmnd *sc = cmd-sc;
-   struct scatterlist *sgs[4], req, resp;
+   struct scatterlist *sgs[6], req, resp;
struct sg_table *out, *in;
unsigned out_num = 0, in_num = 0;

@@ -458,16 +459,24 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
sgs[out_num++] = req;

/* Data-out buffer.  */
-   if (out)
+   if (out) {
+   /* Place WRITE protection SGLs before Data OUT payload */
+   if (scsi_prot_sg_count(sc))
+   sgs[out_num++] = scsi_prot_sglist(sc);
sgs[out_num++] = out-sgl;
+   }

/* Response header.  */
sg_init_one(resp, cmd-resp, resp_size);
sgs[out_num + in_num++] = resp;

/* Data-in buffer */
-   if (in)
+   if (in) {
+   /* Place READ protection SGLs before Data IN payload */
+   if (scsi_prot_sg_count(sc))
+   sgs[out_num + in_num++] = scsi_prot_sglist(sc);
sgs[out_num + in_num++] = in-sgl;
+   }

return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp);
 }
@@ -492,12 +501,44 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
return err;
 }

+static void virtio_scsi_init_hdr(struct virtio_scsi_cmd_req *cmd,
+struct scsi_cmnd *sc)
+{
+   cmd-lun[0] = 1;
+   cmd-lun[1] = sc-device-id;
+   cmd-lun[2] = (sc-device-lun  8) | 0x40;
+   cmd-lun[3] = sc-device-lun  0xff;
+   cmd-tag = (unsigned long)sc;
+   cmd-task_attr = VIRTIO_SCSI_S_SIMPLE;
+   cmd-prio = 0;
+   cmd-crn = 0;
+}
+
+static void virtio_scsi_init_hdr_pi(struct virtio_scsi_cmd_req_pi *cmd_pi,
+   struct scsi_cmnd *sc)
+{
+   struct request *rq = sc-request;
+   struct blk_integrity *bi;
+
+   virtio_scsi_init_hdr((struct virtio_scsi_cmd_req *)cmd_pi, sc);
+
+   if (!rq || !scsi_prot_sg_count(sc))
+   return;
+
+   bi = blk_get_integrity(rq-rq_disk);
+
+   if (sc-sc_data_direction == DMA_TO_DEVICE)
+   cmd_pi-pi_bytesout = blk_rq_sectors(rq) * bi-tuple_size;
+   else if (sc-sc_data_direction == DMA_FROM_DEVICE)
+   cmd_pi-pi_bytesin = blk_rq_sectors(rq) * bi-tuple_size;
+}
+
 static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 struct virtio_scsi_vq *req_vq,
 struct scsi_cmnd *sc)
 {
struct virtio_scsi_cmd *cmd;
-   int ret;
+   int ret, req_size;

struct Scsi_Host *shost = virtio_scsi_host(vscsi-vdev);
BUG_ON(scsi_sg_count(sc)  shost-sg_tablesize);
@@ -515,22 +556,20 @@ static int virtscsi_queuecommand(struct virtio_scsi 
*vscsi,

memset(cmd, 0, sizeof(*cmd));
cmd-sc

Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support

2014-05-22 Thread Paolo Bonzini

Il 22/05/2014 04:26, Nicholas A. Bellinger ha scritto:

From: Nicholas Bellinger n...@linux-iscsi.org

Hi MST, MKP, Paolo  Co,

Here is the v2 patch series for adding T1O protection information (PI)
SGL passthrough support between virtio-scsi LLD + vhost-scsi fabric
endpoints.

Following MST's recommendation, it includes the changes for using
bytes intead of number of iovecs in virtio_scsi_cmd_req_pi along with
the associated changes to virtio-scsi + vhost/scsi code.

For vhost/scsi, this includes walking the leading iovec's length(s)
to determine where protection payload ends, and real data payload
starts.  For virtio-scsi LLD code, this includes locating struct
blk_integrity for using blk_rq_sectors + -tuple_size to calculate
the total bytes for outgoing cmd_pi-pi_bytes[out,in] values.

The full list of changes from last series include:

  - Use pi_bytesout + pi_bytesin instead of niovs in virtio-scsi PI
header (mst + paolo)
  - Add prot_pto=1 in tcm_vhost_submission_work() when no PI buffer
exists (nab)
  - Get rid of req + cdb pointer casts in vhost_scsi_handle_vq (mst)
  - Ensure that virtio_scsi_cmd_req_pi processing happens regardless
of data_num in vhost_scsi_handle_vq (nab)
  - Pass TARGET_PROT_ALL into transport_init_session_tags() (nab)
  - Convert vhost_scsi_handle_vq to use memcpy_fromiovecend() (mst)
  - Convert vhost_scsi_handle_vq to use pi_bytesout + pi_bytesin (nab)
  - Convert virtio_scsi_init_hdr_pi() to use pi_bytesout + pi_bytesin
(mst + paolo + nab)
  - Use blk_integrity-tuple_size to calculate pi bytes (nab)

Please review for v3.16-rc1 code.

Thanks!

--nab

Nicholas Bellinger (6):
  virtio-scsi.h: Add virtio_scsi_cmd_req_pi + VIRTIO_SCSI_F_T10_PI bits
  vhost/scsi: Move sanity check into vhost_scsi_map_iov_to_sgl
  vhost/scsi: Add preallocation of protection SGLs
  vhost/scsi: Add T10 PI IOV - SGL memory mapping logic
  vhost/scsi: Enable T10 PI IOV - SGL memory mapping
  virtio-scsi: Enable DIF/DIX modes in SCSI host LLD

 drivers/scsi/virtio_scsi.c  |   86 +---
 drivers/vhost/scsi.c|  305 +--
 include/linux/virtio_scsi.h |   15 ++-
 3 files changed, 292 insertions(+), 114 deletions(-)



Looks good, thanks!

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()

2014-05-22 Thread Paolo Bonzini

Il 21/05/2014 15:30, Bart Van Assche ha scritto:

+static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd)
+{
+   struct Scsi_Host *shost = scmd-device-host;
+   struct scsi_cmnd *c;
+   unsigned long flags;
+   bool ret = false;
+
+   if (!blk_rq_completed(scmd-request))
+   return true;
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   list_for_each_entry(c, shost-eh_cmd_q, eh_entry) {
+   if (c == scmd) {
+   ret = true;
+   break;
+   }
+   }
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   return ret;
+}
+
 /**
  * scmd_eh_abort_handler - Handle command aborts
  * @work:  command to be aborted.
@@ -120,6 +142,8 @@ scmd_eh_abort_handler(struct work_struct *work)
struct scsi_device *sdev = scmd-device;
int rtn;

+   WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));


What about a simpler, though less accuracte

WARN_ON(!blk_rq_completed(scmd-request));

that doesn't need the host_lock?

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-21 Thread Paolo Bonzini

Il 21/05/2014 16:16, Mark Wu ha scritto:

Is it possible that when scsi_done is invoked, the scsi command or the
request has been freed and then reallocated for a new I/O request?  Because
of this the bit flag REQ_ATOM_COMPLETE becomes unreliable.  Thanks!


It is up to the driver to ensure that the interrupt handler will not 
process the Scsi_Cmnd* after returning from the eh_abort_handler.  If 
you do this, what you say cannot happen.  Otherwise you'll get a variety 
of failures, the most common of which for me are OOPSes and a BUG in 
blk_finish_request.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD

2014-05-20 Thread Paolo Bonzini

Il 19/05/2014 21:07, Nicholas A. Bellinger ha scritto:

Hi MST,

So I've finally got some cycles to get back to this code, and wanted to
verify the outstanding items you had previously raised:

- Convert vhost-scsi to be independent of IOV layout using
  memcpy_fromiovecend. (Does this effect existing non PI virtio-scsi
  operation..?)
- Report VIRTIO_F_ANY_LAYOUT feature bit to userspace.
- Convert virtio_scsi_cmd_req_pi to bytes field instead of number of
  iovecs.


This is the only item really required, since the bytes field is what 
will be in VIRTIO 1.0.  The other would be nice to have, but not a 
blocker for PI support.



- Ensure virtio_scsi_cmd_req_pi is naturally aligned


mst already commented on this.


- Figure out why QEMU is not acking (any) vhost-scsi feature bits


This is a separate bug, isn't it?  It need not block PI support.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock.

2014-05-20 Thread Paolo Bonzini

Il 20/05/2014 07:12, Rusty Russell ha scritto:

This triggers every time we do a SCSI abort:

virtscsi_tmf - virtscsi_kick_cmd (grab lock and call) - virtscsi_add_cmd
- virtqueue_add_sgs (GFP_NOIO)

Logs look like this:
 sd 0:0:0:0: [sda] abort
 BUG: sleeping function called from invalid context at mm/slub.c:966
 in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: kworker/u2:0
 3 locks held by kworker/u2:0/6:
  #0:  (scsi_tmf_%dshost-host_no){..}, at: [c0153180] 
process_one_work+0xe0/0x3d0
  #1:  (((cmd-abort_work)-work)){..}, at: [c0153180] 
process_one_work+0xe0/0x3d0
  #2:  ((virtscsi_vq-vq_lock)-rlock){..}, at: [c043f508] 
virtscsi_kick_cmd+0x18/0x1b0
 CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 3.15.0-rc5+ #110
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org 04/01/2014
 Workqueue: scsi_tmf_0 scmd_eh_abort_handler

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index db3b494e5926..62757afd93bb 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -433,11 +433,10 @@ static void virtscsi_event_done(struct virtqueue *vq)
  * @cmd: command structure
  * @req_size   : size of the request buffer
  * @resp_size  : size of the response buffer
- * @gfp: flags to use for memory allocations
  */
 static int virtscsi_add_cmd(struct virtqueue *vq,
struct virtio_scsi_cmd *cmd,
-   size_t req_size, size_t resp_size, gfp_t gfp)
+   size_t req_size, size_t resp_size)
 {
struct scsi_cmnd *sc = cmd-sc;
struct scatterlist *sgs[4], req, resp;
@@ -469,19 +468,19 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
if (in)
sgs[out_num + in_num++] = in-sgl;

-   return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp);
+   return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
 }

 static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 struct virtio_scsi_cmd *cmd,
-size_t req_size, size_t resp_size, gfp_t gfp)
+size_t req_size, size_t resp_size)
 {
unsigned long flags;
int err;
bool needs_kick = false;

spin_lock_irqsave(vq-vq_lock, flags);
-   err = virtscsi_add_cmd(vq-vq, cmd, req_size, resp_size, gfp);
+   err = virtscsi_add_cmd(vq-vq, cmd, req_size, resp_size);
if (!err)
needs_kick = virtqueue_kick_prepare(vq-vq);

@@ -530,8 +529,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
memcpy(cmd-req.cmd.cdb, sc-cmnd, sc-cmd_len);

if (virtscsi_kick_cmd(req_vq, cmd,
- sizeof cmd-req.cmd, sizeof cmd-resp.cmd,
- GFP_ATOMIC) == 0)
+ sizeof cmd-req.cmd, sizeof cmd-resp.cmd) == 0)
ret = 0;
else
mempool_free(cmd, virtscsi_cmd_pool);
@@ -596,8 +594,7 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct 
virtio_scsi_cmd *cmd)

cmd-comp = comp;
if (virtscsi_kick_cmd(vscsi-ctrl_vq, cmd,
- sizeof cmd-req.tmf, sizeof cmd-resp.tmf,
- GFP_NOIO)  0)
+ sizeof cmd-req.tmf, sizeof cmd-resp.tmf)  0)
goto out;

wait_for_completion(comp);



Acked-by: Paolo Bonzini pbonz...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-20 Thread Paolo Bonzini

Il 20/05/2014 10:10, Bart Van Assche ha scritto:

REQ_ATOM_COMPLETE is already set before scsi_eh_scmd_add() is called
since that function is only invoked after the block layer has marked a
request as complete. The only callers of scsi_eh_scmd_add() are
scsi_softirq_done(), scsi_times_out() and scmd_eh_abort_handler(). That
last function is invoked (indirectly) by scsi_times_out().


Yes, and answering my own question, you cannot have a dangling pointer 
in eh_abort_handler (unless you have a driver bug).  This is because 
once eh_abort_handler is called, you know the interrupt handler will not 
trigger the softirq and scsi_finish_command won't be called.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Paolo Bonzini

Hi all,

I'm trying to understand asynchronous abort in the current upstream 
code, and the code seems to have some dubious locking.


Here are some examples of the issue:

1) dangling pointers: scsi_put_command calls cancel_delayed_work(), but 
that doesn't mean that the scmd_eh_abort_handler couldn't be already 
running.  If the scmd_eh_abort_handler starts while the softirq handler 
is calling scsi_put_command (e.g. scsi_finish_command - 
scsi_io_completion - scsi_end_request - scsi_next_command), the 
pointer to the Scsi_Cmnd* becomes invalid in the middle of the abort 
handler.


2) reentrancy: the softirq handler and scmd_eh_abort_handler can run 
concurrently, and call scsi_finish_command without any lock protecting 
the calls.  You can then get memory corruption.




I don't have any reproducer for this; we're seeing related crashes in 
virtio-scsi EH but those are due to a bug in the driver.  But it means 
that I have no sensible way to write the eh_abort_handler.


Example (1) means that the eh_abort_handler cannot use the passed 
Scsi_Cmnd, because it might not even be valid when entering the 
eh_abort_handler.  Example (2) means that the eh_abort_handler cannot 
return SUCCESS if it detects that the command has been completed in the 
meanwhile.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Paolo Bonzini

Il 19/05/2014 17:08, Bart Van Assche ha scritto:

On 05/19/14 16:08, Paolo Bonzini wrote:

2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
concurrently, and call scsi_finish_command without any lock protecting
the calls.  You can then get memory corruption.


I'm not sure what the recommended approach is to address this race. But
it is possible to address this in the LLD. See e.g. the srp_claim_req()
function in the SRP LLD and how it is invoked from the reply handler,
the abort handler and the reset handlers in that LLD.


That's not enough, unless I'm missing something.  Say the request 
handler claims the request and the abort handler doesn't:


- the request handler calls scsi_done and ends up in scsi_finish_command.

- the abort handler will return SUCCESS, and scmd_eh_abort_handler then 
calls scsi_finish_command.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-05-08 Thread Paolo Bonzini

Il 08/05/2014 07:24, Ming Lei ha scritto:

Access to tgt-req_vq is strictly serialized by spin_lock
of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary.

smp_read_barrier_depends() in virtscsi_req_done was introduced
to order reading req_vq and decreasing tgt-reqs, but it isn't
needed now because req_vq is read from
scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of
tgt-req_vq, so remove the unnecessary barrier.

Also remove related comment about the barrier.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei tom.leim...@gmail.com
---
v1:
- fix comment on decrements of reqs with writing of req_vq
as suggested by Paolo

 drivers/scsi/virtio_scsi.c |   55 +---
 1 file changed, 6 insertions(+), 49 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..bedea60 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -73,19 +73,12 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * An interesting effect of this policy is that only writes to req_vq need to
- * take the tgt_lock.  Read can be done outside the lock because:
+ * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
+ * could be done locklessly, but we do not do it yet.
  *
- * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
- *   In that case, no other CPU is reading req_vq: even if they were in
- *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
- *
- * - reads of req_vq only occur when the target is not idle (reqs != 0).
- *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
- *
- * Similarly, decrements of reqs are never concurrent with writes of req_vq.
- * Thus they can happen outside the tgt_lock, provided of course we make reqs
- * an atomic_t.
+ * Decrements of reqs are never concurrent with writes of req_vq because
+ * request is always completd after being queued. Thus they can happen
+ * outside the tgt_lock, provided of course we make reqs an atomic_t.


Decrements of reqs are never concurrent with writes of req_vq: before 
the decrement reqs will be != 0; after the decrement the virtqueue 
completion routine will not use the req_vq so it can be changed by a new 
request.  Thus they can happen outside the tgt_lock, provided of course 
we make reqs an atomic_t.


Otherwise looks good.

Paolo


  */
 struct virtio_scsi_target_state {
/* This spinlock never held at the same time as vq_lock. */
@@ -238,38 +231,6 @@ static void virtscsi_req_done(struct virtqueue *vq)
int index = vq-index - VIRTIO_SCSI_VQ_BASE;
struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index];

-   /*
-* Read req_vq before decrementing the reqs field in
-* virtscsi_complete_cmd.
-*
-* With barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read req_vq
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*
-* Possible reordering without barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*  read (wrong) req_vq
-*
-* We do not need a full smp_rmb, because req_vq is required to get
-* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored
-* in the virtqueue as the user token.
-*/
-   smp_read_barrier_depends();
-
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };

@@ -560,12 +521,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,

spin_lock_irqsave(tgt-tgt_lock, flags);

-   /*
-* The memory barrier after atomic_inc_return matches
-* the smp_read_barrier_depends() in virtscsi_req_done.
-*/
if (atomic_inc_return(tgt-reqs)  1)
-   vq = ACCESS_ONCE(tgt-req_vq);
+   vq = tgt-req_vq;
else {
queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-scsi: two questions related with picking up queue

2014-05-08 Thread Paolo Bonzini

Il 08/05/2014 12:44, Ming Lei ha scritto:

On Wed, 07 May 2014 18:43:45 +0200
Paolo Bonzini pbonz...@redhat.com wrote:



Per-CPU spinlocks have bad scalability problems, especially if you're
overcommitting.  Writing req_vq is not at all rare.


OK, thought about it further, and I believe seqcount may
be a match for the case, could you take a look at below patch?

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 13dd500..1adbad7 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include linux/seqlock.h

 #define VIRTIO_SCSI_MEMPOOL_SZ 64
 #define VIRTIO_SCSI_EVENT_LEN 8
@@ -73,18 +74,16 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
- * could be done locklessly, but we do not do it yet.
+ * tgt_seq is held to serialize reading and writing req_vq.
  *
  * Decrements of reqs are never concurrent with writes of req_vq: before the
  * decrement reqs will be != 0; after the decrement the virtqueue completion
  * routine will not use the req_vq so it can be changed by a new request.
- * Thus they can happen outside the tgt_lock, provided of course we make reqs
+ * Thus they can happen outside the tgt_seq, provided of course we make reqs
  * an atomic_t.
  */
 struct virtio_scsi_target_state {
-   /* This spinlock never held at the same time as vq_lock. */
-   spinlock_t tgt_lock;
+   seqcount_t tgt_seq;

/* Count of outstanding requests. */
atomic_t reqs;
@@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
unsigned long flags;
u32 queue_num;

-   spin_lock_irqsave(tgt-tgt_lock, flags);
+   local_irq_save(flags);
+   if (atomic_inc_return(tgt-reqs)  1) {
+   unsigned long seq;
+
+   do {
+   seq = read_seqcount_begin(tgt-tgt_seq);
+   vq = tgt-req_vq;
+   } while (read_seqcount_retry(tgt-tgt_seq, seq));
+   } else {
+   /* no writes can be concurrent because of atomic_t */
+   write_seqcount_begin(tgt-tgt_seq);
+
+   /* keep previous req_vq if there is reader found */
+   if (unlikely(atomic_read(tgt-reqs)  1)) {
+   vq = tgt-req_vq;
+   goto unlock;
+   }

queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
queue_num -= vscsi-num_queues;
tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+ unlock:
+   write_seqcount_end(tgt-tgt_seq);
}
+   local_irq_restore(flags);


I find this harder to think about than the double-check with a 
spin_lock_irqsave in the middle, and the read side is not lock free anymore.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-scsi: two questions related with picking up queue

2014-05-08 Thread Paolo Bonzini

Il 08/05/2014 14:55, Ming Lei ha scritto:

On Thu, May 8, 2014 at 8:17 PM, Paolo Bonzini pbonz...@redhat.com wrote:

Il 08/05/2014 12:44, Ming Lei ha scritto:


On Wed, 07 May 2014 18:43:45 +0200
Paolo Bonzini pbonz...@redhat.com wrote:



Per-CPU spinlocks have bad scalability problems, especially if you're
overcommitting.  Writing req_vq is not at all rare.



OK, thought about it further, and I believe seqcount may
be a match for the case, could you take a look at below patch?

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 13dd500..1adbad7 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
+#include linux/seqlock.h

 #define VIRTIO_SCSI_MEMPOOL_SZ 64
 #define VIRTIO_SCSI_EVENT_LEN 8
@@ -73,18 +74,16 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the
virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * tgt_lock is held to serialize reading and writing req_vq. Reading
req_vq
- * could be done locklessly, but we do not do it yet.
+ * tgt_seq is held to serialize reading and writing req_vq.
  *
  * Decrements of reqs are never concurrent with writes of req_vq: before
the
  * decrement reqs will be != 0; after the decrement the virtqueue
completion
  * routine will not use the req_vq so it can be changed by a new request.
- * Thus they can happen outside the tgt_lock, provided of course we make
reqs
+ * Thus they can happen outside the tgt_seq, provided of course we make
reqs
  * an atomic_t.
  */
 struct virtio_scsi_target_state {
-   /* This spinlock never held at the same time as vq_lock. */
-   spinlock_t tgt_lock;
+   seqcount_t tgt_seq;

/* Count of outstanding requests. */
atomic_t reqs;
@@ -521,19 +520,33 @@ static struct virtio_scsi_vq
*virtscsi_pick_vq(struct virtio_scsi *vscsi,
unsigned long flags;
u32 queue_num;

-   spin_lock_irqsave(tgt-tgt_lock, flags);
+   local_irq_save(flags);
+   if (atomic_inc_return(tgt-reqs)  1) {
+   unsigned long seq;
+
+   do {
+   seq = read_seqcount_begin(tgt-tgt_seq);
+   vq = tgt-req_vq;
+   } while (read_seqcount_retry(tgt-tgt_seq, seq));
+   } else {
+   /* no writes can be concurrent because of atomic_t */
+   write_seqcount_begin(tgt-tgt_seq);
+
+   /* keep previous req_vq if there is reader found */
+   if (unlikely(atomic_read(tgt-reqs)  1)) {
+   vq = tgt-req_vq;
+   goto unlock;
+   }

queue_num = smp_processor_id();
while (unlikely(queue_num = vscsi-num_queues))
queue_num -= vscsi-num_queues;
tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+ unlock:
+   write_seqcount_end(tgt-tgt_seq);
}
+   local_irq_restore(flags);



I find this harder to think about than the double-check with a
spin_lock_irqsave in the middle,


Sorry, could you explain it a bit? With seqcount, spin_lock
isn't needed, which should have been used for serialize
read and write.


Yes, the spin lock is not needed but you are still potentially spinning 
on the read side.



and the read side is not lock free anymore.


It is still lock free, because reader won't block reader, and
both read_seqcount_begin and read_seqcount_retry only
checks if there is writer in progress or being completed,
and the two helpers are very cheap.


Lock-free has a precise meaning, which is that the system will progress 
regardless of scheduling.  In this case, readers won't progress while 
the writer is preempted between write_seqcount_begin and write_seqcount_end.


My cmpxchg example had a lock-free read-side and a blocking write-side, 
while your code is the opposite, the write-side is lock-free and the 
read-side is blocking.


I'm not sure which is better.  You can try both and the current code, 
and show that some benchmarks improve.  Otherwise, it's better to leave 
the current code.  Simple code is better that complex code that was 
never benchmarked (which is why in the end I and Wanlong Gao settled for 
the simple spinlock).


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-scsi: two questions related with picking up queue

2014-05-07 Thread Paolo Bonzini

Il 07/05/2014 03:07, Ming Lei ha scritto:

Hi Paolo,

On Tue, May 6, 2014 at 9:17 PM, Paolo Bonzini pbonz...@redhat.com wrote:

Il 06/05/2014 11:26, Ming Lei ha scritto:


Hi Paolo and All,

One question is about ACCESS_ONCE() in virtscsi_pick_vq(),
looks it needn't since both reading and writing tgt-req_vq holds
tgt-tgt_lock.



You're right.  It should be possible to avoid the lock in virtscsi_pick_vq
like this:

value = atomic_read(tgt-reqs);


I am wondering if atomic_read() is OK because  atomic_read()
should be treated as a simple C statement, and it may not reflect
the latest value of tgt-reqs.


It would be caught by cmpxchg below.


retry:
if (value != 0) {
old_value = atomic_cmpxchg(tgt-regs, value, value + 1)
if (old_value != value) {


Maybe ' if (old_value != value  !old_value) ' is a bit better.


No, because if you have failed the cmpxchg you haven't incremented 
tgt-reqs.



value = old_value;
goto retry;
}

vq = ACCESS_ONCE(tgt-req_vq);
} else {
spin_lock_irqsave(tgt-tgt_lock, flags);

// tgt-reqs may not be 0 anymore, need to recheck
value = atomic_read(tgt-reqs);
if (atomic_read(tgt-reqs) != 0) {
spin_unlock_irqrestore(tgt-tgt_lock, flags);
goto retry;
}


Same with above, if atomic_read() still returns zero even
after it is increased in read path from another CPU, then
an obsolete vq pointer might be seen in the read path.


If I understand you correctly, then the CPUs wouldn't be cache-coherent. 
 You would have bigger problems.




// tgt-reqs now will remain fixed to 0.
...
tgt-req_vq = vq = ...;
smp_wmb();
atomic_set(tgt-reqs, 1);
spin_unlock_irqrestore(tgt-tgt_lock, flags);
}

return vq;

and then you would need the ACCESS_ONCE but I'm not sure it's worthwhile.



Yes, I agree, :-)




Another one is about the comment in virtscsi_req_done(), which
said smp_read_barrier_depends() is needed for avoiding
out of order between reading req_vq and decreasing tgt-reqs.
But if I understand correctly, in virtscsi_req_done(), req_vq is
read from vscsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE],
instead of tgt-req_vq, and the former won't change wrt.
inc/dec tgt-reqs, so can the barrier be removed?



Right.  The comment is obsolete and dates to before vq-index existed.


OK, I will cook a patch to remove the barrier and cleanup the comment.


Thanks.  Please remove the ACCESS_ONCE too.

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] virtio_scsi: remove smp_read_barrier_depends

2014-05-07 Thread Paolo Bonzini

Il 07/05/2014 16:13, Ming Lei ha scritto:

order reading req_vq and decreasing tgt-reqs, but
it isn't needed now because req_vq is read from
scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE]
instead of tgt-req_vq, so remove the unnecessary
barrier.

Also remove related comment about the barrier.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei tom.leim...@gmail.com
---
 drivers/scsi/virtio_scsi.c |   52 +++-
 1 file changed, 3 insertions(+), 49 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index aead20f..0c77ed6 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -73,19 +73,9 @@ struct virtio_scsi_vq {
  * queue, and also lets the driver optimize the IRQ affinity for the virtqueues
  * (each virtqueue's affinity is set to the CPU that owns the queue).
  *
- * An interesting effect of this policy is that only writes to req_vq need to
- * take the tgt_lock.  Read can be done outside the lock because:
- *
- * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
- *   In that case, no other CPU is reading req_vq: even if they were in
- *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
- *
- * - reads of req_vq only occur when the target is not idle (reqs != 0).
- *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
- *
- * Similarly, decrements of reqs are never concurrent with writes of req_vq.
- * Thus they can happen outside the tgt_lock, provided of course we make reqs
- * an atomic_t.


You are removing a bit too much; decrements of reqs are still done 
outside the tgt_lock, and you need to explain why.



+ * tgt_lock is held to serialize reading and writing req_vq, and reading
+ * req_vq should have been done concurrently, but need a bit complicated
+ * trick.


tgt_lock is held to serialize reading and writing req_vq.  Reading 
req_vq could be done locklessly, but we do not do it yet.


You can also merge the two patches together.

Paolo


  */
 struct virtio_scsi_target_state {
/* This spinlock never held at the same time as vq_lock. */
@@ -238,38 +228,6 @@ static void virtscsi_req_done(struct virtqueue *vq)
int index = vq-index - VIRTIO_SCSI_VQ_BASE;
struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index];

-   /*
-* Read req_vq before decrementing the reqs field in
-* virtscsi_complete_cmd.
-*
-* With barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read req_vq
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*
-* Possible reordering without barriers:
-*
-*  CPU #0  virtscsi_queuecommand_multi (CPU #1)
-*  
-*  lock vq_lock
-*  read reqs (reqs = 1)
-*  write reqs (reqs = 0)
-*  increment reqs (reqs = 1)
-*  write req_vq
-*  read (wrong) req_vq
-*
-* We do not need a full smp_rmb, because req_vq is required to get
-* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored
-* in the virtqueue as the user token.
-*/
-   smp_read_barrier_depends();
-
virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
 };

@@ -560,10 +518,6 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,

spin_lock_irqsave(tgt-tgt_lock, flags);

-   /*
-* The memory barrier after atomic_inc_return matches
-* the smp_read_barrier_depends() in virtscsi_req_done.
-*/
if (atomic_inc_return(tgt-reqs)  1)
vq = tgt-req_vq;
else {



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-05-07 Thread Paolo Bonzini
Il 07/05/2014 16:34, Ming Lei ha scritto:
   *
 - * An interesting effect of this policy is that only writes to req_vq need to
 - * take the tgt_lock.  Read can be done outside the lock because:
 - *
 - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 
 1.
 - *   In that case, no other CPU is reading req_vq: even if they were in
 - *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
 - *
 - * - reads of req_vq only occur when the target is not idle (reqs != 0).
 - *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
 + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq
 + * could be done locklessly, but we do not do it yet.
   *
   * Similarly, decrements of reqs are never concurrent with writes of req_vq.
   * Thus they can happen outside the tgt_lock, provided of course we make reqs

The part you deleted explains _why_ decrements of reqs are never 
concurrent with writes of req_vq.  Perhaps:

 * An interesting effect of this policy is that only increments to reqs and 
writes
 * to req_vq need to take the tgt_lock.  Reads of req_vq and decrements or 
req_vq
 * can be done outside the lock because:
 *
 * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1.
 *   In that case, no other CPU is reading req_vq: even if they were in
 *   virtscsi_queuecommand_multi, they would be spinning on tgt_lock.
 *
 * - reads of req_vq only occur when the target is not idle (reqs != 0).
 *   A CPU that enters virtscsi_queuecommand_multi will not modify req_vq.
 *
 * - likewise, decrements of reqs only occur when reqs != 0.  If the decremented
 *   value is zero, the first CPU that enters virtscsi_queuecommand_multi will
 *   modify req_vq and the others will spin on tgt_lock.
 *
 * We do not try to read req_vq locklessly for simplicity, so tgt_lock is used
 * to serialize reads of req_vq too.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()

2014-05-07 Thread Paolo Bonzini

Il 07/05/2014 18:03, Ming Lei ha scritto:

 *
 * - likewise, decrements of reqs only occur when reqs != 0.  If the decremented
 *   value is zero, the first CPU that enters virtscsi_queuecommand_multi will
 *   modify req_vq and the others will spin on tgt_lock.


The fact should be obvious too,


Perhaps, but in your patch you're leaving a Similarly that doesn't 
apply anymore.



  * We do not try to read req_vq locklessly for simplicity, so tgt_lock is used
  * to serialize reads of req_vq too.

Maybe it is better to just mention it isn't done yet, and maybe someone
will figure out simple way to support lockless reading req_vq.


The one I sketched is not too hard.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-scsi: two questions related with picking up queue

2014-05-07 Thread Paolo Bonzini

Il 07/05/2014 18:24, Ming Lei ha scritto:

On Tue, 06 May 2014 15:17:15 +0200
Paolo Bonzini pbonz...@redhat.com wrote:


Il 06/05/2014 11:26, Ming Lei ha scritto:

Hi Paolo and All,

One question is about ACCESS_ONCE() in virtscsi_pick_vq(),
looks it needn't since both reading and writing tgt-req_vq holds
tgt-tgt_lock.


You're right.  It should be possible to avoid the lock in
virtscsi_pick_vq like this:

value = atomic_read(tgt-reqs);
retry:
if (value != 0) {
old_value = atomic_cmpxchg(tgt-regs, value, value + 1)
if (old_value != value) {
value = old_value;
goto retry;
}

smp_mb__after_atomic_cmpxchg(); // you get the idea :)
vq = ACCESS_ONCE(tgt-req_vq);
} else {
spin_lock_irqsave(tgt-tgt_lock, flags);

// tgt-reqs may not be 0 anymore, need to recheck
value = atomic_read(tgt-reqs);
if (atomic_read(tgt-reqs) != 0) {
spin_unlock_irqrestore(tgt-tgt_lock, flags);
goto retry;
}

// tgt-reqs now will remain fixed to 0.
...
tgt-req_vq = vq = ...;
smp_wmb();
atomic_set(tgt-reqs, 1);
spin_unlock_irqrestore(tgt-tgt_lock, flags);
}

return vq;



Another approach I thought of is to use percpu spinlock, and
the idea is simple:

- all perpcu locks are held for writing req_vq, and
- only percpu lock is needed for reading req_vq.

What do you think about the below patch?


Per-CPU spinlocks have bad scalability problems, especially if you're 
overcommitting.  Writing req_vq is not at all rare.


Paolo


diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 697fa53..00deab4 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -82,7 +82,7 @@ struct virtio_scsi_vq {
  */
 struct virtio_scsi_target_state {
/* This spinlock never held at the same time as vq_lock. */
-   spinlock_t tgt_lock;
+   spinlock_t __percpu *lock;

/* Count of outstanding requests. */
atomic_t reqs;
@@ -517,21 +517,46 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct 
virtio_scsi *vscsi,
 {
struct virtio_scsi_vq *vq;
unsigned long flags;
-   u32 queue_num;
+   u32 cpu = get_cpu();
+   spinlock_t  *lock = per_cpu_ptr(tgt-lock, cpu);

-   spin_lock_irqsave(tgt-tgt_lock, flags);
+   spin_lock_irqsave(lock, flags);

if (atomic_inc_return(tgt-reqs)  1)
vq = tgt-req_vq;
else {
-   queue_num = smp_processor_id();
+   u32 queue_num = cpu;
+   int i;
+
while (unlikely(queue_num = vscsi-num_queues))
queue_num -= vscsi-num_queues;

-   tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+   /*
+* there should be only one writing because of atomic
+* counter, so we don't worry about deadlock, but
+* might need to teach lockdep to not complain it
+*/
+   for_each_possible_cpu(i) {
+   spinlock_t *other = per_cpu_ptr(tgt-lock, i);
+   if (i != cpu)
+   spin_lock(other);
+   }
+
+   /* only update req_vq when reqs is one*/
+   if (atomic_read(tgt-reqs) == 1)
+   tgt-req_vq = vq = vscsi-req_vqs[queue_num];
+   else
+   vq = tgt-req_vq;
+
+   for_each_possible_cpu(i) {
+   spinlock_t *other = per_cpu_ptr(tgt-lock, i);
+   if (i != cpu)
+   spin_unlock(other);
+   }
}

-   spin_unlock_irqrestore(tgt-tgt_lock, flags);
+   spin_unlock_irqrestore(lock, flags);
+   put_cpu();
return vq;
 }

@@ -618,10 +643,22 @@ static int virtscsi_target_alloc(struct scsi_target 
*starget)
 {
struct virtio_scsi_target_state *tgt =
kmalloc(sizeof(*tgt), GFP_KERNEL);
+   int i;
+
if (!tgt)
return -ENOMEM;

-   spin_lock_init(tgt-tgt_lock);
+   tgt-lock = alloc_percpu(spinlock_t);
+   if (!tgt-lock) {
+   kfree(tgt);
+   return -ENOMEM;
+   }
+
+   for_each_possible_cpu(i) {
+   spinlock_t *lock = per_cpu_ptr(tgt-lock, i);
+   spin_lock_init(lock);
+   }
+
atomic_set(tgt-reqs, 0);
tgt-req_vq = NULL;

@@ -632,6 +669,7 @@ static int virtscsi_target_alloc(struct scsi_target 
*starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
struct virtio_scsi_target_state *tgt = starget-hostdata;
+   free_percpu(tgt-lock

Re: virtio-scsi: two questions related with picking up queue

2014-05-06 Thread Paolo Bonzini

Il 06/05/2014 11:26, Ming Lei ha scritto:

Hi Paolo and All,

One question is about ACCESS_ONCE() in virtscsi_pick_vq(),
looks it needn't since both reading and writing tgt-req_vq holds
tgt-tgt_lock.


You're right.  It should be possible to avoid the lock in 
virtscsi_pick_vq like this:


value = atomic_read(tgt-reqs);
retry:
if (value != 0) {
old_value = atomic_cmpxchg(tgt-regs, value, value + 1)
if (old_value != value) {
value = old_value;
goto retry;
}

smp_mb__after_atomic_cmpxchg(); // you get the idea :)
vq = ACCESS_ONCE(tgt-req_vq);
} else {
spin_lock_irqsave(tgt-tgt_lock, flags);

// tgt-reqs may not be 0 anymore, need to recheck
value = atomic_read(tgt-reqs);
if (atomic_read(tgt-reqs) != 0) {
spin_unlock_irqrestore(tgt-tgt_lock, flags);
goto retry;
}

// tgt-reqs now will remain fixed to 0.
...
tgt-req_vq = vq = ...;
smp_wmb();
atomic_set(tgt-reqs, 1);
spin_unlock_irqrestore(tgt-tgt_lock, flags);
}

return vq;

and then you would need the ACCESS_ONCE but I'm not sure it's worthwhile.


Another one is about the comment in virtscsi_req_done(), which
said smp_read_barrier_depends() is needed for avoiding
out of order between reading req_vq and decreasing tgt-reqs.
But if I understand correctly, in virtscsi_req_done(), req_vq is
read from vscsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE],
instead of tgt-req_vq, and the former won't change wrt.
inc/dec tgt-reqs, so can the barrier be removed?


Right.  The comment is obsolete and dates to before vq-index existed.

Paolo


Any comments about the above?

Thanks,



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] SCSI fixes for 3.15-rc3

2014-04-28 Thread Paolo Bonzini

Il 25/04/2014 16:59, James Bottomley ha scritto:

This is a set of seven fixes, three (hpsa) and free'd command references
correcting bugs in the last round of updates and the remaining four
correcting problems within the SCSI error handler that was causing a
deadlock within USB.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Alan Stern (1):
  Fix command result state propagation

Christoph Hellwig (2):
  don't reference freed command in scsi_prep_return
  don't reference freed command in scsi_init_sgtable

Hannes Reinecke (1):
  Fix USB deadlock caused by SCSI error handling

James Bottomley (2):
  More USB deadlock fixes
  Fix spurious request sense in error handling

scame...@beardog.cce.hp.com (1):
  hpsa: fix NULL dereference in hpsa_put_ctlr_into_performant_mode()

And the diffstat:

 drivers/scsi/hpsa.c   |  8 
 drivers/scsi/scsi_error.c | 12 
 drivers/scsi/scsi_lib.c   |  6 --
 3 files changed, 20 insertions(+), 6 deletions(-)

With full diffs below.


James,

can you queue http://article.gmane.org/gmane.linux.kernel/1682301/raw 
please?


Thanks,

Paolo


James

---

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 8cf4a0c..9a6e4a2 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -7463,6 +7463,10 @@ static void hpsa_put_ctlr_into_performant_mode(struct 
ctlr_info *h)
if (hpsa_simple_mode)
return;

+   trans_support = readl((h-cfgtable-TransportSupport));
+   if (!(trans_support  PERFORMANT_MODE))
+   return;
+
/* Check for I/O accelerator mode support */
if (trans_support  CFGTBL_Trans_io_accel1) {
transMethod |= CFGTBL_Trans_io_accel1 |
@@ -7479,10 +7483,6 @@ static void hpsa_put_ctlr_into_performant_mode(struct 
ctlr_info *h)
}

/* TODO, check that this next line h-nreply_queues is correct */
-   trans_support = readl((h-cfgtable-TransportSupport));
-   if (!(trans_support  PERFORMANT_MODE))
-   return;
-
h-nreply_queues = h-msix_vector  0 ? h-msix_vector : 1;
hpsa_get_max_perf_mode_cmds(h);
/* Performant mode ring buffer and supporting data structures */
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 771c16b..f17aa7a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -189,6 +189,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
/*
 * Retry after abort failed, escalate to next level.
 */
+   scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED;
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
scmd %p previous abort failed\n, scmd));
@@ -920,10 +921,12 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct 
scsi_eh_save *ses,
ses-prot_op = scmd-prot_op;

scmd-prot_op = SCSI_PROT_NORMAL;
+   scmd-eh_eflags = 0;
scmd-cmnd = ses-eh_cmnd;
memset(scmd-cmnd, 0, BLK_MAX_CDB);
memset(scmd-sdb, 0, sizeof(scmd-sdb));
scmd-request-next_rq = NULL;
+   scmd-result = 0;

if (sense_bytes) {
scmd-sdb.length = min_t(unsigned, SCSI_SENSE_BUFFERSIZE,
@@ -1157,6 +1160,15 @@ int scsi_eh_get_sense(struct list_head *work_q,
 __func__));
break;
}
+   if (status_byte(scmd-result) != CHECK_CONDITION)
+   /*
+* don't request sense if there's no check condition
+* status because the error we're processing isn't one
+* that has a sense code (and some devices get
+* confused by sense requests out of the blue)
+*/
+   continue;
+
SCSI_LOG_ERROR_RECOVERY(2, scmd_printk(KERN_INFO, scmd,
  %s: requesting sense\n,
  current-comm));
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65a123d..9db097a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -137,6 +137,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int 
reason, int unbusy)
 * lock such that the kblockd_schedule_work() call happens
 * before blk_cleanup_queue() finishes.
 */
+   cmd-result = 0;
spin_lock_irqsave(q-queue_lock, flags);
blk_requeue_request(q, cmd-request);
kblockd_schedule_work(q, device-requeue_work);
@@ -1044,6 +1045,7 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb,
  */
 int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
+   struct scsi_device *sdev = cmd-device;
struct request *rq = 

Re: [PATCH 1/6] virtio-scsi.h: Add virtio_scsi_cmd_req_pi + VIRTIO_SCSI_F_T10_PI bits

2014-04-12 Thread Paolo Bonzini

Il 09/04/2014 09:22, Michael S. Tsirkin ha scritto:

 The interface uses bytes instead of iovecs as the unit, and
 vhost-scsi can add the (temporary...) requirement that do_pi_nbytes
 and di_pi_nbytes comprise an integer number of iovecs.

 Paolo

That would be a reasonable intermediate step, but I'm worried there
are more assumptions lurking in tere.

In particular is it legal for PI to be in the same iov
entry as data? If not they really need to use a
separate buf entry.


No, in fact that would mean that do/di_pi_nbytes are not an integer 
number of iovecs.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq

2014-04-11 Thread Paolo Bonzini
 put the if inside the loop, but it's really all or nothing since the 
failure point is find_vqs.


Not a problem though; the queues are few and this is not a hot path anyway.

Acked-by: Paolo Bonzini pbonz...@redhat.com

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq

2014-04-11 Thread Paolo Bonzini

Il 11/04/2014 03:23, Fam Zheng ha scritto:

virtscsi_init calls virtscsi_remove_vqs on err, even before initializing
the vqs. The latter calls virtscsi_set_affinity, so let's check the
pointer there before setting affinity on it.

This fixes a panic when setting device's num_queues=2 on RHEL 6.5:

qemu-system-x86_64 ... \
-device virtio-scsi-pci,id=scsi0,addr=0x13,...,num_queues=2 \
-drive file=/stor/vm/dummy.raw,id=drive-scsi-disk,... \
-device scsi-hd,drive=drive-scsi-disk,...

[0.354734] scsi0 : Virtio SCSI HBA
[0.379504] BUG: unable to handle kernel NULL pointer dereference at 
0020
[0.380141] IP: [814741ef] __virtscsi_set_affinity+0x4f/0x120
[0.380141] PGD 0
[0.380141] Oops:  [#1] SMP
[0.380141] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #5
[0.380141] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[0.380141] task: 88003c9f ti: 88003c9f8000 task.ti: 
88003c9f8000
[0.380141] RIP: 0010:[814741ef]  [814741ef] 
__virtscsi_set_affinity+0x4f/0x120
[0.380141] RSP: :88003c9f9c08  EFLAGS: 00010256
[0.380141] RAX:  RBX: 88003c3a9d40 RCX: 1070
[0.380141] RDX: 0002 RSI:  RDI: 
[0.380141] RBP: 88003c9f9c28 R08: 000136c0 R09: 88003c801c00
[0.380141] R10: 81475229 R11: 0008 R12: 
[0.380141] R13: 81cc7ca8 R14: 88003cac3d40 R15: 88003cac37a0
[0.380141] FS:  () GS:88003e40() 
knlGS:
[0.380141] CS:  0010 DS:  ES:  CR0: 8005003b
[0.380141] CR2: 0020 CR3: 01c0e000 CR4: 06f0
[0.380141] Stack:
[0.380141]  88003c3a9d40  88003cac3d80 
88003cac3d40
[0.380141]  88003c9f9c48 814742e8 88003c26d000 
88003c26d000
[0.380141]  88003c9f9c68 81474321 88003c26d000 
88003c3a9d40
[0.380141] Call Trace:
[0.380141]  [814742e8] virtscsi_set_affinity+0x28/0x40
[0.380141]  [81474321] virtscsi_remove_vqs+0x21/0x50
[0.380141]  [81475231] virtscsi_init+0x91/0x240
[0.380141]  [81365290] ? vp_get+0x50/0x70
[0.380141]  [81475544] virtscsi_probe+0xf4/0x280
[0.380141]  [81363ea5] virtio_dev_probe+0xe5/0x140
[0.380141]  [8144c669] driver_probe_device+0x89/0x230
[0.380141]  [8144c8ab] __driver_attach+0x9b/0xa0
[0.380141]  [8144c810] ? driver_probe_device+0x230/0x230
[0.380141]  [8144c810] ? driver_probe_device+0x230/0x230
[0.380141]  [8144ac1c] bus_for_each_dev+0x8c/0xb0
[0.380141]  [8144c499] driver_attach+0x19/0x20
[0.380141]  [8144bf28] bus_add_driver+0x198/0x220
[0.380141]  [8144ce9f] driver_register+0x5f/0xf0
[0.380141]  [81d27c91] ? spi_transport_init+0x79/0x79
[0.380141]  [8136403b] register_virtio_driver+0x1b/0x30
[0.380141]  [81d27d19] init+0x88/0xd6
[0.380141]  [81d27c18] ? scsi_init_procfs+0x5b/0x5b
[0.380141]  [81ce88a7] do_one_initcall+0x7f/0x10a
[0.380141]  [81ce8aa7] kernel_init_freeable+0x14a/0x1de
[0.380141]  [81ce8b3b] ? kernel_init_freeable+0x1de/0x1de
[0.380141]  [817dec20] ? rest_init+0x80/0x80
[0.380141]  [817dec29] kernel_init+0x9/0xf0
[0.380141]  [817e68fc] ret_from_fork+0x7c/0xb0
[0.380141]  [817dec20] ? rest_init+0x80/0x80
[0.380141] RIP  [814741ef] __virtscsi_set_affinity+0x4f/0x120
[0.380141]  RSP 88003c9f9c08
[0.380141] CR2: 0020
[0.380141] ---[ end trace 8074b70c3d5e1d73 ]---
[0.475018] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009
[0.475018]
[0.475068] Kernel Offset: 0x0 from 0x8100 (relocation range: 
0x8000-0x9fff)
[0.475068] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009

Signed-off-by: Fam Zheng f...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..3019267 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -750,8 +750,12 @@ static void __virtscsi_set_affinity(struct virtio_scsi 
*vscsi, bool affinity)

vscsi-affinity_hint_set = true;
} else {
-   for (i = 0; i  vscsi-num_queues; i++)
+   for (i = 0; i  vscsi-num_queues; i++) {
+   if (!vscsi-req_vqs[i].vq) {
+   continue;
+   }
virtqueue_set_affinity(vscsi-req_vqs[i].vq, -1);
+   }

vscsi-affinity_hint_set = false;
}



Fam, 

Re: [PATCH 16/39] virtio_scsi: use cmd_size

2014-03-25 Thread Paolo Bonzini

Il 25/03/2014 16:31, Christoph Hellwig ha scritto:

Paolo,

do you have a virtio_scsi tree to picks this up in?  We now have all
the infrastructure for it in James' tree.


I don't, I usually just give my Acked-by and James picks it up.

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFCv2 5/7] vhost/scsi: Enable T10 PI IOV - SGL memory mapping

2014-03-17 Thread Paolo Bonzini
Il 17/03/2014 06:32, Nicholas A. Bellinger ha scritto:
 + if (vq-iov[0].iov_len == sizeof(v_req_pi)) {
 + req = (unsigned char *)v_req_pi;
 + target = v_req_pi.lun[1];
 + req_size = sizeof(v_req_pi);
 + hdr_pi = true;
 + } else if (vq-iov[0].iov_len == sizeof(v_req)) {
 + req = (unsigned char *)v_req;
 + target = v_req.lun[1];
 + req_size = sizeof(v_req);
 + hdr_pi = false;

The right check here is on the negotiated features.

You need a matching QEMU patch to enable the protection information
feature, like this (untested):

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3983a5b..4c8d5cd 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -154,6 +154,9 @@ static uint32_t vhost_scsi_get_features(VirtIODevice *vdev,
 if (!(s-dev.features  (1  VIRTIO_SCSI_F_HOTPLUG))) {
 features = ~(1  VIRTIO_SCSI_F_HOTPLUG);
 }
+if (!(s-dev.features  (1  VIRTIO_SCSI_F_T10_PI))) {
+features = ~(1  VIRTIO_SCSI_F_T10_PI);
+}
 
 return features;
 }
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6610b3a..4a551e9 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -629,6 +629,9 @@ static void virtio_scsi_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+/* Protection information is not supported yet.  */
+dev-guest_features = ~VIRTIO_SCSI_F_T10_PI;
+
 scsi_bus_new(s-bus, sizeof(s-bus), dev,
  virtio_scsi_scsi_info, vdev-bus_name);
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 9010246..8621fbf 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -267,6 +267,16 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 .value= no,\
 },\
 {\
+.driver   = virtio-scsi-pci,\
+.property = prot_info,\
+.value= off,\
+},\
+{\
+.driver   = vhost-scsi-pci,\
+.property = prot_info,\
+.value= off,\
+},\
+{\
 .driver   = PIIX4_PM,\
 .property = acpi-pci-hotplug-with-bridge-support,\
 .value= off,\
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 42b1024..a555f49 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -34,6 +34,7 @@
 #define VIRTIO_SCSI_F_INOUT0
 #define VIRTIO_SCSI_F_HOTPLUG  1
 #define VIRTIO_SCSI_F_CHANGE   2
+#define VIRTIO_SCSI_F_T10_PI   3
 
 #define VIRTIO_SCSI_VQ_SIZE 128
 #define VIRTIO_SCSI_CDB_SIZE32
@@ -184,7 +185,9 @@ typedef struct {
 DEFINE_PROP_BIT(hotplug, _state, _feature_field, VIRTIO_SCSI_F_HOTPLUG,  
\
true),  
\
 DEFINE_PROP_BIT(param_change, _state, _feature_field,
\
-VIRTIO_SCSI_F_CHANGE, true)
+VIRTIO_SCSI_F_CHANGE, true)
\
+DEFINE_PROP_BIT(prot_info, _state, _feature_field,   
\
+VIRTIO_SCSI_F_T10_PI, true)
 
 void virtio_scsi_common_realize(DeviceState *dev, Error **errp);
 void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp);

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Change sense buffer size to 252

2014-03-06 Thread Paolo Bonzini

Il 06/03/2014 09:47, Fam Zheng ha scritto:

According to SPC-4, section 4.5.2.1, 252 is the limit of sense data. So
increase the value.

Signed-off-by: Fam Zheng f...@redhat.com
---
 include/linux/virtio_scsi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 4195b97..a437f7f 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -28,7 +28,7 @@
 #define _LINUX_VIRTIO_SCSI_H

 #define VIRTIO_SCSI_CDB_SIZE   32
-#define VIRTIO_SCSI_SENSE_SIZE 96
+#define VIRTIO_SCSI_SENSE_SIZE 252

 /* SCSI command request, followed by data-out */
 struct virtio_scsi_cmd_req {



Hi Fam, how did you test this?

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Change sense buffer size to 252

2014-03-06 Thread Paolo Bonzini

Il 06/03/2014 12:22, Hannes Reinecke ha scritto:

On 03/06/2014 11:09 AM, Paolo Bonzini wrote:

Il 06/03/2014 09:47, Fam Zheng ha scritto:

According to SPC-4, section 4.5.2.1, 252 is the limit of sense
data. So
increase the value.

Signed-off-by: Fam Zheng f...@redhat.com
---
 include/linux/virtio_scsi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_scsi.h
b/include/linux/virtio_scsi.h
index 4195b97..a437f7f 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -28,7 +28,7 @@
 #define _LINUX_VIRTIO_SCSI_H

 #define VIRTIO_SCSI_CDB_SIZE   32
-#define VIRTIO_SCSI_SENSE_SIZE 96
+#define VIRTIO_SCSI_SENSE_SIZE 252

 /* SCSI command request, followed by data-out */
 struct virtio_scsi_cmd_req {



Hi Fam, how did you test this?


Is there a specific reason _not_ to use the linux default?
The SCSI stack typically limits the sense code to
SCSI_SENSE_BUFFERSIZE, so using other values have a
limited sense.
Literally :-)


Indeed I don't think this patch makes a difference.  Though I asked not 
from the SCSI stack perspective, but because right now both virtio-scsi 
targets (QEMU and vhost-scsi) are also truncating at 96.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost/scsi: Check LUN structure byte 0 is set to 1, per spec

2014-02-24 Thread Paolo Bonzini
 The virtio spec requires byte 0 of the virtio-scsi LUN structure
 to be '1'.
 
 Signed-off-by: Venkatesh Srinivas venkate...@google.com
 ---
   drivers/vhost/scsi.c | 6 ++
   1 file changed, 6 insertions(+)
 
 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index 0a025b8..e48d4a6 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -1001,6 +1001,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct
 vhost_virtqueue *vq)
   break;
   }
   
 + /* virtio-scsi spec requires byte 0 of the lun to be 1 */
 + if (unlikely(v_req.lun[0] != 1)) {
 + vhost_scsi_send_bad_target(vs, vq, head, out);
 + continue;
 + }
 +
   /* Extract the tpgt */
   target = v_req.lun[1];
   tpg = ACCESS_ONCE(vs_tpg[target]);
 --
 1.8.5.3
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/17] scsi: avoid useless free_list lock roundtrips

2014-02-07 Thread Paolo Bonzini

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:

Avoid hitting the host-wide free_list lock unless we need to put a command
back onto the freelist.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 drivers/scsi/scsi.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ebcea6c..4139486 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -320,15 +320,16 @@ void __scsi_put_command(struct Scsi_Host *shost, struct 
scsi_cmnd *cmd,
 {
unsigned long flags;

-   /* changing locks here, don't need to restore the irq state */
-   spin_lock_irqsave(shost-free_list_lock, flags);
if (unlikely(list_empty(shost-free_list))) {
-   list_add(cmd-list, shost-free_list);
-   cmd = NULL;
+   spin_lock_irqsave(shost-free_list_lock, flags);
+   if (list_empty(shost-free_list)) {
+   list_add(cmd-list, shost-free_list);
+   cmd = NULL;
+   }
+   spin_unlock_irqrestore(shost-free_list_lock, flags);
}
-   spin_unlock_irqrestore(shost-free_list_lock, flags);

-   if (likely(cmd != NULL))
+   if (cmd)
scsi_pool_free_command(shost-cmd_pool, cmd);

put_device(dev);


Reviewed-by: Paolo Bonzini pbonz...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/17] scsi: add support for per-host cmd pools

2014-02-07 Thread Paolo Bonzini

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:

+   pool = scsi_find_host_cmd_pool(shost);


Should you have a WARN_ON somewhere if shost-hostt-cmd_size  
shost-unchecked_isa_dma?


Apart from this,

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo


+   if (!pool) {
+   pool = scsi_alloc_host_cmd_pool(shost);
+   if (!pool)
+   goto out;
+   }
+


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/17] virtio_scsi: use cmd_size

2014-02-07 Thread Paolo Bonzini

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:

Taken almost entirely from Nicholas Bellinger's scsi-mq conversion.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 drivers/scsi/virtio_scsi.c |   25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..d9a6074 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -204,7 +204,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi 
*vscsi, void *buf)
set_driver_byte(sc, DRIVER_SENSE);
}

-   mempool_free(cmd, virtscsi_cmd_pool);
sc-scsi_done(sc);

atomic_dec(tgt-reqs);
@@ -279,8 +278,6 @@ static void virtscsi_complete_free(struct virtio_scsi 
*vscsi, void *buf)

if (cmd-comp)
complete_all(cmd-comp);
-   else
-   mempool_free(cmd, virtscsi_cmd_pool);
 }

 static void virtscsi_ctrl_done(struct virtqueue *vq)
@@ -496,10 +493,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 struct virtio_scsi_vq *req_vq,
 struct scsi_cmnd *sc)
 {
-   struct virtio_scsi_cmd *cmd;
-   int ret;
-
struct Scsi_Host *shost = virtio_scsi_host(vscsi-vdev);
+   struct virtio_scsi_cmd *cmd = (struct virtio_scsi_cmd *)(sc + 1);
+
BUG_ON(scsi_sg_count(sc)  shost-sg_tablesize);

/* TODO: check feature bit and fail if unsupported?  */
@@ -508,11 +504,6 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
dev_dbg(sc-device-sdev_gendev,
cmd %p CDB: %#02x\n, sc, sc-cmnd[0]);

-   ret = SCSI_MLQUEUE_HOST_BUSY;
-   cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC);
-   if (!cmd)
-   goto out;
-
memset(cmd, 0, sizeof(*cmd));
cmd-sc = sc;
cmd-req.cmd = (struct virtio_scsi_cmd_req){
@@ -531,13 +522,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,

if (virtscsi_kick_cmd(req_vq, cmd,
  sizeof cmd-req.cmd, sizeof cmd-resp.cmd,
- GFP_ATOMIC) == 0)
-   ret = 0;
-   else
-   mempool_free(cmd, virtscsi_cmd_pool);
-
-out:
-   return ret;
+ GFP_ATOMIC) != 0)
+   return SCSI_MLQUEUE_HOST_BUSY;
+   return 0;
 }

 static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
@@ -683,6 +670,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.name = Virtio SCSI HBA,
.proc_name = virtio_scsi,
.this_id = -1,
+   .cmd_size = sizeof(struct virtio_scsi_cmd),
.queuecommand = virtscsi_queuecommand_single,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
@@ -699,6 +687,7 @@ static struct scsi_host_template 
virtscsi_host_template_multi = {
.name = Virtio SCSI HBA,
.proc_name = virtio_scsi,
.this_id = -1,
+   .cmd_size = sizeof(struct virtio_scsi_cmd),
.queuecommand = virtscsi_queuecommand_multi,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
-- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe
linux-scsi in the body of a message to majord...@vger.kernel.org More
majordomo info at http://vger.kernel.org/majordomo-info.html



Acked-by: Paolo Bonzini pbonz...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] target_core_spc: bounds check for spc_emulate_inquiry()

2013-12-20 Thread Paolo Bonzini
Il 20/12/2013 07:47, Hannes Reinecke ha scritto:
  
  How about changing the local buffer to heap memory instead, and bumping
  SE_INQUIRY_BUF to 1024 bytes..?
  
 Ok. But then we should have a quick check at the start
 
 if (cmd-data_length  SE_INQUIRY_BUF)
   len = cmd-data_length
 else
   len = SE_INQUIRY_BUF
 
 to catch oversized requests.

Why do you need it?  If inquiry data is always 1K,  when
cmd-data_length is large you can just need to write zeroes to the
buffer after the first SE_INQUIRY_BUF bytes.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-10-28 Thread Paolo Bonzini
Il 28/10/2013 09:01, Asias He ha scritto:
 vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
 unregistered. We will have a use-after-free usage when the notifier
 callback is called after virtscsi_freeze.
 
 Signed-off-by: Asias He as...@redhat.com
 ---
  drivers/scsi/virtio_scsi.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
 index 74b88ef..b26f1a5 100644
 --- a/drivers/scsi/virtio_scsi.c
 +++ b/drivers/scsi/virtio_scsi.c
 @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
  #ifdef CONFIG_PM
  static int virtscsi_freeze(struct virtio_device *vdev)
  {
 + struct Scsi_Host *sh = virtio_scsi_host(vdev);
 + struct virtio_scsi *vscsi = shost_priv(sh);
 +
 + unregister_hotcpu_notifier(vscsi-nb);
   virtscsi_remove_vqs(vdev);
   return 0;
  }
 @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
  {
   struct Scsi_Host *sh = virtio_scsi_host(vdev);
   struct virtio_scsi *vscsi = shost_priv(sh);
 + int err;
 +
 + err = virtscsi_init(vdev, vscsi);
 + if (err)
 + return err;
 +
 + err = register_hotcpu_notifier(vscsi-nb);
 + if (err)
 + vdev-config-del_vqs(vdev);
  
 - return virtscsi_init(vdev, vscsi);
 + return err;
  }
  #endif
  
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Cc: sta...@vger.kernel.org
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI's heuristics for enabling WRITE SAME still need work [was: dm mpath: disable WRITE SAME if it fails]

2013-09-24 Thread Paolo Bonzini
Il 21/09/2013 00:03, Martin K. Petersen ha scritto:
 
 The major headache here of course is that WRITE SAME is inherently
 destructive. We can't just fire off one during discovery and see if it
 works. For WRITE you can issue a command with a transfer length of 0 to
 see if things work. But unfortunately for WRITE SAME a transfer length
 of zero means wipe the entire device. Yikes!

What about WRITE SAME with an out-of-range lba?  Or lba equal to the
size of the disk and #blocks equal to zero.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SCSI: Fix potential out-of-bounds access in drivers/scsi/sd.c

2013-09-06 Thread Paolo Bonzini
Il 06/09/2013 17:49, Alan Stern ha scritto:
 This patch fixes an out-of-bounds error in sd_read_cache_type(), found
 by Google's AddressSanitizer tool.  When the loop ends, we know that
 offset lies beyond the end of the data in the buffer, so no Caching
 mode page was found.  In theory it may be present, but the buffer size
 is limited to 512 bytes.
 
 Signed-off-by: Alan Stern st...@rowland.harvard.edu
 Reported-by: Dmitry Vyukov dvyu...@google.com
 CC: sta...@vger.kernel.org

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 
 ---
 
 
 [as1709]
 
 
  drivers/scsi/sd.c |   11 +++
  1 file changed, 3 insertions(+), 8 deletions(-)
 
 Index: usb-3.11/drivers/scsi/sd.c
 ===
 --- usb-3.11.orig/drivers/scsi/sd.c
 +++ usb-3.11/drivers/scsi/sd.c
 @@ -2419,14 +2419,9 @@ sd_read_cache_type(struct scsi_disk *sdk
   }
   }
  
 - if (modepage == 0x3F) {
 - sd_printk(KERN_ERR, sdkp, No Caching mode page 
 -   present\n);
 - goto defaults;
 - } else if ((buffer[offset]  0x3f) != modepage) {
 - sd_printk(KERN_ERR, sdkp, Got wrong page\n);
 - goto defaults;
 - }
 + sd_printk(KERN_ERR, sdkp, No Caching mode page found\n);
 + goto defaults;
 +
   Page_found:
   if (modepage == 8) {
   sdkp-WCE = ((buffer[offset + 2]  0x04) != 0);
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Potential out-of-bounds access in drivers/scsi/sd.c

2013-09-04 Thread Paolo Bonzini
Il 04/09/2013 16:32, Alan Stern ha scritto:
 On Wed, 4 Sep 2013, Dmitry Vyukov wrote:
 
 Hi,

 We are working on a memory error detector AddressSanitizer for Linux
 kernel 
 (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel),
 it can detect use-after-free and buffer-overflow errors.
 
 ...
 
 The code in sd_read_cache_type does the following:

 while (offset  len) {
 ...
 }
 ...
 if ((buffer[offset]  0x3f) != modepage) {
 sd_printk(KERN_ERR, sdkp, Got wrong page\n);
 goto defaults;
 }

 When control leaves the while loop, offset = len, so buffer[offset]
 reads random garbage out-of-bounds.
 It the worst case it can lead to crash, or if (buffer[offset]  0x3f)
 happen to be == modepage, then it will read more garbage.

 Please help validate and triage this.
 
 The tool's output is correct.  The patch below should fix it.
 
 Alan Stern
 
 
 
 Index: usb-3.11/drivers/scsi/sd.c
 ===
 --- usb-3.11.orig/drivers/scsi/sd.c
 +++ usb-3.11/drivers/scsi/sd.c
 @@ -2419,7 +2419,7 @@ sd_read_cache_type(struct scsi_disk *sdk
   }
   }
  
 - if (modepage == 0x3F) {
 + if (modepage == 0x3F || offset + 2 = len) {
   sd_printk(KERN_ERR, sdkp, No Caching mode page 
 present\n);
   goto defaults;

If you do this, the buggy if becomes dead code (the loop above doesn't
have any break, so you know that offset = len and the new condition
is always true).

So the patch does indeed prevent the bug, but the code can be simplified.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Fix virtqueue affinity setup

2013-08-01 Thread Paolo Bonzini
 vscsi-num_queues counts the number of request virtqueue which does not
 include the control and event virtqueue. It is wrong to subtract
 VIRTIO_SCSI_VQ_BASE from vscsi-num_queues.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 This patch fixes the following panic.
 
 (qemu) device_del scsi0
 
  BUG: unable to handle kernel NULL pointer dereference at 0020
  IP: [8179b29f] __virtscsi_set_affinity+0x6f/0x120
  PGD 0
  Oops:  [#1] SMP
  Modules linked in:
  CPU: 0 PID: 659 Comm: kworker/0:1 Not tainted 3.11.0-rc2+ #1172
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Workqueue: kacpi_hotplug _handle_hotplug_event_func
  task: 88007bee1cc0 ti: 88007bfe4000 task.ti: 88007bfe4000
  RIP: 0010:[8179b29f]  [8179b29f]
  __virtscsi_set_affinity+0x6f/0x120
  RSP: 0018:88007bfe5a38  EFLAGS: 00010202
  RAX: 0010 RBX: 880077fd0d28 RCX: 0050
  RDX:  RSI: 0246 RDI: 
  RBP: 88007bfe5a58 R08: 880077f6ff00 R09: 0001
  R10: 8143e673 R11: 0001 R12: 0001
  R13: 880077fd0800 R14:  R15: 88007bf489b0
  FS:  () GS:88007ea0() knlGS:
  CS:  0010 DS:  ES:  CR0: 8005003b
  CR2: 0020 CR3: 79f8b000 CR4: 06f0
  Stack:
   880077fd0d28  880077fd0800 0008
   88007bfe5a78 8179b37d 88007bccc800 88007bccc800
   88007bfe5a98 8179b3b6 88007bccc800 880077fd0d28
  Call Trace:
   [8179b37d] virtscsi_set_affinity+0x2d/0x40
   [8179b3b6] virtscsi_remove_vqs+0x26/0x50
   [8179c7d2] virtscsi_remove+0x82/0xa0
   [814cb6b2] virtio_dev_remove+0x22/0x70
   [8167ca49] __device_release_driver+0x69/0xd0
   [8167cb9d] device_release_driver+0x2d/0x40
   [8167bb96] bus_remove_device+0x116/0x150
   [81679936] device_del+0x126/0x1e0
   [81679a06] device_unregister+0x16/0x30
   [814cb889] unregister_virtio_device+0x19/0x30
   [814cdad6] virtio_pci_remove+0x36/0x80
   [81464ae7] pci_device_remove+0x37/0x70
   [8167ca49] __device_release_driver+0x69/0xd0
   [8167cb9d] device_release_driver+0x2d/0x40
   [8167bb96] bus_remove_device+0x116/0x150
   [81679936] device_del+0x126/0x1e0
   [8145edfc] pci_stop_bus_device+0x9c/0xb0
   [8145f036] pci_stop_and_remove_bus_device+0x16/0x30
   [81474a9e] acpiphp_disable_slot+0x8e/0x150
   [81474f6a] hotplug_event_func+0xba/0x1a0
   [814906c8] ? acpi_os_release_object+0xe/0x12
   [81475911] _handle_hotplug_event_func+0x31/0x70
   [810b5333] process_one_work+0x183/0x500
   [810b66e2] worker_thread+0x122/0x400
   [810b65c0] ? manage_workers+0x2d0/0x2d0
   [810bc5de] kthread+0xce/0xe0
   [810bc510] ? kthread_freezable_should_stop+0x70/0x70
   [81ca045c] ret_from_fork+0x7c/0xb0
   [810bc510] ? kthread_freezable_should_stop+0x70/0x70
  Code: 01 00 00 00 74 59 45 31 e4 83 bb c8 01 00 00 02 74 46 66 2e 0f 1f 84
  00 00 00 00 00 49 63 c4 48 c1 e0 04 48 8b bc 0
 3 10 02 00 00 48 8b 47 20 48 8b 80 d0 01 00 00 48 8b 40 50 48 85 c0 74 07
 be
  RIP  [8179b29f] __virtscsi_set_affinity+0x6f/0x120
   RSP 88007bfe5a38
  CR2: 0020
  ---[ end trace 99679331a3775f48 ]---
 
 CC: sta...@vger.kernel.org
 Signed-off-by: Asias He as...@redhat.com
 ---
  drivers/scsi/virtio_scsi.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
 index 2168258..74b88ef 100644
 --- a/drivers/scsi/virtio_scsi.c
 +++ b/drivers/scsi/virtio_scsi.c
 @@ -751,7 +751,7 @@ static void __virtscsi_set_affinity(struct virtio_scsi
 *vscsi, bool affinity)
  
   vscsi-affinity_hint_set = true;
   } else {
 - for (i = 0; i  vscsi-num_queues - VIRTIO_SCSI_VQ_BASE; i++)
 + for (i = 0; i  vscsi-num_queues; i++)
   virtqueue_set_affinity(vscsi-req_vqs[i].vq, -1);
  
   vscsi-affinity_hint_set = false;
 --
 1.8.3.1
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist

2013-07-05 Thread Paolo Bonzini
Il 25/06/2013 23:19, Paolo Bonzini ha scritto:
 Il 27/05/2013 15:50, Paolo Bonzini ha scritto:

 We've been running in circles for nine months now.  Let's restart from
 the maintainer's suggestion, which was probably dismissed too quickly.

 This is still not a complete solution, because /dev/sgN does not have
 access to its queue object.  Still, it can be a base for discussion.

 If accepted (in a complete form with access to the queue object for
 non-block devices), this series removes the need to fix the opcode
 conflicts as far as I'm concerned.  We could just consider that a
 feature of CONFIG_BLK_DEV_SG_FILTER_MMC.

 Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread).
 Rebased just fine, this one is compile-tested only.

 Paolo
 
 RFC didn't get many Cs.

Ping^2.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist

2013-06-25 Thread Paolo Bonzini
Il 27/05/2013 15:50, Paolo Bonzini ha scritto:
 
 We've been running in circles for nine months now.  Let's restart from
 the maintainer's suggestion, which was probably dismissed too quickly.
 
 This is still not a complete solution, because /dev/sgN does not have
 access to its queue object.  Still, it can be a base for discussion.
 
 If accepted (in a complete form with access to the queue object for
 non-block devices), this series removes the need to fix the opcode
 conflicts as far as I'm concerned.  We could just consider that a
 feature of CONFIG_BLK_DEV_SG_FILTER_MMC.
 
 Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread).
 Rebased just fine, this one is compile-tested only.
 
 Paolo

RFC didn't get many Cs.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/9] virtio_scsi: Enable new EH timeout handler

2013-06-12 Thread Paolo Bonzini
Il 10/06/2013 03:40, Hannes Reinecke ha scritto:
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/scsi/virtio_scsi.c | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
 index 2168258..1efd219 100644
 --- a/drivers/scsi/virtio_scsi.c
 +++ b/drivers/scsi/virtio_scsi.c
 @@ -654,6 +654,11 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
   return virtscsi_tmf(vscsi, cmd);
  }
  
 +static enum blk_eh_timer_return virtscsi_timedout(struct scsi_cmnd *scmd)
 +{
 + return scsi_abort_command(scmd);
 +}
 +
  static int virtscsi_target_alloc(struct scsi_target *starget)
  {
   struct virtio_scsi_target_state *tgt =
 @@ -683,6 +688,7 @@ static struct scsi_host_template 
 virtscsi_host_template_single = {
   .queuecommand = virtscsi_queuecommand_single,
   .eh_abort_handler = virtscsi_abort,
   .eh_device_reset_handler = virtscsi_device_reset,
 + .eh_timed_out = virtscsi_timedout,
  
   .can_queue = 1024,
   .dma_boundary = UINT_MAX,
 @@ -699,6 +705,7 @@ static struct scsi_host_template 
 virtscsi_host_template_multi = {
   .queuecommand = virtscsi_queuecommand_multi,
   .eh_abort_handler = virtscsi_abort,
   .eh_device_reset_handler = virtscsi_device_reset,
 + .eh_timed_out = virtscsi_timedout,
  
   .can_queue = 1024,
   .dma_boundary = UINT_MAX,
 

Acked-by: Paolo Bonzini pbonz...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/9] virtio-scsi: Implement TMF timeout

2013-06-12 Thread Paolo Bonzini
Il 10/06/2013 03:40, Hannes Reinecke ha scritto:
 Any TMF might be take longer as expected, or not return at all.
 So we need to use 'wait_for_completion_timeout' when sending
 a TMF to protect against these cases.
 
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/scsi/virtio_scsi.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
 index 1efd219..abfc684 100644
 --- a/drivers/scsi/virtio_scsi.c
 +++ b/drivers/scsi/virtio_scsi.c
 @@ -30,6 +30,7 @@
  #define VIRTIO_SCSI_MEMPOOL_SZ 64
  #define VIRTIO_SCSI_EVENT_LEN 8
  #define VIRTIO_SCSI_VQ_BASE 2
 +#define VIRTIO_SCSI_TMF_TIMEOUT 30
  
  /* Command queue element */
  struct virtio_scsi_cmd {
 @@ -597,8 +598,10 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, 
 struct virtio_scsi_cmd *cmd)
 GFP_NOIO)  0)
   goto out;
  
 - wait_for_completion(comp);
 - if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK ||
 + if (wait_for_completion_timeout(comp,
 + VIRTIO_SCSI_TMF_TIMEOUT * HZ) == 0)
 + ret = FAILED;
 + else if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK ||
   cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
   ret = SUCCESS;
  
 


Acked-by: Paolo Bonzini pbonz...@redhat.com


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 3/4] block: add back command filter modification via sysfs

2013-05-27 Thread Paolo Bonzini
This adds two new sysfs attributes to the queue kobject.  The attributes
allow reading and writing the whitelist of unprivileged commands.

This is again a bit different from what was removed in commit 018e044
(block: get rid of queue-private command filter, 2009-06-26), but the idea
is the same.  One difference is that it does not use a separate kobject.
Also, the supported sysfs syntax is a bit more expressive: it includes
ranges, the ability to replace all of the filter with a single command,
and does not force usage of hexadecimal.

Since the names are different, and the old ones were anyway never really
enabled, the different API is not a problem.

Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 Documentation/block/queue-sysfs.txt |  16 ++
 block/Kconfig   |  10 
 block/blk-sysfs.c   |  41 ++
 block/scsi_ioctl.c  | 106 
 include/linux/blkdev.h  |  21 +++
 5 files changed, 194 insertions(+)

diff --git a/Documentation/block/queue-sysfs.txt 
b/Documentation/block/queue-sysfs.txt
index e54ac1d..1152b38 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -133,6 +133,22 @@ control of this block device to that new IO scheduler. 
Note that writing
 an IO scheduler name to this file will attempt to load that IO scheduler
 module, if it isn't already present in the system.
 
+sgio_read_filter (RW)
+-
+When read, this file will display a list of SCSI commands (i.e. values of
+the first byte of a CDB) that are always available for unprivileged users
+(via /dev/bsg, /dev/sgNN, or ioctls such as SG_IO and CDROM_SEND_PACKET).
+When written, the list of commands will be modified.  By default it
+will be completely replaced; writing a string that begins with '+' will
+add new commands, and writing a string that begins with '-' will remove
+some commands.  Ranges of commands are supported, for example '0x00-0xff'.
+
+sgio_write_filter (RW)
+--
+When read, this file will display a list of SCSI commands (i.e. values of
+the first byte of a CDB) that are available for unprivileged users
+when the block device is open for writing.  Writing to this file behaves
+as for sgio_read_filter.
 
 
 Jens Axboe jens.ax...@oracle.com, February 2009
diff --git a/block/Kconfig b/block/Kconfig
index a7e40a7..e89d6a2 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -65,6 +65,16 @@ config BLK_DEV_BSG
 
  If unsure, say Y.
 
+config BLK_DEV_SG_FILTER_SYSFS
+   bool Customizable SG_IO filters in sysfs
+   default y
+   help
+ Saying Y here will let you use sysfs to customize the list
+ of SCSI commands that are available (via /dev/sg, /dev/bsg or
+ ioctls such as SG_IO) to unprivileged users.
+
+ If unsure, say Y.
+
 config BLK_DEV_BSGLIB
bool Block layer SG support v4 helper lib
default n
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 2539f8d..3e4ffeb 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -429,6 +429,43 @@ static struct queue_sysfs_entry queue_random_entry = {
.store = queue_store_random,
 };
 
+#ifdef CONFIG_BLK_DEV_SG_FILTER_SYSFS
+static ssize_t queue_sgio_filter_read_show(struct request_queue *q, char *page)
+{
+   return blk_filter_show(q, page, READ);
+}
+
+static ssize_t queue_sgio_filter_write_show(struct request_queue *q,
+char *page)
+{
+   return blk_filter_show(q, page, WRITE);
+}
+
+static ssize_t queue_sgio_filter_read_store(struct request_queue *q,
+   const char *page, size_t count)
+{
+   return blk_filter_store(q, page, count, READ);
+}
+
+static ssize_t queue_sgio_filter_write_store(struct request_queue *q,
+const char *page, size_t count)
+{
+   return blk_filter_store(q, page, count, WRITE);
+}
+
+static struct queue_sysfs_entry queue_sgio_filter_read_entry = {
+   .attr = { .name = sgio_filter_read, .mode = S_IRUGO | S_IWUSR },
+   .show = queue_sgio_filter_read_show,
+   .store = queue_sgio_filter_read_store,
+};
+
+static struct queue_sysfs_entry queue_sgio_filter_write_entry = {
+   .attr = {.name = sgio_filter_write, .mode = S_IRUGO | S_IWUSR },
+   .show = queue_sgio_filter_write_show,
+   .store = queue_sgio_filter_write_store,
+};
+#endif
+
 static struct attribute *default_attrs[] = {
queue_requests_entry.attr,
queue_ra_entry.attr,
@@ -452,6 +489,10 @@ static struct attribute *default_attrs[] = {
queue_rq_affinity_entry.attr,
queue_iostats_entry.attr,
queue_random_entry.attr,
+#ifdef CONFIG_BLK_DEV_SG_FILTER_SYSFS
+   queue_sgio_filter_read_entry.attr,
+   queue_sgio_filter_write_entry.attr,
+#endif
NULL,
 };
 
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index

[RFC PATCH 1/4] block: add back queue-private command filter

2013-05-27 Thread Paolo Bonzini
The command filter used to be mutable via sysfs, but this was broken
and backed out.  Let's add it back.  This patch adds the infrastructure
for filtering, but unlike the old code this one just adds a pointer to
request_queue, so as to make it cheaper in the majority of cases where
no special filtering is desired.

This is a partial (and massaged) revert of commit 018e044 (block: get
rid of queue-private command filter, 2009-06-26).

Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/blk-sysfs.c  |  2 ++
 block/bsg.c|  2 +-
 block/scsi_ioctl.c | 17 +
 drivers/scsi/sg.c  |  4 +++-
 include/linux/blkdev.h | 10 +-
 5 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5efc5a6..2539f8d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -530,6 +530,8 @@ static void blk_release_queue(struct kobject *kobj)
 
blkcg_exit_queue(q);
 
+   kfree(q-cmd_filter);
+
if (q-elevator) {
spin_lock_irq(q-queue_lock);
ioc_clear_queue(q);
diff --git a/block/bsg.c b/block/bsg.c
index 420a5a9..fe16cae 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -187,7 +187,7 @@ static int blk_fill_sgv4_hdr_rq(struct request_queue *q, 
struct request *rq,
return -EFAULT;
 
if (hdr-subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
-   if (blk_verify_command(rq-cmd, has_write_perm))
+   if (blk_verify_command(q-cmd_filter, rq-cmd, has_write_perm))
return -EPERM;
} else if (!capable(CAP_SYS_RAWIO))
return -EPERM;
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index a5ffcc9..22b925f 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -34,11 +34,6 @@
 #include scsi/scsi_ioctl.h
 #include scsi/scsi_cmnd.h
 
-struct blk_cmd_filter {
-   unsigned long read_ok[BLK_SCSI_CMD_PER_LONG];
-   unsigned long write_ok[BLK_SCSI_CMD_PER_LONG];
-};
-
 static struct blk_cmd_filter blk_default_cmd_filter;
 
 /* Command group 3 is reserved and should never be used.  */
@@ -197,17 +192,15 @@ static void blk_set_cmd_filter_defaults(struct 
blk_cmd_filter *filter)
__set_bit(GPCMD_SET_READ_AHEAD, filter-write_ok);
 }
 
-int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm)
+int blk_verify_command(struct blk_cmd_filter *filter,
+  unsigned char *cmd, fmode_t has_write_perm)
 {
-   struct blk_cmd_filter *filter = blk_default_cmd_filter;
-
/* root can do any command. */
if (capable(CAP_SYS_RAWIO))
return 0;
 
-   /* if there's no filter set, assume we're filtering everything out */
if (!filter)
-   return -EPERM;
+   filter = blk_default_cmd_filter;
 
/* Anybody who can open the device can do a read-safe command */
if (test_bit(cmd[0], filter-read_ok))
@@ -226,7 +219,7 @@ static int blk_fill_sghdr_rq(struct request_queue *q, 
struct request *rq,
 {
if (copy_from_user(rq-cmd, hdr-cmdp, hdr-cmd_len))
return -EFAULT;
-   if (blk_verify_command(rq-cmd, mode  FMODE_WRITE))
+   if (blk_verify_command(q-cmd_filter, rq-cmd, mode  FMODE_WRITE))
return -EPERM;
 
/*
@@ -473,7 +466,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk 
*disk, fmode_t mode,
if (in_len  copy_from_user(buffer, sic-data + cmdlen, in_len))
goto error;
 
-   err = blk_verify_command(rq-cmd, mode  FMODE_WRITE);
+   err = blk_verify_command(q-cmd_filter, rq-cmd, mode  FMODE_WRITE);
if (err)
goto error;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index df5e961..5f64202 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -218,11 +218,13 @@ static void sg_put_dev(Sg_device *sdp);
 static int sg_allow_access(struct file *filp, unsigned char *cmd)
 {
struct sg_fd *sfp = filp-private_data;
+   struct request_queue *q = sfp-parentdp-device-request_queue;
 
if (sfp-parentdp-device-type == TYPE_SCANNER)
return 0;
 
-   return blk_verify_command(cmd, filp-f_mode  FMODE_WRITE);
+   return blk_verify_command(q-cmd_filter,
+ cmd, filp-f_mode  FMODE_WRITE);
 }
 
 static int get_exclude(Sg_device *sdp)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2fdb4a4..9a8434d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -259,6 +259,11 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS  (256)
 #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+struct blk_cmd_filter {
+   unsigned long read_ok[BLK_SCSI_CMD_PER_LONG];
+   unsigned long write_ok[BLK_SCSI_CMD_PER_LONG];
+};
+
 struct queue_limits {
unsigned long   bounce_pfn;
unsigned long   seg_boundary_mask;
@@ -437,6 +442,8 @@ struct

[RFC PATCH 2/4] scsi: create an all-zero filter for scanners

2013-05-27 Thread Paolo Bonzini
Using /dev/sg for scanners is blocked from unprivileged users.  Reimplement
this using customizable command filters, so that the sysfs knobs will work
in this case too.

Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/scsi/scsi_scan.c | 8 +++-
 drivers/scsi/sg.c| 3 ---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..de5751b 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -783,13 +783,19 @@ static int scsi_add_lun(struct scsi_device *sdev, 
unsigned char *inq_result,
}
 
switch (sdev-type) {
+   case TYPE_SCANNER:
+   sdev-request_queue-cmd_filter =
+   kzalloc(sizeof(struct blk_cmd_filter), GFP_ATOMIC);
+   if (sdev-request_queue-cmd_filter == NULL)
+   return SCSI_SCAN_NO_RESPONSE;
+   /* fallthrough */
+
case TYPE_RBC:
case TYPE_TAPE:
case TYPE_DISK:
case TYPE_PRINTER:
case TYPE_MOD:
case TYPE_PROCESSOR:
-   case TYPE_SCANNER:
case TYPE_MEDIUM_CHANGER:
case TYPE_ENCLOSURE:
case TYPE_COMM:
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5f64202..f7f5c11 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -220,9 +220,6 @@ static int sg_allow_access(struct file *filp, unsigned char 
*cmd)
struct sg_fd *sfp = filp-private_data;
struct request_queue *q = sfp-parentdp-device-request_queue;
 
-   if (sfp-parentdp-device-type == TYPE_SCANNER)
-   return 0;
-
return blk_verify_command(q-cmd_filter,
  cmd, filp-f_mode  FMODE_WRITE);
 }
-- 
1.8.1.4


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist

2013-05-27 Thread Paolo Bonzini
On Wed, 12 Sep 2012 09:05:41 +0100, James Bottomley wrote:
 This is why the whole filter thing was mutable via sysfs.  That way the
 admin could set this up per device.  It sounds like this is what you
 want to fix, rather than opening up more holes in an already leaky
 security apparatus.  The ideal is that we would be much more restrictive
 by default and give root the ability to override this both globally and
 per-device to conform to whatever policy it has for the virtual
 environments.
 
 The patch which removed all of the sysfs pieces was:
 
 commit 018e0446890661504783f92388ecce7138c1566d
 Author: Jens Axboe jens.ax...@oracle.com
 Date:   Fri Jun 26 16:27:10 2009 +0200

 block: get rid of queue-private command filter
 
 So that's probably the place to start for putting it back properly.

[https://lkml.org/lkml/2012/9/12/76]

We've been running in circles for nine months now.  Let's restart from
the maintainer's suggestion, which was probably dismissed too quickly.

This is still not a complete solution, because /dev/sgN does not have
access to its queue object.  Still, it can be a base for discussion.

If accepted (in a complete form with access to the queue object for
non-block devices), this series removes the need to fix the opcode
conflicts as far as I'm concerned.  We could just consider that a
feature of CONFIG_BLK_DEV_SG_FILTER_MMC.

Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread).
Rebased just fine, this one is compile-tested only.

Paolo

Paolo Bonzini (4):
  block: add back queue-private command filter
  scsi: create an all-zero filter for scanners
  block: add back command filter modification via sysfs
  scsi: lock out SG_IO by default to unprivileged users

 Documentation/block/queue-sysfs.txt |  16 +
 block/Kconfig   |  22 ++
 block/blk-sysfs.c   |  43 
 block/bsg.c |   2 +-
 block/scsi_ioctl.c  | 131 +++-
 drivers/scsi/scsi_scan.c|   8 ++-
 drivers/scsi/sg.c   |   7 +-
 include/linux/blkdev.h  |  31 -
 8 files changed, 238 insertions(+), 22 deletions(-)

-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4/4] scsi: lock out SG_IO by default to unprivileged users

2013-05-27 Thread Paolo Bonzini
Move policy on SG_IO access to userspace.  Some level of checking
at the kernel level (compared to just having an opt-out for the
default whitelist) is needed because any userspace whitelist is too
easily circumvented by doing ptrace on the program that has a file
descriptor open.  This would be a regression for those who are using Unix
permissions, security modules or the device cgroup to confine programs.

A meaningful whitelist can then be set by udev, for example.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
This is not yet usable, because sg devices do not have a link
to the queue object.

 block/Kconfig  | 12 
 block/scsi_ioctl.c | 10 ++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index e89d6a2..67d1f68 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -75,6 +75,18 @@ config BLK_DEV_SG_FILTER_SYSFS
 
  If unsure, say Y.
 
+config BLK_DEV_SG_FILTER_MMC
+   bool Backwards-compatible default SG_IO filter
+   default y
+   help
+ Saying Y here will let you send several commands (mostly based
+ on the MMC command set) by default to SCSI devices via /dev/sg,
+ /dev/bsg or ioctls such as SG_IO) even for unprivileged users.
+ Saying N will restrict the set of commands that unprivileged
+ users can send to the bare minimum.
+
+ If unsure, say Y.
+
 config BLK_DEV_BSGLIB
bool Block layer SG support v4 helper lib
default n
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 5db39b5..079dfa2 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -113,9 +113,12 @@ static int sg_emulated_host(struct request_queue *q, int 
__user *p)
 static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter)
 {
memset(filter, 0, sizeof(*filter));
-
-   /* Basic read-only commands */
+   __set_bit(INQUIRY, filter-read_ok);
__set_bit(TEST_UNIT_READY, filter-read_ok);
+   __set_bit(REPORT_LUNS, filter-read_ok);
+
+#ifdef BLK_DEV_SG_FILTER_MMC
+   /* Basic read-only commands */
__set_bit(REQUEST_SENSE, filter-read_ok);
__set_bit(READ_6, filter-read_ok);
__set_bit(READ_10, filter-read_ok);
@@ -125,14 +128,12 @@ static void blk_set_cmd_filter_defaults(struct 
blk_cmd_filter *filter)
__set_bit(READ_DEFECT_DATA, filter-read_ok);
__set_bit(READ_CAPACITY, filter-read_ok);
__set_bit(READ_LONG, filter-read_ok);
-   __set_bit(INQUIRY, filter-read_ok);
__set_bit(MODE_SENSE, filter-read_ok);
__set_bit(MODE_SENSE_10, filter-read_ok);
__set_bit(LOG_SENSE, filter-read_ok);
__set_bit(START_STOP, filter-read_ok);
__set_bit(GPCMD_VERIFY_10, filter-read_ok);
__set_bit(VERIFY_16, filter-read_ok);
-   __set_bit(REPORT_LUNS, filter-read_ok);
__set_bit(SERVICE_ACTION_IN, filter-read_ok);
__set_bit(RECEIVE_DIAGNOSTIC, filter-read_ok);
__set_bit(MAINTENANCE_IN, filter-read_ok);
@@ -193,6 +194,7 @@ static void blk_set_cmd_filter_defaults(struct 
blk_cmd_filter *filter)
__set_bit(GPCMD_LOAD_UNLOAD, filter-write_ok);
__set_bit(GPCMD_SET_STREAMING, filter-write_ok);
__set_bit(GPCMD_SET_READ_AHEAD, filter-write_ok);
+#endif
 }
 
 int blk_verify_command(struct blk_cmd_filter *filter,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-25 Thread Paolo Bonzini
Il 25/05/2013 06:14, James Bottomley ha scritto:
 On Fri, 2013-05-24 at 10:32 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 10:03, James Bottomley ha scritto:
 Does anyone in the real world actually care about this bug?

 Yes, or I would move on and not waste so much time on this.

 Fine, so produce a simple fix for this bug which we can discuss that's
 not tied to this feature.

 Honestly, I have no idea how this is even possible.
 Really?  It looks to me like a simple block on the commands for disk
 devices in the opcode switch would do it (with a corresponding change to
 sg.c:sg_allow_access).

 Which switch?  What I can do is something like this in blk_verify_command:
 
 not in blk_verify_command: outside of it, in the three places it's used.

In other words,

if (blk_verify_command(...) ||
blk_verify_command_with_queue(q, ...))

We must have different taste.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-25 Thread Paolo Bonzini
Il 25/05/2013 07:27, Christoph Hellwig ha scritto:
 On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote:
 I'll go along with this.  I'm also wondering what the problem would be
 if we just allowed all commands on either CAP_SYS_RAWIO or opening the
 device for write, so we just defer to the filesystem permissions and
 restricted read only opens to the basic all device opcodes.
 
 I've been out of this area for a bit, but the problem used to be that
 you could send destructive commands to a partition.  The right fix
 for that would be to not allow SG_IO on partitions at all, just
 wondering if anything would be broken by this.

Linus wanted to keep that for CAP_SYS_RAWIO.  We found two uses of SG_IO
on partitions: zfs-fuse used SYNCHRONIZE CACHE; some proprietary driver
used TEST UNIT READY.

Really, the solution is to make the bitmaps configurable in userspace.
It is no less secure than unpriv_sgio.  Then the kernel can be
configured at build-time to have either an MMC bitmap and a basic
whitelist of a dozen commands.  We can even avoid working around those
few conflicting opcodes; if you're paranoid you can just configure your
kernel right.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-25 Thread Paolo Bonzini
Il 25/05/2013 09:11, Christoph Hellwig ha scritto:
  Linus wanted to keep that for CAP_SYS_RAWIO.  We found two uses of SG_IO
  on partitions: zfs-fuse used SYNCHRONIZE CACHE; some proprietary driver
  used TEST UNIT READY.
  
  Really, the solution is to make the bitmaps configurable in userspace.
  It is no less secure than unpriv_sgio.  Then the kernel can be
  configured at build-time to have either an MMC bitmap and a basic
  whitelist of a dozen commands.  We can even avoid working around those
  few conflicting opcodes; if you're paranoid you can just configure your
  kernel right.
 
 Keep it simple.  Allowing SG_IO for CAP_SYS_RAWIO probably is fine,
 allowing it for permissions only clearly isn't. All the per-command
 filetering is just complete bullshit and the kind of bloat that
 eventually will make Linux unmaintainable.

What I'm proposing is:

- by default, only allow INQUIRY / REPORT LUNS / TEST UNIT READY.  Keep
the old whitelist available for backwards-compatibility, configurable at
build-time.

- let CAP_SYS_ADMIN users set a different per-device whitelist.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-25 Thread Paolo Bonzini
Il 25/05/2013 10:37, Tejun Heo ha scritto:
 Hey, James.
 
 On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote:
 Well, I'd actually much prefer disabling CDB whitelisting for all !MMC
 devices if at all possible.

 I'll go along with this.  I'm also wondering what the problem would be
 
 Don't think we can.  It'd be a behavior change clearly visible to
 userland at this point.

We can (and even for MMC) if it is a build-time configuration knob.  It
would satisfy those people who want the CVE fixed, as long as userspace
gets some configurability.

 * Fix the security bug.  I don't really care how it's fixed as long as
   the amount of whitelisted commands goes down not up.
 
 * It's not like we can remove the filter for !MMC devices at this
   point, so I think it makes sense to make it per-class so that we can
   *remove* commands which aren't relevant for the device type.  Also,
   we probably wanna add read blinking comment yelling that no further
   commands should be added.
 
 * Merge the patch to give out SG_IO access along with write access, so
   the use cases which want to give out SG_IO access can do so
   explicitly and be fully responsible for the device.  This makes
   sense to me.  If one wants to be allowed to issue raw commands to
   the hardware, one takes the full responsibility.

That's not possible; it would make it impossible to do things like using
a privileged helper to open the file and passing it back via SCM_RIGHTS
to an unprivileged program (running as the user).  This is the ptrace
attack that you mentioned.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-25 Thread Paolo Bonzini
Il 25/05/2013 14:48, Tejun Heo ha scritto:
 * Merge the patch to give out SG_IO access along with write access, so
 the use cases which want to give out SG_IO access can do so
 explicitly and be fully responsible for the device.  This makes
 sense to me.  If one wants to be allowed to issue raw commands to
 the hardware, one takes the full responsibility.
  
  That's not possible; it would make it impossible to do things like using
  a privileged helper to open the file and passing it back via SCM_RIGHTS
  to an unprivileged program (running as the user).  This is the ptrace
  attack that you mentioned.
 I have no idea what you're talking about.  I'm describing the same
 thing you implemented and posted.

Ok, I think we need a rewind.  I'll try to post what I mean next week.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 03:44, Tejun Heo ha scritto:
 On Thu, May 23, 2013 at 11:47:25AM +0200, Paolo Bonzini wrote:
 No no, I'm not talking about it not working for the users - it's just
 passing the commands, it of course works.  I'm doubting about it being
 a worthy security isolation layer.  cdb filtering (of any form really)
 has always been something on the border.  It always was something we
 got stuck with due to lack of other immediate options.

 I understood correctly then. :)  I agree it's not the best, but it's not
 completely broken either.  It has bugs, and that's what these patches
 try to fix.
 
 The same filtering table being applied to different classes of
 hardware is a software bug, but my point is that the practive
 essentially entrusts non-insignificant part of security enforcement to
 the hardware itself.  The variety of hardware in question is very wide
 and significant portion has historically been known to be flaky.

Unproven theory, and contradicted by actual practice.  Bugs are more
common in the handling of borderline conditions, not in the handling of
unimplemented commands.

If you want to be secure aginst buggy firmware, the commands you have to
block are READ and WRITE.  Check out the list of existing USB quirks.

 I'm wondering whether combining 3 into 4 would be good enough.

 No, it wouldn't.  I learnt it the hard way (by having a patch nacked :))
 at http://thread.gmane.org/gmane.linux.kernel/1311887.
 
 Of course you can't do that by adding dangerous commands to the
 existing filtering table which is allowed by default.  I'm saying
 count me out knob could be good enough.  Neither is perfect but at
 least the latter doesn't affect the default cases.

You need to allow more commands.

The count-me-out knob allows all commands.

You cannot always allow all commands.

Ergo, you cannot always use the count-me-out knob.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 09:36, James Bottomley ha scritto:
 On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
 Adjust the blk_verify_command function to let it look at per-queue
 data.  This will be done in the next patch.
 
 This is not a bug fix.  This is an enabler for your complex and to my
 mind dubious rework of the SG_IO command filter.  I'm running out of
 ways to say please don't mix bug fixes with features, because this
 redesignating of the original patch set as part 1 and parts 2,3 doesn't
 satisfy the requirement.

I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
dependency tracking easier, at least in my mind.

If you have another solution that does not require passing request_queue
to blk_verify_command, I'm all ears.

 Does anyone in the real world actually care about this bug?

Yes, or I would move on and not waste so much time on this.

Paolo

 because if
 not perhaps we can just remove the confusion and consider this as a
 feature set.  If there's someone who actually cares, please lets just do
 the bug fix first and argue about the feature later.
 
 James
 
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 09:50, James Bottomley ha scritto:
 On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 09:36, James Bottomley ha scritto:
 On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
 Adjust the blk_verify_command function to let it look at per-queue
 data.  This will be done in the next patch.

 This is not a bug fix.  This is an enabler for your complex and to my
 mind dubious rework of the SG_IO command filter.  I'm running out of
 ways to say please don't mix bug fixes with features, because this
 redesignating of the original patch set as part 1 and parts 2,3 doesn't
 satisfy the requirement.

 I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
 dependency tracking easier, at least in my mind.

 If you have another solution that does not require passing request_queue
 to blk_verify_command, I'm all ears.
 
 That's a circular response that doesn't answer the question.  The actual
 question is: what is simple fix for the bug that isn't entangled with
 enabling the SG_IO per device type whitelist feature.
 
 Does anyone in the real world actually care about this bug?

 Yes, or I would move on and not waste so much time on this.
 
 Fine, so produce a simple fix for this bug which we can discuss that's
 not tied to this feature.

Honestly, I have no idea how this is even possible.

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:02, Tejun Heo ha scritto:
 On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 The same filtering table being applied to different classes of
 hardware is a software bug, but my point is that the practive
 essentially entrusts non-insignificant part of security enforcement to
 the hardware itself.  The variety of hardware in question is very wide
 and significant portion has historically been known to be flaky.

 Unproven theory, and contradicted by actual practice.  Bugs are more
 common in the handling of borderline conditions, not in the handling of
 unimplemented commands.

 If you want to be secure aginst buggy firmware, the commands you have to
 block are READ and WRITE.  Check out the list of existing USB quirks.
 
 Well, I'd actually much prefer disabling CDB whitelisting for all !MMC
 devices if at all possible. You're basically arguing that because what
 we have is already broken, it should be okay to break it further.
 Also, RW commands having more quirks doesn't necessarily indicate that
 they tend to be more broken. They just get hammered on a lot in
 various ways so problems on those commands tend to be more noticeable.

I agree intuition may not count, and it's perfectly possible that
firmware writers forgot a break; or put the wrong location in a jump
table, so that unimplemented commands give interesting results.

However, the _fact_ is that this might happen anyway with the buttload
of commands that are already enabled by the whitelist and that most
disks will never implement.

 You need to allow more commands.
 The count-me-out knob allows all commands.
 You cannot always allow all commands.
 Ergo, you cannot always use the count-me-out knob.
 
 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable.

No, it is not unfortunately.  Allowing to do discards is one thing,
allowing to disrupt the settings of a SAN is another.  You can only
delegate the device fully in these cases:

(a) of course, if the guest is trusted;

(b) if QEMU is running as a confined user, then you can get by with a
userspace whitelist.  (Otherwise you're just a ptrace away from
arbitrary access, as you pointed out).

Unfortunately, there are _real_ problems that this patches fix, such as
log spews due to blocked commands, and these happen even if neither of
the above conditions is true.

Is there anything else I can do?  Sure, I can check for the presence of
the whitelist and hack the VPD pages to hide features that I know the
whitelist will block.  Is it the right thing to do?  In my opinion no.
It makes no sense to have raw device access _disable_ features compared
to emulation.

 Not ideal but widening direct
 command access by default is pretty nasty too.

I actually agree, and that's why I'm trying to balance the widening and
restricting.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:03, James Bottomley ha scritto:
   Does anyone in the real world actually care about this bug?
  
   Yes, or I would move on and not waste so much time on this.
   
   Fine, so produce a simple fix for this bug which we can discuss that's
   not tied to this feature.
  
  Honestly, I have no idea how this is even possible.
 Really?  It looks to me like a simple block on the commands for disk
 devices in the opcode switch would do it (with a corresponding change to
 sg.c:sg_allow_access).

Which switch?  What I can do is something like this in blk_verify_command:

if (q-sgio_type == TYPE_ROM)
return 0;
if (rq-cmd[0] == 0xA4)
return -EPERM;
if (!is_write 
(req-cmd[0] == ... || rq-cmd[0] == ...))
return -EPERM;

But then the particular patch you're replying to is still necessary,
and you're slowing down blk_verify_command.  It may be fine for stable
and -rc, but IMHO it calls for a better implementation in 3.11.

(Besides, I did it like this because it is what Tejun suggested).

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 11:07, Tejun Heo ha scritto:
 On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 I agree intuition may not count, and it's perfectly possible that
 firmware writers forgot a break; or put the wrong location in a jump
 table, so that unimplemented commands give interesting results.
 
 It's not just unimplemented commands. Exposing any new command exposes
 its borderline problems together with it.

For commands that are used by Linux already, the right way to fix the
problems is not obscuring the commands from userspace's view.  You can
hit the same problems with ioctls or even with normal operation of the
device.

 However, the _fact_ is that this might happen anyway with the buttload
 of commands that are already enabled by the whitelist and that most
 disks will never implement.
 
 Yes and that's why the whitelist is generally frowned upon. It's
 inherently fragile. These devices simply aren't designed and
 implemented to be exposed to lesser security domains directly. It's
 true that it's already kinda broken that way but as I wrote before
 it's a vicious cycle and we don't wanna keep building on top of it.
 This expansion is gonna increase the usage of whitelisting which will
 in turn attract further use cases which are likely to call for even
 more expansion.

And prohibiting the extension of whitelists is gonna increase the
usage of unpriv_sgio and less-secure userspace whitelists.

Anvil, meet hammer.

 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable.

 No, it is not unfortunately.  Allowing to do discards is one thing,
 allowing to disrupt the settings of a SAN is another.  You can only
 delegate the device fully in these cases:

 (a) of course, if the guest is trusted;
 (b) if QEMU is running as a confined user;
 
 If the bulk of filtering can be solved with userland whitelisting as a
 confined user, it should be possible to resolve peripheral problems
 like log messages in simpler way, no? Can you please elaborate on the
 log message problem? Who's spewing those messages?

For example:

if (bdev_write_same(bdev)) {
unsigned char bdn[BDEVNAME_SIZE];

if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
 ZERO_PAGE(0)))
return 0;

bdevname(bdev, bdn);
pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn);
}

return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);

The device exposes the ability to zero out LUN blocks, but the command is
not whitelisted and it fails.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:32, Paolo Bonzini ha scritto:
 Il 24/05/2013 10:03, James Bottomley ha scritto:
 Does anyone in the real world actually care about this bug?

 Yes, or I would move on and not waste so much time on this.

 Fine, so produce a simple fix for this bug which we can discuss that's
 not tied to this feature.

 Honestly, I have no idea how this is even possible.
 Really?  It looks to me like a simple block on the commands for disk
 devices in the opcode switch would do it (with a corresponding change to
 sg.c:sg_allow_access).
 
 Which switch?  What I can do is something like this in blk_verify_command:
 
 if (q-sgio_type == TYPE_ROM)
   return 0;
   if (rq-cmd[0] == 0xA4)
   return -EPERM;
   if (!is_write 
   (req-cmd[0] == ... || rq-cmd[0] == ...))
   return -EPERM;
 
 But then the particular patch you're replying to is still necessary,
 and you're slowing down blk_verify_command.  It may be fine for stable
 and -rc, but IMHO it calls for a better implementation in 3.11.

Ok, so I did a patch along these lines.  And it's just as ugly as
everything else that I've been posting in these threads.  Yes, perhaps
it has a redeeming grace in that it is fine for = 3.10, but that's it.

Because actually I agree with you.  The rework of the SG_IO command
filter _is_ dubious to say the least; 300 lines of default, immutable
policy don't belong in the kernel.

So why am I posting this crap?  Because I have to work around the nack
of the generic sysfs bitmap patches, which would have beatifully solved
all of this.

In fact, you had proposed that approach.  I posted it in September 2012.
 Then (as usual) silence for one month until it was quickly dismissed by
Tejun.

*You and Jens* failed to review patches, and this rathole is what that
led to.  It's unpleasant for me as it is for everyone else.

Yes, you and Jens are busy, we all are.  But *you and Jens* are the
maintainers.  Please make a decision instead of drawing straws, so that
we can all go back to our regular business.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-23 Thread Paolo Bonzini
Il 23/05/2013 00:17, Tejun Heo ha scritto:
 Then let's make it fit the use case better.  I really can't see much
 point in crafting the cdb filter when you basically have to entrust
 the device to the user anyway.  Let's either trust the user with the
 device or not.  I'm very doubtful that the filtered access via SG_IO
 can be reliable or secure enough.  Let's please avoid extending a
 broken thing.

Sorry to say that, but I'm very doubtful that... is just conspiracy
theory.

It is not broken.  I'm not _that_ clueless, if it were broken I wouldn't
have had users use it in production.

 One more thing, is it really necessary to have finer granularity than
 provided by file permissions?  What would be the use case?  Do you
 expect to have multiple - two - differing levels of access with and
 without SG_IO?

No, I don't.  I want four levels:

1) no access;

2) read-only access;

3) read-write whitelisted access;

4) generic access;

but it's indeed fine to assume that 3 and 4 will never be given together
to the same disk.  The important point is that 2 and 3 should not
require any privileges except for opening the file.

With the opt-out knob, you still need a long-lived privileged process in
order to set the knob back to no access, and that's undesirable.
Long-lived privileged processes can be SIGKILLed and leave things open
for misuse; instead, if I need something privileged I want to confine it
to a helper that opens the file and passes back the file descriptor.

 for the same user, it's pointless to give out SG_IO access to
 processes while denying for other processes.  As long as ptrace can
 be attached, hijacking such fd is easy. Making it per-device should
 be suitable enough, no?

Yes, and that's what I did.  Such hijacking is why a kernel whitelist is
necessary in untrusted cases (i.e. you cannot just implement it in
userspace).

 There are many use cases, I listed some in my reply to Martin.
 Sometimes you have trust over the guest and can use count-me-out.  But
 in some cases you don't, and yet the current whitelist is not enough
 (e.g. tapes).
 
 Can you elaborate?  Why can't a tape device be entrusted to the user?

In general, any device may or may not be entrusted to the user.  In this
respect, tapes or disks have no difference.

But while the current whitelist is almost okay for disks, it is not
usable for tapes.  Too many essential commands are missing; this is why
extending the whitelist to cover other device types is important for me.
 And since you don't want to open new commands to all classes with no
distinction (which I understand), the only choice is per-class whitelists.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-23 Thread Paolo Bonzini
Il 23/05/2013 11:02, Tejun Heo ha scritto:
 On Thu, May 23, 2013 at 09:45:42AM +0200, Paolo Bonzini wrote:
 Il 23/05/2013 00:17, Tejun Heo ha scritto:
 Then let's make it fit the use case better.  I really can't see much
 point in crafting the cdb filter when you basically have to entrust
 the device to the user anyway.  Let's either trust the user with the
 device or not.  I'm very doubtful that the filtered access via SG_IO
 can be reliable or secure enough.  Let's please avoid extending a
 broken thing.

 Sorry to say that, but I'm very doubtful that... is just conspiracy
 theory.

 It is not broken.  I'm not _that_ clueless, if it were broken I wouldn't
 have had users use it in production.
 
 No no, I'm not talking about it not working for the users - it's just
 passing the commands, it of course works.  I'm doubting about it being
 a worthy security isolation layer.  cdb filtering (of any form really)
 has always been something on the border.  It always was something we
 got stuck with due to lack of other immediate options.

I understood correctly then. :)  I agree it's not the best, but it's not
completely broken either.  It has bugs, and that's what these patches
try to fix.

 One more thing, is it really necessary to have finer granularity than
 provided by file permissions?  What would be the use case?  Do you
 expect to have multiple - two - differing levels of access with and
 without SG_IO?

 No, I don't.  I want four levels:

 1) no access;
 2) read-only access;
 3) read-write whitelisted access;
 4) generic access;

 but it's indeed fine to assume that 3 and 4 will never be given together
 to the same disk.  The important point is that 2 and 3 should not
 require any privileges except for opening the file.
 
 I'm wondering whether combining 3 into 4 would be good enough.

No, it wouldn't.  I learnt it the hard way (by having a patch nacked :))
at http://thread.gmane.org/gmane.linux.kernel/1311887.

 With the opt-out knob, you still need a long-lived privileged process in
 order to set the knob back to no access, and that's undesirable.
 Long-lived privileged processes can be SIGKILLed and leave things open
 for misuse; instead, if I need something privileged I want to confine it
 to a helper that opens the file and passes back the file descriptor.
 
 Hmmm?  Somebody has to give out the access rights anyway, be it a udev
 rule or polkit.  While not having to depend on them could be nice, I
 don't think involving userland is a big deal.  It's already involved
 in closely related capacities anyway.

Polkit need not do anything to give out the access rights, it only has
to just confirm the credentials.  A helper setgid-disk can consult
polkit, open the file, pass the file descriptor back, and exit.  We
already do something similar for tap devices.

 There are many use cases, I listed some in my reply to Martin.
 Sometimes you have trust over the guest and can use count-me-out.  But
 in some cases you don't, and yet the current whitelist is not enough
 (e.g. tapes).

 Can you elaborate?  Why can't a tape device be entrusted to the user?

 I was curious why they wouldn't be able to use count-me-out knob.  The
 white list was nasty but we did it anyway because there were some
 horrible commands which, ISTR, could even affect other devices sharing
 the bus and at least at the beginning the list of commands which
 should be allowed were pretty compact.

Exactly.  Though these commands do not really affect other devices
sharing the bus, they affect other initiators trying to use the device.
 And these commands are generic, so they apply to disks as well as tapes.

Unfortunately, a purely userspace whitelist would be too easy to circumvent.

 Whitelisting those commands is likely to be mostly harmless most of
 the time but I would be surprised if there aren't devices out in the
 wild which can be bricked / affected adversely with carefully crafted
 commands within the allowed cdbs.  The level of security isolation
 which can be provided by command code filtering is somewhat flimsy,
 which is why cdb filtering was always seen as a kludge, which in turn
 is why there's reluctance against extending it to more use cases.

Yes, I understand that.  But I don't believe it is a serious concern.
In the case of tapes, for example, you're not talking about 10 dollar
USB pendrives whose firmware was written in fire-and-forget mode.
Firmware bugs would be updated by the manufacturers.

I cannot exclude there will _never_ be a bug like this, of course.  But
even if there will be one, IMHO it would be exceptional enough that we
can afford fixing it with a per-device quirk.

Scanners have never had any CDB filtering done on them; I searched for
this kind of bug, and I couldn't find any report.

What I am trying to do is to reach a compromise.  The one I'm proposing
is to forbid reserved or vendor-specific commands, while at the same
time opening the doors on more standardized commands.

Paolo

--
To unsubscribe from

[PATCH v3 part1 3/4] sg_io: use different default filters for each device class

2013-05-23 Thread Paolo Bonzini
Store the filters in a 256-entry array, and pick an appropriate filter
for SCSI devices.  Apart from SCSI disks, SG_IO is supported for CCISS,
ide-floppy and virtio-blk devices; TYPE_DISK (which is zero, i.e. the
default) is more appropriate for these devices than TYPE_ROM.

However, all lists are still the same, so there is no semantic change
in this patch.

Cc: sta...@gnu.org
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe ax...@kernel.dk
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/scsi_ioctl.c   | 14 +-
 drivers/scsi/scsi_scan.c |  2 ++
 include/linux/blkdev.h   |  2 +-
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 21ddf17..6e18156 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -35,8 +35,8 @@
 #include scsi/scsi_cmnd.h
 
 struct blk_cmd_filter {
-   unsigned long read_ok[BLK_SCSI_CMD_PER_LONG];
-   unsigned long write_ok[BLK_SCSI_CMD_PER_LONG];
+   u32 read_ok[BLK_SCSI_MAX_CMDS];
+   u32 write_ok[BLK_SCSI_MAX_CMDS];
 };
 
 static struct blk_cmd_filter blk_default_cmd_filter;
@@ -117,7 +117,7 @@ static int sg_emulated_host(struct request_queue *q, int 
__user *p)
 static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter)
 {
 #define sgio_bitmap_set(cmd, rw) \
-   __set_bit((cmd), filter-rw##_ok)
+   filter-rw##_ok[(cmd)] = ~0;
 
/* Basic read-only commands */
sgio_bitmap_set(TEST_UNIT_READY, read);
@@ -210,16 +210,12 @@ int blk_verify_command(struct request_queue *q,
if (capable(CAP_SYS_RAWIO))
return 0;
 
-   /* if there's no filter set, assume we're filtering everything out */
-   if (!filter)
-   return -EPERM;
-
/* Anybody who can open the device can do a read-safe command */
-   if (test_bit(cmd[0], filter-read_ok))
+   if (filter-read_ok[cmd[0]]  (1  q-sgio_type))
return 0;
 
/* Write-safe commands require a writable open */
-   if (test_bit(cmd[0], filter-write_ok)  has_write_perm)
+   if (has_write_perm  filter-write_ok[cmd[0]]  (1  q-sgio_type))
return 0;
 
return -EPERM;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..86940f3 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -782,6 +782,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
sdev-removable = (inq_result[1]  0x80)  7;
}
 
+   sdev-request_queue-sgio_type = sdev-type;
+
switch (sdev-type) {
case TYPE_RBC:
case TYPE_TAPE:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4fca347..5e18969 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -257,7 +257,6 @@ struct blk_queue_tag {
 };
 
 #define BLK_SCSI_MAX_CMDS  (256)
-#define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
 struct queue_limits {
unsigned long   bounce_pfn;
@@ -410,6 +409,7 @@ struct request_queue {
 */
unsigned intsg_timeout;
unsigned intsg_reserved_size;
+   unsigned char   sgio_type;
int node;
 #ifdef CONFIG_BLK_DEV_IO_TRACE
struct blk_trace*blk_trace;
-- 
1.8.1.4


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


<    1   2   3   4   >