Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support
Il 10/06/2014 09:07, Nicholas A. Bellinger ha scritto: OK, finally went over this, looks good to me: Acked-by: Michael S. Tsirkin m...@redhat.com However, we really should stop making more changes before fixing ANY_LAYOUT in this driver. virtio 1.0 should be out soon and that makes ANY_LAYOUT a required feature. What else is required to complete the ANY_LAYOUT conversion..? Basically, you need to stop expecting the request and response headers to be in a separate iov, and also vhost-scsi should not expect pi to be in a separate iov than the main data. A layout that has everything in the same iov would be fine, and similarly for a layout that has the CDB in a separate iov than the rest of the header. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
Il 10/06/2014 10:04, Nicholas A. Bellinger ha scritto: That said, there is one other small qla2xxx change to enable per-session PI that is currently missing in Quinn's patch in scsi/for-next code. Sooo, I'll go ahead and include Sagi's patches with the vhost-scsi change below if there are no objections from MKP and/or driver maintainers, and post an -rc2 series after the merge window closes to address the two remaining qla2xxx specific items. Indeed the patch looks like a tcm bugfix. There is no reason to modify the LLD. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support
Il 08/06/2014 18:05, Michael S. Tsirkin ha scritto: OK, finally went over this, looks good to me: Acked-by: Michael S. Tsirkin m...@redhat.com However, we really should stop making more changes before fixing ANY_LAYOUT in this driver. virtio 1.0 should be out soon and that makes ANY_LAYOUT a required feature. Agreed. virtio-blk btw is getting ANY_LAYOUT support in QEMU 2.1, and virtio-scsi will follow suit. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: ibmvscsi: protect abort handler from done-scmd in flight
Il 05/06/2014 08:16, Liu Ping Fan ha scritto: Take the following scene in guest: seqA: scsi_done() - gapX (before taking REQ_ATOM_COMPLETE) seqB: scmd_eh_abort_handler()- ...- ibmvscsi_eh_abort_handler()- ...-scsi_put_command(scmd) If seqA is scheduled at gapX, and seqB reclaims scmd. Then when seqA comes back, it tries to access the scmd when is turned back to mempool. This patch fixes the race by ensuring when ibmvscsi_eh_abort_handler() returns, no scsi_done is in flight Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- When trying to figure the scsi_cmnd in flight issue, I learned from Paolo (thanks). He showed me the way how virtscsi resolves the race between abort-handler and scsi_done in flight. And I think that this method is also needed by ibmvscsi. --- drivers/scsi/ibmvscsi/ibmvscsi.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index fa76440..325cef6 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -1828,16 +1828,19 @@ static void ibmvscsi_handle_crq(struct viosrp_crq *crq, if ((crq-status != VIOSRP_OK crq-status != VIOSRP_OK2) evt_struct-cmnd) evt_struct-cmnd-result = DID_ERROR 16; - if (evt_struct-done) - evt_struct-done(evt_struct); - else - dev_err(hostdata-dev, returned done() is NULL; not running it!\n); /* * Lock the host_lock before messing with these structures, since we * are running in a task context +* Also, this lock helps ibmvscsi_eh_abort_handler() to shield the +* scsi_done() in flight. */ spin_lock_irqsave(evt_struct-hostdata-host-host_lock, flags); + if (evt_struct-done) + evt_struct-done(evt_struct); + else + dev_err(hostdata-dev, returned done() is NULL; not running it!\n); + list_del(evt_struct-list); free_event_struct(evt_struct-hostdata-pool, evt_struct); spin_unlock_irqrestore(evt_struct-hostdata-host-host_lock, flags); I think this is not necessary because ibmvscsi places TMFs and commands in the same queue; virtio-scsi instead uses two different queues. So ibmvscsi_handle_crq processes all completed requests, which naturally serializes the processing of the TMF and the command. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/6] virtio-scsi patches for 3.16 + midlayer fix
As mentioned in the v1 submission, these patches were delayed a bit by the discussion and testing of patch 5. Debugging the problem also led to the discovery of a midlayer bug fixed by patch 4. Paolo v1-v2: fix all occurrences of scmd-result |= DID_TIME_OUT 16 in patch 4 Ming Lei (2): virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends() virtio-scsi: replace target spinlock with seqcount Paolo Bonzini (2): virtio-scsi: avoid cancelling uninitialized work items virtio-scsi: fix various bad behavior on aborted requests Ulrich Obergfell (1): scsi_error: fix invalid setting of host byte Venkatesh Srinivas (1): virtio-scsi: Implement change_queue_depth for virtscsi targets drivers/scsi/scsi_error.c | 6 +- drivers/scsi/virtio_scsi.c | 148 +++-- 2 files changed, 93 insertions(+), 61 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/6] scsi_error: fix invalid setting of host byte
From: Ulrich Obergfell uober...@redhat.com After scsi_try_to_abort_cmd returns, the eh_abort_handler may have already found that the command has completed in the device, causing the host_byte to be nonzero (e.g. it could be DID_ABORT). When this happens, ORing DID_TIME_OUT into the host byte will corrupt the result field and initiate an unwanted command retry. Fix this by using set_host_byte instead, following the model of commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31. Cc: sta...@vger.kernel.org Signed-off-by: Ulrich Obergfell uober...@redhat.com [Fix all instances according to review comments. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v1-v2: fix all occurrences [Bart] drivers/scsi/scsi_error.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f17aa7aa7879..8e7a0f8d4f52 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work) aborting command %p\n, scmd)); rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd); if (rtn == SUCCESS) { - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); if (scsi_host_eh_past_deadline(sdev-host)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, @@ -167,7 +167,7 @@ scmd_eh_abort_handler(struct work_struct *work) scmd_printk(KERN_WARNING, scmd, scmd %p terminate aborted command\n, scmd)); - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); scsi_finish_command(scmd); } } @@ -291,7 +291,7 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) if (scsi_abort_command(scmd) == SUCCESS) return BLK_EH_NOT_HANDLED; - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); if (unlikely(rtn == BLK_EH_NOT_HANDLED !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets
From: Venkatesh Srinivas venkate...@google.com change_queue_depth allows changing per-target queue depth via sysfs. It also allows the SCSI midlayer to ramp down the number of concurrent inflight requests in response to a SCSI BUSY status response and allows the midlayer to ramp the count back up to the device maximum when the BUSY condition has resolved. Signed-off-by: Venkatesh Srinivas venkate...@google.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 33 + 1 file changed, 33 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fda9fb35..de0f8d9b3682 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include scsi/scsi_tcq.h #include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 @@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc) return virtscsi_tmf(vscsi, cmd); } +/** + * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth + * @sdev: Virtscsi target whose queue depth to change + * @qdepth:New queue depth + * @reason:Reason for the queue depth change. + */ +static int virtscsi_change_queue_depth(struct scsi_device *sdev, + int qdepth, + int reason) +{ + struct Scsi_Host *shost = sdev-host; + int max_depth = shost-cmd_per_lun; + + switch (reason) { + case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */ + scsi_track_queue_full(sdev, qdepth); + break; + case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */ + case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */ + scsi_adjust_queue_depth(sdev, + scsi_get_tag_type(sdev), + min(max_depth, qdepth)); + break; + default: + return -EOPNOTSUPP; + } + + return sdev-queue_depth; +} + static int virtscsi_abort(struct scsi_cmnd *sc) { struct virtio_scsi *vscsi = shost_priv(sc-device-host); @@ -683,6 +714,7 @@ static struct scsi_host_template virtscsi_host_template_single = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_single, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, @@ -699,6 +731,7 @@ static struct scsi_host_template virtscsi_host_template_multi = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_multi, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/6] virtio-scsi: fix various bad behavior on aborted requests
Even though the virtio-scsi spec guarantees that all requests related to the TMF will have been completed by the time the TMF itself completes, the request queue's callback might not have run yet. This causes requests to be completed more than once, and as a result triggers a variety of BUGs or oopses. Cc: sta...@vger.kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index d66c4ee2c774..fda9fb35 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq) virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; +static void virtscsi_poll_requests(struct virtio_scsi *vscsi) +{ + int i, num_vqs; + + num_vqs = vscsi-num_queues; + for (i = 0; i num_vqs; i++) + virtscsi_vq_done(vscsi, vscsi-req_vqs[i], +virtscsi_complete_cmd); +} + static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_cmd *cmd = buf; @@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) ret = SUCCESS; + /* +* The spec guarantees that all requests related to the TMF have +* been completed, but the callback might not have run yet if +* we're using independent interrupts (e.g. MSI). Poll the +* virtqueues once. +* +* In the abort case, sc-scsi_done will do nothing, because +* the block layer must have detected a timeout and as a result +* REQ_ATOM_COMPLETE has been set. +*/ + virtscsi_poll_requests(vscsi); + out: mempool_free(cmd, virtscsi_cmd_pool); return ret; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/6] virtio-scsi: avoid cancelling uninitialized work items
Calling the workqueue interface on uninitialized work items isn't a good idea even if they're zeroed. It's not failing catastrophically only through happy accidents. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index f0b4cdbfceb0..d66c4ee2c774 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -253,6 +253,8 @@ static void virtscsi_ctrl_done(struct virtqueue *vq) virtscsi_vq_done(vscsi, vscsi-ctrl_vq, virtscsi_complete_free); }; +static void virtscsi_handle_event(struct work_struct *work); + static int virtscsi_kick_event(struct virtio_scsi *vscsi, struct virtio_scsi_event_node *event_node) { @@ -260,6 +262,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi, struct scatterlist sg; unsigned long flags; + INIT_WORK(event_node-work, virtscsi_handle_event); sg_init_one(sg, event_node-event, sizeof(struct virtio_scsi_event)); spin_lock_irqsave(vscsi-event_vq.vq_lock, flags); @@ -377,7 +380,6 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_event_node *event_node = buf; - INIT_WORK(event_node-work, virtscsi_handle_event); schedule_work(event_node-work); } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
From: Ming Lei tom.leim...@gmail.com Access to tgt-req_vq is strictly serialized by spin_lock of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary. smp_read_barrier_depends() in virtscsi_req_done was introduced to order reading req_vq and decreasing tgt-reqs, but it isn't needed now because req_vq is read from scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of tgt-req_vq, so remove the unnecessary barrier. Also remove related comment about the barrier. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei tom.leim...@gmail.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 53 ++ 1 file changed, 6 insertions(+), 47 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index db3b494e5926..fc054935eb1f 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -73,17 +73,12 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq + * could be done locklessly, but we do not do it yet. * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. - * - * Similarly, decrements of reqs are never concurrent with writes of req_vq. + * Decrements of reqs are never concurrent with writes of req_vq: before the + * decrement reqs will be != 0; after the decrement the virtqueue completion + * routine will not use the req_vq so it can be changed by a new request. * Thus they can happen outside the tgt_lock, provided of course we make reqs * an atomic_t. */ @@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq) int index = vq-index - VIRTIO_SCSI_VQ_BASE; struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index]; - /* -* Read req_vq before decrementing the reqs field in -* virtscsi_complete_cmd. -* -* With barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read req_vq -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* -* Possible reordering without barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* read (wrong) req_vq -* -* We do not need a full smp_rmb, because req_vq is required to get -* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored -* in the virtqueue as the user token. -*/ - smp_read_barrier_depends(); - virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; @@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, spin_lock_irqsave(tgt-tgt_lock, flags); - /* -* The memory barrier after atomic_inc_return matches -* the smp_read_barrier_depends() in virtscsi_req_done. -*/ if (atomic_inc_return(tgt-reqs) 1) - vq = ACCESS_ONCE(tgt-req_vq); + vq = tgt-req_vq; else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/6] virtio-scsi: replace target spinlock with seqcount
From: Ming Lei ming@canonical.com The spinlock of tgt_lock is only for serializing read and write req_vq, one lockless seqcount is enough for the purpose. On one 16core VM with vhost-scsi backend, the patch can improve IOPS with 3% on random read test. Signed-off-by: Ming Lei ming@canonical.com [Add initialization in virtscsi_target_alloc. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 42 +- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fc054935eb1f..f0b4cdbfceb0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 #define VIRTIO_SCSI_EVENT_LEN 8 @@ -73,18 +74,16 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq - * could be done locklessly, but we do not do it yet. + * tgt_seq is held to serialize reading and writing req_vq. * * Decrements of reqs are never concurrent with writes of req_vq: before the * decrement reqs will be != 0; after the decrement the virtqueue completion * routine will not use the req_vq so it can be changed by a new request. - * Thus they can happen outside the tgt_lock, provided of course we make reqs + * Thus they can happen outside the tgt_seq, provided of course we make reqs * an atomic_t. */ struct virtio_scsi_target_state { - /* This spinlock never held at the same time as vq_lock. */ - spinlock_t tgt_lock; + seqcount_t tgt_seq; /* Count of outstanding requests. */ atomic_t reqs; @@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; - spin_lock_irqsave(tgt-tgt_lock, flags); + local_irq_save(flags); + if (atomic_inc_return(tgt-reqs) 1) { + unsigned long seq; + + do { + seq = read_seqcount_begin(tgt-tgt_seq); + vq = tgt-req_vq; + } while (read_seqcount_retry(tgt-tgt_seq, seq)); + } else { + /* no writes can be concurrent because of atomic_t */ + write_seqcount_begin(tgt-tgt_seq); + + /* keep previous req_vq if there is reader found */ + if (unlikely(atomic_read(tgt-reqs) 1)) { + vq = tgt-req_vq; + goto unlock; + } - if (atomic_inc_return(tgt-reqs) 1) - vq = tgt-req_vq; - else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) queue_num -= vscsi-num_queues; - tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + unlock: + write_seqcount_end(tgt-tgt_seq); } + local_irq_restore(flags); - spin_unlock_irqrestore(tgt-tgt_lock, flags); return vq; } @@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc) static int virtscsi_target_alloc(struct scsi_target *starget) { + struct Scsi_Host *sh = dev_to_shost(starget-dev.parent); + struct virtio_scsi *vscsi = shost_priv(sh); + struct virtio_scsi_target_state *tgt = kmalloc(sizeof(*tgt), GFP_KERNEL); if (!tgt) return -ENOMEM; - spin_lock_init(tgt-tgt_lock); + seqcount_init(tgt-tgt_seq); atomic_set(tgt-reqs, 0); - tgt-req_vq = NULL; + tgt-req_vq = vscsi-req_vqs[0]; starget-hostdata = tgt; return 0; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] virtio-scsi patches for 3.16 + midlayer fix
Il 04/06/2014 13:21, Bart Van Assche ha scritto: Thanks for the quick respin. However, since you are mentioning that in v2 all occurrences of scmd-result |= DID_TIME_OUT 16 have been addressed, this made me wonder whether you had noticed the following code in scsi_decide_disposition() ? switch (host_byte(scmd-result)) { [ ... ] case DID_ABORT: if (scmd-eh_eflags SCSI_EH_ABORT_SCHEDULED) { scmd-result |= DID_TIME_OUT 16; return SUCCESS; } Nope, I miscounted how many occurrences were left. v3 on the way. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets
From: Venkatesh Srinivas venkate...@google.com change_queue_depth allows changing per-target queue depth via sysfs. It also allows the SCSI midlayer to ramp down the number of concurrent inflight requests in response to a SCSI BUSY status response and allows the midlayer to ramp the count back up to the device maximum when the BUSY condition has resolved. Signed-off-by: Venkatesh Srinivas venkate...@google.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 33 + 1 file changed, 33 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fda9fb35..de0f8d9b3682 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include scsi/scsi_tcq.h #include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 @@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc) return virtscsi_tmf(vscsi, cmd); } +/** + * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth + * @sdev: Virtscsi target whose queue depth to change + * @qdepth:New queue depth + * @reason:Reason for the queue depth change. + */ +static int virtscsi_change_queue_depth(struct scsi_device *sdev, + int qdepth, + int reason) +{ + struct Scsi_Host *shost = sdev-host; + int max_depth = shost-cmd_per_lun; + + switch (reason) { + case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */ + scsi_track_queue_full(sdev, qdepth); + break; + case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */ + case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */ + scsi_adjust_queue_depth(sdev, + scsi_get_tag_type(sdev), + min(max_depth, qdepth)); + break; + default: + return -EOPNOTSUPP; + } + + return sdev-queue_depth; +} + static int virtscsi_abort(struct scsi_cmnd *sc) { struct virtio_scsi *vscsi = shost_priv(sc-device-host); @@ -683,6 +714,7 @@ static struct scsi_host_template virtscsi_host_template_single = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_single, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, @@ -699,6 +731,7 @@ static struct scsi_host_template virtscsi_host_template_multi = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_multi, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests
Even though the virtio-scsi spec guarantees that all requests related to the TMF will have been completed by the time the TMF itself completes, the request queue's callback might not have run yet. This causes requests to be completed more than once, and as a result triggers a variety of BUGs or oopses. Cc: sta...@vger.kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index d66c4ee2c774..fda9fb35 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq) virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; +static void virtscsi_poll_requests(struct virtio_scsi *vscsi) +{ + int i, num_vqs; + + num_vqs = vscsi-num_queues; + for (i = 0; i num_vqs; i++) + virtscsi_vq_done(vscsi, vscsi-req_vqs[i], +virtscsi_complete_cmd); +} + static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_cmd *cmd = buf; @@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) ret = SUCCESS; + /* +* The spec guarantees that all requests related to the TMF have +* been completed, but the callback might not have run yet if +* we're using independent interrupts (e.g. MSI). Poll the +* virtqueues once. +* +* In the abort case, sc-scsi_done will do nothing, because +* the block layer must have detected a timeout and as a result +* REQ_ATOM_COMPLETE has been set. +*/ + virtscsi_poll_requests(vscsi); + out: mempool_free(cmd, virtscsi_cmd_pool); return ret; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/6] virtio-scsi: replace target spinlock with seqcount
From: Ming Lei ming@canonical.com The spinlock of tgt_lock is only for serializing read and write req_vq, one lockless seqcount is enough for the purpose. On one 16core VM with vhost-scsi backend, the patch can improve IOPS with 3% on random read test. Signed-off-by: Ming Lei ming@canonical.com [Add initialization in virtscsi_target_alloc. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 42 +- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fc054935eb1f..f0b4cdbfceb0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 #define VIRTIO_SCSI_EVENT_LEN 8 @@ -73,18 +74,16 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq - * could be done locklessly, but we do not do it yet. + * tgt_seq is held to serialize reading and writing req_vq. * * Decrements of reqs are never concurrent with writes of req_vq: before the * decrement reqs will be != 0; after the decrement the virtqueue completion * routine will not use the req_vq so it can be changed by a new request. - * Thus they can happen outside the tgt_lock, provided of course we make reqs + * Thus they can happen outside the tgt_seq, provided of course we make reqs * an atomic_t. */ struct virtio_scsi_target_state { - /* This spinlock never held at the same time as vq_lock. */ - spinlock_t tgt_lock; + seqcount_t tgt_seq; /* Count of outstanding requests. */ atomic_t reqs; @@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; - spin_lock_irqsave(tgt-tgt_lock, flags); + local_irq_save(flags); + if (atomic_inc_return(tgt-reqs) 1) { + unsigned long seq; + + do { + seq = read_seqcount_begin(tgt-tgt_seq); + vq = tgt-req_vq; + } while (read_seqcount_retry(tgt-tgt_seq, seq)); + } else { + /* no writes can be concurrent because of atomic_t */ + write_seqcount_begin(tgt-tgt_seq); + + /* keep previous req_vq if there is reader found */ + if (unlikely(atomic_read(tgt-reqs) 1)) { + vq = tgt-req_vq; + goto unlock; + } - if (atomic_inc_return(tgt-reqs) 1) - vq = tgt-req_vq; - else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) queue_num -= vscsi-num_queues; - tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + unlock: + write_seqcount_end(tgt-tgt_seq); } + local_irq_restore(flags); - spin_unlock_irqrestore(tgt-tgt_lock, flags); return vq; } @@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc) static int virtscsi_target_alloc(struct scsi_target *starget) { + struct Scsi_Host *sh = dev_to_shost(starget-dev.parent); + struct virtio_scsi *vscsi = shost_priv(sh); + struct virtio_scsi_target_state *tgt = kmalloc(sizeof(*tgt), GFP_KERNEL); if (!tgt) return -ENOMEM; - spin_lock_init(tgt-tgt_lock); + seqcount_init(tgt-tgt_seq); atomic_set(tgt-reqs, 0); - tgt-req_vq = NULL; + tgt-req_vq = vscsi-req_vqs[0]; starget-hostdata = tgt; return 0; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/6] scsi_error: fix invalid setting of host byte
From: Ulrich Obergfell uober...@redhat.com After scsi_try_to_abort_cmd returns, the eh_abort_handler may have already found that the command has completed in the device, causing the host_byte to be nonzero (e.g. it could be DID_ABORT). When this happens, ORing DID_TIME_OUT into the host byte will corrupt the result field and initiate an unwanted command retry. Fix this by using set_host_byte instead, following the model of commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31. Cc: sta...@vger.kernel.org Signed-off-by: Ulrich Obergfell uober...@redhat.com [Fix all instances according to review comments. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v1-v2: fix all occurrences [Bart] except one v2-v3: really fix all occurrences [Bart] drivers/scsi/scsi_error.c | 6 +++--- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f17aa7aa7879..8e7a0f8d4f52 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work) aborting command %p\n, scmd)); rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd); if (rtn == SUCCESS) { - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); if (scsi_host_eh_past_deadline(sdev-host)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, @@ -167,7 +167,7 @@ scmd_eh_abort_handler(struct work_struct *work) scmd_printk(KERN_WARNING, scmd, scmd %p terminate aborted command\n, scmd)); - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); scsi_finish_command(scmd); } } @@ -291,7 +291,7 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) if (scsi_abort_command(scmd) == SUCCESS) return BLK_EH_NOT_HANDLED; - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); if (unlikely(rtn == BLK_EH_NOT_HANDLED !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) @@ -1776,7 +1776,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd) break; case DID_ABORT: if (scmd-eh_eflags SCSI_EH_ABORT_SCHEDULED) { - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); return SUCCESS; } case DID_NO_CONNECT: -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/6] virtio-scsi: avoid cancelling uninitialized work items
Calling the workqueue interface on uninitialized work items isn't a good idea even if they're zeroed. It's not failing catastrophically only through happy accidents. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index f0b4cdbfceb0..d66c4ee2c774 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -253,6 +253,8 @@ static void virtscsi_ctrl_done(struct virtqueue *vq) virtscsi_vq_done(vscsi, vscsi-ctrl_vq, virtscsi_complete_free); }; +static void virtscsi_handle_event(struct work_struct *work); + static int virtscsi_kick_event(struct virtio_scsi *vscsi, struct virtio_scsi_event_node *event_node) { @@ -260,6 +262,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi, struct scatterlist sg; unsigned long flags; + INIT_WORK(event_node-work, virtscsi_handle_event); sg_init_one(sg, event_node-event, sizeof(struct virtio_scsi_event)); spin_lock_irqsave(vscsi-event_vq.vq_lock, flags); @@ -377,7 +380,6 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_event_node *event_node = buf; - INIT_WORK(event_node-work, virtscsi_handle_event); schedule_work(event_node-work); } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
From: Ming Lei tom.leim...@gmail.com Access to tgt-req_vq is strictly serialized by spin_lock of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary. smp_read_barrier_depends() in virtscsi_req_done was introduced to order reading req_vq and decreasing tgt-reqs, but it isn't needed now because req_vq is read from scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of tgt-req_vq, so remove the unnecessary barrier. Also remove related comment about the barrier. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei tom.leim...@gmail.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 53 ++ 1 file changed, 6 insertions(+), 47 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index db3b494e5926..fc054935eb1f 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -73,17 +73,12 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq + * could be done locklessly, but we do not do it yet. * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. - * - * Similarly, decrements of reqs are never concurrent with writes of req_vq. + * Decrements of reqs are never concurrent with writes of req_vq: before the + * decrement reqs will be != 0; after the decrement the virtqueue completion + * routine will not use the req_vq so it can be changed by a new request. * Thus they can happen outside the tgt_lock, provided of course we make reqs * an atomic_t. */ @@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq) int index = vq-index - VIRTIO_SCSI_VQ_BASE; struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index]; - /* -* Read req_vq before decrementing the reqs field in -* virtscsi_complete_cmd. -* -* With barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read req_vq -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* -* Possible reordering without barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* read (wrong) req_vq -* -* We do not need a full smp_rmb, because req_vq is required to get -* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored -* in the virtqueue as the user token. -*/ - smp_read_barrier_depends(); - virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; @@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, spin_lock_irqsave(tgt-tgt_lock, flags); - /* -* The memory barrier after atomic_inc_return matches -* the smp_read_barrier_depends() in virtscsi_req_done. -*/ if (atomic_inc_return(tgt-reqs) 1) - vq = ACCESS_ONCE(tgt-req_vq); + vq = tgt-req_vq; else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/6] virtio-scsi patches for 3.16 + midlayer fix
As mentioned in the v1 submission, these patches were delayed a bit by the discussion and testing of patch 5. Debugging the problem also led to the discovery of a midlayer bug fixed by patch 4. Paolo v1-v2: fix all occurrences of scmd-result |= DID_TIME_OUT 16 in patch 4 v2-v3: really fix all occurrences in patch 4 Ming Lei (2): virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends() virtio-scsi: replace target spinlock with seqcount Paolo Bonzini (2): virtio-scsi: avoid cancelling uninitialized work items virtio-scsi: fix various bad behavior on aborted requests Ulrich Obergfell (1): scsi_error: fix invalid setting of host byte Venkatesh Srinivas (1): virtio-scsi: Implement change_queue_depth for virtscsi targets drivers/scsi/scsi_error.c | 6 +- drivers/scsi/virtio_scsi.c | 148 +++-- 2 files changed, 93 insertions(+), 61 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests
Il 04/06/2014 19:29, Venkatesh Srinivas ha scritto: Do you really want to poll the request VQs for completions if the TMF was rejected? I wasn't sure, but bugs in this path are hard enough that I preferred the safer patch. TMF ABORT may return FUNCTION REJECTED if the command to abort completed before the device saw the TMF ABORT message, for example. In such cases, this would unnecessarily lengthen the EH path. The cost of virtscsi_poll_requests should be nothing compared to the delay between the timeout and the invocation of the delayed_work, no? Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
Il 03/06/2014 03:00, Martin K. Petersen ha scritto: Paolo == Paolo Bonzini pbonz...@redhat.com writes: + sdev_printk(KERN_ERR, sdev, + %s: VPD page 0x83 NAA descriptor not found\n, __func__); + + return; Paolo I suspect this error will be relatively common. You're right. But we would like to see it for devices that actually implement copy offload. So I suggest the following tweak... [SCSI] Look up and store NAA if VPD page 0x83 is present Copy offloading requires us to know the NAA descriptor for both source target device. This descriptor is mandatory in the Device Identification VPD page. Locate this descriptor in the returned VPD data so we don't have to do lookups for every copy command. Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Reviewed-by: Nicholas Bellinger n...@linux-iscsi.org diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 88d46fe6bf98..190dca4a8494 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -1024,6 +1024,62 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf, EXPORT_SYMBOL_GPL(scsi_get_vpd_page); /** + * scsi_lookup_naa - Lookup NAA descriptor in VPD page 0x83 + * @sdev: The device to ask + * + * Copy offloading requires us to know the NAA descriptor for both + * source and target device. This descriptor is mandatory in the Device + * Identification VPD page. Locate this descriptor in the returned VPD + * data so we don't have to do lookups for every copy command. + */ +static void scsi_lookup_naa(struct scsi_device *sdev) +{ + unsigned char *buf = sdev-vpd_pg83; + unsigned int len = sdev-vpd_pg83_len; + + if (buf[1] != 0x83 || get_unaligned_be16(buf[2]) == 0) { + sdev_printk(KERN_ERR, sdev, + %s: VPD page 0x83 contains no descriptors\n, + __func__); + return; + } + + buf += 4; + len -= 4; + + do { + unsigned int desig_len = buf[3] + 4; + + /* Binary code set */ + if ((buf[0] 0xf) != 1) + goto skip; + + /* Target association */ + if ((buf[1] 4) 0x3) + goto skip; + + /* NAA designator */ + if ((buf[1] 0xf) != 0x3) + goto skip; + + sdev-naa = buf; + sdev-naa_len = desig_len; + + return; + +skip: + buf += desig_len; + len -= desig_len; + + } while (len 0); + + sdev_printk(KERN_ERR, sdev, + %s: VPD page 0x83 NAA descriptor not found\n, __func__); + + return; +} + +/** * scsi_attach_vpd - Attach Vital Product Data to a SCSI device structure * @sdev: The device to ask * @@ -1107,6 +1163,10 @@ retry_pg83: } sdev-vpd_pg83_len = result; sdev-vpd_pg83 = vpd_buf; + + /* Lookup NAA if 3PC set in INQUIRY response */ + if (sdev-inquiry_len = 6 sdev-inquiry[5] (1 3)) + scsi_lookup_naa(sdev); } } diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 5853c913d2b0..67bb70012802 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -119,6 +119,8 @@ struct scsi_device { unsigned char *vpd_pg83; int vpd_pg80_len; unsigned char *vpd_pg80; + unsigned char naa_len; + unsigned char *naa; unsigned char current_tag; /* current tag */ struct scsi_target *sdev_target; /* used only for single_lun */ -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] virtio-scsi: fix various bad behavior on aborted requests
Even though the virtio-scsi spec guarantees that all requests related to the TMF will have been completed by the time the TMF itself completes, the request queue's callback might not have run yet. This causes requests to be completed more than once, and as a result triggers a variety of BUGs or oopses. Cc: sta...@vger.kernel.org Cc: Ulrich Obergfell uober...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index d66c4ee2c774..fda9fb35 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq) virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; +static void virtscsi_poll_requests(struct virtio_scsi *vscsi) +{ + int i, num_vqs; + + num_vqs = vscsi-num_queues; + for (i = 0; i num_vqs; i++) + virtscsi_vq_done(vscsi, vscsi-req_vqs[i], +virtscsi_complete_cmd); +} + static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_cmd *cmd = buf; @@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) ret = SUCCESS; + /* +* The spec guarantees that all requests related to the TMF have +* been completed, but the callback might not have run yet if +* we're using independent interrupts (e.g. MSI). Poll the +* virtqueues once. +* +* In the abort case, sc-scsi_done will do nothing, because +* the block layer must have detected a timeout and as a result +* REQ_ATOM_COMPLETE has been set. +*/ + virtscsi_poll_requests(vscsi); + out: mempool_free(cmd, virtscsi_cmd_pool); return ret; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] virtio-scsi: Implement change_queue_depth for virtscsi targets
From: Venkatesh Srinivas venkate...@google.com change_queue_depth allows changing per-target queue depth via sysfs. It also allows the SCSI midlayer to ramp down the number of concurrent inflight requests in response to a SCSI BUSY status response and allows the midlayer to ramp the count back up to the device maximum when the BUSY condition has resolved. Signed-off-by: Venkatesh Srinivas venkate...@google.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 33 + 1 file changed, 33 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fda9fb35..de0f8d9b3682 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include scsi/scsi_tcq.h #include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 @@ -629,6 +630,36 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc) return virtscsi_tmf(vscsi, cmd); } +/** + * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth + * @sdev: Virtscsi target whose queue depth to change + * @qdepth:New queue depth + * @reason:Reason for the queue depth change. + */ +static int virtscsi_change_queue_depth(struct scsi_device *sdev, + int qdepth, + int reason) +{ + struct Scsi_Host *shost = sdev-host; + int max_depth = shost-cmd_per_lun; + + switch (reason) { + case SCSI_QDEPTH_QFULL: /* Drop qdepth in response to BUSY state */ + scsi_track_queue_full(sdev, qdepth); + break; + case SCSI_QDEPTH_RAMP_UP: /* Raise qdepth after BUSY state resolved */ + case SCSI_QDEPTH_DEFAULT: /* Manual change via sysfs */ + scsi_adjust_queue_depth(sdev, + scsi_get_tag_type(sdev), + min(max_depth, qdepth)); + break; + default: + return -EOPNOTSUPP; + } + + return sdev-queue_depth; +} + static int virtscsi_abort(struct scsi_cmnd *sc) { struct virtio_scsi *vscsi = shost_priv(sc-device-host); @@ -683,6 +714,7 @@ static struct scsi_host_template virtscsi_host_template_single = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_single, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, @@ -699,6 +731,7 @@ static struct scsi_host_template virtscsi_host_template_multi = { .proc_name = virtio_scsi, .this_id = -1, .queuecommand = virtscsi_queuecommand_multi, + .change_queue_depth = virtscsi_change_queue_depth, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] scsi_error: fix invalid setting of host byte
From: Ulrich Obergfell uober...@redhat.com After scsi_try_to_abort_cmd returns, the eh_abort_handler may have already found that the command has completed in the device, causing the host_byte to be nonzero (e.g. it could be DID_ABORT). When this happens, ORing DID_TIME_OUT into the host byte will corrupt the result field and initiate an unwanted command retry. Fix this by using set_host_byte instead, following the model of commit 2082ebc45af9c9c648383b8cde0dc1948eadbf31. Cc: sta...@vger.kernel.org Signed-off-by: Ulrich Obergfell uober...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/scsi_error.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f17aa7aa7879..ab06ef6fd726 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -131,7 +131,7 @@ scmd_eh_abort_handler(struct work_struct *work) aborting command %p\n, scmd)); rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd); if (rtn == SUCCESS) { - scmd-result |= DID_TIME_OUT 16; + set_host_byte(scmd, DID_TIME_OUT); if (scsi_host_eh_past_deadline(sdev-host)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
From: Ming Lei tom.leim...@gmail.com Access to tgt-req_vq is strictly serialized by spin_lock of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary. smp_read_barrier_depends() in virtscsi_req_done was introduced to order reading req_vq and decreasing tgt-reqs, but it isn't needed now because req_vq is read from scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of tgt-req_vq, so remove the unnecessary barrier. Also remove related comment about the barrier. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei tom.leim...@gmail.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/virtio_scsi.c | 53 ++ 1 file changed, 6 insertions(+), 47 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index db3b494e5926..fc054935eb1f 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -73,17 +73,12 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq + * could be done locklessly, but we do not do it yet. * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. - * - * Similarly, decrements of reqs are never concurrent with writes of req_vq. + * Decrements of reqs are never concurrent with writes of req_vq: before the + * decrement reqs will be != 0; after the decrement the virtqueue completion + * routine will not use the req_vq so it can be changed by a new request. * Thus they can happen outside the tgt_lock, provided of course we make reqs * an atomic_t. */ @@ -238,38 +233,6 @@ static void virtscsi_req_done(struct virtqueue *vq) int index = vq-index - VIRTIO_SCSI_VQ_BASE; struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index]; - /* -* Read req_vq before decrementing the reqs field in -* virtscsi_complete_cmd. -* -* With barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read req_vq -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* -* Possible reordering without barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* read (wrong) req_vq -* -* We do not need a full smp_rmb, because req_vq is required to get -* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored -* in the virtqueue as the user token. -*/ - smp_read_barrier_depends(); - virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; @@ -560,12 +523,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, spin_lock_irqsave(tgt-tgt_lock, flags); - /* -* The memory barrier after atomic_inc_return matches -* the smp_read_barrier_depends() in virtscsi_req_done. -*/ if (atomic_inc_return(tgt-reqs) 1) - vq = ACCESS_ONCE(tgt-req_vq); + vq = tgt-req_vq; else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH resend 0/6] virtio-scsi patches for 3.16 + a midlayer one-liner
Sorry for the delay, mostly caused by the discussion and testing of patch 5. That's the most important of the bunch, and debugging the problem also led to the discovery of a midlayer bug fixed by patch 4. I hope these can make their way into 3.16. Thanks! resend: forgot linux-scsi. :( Paolo Ming Lei (2): virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends() virtio-scsi: replace target spinlock with seqcount Paolo Bonzini (2): virtio-scsi: avoid cancelling uninitialized work items virtio-scsi: fix various bad behavior on aborted requests Ulrich Obergfell (1): scsi_error: fix invalid setting of host byte Venkatesh Srinivas (1): virtio-scsi: Implement change_queue_depth for virtscsi targets drivers/scsi/scsi_error.c | 2 +- drivers/scsi/virtio_scsi.c | 148 +++-- 2 files changed, 91 insertions(+), 59 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
Il 29/05/2014 05:52, Martin K. Petersen ha scritto: + sdev_printk(KERN_ERR, sdev, + %s: VPD page 0x83 NAA descriptor not found\n, __func__); + + return; I suspect this error will be relatively common. libata for example has if (ata_id_has_wwn(args-id)) { /* SAT defined lu world wide name */ /* piv=0, assoc=lu, code_set=binary, designator=NAA */ rbuf[num + 0] = 1; rbuf[num + 1] = 3; rbuf[num + 3] = ATA_ID_WWN_LEN; num += 4; ata_id_string(args-id, (unsigned char *) rbuf + num, ATA_ID_WWN, ATA_ID_WWN_LEN); num += ATA_ID_WWN_LEN; } rbuf[3] = num - 4;/* page len (assume less than 256 bytes) */ and most of the time IDE disks in a virtual machine are configured without a WWN. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
Il 30/05/2014 10:15, Liu Ping Fan ha scritto: When running io stress test on large latency scsi-disk, e.g guest with virtscsi on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, req-atomic_flags)); in blk_start_request(). Since there is a race between latency finishing scmd and the re-allocated scmd. I.e a request is still referred by a scmd, but we had turn it back to slab. This series introduces the ref on scmd to exclude this issue, and the following is ref rules. inc ref rules is like the following: When setup a scmd, inited as 1. When add a timer inc 1. dec ref rules is like the following: -for the normal ref scsi_done() will drop the ref when fail to acquire REQ_ATOM_COMPLETE immediately or drop the ref by scsi_end_request() or drop by return SUCCESS_REMOVE -for a timer ref when deleting timer, if !list_empty(timeout_list), then there is a timer ref, and drop it. patch1-2: fix the current potential bug patch3~6: prepare for the mechanism for the ref patch7: the ref rules core patch8-9: e.g and test-issue for the new mechanism. Since lack of many virtscsi background, patch8 may be poor and need to be improved :) Note: all the patches are based on rhel7, whose kernel version is linux-3.10. I will rebase them onto the latest commit if my method is practical. This series is not necessary, this is a bug in the virtscsi driver. I have a ten line patch to fix it, but I haven't yet tested it properly. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: replace target spinlock with seqcount
Il 28/05/2014 03:48, Ming Lei ha scritto: On Wed, May 28, 2014 at 12:57 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 27/05/2014 18:50, Venkatesh Srinivas ha scritto: Hi, I think this patch has a small race involving just two commands: 1. The first command to a target is in virtscsi_pick_vq(), after atomic_inc_return(tgt-tgt_lock)) but before write_seqcount_begin() Good catch, thanks Venkatesh And it doesn't happen in my test, so looks it won't be easy to trigger, :-) No, it's basically impossible because the race window is very small and the first command (INQUIRY or REPORT LUNS) is likely to be synchronous anyway. But it's there anyway. Specifically: @@ -508,19 +507,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; -spin_lock_irqsave(tgt-tgt_lock, flags); +local_irq_save(flags); +if (atomic_inc_return(tgt-reqs) 1) { +unsigned long seq; + +do { +seq = read_seqcount_begin(tgt-tgt_seq); +vq = tgt-req_vq; +} while (read_seqcount_retry(tgt-tgt_seq, seq)); +} else { A second virtscsi_pick_vq() here will read a stale or NULL tgt-req_vq. The NULL case is true, indeed I was going to send a patch to initialize tgt-req_vq but I have not tested it. diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fc054935eb1f..f0b4cdbfceb0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc) static int virtscsi_target_alloc(struct scsi_target *starget) { + struct Scsi_Host *sh = dev_to_shost(starget-dev.parent); + struct virtio_scsi *vscsi = shost_priv(sh); + struct virtio_scsi_target_state *tgt = kmalloc(sizeof(*tgt), GFP_KERNEL); if (!tgt) return -ENOMEM; seqcount_init(tgt-tgt_seq); atomic_set(tgt-reqs, 0); - tgt-req_vq = NULL; + tgt-req_vq = vscsi-req_vqs[0]; starget-hostdata = tgt; return 0; Paolo, do you need me to integrate this one into the patch? or you will make it as one standalone? I will integrate it myself after testing it. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [target:for-next 20/20] drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete type
Il 28/05/2014 00:21, Nicholas A. Bellinger ha scritto: On Mon, 2014-05-26 at 13:30 -0400, Martin K. Petersen wrote: Nic == Nicholas A Bellinger n...@linux-iscsi.org writes: What about #ifdef'ing VIRTIO_SCSI_F_T10_PI support out if !CONFIG_BLK_DEV_INTEGRITY? Nic I figured it was slightly cleaner to enable BLK_DEV_INTEGRITY by Nic default when referencing struct blk_integrity (following what Nic IBLOCK does), than adding the equivalent #ifdef's.. Nic MKP, do you have a preference on this..? Well, I guess how important the virtio stuff is in the embedded space? In the block layer we have all these wrappers that allow things to Do The Right Thing when the BLK_DEV_INTEGRITY is disabled. I'm not entirely sure it worth the hassle to do the same for target and virtio. The memory savings aren't that big to begin with. And besides, the bip pointer field in struct bio is about to become generic with my impending copy offload patches anyway, In that case, I'll leave as is unless Paolo has an objection. No, that's fine. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
Il 27/05/2014 10:56, James Bottomley ha scritto: The whole reason BUG_ON doesn't leave a log trace is to try to prevent corruption propagating to the data storage devices. What you propose would be inviting that corruption in the name of getting a log entry. Even this is not entirely correct. kdump can be used to capture the full log of the BUG_ON. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] block: Introduce blk_rq_completed()
Il 27/05/2014 12:21, James Bottomley ha scritto: I could also see us one day extending the TMF capability to abort any running command, which would make even an assertion of block timed out or completed invalid. Actually the assertion would remain valid, and that's exactly what Bart wants to document with this assertion. Completed from the point of view of the block layer really means do not run the softirq. If you wanted to abort any running command, you still would mark the request as completed for the block layer before issuing the TMF. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] block: Introduce blk_rq_completed()
Il 27/05/2014 12:59, James Bottomley ha scritto: On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote: Il 27/05/2014 12:21, James Bottomley ha scritto: I could also see us one day extending the TMF capability to abort any running command, which would make even an assertion of block timed out or completed invalid. Actually the assertion would remain valid, and that's exactly what Bart wants to document with this assertion. No, it wouldn't: if we abort a running command by definition the command hadn't timed out and might not be completed. This is required by TMF handling because now you have an abort racing with a completion. Either the command completes normally because it misses the abort or the abort gets to it and its returned status is set to TASK_ABORTED. That's the only way you can tell if the abort was successful or not. If you're thinking we would tell block to ignore returning commands before issuing the abort, we'd never be able to tell if the abort were successful, so we have to allow the race to collect the status. You could use a different mechanism than a softirq to tell the abort were successful, for example by overriding scsi_done. But with respect to the block layer, the mechanics of avoiding the race and double-free would probably be the same. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] block: Introduce blk_rq_completed()
Il 27/05/2014 13:26, James Bottomley ha scritto: You could use a different mechanism than a softirq to tell the abort were successful, for example by overriding scsi_done. But with respect to the block layer, the mechanics of avoiding the race and double-free would probably be the same. I think there's some confusion about what the race and double free is: It only occurs with timeouts. In a timeout situation, the host had decided it's not waiting any longer for the target to respond and proceeds to error recovery. At any time between the host making this decision up to the point it kicks the target hard enough to clear all in-flight commands, the target may return the command. If we didn't have some ignore function on command completions while we're handling errors, this would lead to double completion. If we decided to allow arbitrary aborts of running commands, we would send a TMF in during the normal (i.e. un timed out) command period. Because there's no timeout involved, there's no double free problem. The race in this case is whether the abort catches the command or not and to mediate that race we need the normal status return. I'm not sure why no timeout implies no double free. There would still be a race between the interrupt handler and softirq on one side, and the abort handler on the other. The interrupt handler's call to cmd-scsi_done ends up triggering the softirq and thus freeing the command with scsi_put_command. The abort handler, as you mentioned, wants the status return so it needs the interrupt handler to run---but not the softirq. A simple way to avoid this could be to skip the softirq processing, by marking the request block-layer-complete, and do everything in the abort handler. The interrupt handler would still run and fill in the status. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: replace target spinlock with seqcount
Il 27/05/2014 18:50, Venkatesh Srinivas ha scritto: Hi, I think this patch has a small race involving just two commands: 1. The first command to a target is in virtscsi_pick_vq(), after atomic_inc_return(tgt-tgt_lock)) but before write_seqcount_begin() 2. A second command to the same target enters virtscsi_pick_vq(). It will hit the same atomic inc, take the upper branch of the conditional, and read out a stale (or NULL) tgt-req_vq. Specifically: @@ -508,19 +507,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; -spin_lock_irqsave(tgt-tgt_lock, flags); +local_irq_save(flags); +if (atomic_inc_return(tgt-reqs) 1) { +unsigned long seq; + +do { +seq = read_seqcount_begin(tgt-tgt_seq); +vq = tgt-req_vq; +} while (read_seqcount_retry(tgt-tgt_seq, seq)); +} else { A second virtscsi_pick_vq() here will read a stale or NULL tgt-req_vq. The NULL case is true, indeed I was going to send a patch to initialize tgt-req_vq but I have not tested it. diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index fc054935eb1f..f0b4cdbfceb0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -618,14 +631,17 @@ static int virtscsi_abort(struct scsi_cmnd *sc) static int virtscsi_target_alloc(struct scsi_target *starget) { + struct Scsi_Host *sh = dev_to_shost(starget-dev.parent); + struct virtio_scsi *vscsi = shost_priv(sh); + struct virtio_scsi_target_state *tgt = kmalloc(sizeof(*tgt), GFP_KERNEL); if (!tgt) return -ENOMEM; seqcount_init(tgt-tgt_seq); atomic_set(tgt-reqs, 0); - tgt-req_vq = NULL; + tgt-req_vq = vscsi-req_vqs[0]; starget-hostdata = tgt; return 0; The case of a stale req_vq is okay and is the (IMO reasonable) price to pay for allowing more concurrency. If you have concurrent requests from multiple VCPUs, multiqueue is not great for your workload anyway, at least with the current steering algorithm. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
Il 26/05/2014 17:14, Bart Van Assche ha scritto: scsi_put_command() is either invoked before a command is queued or after a command has completed. scsi_cmnd.abort_work is scheduled after a command has timed out and before it is finished. The block layer guarantees that either the softirq_done_fn() or the rq_timed_out_fn() function is invoked but not both. This means that scsi_put_command() is never invoked while abort_work is scheduled. Hence remove the cancel_delayed_work() call from scsi_put_command(). Similarly, scsi_abort_command() is only invoked from the SCSI timeout handler. If scsi_abort_command() is invoked for a SCSI command with the SCSI_EH_ABORT_SCHEDULED flag set this means that scmd_eh_abort_handler() has already invoked scsi_queue_insert() and hence that scsi_cmnd.abort_work is no longer pending. Hence also remove the cancel_delayed_work() call from scsi_abort_command(). Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Hannes Reinecke h...@suse.de Cc: Paolo Bonzini pbonz...@redhat.com Cc: Christoph Hellwig h...@infradead.org Cc: Jens Axboe ax...@fb.com Cc: Joe Lawrence joe.lawre...@stratus.com --- drivers/scsi/scsi.c | 2 +- drivers/scsi/scsi_error.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 88d46fe..c972eab 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd) list_del_init(cmd-list); spin_unlock_irqrestore(cmd-device-list_lock, flags); - cancel_delayed_work(cmd-abort_work); + WARN_ON_ONCE(delayed_work_pending(cmd-abort_work)); __scsi_put_command(cmd-device-host, cmd); } diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f17aa7a..5232583 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd) SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, scmd %p previous abort failed\n, scmd)); - cancel_delayed_work(scmd-abort_work); + WARN_ON_ONCE(delayed_work_pending(scmd-abort_work)); return FAILED; } I still prefer a BUG_ON in scsi_put_command, but anyway: series Reviewed-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [target:for-next 20/20] drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete type
Il 22/05/2014 22:46, Nicholas A. Bellinger ha scritto: Hi Fengguang, On Thu, 2014-05-22 at 11:13 +0800, kbuild test robot wrote: tree: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next head: 4baaa7d589e24bfe87dfd6c7a1229108be404a28 commit: 4baaa7d589e24bfe87dfd6c7a1229108be404a28 [20/20] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD config: x86_64-randconfig-x000 (attached as .config) All error/warnings: drivers/scsi/virtio_scsi.c: In function 'virtio_scsi_init_hdr_pi': drivers/scsi/virtio_scsi.c:531:48: error: dereferencing pointer to incomplete type cmd_pi-pi_bytesout = blk_rq_sectors(rq) * bi-tuple_size; ^ drivers/scsi/virtio_scsi.c:533:47: error: dereferencing pointer to incomplete type cmd_pi-pi_bytesin = blk_rq_sectors(rq) * bi-tuple_size; ^ vim +531 drivers/scsi/virtio_scsi.c 525 if (!rq || !scsi_prot_sg_count(sc)) 526 return; 527 528 bi = blk_get_integrity(rq-rq_disk); 529 530 if (sc-sc_data_direction == DMA_TO_DEVICE) 531 cmd_pi-pi_bytesout = blk_rq_sectors(rq) * bi-tuple_size; 532 else if (sc-sc_data_direction == DMA_FROM_DEVICE) 533 cmd_pi-pi_bytesin = blk_rq_sectors(rq) * bi-tuple_size; 534 } 535 536 static int virtscsi_queuecommand(struct virtio_scsi *vscsi, Squashing the following into the original commit to enable blk-integrity for virtio-scsi to address the randconfig build failure above. Thanks! --nab diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index 02832d6..baca589 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -1773,6 +1773,7 @@ config SCSI_BFA_FC config SCSI_VIRTIO tristate virtio-scsi support depends on VIRTIO + select BLK_DEV_INTEGRITY help This is the virtual HBA driver for virtio. If the kernel will be used in a virtual machine, say Y or M. What about #ifdef'ing VIRTIO_SCSI_F_T10_PI support out if !CONFIG_BLK_DEV_INTEGRITY? Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?
Il 23/05/2014 03:28, Elliott, Robert (Server Storage) ha scritto: Depending on the transport, there may be a race condition between the command status and the ABORT TASK response; the ABORT TASK response might get back first. I think that means scsi_eh_abort_handler's call to scsi_finish_command could happen during or after scsi_softirq_done has called scsi_finish_command. That was my doubt, in fact. But actually Bart explained that this is not the case. Either scsi_eh_abort_handler will be called via the work item, or the command will never be sent to the softirq. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
Il 23/05/2014 08:09, Hannes Reinecke ha scritto: And when freeing a command we absolutely need to make sure that the workqueue is empty. So calling cancel_delayed_work() was the obvious thing to do. You would need cancel_delayed_work_sync, but if it really happened that the work item is running, it would cause a double free. I'd be fine with adding a WARN_ON(!list_empty(cmd-abort_work)) here, however. This will clear up the intent of this statement. BUG_ON even, since you'd get badness from the double free anyway. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
Il 23/05/2014 12:37, Bart Van Assche ha scritto: On 05/23/14 11:24, Paolo Bonzini wrote: Il 23/05/2014 08:09, Hannes Reinecke ha scritto: And when freeing a command we absolutely need to make sure that the workqueue is empty. So calling cancel_delayed_work() was the obvious thing to do. You would need cancel_delayed_work_sync, but if it really happened that the work item is running, it would cause a double free. I'd be fine with adding a WARN_ON(!list_empty(cmd-abort_work)) here, however. This will clear up the intent of this statement. BUG_ON even, since you'd get badness from the double free anyway. Hello Paolo, Are you aware that Linus strongly prefers WARN_ON_ONCE() over BUG_ON() ? See e.g. https://lkml.org/lkml/2012/9/27/461 or https://lkml.org/lkml/2014/4/28/657. Yes, I am and I even downgraded some KVM BUG_ONs recently. But in this case I think that memory corruption is going to happen anyway unless you consciously leak the Scsi_Cmnd * (because if you use WARN_ON, you also need to return early as Linus suggested in the second email). So the WARN_ON/BUG_ON choice here should not just consider what makes the problem easier to debug; hanging the machine before guaranteed badness seems to me like a good use for BUG_ON. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-v2 6/6] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD
Il 22/05/2014 04:26, Nicholas A. Bellinger ha scritto: From: Nicholas Bellinger n...@linux-iscsi.org This patch updates virtscsi_probe() to setup necessary Scsi_Host level protection resources. (currently hardcoded to 1) It changes virtscsi_add_cmd() to attach outgoing / incoming protection SGLs preceeding the data payload, and is using the new virtio_scsi_cmd_req_pi-d[oi],pi_niv field to signal Wrong field name in the commit message... to signal to vhost/scsi how many prot_sgs to expect. s/prot_sgs to expect/bytes to expect for protection data/ Apart from this, which can be fixed in the pull request, Acked-by: Paolo Bonzini pbonz...@redhat.com for getting this in via the target tree. Paolo v4 changes: - Convert virtio_scsi_init_hdr_pi to use pi_bytesout + pi_bytesin (mst + paolo + nab) - Use blk_integrity-tuple_size to calculate pi bytes (nab) v3 changes: - Use VIRTIO_SCSI_F_T10_PI to determine PI or non PI header (Paolo) v2 changes: - Make protection buffer come before data buffer (Paolo) - Enable virtio_scsi_cmd_req_pi usage (Paolo) Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Cc: Martin K. Petersen martin.peter...@oracle.com Cc: Christoph Hellwig h...@lst.de Cc: Hannes Reinecke h...@suse.de Cc: Sagi Grimberg sa...@dev.mellanox.co.il Cc: H. Peter Anvin h...@zytor.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org --- drivers/scsi/virtio_scsi.c | 86 ++-- 1 file changed, 68 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 16bfd50..cc634b0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -37,6 +37,7 @@ struct virtio_scsi_cmd { struct completion *comp; union { struct virtio_scsi_cmd_req cmd; + struct virtio_scsi_cmd_req_picmd_pi; struct virtio_scsi_ctrl_tmf_req tmf; struct virtio_scsi_ctrl_an_req an; } req; @@ -440,7 +441,7 @@ static int virtscsi_add_cmd(struct virtqueue *vq, size_t req_size, size_t resp_size, gfp_t gfp) { struct scsi_cmnd *sc = cmd-sc; - struct scatterlist *sgs[4], req, resp; + struct scatterlist *sgs[6], req, resp; struct sg_table *out, *in; unsigned out_num = 0, in_num = 0; @@ -458,16 +459,24 @@ static int virtscsi_add_cmd(struct virtqueue *vq, sgs[out_num++] = req; /* Data-out buffer. */ - if (out) + if (out) { + /* Place WRITE protection SGLs before Data OUT payload */ + if (scsi_prot_sg_count(sc)) + sgs[out_num++] = scsi_prot_sglist(sc); sgs[out_num++] = out-sgl; + } /* Response header. */ sg_init_one(resp, cmd-resp, resp_size); sgs[out_num + in_num++] = resp; /* Data-in buffer */ - if (in) + if (in) { + /* Place READ protection SGLs before Data IN payload */ + if (scsi_prot_sg_count(sc)) + sgs[out_num + in_num++] = scsi_prot_sglist(sc); sgs[out_num + in_num++] = in-sgl; + } return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp); } @@ -492,12 +501,44 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq, return err; } +static void virtio_scsi_init_hdr(struct virtio_scsi_cmd_req *cmd, +struct scsi_cmnd *sc) +{ + cmd-lun[0] = 1; + cmd-lun[1] = sc-device-id; + cmd-lun[2] = (sc-device-lun 8) | 0x40; + cmd-lun[3] = sc-device-lun 0xff; + cmd-tag = (unsigned long)sc; + cmd-task_attr = VIRTIO_SCSI_S_SIMPLE; + cmd-prio = 0; + cmd-crn = 0; +} + +static void virtio_scsi_init_hdr_pi(struct virtio_scsi_cmd_req_pi *cmd_pi, + struct scsi_cmnd *sc) +{ + struct request *rq = sc-request; + struct blk_integrity *bi; + + virtio_scsi_init_hdr((struct virtio_scsi_cmd_req *)cmd_pi, sc); + + if (!rq || !scsi_prot_sg_count(sc)) + return; + + bi = blk_get_integrity(rq-rq_disk); + + if (sc-sc_data_direction == DMA_TO_DEVICE) + cmd_pi-pi_bytesout = blk_rq_sectors(rq) * bi-tuple_size; + else if (sc-sc_data_direction == DMA_FROM_DEVICE) + cmd_pi-pi_bytesin = blk_rq_sectors(rq) * bi-tuple_size; +} + static int virtscsi_queuecommand(struct virtio_scsi *vscsi, struct virtio_scsi_vq *req_vq, struct scsi_cmnd *sc) { struct virtio_scsi_cmd *cmd; - int ret; + int ret, req_size; struct Scsi_Host *shost = virtio_scsi_host(vscsi-vdev); BUG_ON(scsi_sg_count(sc) shost-sg_tablesize); @@ -515,22 +556,20 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi, memset(cmd, 0, sizeof(*cmd)); cmd-sc
Re: [PATCH-v2 0/6] vhost/scsi: Add T10 PI SGL passthrough support
Il 22/05/2014 04:26, Nicholas A. Bellinger ha scritto: From: Nicholas Bellinger n...@linux-iscsi.org Hi MST, MKP, Paolo Co, Here is the v2 patch series for adding T1O protection information (PI) SGL passthrough support between virtio-scsi LLD + vhost-scsi fabric endpoints. Following MST's recommendation, it includes the changes for using bytes intead of number of iovecs in virtio_scsi_cmd_req_pi along with the associated changes to virtio-scsi + vhost/scsi code. For vhost/scsi, this includes walking the leading iovec's length(s) to determine where protection payload ends, and real data payload starts. For virtio-scsi LLD code, this includes locating struct blk_integrity for using blk_rq_sectors + -tuple_size to calculate the total bytes for outgoing cmd_pi-pi_bytes[out,in] values. The full list of changes from last series include: - Use pi_bytesout + pi_bytesin instead of niovs in virtio-scsi PI header (mst + paolo) - Add prot_pto=1 in tcm_vhost_submission_work() when no PI buffer exists (nab) - Get rid of req + cdb pointer casts in vhost_scsi_handle_vq (mst) - Ensure that virtio_scsi_cmd_req_pi processing happens regardless of data_num in vhost_scsi_handle_vq (nab) - Pass TARGET_PROT_ALL into transport_init_session_tags() (nab) - Convert vhost_scsi_handle_vq to use memcpy_fromiovecend() (mst) - Convert vhost_scsi_handle_vq to use pi_bytesout + pi_bytesin (nab) - Convert virtio_scsi_init_hdr_pi() to use pi_bytesout + pi_bytesin (mst + paolo + nab) - Use blk_integrity-tuple_size to calculate pi bytes (nab) Please review for v3.16-rc1 code. Thanks! --nab Nicholas Bellinger (6): virtio-scsi.h: Add virtio_scsi_cmd_req_pi + VIRTIO_SCSI_F_T10_PI bits vhost/scsi: Move sanity check into vhost_scsi_map_iov_to_sgl vhost/scsi: Add preallocation of protection SGLs vhost/scsi: Add T10 PI IOV - SGL memory mapping logic vhost/scsi: Enable T10 PI IOV - SGL memory mapping virtio-scsi: Enable DIF/DIX modes in SCSI host LLD drivers/scsi/virtio_scsi.c | 86 +--- drivers/vhost/scsi.c| 305 +-- include/linux/virtio_scsi.h | 15 ++- 3 files changed, 292 insertions(+), 114 deletions(-) Looks good, thanks! Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
Il 21/05/2014 15:30, Bart Van Assche ha scritto: +static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd) +{ + struct Scsi_Host *shost = scmd-device-host; + struct scsi_cmnd *c; + unsigned long flags; + bool ret = false; + + if (!blk_rq_completed(scmd-request)) + return true; + + spin_lock_irqsave(shost-host_lock, flags); + list_for_each_entry(c, shost-eh_cmd_q, eh_entry) { + if (c == scmd) { + ret = true; + break; + } + } + spin_unlock_irqrestore(shost-host_lock, flags); + + return ret; +} + /** * scmd_eh_abort_handler - Handle command aborts * @work: command to be aborted. @@ -120,6 +142,8 @@ scmd_eh_abort_handler(struct work_struct *work) struct scsi_device *sdev = scmd-device; int rtn; + WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd)); What about a simpler, though less accuracte WARN_ON(!blk_rq_completed(scmd-request)); that doesn't need the host_lock? Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?
Il 21/05/2014 16:16, Mark Wu ha scritto: Is it possible that when scsi_done is invoked, the scsi command or the request has been freed and then reallocated for a new I/O request? Because of this the bit flag REQ_ATOM_COMPLETE becomes unreliable. Thanks! It is up to the driver to ensure that the interrupt handler will not process the Scsi_Cmnd* after returning from the eh_abort_handler. If you do this, what you say cannot happen. Otherwise you'll get a variety of failures, the most common of which for me are OOPSes and a BUG in blk_finish_request. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD
Il 19/05/2014 21:07, Nicholas A. Bellinger ha scritto: Hi MST, So I've finally got some cycles to get back to this code, and wanted to verify the outstanding items you had previously raised: - Convert vhost-scsi to be independent of IOV layout using memcpy_fromiovecend. (Does this effect existing non PI virtio-scsi operation..?) - Report VIRTIO_F_ANY_LAYOUT feature bit to userspace. - Convert virtio_scsi_cmd_req_pi to bytes field instead of number of iovecs. This is the only item really required, since the bytes field is what will be in VIRTIO 1.0. The other would be nice to have, but not a blocker for PI support. - Ensure virtio_scsi_cmd_req_pi is naturally aligned mst already commented on this. - Figure out why QEMU is not acking (any) vhost-scsi feature bits This is a separate bug, isn't it? It need not block PI support. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock.
Il 20/05/2014 07:12, Rusty Russell ha scritto: This triggers every time we do a SCSI abort: virtscsi_tmf - virtscsi_kick_cmd (grab lock and call) - virtscsi_add_cmd - virtqueue_add_sgs (GFP_NOIO) Logs look like this: sd 0:0:0:0: [sda] abort BUG: sleeping function called from invalid context at mm/slub.c:966 in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: kworker/u2:0 3 locks held by kworker/u2:0/6: #0: (scsi_tmf_%dshost-host_no){..}, at: [c0153180] process_one_work+0xe0/0x3d0 #1: (((cmd-abort_work)-work)){..}, at: [c0153180] process_one_work+0xe0/0x3d0 #2: ((virtscsi_vq-vq_lock)-rlock){..}, at: [c043f508] virtscsi_kick_cmd+0x18/0x1b0 CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 3.15.0-rc5+ #110 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org 04/01/2014 Workqueue: scsi_tmf_0 scmd_eh_abort_handler Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index db3b494e5926..62757afd93bb 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -433,11 +433,10 @@ static void virtscsi_event_done(struct virtqueue *vq) * @cmd: command structure * @req_size : size of the request buffer * @resp_size : size of the response buffer - * @gfp: flags to use for memory allocations */ static int virtscsi_add_cmd(struct virtqueue *vq, struct virtio_scsi_cmd *cmd, - size_t req_size, size_t resp_size, gfp_t gfp) + size_t req_size, size_t resp_size) { struct scsi_cmnd *sc = cmd-sc; struct scatterlist *sgs[4], req, resp; @@ -469,19 +468,19 @@ static int virtscsi_add_cmd(struct virtqueue *vq, if (in) sgs[out_num + in_num++] = in-sgl; - return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp); + return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC); } static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq, struct virtio_scsi_cmd *cmd, -size_t req_size, size_t resp_size, gfp_t gfp) +size_t req_size, size_t resp_size) { unsigned long flags; int err; bool needs_kick = false; spin_lock_irqsave(vq-vq_lock, flags); - err = virtscsi_add_cmd(vq-vq, cmd, req_size, resp_size, gfp); + err = virtscsi_add_cmd(vq-vq, cmd, req_size, resp_size); if (!err) needs_kick = virtqueue_kick_prepare(vq-vq); @@ -530,8 +529,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi, memcpy(cmd-req.cmd.cdb, sc-cmnd, sc-cmd_len); if (virtscsi_kick_cmd(req_vq, cmd, - sizeof cmd-req.cmd, sizeof cmd-resp.cmd, - GFP_ATOMIC) == 0) + sizeof cmd-req.cmd, sizeof cmd-resp.cmd) == 0) ret = 0; else mempool_free(cmd, virtscsi_cmd_pool); @@ -596,8 +594,7 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) cmd-comp = comp; if (virtscsi_kick_cmd(vscsi-ctrl_vq, cmd, - sizeof cmd-req.tmf, sizeof cmd-resp.tmf, - GFP_NOIO) 0) + sizeof cmd-req.tmf, sizeof cmd-resp.tmf) 0) goto out; wait_for_completion(comp); Acked-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?
Il 20/05/2014 10:10, Bart Van Assche ha scritto: REQ_ATOM_COMPLETE is already set before scsi_eh_scmd_add() is called since that function is only invoked after the block layer has marked a request as complete. The only callers of scsi_eh_scmd_add() are scsi_softirq_done(), scsi_times_out() and scmd_eh_abort_handler(). That last function is invoked (indirectly) by scsi_times_out(). Yes, and answering my own question, you cannot have a dangling pointer in eh_abort_handler (unless you have a driver bug). This is because once eh_abort_handler is called, you know the interrupt handler will not trigger the softirq and scsi_finish_command won't be called. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
dangling pointers and/or reentrancy in scmd_eh_abort_handler?
Hi all, I'm trying to understand asynchronous abort in the current upstream code, and the code seems to have some dubious locking. Here are some examples of the issue: 1) dangling pointers: scsi_put_command calls cancel_delayed_work(), but that doesn't mean that the scmd_eh_abort_handler couldn't be already running. If the scmd_eh_abort_handler starts while the softirq handler is calling scsi_put_command (e.g. scsi_finish_command - scsi_io_completion - scsi_end_request - scsi_next_command), the pointer to the Scsi_Cmnd* becomes invalid in the middle of the abort handler. 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run concurrently, and call scsi_finish_command without any lock protecting the calls. You can then get memory corruption. I don't have any reproducer for this; we're seeing related crashes in virtio-scsi EH but those are due to a bug in the driver. But it means that I have no sensible way to write the eh_abort_handler. Example (1) means that the eh_abort_handler cannot use the passed Scsi_Cmnd, because it might not even be valid when entering the eh_abort_handler. Example (2) means that the eh_abort_handler cannot return SUCCESS if it detects that the command has been completed in the meanwhile. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?
Il 19/05/2014 17:08, Bart Van Assche ha scritto: On 05/19/14 16:08, Paolo Bonzini wrote: 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run concurrently, and call scsi_finish_command without any lock protecting the calls. You can then get memory corruption. I'm not sure what the recommended approach is to address this race. But it is possible to address this in the LLD. See e.g. the srp_claim_req() function in the SRP LLD and how it is invoked from the reply handler, the abort handler and the reset handlers in that LLD. That's not enough, unless I'm missing something. Say the request handler claims the request and the abort handler doesn't: - the request handler calls scsi_done and ends up in scsi_finish_command. - the abort handler will return SUCCESS, and scmd_eh_abort_handler then calls scsi_finish_command. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
Il 08/05/2014 07:24, Ming Lei ha scritto: Access to tgt-req_vq is strictly serialized by spin_lock of tgt-tgt_lock, so the ACCESS_ONCE() isn't necessary. smp_read_barrier_depends() in virtscsi_req_done was introduced to order reading req_vq and decreasing tgt-reqs, but it isn't needed now because req_vq is read from scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of tgt-req_vq, so remove the unnecessary barrier. Also remove related comment about the barrier. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei tom.leim...@gmail.com --- v1: - fix comment on decrements of reqs with writing of req_vq as suggested by Paolo drivers/scsi/virtio_scsi.c | 55 +--- 1 file changed, 6 insertions(+), 49 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 16bfd50..bedea60 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -73,19 +73,12 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq + * could be done locklessly, but we do not do it yet. * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. - * - * Similarly, decrements of reqs are never concurrent with writes of req_vq. - * Thus they can happen outside the tgt_lock, provided of course we make reqs - * an atomic_t. + * Decrements of reqs are never concurrent with writes of req_vq because + * request is always completd after being queued. Thus they can happen + * outside the tgt_lock, provided of course we make reqs an atomic_t. Decrements of reqs are never concurrent with writes of req_vq: before the decrement reqs will be != 0; after the decrement the virtqueue completion routine will not use the req_vq so it can be changed by a new request. Thus they can happen outside the tgt_lock, provided of course we make reqs an atomic_t. Otherwise looks good. Paolo */ struct virtio_scsi_target_state { /* This spinlock never held at the same time as vq_lock. */ @@ -238,38 +231,6 @@ static void virtscsi_req_done(struct virtqueue *vq) int index = vq-index - VIRTIO_SCSI_VQ_BASE; struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index]; - /* -* Read req_vq before decrementing the reqs field in -* virtscsi_complete_cmd. -* -* With barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read req_vq -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* -* Possible reordering without barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* read (wrong) req_vq -* -* We do not need a full smp_rmb, because req_vq is required to get -* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored -* in the virtqueue as the user token. -*/ - smp_read_barrier_depends(); - virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; @@ -560,12 +521,8 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, spin_lock_irqsave(tgt-tgt_lock, flags); - /* -* The memory barrier after atomic_inc_return matches -* the smp_read_barrier_depends() in virtscsi_req_done. -*/ if (atomic_inc_return(tgt-reqs) 1) - vq = ACCESS_ONCE(tgt-req_vq); + vq = tgt-req_vq; else { queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-scsi: two questions related with picking up queue
Il 08/05/2014 12:44, Ming Lei ha scritto: On Wed, 07 May 2014 18:43:45 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Per-CPU spinlocks have bad scalability problems, especially if you're overcommitting. Writing req_vq is not at all rare. OK, thought about it further, and I believe seqcount may be a match for the case, could you take a look at below patch? diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 13dd500..1adbad7 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 #define VIRTIO_SCSI_EVENT_LEN 8 @@ -73,18 +74,16 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq - * could be done locklessly, but we do not do it yet. + * tgt_seq is held to serialize reading and writing req_vq. * * Decrements of reqs are never concurrent with writes of req_vq: before the * decrement reqs will be != 0; after the decrement the virtqueue completion * routine will not use the req_vq so it can be changed by a new request. - * Thus they can happen outside the tgt_lock, provided of course we make reqs + * Thus they can happen outside the tgt_seq, provided of course we make reqs * an atomic_t. */ struct virtio_scsi_target_state { - /* This spinlock never held at the same time as vq_lock. */ - spinlock_t tgt_lock; + seqcount_t tgt_seq; /* Count of outstanding requests. */ atomic_t reqs; @@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; - spin_lock_irqsave(tgt-tgt_lock, flags); + local_irq_save(flags); + if (atomic_inc_return(tgt-reqs) 1) { + unsigned long seq; + + do { + seq = read_seqcount_begin(tgt-tgt_seq); + vq = tgt-req_vq; + } while (read_seqcount_retry(tgt-tgt_seq, seq)); + } else { + /* no writes can be concurrent because of atomic_t */ + write_seqcount_begin(tgt-tgt_seq); + + /* keep previous req_vq if there is reader found */ + if (unlikely(atomic_read(tgt-reqs) 1)) { + vq = tgt-req_vq; + goto unlock; + } queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) queue_num -= vscsi-num_queues; tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + unlock: + write_seqcount_end(tgt-tgt_seq); } + local_irq_restore(flags); I find this harder to think about than the double-check with a spin_lock_irqsave in the middle, and the read side is not lock free anymore. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-scsi: two questions related with picking up queue
Il 08/05/2014 14:55, Ming Lei ha scritto: On Thu, May 8, 2014 at 8:17 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 08/05/2014 12:44, Ming Lei ha scritto: On Wed, 07 May 2014 18:43:45 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Per-CPU spinlocks have bad scalability problems, especially if you're overcommitting. Writing req_vq is not at all rare. OK, thought about it further, and I believe seqcount may be a match for the case, could you take a look at below patch? diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 13dd500..1adbad7 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -26,6 +26,7 @@ #include scsi/scsi_host.h #include scsi/scsi_device.h #include scsi/scsi_cmnd.h +#include linux/seqlock.h #define VIRTIO_SCSI_MEMPOOL_SZ 64 #define VIRTIO_SCSI_EVENT_LEN 8 @@ -73,18 +74,16 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq - * could be done locklessly, but we do not do it yet. + * tgt_seq is held to serialize reading and writing req_vq. * * Decrements of reqs are never concurrent with writes of req_vq: before the * decrement reqs will be != 0; after the decrement the virtqueue completion * routine will not use the req_vq so it can be changed by a new request. - * Thus they can happen outside the tgt_lock, provided of course we make reqs + * Thus they can happen outside the tgt_seq, provided of course we make reqs * an atomic_t. */ struct virtio_scsi_target_state { - /* This spinlock never held at the same time as vq_lock. */ - spinlock_t tgt_lock; + seqcount_t tgt_seq; /* Count of outstanding requests. */ atomic_t reqs; @@ -521,19 +520,33 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, unsigned long flags; u32 queue_num; - spin_lock_irqsave(tgt-tgt_lock, flags); + local_irq_save(flags); + if (atomic_inc_return(tgt-reqs) 1) { + unsigned long seq; + + do { + seq = read_seqcount_begin(tgt-tgt_seq); + vq = tgt-req_vq; + } while (read_seqcount_retry(tgt-tgt_seq, seq)); + } else { + /* no writes can be concurrent because of atomic_t */ + write_seqcount_begin(tgt-tgt_seq); + + /* keep previous req_vq if there is reader found */ + if (unlikely(atomic_read(tgt-reqs) 1)) { + vq = tgt-req_vq; + goto unlock; + } queue_num = smp_processor_id(); while (unlikely(queue_num = vscsi-num_queues)) queue_num -= vscsi-num_queues; tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + unlock: + write_seqcount_end(tgt-tgt_seq); } + local_irq_restore(flags); I find this harder to think about than the double-check with a spin_lock_irqsave in the middle, Sorry, could you explain it a bit? With seqcount, spin_lock isn't needed, which should have been used for serialize read and write. Yes, the spin lock is not needed but you are still potentially spinning on the read side. and the read side is not lock free anymore. It is still lock free, because reader won't block reader, and both read_seqcount_begin and read_seqcount_retry only checks if there is writer in progress or being completed, and the two helpers are very cheap. Lock-free has a precise meaning, which is that the system will progress regardless of scheduling. In this case, readers won't progress while the writer is preempted between write_seqcount_begin and write_seqcount_end. My cmpxchg example had a lock-free read-side and a blocking write-side, while your code is the opposite, the write-side is lock-free and the read-side is blocking. I'm not sure which is better. You can try both and the current code, and show that some benchmarks improve. Otherwise, it's better to leave the current code. Simple code is better that complex code that was never benchmarked (which is why in the end I and Wanlong Gao settled for the simple spinlock). Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-scsi: two questions related with picking up queue
Il 07/05/2014 03:07, Ming Lei ha scritto: Hi Paolo, On Tue, May 6, 2014 at 9:17 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/05/2014 11:26, Ming Lei ha scritto: Hi Paolo and All, One question is about ACCESS_ONCE() in virtscsi_pick_vq(), looks it needn't since both reading and writing tgt-req_vq holds tgt-tgt_lock. You're right. It should be possible to avoid the lock in virtscsi_pick_vq like this: value = atomic_read(tgt-reqs); I am wondering if atomic_read() is OK because atomic_read() should be treated as a simple C statement, and it may not reflect the latest value of tgt-reqs. It would be caught by cmpxchg below. retry: if (value != 0) { old_value = atomic_cmpxchg(tgt-regs, value, value + 1) if (old_value != value) { Maybe ' if (old_value != value !old_value) ' is a bit better. No, because if you have failed the cmpxchg you haven't incremented tgt-reqs. value = old_value; goto retry; } vq = ACCESS_ONCE(tgt-req_vq); } else { spin_lock_irqsave(tgt-tgt_lock, flags); // tgt-reqs may not be 0 anymore, need to recheck value = atomic_read(tgt-reqs); if (atomic_read(tgt-reqs) != 0) { spin_unlock_irqrestore(tgt-tgt_lock, flags); goto retry; } Same with above, if atomic_read() still returns zero even after it is increased in read path from another CPU, then an obsolete vq pointer might be seen in the read path. If I understand you correctly, then the CPUs wouldn't be cache-coherent. You would have bigger problems. // tgt-reqs now will remain fixed to 0. ... tgt-req_vq = vq = ...; smp_wmb(); atomic_set(tgt-reqs, 1); spin_unlock_irqrestore(tgt-tgt_lock, flags); } return vq; and then you would need the ACCESS_ONCE but I'm not sure it's worthwhile. Yes, I agree, :-) Another one is about the comment in virtscsi_req_done(), which said smp_read_barrier_depends() is needed for avoiding out of order between reading req_vq and decreasing tgt-reqs. But if I understand correctly, in virtscsi_req_done(), req_vq is read from vscsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE], instead of tgt-req_vq, and the former won't change wrt. inc/dec tgt-reqs, so can the barrier be removed? Right. The comment is obsolete and dates to before vq-index existed. OK, I will cook a patch to remove the barrier and cleanup the comment. Thanks. Please remove the ACCESS_ONCE too. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] virtio_scsi: remove smp_read_barrier_depends
Il 07/05/2014 16:13, Ming Lei ha scritto: order reading req_vq and decreasing tgt-reqs, but it isn't needed now because req_vq is read from scsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE] instead of tgt-req_vq, so remove the unnecessary barrier. Also remove related comment about the barrier. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei tom.leim...@gmail.com --- drivers/scsi/virtio_scsi.c | 52 +++- 1 file changed, 3 insertions(+), 49 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index aead20f..0c77ed6 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -73,19 +73,9 @@ struct virtio_scsi_vq { * queue, and also lets the driver optimize the IRQ affinity for the virtqueues * (each virtqueue's affinity is set to the CPU that owns the queue). * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: - * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. - * - * Similarly, decrements of reqs are never concurrent with writes of req_vq. - * Thus they can happen outside the tgt_lock, provided of course we make reqs - * an atomic_t. You are removing a bit too much; decrements of reqs are still done outside the tgt_lock, and you need to explain why. + * tgt_lock is held to serialize reading and writing req_vq, and reading + * req_vq should have been done concurrently, but need a bit complicated + * trick. tgt_lock is held to serialize reading and writing req_vq. Reading req_vq could be done locklessly, but we do not do it yet. You can also merge the two patches together. Paolo */ struct virtio_scsi_target_state { /* This spinlock never held at the same time as vq_lock. */ @@ -238,38 +228,6 @@ static void virtscsi_req_done(struct virtqueue *vq) int index = vq-index - VIRTIO_SCSI_VQ_BASE; struct virtio_scsi_vq *req_vq = vscsi-req_vqs[index]; - /* -* Read req_vq before decrementing the reqs field in -* virtscsi_complete_cmd. -* -* With barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read req_vq -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* -* Possible reordering without barriers: -* -* CPU #0 virtscsi_queuecommand_multi (CPU #1) -* -* lock vq_lock -* read reqs (reqs = 1) -* write reqs (reqs = 0) -* increment reqs (reqs = 1) -* write req_vq -* read (wrong) req_vq -* -* We do not need a full smp_rmb, because req_vq is required to get -* to tgt-reqs: tgt is vscsi-tgt[sc-device-id], where sc is stored -* in the virtqueue as the user token. -*/ - smp_read_barrier_depends(); - virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); }; @@ -560,10 +518,6 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, spin_lock_irqsave(tgt-tgt_lock, flags); - /* -* The memory barrier after atomic_inc_return matches -* the smp_read_barrier_depends() in virtscsi_req_done. -*/ if (atomic_inc_return(tgt-reqs) 1) vq = tgt-req_vq; else { -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
Il 07/05/2014 16:34, Ming Lei ha scritto: * - * An interesting effect of this policy is that only writes to req_vq need to - * take the tgt_lock. Read can be done outside the lock because: - * - * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. - * In that case, no other CPU is reading req_vq: even if they were in - * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. - * - * - reads of req_vq only occur when the target is not idle (reqs != 0). - * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. + * tgt_lock is held to serialize reading and writing req_vq. Reading req_vq + * could be done locklessly, but we do not do it yet. * * Similarly, decrements of reqs are never concurrent with writes of req_vq. * Thus they can happen outside the tgt_lock, provided of course we make reqs The part you deleted explains _why_ decrements of reqs are never concurrent with writes of req_vq. Perhaps: * An interesting effect of this policy is that only increments to reqs and writes * to req_vq need to take the tgt_lock. Reads of req_vq and decrements or req_vq * can be done outside the lock because: * * - writes of req_vq only occur when atomic_inc_return(tgt-reqs) returns 1. * In that case, no other CPU is reading req_vq: even if they were in * virtscsi_queuecommand_multi, they would be spinning on tgt_lock. * * - reads of req_vq only occur when the target is not idle (reqs != 0). * A CPU that enters virtscsi_queuecommand_multi will not modify req_vq. * * - likewise, decrements of reqs only occur when reqs != 0. If the decremented * value is zero, the first CPU that enters virtscsi_queuecommand_multi will * modify req_vq and the others will spin on tgt_lock. * * We do not try to read req_vq locklessly for simplicity, so tgt_lock is used * to serialize reads of req_vq too. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_scsi: remove ACCESS_ONCE() and smp_read_barrier_depends()
Il 07/05/2014 18:03, Ming Lei ha scritto: * * - likewise, decrements of reqs only occur when reqs != 0. If the decremented * value is zero, the first CPU that enters virtscsi_queuecommand_multi will * modify req_vq and the others will spin on tgt_lock. The fact should be obvious too, Perhaps, but in your patch you're leaving a Similarly that doesn't apply anymore. * We do not try to read req_vq locklessly for simplicity, so tgt_lock is used * to serialize reads of req_vq too. Maybe it is better to just mention it isn't done yet, and maybe someone will figure out simple way to support lockless reading req_vq. The one I sketched is not too hard. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-scsi: two questions related with picking up queue
Il 07/05/2014 18:24, Ming Lei ha scritto: On Tue, 06 May 2014 15:17:15 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 06/05/2014 11:26, Ming Lei ha scritto: Hi Paolo and All, One question is about ACCESS_ONCE() in virtscsi_pick_vq(), looks it needn't since both reading and writing tgt-req_vq holds tgt-tgt_lock. You're right. It should be possible to avoid the lock in virtscsi_pick_vq like this: value = atomic_read(tgt-reqs); retry: if (value != 0) { old_value = atomic_cmpxchg(tgt-regs, value, value + 1) if (old_value != value) { value = old_value; goto retry; } smp_mb__after_atomic_cmpxchg(); // you get the idea :) vq = ACCESS_ONCE(tgt-req_vq); } else { spin_lock_irqsave(tgt-tgt_lock, flags); // tgt-reqs may not be 0 anymore, need to recheck value = atomic_read(tgt-reqs); if (atomic_read(tgt-reqs) != 0) { spin_unlock_irqrestore(tgt-tgt_lock, flags); goto retry; } // tgt-reqs now will remain fixed to 0. ... tgt-req_vq = vq = ...; smp_wmb(); atomic_set(tgt-reqs, 1); spin_unlock_irqrestore(tgt-tgt_lock, flags); } return vq; Another approach I thought of is to use percpu spinlock, and the idea is simple: - all perpcu locks are held for writing req_vq, and - only percpu lock is needed for reading req_vq. What do you think about the below patch? Per-CPU spinlocks have bad scalability problems, especially if you're overcommitting. Writing req_vq is not at all rare. Paolo diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 697fa53..00deab4 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -82,7 +82,7 @@ struct virtio_scsi_vq { */ struct virtio_scsi_target_state { /* This spinlock never held at the same time as vq_lock. */ - spinlock_t tgt_lock; + spinlock_t __percpu *lock; /* Count of outstanding requests. */ atomic_t reqs; @@ -517,21 +517,46 @@ static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, { struct virtio_scsi_vq *vq; unsigned long flags; - u32 queue_num; + u32 cpu = get_cpu(); + spinlock_t *lock = per_cpu_ptr(tgt-lock, cpu); - spin_lock_irqsave(tgt-tgt_lock, flags); + spin_lock_irqsave(lock, flags); if (atomic_inc_return(tgt-reqs) 1) vq = tgt-req_vq; else { - queue_num = smp_processor_id(); + u32 queue_num = cpu; + int i; + while (unlikely(queue_num = vscsi-num_queues)) queue_num -= vscsi-num_queues; - tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + /* +* there should be only one writing because of atomic +* counter, so we don't worry about deadlock, but +* might need to teach lockdep to not complain it +*/ + for_each_possible_cpu(i) { + spinlock_t *other = per_cpu_ptr(tgt-lock, i); + if (i != cpu) + spin_lock(other); + } + + /* only update req_vq when reqs is one*/ + if (atomic_read(tgt-reqs) == 1) + tgt-req_vq = vq = vscsi-req_vqs[queue_num]; + else + vq = tgt-req_vq; + + for_each_possible_cpu(i) { + spinlock_t *other = per_cpu_ptr(tgt-lock, i); + if (i != cpu) + spin_unlock(other); + } } - spin_unlock_irqrestore(tgt-tgt_lock, flags); + spin_unlock_irqrestore(lock, flags); + put_cpu(); return vq; } @@ -618,10 +643,22 @@ static int virtscsi_target_alloc(struct scsi_target *starget) { struct virtio_scsi_target_state *tgt = kmalloc(sizeof(*tgt), GFP_KERNEL); + int i; + if (!tgt) return -ENOMEM; - spin_lock_init(tgt-tgt_lock); + tgt-lock = alloc_percpu(spinlock_t); + if (!tgt-lock) { + kfree(tgt); + return -ENOMEM; + } + + for_each_possible_cpu(i) { + spinlock_t *lock = per_cpu_ptr(tgt-lock, i); + spin_lock_init(lock); + } + atomic_set(tgt-reqs, 0); tgt-req_vq = NULL; @@ -632,6 +669,7 @@ static int virtscsi_target_alloc(struct scsi_target *starget) static void virtscsi_target_destroy(struct scsi_target *starget) { struct virtio_scsi_target_state *tgt = starget-hostdata; + free_percpu(tgt-lock
Re: virtio-scsi: two questions related with picking up queue
Il 06/05/2014 11:26, Ming Lei ha scritto: Hi Paolo and All, One question is about ACCESS_ONCE() in virtscsi_pick_vq(), looks it needn't since both reading and writing tgt-req_vq holds tgt-tgt_lock. You're right. It should be possible to avoid the lock in virtscsi_pick_vq like this: value = atomic_read(tgt-reqs); retry: if (value != 0) { old_value = atomic_cmpxchg(tgt-regs, value, value + 1) if (old_value != value) { value = old_value; goto retry; } smp_mb__after_atomic_cmpxchg(); // you get the idea :) vq = ACCESS_ONCE(tgt-req_vq); } else { spin_lock_irqsave(tgt-tgt_lock, flags); // tgt-reqs may not be 0 anymore, need to recheck value = atomic_read(tgt-reqs); if (atomic_read(tgt-reqs) != 0) { spin_unlock_irqrestore(tgt-tgt_lock, flags); goto retry; } // tgt-reqs now will remain fixed to 0. ... tgt-req_vq = vq = ...; smp_wmb(); atomic_set(tgt-reqs, 1); spin_unlock_irqrestore(tgt-tgt_lock, flags); } return vq; and then you would need the ACCESS_ONCE but I'm not sure it's worthwhile. Another one is about the comment in virtscsi_req_done(), which said smp_read_barrier_depends() is needed for avoiding out of order between reading req_vq and decreasing tgt-reqs. But if I understand correctly, in virtscsi_req_done(), req_vq is read from vscsi-req_vqs[vq-index - VIRTIO_SCSI_VQ_BASE], instead of tgt-req_vq, and the former won't change wrt. inc/dec tgt-reqs, so can the barrier be removed? Right. The comment is obsolete and dates to before vq-index existed. Paolo Any comments about the above? Thanks, -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] SCSI fixes for 3.15-rc3
Il 25/04/2014 16:59, James Bottomley ha scritto: This is a set of seven fixes, three (hpsa) and free'd command references correcting bugs in the last round of updates and the remaining four correcting problems within the SCSI error handler that was causing a deadlock within USB. The patch is available here: git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes The short changelog is: Alan Stern (1): Fix command result state propagation Christoph Hellwig (2): don't reference freed command in scsi_prep_return don't reference freed command in scsi_init_sgtable Hannes Reinecke (1): Fix USB deadlock caused by SCSI error handling James Bottomley (2): More USB deadlock fixes Fix spurious request sense in error handling scame...@beardog.cce.hp.com (1): hpsa: fix NULL dereference in hpsa_put_ctlr_into_performant_mode() And the diffstat: drivers/scsi/hpsa.c | 8 drivers/scsi/scsi_error.c | 12 drivers/scsi/scsi_lib.c | 6 -- 3 files changed, 20 insertions(+), 6 deletions(-) With full diffs below. James, can you queue http://article.gmane.org/gmane.linux.kernel/1682301/raw please? Thanks, Paolo James --- diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 8cf4a0c..9a6e4a2 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -7463,6 +7463,10 @@ static void hpsa_put_ctlr_into_performant_mode(struct ctlr_info *h) if (hpsa_simple_mode) return; + trans_support = readl((h-cfgtable-TransportSupport)); + if (!(trans_support PERFORMANT_MODE)) + return; + /* Check for I/O accelerator mode support */ if (trans_support CFGTBL_Trans_io_accel1) { transMethod |= CFGTBL_Trans_io_accel1 | @@ -7479,10 +7483,6 @@ static void hpsa_put_ctlr_into_performant_mode(struct ctlr_info *h) } /* TODO, check that this next line h-nreply_queues is correct */ - trans_support = readl((h-cfgtable-TransportSupport)); - if (!(trans_support PERFORMANT_MODE)) - return; - h-nreply_queues = h-msix_vector 0 ? h-msix_vector : 1; hpsa_get_max_perf_mode_cmds(h); /* Performant mode ring buffer and supporting data structures */ diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..f17aa7a 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -189,6 +189,7 @@ scsi_abort_command(struct scsi_cmnd *scmd) /* * Retry after abort failed, escalate to next level. */ + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, scmd %p previous abort failed\n, scmd)); @@ -920,10 +921,12 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, ses-prot_op = scmd-prot_op; scmd-prot_op = SCSI_PROT_NORMAL; + scmd-eh_eflags = 0; scmd-cmnd = ses-eh_cmnd; memset(scmd-cmnd, 0, BLK_MAX_CDB); memset(scmd-sdb, 0, sizeof(scmd-sdb)); scmd-request-next_rq = NULL; + scmd-result = 0; if (sense_bytes) { scmd-sdb.length = min_t(unsigned, SCSI_SENSE_BUFFERSIZE, @@ -1157,6 +1160,15 @@ int scsi_eh_get_sense(struct list_head *work_q, __func__)); break; } + if (status_byte(scmd-result) != CHECK_CONDITION) + /* +* don't request sense if there's no check condition +* status because the error we're processing isn't one +* that has a sense code (and some devices get +* confused by sense requests out of the blue) +*/ + continue; + SCSI_LOG_ERROR_RECOVERY(2, scmd_printk(KERN_INFO, scmd, %s: requesting sense\n, current-comm)); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 65a123d..9db097a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -137,6 +137,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy) * lock such that the kblockd_schedule_work() call happens * before blk_cleanup_queue() finishes. */ + cmd-result = 0; spin_lock_irqsave(q-queue_lock, flags); blk_requeue_request(q, cmd-request); kblockd_schedule_work(q, device-requeue_work); @@ -1044,6 +1045,7 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb, */ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask) { + struct scsi_device *sdev = cmd-device; struct request *rq =
Re: [PATCH 1/6] virtio-scsi.h: Add virtio_scsi_cmd_req_pi + VIRTIO_SCSI_F_T10_PI bits
Il 09/04/2014 09:22, Michael S. Tsirkin ha scritto: The interface uses bytes instead of iovecs as the unit, and vhost-scsi can add the (temporary...) requirement that do_pi_nbytes and di_pi_nbytes comprise an integer number of iovecs. Paolo That would be a reasonable intermediate step, but I'm worried there are more assumptions lurking in tere. In particular is it legal for PI to be in the same iov entry as data? If not they really need to use a separate buf entry. No, in fact that would mean that do/di_pi_nbytes are not an integer number of iovecs. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq
put the if inside the loop, but it's really all or nothing since the failure point is find_vqs. Not a problem though; the queues are few and this is not a hot path anyway. Acked-by: Paolo Bonzini pbonz...@redhat.com Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq
Il 11/04/2014 03:23, Fam Zheng ha scritto: virtscsi_init calls virtscsi_remove_vqs on err, even before initializing the vqs. The latter calls virtscsi_set_affinity, so let's check the pointer there before setting affinity on it. This fixes a panic when setting device's num_queues=2 on RHEL 6.5: qemu-system-x86_64 ... \ -device virtio-scsi-pci,id=scsi0,addr=0x13,...,num_queues=2 \ -drive file=/stor/vm/dummy.raw,id=drive-scsi-disk,... \ -device scsi-hd,drive=drive-scsi-disk,... [0.354734] scsi0 : Virtio SCSI HBA [0.379504] BUG: unable to handle kernel NULL pointer dereference at 0020 [0.380141] IP: [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] PGD 0 [0.380141] Oops: [#1] SMP [0.380141] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #5 [0.380141] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [0.380141] task: 88003c9f ti: 88003c9f8000 task.ti: 88003c9f8000 [0.380141] RIP: 0010:[814741ef] [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP: :88003c9f9c08 EFLAGS: 00010256 [0.380141] RAX: RBX: 88003c3a9d40 RCX: 1070 [0.380141] RDX: 0002 RSI: RDI: [0.380141] RBP: 88003c9f9c28 R08: 000136c0 R09: 88003c801c00 [0.380141] R10: 81475229 R11: 0008 R12: [0.380141] R13: 81cc7ca8 R14: 88003cac3d40 R15: 88003cac37a0 [0.380141] FS: () GS:88003e40() knlGS: [0.380141] CS: 0010 DS: ES: CR0: 8005003b [0.380141] CR2: 0020 CR3: 01c0e000 CR4: 06f0 [0.380141] Stack: [0.380141] 88003c3a9d40 88003cac3d80 88003cac3d40 [0.380141] 88003c9f9c48 814742e8 88003c26d000 88003c26d000 [0.380141] 88003c9f9c68 81474321 88003c26d000 88003c3a9d40 [0.380141] Call Trace: [0.380141] [814742e8] virtscsi_set_affinity+0x28/0x40 [0.380141] [81474321] virtscsi_remove_vqs+0x21/0x50 [0.380141] [81475231] virtscsi_init+0x91/0x240 [0.380141] [81365290] ? vp_get+0x50/0x70 [0.380141] [81475544] virtscsi_probe+0xf4/0x280 [0.380141] [81363ea5] virtio_dev_probe+0xe5/0x140 [0.380141] [8144c669] driver_probe_device+0x89/0x230 [0.380141] [8144c8ab] __driver_attach+0x9b/0xa0 [0.380141] [8144c810] ? driver_probe_device+0x230/0x230 [0.380141] [8144c810] ? driver_probe_device+0x230/0x230 [0.380141] [8144ac1c] bus_for_each_dev+0x8c/0xb0 [0.380141] [8144c499] driver_attach+0x19/0x20 [0.380141] [8144bf28] bus_add_driver+0x198/0x220 [0.380141] [8144ce9f] driver_register+0x5f/0xf0 [0.380141] [81d27c91] ? spi_transport_init+0x79/0x79 [0.380141] [8136403b] register_virtio_driver+0x1b/0x30 [0.380141] [81d27d19] init+0x88/0xd6 [0.380141] [81d27c18] ? scsi_init_procfs+0x5b/0x5b [0.380141] [81ce88a7] do_one_initcall+0x7f/0x10a [0.380141] [81ce8aa7] kernel_init_freeable+0x14a/0x1de [0.380141] [81ce8b3b] ? kernel_init_freeable+0x1de/0x1de [0.380141] [817dec20] ? rest_init+0x80/0x80 [0.380141] [817dec29] kernel_init+0x9/0xf0 [0.380141] [817e68fc] ret_from_fork+0x7c/0xb0 [0.380141] [817dec20] ? rest_init+0x80/0x80 [0.380141] RIP [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP 88003c9f9c08 [0.380141] CR2: 0020 [0.380141] ---[ end trace 8074b70c3d5e1d73 ]--- [0.475018] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [0.475018] [0.475068] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) [0.475068] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 Signed-off-by: Fam Zheng f...@redhat.com --- drivers/scsi/virtio_scsi.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 16bfd50..3019267 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -750,8 +750,12 @@ static void __virtscsi_set_affinity(struct virtio_scsi *vscsi, bool affinity) vscsi-affinity_hint_set = true; } else { - for (i = 0; i vscsi-num_queues; i++) + for (i = 0; i vscsi-num_queues; i++) { + if (!vscsi-req_vqs[i].vq) { + continue; + } virtqueue_set_affinity(vscsi-req_vqs[i].vq, -1); + } vscsi-affinity_hint_set = false; } Fam,
Re: [PATCH 16/39] virtio_scsi: use cmd_size
Il 25/03/2014 16:31, Christoph Hellwig ha scritto: Paolo, do you have a virtio_scsi tree to picks this up in? We now have all the infrastructure for it in James' tree. I don't, I usually just give my Acked-by and James picks it up. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFCv2 5/7] vhost/scsi: Enable T10 PI IOV - SGL memory mapping
Il 17/03/2014 06:32, Nicholas A. Bellinger ha scritto: + if (vq-iov[0].iov_len == sizeof(v_req_pi)) { + req = (unsigned char *)v_req_pi; + target = v_req_pi.lun[1]; + req_size = sizeof(v_req_pi); + hdr_pi = true; + } else if (vq-iov[0].iov_len == sizeof(v_req)) { + req = (unsigned char *)v_req; + target = v_req.lun[1]; + req_size = sizeof(v_req); + hdr_pi = false; The right check here is on the negotiated features. You need a matching QEMU patch to enable the protection information feature, like this (untested): diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 3983a5b..4c8d5cd 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -154,6 +154,9 @@ static uint32_t vhost_scsi_get_features(VirtIODevice *vdev, if (!(s-dev.features (1 VIRTIO_SCSI_F_HOTPLUG))) { features = ~(1 VIRTIO_SCSI_F_HOTPLUG); } +if (!(s-dev.features (1 VIRTIO_SCSI_F_T10_PI))) { +features = ~(1 VIRTIO_SCSI_F_T10_PI); +} return features; } diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 6610b3a..4a551e9 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -629,6 +629,9 @@ static void virtio_scsi_device_realize(DeviceState *dev, Error **errp) return; } +/* Protection information is not supported yet. */ +dev-guest_features = ~VIRTIO_SCSI_F_T10_PI; + scsi_bus_new(s-bus, sizeof(s-bus), dev, virtio_scsi_scsi_info, vdev-bus_name); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 9010246..8621fbf 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -267,6 +267,16 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t); .value= no,\ },\ {\ +.driver = virtio-scsi-pci,\ +.property = prot_info,\ +.value= off,\ +},\ +{\ +.driver = vhost-scsi-pci,\ +.property = prot_info,\ +.value= off,\ +},\ +{\ .driver = PIIX4_PM,\ .property = acpi-pci-hotplug-with-bridge-support,\ .value= off,\ diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h index 42b1024..a555f49 100644 --- a/include/hw/virtio/virtio-scsi.h +++ b/include/hw/virtio/virtio-scsi.h @@ -34,6 +34,7 @@ #define VIRTIO_SCSI_F_INOUT0 #define VIRTIO_SCSI_F_HOTPLUG 1 #define VIRTIO_SCSI_F_CHANGE 2 +#define VIRTIO_SCSI_F_T10_PI 3 #define VIRTIO_SCSI_VQ_SIZE 128 #define VIRTIO_SCSI_CDB_SIZE32 @@ -184,7 +185,9 @@ typedef struct { DEFINE_PROP_BIT(hotplug, _state, _feature_field, VIRTIO_SCSI_F_HOTPLUG, \ true), \ DEFINE_PROP_BIT(param_change, _state, _feature_field, \ -VIRTIO_SCSI_F_CHANGE, true) +VIRTIO_SCSI_F_CHANGE, true) \ +DEFINE_PROP_BIT(prot_info, _state, _feature_field, \ +VIRTIO_SCSI_F_T10_PI, true) void virtio_scsi_common_realize(DeviceState *dev, Error **errp); void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Change sense buffer size to 252
Il 06/03/2014 09:47, Fam Zheng ha scritto: According to SPC-4, section 4.5.2.1, 252 is the limit of sense data. So increase the value. Signed-off-by: Fam Zheng f...@redhat.com --- include/linux/virtio_scsi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h index 4195b97..a437f7f 100644 --- a/include/linux/virtio_scsi.h +++ b/include/linux/virtio_scsi.h @@ -28,7 +28,7 @@ #define _LINUX_VIRTIO_SCSI_H #define VIRTIO_SCSI_CDB_SIZE 32 -#define VIRTIO_SCSI_SENSE_SIZE 96 +#define VIRTIO_SCSI_SENSE_SIZE 252 /* SCSI command request, followed by data-out */ struct virtio_scsi_cmd_req { Hi Fam, how did you test this? Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Change sense buffer size to 252
Il 06/03/2014 12:22, Hannes Reinecke ha scritto: On 03/06/2014 11:09 AM, Paolo Bonzini wrote: Il 06/03/2014 09:47, Fam Zheng ha scritto: According to SPC-4, section 4.5.2.1, 252 is the limit of sense data. So increase the value. Signed-off-by: Fam Zheng f...@redhat.com --- include/linux/virtio_scsi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h index 4195b97..a437f7f 100644 --- a/include/linux/virtio_scsi.h +++ b/include/linux/virtio_scsi.h @@ -28,7 +28,7 @@ #define _LINUX_VIRTIO_SCSI_H #define VIRTIO_SCSI_CDB_SIZE 32 -#define VIRTIO_SCSI_SENSE_SIZE 96 +#define VIRTIO_SCSI_SENSE_SIZE 252 /* SCSI command request, followed by data-out */ struct virtio_scsi_cmd_req { Hi Fam, how did you test this? Is there a specific reason _not_ to use the linux default? The SCSI stack typically limits the sense code to SCSI_SENSE_BUFFERSIZE, so using other values have a limited sense. Literally :-) Indeed I don't think this patch makes a difference. Though I asked not from the SCSI stack perspective, but because right now both virtio-scsi targets (QEMU and vhost-scsi) are also truncating at 96. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost/scsi: Check LUN structure byte 0 is set to 1, per spec
The virtio spec requires byte 0 of the virtio-scsi LUN structure to be '1'. Signed-off-by: Venkatesh Srinivas venkate...@google.com --- drivers/vhost/scsi.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 0a025b8..e48d4a6 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1001,6 +1001,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) break; } + /* virtio-scsi spec requires byte 0 of the lun to be 1 */ + if (unlikely(v_req.lun[0] != 1)) { + vhost_scsi_send_bad_target(vs, vq, head, out); + continue; + } + /* Extract the tpgt */ target = v_req.lun[1]; tpg = ACCESS_ONCE(vs_tpg[target]); -- 1.8.5.3 Reviewed-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/17] scsi: avoid useless free_list lock roundtrips
Il 05/02/2014 13:39, Christoph Hellwig ha scritto: Avoid hitting the host-wide free_list lock unless we need to put a command back onto the freelist. Signed-off-by: Christoph Hellwig h...@lst.de --- drivers/scsi/scsi.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index ebcea6c..4139486 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -320,15 +320,16 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd, { unsigned long flags; - /* changing locks here, don't need to restore the irq state */ - spin_lock_irqsave(shost-free_list_lock, flags); if (unlikely(list_empty(shost-free_list))) { - list_add(cmd-list, shost-free_list); - cmd = NULL; + spin_lock_irqsave(shost-free_list_lock, flags); + if (list_empty(shost-free_list)) { + list_add(cmd-list, shost-free_list); + cmd = NULL; + } + spin_unlock_irqrestore(shost-free_list_lock, flags); } - spin_unlock_irqrestore(shost-free_list_lock, flags); - if (likely(cmd != NULL)) + if (cmd) scsi_pool_free_command(shost-cmd_pool, cmd); put_device(dev); Reviewed-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/17] scsi: add support for per-host cmd pools
Il 05/02/2014 13:39, Christoph Hellwig ha scritto: + pool = scsi_find_host_cmd_pool(shost); Should you have a WARN_ON somewhere if shost-hostt-cmd_size shost-unchecked_isa_dma? Apart from this, Reviewed-by: Paolo Bonzini pbonz...@redhat.com Paolo + if (!pool) { + pool = scsi_alloc_host_cmd_pool(shost); + if (!pool) + goto out; + } + -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/17] virtio_scsi: use cmd_size
Il 05/02/2014 13:39, Christoph Hellwig ha scritto: Taken almost entirely from Nicholas Bellinger's scsi-mq conversion. Signed-off-by: Christoph Hellwig h...@lst.de --- drivers/scsi/virtio_scsi.c | 25 +++-- 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 16bfd50..d9a6074 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -204,7 +204,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf) set_driver_byte(sc, DRIVER_SENSE); } - mempool_free(cmd, virtscsi_cmd_pool); sc-scsi_done(sc); atomic_dec(tgt-reqs); @@ -279,8 +278,6 @@ static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf) if (cmd-comp) complete_all(cmd-comp); - else - mempool_free(cmd, virtscsi_cmd_pool); } static void virtscsi_ctrl_done(struct virtqueue *vq) @@ -496,10 +493,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi, struct virtio_scsi_vq *req_vq, struct scsi_cmnd *sc) { - struct virtio_scsi_cmd *cmd; - int ret; - struct Scsi_Host *shost = virtio_scsi_host(vscsi-vdev); + struct virtio_scsi_cmd *cmd = (struct virtio_scsi_cmd *)(sc + 1); + BUG_ON(scsi_sg_count(sc) shost-sg_tablesize); /* TODO: check feature bit and fail if unsupported? */ @@ -508,11 +504,6 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi, dev_dbg(sc-device-sdev_gendev, cmd %p CDB: %#02x\n, sc, sc-cmnd[0]); - ret = SCSI_MLQUEUE_HOST_BUSY; - cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC); - if (!cmd) - goto out; - memset(cmd, 0, sizeof(*cmd)); cmd-sc = sc; cmd-req.cmd = (struct virtio_scsi_cmd_req){ @@ -531,13 +522,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi, if (virtscsi_kick_cmd(req_vq, cmd, sizeof cmd-req.cmd, sizeof cmd-resp.cmd, - GFP_ATOMIC) == 0) - ret = 0; - else - mempool_free(cmd, virtscsi_cmd_pool); - -out: - return ret; + GFP_ATOMIC) != 0) + return SCSI_MLQUEUE_HOST_BUSY; + return 0; } static int virtscsi_queuecommand_single(struct Scsi_Host *sh, @@ -683,6 +670,7 @@ static struct scsi_host_template virtscsi_host_template_single = { .name = Virtio SCSI HBA, .proc_name = virtio_scsi, .this_id = -1, + .cmd_size = sizeof(struct virtio_scsi_cmd), .queuecommand = virtscsi_queuecommand_single, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, @@ -699,6 +687,7 @@ static struct scsi_host_template virtscsi_host_template_multi = { .name = Virtio SCSI HBA, .proc_name = virtio_scsi, .this_id = -1, + .cmd_size = sizeof(struct virtio_scsi_cmd), .queuecommand = virtscsi_queuecommand_multi, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Acked-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] target_core_spc: bounds check for spc_emulate_inquiry()
Il 20/12/2013 07:47, Hannes Reinecke ha scritto: How about changing the local buffer to heap memory instead, and bumping SE_INQUIRY_BUF to 1024 bytes..? Ok. But then we should have a quick check at the start if (cmd-data_length SE_INQUIRY_BUF) len = cmd-data_length else len = SE_INQUIRY_BUF to catch oversized requests. Why do you need it? If inquiry data is always 1K, when cmd-data_length is large you can just need to write zeroes to the buffer after the first SE_INQUIRY_BUF bytes. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
Il 28/10/2013 09:01, Asias He ha scritto: vqs are freed in virtscsi_freeze but the hotcpu_notifier is not unregistered. We will have a use-after-free usage when the notifier callback is called after virtscsi_freeze. Signed-off-by: Asias He as...@redhat.com --- drivers/scsi/virtio_scsi.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 74b88ef..b26f1a5 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev) #ifdef CONFIG_PM static int virtscsi_freeze(struct virtio_device *vdev) { + struct Scsi_Host *sh = virtio_scsi_host(vdev); + struct virtio_scsi *vscsi = shost_priv(sh); + + unregister_hotcpu_notifier(vscsi-nb); virtscsi_remove_vqs(vdev); return 0; } @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev) { struct Scsi_Host *sh = virtio_scsi_host(vdev); struct virtio_scsi *vscsi = shost_priv(sh); + int err; + + err = virtscsi_init(vdev, vscsi); + if (err) + return err; + + err = register_hotcpu_notifier(vscsi-nb); + if (err) + vdev-config-del_vqs(vdev); - return virtscsi_init(vdev, vscsi); + return err; } #endif Reviewed-by: Paolo Bonzini pbonz...@redhat.com Cc: sta...@vger.kernel.org -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI's heuristics for enabling WRITE SAME still need work [was: dm mpath: disable WRITE SAME if it fails]
Il 21/09/2013 00:03, Martin K. Petersen ha scritto: The major headache here of course is that WRITE SAME is inherently destructive. We can't just fire off one during discovery and see if it works. For WRITE you can issue a command with a transfer length of 0 to see if things work. But unfortunately for WRITE SAME a transfer length of zero means wipe the entire device. Yikes! What about WRITE SAME with an out-of-range lba? Or lba equal to the size of the disk and #blocks equal to zero. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SCSI: Fix potential out-of-bounds access in drivers/scsi/sd.c
Il 06/09/2013 17:49, Alan Stern ha scritto: This patch fixes an out-of-bounds error in sd_read_cache_type(), found by Google's AddressSanitizer tool. When the loop ends, we know that offset lies beyond the end of the data in the buffer, so no Caching mode page was found. In theory it may be present, but the buffer size is limited to 512 bytes. Signed-off-by: Alan Stern st...@rowland.harvard.edu Reported-by: Dmitry Vyukov dvyu...@google.com CC: sta...@vger.kernel.org Reviewed-by: Paolo Bonzini pbonz...@redhat.com --- [as1709] drivers/scsi/sd.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) Index: usb-3.11/drivers/scsi/sd.c === --- usb-3.11.orig/drivers/scsi/sd.c +++ usb-3.11/drivers/scsi/sd.c @@ -2419,14 +2419,9 @@ sd_read_cache_type(struct scsi_disk *sdk } } - if (modepage == 0x3F) { - sd_printk(KERN_ERR, sdkp, No Caching mode page - present\n); - goto defaults; - } else if ((buffer[offset] 0x3f) != modepage) { - sd_printk(KERN_ERR, sdkp, Got wrong page\n); - goto defaults; - } + sd_printk(KERN_ERR, sdkp, No Caching mode page found\n); + goto defaults; + Page_found: if (modepage == 8) { sdkp-WCE = ((buffer[offset + 2] 0x04) != 0); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Potential out-of-bounds access in drivers/scsi/sd.c
Il 04/09/2013 16:32, Alan Stern ha scritto: On Wed, 4 Sep 2013, Dmitry Vyukov wrote: Hi, We are working on a memory error detector AddressSanitizer for Linux kernel (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel), it can detect use-after-free and buffer-overflow errors. ... The code in sd_read_cache_type does the following: while (offset len) { ... } ... if ((buffer[offset] 0x3f) != modepage) { sd_printk(KERN_ERR, sdkp, Got wrong page\n); goto defaults; } When control leaves the while loop, offset = len, so buffer[offset] reads random garbage out-of-bounds. It the worst case it can lead to crash, or if (buffer[offset] 0x3f) happen to be == modepage, then it will read more garbage. Please help validate and triage this. The tool's output is correct. The patch below should fix it. Alan Stern Index: usb-3.11/drivers/scsi/sd.c === --- usb-3.11.orig/drivers/scsi/sd.c +++ usb-3.11/drivers/scsi/sd.c @@ -2419,7 +2419,7 @@ sd_read_cache_type(struct scsi_disk *sdk } } - if (modepage == 0x3F) { + if (modepage == 0x3F || offset + 2 = len) { sd_printk(KERN_ERR, sdkp, No Caching mode page present\n); goto defaults; If you do this, the buggy if becomes dead code (the loop above doesn't have any break, so you know that offset = len and the new condition is always true). So the patch does indeed prevent the bug, but the code can be simplified. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Fix virtqueue affinity setup
vscsi-num_queues counts the number of request virtqueue which does not include the control and event virtqueue. It is wrong to subtract VIRTIO_SCSI_VQ_BASE from vscsi-num_queues. Reviewed-by: Paolo Bonzini pbonz...@redhat.com This patch fixes the following panic. (qemu) device_del scsi0 BUG: unable to handle kernel NULL pointer dereference at 0020 IP: [8179b29f] __virtscsi_set_affinity+0x6f/0x120 PGD 0 Oops: [#1] SMP Modules linked in: CPU: 0 PID: 659 Comm: kworker/0:1 Not tainted 3.11.0-rc2+ #1172 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: kacpi_hotplug _handle_hotplug_event_func task: 88007bee1cc0 ti: 88007bfe4000 task.ti: 88007bfe4000 RIP: 0010:[8179b29f] [8179b29f] __virtscsi_set_affinity+0x6f/0x120 RSP: 0018:88007bfe5a38 EFLAGS: 00010202 RAX: 0010 RBX: 880077fd0d28 RCX: 0050 RDX: RSI: 0246 RDI: RBP: 88007bfe5a58 R08: 880077f6ff00 R09: 0001 R10: 8143e673 R11: 0001 R12: 0001 R13: 880077fd0800 R14: R15: 88007bf489b0 FS: () GS:88007ea0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0020 CR3: 79f8b000 CR4: 06f0 Stack: 880077fd0d28 880077fd0800 0008 88007bfe5a78 8179b37d 88007bccc800 88007bccc800 88007bfe5a98 8179b3b6 88007bccc800 880077fd0d28 Call Trace: [8179b37d] virtscsi_set_affinity+0x2d/0x40 [8179b3b6] virtscsi_remove_vqs+0x26/0x50 [8179c7d2] virtscsi_remove+0x82/0xa0 [814cb6b2] virtio_dev_remove+0x22/0x70 [8167ca49] __device_release_driver+0x69/0xd0 [8167cb9d] device_release_driver+0x2d/0x40 [8167bb96] bus_remove_device+0x116/0x150 [81679936] device_del+0x126/0x1e0 [81679a06] device_unregister+0x16/0x30 [814cb889] unregister_virtio_device+0x19/0x30 [814cdad6] virtio_pci_remove+0x36/0x80 [81464ae7] pci_device_remove+0x37/0x70 [8167ca49] __device_release_driver+0x69/0xd0 [8167cb9d] device_release_driver+0x2d/0x40 [8167bb96] bus_remove_device+0x116/0x150 [81679936] device_del+0x126/0x1e0 [8145edfc] pci_stop_bus_device+0x9c/0xb0 [8145f036] pci_stop_and_remove_bus_device+0x16/0x30 [81474a9e] acpiphp_disable_slot+0x8e/0x150 [81474f6a] hotplug_event_func+0xba/0x1a0 [814906c8] ? acpi_os_release_object+0xe/0x12 [81475911] _handle_hotplug_event_func+0x31/0x70 [810b5333] process_one_work+0x183/0x500 [810b66e2] worker_thread+0x122/0x400 [810b65c0] ? manage_workers+0x2d0/0x2d0 [810bc5de] kthread+0xce/0xe0 [810bc510] ? kthread_freezable_should_stop+0x70/0x70 [81ca045c] ret_from_fork+0x7c/0xb0 [810bc510] ? kthread_freezable_should_stop+0x70/0x70 Code: 01 00 00 00 74 59 45 31 e4 83 bb c8 01 00 00 02 74 46 66 2e 0f 1f 84 00 00 00 00 00 49 63 c4 48 c1 e0 04 48 8b bc 0 3 10 02 00 00 48 8b 47 20 48 8b 80 d0 01 00 00 48 8b 40 50 48 85 c0 74 07 be RIP [8179b29f] __virtscsi_set_affinity+0x6f/0x120 RSP 88007bfe5a38 CR2: 0020 ---[ end trace 99679331a3775f48 ]--- CC: sta...@vger.kernel.org Signed-off-by: Asias He as...@redhat.com --- drivers/scsi/virtio_scsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 2168258..74b88ef 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -751,7 +751,7 @@ static void __virtscsi_set_affinity(struct virtio_scsi *vscsi, bool affinity) vscsi-affinity_hint_set = true; } else { - for (i = 0; i vscsi-num_queues - VIRTIO_SCSI_VQ_BASE; i++) + for (i = 0; i vscsi-num_queues; i++) virtqueue_set_affinity(vscsi-req_vqs[i].vq, -1); vscsi-affinity_hint_set = false; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist
Il 25/06/2013 23:19, Paolo Bonzini ha scritto: Il 27/05/2013 15:50, Paolo Bonzini ha scritto: We've been running in circles for nine months now. Let's restart from the maintainer's suggestion, which was probably dismissed too quickly. This is still not a complete solution, because /dev/sgN does not have access to its queue object. Still, it can be a base for discussion. If accepted (in a complete form with access to the queue object for non-block devices), this series removes the need to fix the opcode conflicts as far as I'm concerned. We could just consider that a feature of CONFIG_BLK_DEV_SG_FILTER_MMC. Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread). Rebased just fine, this one is compile-tested only. Paolo RFC didn't get many Cs. Ping^2. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist
Il 27/05/2013 15:50, Paolo Bonzini ha scritto: We've been running in circles for nine months now. Let's restart from the maintainer's suggestion, which was probably dismissed too quickly. This is still not a complete solution, because /dev/sgN does not have access to its queue object. Still, it can be a base for discussion. If accepted (in a complete form with access to the queue object for non-block devices), this series removes the need to fix the opcode conflicts as far as I'm concerned. We could just consider that a feature of CONFIG_BLK_DEV_SG_FILTER_MMC. Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread). Rebased just fine, this one is compile-tested only. Paolo RFC didn't get many Cs. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] virtio_scsi: Enable new EH timeout handler
Il 10/06/2013 03:40, Hannes Reinecke ha scritto: Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/virtio_scsi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 2168258..1efd219 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -654,6 +654,11 @@ static int virtscsi_abort(struct scsi_cmnd *sc) return virtscsi_tmf(vscsi, cmd); } +static enum blk_eh_timer_return virtscsi_timedout(struct scsi_cmnd *scmd) +{ + return scsi_abort_command(scmd); +} + static int virtscsi_target_alloc(struct scsi_target *starget) { struct virtio_scsi_target_state *tgt = @@ -683,6 +688,7 @@ static struct scsi_host_template virtscsi_host_template_single = { .queuecommand = virtscsi_queuecommand_single, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, + .eh_timed_out = virtscsi_timedout, .can_queue = 1024, .dma_boundary = UINT_MAX, @@ -699,6 +705,7 @@ static struct scsi_host_template virtscsi_host_template_multi = { .queuecommand = virtscsi_queuecommand_multi, .eh_abort_handler = virtscsi_abort, .eh_device_reset_handler = virtscsi_device_reset, + .eh_timed_out = virtscsi_timedout, .can_queue = 1024, .dma_boundary = UINT_MAX, Acked-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] virtio-scsi: Implement TMF timeout
Il 10/06/2013 03:40, Hannes Reinecke ha scritto: Any TMF might be take longer as expected, or not return at all. So we need to use 'wait_for_completion_timeout' when sending a TMF to protect against these cases. Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/virtio_scsi.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1efd219..abfc684 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -30,6 +30,7 @@ #define VIRTIO_SCSI_MEMPOOL_SZ 64 #define VIRTIO_SCSI_EVENT_LEN 8 #define VIRTIO_SCSI_VQ_BASE 2 +#define VIRTIO_SCSI_TMF_TIMEOUT 30 /* Command queue element */ struct virtio_scsi_cmd { @@ -597,8 +598,10 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) GFP_NOIO) 0) goto out; - wait_for_completion(comp); - if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK || + if (wait_for_completion_timeout(comp, + VIRTIO_SCSI_TMF_TIMEOUT * HZ) == 0) + ret = FAILED; + else if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK || cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) ret = SUCCESS; Acked-by: Paolo Bonzini pbonz...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 3/4] block: add back command filter modification via sysfs
This adds two new sysfs attributes to the queue kobject. The attributes allow reading and writing the whitelist of unprivileged commands. This is again a bit different from what was removed in commit 018e044 (block: get rid of queue-private command filter, 2009-06-26), but the idea is the same. One difference is that it does not use a separate kobject. Also, the supported sysfs syntax is a bit more expressive: it includes ranges, the ability to replace all of the filter with a single command, and does not force usage of hexadecimal. Since the names are different, and the old ones were anyway never really enabled, the different API is not a problem. Cc: linux-scsi@vger.kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Documentation/block/queue-sysfs.txt | 16 ++ block/Kconfig | 10 block/blk-sysfs.c | 41 ++ block/scsi_ioctl.c | 106 include/linux/blkdev.h | 21 +++ 5 files changed, 194 insertions(+) diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index e54ac1d..1152b38 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -133,6 +133,22 @@ control of this block device to that new IO scheduler. Note that writing an IO scheduler name to this file will attempt to load that IO scheduler module, if it isn't already present in the system. +sgio_read_filter (RW) +- +When read, this file will display a list of SCSI commands (i.e. values of +the first byte of a CDB) that are always available for unprivileged users +(via /dev/bsg, /dev/sgNN, or ioctls such as SG_IO and CDROM_SEND_PACKET). +When written, the list of commands will be modified. By default it +will be completely replaced; writing a string that begins with '+' will +add new commands, and writing a string that begins with '-' will remove +some commands. Ranges of commands are supported, for example '0x00-0xff'. + +sgio_write_filter (RW) +-- +When read, this file will display a list of SCSI commands (i.e. values of +the first byte of a CDB) that are available for unprivileged users +when the block device is open for writing. Writing to this file behaves +as for sgio_read_filter. Jens Axboe jens.ax...@oracle.com, February 2009 diff --git a/block/Kconfig b/block/Kconfig index a7e40a7..e89d6a2 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -65,6 +65,16 @@ config BLK_DEV_BSG If unsure, say Y. +config BLK_DEV_SG_FILTER_SYSFS + bool Customizable SG_IO filters in sysfs + default y + help + Saying Y here will let you use sysfs to customize the list + of SCSI commands that are available (via /dev/sg, /dev/bsg or + ioctls such as SG_IO) to unprivileged users. + + If unsure, say Y. + config BLK_DEV_BSGLIB bool Block layer SG support v4 helper lib default n diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 2539f8d..3e4ffeb 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -429,6 +429,43 @@ static struct queue_sysfs_entry queue_random_entry = { .store = queue_store_random, }; +#ifdef CONFIG_BLK_DEV_SG_FILTER_SYSFS +static ssize_t queue_sgio_filter_read_show(struct request_queue *q, char *page) +{ + return blk_filter_show(q, page, READ); +} + +static ssize_t queue_sgio_filter_write_show(struct request_queue *q, +char *page) +{ + return blk_filter_show(q, page, WRITE); +} + +static ssize_t queue_sgio_filter_read_store(struct request_queue *q, + const char *page, size_t count) +{ + return blk_filter_store(q, page, count, READ); +} + +static ssize_t queue_sgio_filter_write_store(struct request_queue *q, +const char *page, size_t count) +{ + return blk_filter_store(q, page, count, WRITE); +} + +static struct queue_sysfs_entry queue_sgio_filter_read_entry = { + .attr = { .name = sgio_filter_read, .mode = S_IRUGO | S_IWUSR }, + .show = queue_sgio_filter_read_show, + .store = queue_sgio_filter_read_store, +}; + +static struct queue_sysfs_entry queue_sgio_filter_write_entry = { + .attr = {.name = sgio_filter_write, .mode = S_IRUGO | S_IWUSR }, + .show = queue_sgio_filter_write_show, + .store = queue_sgio_filter_write_store, +}; +#endif + static struct attribute *default_attrs[] = { queue_requests_entry.attr, queue_ra_entry.attr, @@ -452,6 +489,10 @@ static struct attribute *default_attrs[] = { queue_rq_affinity_entry.attr, queue_iostats_entry.attr, queue_random_entry.attr, +#ifdef CONFIG_BLK_DEV_SG_FILTER_SYSFS + queue_sgio_filter_read_entry.attr, + queue_sgio_filter_write_entry.attr, +#endif NULL, }; diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index
[RFC PATCH 1/4] block: add back queue-private command filter
The command filter used to be mutable via sysfs, but this was broken and backed out. Let's add it back. This patch adds the infrastructure for filtering, but unlike the old code this one just adds a pointer to request_queue, so as to make it cheaper in the majority of cases where no special filtering is desired. This is a partial (and massaged) revert of commit 018e044 (block: get rid of queue-private command filter, 2009-06-26). Cc: linux-scsi@vger.kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/blk-sysfs.c | 2 ++ block/bsg.c| 2 +- block/scsi_ioctl.c | 17 + drivers/scsi/sg.c | 4 +++- include/linux/blkdev.h | 10 +- 5 files changed, 20 insertions(+), 15 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 5efc5a6..2539f8d 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -530,6 +530,8 @@ static void blk_release_queue(struct kobject *kobj) blkcg_exit_queue(q); + kfree(q-cmd_filter); + if (q-elevator) { spin_lock_irq(q-queue_lock); ioc_clear_queue(q); diff --git a/block/bsg.c b/block/bsg.c index 420a5a9..fe16cae 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -187,7 +187,7 @@ static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq, return -EFAULT; if (hdr-subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) { - if (blk_verify_command(rq-cmd, has_write_perm)) + if (blk_verify_command(q-cmd_filter, rq-cmd, has_write_perm)) return -EPERM; } else if (!capable(CAP_SYS_RAWIO)) return -EPERM; diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index a5ffcc9..22b925f 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -34,11 +34,6 @@ #include scsi/scsi_ioctl.h #include scsi/scsi_cmnd.h -struct blk_cmd_filter { - unsigned long read_ok[BLK_SCSI_CMD_PER_LONG]; - unsigned long write_ok[BLK_SCSI_CMD_PER_LONG]; -}; - static struct blk_cmd_filter blk_default_cmd_filter; /* Command group 3 is reserved and should never be used. */ @@ -197,17 +192,15 @@ static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) __set_bit(GPCMD_SET_READ_AHEAD, filter-write_ok); } -int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm) +int blk_verify_command(struct blk_cmd_filter *filter, + unsigned char *cmd, fmode_t has_write_perm) { - struct blk_cmd_filter *filter = blk_default_cmd_filter; - /* root can do any command. */ if (capable(CAP_SYS_RAWIO)) return 0; - /* if there's no filter set, assume we're filtering everything out */ if (!filter) - return -EPERM; + filter = blk_default_cmd_filter; /* Anybody who can open the device can do a read-safe command */ if (test_bit(cmd[0], filter-read_ok)) @@ -226,7 +219,7 @@ static int blk_fill_sghdr_rq(struct request_queue *q, struct request *rq, { if (copy_from_user(rq-cmd, hdr-cmdp, hdr-cmd_len)) return -EFAULT; - if (blk_verify_command(rq-cmd, mode FMODE_WRITE)) + if (blk_verify_command(q-cmd_filter, rq-cmd, mode FMODE_WRITE)) return -EPERM; /* @@ -473,7 +466,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode, if (in_len copy_from_user(buffer, sic-data + cmdlen, in_len)) goto error; - err = blk_verify_command(rq-cmd, mode FMODE_WRITE); + err = blk_verify_command(q-cmd_filter, rq-cmd, mode FMODE_WRITE); if (err) goto error; diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index df5e961..5f64202 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -218,11 +218,13 @@ static void sg_put_dev(Sg_device *sdp); static int sg_allow_access(struct file *filp, unsigned char *cmd) { struct sg_fd *sfp = filp-private_data; + struct request_queue *q = sfp-parentdp-device-request_queue; if (sfp-parentdp-device-type == TYPE_SCANNER) return 0; - return blk_verify_command(cmd, filp-f_mode FMODE_WRITE); + return blk_verify_command(q-cmd_filter, + cmd, filp-f_mode FMODE_WRITE); } static int get_exclude(Sg_device *sdp) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2fdb4a4..9a8434d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -259,6 +259,11 @@ struct blk_queue_tag { #define BLK_SCSI_MAX_CMDS (256) #define BLK_SCSI_CMD_PER_LONG (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8)) +struct blk_cmd_filter { + unsigned long read_ok[BLK_SCSI_CMD_PER_LONG]; + unsigned long write_ok[BLK_SCSI_CMD_PER_LONG]; +}; + struct queue_limits { unsigned long bounce_pfn; unsigned long seg_boundary_mask; @@ -437,6 +442,8 @@ struct
[RFC PATCH 2/4] scsi: create an all-zero filter for scanners
Using /dev/sg for scanners is blocked from unprivileged users. Reimplement this using customizable command filters, so that the sysfs knobs will work in this case too. Cc: linux-scsi@vger.kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- drivers/scsi/scsi_scan.c | 8 +++- drivers/scsi/sg.c| 3 --- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 3e58b22..de5751b 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -783,13 +783,19 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, } switch (sdev-type) { + case TYPE_SCANNER: + sdev-request_queue-cmd_filter = + kzalloc(sizeof(struct blk_cmd_filter), GFP_ATOMIC); + if (sdev-request_queue-cmd_filter == NULL) + return SCSI_SCAN_NO_RESPONSE; + /* fallthrough */ + case TYPE_RBC: case TYPE_TAPE: case TYPE_DISK: case TYPE_PRINTER: case TYPE_MOD: case TYPE_PROCESSOR: - case TYPE_SCANNER: case TYPE_MEDIUM_CHANGER: case TYPE_ENCLOSURE: case TYPE_COMM: diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 5f64202..f7f5c11 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -220,9 +220,6 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd) struct sg_fd *sfp = filp-private_data; struct request_queue *q = sfp-parentdp-device-request_queue; - if (sfp-parentdp-device-type == TYPE_SCANNER) - return 0; - return blk_verify_command(q-cmd_filter, cmd, filp-f_mode FMODE_WRITE); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 0/4] SG_IO filtering via sysfs and minimal whitelist
On Wed, 12 Sep 2012 09:05:41 +0100, James Bottomley wrote: This is why the whole filter thing was mutable via sysfs. That way the admin could set this up per device. It sounds like this is what you want to fix, rather than opening up more holes in an already leaky security apparatus. The ideal is that we would be much more restrictive by default and give root the ability to override this both globally and per-device to conform to whatever policy it has for the virtual environments. The patch which removed all of the sysfs pieces was: commit 018e0446890661504783f92388ecce7138c1566d Author: Jens Axboe jens.ax...@oracle.com Date: Fri Jun 26 16:27:10 2009 +0200 block: get rid of queue-private command filter So that's probably the place to start for putting it back properly. [https://lkml.org/lkml/2012/9/12/76] We've been running in circles for nine months now. Let's restart from the maintainer's suggestion, which was probably dismissed too quickly. This is still not a complete solution, because /dev/sgN does not have access to its queue object. Still, it can be a base for discussion. If accepted (in a complete form with access to the queue object for non-block devices), this series removes the need to fix the opcode conflicts as far as I'm concerned. We could just consider that a feature of CONFIG_BLK_DEV_SG_FILTER_MMC. Previously posted at https://lkml.org/lkml/2012/9/25/397 (short thread). Rebased just fine, this one is compile-tested only. Paolo Paolo Bonzini (4): block: add back queue-private command filter scsi: create an all-zero filter for scanners block: add back command filter modification via sysfs scsi: lock out SG_IO by default to unprivileged users Documentation/block/queue-sysfs.txt | 16 + block/Kconfig | 22 ++ block/blk-sysfs.c | 43 block/bsg.c | 2 +- block/scsi_ioctl.c | 131 +++- drivers/scsi/scsi_scan.c| 8 ++- drivers/scsi/sg.c | 7 +- include/linux/blkdev.h | 31 - 8 files changed, 238 insertions(+), 22 deletions(-) -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 4/4] scsi: lock out SG_IO by default to unprivileged users
Move policy on SG_IO access to userspace. Some level of checking at the kernel level (compared to just having an opt-out for the default whitelist) is needed because any userspace whitelist is too easily circumvented by doing ptrace on the program that has a file descriptor open. This would be a regression for those who are using Unix permissions, security modules or the device cgroup to confine programs. A meaningful whitelist can then be set by udev, for example. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- This is not yet usable, because sg devices do not have a link to the queue object. block/Kconfig | 12 block/scsi_ioctl.c | 10 ++ 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/block/Kconfig b/block/Kconfig index e89d6a2..67d1f68 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -75,6 +75,18 @@ config BLK_DEV_SG_FILTER_SYSFS If unsure, say Y. +config BLK_DEV_SG_FILTER_MMC + bool Backwards-compatible default SG_IO filter + default y + help + Saying Y here will let you send several commands (mostly based + on the MMC command set) by default to SCSI devices via /dev/sg, + /dev/bsg or ioctls such as SG_IO) even for unprivileged users. + Saying N will restrict the set of commands that unprivileged + users can send to the bare minimum. + + If unsure, say Y. + config BLK_DEV_BSGLIB bool Block layer SG support v4 helper lib default n diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 5db39b5..079dfa2 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -113,9 +113,12 @@ static int sg_emulated_host(struct request_queue *q, int __user *p) static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) { memset(filter, 0, sizeof(*filter)); - - /* Basic read-only commands */ + __set_bit(INQUIRY, filter-read_ok); __set_bit(TEST_UNIT_READY, filter-read_ok); + __set_bit(REPORT_LUNS, filter-read_ok); + +#ifdef BLK_DEV_SG_FILTER_MMC + /* Basic read-only commands */ __set_bit(REQUEST_SENSE, filter-read_ok); __set_bit(READ_6, filter-read_ok); __set_bit(READ_10, filter-read_ok); @@ -125,14 +128,12 @@ static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) __set_bit(READ_DEFECT_DATA, filter-read_ok); __set_bit(READ_CAPACITY, filter-read_ok); __set_bit(READ_LONG, filter-read_ok); - __set_bit(INQUIRY, filter-read_ok); __set_bit(MODE_SENSE, filter-read_ok); __set_bit(MODE_SENSE_10, filter-read_ok); __set_bit(LOG_SENSE, filter-read_ok); __set_bit(START_STOP, filter-read_ok); __set_bit(GPCMD_VERIFY_10, filter-read_ok); __set_bit(VERIFY_16, filter-read_ok); - __set_bit(REPORT_LUNS, filter-read_ok); __set_bit(SERVICE_ACTION_IN, filter-read_ok); __set_bit(RECEIVE_DIAGNOSTIC, filter-read_ok); __set_bit(MAINTENANCE_IN, filter-read_ok); @@ -193,6 +194,7 @@ static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) __set_bit(GPCMD_LOAD_UNLOAD, filter-write_ok); __set_bit(GPCMD_SET_STREAMING, filter-write_ok); __set_bit(GPCMD_SET_READ_AHEAD, filter-write_ok); +#endif } int blk_verify_command(struct blk_cmd_filter *filter, -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 25/05/2013 06:14, James Bottomley ha scritto: On Fri, 2013-05-24 at 10:32 +0200, Paolo Bonzini wrote: Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: not in blk_verify_command: outside of it, in the three places it's used. In other words, if (blk_verify_command(...) || blk_verify_command_with_queue(q, ...)) We must have different taste. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 25/05/2013 07:27, Christoph Hellwig ha scritto: On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote: I'll go along with this. I'm also wondering what the problem would be if we just allowed all commands on either CAP_SYS_RAWIO or opening the device for write, so we just defer to the filesystem permissions and restricted read only opens to the basic all device opcodes. I've been out of this area for a bit, but the problem used to be that you could send destructive commands to a partition. The right fix for that would be to not allow SG_IO on partitions at all, just wondering if anything would be broken by this. Linus wanted to keep that for CAP_SYS_RAWIO. We found two uses of SG_IO on partitions: zfs-fuse used SYNCHRONIZE CACHE; some proprietary driver used TEST UNIT READY. Really, the solution is to make the bitmaps configurable in userspace. It is no less secure than unpriv_sgio. Then the kernel can be configured at build-time to have either an MMC bitmap and a basic whitelist of a dozen commands. We can even avoid working around those few conflicting opcodes; if you're paranoid you can just configure your kernel right. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 25/05/2013 09:11, Christoph Hellwig ha scritto: Linus wanted to keep that for CAP_SYS_RAWIO. We found two uses of SG_IO on partitions: zfs-fuse used SYNCHRONIZE CACHE; some proprietary driver used TEST UNIT READY. Really, the solution is to make the bitmaps configurable in userspace. It is no less secure than unpriv_sgio. Then the kernel can be configured at build-time to have either an MMC bitmap and a basic whitelist of a dozen commands. We can even avoid working around those few conflicting opcodes; if you're paranoid you can just configure your kernel right. Keep it simple. Allowing SG_IO for CAP_SYS_RAWIO probably is fine, allowing it for permissions only clearly isn't. All the per-command filetering is just complete bullshit and the kind of bloat that eventually will make Linux unmaintainable. What I'm proposing is: - by default, only allow INQUIRY / REPORT LUNS / TEST UNIT READY. Keep the old whitelist available for backwards-compatibility, configurable at build-time. - let CAP_SYS_ADMIN users set a different per-device whitelist. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 25/05/2013 10:37, Tejun Heo ha scritto: Hey, James. On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote: Well, I'd actually much prefer disabling CDB whitelisting for all !MMC devices if at all possible. I'll go along with this. I'm also wondering what the problem would be Don't think we can. It'd be a behavior change clearly visible to userland at this point. We can (and even for MMC) if it is a build-time configuration knob. It would satisfy those people who want the CVE fixed, as long as userspace gets some configurability. * Fix the security bug. I don't really care how it's fixed as long as the amount of whitelisted commands goes down not up. * It's not like we can remove the filter for !MMC devices at this point, so I think it makes sense to make it per-class so that we can *remove* commands which aren't relevant for the device type. Also, we probably wanna add read blinking comment yelling that no further commands should be added. * Merge the patch to give out SG_IO access along with write access, so the use cases which want to give out SG_IO access can do so explicitly and be fully responsible for the device. This makes sense to me. If one wants to be allowed to issue raw commands to the hardware, one takes the full responsibility. That's not possible; it would make it impossible to do things like using a privileged helper to open the file and passing it back via SCM_RIGHTS to an unprivileged program (running as the user). This is the ptrace attack that you mentioned. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 25/05/2013 14:48, Tejun Heo ha scritto: * Merge the patch to give out SG_IO access along with write access, so the use cases which want to give out SG_IO access can do so explicitly and be fully responsible for the device. This makes sense to me. If one wants to be allowed to issue raw commands to the hardware, one takes the full responsibility. That's not possible; it would make it impossible to do things like using a privileged helper to open the file and passing it back via SCM_RIGHTS to an unprivileged program (running as the user). This is the ptrace attack that you mentioned. I have no idea what you're talking about. I'm describing the same thing you implemented and posted. Ok, I think we need a rewind. I'll try to post what I mean next week. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 03:44, Tejun Heo ha scritto: On Thu, May 23, 2013 at 11:47:25AM +0200, Paolo Bonzini wrote: No no, I'm not talking about it not working for the users - it's just passing the commands, it of course works. I'm doubting about it being a worthy security isolation layer. cdb filtering (of any form really) has always been something on the border. It always was something we got stuck with due to lack of other immediate options. I understood correctly then. :) I agree it's not the best, but it's not completely broken either. It has bugs, and that's what these patches try to fix. The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. I'm wondering whether combining 3 into 4 would be good enough. No, it wouldn't. I learnt it the hard way (by having a patch nacked :)) at http://thread.gmane.org/gmane.linux.kernel/1311887. Of course you can't do that by adding dangerous commands to the existing filtering table which is allowed by default. I'm saying count me out knob could be good enough. Neither is perfect but at least the latter doesn't affect the default cases. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Paolo because if not perhaps we can just remove the confusion and consider this as a feature set. If there's someone who actually cares, please lets just do the bug fix first and argue about the feature later. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 09:50, James Bottomley ha scritto: On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote: Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. That's a circular response that doesn't answer the question. The actual question is: what is simple fix for the bug that isn't entangled with enabling the SG_IO per device type whitelist feature. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 10:02, Tejun Heo ha scritto: On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. Well, I'd actually much prefer disabling CDB whitelisting for all !MMC devices if at all possible. You're basically arguing that because what we have is already broken, it should be okay to break it further. Also, RW commands having more quirks doesn't necessarily indicate that they tend to be more broken. They just get hammered on a lot in various ways so problems on those commands tend to be more noticeable. I agree intuition may not count, and it's perfectly possible that firmware writers forgot a break; or put the wrong location in a jump table, so that unimplemented commands give interesting results. However, the _fact_ is that this might happen anyway with the buttload of commands that are already enabled by the whitelist and that most disks will never implement. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. No, it is not unfortunately. Allowing to do discards is one thing, allowing to disrupt the settings of a SAN is another. You can only delegate the device fully in these cases: (a) of course, if the guest is trusted; (b) if QEMU is running as a confined user, then you can get by with a userspace whitelist. (Otherwise you're just a ptrace away from arbitrary access, as you pointed out). Unfortunately, there are _real_ problems that this patches fix, such as log spews due to blocked commands, and these happen even if neither of the above conditions is true. Is there anything else I can do? Sure, I can check for the presence of the whitelist and hack the VPD pages to hide features that I know the whitelist will block. Is it the right thing to do? In my opinion no. It makes no sense to have raw device access _disable_ features compared to emulation. Not ideal but widening direct command access by default is pretty nasty too. I actually agree, and that's why I'm trying to balance the widening and restricting. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: if (q-sgio_type == TYPE_ROM) return 0; if (rq-cmd[0] == 0xA4) return -EPERM; if (!is_write (req-cmd[0] == ... || rq-cmd[0] == ...)) return -EPERM; But then the particular patch you're replying to is still necessary, and you're slowing down blk_verify_command. It may be fine for stable and -rc, but IMHO it calls for a better implementation in 3.11. (Besides, I did it like this because it is what Tejun suggested). Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 11:07, Tejun Heo ha scritto: On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote: I agree intuition may not count, and it's perfectly possible that firmware writers forgot a break; or put the wrong location in a jump table, so that unimplemented commands give interesting results. It's not just unimplemented commands. Exposing any new command exposes its borderline problems together with it. For commands that are used by Linux already, the right way to fix the problems is not obscuring the commands from userspace's view. You can hit the same problems with ioctls or even with normal operation of the device. However, the _fact_ is that this might happen anyway with the buttload of commands that are already enabled by the whitelist and that most disks will never implement. Yes and that's why the whitelist is generally frowned upon. It's inherently fragile. These devices simply aren't designed and implemented to be exposed to lesser security domains directly. It's true that it's already kinda broken that way but as I wrote before it's a vicious cycle and we don't wanna keep building on top of it. This expansion is gonna increase the usage of whitelisting which will in turn attract further use cases which are likely to call for even more expansion. And prohibiting the extension of whitelists is gonna increase the usage of unpriv_sgio and less-secure userspace whitelists. Anvil, meet hammer. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. No, it is not unfortunately. Allowing to do discards is one thing, allowing to disrupt the settings of a SAN is another. You can only delegate the device fully in these cases: (a) of course, if the guest is trusted; (b) if QEMU is running as a confined user; If the bulk of filtering can be solved with userland whitelisting as a confined user, it should be possible to resolve peripheral problems like log messages in simpler way, no? Can you please elaborate on the log message problem? Who's spewing those messages? For example: if (bdev_write_same(bdev)) { unsigned char bdn[BDEVNAME_SIZE]; if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, ZERO_PAGE(0))) return 0; bdevname(bdev, bdn); pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn); } return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); The device exposes the ability to zero out LUN blocks, but the command is not whitelisted and it fails. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 10:32, Paolo Bonzini ha scritto: Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: if (q-sgio_type == TYPE_ROM) return 0; if (rq-cmd[0] == 0xA4) return -EPERM; if (!is_write (req-cmd[0] == ... || rq-cmd[0] == ...)) return -EPERM; But then the particular patch you're replying to is still necessary, and you're slowing down blk_verify_command. It may be fine for stable and -rc, but IMHO it calls for a better implementation in 3.11. Ok, so I did a patch along these lines. And it's just as ugly as everything else that I've been posting in these threads. Yes, perhaps it has a redeeming grace in that it is fine for = 3.10, but that's it. Because actually I agree with you. The rework of the SG_IO command filter _is_ dubious to say the least; 300 lines of default, immutable policy don't belong in the kernel. So why am I posting this crap? Because I have to work around the nack of the generic sysfs bitmap patches, which would have beatifully solved all of this. In fact, you had proposed that approach. I posted it in September 2012. Then (as usual) silence for one month until it was quickly dismissed by Tejun. *You and Jens* failed to review patches, and this rathole is what that led to. It's unpleasant for me as it is for everyone else. Yes, you and Jens are busy, we all are. But *you and Jens* are the maintainers. Please make a decision instead of drawing straws, so that we can all go back to our regular business. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 23/05/2013 00:17, Tejun Heo ha scritto: Then let's make it fit the use case better. I really can't see much point in crafting the cdb filter when you basically have to entrust the device to the user anyway. Let's either trust the user with the device or not. I'm very doubtful that the filtered access via SG_IO can be reliable or secure enough. Let's please avoid extending a broken thing. Sorry to say that, but I'm very doubtful that... is just conspiracy theory. It is not broken. I'm not _that_ clueless, if it were broken I wouldn't have had users use it in production. One more thing, is it really necessary to have finer granularity than provided by file permissions? What would be the use case? Do you expect to have multiple - two - differing levels of access with and without SG_IO? No, I don't. I want four levels: 1) no access; 2) read-only access; 3) read-write whitelisted access; 4) generic access; but it's indeed fine to assume that 3 and 4 will never be given together to the same disk. The important point is that 2 and 3 should not require any privileges except for opening the file. With the opt-out knob, you still need a long-lived privileged process in order to set the knob back to no access, and that's undesirable. Long-lived privileged processes can be SIGKILLed and leave things open for misuse; instead, if I need something privileged I want to confine it to a helper that opens the file and passes back the file descriptor. for the same user, it's pointless to give out SG_IO access to processes while denying for other processes. As long as ptrace can be attached, hijacking such fd is easy. Making it per-device should be suitable enough, no? Yes, and that's what I did. Such hijacking is why a kernel whitelist is necessary in untrusted cases (i.e. you cannot just implement it in userspace). There are many use cases, I listed some in my reply to Martin. Sometimes you have trust over the guest and can use count-me-out. But in some cases you don't, and yet the current whitelist is not enough (e.g. tapes). Can you elaborate? Why can't a tape device be entrusted to the user? In general, any device may or may not be entrusted to the user. In this respect, tapes or disks have no difference. But while the current whitelist is almost okay for disks, it is not usable for tapes. Too many essential commands are missing; this is why extending the whitelist to cover other device types is important for me. And since you don't want to open new commands to all classes with no distinction (which I understand), the only choice is per-class whitelists. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 23/05/2013 11:02, Tejun Heo ha scritto: On Thu, May 23, 2013 at 09:45:42AM +0200, Paolo Bonzini wrote: Il 23/05/2013 00:17, Tejun Heo ha scritto: Then let's make it fit the use case better. I really can't see much point in crafting the cdb filter when you basically have to entrust the device to the user anyway. Let's either trust the user with the device or not. I'm very doubtful that the filtered access via SG_IO can be reliable or secure enough. Let's please avoid extending a broken thing. Sorry to say that, but I'm very doubtful that... is just conspiracy theory. It is not broken. I'm not _that_ clueless, if it were broken I wouldn't have had users use it in production. No no, I'm not talking about it not working for the users - it's just passing the commands, it of course works. I'm doubting about it being a worthy security isolation layer. cdb filtering (of any form really) has always been something on the border. It always was something we got stuck with due to lack of other immediate options. I understood correctly then. :) I agree it's not the best, but it's not completely broken either. It has bugs, and that's what these patches try to fix. One more thing, is it really necessary to have finer granularity than provided by file permissions? What would be the use case? Do you expect to have multiple - two - differing levels of access with and without SG_IO? No, I don't. I want four levels: 1) no access; 2) read-only access; 3) read-write whitelisted access; 4) generic access; but it's indeed fine to assume that 3 and 4 will never be given together to the same disk. The important point is that 2 and 3 should not require any privileges except for opening the file. I'm wondering whether combining 3 into 4 would be good enough. No, it wouldn't. I learnt it the hard way (by having a patch nacked :)) at http://thread.gmane.org/gmane.linux.kernel/1311887. With the opt-out knob, you still need a long-lived privileged process in order to set the knob back to no access, and that's undesirable. Long-lived privileged processes can be SIGKILLed and leave things open for misuse; instead, if I need something privileged I want to confine it to a helper that opens the file and passes back the file descriptor. Hmmm? Somebody has to give out the access rights anyway, be it a udev rule or polkit. While not having to depend on them could be nice, I don't think involving userland is a big deal. It's already involved in closely related capacities anyway. Polkit need not do anything to give out the access rights, it only has to just confirm the credentials. A helper setgid-disk can consult polkit, open the file, pass the file descriptor back, and exit. We already do something similar for tap devices. There are many use cases, I listed some in my reply to Martin. Sometimes you have trust over the guest and can use count-me-out. But in some cases you don't, and yet the current whitelist is not enough (e.g. tapes). Can you elaborate? Why can't a tape device be entrusted to the user? I was curious why they wouldn't be able to use count-me-out knob. The white list was nasty but we did it anyway because there were some horrible commands which, ISTR, could even affect other devices sharing the bus and at least at the beginning the list of commands which should be allowed were pretty compact. Exactly. Though these commands do not really affect other devices sharing the bus, they affect other initiators trying to use the device. And these commands are generic, so they apply to disks as well as tapes. Unfortunately, a purely userspace whitelist would be too easy to circumvent. Whitelisting those commands is likely to be mostly harmless most of the time but I would be surprised if there aren't devices out in the wild which can be bricked / affected adversely with carefully crafted commands within the allowed cdbs. The level of security isolation which can be provided by command code filtering is somewhat flimsy, which is why cdb filtering was always seen as a kludge, which in turn is why there's reluctance against extending it to more use cases. Yes, I understand that. But I don't believe it is a serious concern. In the case of tapes, for example, you're not talking about 10 dollar USB pendrives whose firmware was written in fire-and-forget mode. Firmware bugs would be updated by the manufacturers. I cannot exclude there will _never_ be a bug like this, of course. But even if there will be one, IMHO it would be exceptional enough that we can afford fixing it with a per-device quirk. Scanners have never had any CDB filtering done on them; I searched for this kind of bug, and I couldn't find any report. What I am trying to do is to reach a compromise. The one I'm proposing is to forbid reserved or vendor-specific commands, while at the same time opening the doors on more standardized commands. Paolo -- To unsubscribe from
[PATCH v3 part1 3/4] sg_io: use different default filters for each device class
Store the filters in a 256-entry array, and pick an appropriate filter for SCSI devices. Apart from SCSI disks, SG_IO is supported for CCISS, ide-floppy and virtio-blk devices; TYPE_DISK (which is zero, i.e. the default) is more appropriate for these devices than TYPE_ROM. However, all lists are still the same, so there is no semantic change in this patch. Cc: sta...@gnu.org Cc: James E.J. Bottomley jbottom...@parallels.com Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe ax...@kernel.dk Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/scsi_ioctl.c | 14 +- drivers/scsi/scsi_scan.c | 2 ++ include/linux/blkdev.h | 2 +- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 21ddf17..6e18156 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -35,8 +35,8 @@ #include scsi/scsi_cmnd.h struct blk_cmd_filter { - unsigned long read_ok[BLK_SCSI_CMD_PER_LONG]; - unsigned long write_ok[BLK_SCSI_CMD_PER_LONG]; + u32 read_ok[BLK_SCSI_MAX_CMDS]; + u32 write_ok[BLK_SCSI_MAX_CMDS]; }; static struct blk_cmd_filter blk_default_cmd_filter; @@ -117,7 +117,7 @@ static int sg_emulated_host(struct request_queue *q, int __user *p) static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) { #define sgio_bitmap_set(cmd, rw) \ - __set_bit((cmd), filter-rw##_ok) + filter-rw##_ok[(cmd)] = ~0; /* Basic read-only commands */ sgio_bitmap_set(TEST_UNIT_READY, read); @@ -210,16 +210,12 @@ int blk_verify_command(struct request_queue *q, if (capable(CAP_SYS_RAWIO)) return 0; - /* if there's no filter set, assume we're filtering everything out */ - if (!filter) - return -EPERM; - /* Anybody who can open the device can do a read-safe command */ - if (test_bit(cmd[0], filter-read_ok)) + if (filter-read_ok[cmd[0]] (1 q-sgio_type)) return 0; /* Write-safe commands require a writable open */ - if (test_bit(cmd[0], filter-write_ok) has_write_perm) + if (has_write_perm filter-write_ok[cmd[0]] (1 q-sgio_type)) return 0; return -EPERM; diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 3e58b22..86940f3 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -782,6 +782,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, sdev-removable = (inq_result[1] 0x80) 7; } + sdev-request_queue-sgio_type = sdev-type; + switch (sdev-type) { case TYPE_RBC: case TYPE_TAPE: diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 4fca347..5e18969 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -257,7 +257,6 @@ struct blk_queue_tag { }; #define BLK_SCSI_MAX_CMDS (256) -#define BLK_SCSI_CMD_PER_LONG (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8)) struct queue_limits { unsigned long bounce_pfn; @@ -410,6 +409,7 @@ struct request_queue { */ unsigned intsg_timeout; unsigned intsg_reserved_size; + unsigned char sgio_type; int node; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace*blk_trace; -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html