Re: [PATCH v3 4/4] scsi: provide UAPI version of scsi/sg.h and scsi/scsi_ioctl.h
On 09/25/2015 02:28 AM, Paolo Bonzini wrote:
> Provide a UAPI version of the header in the kernel, making it easier for
> interested projects to use an up-to-date version of the header.
>
> The new headers are placed under uapi/linux/ so as not to conflict with
> the glibc-provided headers in /usr/include/scsi.
>
> The /dev/sgN default values are implementation aspects and are moved to
> drivers/scsi/sg.c instead (together with e.g. SG_ALLOW_DIO_DEF).
> However, SG_SCATTER_SZ is used by Wine, so it is kept in linux/sg.h.
> SG_MAX_QUEUE could also be useful.
>
> struct scsi_ioctl_command and struct scsi_idlun used to be under
> "#ifdef __KERNEL__", but they are actually useful for userspace as
> well. Add them to the new header.
>
> Cc: James Bottomley <jbottom...@parallels.com>
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: linux-s...@vger.kernel.org
> Cc: Bart Van Assche <bart.vanass...@sandisk.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body
of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] vhost: Reduce vhost_work_flush() wakeup latency
On 08/15/13 03:30, Asias He wrote:
> On Wed, Aug 14, 2013 at 08:58:25PM +0300, Michael S. Tsirkin wrote:
>> On Wed, Aug 14, 2013 at 05:22:36PM +0200, Bart Van Assche wrote:
>>> On 08/14/13 13:37, Michael S. Tsirkin wrote:
>>>> On Wed, Aug 14, 2013 at 09:02:32AM +0200, Bart Van Assche wrote:
>>>>> If the TIF_NEED_RESCHED task flag is set, wake up any
>>>>> vhost_work_flush() waiters before rescheduling instead of after
>>>>> rescheduling.
>>>>>
>>>>> Signed-off-by: Bart Van Assche <bvanass...@acm.org>
>>>>> Cc: Michael S. Tsirkin <m...@redhat.com>
>>>>> Cc: Asias He <as...@redhat.com>
>>>> Why exactly? It's not like flush needs to be extra fast ...
>>> I'm not worried about how fast a flush is processed either. But I
>>> think this patch is a nice cleanup of the vhost_worker() function:
>>> it eliminates the uninitialized_var() construct and moves the
>>> assignment of done_seq below the read of queue_seq.
>>>
>>> Bart.
>> I'm not worried about uninitialized_var - it's just a compiler bug.
>> done_seq below read is nice, but this is at the cost of sprinkling
>> lock/unlock, lock/unlock all over the code. Maybe we can come up with
>> something that doesn't have this property.
> The extra locking introduced here does not look good to me either.

Please note that although there are additional locking statements in the
code, there are no additional locking calls at runtime.

Bart.
[PATCH 0/2] vhost_worker fixes
There are two patches in this series:

* A patch that reduces vhost_work_flush() wakeup latency on busy systems.
* A patch that prevents vhost_work_flush() from returning early.

I found these issues via code inspection.
[PATCH 1/2] vhost: Reduce vhost_work_flush() wakeup latency
If the TIF_NEED_RESCHED task flag is set, wake up any vhost_work_flush()
waiters before rescheduling instead of after rescheduling.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Asias He <as...@redhat.com>
---
 drivers/vhost/vhost.c | 42 +++---
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e58cf00..e7ffc10 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -201,47 +201,43 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_dev *dev = data;
-	struct vhost_work *work = NULL;
-	unsigned uninitialized_var(seq);
+	struct vhost_work *work;
+	unsigned seq;
 	mm_segment_t oldfs = get_fs();
 
 	set_fs(USER_DS);
 	use_mm(dev->mm);
 
-	for (;;) {
+	spin_lock_irq(&dev->work_lock);
+	while (!kthread_should_stop()) {
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
-
-		spin_lock_irq(&dev->work_lock);
-		if (work) {
-			work->done_seq = seq;
-			if (work->flushing)
-				wake_up_all(&work->done);
-		}
-
-		if (kthread_should_stop()) {
-			spin_unlock_irq(&dev->work_lock);
-			__set_current_state(TASK_RUNNING);
-			break;
-		}
 		if (!list_empty(&dev->work_list)) {
 			work = list_first_entry(&dev->work_list,
 						struct vhost_work, node);
 			list_del_init(&work->node);
 			seq = work->queue_seq;
-		} else
-			work = NULL;
-		spin_unlock_irq(&dev->work_lock);
+			spin_unlock_irq(&dev->work_lock);
 
-		if (work) {
 			__set_current_state(TASK_RUNNING);
 			work->fn(work);
-			if (need_resched())
-				schedule();
-		} else
+
+			spin_lock_irq(&dev->work_lock);
+			work->done_seq = seq;
+			if (work->flushing)
+				wake_up_all(&work->done);
+		}
+		if (list_empty(&dev->work_list) || need_resched()) {
+			spin_unlock_irq(&dev->work_lock);
 			schedule();
+			spin_lock_irq(&dev->work_lock);
+		}
 	}
+	spin_unlock_irq(&dev->work_lock);
+
+	__set_current_state(TASK_RUNNING);
 	unuse_mm(dev->mm);
 	set_fs(oldfs);
 	return 0;
-- 
1.7.10.4
[PATCH 2/2] vhost: Avoid that vhost_work_flush() returns early
If two or more work items are queued on dev->work_list before
vhost_worker() starts processing them, the value of work->done_seq will be
set to the sequence number of a work item that has not yet been processed.
Avoid this by letting vhost_worker() count the number of items that have
already been processed.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Asias He <as...@redhat.com>
---
 drivers/vhost/vhost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e7ffc10..11d668a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -202,7 +202,7 @@ static int vhost_worker(void *data)
 {
 	struct vhost_dev *dev = data;
 	struct vhost_work *work;
-	unsigned seq;
+	unsigned seq = 0;
 	mm_segment_t oldfs = get_fs();
 
 	set_fs(USER_DS);
@@ -216,14 +216,13 @@ static int vhost_worker(void *data)
 			work = list_first_entry(&dev->work_list,
 						struct vhost_work, node);
 			list_del_init(&work->node);
-			seq = work->queue_seq;
 			spin_unlock_irq(&dev->work_lock);
 
 			__set_current_state(TASK_RUNNING);
 			work->fn(work);
 
 			spin_lock_irq(&dev->work_lock);
-			work->done_seq = seq;
+			work->done_seq = ++seq;
 			if (work->flushing)
 				wake_up_all(&work->done);
 		}
-- 
1.7.10.4
Re: [PATCH 2/2] vhost: Avoid that vhost_work_flush() returns early
On 08/14/13 13:46, Michael S. Tsirkin wrote:
> I'm confused by this explanation. done_seq is set here:
>
> 	if (work) {
> 		work->done_seq = seq;
> 		if (work->flushing)
> 			wake_up_all(&work->done);
> 	}
>
> and work is set here:
>
> 	if (!list_empty(&dev->work_list)) {
> 		work = list_first_entry(&dev->work_list,
> 					struct vhost_work, node);
> 		list_del_init(&work->node);
> 		seq = work->queue_seq;
> 	}
>
> this work is processed on the next line:
>
> 	if (work) {
> 		__set_current_state(TASK_RUNNING);
> 		work->fn(work);
> 		if (need_resched())
> 			schedule();
> 	}
>
> so how do we end up with a sequence of a work item that isn't processed?

I was wondering what would happen if a work item got requeued before
done_seq is set. I think you are right and this should work fine.

Bart.
Re: [PATCH 1/2] vhost: Reduce vhost_work_flush() wakeup latency
On 08/14/13 13:37, Michael S. Tsirkin wrote:
> On Wed, Aug 14, 2013 at 09:02:32AM +0200, Bart Van Assche wrote:
>> If the TIF_NEED_RESCHED task flag is set, wake up any
>> vhost_work_flush() waiters before rescheduling instead of after
>> rescheduling.
>>
>> Signed-off-by: Bart Van Assche <bvanass...@acm.org>
>> Cc: Michael S. Tsirkin <m...@redhat.com>
>> Cc: Asias He <as...@redhat.com>
> Why exactly? It's not like flush needs to be extra fast ...

I'm not worried about how fast a flush is processed either. But I think
this patch is a nice cleanup of the vhost_worker() function: it eliminates
the uninitialized_var() construct and moves the assignment of done_seq
below the read of queue_seq.

Bart.
[PATCH] vhost: Avoid that vhost_work_flush() locks up
Wake up work->done waiters even if the TIF_NEED_RESCHED task flag has been
set. This patch fixes a regression introduced in commit d550dda (kernel
v3.4).

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60505
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Asias He <as...@redhat.com>
Cc: Nadav Har'El <n...@math.technion.ac.il>
Cc: Abel Gordon <ab...@il.ibm.com>
Cc: sta...@vger.kernel.org # v3.4+
---
 drivers/vhost/vhost.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 60aa5ad..cd544ae 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -227,8 +227,16 @@ static int vhost_worker(void *data)
 		if (work) {
 			__set_current_state(TASK_RUNNING);
 			work->fn(work);
-			if (need_resched())
+			if (need_resched()) {
+				spin_lock_irq(&dev->work_lock);
+				work->done_seq = seq;
+				if (work->flushing)
+					wake_up_all(&work->done);
+				spin_unlock_irq(&dev->work_lock);
+
+				work = NULL;
 				schedule();
+			}
 		} else
 			schedule();
-- 
1.7.10.4
Re: [PATCH] vhost: Avoid that vhost_work_flush() locks up
On 07/05/13 10:36, Bart Van Assche wrote:
> Wake up work->done waiters even if the TIF_NEED_RESCHED task flag has
> been set. This patch fixes a regression introduced in commit d550dda
> (kernel v3.4).
>
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60505

(replying to my own e-mail)

Please ignore this patch. Although it might help to wake up
vhost_work_flush() waiters earlier, the patch description does not match
the patch itself and the patch does not fix the vhost_work_flush() lockup.

Bart.
Re: [PATCH V7 4/5] virtio-scsi: introduce multiqueue support
On 03/23/13 12:28, Wanlong Gao wrote:
> +static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
> +					       struct virtio_scsi_target_state *tgt)
> +{
> +	struct virtio_scsi_vq *vq;
> +	unsigned long flags;
> +	u32 queue_num;
> +
> +	spin_lock_irqsave(&tgt->tgt_lock, flags);
> +
> +	/*
> +	 * The memory barrier after atomic_inc_return matches
> +	 * the smp_read_barrier_depends() in virtscsi_req_done.
> +	 */
> +	if (atomic_inc_return(&tgt->reqs) > 1)
> +		vq = ACCESS_ONCE(tgt->req_vq);
> +	else {
> +		queue_num = smp_processor_id();
> +		while (unlikely(queue_num >= vscsi->num_queues))
> +			queue_num -= vscsi->num_queues;
> +
> +		tgt->req_vq = vq = vscsi->req_vqs[queue_num];
> +	}
> +
> +	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
> +	return vq;
> +}

Is there any reason why the smp_processor_id() % vscsi->num_queues
computation in virtscsi_pick_vq() has been implemented as a loop instead
of as an arithmetic operation? If so, I would appreciate it if that could
be explained in a comment.

Thanks,

Bart.
Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
On 07/04/12 04:24, Nicholas A. Bellinger wrote:
> +/* Fill in status and signal that we are done processing this command
> + *
> + * This is scheduled in the vhost work queue so we are called with the owner
> + * process mm and can access the vring.
> + */
> +static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
> +{

As far as I can see, vhost_scsi_complete_cmd_work() runs in the context of
a work queue kernel thread and hence doesn't have an mm context. Did I
misunderstand something?

Bart.
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/05/12 01:52, Nicholas A. Bellinger wrote:
> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> 25% Write / 75% Read |      ~15K      |         ~45K          |        ~70K
> 75% Write / 25% Read |      ~20K      |         ~55K          |        ~60K

These numbers are interesting. To me these numbers mean that there is a
huge performance bottleneck in the virtio-scsi-raw storage path. Why is
the virtio-scsi-raw bandwidth only one third of the bare-metal raw block
bandwidth?

Bart.
Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
On 07/05/12 17:47, Bart Van Assche wrote:
> On 07/04/12 04:24, Nicholas A. Bellinger wrote:
>> +/* Fill in status and signal that we are done processing this command
>> + *
>> + * This is scheduled in the vhost work queue so we are called with the owner
>> + * process mm and can access the vring.
>> + */
>> +static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>> +{
> As far as I can see, vhost_scsi_complete_cmd_work() runs in the context
> of a work queue kernel thread and hence doesn't have an mm context. Did
> I misunderstand something?

Please ignore the above - I've found the answer in vhost_dev_ioctl() and
vhost_dev_set_owner().

Bart.
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/05/12 17:53, Bart Van Assche wrote:
> On 07/05/12 01:52, Nicholas A. Bellinger wrote:
>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>> 25% Write / 75% Read |      ~15K      |         ~45K          |        ~70K
>> 75% Write / 25% Read |      ~20K      |         ~55K          |        ~60K
> These numbers are interesting. To me these numbers mean that there is a
> huge performance bottleneck in the virtio-scsi-raw storage path. Why is
> the virtio-scsi-raw bandwidth only one third of the bare-metal raw
> block bandwidth?

(replying to my own e-mail)

Or maybe the above numbers mean that in the virtio-scsi-raw test I/O was
serialized (I/O depth 1) while the other two tests used a large I/O depth
(64)? It can't be a coincidence that the virtio-scsi-raw results are close
to the bare-metal results for I/O depth 1.

Another question: which functionality does tcm_vhost provide that is not
yet provided by the SCSI emulation code in qemu + tcm_loop?

Bart.
Re: [PATCH v5 1/3] virtio-scsi: first version
On Mon, Feb 13, 2012 at 8:05 AM, Christian Borntraeger
<borntrae...@de.ibm.com> wrote:
> On 12/02/12 21:16, James Bottomley wrote:
>> Could someone please explain to me why you can't simply fix virtio-blk?
> I don't think that virtio-scsi will replace virtio-blk everywhere. For
> non-SCSI block devices, image files or logical volumes, virtio-blk
> seems to be the right approach, I think.
>> Or would virtio-blk maintainers give a reason why they're unwilling to
>> have it fixed?
> I don't consider virtio-blk broken. It just doesn't cover everything.

Although I'm not sure whether it helps here: for about a year now the
upstream kernel has included software that allows using any block device
or even a file as a SCSI device.

Bart.
Re: [Scst-devel] BUG: Process hang when kvm crashed, stacktrace cfs+kvm
On Sat, Jan 21, 2012 at 9:19 PM, kordex <kor...@gmail.com> wrote:
> I hit sysrq+t sysrq+p in order to print the tasks with stacktraces and
> registers. Using ps -e -o pid,state,command will hang ps output, while
> ps -e -o pid,state will not.

Sorry, I haven't seen such a hang before. Does it also occur with kernels
older than 3.1? Is it reproducible with 3.3-rc1?

Bart.
Re: [Scst-devel] BUG: Process hang when kvm crashed, stacktrace cfs+kvm
On Sat, Jan 21, 2012 at 2:58 PM, kordex <kor...@gmail.com> wrote:
> On Thu, Jan 19, 2012 at 21:12, kordex <kor...@gmail.com> wrote:
>> On Thu, Jan 19, 2012 at 20:34, kordex <kor...@gmail.com> wrote:
>>> Full dmesg of the case attached.
>>> On Thu, Jan 19, 2012 at 13:12, Bart Van Assche <bvanass...@acm.org> wrote:
>>>> On Thu, Jan 19, 2012 at 11:54 AM, kordex <kor...@gmail.com> wrote:
>>>>> root@teletex:/usr/src/linux# dmesg
>>>>> +0x12e7/0x1c50 [810dc322] ? page_add_new_anon_rmap+0x52/0xa0
>>>> The above output is truncated - the most interesting part is missing.
> Do I need to give you anything else? Any leads on what the problem
> might be? The machine is still running and in this state.

In the /proc/meminfo output I see that the system has 16 GB RAM and 33 GB
swap memory. Also, MemFree + Buffers = 400 KB, which makes me assume that
the system was under heavy memory pressure.

Some essential information is still missing - the reason why the kernel
started spewing out all these call traces. Did someone hit the SysRq key?
Is it because a task was hanging? Was a BUG or WARNING hit?

Bart.
Re: [Scst-devel] BUG: Process hang when kvm crashed, stacktrace cfs+kvm
On Thu, Jan 19, 2012 at 11:54 AM, kordex <kor...@gmail.com> wrote:
> root@teletex:/usr/src/linux# dmesg
> +0x12e7/0x1c50 [810dc322] ? page_add_new_anon_rmap+0x52/0xa0

The above output is truncated - the most interesting part is missing.

Bart.