Re: [PATCH] virtio-blk: Disable callback in virtblk_done()
On 09/27/2012 08:10 AM, Rusty Russell wrote:
> Asias He writes:
>> On 09/25/2012 10:36 AM, Asias He wrote:
>>> This reduces unnecessary interrupts that the host could send to the
>>> guest while the guest is in the process of handling an irq.
>>>
>>> If one vcpu is handling the irq while another interrupt comes in,
>>> handle_edge_irq() in the guest will mask the interrupt via
>>> mask_msi_irq(), which is a very heavy operation that goes all the
>>> way down to the host.
>>>
>>> Signed-off-by: Asias He
>>> ---
>>
>> Here are some performance numbers on qemu:
>
> I assume this is with qemu using kvm, not qemu in soft emulation? :)

Of course.

[snip: the fio results and the patch are quoted in full in Rusty's
reply below]

> Fascinating. Please just confirm that VIRTIO_RING_F_EVENT_IDX is
> enabled?
RE: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer
Sjur BRENDELAND writes:
>> This allocates one redundant sg entry when nrbuf > 0,
>> but I think it is OK. (just a comment)
>
> I did this on purpose for the sake of simplicity, but I can
> change this to something like:
>     alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);

That's why we use [0] in the definition (a GCC extension).

Cheers,
Rusty.
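For readers unfamiliar with the idiom Rusty mentions, here is a minimal
userspace sketch of the zero-length trailing array. The field names and
the scatterlist stand-in are illustrative only, not the actual driver
definitions:

	#include <stdlib.h>

	/* Stand-in for the kernel's struct scatterlist; illustration only. */
	struct scatterlist {
		unsigned long page_link;
		unsigned int offset;
		unsigned int length;
	};

	struct port_buffer {
		char *buf;
		size_t size;
		int sgpages;
		struct scatterlist sg[0];  /* zero-length array: GCC extension */
	};

	/*
	 * Because sg[] contributes nothing to sizeof(*buf), this allocates
	 * exactly nrbufs scatterlist entries after the header: no redundant
	 * entry, and no max(nrbufs - 1, 1) correction needed.
	 */
	static struct port_buffer *alloc_buf_sketch(int nrbufs)
	{
		return calloc(1, sizeof(struct port_buffer) +
				 sizeof(struct scatterlist) * nrbufs);
	}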
Proposal for virtio standardization.
Hi all,

I've had several requests for a more formal approach to the virtio draft
spec, and (after some soul-searching) I'd like to try that. The proposal
is to use OASIS as the standards body, as it's fairly light-weight as
these things go. For me this means paperwork and setting up a Working
Group and getting the right people involved as Voting members, starting
with the current contributors; for most of you it just means a new
mailing list, though I'll be cross-posting any drafts and major changes
here anyway.

I believe that a documented standard (aka virtio 1.0) will increase
visibility and adoption in areas outside our normal linux/kvm universe.
There's been some of that already, but this is the clearest path to
accelerate it. Not the easiest path, but I believe that a solid I/O
standard is a Good Thing for everyone.

Yet I also want to decouple new and experimental development from the
standards effort; running code comes first. New feature bits and new
device numbers should be reservable without requiring a full spec change.

So the essence of my proposal is:
1) I start a Working Group within OASIS where we can aim for virtio
   spec 1.0.
2) The current spec is textually reordered so the core is clearly
   bus-independent, with PCI, mmio, etc. appendices.
3) Various clarifications, formalizations and cleanups to the spec text,
   and possibly elimination of old deprecated features.
4) The only significant change to the spec is that we use PCI
   capabilities, so we can have infinite feature bits (see
   http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html).
5) Changes to the ring layout and other such things are deferred to a
   future virtio version; whether this is done within OASIS or
   externally depends on how well this works for the 1.0 release.

Thoughts?
Rusty.
Re: [PATCH] virtio-blk: Disable callback in virtblk_done()
Asias He writes:
> On 09/25/2012 10:36 AM, Asias He wrote:
>> This reduces unnecessary interrupts that the host could send to the
>> guest while the guest is in the process of handling an irq.
>>
>> If one vcpu is handling the irq while another interrupt comes in,
>> handle_edge_irq() in the guest will mask the interrupt via
>> mask_msi_irq(), which is a very heavy operation that goes all the
>> way down to the host.
>>
>> Signed-off-by: Asias He
>> ---
>
> Here are some performance numbers on qemu:

I assume this is with qemu using kvm, not qemu in soft emulation? :)

> Before:
> -
> seq-read  : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
> seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
> rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
> rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
>   clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
>   clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
>   clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
>   clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
>   cpu : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
>   cpu : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
>   cpu : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
>   cpu : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
>   vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
>        in_queue=34206098, util=99.68%
> 43: interrupt in total: 887320
> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>
> After:
> -
> seq-read  : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
> seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
> rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
> rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
>   clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
>   clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
>   clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
>   clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
>   cpu : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54
>   cpu : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
>   cpu : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
>   cpu : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
>   vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
>        in_queue=30078377, util=99.59%
> 43: interrupt in total: 1259639
> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>
>> drivers/block/virtio_blk.c | 19 +++++++++++--------
>> 1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 53b81d5..0bdde8f 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -274,15 +274,18 @@ static void virtblk_done(struct virtqueue *vq)
>>  	unsigned int len;
>>
>>  	spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
>> -	while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>> -		if (vbr->bio) {
>> -			virtblk_bio_done(vbr);
>> -			bio_done = true;
>> -		} else {
>> -			virtblk_request_done(vbr);
>> -			req_done = true;
>> +	do {
>> +		virtqueue_disable_cb(vq);
>> +		while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>> +			if (vbr->bio) {
>> +				virtblk_bio_done(vbr);
>> +				bio_done = true;
>> +			} else {
>> +				virtblk_request_done(vbr);
>> +				req_done = true;
>> +			}
>>  		}
>> -	}
>> +	} while (!virtqueue_enable_cb(vq));
>>  	/* In case queue is stopped waiting for more buffers. */
>>  	if (req_done)
>>  		blk_start_queue(vblk->disk->queue);

Fascinating. Please just confirm that VIRTIO_RING_F_EVENT_IDX is
enabled? I forgot about the cool hack which MST put in to defer event
updates using disable_cb/enable_cb.

Applied!
Rusty.
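[Editor's sketch, for readers following the thread: the patch is an
instance of a generic virtio completion-handler pattern. Below is that
pattern stripped of the virtio-blk specifics; the virtqueue_* calls are
the real kernel API, while vq_done() and handle_one() are placeholder
names, not actual kernel symbols.]

	#include <linux/virtio.h>

	/* Placeholder for per-buffer completion work. */
	static void handle_one(void *data, unsigned int len);

	/*
	 * Keep callbacks (i.e. interrupts) disabled while draining the
	 * virtqueue, and only take another interrupt if the ring was
	 * still empty when callbacks were re-enabled.
	 */
	static void vq_done(struct virtqueue *vq)
	{
		unsigned int len;
		void *data;

		do {
			/* Suppress host notifications while we drain. */
			virtqueue_disable_cb(vq);

			while ((data = virtqueue_get_buf(vq, &len)) != NULL)
				handle_one(data, len);

			/*
			 * virtqueue_enable_cb() returns false if buffers
			 * arrived while callbacks were off; loop and drain
			 * them instead of paying for another interrupt.
			 * Under VIRTIO_RING_F_EVENT_IDX the event index is
			 * only updated here, deferring notifications further.
			 */
		} while (!virtqueue_enable_cb(vq));
	}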
Re: [PATCHv5 2/3] virtio_console: Add support for remoteproc serial
sjur.brandel...@stericsson.com writes:
> From: Sjur Brændeland
>
> Add a simple serial connection driver called
> VIRTIO_ID_RPROC_SERIAL (11) for communicating with a
> remote processor in an asymmetric multi-processing
> configuration.
> ...
> static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
>                                      int nrbufs)
> {
> 	struct port_buffer *buf;
> 	size_t alloc_size;
>
> +	if (is_rproc_serial(vq->vdev) && !irqs_disabled())
> +		reclaim_dma_bufs();

Hmm, you need a gfp_t arg into alloc_buf; your last patch simply changed
them all to GFP_ATOMIC, which makes the console far less memory friendly.

You check !irqs_disabled() in a couple of places; I think the caller
needs to indicate (possibly by checking for gfp == GFP_KERNEL) whether
it's safe to call reclaim_dma_bufs().

> @@ -838,6 +927,10 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
> 		.u.data = &sgl,
> 	};
>
> +	/* rproc_serial does not support splice */
> +	if (is_rproc_serial(port->out_vq->vdev))
> +		return -EINVAL;

Why not? ;)

Thanks,
Rusty.
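[Editor's sketch of one way Rusty's suggestion could look: alloc_buf()
gains a gfp_t argument and uses it, rather than irqs_disabled(), to
decide whether reclaiming is safe. is_rproc_serial() and
reclaim_dma_bufs() come from the patch under review; the body is
otherwise illustrative, and the linear-buffer path is omitted.]

	static struct port_buffer *alloc_buf(struct virtqueue *vq,
					     size_t buf_size, int nrbufs,
					     gfp_t gfp)
	{
		struct port_buffer *buf;

		/*
		 * The caller's gfp flags encode the context: GFP_KERNEL
		 * means process context, where reclaiming DMA buffers is
		 * safe; GFP_ATOMIC callers simply skip the reclaim.
		 */
		if (is_rproc_serial(vq->vdev) && gfp == GFP_KERNEL)
			reclaim_dma_bufs();

		buf = kzalloc(sizeof(*buf) +
			      sizeof(buf->sg[0]) * nrbufs, gfp);
		if (!buf)
			return NULL;

		buf->sgpages = nrbufs;
		return buf;
	}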
Re: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer
(2012/09/26 16:48), Sjur BRENDELAND wrote:
>>> -static struct port_buffer *alloc_buf(size_t buf_size)
>>> +static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
>>> +				     int nrbufs)
>>> {
>>> 	struct port_buffer *buf;
>>> +	size_t alloc_size;
>>>
>>> -	buf = kmalloc(sizeof(*buf), GFP_KERNEL);
>>> +	/* Allocate buffer and the scatter list */
>>> +	alloc_size = sizeof(*buf) + sizeof(struct scatterlist) * nrbufs;
>>
>> This allocates one redundant sg entry when nrbuf > 0,
>> but I think it is OK. (just a comment)
>
> I did this on purpose for the sake of simplicity, but I can
> change this to something like:
>     alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);

You wouldn't need to change that. I think the current code is simple and
reasonable enough. :)

Thanks!
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
RE: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer
> > This merge reduces code size by unifying the approach for
> > sending scatter-lists and regular buffers. Any type of
> > write operation (splice, write, put_chars) will now allocate
> > a port_buffer, and send_buf() and free_buf() can always be used.
>
> Thanks!
> This looks much nicer and simpler. I just have some comments below.

OK, good to hear that you agree to this kind of change. I'll do a
respin of this patch fixing the issues you have pointed out.

> > static void free_buf(struct port_buffer *buf)
> > {
> > +	int i;
> > +
> > 	kfree(buf->buf);
>
> this should be done only when !buf->sgpages, or (see below)

Agree, I'll put this statement in an else branch to the if-statement
below.

> > +
> > +	if (buf->sgpages)
> > +		for (i = 0; i < buf->sgpages; i++) {
> > +			struct page *page = sg_page(&buf->sg[i]);
> > +			if (!page)
> > +				break;
> > +			put_page(page);
> > +		}
> > +
> > 	kfree(buf);
> > }
> >
> > -static struct port_buffer *alloc_buf(size_t buf_size)
> > +static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
> > +				     int nrbufs)
> > {
> > 	struct port_buffer *buf;
> > +	size_t alloc_size;
> >
> > -	buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> > +	/* Allocate buffer and the scatter list */
> > +	alloc_size = sizeof(*buf) + sizeof(struct scatterlist) * nrbufs;
>
> This allocates one redundant sg entry when nrbuf > 0,
> but I think it is OK. (just a comment)

I did this on purpose for the sake of simplicity, but I can change this
to something like:

    alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);

> > +	buf = kmalloc(alloc_size, GFP_ATOMIC);
>
> This should be kzalloc(), or buf->buf and others are not initialized,
> which will cause an unexpected kfree bug at kfree(buf->buf) in free_buf.

Agree, kzalloc() is better in this case.

> > 	if (!buf)
> > 		goto fail;
> > -	buf->buf = kzalloc(buf_size, GFP_KERNEL);
> > +
> > +	buf->sgpages = nrbufs;
> > +	if (nrbufs > 0)
> > +		return buf;
> > +
> > +	buf->buf = kmalloc(buf_size, GFP_ATOMIC);
>
> You can also use kzalloc here as the previous code does.
> But if the reason for using kzalloc is security, I think kmalloc is
> enough here, since the host can access all the guest pages anyway.

With this new patch alloc_buf() is used for both RX and TX. The out_vq
previously used kmalloc(), but I have preserved the legacy behavior for
the in_vq by calling memset() in fill_queue().

Thanks,
Sjur
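[Editor's sketch, putting the agreed review points together: kzalloc()
so buf->buf and friends start zeroed, and a single allocation covering
the header plus its trailing scatterlist. Assembled from the patch
fragments quoted above; the actual respin may differ in detail.]

	static struct port_buffer *alloc_buf(struct virtqueue *vq,
					     size_t buf_size, int nrbufs)
	{
		struct port_buffer *buf;
		size_t alloc_size;

		/* Allocate the header and the scatter list in one go. */
		alloc_size = sizeof(*buf) + sizeof(struct scatterlist) * nrbufs;
		buf = kzalloc(alloc_size, GFP_ATOMIC);
		if (!buf)
			goto fail;

		buf->sgpages = nrbufs;
		if (nrbufs > 0)
			return buf;	/* scatter-list case: no linear buffer */

		buf->buf = kmalloc(buf_size, GFP_ATOMIC);
		if (!buf->buf)
			goto free_buf;
		buf->len = 0;
		buf->size = buf_size;
		return buf;

	free_buf:
		kfree(buf);
	fail:
		return NULL;
	}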