Re: [PATCH] virtio-blk: Disable callback in virtblk_done()

2012-09-26 Thread Asias He
On 09/27/2012 08:10 AM, Rusty Russell wrote:
> Asias He  writes:
> 
>> On 09/25/2012 10:36 AM, Asias He wrote:
>>> This reduces unnecessary interrupts that the host could send to the
>>> guest while the guest is in the middle of irq handling.
>>>
>>> If one vcpu is handling the irq when another interrupt arrives,
>>> handle_edge_irq() in the guest will mask the interrupt via
>>> mask_msi_irq(), which is a very heavy operation that goes all the
>>> way down to the host.
>>>
>>> Signed-off-by: Asias He 
>>> ---
>>
>> Here are some performance numbers on qemu:
> 
> I assume this is with qemu using kvm, not qemu in soft emulation? :)

Of course.

> 
>> Before:
>> -
>>   seq-read  : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
>>   seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
>>   rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
>>   rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
>> clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
>> clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
>> clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
>> clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
>>   cpu  : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
>>   cpu  : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
>>   cpu  : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
>>   cpu  : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
>>   vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
>> in_queue=34206098, util=99.68%
>>  43: interrupt in total: 887320
>> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
>> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
>> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
>> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
>> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
>> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>>
>> After:
>> -
>>   seq-read  : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
>>   seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
>>   rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
>>   rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
>> clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
>> clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
>> clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
>> clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
>>   cpu  : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54
>>   cpu  : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
>>   cpu  : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
>>   cpu  : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
>>   vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
>> in_queue=30078377, util=99.59%
>>  43: interrupt in total: 1259639
>> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
>> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
>> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
>> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
>> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
>> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>>
>>>  drivers/block/virtio_blk.c | 19 +++
>>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index 53b81d5..0bdde8f 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -274,15 +274,18 @@ static void virtblk_done(struct virtqueue *vq)
>>> unsigned int len;
>>>  
>>> spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
>>> -   while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>>> -   if (vbr->bio) {
>>> -   virtblk_bio_done(vbr);
>>> -   bio_done = true;
>>> -   } else {
>>> -   virtblk_request_done(vbr);
>>> -   req_done = true;
>>> +   do {
>>> +   virtqueue_disable_cb(vq);
>>> +   while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>>> +   if (vbr->bio) {
>>> +   virtblk_bio_done(vbr);
>>> +   bio_done = true;
>>> +   } else {
>>> +   virtblk_request_done(vbr);
>>> +   req_done = true;
>>> +   }
>>> }
>>> -   }
>>> +   } while (!virtqueue_enable_cb(vq));
>>> /* In case queue is stopped waiting for more buffers. */
>>> if (req_done)
>>> blk_start_queue(vblk->disk->queue);
> 
> Fascinating.  Please just confirm that VIRTIO_RING_F_EVENT_IDX is
> enabled?

RE: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer

2012-09-26 Thread Rusty Russell
Sjur BRENDELAND  writes:
>> This allocates one redundant sg entry when nrbuf > 0,
>> but I think it is OK. (just a comment)
>
> I did this on purpose for the sake of simplicity, but I can
> change this to something like:
>alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);

That's why we use [0] in the definition (a GCC extension).
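A user-space toy illustrating the point (the struct layout and names are
stand-ins for the driver's, purely for illustration): with a [0] trailing
array, sizeof(*buf) does not count any sg entry, so sizeof(*buf) plus
nrbufs entries allocates exactly the right amount with no redundant slot
and no max() dance.

```c
#include <stdlib.h>
#include <stddef.h>

/* Stand-in for the kernel's struct scatterlist. */
struct scatterlist { void *page_link; unsigned int length; };

struct port_buffer {
    char *buf;
    int sgpages;
    struct scatterlist sg[0];   /* GCC extension; C99 spells it sg[] */
};

/* sizeof(*buf) excludes sg[], so this allocates exactly nrbufs entries. */
struct port_buffer *alloc_buf_sketch(int nrbufs)
{
    size_t alloc_size = sizeof(struct port_buffer)
                      + sizeof(struct scatterlist) * nrbufs;
    return calloc(1, alloc_size);   /* zeroed, like kzalloc() */
}
```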

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Proposal for virtio standardization.

2012-09-26 Thread Rusty Russell
Hi all,

I've had several requests for a more formal approach to the
virtio draft spec, and (after some soul-searching) I'd like to try that.

The proposal is to use OASIS as the standards body, as it's
fairly light-weight as these things go.  For me this means paperwork and
setting up a Working Group and getting the right people involved as
Voting members starting with the current contributors; for most of you
it just means a new mailing list, though I'll be cross-posting any
drafts and major changes here anyway.

I believe that a documented standard (aka virtio 1.0) will
increase visibility and adoption in areas outside our normal linux/kvm
universe.  There's been some of that already, but this is the clearest
path to accelerate it.  Not the easiest path, but I believe that a solid
I/O standard is a Good Thing for everyone.

Yet I also want to decouple new and experimental development
from the standards effort; running code comes first.  New feature bits
and new device numbers should be reservable without requiring a full
spec change.

So the essence of my proposal is:
1) I start a Working Group within OASIS where we can aim for virtio spec
   1.0.

2) The current spec is textually reordered so the core is clearly
   bus-independent, with PCI, MMIO, etc. as appendices.

3) Various clarifications, formalizations and cleanups to the spec text,
   and possibly elimination of old deprecated features.

4) The only significant change to the spec is that we use PCI
   capabilities, so we can have infinite feature bits.
   (see
http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)

5) Changes to the ring layout and other such things are deferred to a
   future virtio version; whether this is done within OASIS or
   externally depends on how well this works for the 1.0 release.

Thoughts?
Rusty.


Re: [PATCH] virtio-blk: Disable callback in virtblk_done()

2012-09-26 Thread Rusty Russell
Asias He  writes:

> On 09/25/2012 10:36 AM, Asias He wrote:
>> This reduces unnecessary interrupts that the host could send to the
>> guest while the guest is in the middle of irq handling.
>> 
>> If one vcpu is handling the irq when another interrupt arrives,
>> handle_edge_irq() in the guest will mask the interrupt via
>> mask_msi_irq(), which is a very heavy operation that goes all the
>> way down to the host.
>> 
>> Signed-off-by: Asias He 
>> ---
>
> Here are some performance numbers on qemu:

I assume this is with qemu using kvm, not qemu in soft emulation? :)

> Before:
> -
>   seq-read  : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
>   seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
>   rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
>   rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
> clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
> clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
> clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
> clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
>   cpu  : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
>   cpu  : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
>   cpu  : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
>   cpu  : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
>   vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
> in_queue=34206098, util=99.68%
>  43: interrupt in total: 887320
> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>
> After:
> -
>   seq-read  : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
>   seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
>   rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
>   rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
> clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
> clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
> clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
> clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
>   cpu  : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54
>   cpu  : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
>   cpu  : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
>   cpu  : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
>   vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
> in_queue=30078377, util=99.59%
>  43: interrupt in total: 1259639
> fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
> --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
> --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
> --filename=/dev/vdb --name=seq-read --stonewall --rw=read
> --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
> --rw=randread --name=rnd-write --stonewall --rw=randwrite
>
>>  drivers/block/virtio_blk.c | 19 +++
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 53b81d5..0bdde8f 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -274,15 +274,18 @@ static void virtblk_done(struct virtqueue *vq)
>>  unsigned int len;
>>  
>>  spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
>> -while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>> -if (vbr->bio) {
>> -virtblk_bio_done(vbr);
>> -bio_done = true;
>> -} else {
>> -virtblk_request_done(vbr);
>> -req_done = true;
>> +do {
>> +virtqueue_disable_cb(vq);
>> +while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
>> +if (vbr->bio) {
>> +virtblk_bio_done(vbr);
>> +bio_done = true;
>> +} else {
>> +virtblk_request_done(vbr);
>> +req_done = true;
>> +}
>>  }
>> -}
>> +} while (!virtqueue_enable_cb(vq));
>>  /* In case queue is stopped waiting for more buffers. */
>>  if (req_done)
>>  blk_start_queue(vblk->disk->queue);

Fascinating.  Please just confirm that VIRTIO_RING_F_EVENT_IDX is
enabled?

I forgot about the cool hack which MST put in to defer event updates
using disable_cb/enable_cb.
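As a toy model of that drain-and-recheck shape (a mock virtqueue in user
space, not the real kernel API; the struct and the race flag here are
invented for illustration): callbacks are disabled while the ring is
drained, and virtqueue_enable_cb() returning false signals that more
buffers slipped in meanwhile, so we drain again instead of taking
another interrupt.

```c
#include <stdbool.h>

/* Mock virtqueue: 'pending' counts completions the "host" has posted. */
struct vq {
    int pending;
    bool cb_enabled;
    bool host_races_once;   /* host posts one extra completion mid-drain */
};

static void virtqueue_disable_cb(struct vq *vq)
{
    vq->cb_enabled = false;
}

/* Returns false if buffers arrived while callbacks were off, telling
 * the caller to drain again; this closes the race without an irq. */
static bool virtqueue_enable_cb(struct vq *vq)
{
    vq->cb_enabled = true;
    return vq->pending == 0;
}

static bool virtqueue_get_buf(struct vq *vq)
{
    if (vq->pending == 0)
        return false;
    vq->pending--;
    if (vq->host_races_once) {  /* completion races in, irq suppressed */
        vq->host_races_once = false;
        vq->pending++;
    }
    return true;
}

/* Same loop shape as virtblk_done(): the racing buffer is picked up by
 * the re-drain, not by a fresh interrupt. */
int drain(struct vq *vq)
{
    int done = 0;
    do {
        virtqueue_disable_cb(vq);
        while (virtqueue_get_buf(vq))
            done++;
    } while (!virtqueue_enable_cb(vq));
    return done;
}
```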

Applied!
Rusty.

Re: [PATCHv5 2/3] virtio_console: Add support for remoteproc serial

2012-09-26 Thread Rusty Russell
sjur.brandel...@stericsson.com writes:
> From: Sjur Brændeland 
>
> Add a simple serial connection driver called
> VIRTIO_ID_RPROC_SERIAL (11) for communicating with a
> remote processor in an asymmetric multi-processing
> configuration.
>
...
>  static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
>int nrbufs)
>  {
>   struct port_buffer *buf;
>   size_t alloc_size;
>  
> + if (is_rproc_serial(vq->vdev) && !irqs_disabled())
> + reclaim_dma_bufs();

Hmm, you need a gfp_t arg to alloc_buf; your last patch simply changed
them all to GFP_ATOMIC, which makes the console far less
memory-friendly.

You check !irqs_disabled() in a couple of places; I think the caller
needs to indicate (possibly by checking for gfp == GFP_KERNEL) whether
it's safe to call reclaim_dma_bufs().
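A toy sketch of that suggestion (user-space stand-ins for the kernel
bits: the gfp_t enum values, the counter, and both function names here
are invented for illustration): the caller passes GFP_KERNEL only from
sleepable context, so that one flag also gates the potentially sleeping
reclaim, instead of the allocator probing irqs_disabled() itself.

```c
#include <stdlib.h>

/* Toy gfp flags; the names follow the kernel's, the values do not. */
typedef enum { GFP_ATOMIC, GFP_KERNEL } gfp_t;

static int reclaim_calls;

/* Stand-in for the driver's reclaim path, which may sleep. */
static void reclaim_dma_bufs(void)
{
    reclaim_calls++;
}

/* The caller's gfp flag, not irqs_disabled(), decides whether the
 * sleeping reclaim path is safe to take. */
void *alloc_buf_gfp_sketch(size_t size, gfp_t gfp)
{
    if (gfp == GFP_KERNEL)
        reclaim_dma_bufs();
    return malloc(size);
}
```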

> @@ -838,6 +927,10 @@ static ssize_t port_fops_splice_write(struct 
> pipe_inode_info *pipe,
>   .u.data = &sgl,
>   };
>  
> + /* rproc_serial does not support splice */
> + if (is_rproc_serial(port->out_vq->vdev))
> + return -EINVAL;

Why not? ;)

Thanks,
Rusty.

Re: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer

2012-09-26 Thread Masami Hiramatsu
(2012/09/26 16:48), Sjur BRENDELAND wrote:
>>> -static struct port_buffer *alloc_buf(size_t buf_size)
>>> +static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
>>> +int nrbufs)
>>>  {
>>> struct port_buffer *buf;
>>> +   size_t alloc_size;
>>>
>>> -   buf = kmalloc(sizeof(*buf), GFP_KERNEL);
>>> +   /* Allocate buffer and the scatter list */
>>> +   alloc_size = sizeof(*buf) + sizeof(struct scatterlist) * nrbufs;
>>
>> This allocates one redundant sg entry when nrbuf > 0,
>> but I think it is OK. (just a comment)
> 
> I did this on purpose for the sake of simplicity, but I can
> change this to something like:
>alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);

You wouldn't need to change that. I think the current code is simple
and reasonable enough. :)

Thanks!

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




RE: [PATCH 1/3] virtio_console:Merge struct buffer_token into struct port_buffer

2012-09-26 Thread Sjur BRENDELAND
> > This merge reduces code size by unifying the approach for
> > sending scatter-lists and regular buffers. Any type of
> > write operation (splice, write, put_chars) will now allocate
> > a port_buffer, and send_buf() and free_buf() can always be used.
> 
> Thanks!
> This looks much nicer and simpler. I just have some comments below.

OK, good to hear that you agree to this kind of change. I'll do a respin
of this patch fixing the issues you have pointed out.

> >  static void free_buf(struct port_buffer *buf)
> >  {
> > +   int i;
> > +
> > kfree(buf->buf);
> 
> this should be done only when !buf->sgpages, or (see below)

Agree, I'll put this statement in an else branch to the if-statement below.
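As a toy sketch of the resulting structure (user-space stand-ins for
kfree()/put_page(); the return value is invented here just to make the
branch observable): free buf->buf only in the else branch, since a
buffer carries either pages or a kmalloc'd area, never both.

```c
#include <stdlib.h>

/* Toy port_buffer: buf is only valid when sgpages == 0. */
struct port_buffer {
    char *buf;
    int sgpages;
};

/* Corrected free_buf() shape: put page references when the buffer
 * carries pages, free buf->buf otherwise.  Returns the number of pages
 * "put" so the logic can be checked. */
int free_buf_sketch(struct port_buffer *buf)
{
    int put = 0;

    if (buf->sgpages) {
        for (int i = 0; i < buf->sgpages; i++)
            put++;              /* put_page(sg_page(&buf->sg[i])) */
    } else {
        free(buf->buf);         /* kfree(buf->buf) */
    }
    free(buf);                  /* kfree(buf) */
    return put;
}
```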

> 
> > +
> > +   if (buf->sgpages)
> > +   for (i = 0; i < buf->sgpages; i++) {
> > +   struct page *page = sg_page(&buf->sg[i]);
> > +   if (!page)
> > +   break;
> > +   put_page(page);
> > +   }
> > +
> > kfree(buf);
> >  }
> >
> > -static struct port_buffer *alloc_buf(size_t buf_size)
> > +static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
> > +int nrbufs)
> >  {
> > struct port_buffer *buf;
> > +   size_t alloc_size;
> >
> > -   buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> > +   /* Allocate buffer and the scatter list */
> > +   alloc_size = sizeof(*buf) + sizeof(struct scatterlist) * nrbufs;
> 
> This allocates one redundant sg entry when nrbuf > 0,
> but I think it is OK. (just a comment)

I did this on purpose for the sake of simplicity, but I can
change this to something like:
 alloc_size = sizeof(*buf) + sizeof(buf->sg) * max(nrbufs - 1, 1);


> > +   buf = kmalloc(alloc_size, GFP_ATOMIC);
> 
> This should be kzalloc(), or buf->buf and others are not initialized,
> which will cause unexpected kfree bug at kfree(buf->buf) in free_buf.

Agree, kzalloc() is better in this case. 

> > if (!buf)
> > goto fail;
> > -   buf->buf = kzalloc(buf_size, GFP_KERNEL);
> > +
> > +   buf->sgpages = nrbufs;
> > +   if (nrbufs > 0)
> > +   return buf;
> > +
> > +   buf->buf = kmalloc(buf_size, GFP_ATOMIC);
> 
> You can also use kzalloc here as previous code does.
> But if the reason why using kzalloc comes from the security,
> I think kmalloc is enough here, since the host can access
> all the guest pages anyway.

With this new patch, alloc_buf() is used for both RX and TX.
The out_vq previously used kmalloc(), but I have preserved the
legacy behavior for the in_vq by calling memset() in fill_queue().

Thanks,
Sjur