Re: [kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-20 Thread Avi Kivity
Marcelo Tosatti wrote:
> On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
>   
>>> The iperf numbers are pretty good. Performance of UP guests increases
>>> slightly, but the SMP improvement is quite significant.
>>>   
>> I expect you're seeing contention induced by memcpy()s and inefficient 
>> emulation.  With the dma api, I expect the benefit will drop.
>> 
>
> You still have to memcpy() with the dma api. Even with vringfd the
> kernel->user copy has to be performed under global mutex protection;
> the difference is that several packets can be copied per syscall
> instead of only one.
>
>   

Block does the copy outside the mutex protection, so net can be adapted 
to do the same.  It does mean we will need to block all I/O temporarily 
during memory hotplug.
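
The pattern is roughly the following (a minimal sketch, not code from the patchset; `qemu_global_mutex` stands in for qemu's global lock, and `net_receive` and the queue layout are hypothetical): do the expensive memcpy() into a private buffer without the lock, then take the lock only briefly to publish the result.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

pthread_mutex_t qemu_global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical guest-visible receive queue. */
struct net_queue {
    uint8_t buf[2048];
    size_t len;
};

struct net_queue rx_queue;

void net_receive(const uint8_t *pkt, size_t len)
{
    uint8_t local[2048];

    if (len > sizeof(local))
        len = sizeof(local);

    /* Expensive copy done without holding the global mutex. */
    memcpy(local, pkt, len);

    /* Short critical section: only publish into guest-visible state. */
    pthread_mutex_lock(&qemu_global_mutex);
    memcpy(rx_queue.buf, local, len);
    rx_queue.len = len;
    pthread_mutex_unlock(&qemu_global_mutex);
}
```

The critical section shrinks from the whole copy to just the pointer/length update, which is what makes the temporary block-all-I/O window during memory hotplug tolerable.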

>> For pure cpu emulation, there is a ton of work to be done: protecting
>> the translator as well as making the translated code smp safe.
>> 
>
> I now believe there is a lot of work (which was not clear before).
> I'm not particularly interested in getting real emulation to be
> multithreaded.
>
> Anyway, the lack of multithreading in qemu emulation should not be a
> blocker for these patches to get in, since these are infrastructural
> changes.
>
>   

Getting this into qemu upstream is essential as this is far more 
intrusive than anything else we've done.  But again, I believe there are 
many other fruit hanging from lower branches.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-20 Thread Marcelo Tosatti
On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
> >The iperf numbers are pretty good. Performance of UP guests increases
> >slightly, but the SMP improvement is quite significant.
>
> I expect you're seeing contention induced by memcpy()s and inefficient 
> emulation.  With the dma api, I expect the benefit will drop.

You still have to memcpy() with the dma api. Even with vringfd the
kernel->user copy has to be performed under global mutex protection;
the difference is that several packets can be copied per syscall
instead of only one.

> >Note that workloads with multiple busy devices (such as databases and
> >web servers) should be the real winners.
> >
> >What is the feeling on this? It's not _that_ intrusive and can be
> >easily NOP'ed out for QEMU.
> >
> >  
> 
> I think many parts are missing (or maybe, I missed them).  You need to 
> lock the qemu internals (there are many read-mostly qemu caches 
> scattered around the code), lock against hotplug, etc.  

Yes, there are some parts missing, such as the bh list and hotplug as
you mention.

> For pure cpu emulation, there is a ton of work to be done: protecting
> the translator as well as making the translated code smp safe.

I now believe there is a lot of work (which was not clear before).
I'm not particularly interested in getting real emulation to be
multithreaded.

Anyway, the lack of multithreading in qemu emulation should not be a
blocker for these patches to get in, since these are infrastructural
changes.

> I think that QemuDevice makes sense, and that we want this long term, 
> but that we first need to improve efficiency (which reduces cpu 
> utilization _and_ improves scalability) rather than look at scalability 
> alone (which is much harder in addition to the drawback of not reducing 
> cpu utilization).

I will complete the QEMUDevice + splitlock patchset, keep it up to date,
and test it under a wider variety of workloads.

Thanks.




Re: [kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-20 Thread Avi Kivity
Marcelo Tosatti wrote:
> Introduce QEMUDevice, making the ioport/iomem->device relationship visible. 
>
> At the moment it only contains a lock, but could be extended.
>
> With it the following is possible:
> - vcpus to read/write via ioports/iomem while the iothread is working
>   on some unrelated device, or is just copying data from the kernel.
> - vcpus to read/write via ioports/iomem to different devices
>   simultaneously.
>
> This patchset is only a proof of concept, so only serial + raw images
> are supported.
>
> Tried two benchmarks, iperf and tiobench. With tiobench the reported
> latency is significantly lower (20%+), but throughput with IDE is only
> slightly higher.
>
> Expect to see larger improvements with a higher-performing I/O scheme
> (SCSI is still buggy; looking at it).
>
> The iperf numbers are pretty good. Performance of UP guests increases
> slightly, but the SMP improvement is quite significant.
>
>   


I expect you're seeing contention induced by memcpy()s and inefficient 
emulation.  With the dma api, I expect the benefit will drop.


> Note that workloads with multiple busy devices (such as databases and
> web servers) should be the real winners.
>
> What is the feeling on this? It's not _that_ intrusive and can be
> easily NOP'ed out for QEMU.
>
>   

I think many parts are missing (or maybe, I missed them).  You need to 
lock the qemu internals (there are many read-mostly qemu caches 
scattered around the code), lock against hotplug, etc.  For pure cpu 
emulation, there is a ton of work to be done: protecting the translator 
as well as making the translated code smp safe.

I think that QemuDevice makes sense, and that we want this long term, 
but that we first need to improve efficiency (which reduces cpu 
utilization _and_ improves scalability) rather than look at scalability 
alone (which is much harder in addition to the drawback of not reducing 
cpu utilization).


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




[kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-17 Thread Marcelo Tosatti
Introduce QEMUDevice, making the ioport/iomem->device relationship visible. 

At the moment it only contains a lock, but could be extended.

With it the following is possible:
- vcpus to read/write via ioports/iomem while the iothread is working on
  some unrelated device, or is just copying data from the kernel.
- vcpus to read/write via ioports/iomem to different devices
  simultaneously.
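
As a rough illustration of the idea (a hypothetical sketch, not code from the patch; the field names and dispatch helpers are made up), each device carries its own mutex, and the ioport dispatch path locks only the device being accessed instead of the global mutex:

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Sketch of QEMUDevice: per the cover letter, the only state so far
 * is a lock, but the struct could be extended. */
typedef struct QEMUDevice {
    pthread_mutex_t lock;   /* serializes access to this device only */
    const char *name;
    uint32_t regs[8];       /* stand-in for device state */
} QEMUDevice;

void qemu_device_init(QEMUDevice *dev, const char *name)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->name = name;
    memset(dev->regs, 0, sizeof(dev->regs));
}

/* ioport write dispatch: take the owning device's lock, so vcpus
 * touching different devices do not contend with each other. */
void ioport_write(QEMUDevice *dev, unsigned reg, uint32_t val)
{
    pthread_mutex_lock(&dev->lock);
    dev->regs[reg & 7] = val;
    pthread_mutex_unlock(&dev->lock);
}

uint32_t ioport_read(QEMUDevice *dev, unsigned reg)
{
    uint32_t val;

    pthread_mutex_lock(&dev->lock);
    val = dev->regs[reg & 7];
    pthread_mutex_unlock(&dev->lock);
    return val;
}
```

With this shape, a vcpu writing to the serial port and the iothread completing a disk request take different locks and proceed in parallel.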

This patchset is only a proof of concept, so only serial + raw images
are supported.

Tried two benchmarks, iperf and tiobench. With tiobench the reported
latency is significantly lower (20%+), but throughput with IDE is only
slightly higher.

Expect to see larger improvements with a higher-performing I/O scheme
(SCSI is still buggy; looking at it).

The iperf numbers are pretty good. Performance of UP guests increases
slightly, but the SMP improvement is quite significant.

Note that workloads with multiple busy devices (such as databases and
web servers) should be the real winners.

What is the feeling on this? It's not _that_ intrusive and can be
easily NOP'ed out for QEMU.

iperf -c 4 -i 60

- e1000

UP guest:
global lock
[SUM]  0.0-10.0 sec  156 MBytes  131 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  126 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  126 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  127 Mbits/sec
per-device lock
[SUM]  0.0-10.0 sec  164 MBytes  137 Mbits/sec
[SUM]  0.0-10.0 sec  161 MBytes  135 Mbits/sec
[SUM]  0.0-10.0 sec  158 MBytes  133 Mbits/sec
[SUM]  0.0-10.0 sec  171 MBytes  143 Mbits/sec

SMP guest (4-way)
global lock
[SUM]  0.0-13.0 sec  402 MBytes  259 Mbits/sec
[SUM]  0.0-10.1 sec  469 MBytes  391 Mbits/sec
[SUM]  0.0-10.1 sec  477 MBytes  397 Mbits/sec
[SUM]  0.0-10.0 sec  469 MBytes  393 Mbits/sec
per-device lock
[SUM]  0.0-13.0 sec  471 MBytes  304 Mbits/sec
[SUM]  0.0-10.2 sec  532 MBytes  439 Mbits/sec
[SUM]  0.0-10.1 sec  510 MBytes  423 Mbits/sec
[SUM]  0.0-10.1 sec  529 MBytes  441 Mbits/sec

- virtio-net
UP guest:
global lock
[SUM]  0.0-13.0 sec  192 MBytes  124 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec
per-device lock
[SUM]  0.0-13.0 sec  193 MBytes  125 Mbits/sec
[SUM]  0.0-10.0 sec  210 MBytes  176 Mbits/sec
[SUM]  0.0-10.0 sec  218 MBytes  183 Mbits/sec
[SUM]  0.0-10.0 sec  216 MBytes  181 Mbits/sec

SMP guest:
global lock
[SUM]  0.0-13.0 sec  446 MBytes  288 Mbits/sec
[SUM]  0.0-10.0 sec  521 MBytes  437 Mbits/sec
[SUM]  0.0-10.0 sec  525 MBytes  440 Mbits/sec
[SUM]  0.0-10.0 sec  533 MBytes  446 Mbits/sec
per-device lock
[SUM]  0.0-13.0 sec  512 MBytes  331 Mbits/sec
[SUM]  0.0-10.0 sec  617 MBytes  517 Mbits/sec
[SUM]  0.0-10.1 sec  631 MBytes  527 Mbits/sec
[SUM]  0.0-10.0 sec  626 MBytes  524 Mbits/sec


