Re: [kvm-devel] [patch 00/13] RFC: split the global mutex
Marcelo Tosatti wrote:
> On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
>>> The iperf numbers are pretty good. Performance of UP guests increases
>>> slightly, but the SMP improvement is quite significant.
>>
>> I expect you're seeing contention induced by memcpy()s and inefficient
>> emulation. With the dma api, I expect the benefit will drop.
>
> You still have to memcpy() with the dma api. Even with vringfd the
> kernel->user copy has to be performed under the global mutex protection,
> the difference being that several packets can be copied per syscall
> instead of only one.

Block does the copy outside the mutex protection, so net can be adapted
to do the same. It does mean we will need to block all I/O temporarily
during memory hotplug.

>> For pure cpu emulation, there is a ton of work to be done: protecting
>> the translator as well as making the translated code smp safe.
>
> I now believe there is a lot of work (which was not clear before). I am
> not particularly interested in making real emulation multithreaded.
>
> Anyway, the lack of multithreading in qemu emulation should not be a
> blocker for these patches to get in, since these are infrastructural
> changes.

Getting this into qemu upstream is essential, as this is far more
intrusive than anything else we've done. But again, I believe there is
plenty of other fruit hanging from lower branches.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.

___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 00/13] RFC: split the global mutex
On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
> > The iperf numbers are pretty good. Performance of UP guests increases
> > slightly, but the SMP improvement is quite significant.
>
> I expect you're seeing contention induced by memcpy()s and inefficient
> emulation. With the dma api, I expect the benefit will drop.

You still have to memcpy() with the dma api. Even with vringfd the
kernel->user copy has to be performed under the global mutex protection,
the difference being that several packets can be copied per syscall
instead of only one.

> > Note that workloads with multiple busy devices (such as databases,
> > web servers) should be the real winners.
> >
> > What is the feeling on this? It's not _that_ intrusive and can
> > easily be NOP'ed out for QEMU.
>
> I think many parts are missing (or maybe I missed them). You need to
> lock the qemu internals (there are many read-mostly qemu caches
> scattered around the code), lock against hotplug, etc.

Yes, there are some parts missing, such as the bh list and hotplug, as
you mention.

> For pure cpu emulation, there is a ton of work to be done: protecting
> the translator as well as making the translated code smp safe.

I now believe there is a lot of work (which was not clear before). I am
not particularly interested in making real emulation multithreaded.

Anyway, the lack of multithreading in qemu emulation should not be a
blocker for these patches to get in, since these are infrastructural
changes.

> I think that QemuDevice makes sense, and that we want this long term,
> but that we first need to improve efficiency (which reduces cpu
> utilization _and_ improves scalability) rather than look at
> scalability alone (which is much harder, in addition to the drawback
> of not reducing cpu utilization).

I will complete the QEMUDevice+splitlock patchset, keep it up to date,
and test it under a wider variety of workloads.

Thanks.
Re: [kvm-devel] [patch 00/13] RFC: split the global mutex
Marcelo Tosatti wrote:
> Introduce QEMUDevice, making the ioport/iomem->device relationship
> visible.
>
> At the moment it only contains a lock, but it could be extended.
>
> With it the following is possible:
> - vcpus can read/write via ioports/iomem while the iothread is working
>   on some unrelated device, or just copying data from the kernel.
> - vcpus can read/write via ioports/iomem to different devices
>   simultaneously.
>
> This patchset is only a proof of concept, so only serial + a raw image
> are supported.
>
> Tried two benchmarks, iperf and tiobench. With tiobench the reported
> latency is significantly lower (20%+), but throughput with IDE is only
> slightly higher.
>
> Expect to see larger improvements with a higher-performing IO scheme
> (SCSI is still buggy; looking at it).
>
> The iperf numbers are pretty good. Performance of UP guests increases
> slightly, but the SMP improvement is quite significant.

I expect you're seeing contention induced by memcpy()s and inefficient
emulation. With the dma api, I expect the benefit will drop.

> Note that workloads with multiple busy devices (such as databases, web
> servers) should be the real winners.
>
> What is the feeling on this? It's not _that_ intrusive and can easily
> be NOP'ed out for QEMU.

I think many parts are missing (or maybe I missed them). You need to
lock the qemu internals (there are many read-mostly qemu caches
scattered around the code), lock against hotplug, etc.

For pure cpu emulation, there is a ton of work to be done: protecting
the translator as well as making the translated code smp safe.

I think that QemuDevice makes sense, and that we want this long term,
but that we first need to improve efficiency (which reduces cpu
utilization _and_ improves scalability) rather than look at scalability
alone (which is much harder, in addition to the drawback of not
reducing cpu utilization).

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
[kvm-devel] [patch 00/13] RFC: split the global mutex
Introduce QEMUDevice, making the ioport/iomem->device relationship
visible. At the moment it only contains a lock, but it could be
extended.

With it the following is possible:
- vcpus can read/write via ioports/iomem while the iothread is working
  on some unrelated device, or just copying data from the kernel.
- vcpus can read/write via ioports/iomem to different devices
  simultaneously.

This patchset is only a proof of concept, so only serial + a raw image
are supported.

Tried two benchmarks, iperf and tiobench. With tiobench the reported
latency is significantly lower (20%+), but throughput with IDE is only
slightly higher.

Expect to see larger improvements with a higher-performing IO scheme
(SCSI is still buggy; looking at it).

The iperf numbers are pretty good. Performance of UP guests increases
slightly, but the SMP improvement is quite significant.

Note that workloads with multiple busy devices (such as databases, web
servers) should be the real winners.

What is the feeling on this? It's not _that_ intrusive and can easily
be NOP'ed out for QEMU.
iperf -c 4 -i 60

e1000

UP guest:

global lock
[SUM]  0.0-10.0 sec  156 MBytes  131 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  126 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  126 Mbits/sec
[SUM]  0.0-10.0 sec  151 MBytes  127 Mbits/sec

per-device lock
[SUM]  0.0-10.0 sec  164 MBytes  137 Mbits/sec
[SUM]  0.0-10.0 sec  161 MBytes  135 Mbits/sec
[SUM]  0.0-10.0 sec  158 MBytes  133 Mbits/sec
[SUM]  0.0-10.0 sec  171 MBytes  143 Mbits/sec

SMP guest (4-way):

global lock
[SUM]  0.0-13.0 sec  402 MBytes  259 Mbits/sec
[SUM]  0.0-10.1 sec  469 MBytes  391 Mbits/sec
[SUM]  0.0-10.1 sec  477 MBytes  397 Mbits/sec
[SUM]  0.0-10.0 sec  469 MBytes  393 Mbits/sec

per-device lock
[SUM]  0.0-13.0 sec  471 MBytes  304 Mbits/sec
[SUM]  0.0-10.2 sec  532 MBytes  439 Mbits/sec
[SUM]  0.0-10.1 sec  510 MBytes  423 Mbits/sec
[SUM]  0.0-10.1 sec  529 MBytes  441 Mbits/sec

virtio-net

UP guest:

global lock
[SUM]  0.0-13.0 sec  192 MBytes  124 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec
[SUM]  0.0-10.0 sec  213 MBytes  178 Mbits/sec

per-device lock
[SUM]  0.0-13.0 sec  193 MBytes  125 Mbits/sec
[SUM]  0.0-10.0 sec  210 MBytes  176 Mbits/sec
[SUM]  0.0-10.0 sec  218 MBytes  183 Mbits/sec
[SUM]  0.0-10.0 sec  216 MBytes  181 Mbits/sec

SMP guest:

global lock
[SUM]  0.0-13.0 sec  446 MBytes  288 Mbits/sec
[SUM]  0.0-10.0 sec  521 MBytes  437 Mbits/sec
[SUM]  0.0-10.0 sec  525 MBytes  440 Mbits/sec
[SUM]  0.0-10.0 sec  533 MBytes  446 Mbits/sec

per-device lock
[SUM]  0.0-13.0 sec  512 MBytes  331 Mbits/sec
[SUM]  0.0-10.0 sec  617 MBytes  517 Mbits/sec
[SUM]  0.0-10.1 sec  631 MBytes  527 Mbits/sec
[SUM]  0.0-10.0 sec  626 MBytes  524 Mbits/sec