Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1
On 09/08/2010 11:33 PM, Anthony Liguori wrote:
> On 09/08/2010 03:05 PM, Arjan Koers wrote:
>> On 2010-09-08 18:29, Marcelo Tosatti wrote:
>>> qemu-kvm-0.13.0-rc1 is now available. This release is based on the
>>> upstream qemu 0.13.0-rc1, plus kvm-specific enhancements. This release
>>> can be used with the kvm kernel modules provided by your distribution
>>> kernel, or by the modules in the kvm-kmod package, such as
>>> kvm-kmod-2.6.35. Please help with testing for a stable 0.13.0 release.
>> The build fails when configure flag --disable-cpu-emulation is used:
> That flag needs to go away.

It's perfectly reasonable to want to avoid building the tcg code if you aren't going to use it.

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.

-- 
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/4] Add a new API to virtio-pci
Rusty Russell wrote on 09/09/2010 09:19:39 AM:

> On Wed, 8 Sep 2010 04:59:05 pm Krishna Kumar wrote:
> > Add virtio_get_queue_index() to get the queue index of a
> > vq. This is needed by the cb handler to locate the queue
> > that should be processed.
>
> This seems a bit weird. I mean, the driver used vdev->config->find_vqs
> to find the queues, which returns them (in order). So, can't you put this
> into your struct send_queue?

I am saving the vqs in the send_queue, but the callback needs to locate the device txq from the svq. The only other way I could think of is to iterate through the send_queues and compare svq against sq[i]->svq, but callbacks happen quite a bit. Is there a better way?

static void skb_xmit_done(struct virtqueue *svq)
{
	struct virtnet_info *vi = svq->vdev->priv;
	int qnum = virtio_get_queue_index(svq) - 1; /* 0 is RX vq */

	/* Suppress further interrupts. */
	virtqueue_disable_cb(svq);

	/* We were probably waiting for more output buffers. */
	netif_wake_subqueue(vi->dev, qnum);
}

> Also, why define VIRTIO_MAX_TXQS? If the driver can't handle all of them,
> it should simply not use them...

The main reason was vhost :) Since vhost_net_release should not fail (__fput can't handle f_op->release() failure), I needed a maximum number of socks to clean up:

#define MAX_VQS (1 + VIRTIO_MAX_TXQS)

static int vhost_net_release(struct inode *inode, struct file *f)
{
	struct vhost_net *n = f->private_data;
	struct vhost_dev *dev = &n->dev;
	struct socket *socks[MAX_VQS];
	int i;

	vhost_net_stop(n, socks);
	vhost_net_flush(n);
	vhost_dev_cleanup(dev);
	for (i = n->dev.nvqs - 1; i >= 0; i--)
		if (socks[i])
			fput(socks[i]->file);
	...
}

Thanks,

- KK
Re: [RFC PATCH 1/4] Add a new API to virtio-pci
On Wed, 8 Sep 2010 04:59:05 pm Krishna Kumar wrote:
> Add virtio_get_queue_index() to get the queue index of a
> vq. This is needed by the cb handler to locate the queue
> that should be processed.

This seems a bit weird. I mean, the driver used vdev->config->find_vqs to find the queues, which returns them (in order). So, can't you put this into your struct send_queue?

Also, why define VIRTIO_MAX_TXQS? If the driver can't handle all of them, it should simply not use them...

Thanks!
Rusty.
vhost not working with version 12.5 with kernel 2.6.35.4
> >> When trying to use vhost I get the error "vhost-net requested but could
> >> not be initialized". The only thing I have been able to find about this
> >> problem relates to SELinux being turned off; mine is disabled and
> >> permissive. Just wondering if there were any other thoughts on this
> >> error? Am I correct that it should work with the .35.4 kernel and
> >> version 12.5 KVM?
> If you mean 0.12.5, no. If you mean 0.12.50 (i.e. a git checkout from some
> point after 0.12.0 was released), then it depends on when the checkout is
> from.

I do mean 0.12.50, checked out from qemu-kvm via git a couple of weeks ago. If I can ask, is 0.12.5 just regular qemu and 0.12.50 qemu-kvm?

> >> KVM Host OS: Fedora 12 x86_64
> >> KVM Guest OS: Tiny Core Linux, 2.6.33.3 kernel
> >> Host kernel 2.6.35.4 and qemu-system-x86_64 12.5 compiled from the
> >> qemu-kvm repo.
Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1
On 09/08/2010 03:05 PM, Arjan Koers wrote:
> On 2010-09-08 18:29, Marcelo Tosatti wrote:
>> qemu-kvm-0.13.0-rc1 is now available. This release is based on the
>> upstream qemu 0.13.0-rc1, plus kvm-specific enhancements. This release
>> can be used with the kvm kernel modules provided by your distribution
>> kernel, or by the modules in the kvm-kmod package, such as
>> kvm-kmod-2.6.35. Please help with testing for a stable 0.13.0 release.
> The build fails when configure flag --disable-cpu-emulation is used:

That flag needs to go away.

Regards,

Anthony Liguori

> ...
>   CC    x86_64-softmmu/pcspk.o
>   CC    x86_64-softmmu/i8254.o
>   CC    x86_64-softmmu/i8254-kvm.o
>   CC    x86_64-softmmu/device-assignment.o
>   LINK  x86_64-softmmu/qemu-system-x86_64
> exec.o: In function `cpu_exec_init_all':
> /home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_ctx'
> /home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_prologue_init'
> collect2: ld returned 1 exit status
> make[1]: *** [qemu-system-x86_64] Error 1
> make: *** [subdir-x86_64-softmmu] Error 2
>
> When line 585 'tcg_prologue_init(&tcg_ctx);' is removed, the compilation
> succeeds and only one non-fatal warning remains:
> /home/kvm/qemu-kvm/target-i386/fake-exec.c:26: warning: no previous prototype for ‘code_gen_max_block_size’
Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1
On 2010-09-08 18:29, Marcelo Tosatti wrote:
> qemu-kvm-0.13.0-rc1 is now available. This release is based on the
> upstream qemu 0.13.0-rc1, plus kvm-specific enhancements.
>
> This release can be used with the kvm kernel modules provided by your
> distribution kernel, or by the modules in the kvm-kmod package, such
> as kvm-kmod-2.6.35.
>
> Please help with testing for a stable 0.13.0 release.

The build fails when configure flag --disable-cpu-emulation is used:

...
  CC    x86_64-softmmu/pcspk.o
  CC    x86_64-softmmu/i8254.o
  CC    x86_64-softmmu/i8254-kvm.o
  CC    x86_64-softmmu/device-assignment.o
  LINK  x86_64-softmmu/qemu-system-x86_64
exec.o: In function `cpu_exec_init_all':
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_ctx'
/home/kvm/qemu-kvm/exec.c:585: undefined reference to `tcg_prologue_init'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

When line 585 'tcg_prologue_init(&tcg_ctx);' is removed, the compilation succeeds and only one non-fatal warning remains:

/home/kvm/qemu-kvm/target-i386/fake-exec.c:26: warning: no previous prototype for ‘code_gen_max_block_size’
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
"Michael S. Tsirkin" wrote on 09/08/2010 01:40:11 PM: > ___ > >UDP (#numtxqs=8) > > N# BW1 BW2 (%) SD1 SD2 (%) > > __ > > 4 29836 56761 (90.24) 67 63(-5.97) > > 8 27666 63767 (130.48) 326 265 (-18.71) > > 16 25452 60665 (138.35) 13961269 (-9.09) > > 32 26172 63491 (142.59) 56174202 (-25.19) > > 48 26146 64629 (147.18) 12813 9316 (-27.29) > > 64 25575 65448 (155.90) 23063 16346 (-29.12) > > 128 26454 63772 (141.06) 91054 85051 (-6.59) > > __ > > N#: Number of netperf sessions, 90 sec runs > > BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote > > SD for original code > > BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote > > SD for new code. e.g. BW2=40716 means average BW2 was > > 20358 mbps. > > > > What happens with a single netperf? > host -> guest performance with TCP and small packet speed > are also worth measuring. Guest -> Host (single netperf): I am getting a drop of almost 20%. I am trying to figure out why. Host -> guest (single netperf): I am getting an improvement of almost 15%. Again - unexpected. Guest -> Host TCP_RR: I get an average 7.4% increase in #packets for runs upto 128 sessions. With fewer netperf (under 8), there was a drop of 3-7% in #packets, but beyond that, the #packets improved significantly to give an average improvement of 7.4%. So it seems that fewer sessions is having negative effect for some reason on the tx side. The code path in virtio-net has not changed much, so the drop in some cases is quite unexpected. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
> On Wednesday 08 September 2010, Krishna Kumar2 wrote:
> > > The new guest and qemu code work with old vhost-net, just with reduced
> > > performance, yes?
> >
> > Yes, I have tested new guest/qemu with old vhost but using
> > #numtxqs=1 (or not passing any arguments at all to qemu to
> > enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost,
> > since vhost_net_set_backend in the unmodified vhost checks
> > for boundary overflow.
> >
> > I have also tested running an unmodified guest with new
> > vhost/qemu, but qemu should not specify numtxqs>1.
>
> Can you live migrate a new guest from new-qemu/new-kernel
> to old-qemu/old-kernel, new-qemu/old-kernel and old-qemu/new-kernel?
> If not, do we need to support all those cases?

I have not tried this, though I added some minimal code in virtio_net_load and virtio_net_save. I don't know what needs to be done exactly at this time. I forgot to put this in the "Next steps" list of things to do.

Thanks,

- KK
Re: [PATCH master/stable-0.12/stable-0.13] kvm: reset MSR_IA32_CR_PAT correctly
On Tue, Sep 07, 2010 at 04:21:22PM +0300, Avi Kivity wrote:
> The power-on value of MSR_IA32_CR_PAT is not 0 - that disables caching and
> makes everything dog slow.
>
> Fix to reset MSR_IA32_CR_PAT to the correct value.
>
> Signed-off-by: Avi Kivity
> ---
>  qemu-kvm-x86.c |   11 ++-
>  1 files changed, 10 insertions(+), 1 deletions(-)

Applied, thanks.
[ANNOUNCE] qemu-kvm-0.13.0-rc1
qemu-kvm-0.13.0-rc1 is now available. This release is based on the upstream qemu 0.13.0-rc1, plus kvm-specific enhancements.

This release can be used with the kvm kernel modules provided by your distribution kernel, or by the modules in the kvm-kmod package, such as kvm-kmod-2.6.35.

Please help with testing for a stable 0.13.0 release.

http://www.linux-kvm.org
Re: [PATCH 0/2] kvm/e500v2: MMU optimization
On 09/08/2010 02:40 AM, Liu Yu wrote:
> The patchset aims at mapping guest TLB1 to host TLB0. And it includes:
> [PATCH 1/2] kvm/e500v2: Remove shadow tlb
> [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
>
> The reason we need patch 1 is that patch 1 makes things simple and
> flexible. Applying only patch 1 also makes kvm work.

I've always thought the best long-term "optimization" on these cores is to share in the host PID allocation (i.e. __init_new_context()). This way, the TID in guest mappings would not overlap the TID in host mappings, and guest mappings could be demand-faulted rather than swapped wholesale. To do that, you would need to track the host PID in KVM data structures, I guess in the tlbe_ref structure.

-- 
Hollis Blanchard
Mentor Graphics, Embedded Systems Division
Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb
On 09/08/2010 02:40 AM, Liu Yu wrote:
> It is unnecessary to keep the shadow tlb.
> First, the shadow tlb keeps fixed values, which makes things inflexible.
> Second, removing the shadow tlb saves a lot of memory.
>
> This patch removes the shadow tlb and calculates the shadow tlb entry
> value before we write it to hardware.
>
> Also we use the new struct tlbe_ref to track the relation between guest
> tlb entry and page.

Did you look at the performance impact? Back in the day, we did essentially the same thing on 440. However, rather than discard the whole TLB when context switching away from the host (to be demand-faulted when the guest is resumed), we found a noticeable performance improvement by preserving a shadow TLB across context switches. We only use it in the vcpu_put/vcpu_load path.

Of course, our TLB was much smaller (64 entries), so the use model may not be the same at all (e.g. it takes longer to restore a full guest TLB working set, but maybe it's not really possible to use all 1024 TLB0 entries in one timeslice anyway).

-- 
Hollis Blanchard
Mentor Graphics, Embedded Systems Division
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On Wednesday 08 September 2010, Krishna Kumar2 wrote:
> > The new guest and qemu code work with old vhost-net, just with reduced
> > performance, yes?
>
> Yes, I have tested new guest/qemu with old vhost but using
> #numtxqs=1 (or not passing any arguments at all to qemu to
> enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost,
> since vhost_net_set_backend in the unmodified vhost checks
> for boundary overflow.
>
> I have also tested running an unmodified guest with new
> vhost/qemu, but qemu should not specify numtxqs>1.

Can you live migrate a new guest from new-qemu/new-kernel to old-qemu/old-kernel, new-qemu/old-kernel and old-qemu/new-kernel? If not, do we need to support all those cases?

	Arnd
Re: Tracing KVM with Systemtap
On Wed, Sep 8, 2010 at 2:20 PM, Rayson Ho wrote:
> Hi all,
>
> I am a developer of Systemtap. I am looking into tracing KVM (the kernel
> part and QEMU) and also the KVM guests with Systemtap. I googled and
> found references to Xenprobes and xdt+dtrace, and I was wondering if
> someone is working on the dynamic tracing interface for KVM?
>
> I've read the KVM kernel code and I think some expensive operations
> (things that need to be trapped back to the host kernel - eg. loading of
> control registers on x86/x64) can be interesting spots for adding an SDT
> (static marker), and I/O operations performed for the guests can be
> useful information to collect.
>
> I know that KVM guests run like a userspace process and thus techniques
> for tracing Xen might be overkill, and also gdb can be used to trace
> KVM guests. However, is there anything special I need to be aware of
> before I go further into the development of the Systemtap KVM probes?
>
> (Opinions / Suggestions / Criticisms welcome!)

Hi Rayson,

For the KVM kernel module, Linux trace events are already used. For example, see arch/x86/kvm/trace.h and check out /sys/kernel/debug/tracing/events/kvm/*. There is a set of useful static trace points for vm_exit/vm_enter, pio, mmio, etc.

For the KVM guest there is perf-kvm(1). This allows perf(1) to look up addresses inside the guest (kernel only?). It produces system-wide performance profiles including guests. Perhaps someone can comment on perf-kvm's full feature set and limitations?

For QEMU userspace, Prerna Saxena and I are proposing a static tracing patchset. It abstracts the trace backend (SystemTap, LTTng UST, DTrace, etc.) from the actual tracepoints so that portability can be achieved. There is a built-in trace backend that has a basic feature set but isn't as fancy as SystemTap. I have implemented LTTng Userspace Tracer support; perhaps you'd like to add SystemTap/DTrace support with sdt.h?
http://www.mail-archive.com/qemu-de...@nongnu.org/msg41323.html
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing_v3

Stefan
[ kvm-Bugs-2353510 ] Fedora 10 and F11 failures
Bugs item #2353510, was opened at 2008-11-27 13:46
Message generated for change (Comment added) made by jessorensen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
>Resolution: Works For Me
Priority: 9
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fedora 10 and F11 failures

Initial Comment:
Description: Fedora 10 fails to install on KVM. (KVM-79)

The DVD version gets stuck near the end of the setup stage, when trying to install the GRUB bootloader to the HDD. It didn't proceed within one hour, which indicates a stuck VM. Sometimes it may get stuck earlier - during init or during early setup.

Live CD (32-bit) started fine on both Intel and AMD (except a minor rendering bug in the top menu).

Guest(s): Fedora 10 64-bit
Guest(s): Fedora 10 32-bit
Host(s): Fedora 7 64-bit, Intel, KVM-79
Host(s): Fedora 7 64-bit, AMD, KVM-79

Command: (for DVD)
qemu-kvm -cdrom /isos/linux/Fedora-10-x86_64-DVD.iso -m 512 -hda /vm/f10-64.qcow2 -boot d
*and* (for LiveCD)
qemu-kvm -cdrom /isos/linux/F10-i686-Live.iso -m 512

-Alexey, 27.11.2008.

--

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-09-08 15:35

Message:
Tried here with a recent KVM / F13 host - installing F11 works just dandy, so the problem has been fixed.

Closing
Jes

--

Comment By: Technologov (technologov)
Date: 2009-06-11 16:18

Message:
Not only Fedora 10, but also Fedora 11 fails in the same way. Raising bug priority.

Guest(s): Fedora 10 64-bit DVD
Tested on KVM-86, Intel CPU.
--

Comment By: Technologov (technologov)
Date: 2008-12-02 11:39

Message:
I have opened a similar bug against the Fedora 10 bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=474116

-Alexey

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599
Tracing KVM with Systemtap
Hi all,

I am a developer of Systemtap. I am looking into tracing KVM (the kernel part and QEMU) and also the KVM guests with Systemtap. I googled and found references to Xenprobes and xdt+dtrace, and I was wondering if someone is working on the dynamic tracing interface for KVM?

I've read the KVM kernel code and I think some expensive operations (things that need to be trapped back to the host kernel - eg. loading of control registers on x86/x64) can be interesting spots for adding an SDT (static marker), and I/O operations performed for the guests can be useful information to collect.

I know that KVM guests run like a userspace process and thus techniques for tracing Xen might be overkill, and also gdb can be used to trace KVM guests. However, is there anything special I need to be aware of before I go further into the development of the Systemtap KVM probes?

(Opinions / Suggestions / Criticisms welcome!)

Thanks,
Rayson
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
"Michael S. Tsirkin" wrote on 09/08/2010 04:18:33 PM: > ___ > > > > > >TCP (#numtxqs=2) > > > > N# BW1 BW2(%) SD1 SD2(%) RSD1 RSD2 > > (%) > > > > > > > > > > ___ > > > > > > 4 26387 40716 (54.30) 20 28 (40.00)86i 85 > > (-1.16) > > > > 8 24356 41843 (71.79) 88 129 (46.59)372 362 > > (-2.68) > > > > 16 23587 40546 (71.89) 375 564 (50.40)1558 1519 > > (-2.50) > > > > 32 22927 39490 (72.24) 16172171 (34.26)6694 5722 > > (-14.52) > > > > 48 23067 39238 (70.10) 39315170 (31.51)15823 13552 > > (-14.35) > > > > 64 22927 38750 (69.01) 71429914 (38.81)28972 26173 > > (-9.66) > > > > 96 22568 38520 (70.68) 16258 27844 (71.26) 65944 73031 > > (10.74) > > > > > > That's a significant hit in TCP SD. Is it caused by the imbalance between > > > number of queues for TX and RX? Since you mention RX is complete, > > > maybe measure with a balanced TX/RX? > > > > Yes, I am not sure why it is so high. > > Any errors at higher levels? Are any packets reordered? I haven't seen any messages logged, and retransmission is similar to non-mq case. Device also has no errors/dropped packets. Anything else I should look for? On the host: # ifconfig vnet0 vnet0 Link encap:Ethernet HWaddr 9A:9D:99:E1:CA:CE inet6 addr: fe80::989d:99ff:fee1:cace/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5090371 errors:0 dropped:0 overruns:0 frame:0 TX packets:5054616 errors:0 dropped:0 overruns:65 carrier:0 collisions:0 txqueuelen:500 RX bytes:237793761392 (221.4 GiB) TX bytes:333630070 (318.1 MiB) # netstat -s |grep -i retrans 1310 segments retransmited 35 times recovered from packet loss due to fast retransmit 1 timeouts after reno fast retransmit 41 fast retransmits 1236 retransmits in slow start So retranmissions are 0.025% of total packets received from the guest. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On Wed, Sep 08, 2010 at 02:53:03PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" wrote on 09/08/2010 01:40:11 PM:
> > > ___
> > >                        TCP (#numtxqs=2)
> > > N#    BW1     BW2    (%)      SD1    SD2    (%)      RSD1   RSD2   (%)
> > > ___
> > > 4     26387   40716  (54.30)  20     28     (40.00)  86     85     (-1.16)
> > > 8     24356   41843  (71.79)  88     129    (46.59)  372    362    (-2.68)
> > > 16    23587   40546  (71.89)  375    564    (50.40)  1558   1519   (-2.50)
> > > 32    22927   39490  (72.24)  1617   2171   (34.26)  6694   5722   (-14.52)
> > > 48    23067   39238  (70.10)  3931   5170   (31.51)  15823  13552  (-14.35)
> > > 64    22927   38750  (69.01)  7142   9914   (38.81)  28972  26173  (-9.66)
> > > 96    22568   38520  (70.68)  16258  27844  (71.26)  65944  73031  (10.74)
> >
> > That's a significant hit in TCP SD. Is it caused by the imbalance between
> > number of queues for TX and RX? Since you mention RX is complete,
> > maybe measure with a balanced TX/RX?
>
> Yes, I am not sure why it is so high.

Any errors at higher levels? Are any packets reordered?

> I found the same with #RX=#TX too. As a hack, I tried ixgbe without MQ
> (set "indices=1" before calling alloc_etherdev_mq, not sure if that is
> entirely correct) - here too SD worsened by around 40%. I can't explain
> it, since the virtio-net driver runs lock free once sch_direct_xmit gets
> HARD_TX_LOCK for the specific txq. Maybe the SD calculation is not
> strictly correct since more threads are now running in parallel and load
> is higher? E.g., if you compare SD between #netperfs = 8 vs 16 for the
> original code (cut-n-paste relevant columns only):
>
> N#    BW      SD
> 8     24356   88
> 16    23587   375
>
> ... SD has increased more than 4 times for the same BW.
>
> > What happens with a single netperf?
> > host -> guest performance with TCP and small packet speed
> > are also worth measuring.
>
> OK, I will do this and send the results later today.

At some level, host/guest communication is easy in that we don't really care which queue is used. I would like to give some thought (and testing) to how this is going to work with a real NIC card and packet steering at the backend. Any idea?

> I have done a little testing with guest -> remote server both
> using a bridge and with macvtap (mq is required only for rx).
> I didn't understand what you mean by packet steering though -
> is it whether packets go out of the NIC on different queues?
> If so, I verified that is the case by putting a counter and
> displaying through the /debug interface on the host. dev_queue_xmit
> on the host handles it by calling dev_pick_tx().
>
> > > Guest interrupts for a 4 TXQ device after a 5 min test:
> > > # egrep "virtio0|CPU" /proc/interrupts
> > >        CPU0     CPU1     CPU2     CPU3
> > > 40:    0        0        0        0        PCI-MSI-edge  virtio0-config
> > > 41:    126955   126912   126505   126940   PCI-MSI-edge  virtio0-input
> > > 42:    108583   107787   107853   107716   PCI-MSI-edge  virtio0-output.0
> > > 43:    300278   297653   299378   300554   PCI-MSI-edge  virtio0-output.1
> > > 44:    372607   374884   371092   372011   PCI-MSI-edge  virtio0-output.2
> > > 45:    162042   162261   163623   162923   PCI-MSI-edge  virtio0-output.3
> >
> > Does this mean each interrupt is constantly bouncing between CPUs?
>
> Yes. I didn't do *any* tuning for the tests. The only "tuning"
> was to use 64K IO size with netperf. When I ran default netperf
> (16K), I got a little lesser improvement in BW and a worse(!) SD
> than with 64K.
>
> Thanks,
>
> - KK
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
Avi Kivity wrote on 09/08/2010 02:58:21 PM:

> >>> 1. This feature was first implemented with a single vhost.
> >>>    Testing showed 3-8% performance gain for upto 8 netperf
> >>>    sessions (and sometimes 16), but BW dropped with more
> >>>    sessions. However, implementing per-txq vhost improved
> >>>    BW significantly all the way to 128 sessions.
> >> Why were vhost kernel changes required? Can't you just instantiate more
> >> vhost queues?
> > I did try using a single thread processing packets from multiple
> > vq's on host, but the BW dropped beyond a certain number of
> > sessions.
>
> Oh - so the interface has not changed (which can be seen from the
> patch). That was my concern, I remembered that we planned for vhost-net
> to be multiqueue-ready.
>
> The new guest and qemu code work with old vhost-net, just with reduced
> performance, yes?

Yes, I have tested new guest/qemu with old vhost but using #numtxqs=1 (or not passing any arguments at all to qemu to enable MQ). Giving numtxqs > 1 fails with ENOBUFS in vhost, since vhost_net_set_backend in the unmodified vhost checks for boundary overflow.

I have also tested running an unmodified guest with new vhost/qemu, but qemu should not specify numtxqs>1.

> > Are you suggesting this combination:
> >    IRQ on guest:
> >        40: CPU0
> >        41: CPU1
> >        42: CPU2
> >        43: CPU3 (all CPUs are on socket #0)
> >    vhost:
> >        thread #0: CPU0
> >        thread #1: CPU1
> >        thread #2: CPU2
> >        thread #3: CPU3
> >    qemu:
> >        thread #0: CPU4
> >        thread #1: CPU5
> >        thread #2: CPU6
> >        thread #3: CPU7 (all CPUs are on socket #1)
>
> May be better to put vcpu threads and vhost threads on the same socket.
>
> Also need to affine host interrupts.
>
> > netperf/netserver:
> >    Run on CPUs 0-4 on both sides
> >
> > The reason I did not optimize anything from user space is because
> > I felt showing the default works reasonably well is important.
>
> Definitely. Heavy tuning is not a useful path for general end users.
> We need to make sure the scheduler is able to arrive at the optimal > layout without pinning (but perhaps with hints). OK, I will see if I can get results with this. Thanks for your suggestions, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: fixup set_efer()
On 09/04/2010 04:29 PM, Hillf Danton wrote: The second call to kvm_mmu_reset_context() seems unnecessary and is removed. @@ -783,10 +783,6 @@ static int set_efer(struct kvm_vcpu *vcp vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled; kvm_mmu_reset_context(vcpu); - /* Update reserved bits */ - if ((efer ^ old_efer) & EFER_NX) - kvm_mmu_reset_context(vcpu); - return 0; } Hm. As far as I can tell, it's the first call that is unnecessary. I'll look at the history and try to understand why it was introduced. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] kvm/e500v2: MMU optimization
The patchset aims at mapping guest TLB1 to host TLB0. It includes: [PATCH 1/2] kvm/e500v2: Remove shadow tlb [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 The reason we need patch 1 is that it makes things simple and flexible. Applying only patch 1 also leaves KVM working. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
Currently, guest TLB1 is mapped to host TLB1. As the host kernel only provides 4K non-contiguous pages, we have to break guest large mappings into 4K shadow mappings. These 4K shadow mappings are then mapped into host TLB1 on the fly. As host TLB1 only has 13 free entries, this causes serious TLB misses. Since e500v2 has a large number of TLB0 entries, it should help to map those 4K shadow mappings to host TLB0. To achieve this, we need to unlink guest TLB and host TLB, so that guest TLB1 mappings can route to any host TLB0 entry freely. Pages/mappings are treated the same way as host TLB entries. This patch removes the link between pages and guest TLB entries to do the unlink, and keeps host_tlb0_ref in each vcpu to track pages. Then it is easy to map guest TLB1 to host TLB0. In a guest ramdisk boot test (the guest mainly uses TLB1), the TLB miss count drops by 90% with this patch. Signed-off-by: Liu Yu --- arch/powerpc/include/asm/kvm_e500.h |7 +- arch/powerpc/kvm/e500.c |4 + arch/powerpc/kvm/e500_tlb.c | 280 --- arch/powerpc/kvm/e500_tlb.h |1 + 4 files changed, 104 insertions(+), 188 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_e500.h b/arch/powerpc/include/asm/kvm_e500.h index cb785f9..16c0ed0 100644 --- a/arch/powerpc/include/asm/kvm_e500.h +++ b/arch/powerpc/include/asm/kvm_e500.h @@ -37,13 +37,10 @@ struct tlbe_ref { struct kvmppc_vcpu_e500 { /* Unmodified copy of the guest's TLB. */ struct tlbe *guest_tlb[E500_TLB_NUM]; - /* TLB that's actually used when the guest is running. */ - struct tlbe *shadow_tlb[E500_TLB_NUM]; - /* Pages which are referenced in the shadow TLB. */ - struct tlbe_ref *shadow_refs[E500_TLB_NUM]; + /* Pages which are referenced in host TLB. 
*/ + struct tlbe_ref *host_tlb0_ref; unsigned int guest_tlb_size[E500_TLB_NUM]; - unsigned int shadow_tlb_size[E500_TLB_NUM]; unsigned int guest_tlb_nv[E500_TLB_NUM]; u32 host_pid[E500_PID_NUM]; diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..14af6d7 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -146,6 +146,10 @@ static int __init kvmppc_e500_init(void) if (r) return r; + r = kvmppc_e500_mmu_init(); + if (r) + return r; + /* copy extra E500 exception handlers */ ivor[0] = mfspr(SPRN_IVOR32); ivor[1] = mfspr(SPRN_IVOR33); diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 0b657af..a6c2320 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -25,9 +25,15 @@ #include "e500_tlb.h" #include "trace.h" -#define to_htlb1_esel(esel) (tlb1_entry_num - (esel) - 1) +static unsigned int host_tlb0_entry_num; +static unsigned int host_tlb0_assoc; +static unsigned int host_tlb0_assoc_bit; -static unsigned int tlb1_entry_num; +static inline unsigned int get_tlb0_entry_offset(u32 eaddr, u32 esel) +{ + return ((eaddr & 0x7F000) >> (12 - host_tlb0_assoc_bit) | + (esel & (host_tlb0_assoc - 1))); +} void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu) { @@ -62,11 +68,6 @@ static inline unsigned int tlb0_get_next_victim( return victim; } -static inline unsigned int tlb1_max_shadow_size(void) -{ - return tlb1_entry_num - tlbcam_index; -} - static inline int tlbe_is_writable(struct tlbe *tlbe) { return tlbe->mas3 & (MAS3_SW|MAS3_UW); @@ -100,7 +101,7 @@ static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) /* * writing shadow tlb entry to host TLB */ -static inline void __write_host_tlbe(struct tlbe *stlbe) +static inline void __host_tlbe_write(struct tlbe *stlbe) { mtspr(SPRN_MAS1, stlbe->mas1); mtspr(SPRN_MAS2, stlbe->mas2); @@ -109,25 +110,22 @@ static inline void __write_host_tlbe(struct tlbe *stlbe) __asm__ __volatile__ ("tlbwe\n" : : ); } -static inline void 
write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500, - int tlbsel, int esel, struct tlbe *stlbe) +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 *vcpu_e500, + u32 gvaddr, struct tlbe *stlbe) { - local_irq_disable(); - if (tlbsel == 0) { - __write_host_tlbe(stlbe); - } else { - unsigned register mas0; + unsigned register mas0; - mas0 = mfspr(SPRN_MAS0); + local_irq_disable(); - mtspr(SPRN_MAS0, MAS0_TLBSEL(1) | MAS0_ESEL(to_htlb1_esel(esel))); - __write_host_tlbe(stlbe); + mas0 = mfspr(SPRN_MAS0); + __host_tlbe_write(stlbe); - mtspr(SPRN_MAS0, mas0); - } local_irq_enable(); - trace_kvm_stlb_write(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2, + + trace_kvm_stlb_write(mas0, stlbe->mas1, stlbe->mas2, stlbe->mas3, stlbe->mas7); + +
[PATCH 1/2] kvm/e500v2: Remove shadow tlb
It is unnecessary to keep a shadow TLB. First, the shadow TLB keeps fixed values in the shadow, which makes things inflexible. Second, removing the shadow TLB saves a lot of memory. This patch removes the shadow TLB and calculates the shadow TLB entry value right before we write it to hardware. We also use a new struct tlbe_ref to track the relation between a guest TLB entry and its page. Signed-off-by: Liu Yu --- arch/powerpc/include/asm/kvm_e500.h |7 +- arch/powerpc/kvm/e500_tlb.c | 287 +-- 2 files changed, 108 insertions(+), 186 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_e500.h b/arch/powerpc/include/asm/kvm_e500.h index 7fea26f..cb785f9 100644 --- a/arch/powerpc/include/asm/kvm_e500.h +++ b/arch/powerpc/include/asm/kvm_e500.h @@ -29,13 +29,18 @@ struct tlbe{ u32 mas7; }; +struct tlbe_ref { + struct page *page; + struct tlbe *gtlbe; +}; + struct kvmppc_vcpu_e500 { /* Unmodified copy of the guest's TLB. */ struct tlbe *guest_tlb[E500_TLB_NUM]; /* TLB that's actually used when the guest is running. */ struct tlbe *shadow_tlb[E500_TLB_NUM]; /* Pages which are referenced in the shadow TLB. */ - struct page **shadow_pages[E500_TLB_NUM]; + struct tlbe_ref *shadow_refs[E500_TLB_NUM]; unsigned int guest_tlb_size[E500_TLB_NUM]; unsigned int shadow_tlb_size[E500_TLB_NUM]; diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index f11ca0f..0b657af 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved. + * Copyright (C) 2008, 2010 Freescale Semiconductor, Inc. All rights reserved. 
* * Author: Yu Liu, yu@freescale.com * @@ -48,17 +48,6 @@ void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu) tlbe->mas3, tlbe->mas7); } } - - for (tlbsel = 0; tlbsel < 2; tlbsel++) { - printk("Shadow TLB%d:\n", tlbsel); - for (i = 0; i < vcpu_e500->shadow_tlb_size[tlbsel]; i++) { - tlbe = &vcpu_e500->shadow_tlb[tlbsel][i]; - if (tlbe->mas1 & MAS1_VALID) - printk(" S[%d][%3d] | %08X | %08X | %08X | %08X |\n", - tlbsel, i, tlbe->mas1, tlbe->mas2, - tlbe->mas3, tlbe->mas7); - } - } } static inline unsigned int tlb0_get_next_victim( @@ -121,10 +110,8 @@ static inline void __write_host_tlbe(struct tlbe *stlbe) } static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500, - int tlbsel, int esel) + int tlbsel, int esel, struct tlbe *stlbe) { - struct tlbe *stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel]; - local_irq_disable(); if (tlbsel == 0) { __write_host_tlbe(stlbe); @@ -139,28 +126,12 @@ static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500, mtspr(SPRN_MAS0, mas0); } local_irq_enable(); + trace_kvm_stlb_write(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2, + stlbe->mas3, stlbe->mas7); } void kvmppc_e500_tlb_load(struct kvm_vcpu *vcpu, int cpu) { - struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu); - int i; - unsigned register mas0; - - /* Load all valid TLB1 entries to reduce guest tlb miss fault */ - local_irq_disable(); - mas0 = mfspr(SPRN_MAS0); - for (i = 0; i < tlb1_max_shadow_size(); i++) { - struct tlbe *stlbe = &vcpu_e500->shadow_tlb[1][i]; - - if (get_tlb_v(stlbe)) { - mtspr(SPRN_MAS0, MAS0_TLBSEL(1) - | MAS0_ESEL(to_htlb1_esel(i))); - __write_host_tlbe(stlbe); - } - } - mtspr(SPRN_MAS0, mas0); - local_irq_enable(); } void kvmppc_e500_tlb_put(struct kvm_vcpu *vcpu) @@ -202,16 +173,19 @@ static int kvmppc_e500_tlb_index(struct kvmppc_vcpu_e500 *vcpu_e500, } static void kvmppc_e500_shadow_release(struct kvmppc_vcpu_e500 *vcpu_e500, - int tlbsel, int esel) + int stlbsel, int sesel) { - struct tlbe *stlbe = 
&vcpu_e500->shadow_tlb[tlbsel][esel]; - struct page *page = vcpu_e500->shadow_pages[tlbsel][esel]; + struct tlbe_ref *ref; + struct page *page; + + ref = &vcpu_e500->shadow_refs[stlbsel][sesel]; + page = ref->page; if (page) { - vcpu_e500->shadow_pages[tlbsel][esel] = NULL; + ref->page = NULL; - if (get_tlb_v(stlbe)) { - if (tlbe_is_writable(stlbe)) + if (get_tlb_v(ref->gtlbe)) { + if (tlbe_is_writable(ref->gtlbe)) kvm_release_page_dirty(page);
Re: [PATCH] KVM: x86: fixup kvm_set_cr4()
On 09/04/2010 03:43 PM, Hillf Danton wrote: Subject lines such as "fixup $x" are too general. Try to make them more specific. X86_CR4_VMXE is checked earlier, since [1] virtualization is not allowed in guest, Why does that matter? Note it may change one day. [2] load_pdptrs() could be saved. The common case is that the mov does not fault and we have to call load_pdptrs() anyway. It's a little cleaner to check before doing anything, though. Signed-off-by: Hillf Danton <dhi...@gmail.com> --- --- o/linux-2.6.36-rc1/arch/x86/kvm/x86.c 2010-08-16 08:41:38.0 +0800 +++ m/linux-2.6.36-rc1/arch/x86/kvm/x86.c 2010-09-04 20:25:04.0 +0800 @@ -539,6 +539,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u if (cr4 & CR4_RESERVED_BITS) return 1; + if (cr4 & X86_CR4_VMXE) + return 1; + if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE)) return 1; @@ -550,9 +553,6 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u && !load_pdptrs(vcpu, vcpu->arch.cr3)) return 1; - if (cr4 & X86_CR4_VMXE) - return 1; - kvm_x86_ops->set_cr4(vcpu, cr4); if ((cr4 ^ old_cr4) & pdptr_bits) -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 19/27] KVM: X86: Propagate fetch faults
On Tue, Sep 07, 2010 at 02:43:16PM -0400, Marcelo Tosatti wrote: > On Mon, Sep 06, 2010 at 05:55:58PM +0200, Joerg Roedel wrote: > > r = x86_decode_insn(&vcpu->arch.emulate_ctxt); > > + if (r == X86EMUL_PROPAGATE_FAULT) > > + goto done; > > + > > x86_decode_insn returns -1 / 0 ? Yes. This looks like a left-over from v2 of the patch-set. I'll check the path again and remove it if not necessary anymore. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On 09/08/2010 12:22 PM, Krishna Kumar2 wrote: Avi Kivity wrote on 09/08/2010 01:17:34 PM: On 09/08/2010 10:28 AM, Krishna Kumar wrote: Following patches implement Transmit mq in virtio-net. Also included is the user qemu changes. 1. This feature was first implemented with a single vhost. Testing showed 3-8% performance gain for up to 8 netperf sessions (and sometimes 16), but BW dropped with more sessions. However, implementing per-txq vhost improved BW significantly all the way to 128 sessions. Why were vhost kernel changes required? Can't you just instantiate more vhost queues? I did try using a single thread processing packets from multiple vq's on host, but the BW dropped beyond a certain number of sessions. Oh - so the interface has not changed (which can be seen from the patch). That was my concern, I remembered that we planned for vhost-net to be multiqueue-ready. The new guest and qemu code work with old vhost-net, just with reduced performance, yes? I don't have the code and performance numbers for that right now since it is a bit ancient, I can try to resuscitate that if you want. No need. Guest interrupts for a 4 TXQ device after a 5 min test: # egrep "virtio0|CPU" /proc/interrupts CPU0 CPU1 CPU2 CPU3 40: 0 0 0 0 PCI-MSI-edge virtio0-config 41: 126955 126912 126505 126940 PCI-MSI-edge virtio0-input 42: 108583 107787 107853 107716 PCI-MSI-edge virtio0-output.0 43: 300278 297653 299378 300554 PCI-MSI-edge virtio0-output.1 44: 372607 374884 371092 372011 PCI-MSI-edge virtio0-output.2 45: 162042 162261 163623 162923 PCI-MSI-edge virtio0-output.3 How are vhost threads and host interrupts distributed? We need to move vhost queue threads to be colocated with the related vcpu threads (if no extra cores are available) or on the same socket (if extra cores are available). Similarly, move device interrupts to the same core as the vhost thread. All my testing was without any tuning, including binding netperf & netserver (irqbalance is also off). 
I assume (maybe wrongly) that the above might give better results? I hope so! Are you suggesting this combination: IRQ on guest: 40: CPU0 41: CPU1 42: CPU2 43: CPU3 (all CPUs are on socket #0) vhost: thread #0: CPU0 thread #1: CPU1 thread #2: CPU2 thread #3: CPU3 qemu: thread #0: CPU4 thread #1: CPU5 thread #2: CPU6 thread #3: CPU7 (all CPUs are on socket#1) May be better to put vcpu threads and vhost threads on the same socket. Also need to affine host interrupts. netperf/netserver: Run on CPUs 0-4 on both sides The reason I did not optimize anything from user space is because I felt showing the default works reasonably well is important. Definitely. Heavy tuning is not a useful path for general end users. We need to make sure the scheduler is able to arrive at the optimal layout without pinning (but perhaps with hints). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
Hi Michael, "Michael S. Tsirkin" wrote on 09/08/2010 01:43:26 PM: > On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote: > > 1. mq RX patch is also complete - plan to submit once TX is OK. > > It's good that you split patches, I think it would be interesting to see > the RX patches at least once to complete the picture. > You could make it a separate patchset, tag them as RFC. OK, I need to re-do some parts of it, since I started the TX-only branch a couple of weeks earlier and the RX side is outdated. I will try to send that out in the next couple of days; as you say, it will help to complete the picture. Reasons to send only the TX side now: - Reduce patch size and complexity - I didn't get much improvement with the multiple-RX patch (netperf from host -> guest), so I needed some time to figure out the reason and fix it. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker
On Mon, Sep 06, 2010 at 02:05:35PM -0400, Avi Kivity wrote: > On 09/06/2010 06:55 PM, Joerg Roedel wrote: > > This patch introduces a mmu-callback to translate gpa > > addresses in the walk_addr code. This is later used to > > translate l2_gpa addresses into l1_gpa addresses. > > > @@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn) > > return (gpa_t)gfn<< PAGE_SHIFT; > > } > > > > +static inline gfn_t gpa_to_gfn(gpa_t gpa) > > +{ > > + return (gfn_t)gpa>> PAGE_SHIFT; > > +} > > + > > That's a bug - gfn_t may be smaller than gpa_t, so you're truncating > just before the shift. Note the casts in the surrounding functions are > widening, not narrowing. > > However, gfn_t is u64 so the bug is only theoretical. Will fix that in v4 too. Thanks. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
"Michael S. Tsirkin" wrote on 09/08/2010 01:40:11 PM: > ___ > > TCP (#numtxqs=2) > > N# BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%) > > > ___ > > 4 26387 40716 (54.30) 20 28 (40.00) 86 85 (-1.16) > > 8 24356 41843 (71.79) 88 129 (46.59) 372 362 (-2.68) > > 16 23587 40546 (71.89) 375 564 (50.40) 1558 1519 (-2.50) > > 32 22927 39490 (72.24) 1617 2171 (34.26) 6694 5722 (-14.52) > > 48 23067 39238 (70.10) 3931 5170 (31.51) 15823 13552 (-14.35) > > 64 22927 38750 (69.01) 7142 9914 (38.81) 28972 26173 (-9.66) > > 96 22568 38520 (70.68) 16258 27844 (71.26) 65944 73031 (10.74) > > That's a significant hit in TCP SD. Is it caused by the imbalance between > number of queues for TX and RX? Since you mention RX is complete, > maybe measure with a balanced TX/RX? Yes, I am not sure why it is so high. I found the same with #RX=#TX too. As a hack, I tried ixgbe without MQ (set "indices=1" before calling alloc_etherdev_mq, not sure if that is entirely correct) - here too SD worsened by around 40%. I can't explain it, since the virtio-net driver runs lock-free once sch_direct_xmit gets HARD_TX_LOCK for the specific txq. Maybe the SD calculation is not strictly correct since more threads are now running in parallel and load is higher? E.g., if you compare SD between #netperfs = 8 vs 16 for the original code (cut-n-paste relevant columns only): ... N# BW SD 8 24356 88 16 23587 375 ... SD has increased more than 4 times for the same BW. > What happens with a single netperf? > host -> guest performance with TCP and small packet speed > are also worth measuring. OK, I will do this and send the results later today. > At some level, host/guest communication is easy in that we don't really > care which queue is used. I would like to give some thought (and > testing) to how this is going to work with a real NIC card and packet > steering at the backend. > Any idea? I have done a little testing with guest -> remote server both using a bridge and with macvtap (mq is required only for rx). 
I didn't understand what you mean by packet steering though - is it whether packets go out of the NIC on different queues? If so, I verified that is the case by putting a counter and displaying it through the /debug interface on the host. dev_queue_xmit on the host handles it by calling dev_pick_tx(). > > Guest interrupts for a 4 TXQ device after a 5 min test: > > # egrep "virtio0|CPU" /proc/interrupts > > CPU0 CPU1 CPU2 CPU3 > > 40: 0 0 0 0 PCI-MSI-edge virtio0-config > > 41: 126955 126912 126505 126940 PCI-MSI-edge virtio0-input > > 42: 108583 107787 107853 107716 PCI-MSI-edge virtio0-output.0 > > 43: 300278 297653 299378 300554 PCI-MSI-edge virtio0-output.1 > > 44: 372607 374884 371092 372011 PCI-MSI-edge virtio0-output.2 > > 45: 162042 162261 163623 162923 PCI-MSI-edge virtio0-output.3 > > Does this mean each interrupt is constantly bouncing between CPUs? Yes. I didn't do *any* tuning for the tests. The only "tuning" was to use 64K IO size with netperf. When I ran default netperf (16K), I got a slightly smaller improvement in BW and worse(!) SD than with 64K. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
Avi Kivity wrote on 09/08/2010 01:17:34 PM: > On 09/08/2010 10:28 AM, Krishna Kumar wrote: > > Following patches implement Transmit mq in virtio-net. Also > > included is the user qemu changes. > > > > 1. This feature was first implemented with a single vhost. > > Testing showed 3-8% performance gain for up to 8 netperf > > sessions (and sometimes 16), but BW dropped with more > > sessions. However, implementing per-txq vhost improved > > BW significantly all the way to 128 sessions. > > Why were vhost kernel changes required? Can't you just instantiate more > vhost queues? I did try using a single thread processing packets from multiple vq's on host, but the BW dropped beyond a certain number of sessions. I don't have the code and performance numbers for that right now since it is a bit ancient, I can try to resuscitate that if you want. > > Guest interrupts for a 4 TXQ device after a 5 min test: > > # egrep "virtio0|CPU" /proc/interrupts > > CPU0 CPU1 CPU2 CPU3 > > 40: 0 0 0 0 PCI-MSI-edge virtio0-config > > 41: 126955 126912 126505 126940 PCI-MSI-edge virtio0-input > > 42: 108583 107787 107853 107716 PCI-MSI-edge virtio0-output.0 > > 43: 300278 297653 299378 300554 PCI-MSI-edge virtio0-output.1 > > 44: 372607 374884 371092 372011 PCI-MSI-edge virtio0-output.2 > > 45: 162042 162261 163623 162923 PCI-MSI-edge virtio0-output.3 > > How are vhost threads and host interrupts distributed? We need to move > vhost queue threads to be colocated with the related vcpu threads (if no > extra cores are available) or on the same socket (if extra cores are > available). Similarly, move device interrupts to the same core as the > vhost thread. All my testing was without any tuning, including binding netperf & netserver (irqbalance is also off). 
Are you suggesting this combination: IRQ on guest: 40: CPU0 41: CPU1 42: CPU2 43: CPU3 (all CPUs are on socket #0) vhost: thread #0: CPU0 thread #1: CPU1 thread #2: CPU2 thread #3: CPU3 qemu: thread #0: CPU4 thread #1: CPU5 thread #2: CPU6 thread #3: CPU7 (all CPUs are on socket#1) netperf/netserver: Run on CPUs 0-4 on both sides The reason I did not optimize anything from user space is because I felt showing the default works reasonably well is important. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
On Wed, Sep 08, 2010 at 03:16:59AM -0400, Avi Kivity wrote: > On 09/07/2010 11:39 PM, Marcelo Tosatti wrote: > > > >> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) > >>root_gfn = pdptr>> PAGE_SHIFT; > >>if (mmu_check_root(vcpu, root_gfn)) > >>return 1; > >> - } else if (vcpu->arch.mmu.root_level == 0) > >> - root_gfn = 0; > >> - if (vcpu->arch.mmu.direct_map) { > >> - direct = 1; > >> - root_gfn = i<< 30; > >>} > >>spin_lock(&vcpu->kvm->mmu_lock); > >>kvm_mmu_free_some_pages(vcpu); > >>sp = kvm_mmu_get_page(vcpu, root_gfn, i<< 30, > >> -PT32_ROOT_LEVEL, direct, > >> +PT32_ROOT_LEVEL, 0, > >> ACC_ALL, NULL); > > Should not write protect the gfn for nonpaging mode. > > > > nonpaging mode should have direct_map set, so wouldn't enter this path > at all. Hmm, actually the nonpaging path does not set direct_map. I'll fix this too in v4. Thanks. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking
On Tue, Sep 07, 2010 at 01:48:05PM -0400, Marcelo Tosatti wrote: > On Mon, Sep 06, 2010 at 05:55:53PM +0200, Joerg Roedel wrote: > > This patch uses kvm_read_guest_page_tdp to make the > > walk_addr_generic functions suitable for two-level page > > table walking. > > > > Signed-off-by: Joerg Roedel > > --- > > arch/x86/kvm/paging_tmpl.h | 27 --- > > 1 files changed, 20 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > > index cd59af1..a5b5759 100644 > > --- a/arch/x86/kvm/paging_tmpl.h > > +++ b/arch/x86/kvm/paging_tmpl.h > > @@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker > > *walker, > > unsigned index, pt_access, uninitialized_var(pte_access); > > gpa_t pte_gpa; > > bool eperm, present, rsvd_fault; > > + int offset; > > + u32 error = 0; > > > > trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault, > > fetch_fault); > > @@ -153,12 +155,13 @@ walk: > > index = PT_INDEX(addr, walker->level); > > > > table_gfn = gpte_to_gfn(pte); > > - pte_gpa = gfn_to_gpa(table_gfn); > > - pte_gpa += index * sizeof(pt_element_t); > > + offset= index * sizeof(pt_element_t); > > + pte_gpa = gfn_to_gpa(table_gfn) + offset; > > walker->table_gfn[walker->level - 1] = table_gfn; > > walker->pte_gpa[walker->level - 1] = pte_gpa; > > > > - if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) { > > + if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset, > > + sizeof(pte), &error)) { > > present = false; > > break; > > } > > If there is failure reading the nested page tables here, you fill > vcpu->arch.fault. But the nested fault error values will be overwritten > at the end of walk_addr() by the original fault values? True. Thanks for pointing that out. I will write a test-case for that too. The results from my implemented tests show that sometimes the error code is not reported correctly too. So I decided to do a v4 of this patch-set with all found issues fixed. 
Thanks for your review. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/2] qemu-kvm: use usptream eventfd code
On 09/07/2010 08:25 PM, Marcelo Tosatti wrote: On Tue, Sep 07, 2010 at 11:21:32AM +0300, Avi Kivity wrote: On 09/06/2010 11:20 PM, Marcelo Tosatti wrote: Upstream code is equivalent. Signed-off-by: Marcelo Tosatti Index: qemu-kvm/cpus.c === --- qemu-kvm.orig/cpus.c +++ qemu-kvm/cpus.c @@ -290,11 +290,6 @@ void qemu_notify_event(void) { CPUState *env = cpu_single_env; -if (kvm_enabled()) { -qemu_kvm_notify_work(); -return; -} - qemu_event_increment (); if (env) { cpu_exit(env); qemu_event_increment() is indeed equivalent, but what about the rest? Are we guaranteed that cpu_single_env == NULL? No, its not NULL. But env->current is, so its fine. Ok, thanks. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote: > 1. mq RX patch is also complete - plan to submit once TX is OK. It's good that you split patches, I think it would be interesting to see the RX patches at least once to complete the picture. You could make it a separate patchset, tag them as RFC. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On Wed, Sep 08, 2010 at 12:58:59PM +0530, Krishna Kumar wrote: > Following patches implement Transmit mq in virtio-net. Also > included is the user qemu changes. > > 1. This feature was first implemented with a single vhost. > Testing showed 3-8% performance gain for up to 8 netperf > sessions (and sometimes 16), but BW dropped with more > sessions. However, implementing per-txq vhost improved > BW significantly all the way to 128 sessions. > 2. For this mq TX patch, 1 daemon is created for RX and 'n' > daemons for the 'n' TXQ's, for a total of (n+1) daemons. > The (subsequent) RX mq patch changes that to a total of > 'n' daemons, where RX and TX vq's share 1 daemon. > 3. Service Demand increases for TCP, but significantly > improves for UDP. > 4. Interoperability: Many combinations, but not all, of > qemu, host, guest tested together. > > > Enabling mq on virtio: > --- > > When following options are passed to qemu: > - smp > 1 > - vhost=on > - mq=on (new option, default:off) > then #txqueues = #cpus. The #txqueues can be changed by using > an optional 'numtxqs' option. e.g. for a smp=4 guest: > vhost=on,mq=on -> #txqueues = 4 > vhost=on,mq=on,numtxqs=8 -> #txqueues = 8 > vhost=on,mq=on,numtxqs=2 -> #txqueues = 2 > > > Performance (guest -> local host): > --- > > System configuration: > Host: 8 Intel Xeon, 8 GB memory > Guest: 4 cpus, 2 GB memory > All testing without any tuning, and TCP netperf with 64K I/O > ___ > TCP (#numtxqs=2) > N# BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%) > ___ > 4 26387 40716 (54.30) 20 28 (40.00) 86 85 (-1.16) > 8 24356 41843 (71.79) 88 129 (46.59) 372 362 (-2.68) > 16 23587 40546 (71.89) 375 564 (50.40) 1558 1519 (-2.50) > 32 22927 39490 (72.24) 1617 2171 (34.26) 6694 5722 > (-14.52) > 48 23067 39238 (70.10) 3931 5170 (31.51) 15823 13552 > (-14.35) > 64 22927 38750 (69.01) 7142 9914 (38.81) 28972 26173 (-9.66) > 96 22568 38520 (70.68) 16258 27844 (71.26) 65944 73031 (10.74) That's a significant hit in TCP SD. 
Is it caused by the imbalance between number of queues for TX and RX? Since you mention RX is complete, maybe measure with a balanced TX/RX? > ___ > UDP (#numtxqs=8) > N# BW1 BW2 (%) SD1 SD2 (%) > __ > 4 29836 56761 (90.24) 67 63 (-5.97) > 8 27666 63767 (130.48) 326 265 (-18.71) > 16 25452 60665 (138.35) 1396 1269 (-9.09) > 32 26172 63491 (142.59) 5617 4202 (-25.19) > 48 26146 64629 (147.18) 12813 9316 (-27.29) > 64 25575 65448 (155.90) 23063 16346 (-29.12) > 128 26454 63772 (141.06) 91054 85051 (-6.59) > __ > N#: Number of netperf sessions, 90 sec runs > BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote > SD for original code > BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote > SD for new code. e.g. BW2=40716 means average BW2 was > 20358 mbps. > What happens with a single netperf? host -> guest performance with TCP and small packet speed are also worth measuring. > Next steps: > --- > > 1. mq RX patch is also complete - plan to submit once TX is OK. > 2. Cache-align data structures: I didn't see any BW/SD improvement > after making the sq's (and similarly for vhost) cache-aligned > statically: > struct virtnet_info { > ... > struct send_queue sq[16] cacheline_aligned_in_smp; > ... > }; > At some level, host/guest communication is easy in that we don't really care which queue is used. I would like to give some thought (and testing) to how this is going to work with a real NIC card and packet steering at the backend. Any idea? > Guest interrupts for a 4 TXQ device after a 5 min test: > # egrep "virtio0|CPU" /proc/interrupts > CPU0 CPU1 CPU2 CPU3 > 40: 0 0 0 0 PCI-MSI-edge virtio0-config > 41: 126955 126912 126505 126940 PCI-MSI-edge virtio0-input > 42: 108583 107787 107853 107716 PCI-MSI-edge virtio0-output.0 > 43: 300278 297653 299378 300554 PCI-MSI-edge virtio0-ou
Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On 09/08/2010 10:28 AM, Krishna Kumar wrote:
> Following patches implement Transmit mq in virtio-net. Also
> included are the user qemu changes.
>
> 1. This feature was first implemented with a single vhost.
>    Testing showed 3-8% performance gain for up to 8 netperf
>    sessions (and sometimes 16), but BW dropped with more
>    sessions. However, implementing per-txq vhost improved
>    BW significantly all the way to 128 sessions.

Why were vhost kernel changes required? Can't you just instantiate more
vhost queues?

> 2. For this mq TX patch, 1 daemon is created for RX and 'n'
>    daemons for the 'n' TXQ's, for a total of (n+1) daemons.
>    The (subsequent) RX mq patch changes that to a total of
>    'n' daemons, where RX and TX vq's share 1 daemon.
> 3. Service Demand increases for TCP, but significantly
>    improves for UDP.
> 4. Interoperability: Many combinations, but not all, of
>    qemu, host, guest tested together.

Please update the virtio-pci spec @ http://ozlabs.org/~rusty/virtio-spec/.

> Enabling mq on virtio:
> ----------------------
>
> When the following options are passed to qemu:
>         - smp > 1
>         - vhost=on
>         - mq=on (new option, default: off)
> then #txqueues = #cpus. The #txqueues can be changed with the
> optional 'numtxqs' option, e.g.
> for a smp=4 guest:
>         vhost=on,mq=on           -> #txqueues = 4
>         vhost=on,mq=on,numtxqs=8 -> #txqueues = 8
>         vhost=on,mq=on,numtxqs=2 -> #txqueues = 2
>
> Performance (guest -> local host):
> ----------------------------------
>
> System configuration:
>         Host:  8 Intel Xeon, 8 GB memory
>         Guest: 4 cpus, 2 GB memory
> All testing without any tuning, and TCP netperf with 64K I/O
>
>                         TCP (#numtxqs=2)
> N#  BW1    BW2 (%)         SD1    SD2 (%)         RSD1   RSD2 (%)
> 4   26387  40716 (54.30)   20     28 (40.00)      86     85 (-1.16)
> 8   24356  41843 (71.79)   88     129 (46.59)     372    362 (-2.68)
> 16  23587  40546 (71.89)   375    564 (50.40)     1558   1519 (-2.50)
> 32  22927  39490 (72.24)   1617   2171 (34.26)    6694   5722 (-14.52)
> 48  23067  39238 (70.10)   3931   5170 (31.51)    15823  13552 (-14.35)
> 64  22927  38750 (69.01)   7142   9914 (38.81)    28972  26173 (-9.66)
> 96  22568  38520 (70.68)   16258  27844 (71.26)   65944  73031 (10.74)
>
>                         UDP (#numtxqs=8)
> N#   BW1    BW2 (%)          SD1    SD2 (%)
> 4    29836  56761 (90.24)    67     63 (-5.97)
> 8    27666  63767 (130.48)   326    265 (-18.71)
> 16   25452  60665 (138.35)   1396   1269 (-9.09)
> 32   26172  63491 (142.59)   5617   4202 (-25.19)
> 48   26146  64629 (147.18)   12813  9316 (-27.29)
> 64   25575  65448 (155.90)   23063  16346 (-29.12)
> 128  26454  63772 (141.06)   91054  85051 (-6.59)

Impressive results.

> N#: Number of netperf sessions, 90 sec runs
> BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
>               SD for original code
> BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
>               SD for new code. e.g. BW2=40716 means average BW2 was
>               20358 mbps.
>
> Next steps:
> -----------
>
> 1. mq RX patch is also complete - plan to submit once TX is OK.
> 2. Cache-align data structures: I didn't see any BW/SD improvement
>    after making the sq's (and similarly for vhost) cache-aligned
>    statically:
>         struct virtnet_info {
>                 ...
>                 struct send_queue sq[16] ____cacheline_aligned_in_smp;
>                 ...
>         };
>
> Guest interrupts for a 4 TXQ device after a 5 min test:
> # egrep "virtio0|CPU" /proc/interrupts
>       CPU0    CPU1    CPU2    CPU3
> 40:   0       0       0       0       PCI-MSI-edge    virtio0-config
> 41:   126955  126912  126505  126940  PCI-MSI-edge    virtio0-input
> 42:   108583  107787  107853  107716  PCI-MSI-edge    virtio0-output.0
> 43:   300278  297653  299378  300554  PCI-MSI-edge    virtio0-output.1
> 44:   372607  374884  371092  372011  PCI-MSI-edge    virtio0-output.2
> 45:   162042  162261  163623  162923  PCI-MSI-edge    virtio0-output.3

How are vhost threads and host interrupts distributed? We need to move
vhost queue threads to be colocated with the related vcpu threads (if no
extra cores are available) or on the same socket (if extra cores are
available). Similarly, move device interrupts to the same core as the
vhost thread.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
[RFC PATCH 4/4] qemu changes
Changes in qemu to support mq TX.

Signed-off-by: Krishna Kumar
---
 hw/vhost.c      |    8 ++-
 hw/vhost.h      |    2 
 hw/vhost_net.c  |   16 +--
 hw/vhost_net.h  |    2 
 hw/virtio-net.c |   97 ++
 hw/virtio-net.h |    5 ++
 hw/virtio-pci.c |    2 
 net.c           |   17 
 net.h           |    1 
 net/tap.c       |   61 +---
 10 files changed, 155 insertions(+), 56 deletions(-)

diff -ruNp org/hw/vhost.c new/hw/vhost.c
--- org/hw/vhost.c	2010-08-09 09:51:58.0 +0530
+++ new/hw/vhost.c	2010-09-08 12:54:50.0 +0530
@@ -599,23 +599,27 @@ static void vhost_virtqueue_cleanup(stru
                               0, virtio_queue_get_desc_size(vdev, idx));
 }
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd)
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, int numtxqs)
 {
     uint64_t features;
     int r;
     if (devfd >= 0) {
         hdev->control = devfd;
+        hdev->nvqs = 2;
     } else {
         hdev->control = open("/dev/vhost-net", O_RDWR);
         if (hdev->control < 0) {
             return -errno;
         }
     }
-    r = ioctl(hdev->control, VHOST_SET_OWNER, NULL);
+
+    r = ioctl(hdev->control, VHOST_SET_OWNER, numtxqs);
     if (r < 0) {
         goto fail;
     }
+    hdev->nvqs = numtxqs + 1;
+
     r = ioctl(hdev->control, VHOST_GET_FEATURES, &features);
     if (r < 0) {
         goto fail;
diff -ruNp org/hw/vhost.h new/hw/vhost.h
--- org/hw/vhost.h	2010-07-01 11:42:09.0 +0530
+++ new/hw/vhost.h	2010-09-08 12:54:50.0 +0530
@@ -40,7 +40,7 @@ struct vhost_dev {
     unsigned long long log_size;
 };
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd);
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, int nvqs);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
diff -ruNp org/hw/vhost_net.c new/hw/vhost_net.c
--- org/hw/vhost_net.c	2010-08-09 09:51:58.0 +0530
+++ new/hw/vhost_net.c	2010-09-08 12:54:50.0 +0530
@@ -36,7 +36,8 @@
 struct vhost_net {
     struct vhost_dev dev;
-    struct vhost_virtqueue vqs[2];
+    struct vhost_virtqueue *vqs;
+    int nvqs;
     int backend;
     VLANClientState *vc;
 };
@@ -76,7 +77,8 @@ static int vhost_net_get_fd(VLANClientSt
     }
 }
 
-struct vhost_net
*vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+                                 int numtxqs)
 {
     int r;
     struct vhost_net *net = qemu_malloc(sizeof *net);
@@ -93,10 +95,14 @@ struct vhost_net *vhost_net_init(VLANCli
         (1 << VHOST_NET_F_VIRTIO_NET_HDR);
     net->backend = r;
 
-    r = vhost_dev_init(&net->dev, devfd);
+    r = vhost_dev_init(&net->dev, devfd, numtxqs);
     if (r < 0) {
         goto fail;
     }
+
+    net->nvqs = numtxqs + 1;
+    net->vqs = qemu_malloc(net->nvqs * (sizeof *net->vqs));
+
     if (~net->dev.features & net->dev.backend_features) {
         fprintf(stderr, "vhost lacks feature mask %" PRIu64 " for backend\n",
                 (uint64_t)(~net->dev.features & net->dev.backend_features));
@@ -118,7 +124,6 @@ int vhost_net_start(struct vhost_net *ne
     struct vhost_vring_file file = { };
     int r;
 
-    net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
     r = vhost_dev_start(&net->dev, dev);
     if (r < 0) {
@@ -166,7 +171,8 @@ void vhost_net_cleanup(struct vhost_net
     qemu_free(net);
 }
 #else
-struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+                                 int nvqs)
 {
     return NULL;
 }
diff -ruNp org/hw/vhost_net.h new/hw/vhost_net.h
--- org/hw/vhost_net.h	2010-07-01 11:42:09.0 +0530
+++ new/hw/vhost_net.h	2010-09-08 12:54:50.0 +0530
@@ -6,7 +6,7 @@
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
-VHostNetState *vhost_net_init(VLANClientState *backend, int devfd);
+VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, int nvqs);
 
 int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
diff -ruNp org/hw/virtio-net.c new/hw/virtio-net.c
--- org/hw/virtio-net.c	2010-07-19 12:41:28.0 +0530
+++ new/hw/virtio-net.c	2010-09-08 12:54:50.0 +0530
@@ -32,17 +32,17 @@ typedef struct VirtIONet
     uint8_t mac[ETH_ALEN];
     uint16_t status;
     VirtQueue *rx_vq;
-    VirtQueue *tx_vq;
+    VirtQueue **tx_vq;
     VirtQueue *ctrl_vq;
     NICState *nic;
-    QEMUTimer *tx_timer;
-int tx_timer_active;
+    QEMUTimer **tx_timer;
+    int *tx_timer_active;
     uint32_t has_vnet_hdr;
     uint8_t has_ufo;
     struct {
         VirtQueueElement elem;
         ssize_t len;
-    } async_tx;
+    } *async_tx;
     int mergeable_rx_bufs;
[RFC PATCH 3/4] Changes for vhost
Changes for mq vhost.

vhost_net_open is changed to allocate a vhost_net and return. The
remaining initializations are delayed till SET_OWNER. SET_OWNER is
changed so that the argument is used to figure out how many txqs to
use. Unmodified qemu's will pass NULL, so this is recognized and
handled as numtxqs=1.

Besides changing handle_tx to use 'vq', this patch also changes
handle_rx to take vq as a parameter. The mq RX patch requires this
change, but till then it is consistent (and less confusing) to make
the interfaces for handling rx and tx similar.

Signed-off-by: Krishna Kumar
---
 drivers/vhost/net.c   |  272 ++--
 drivers/vhost/vhost.c |  152 ++
 drivers/vhost/vhost.h |   15 +-
 3 files changed, 289 insertions(+), 150 deletions(-)

diff -ruNp org/drivers/vhost/net.c tx_only/drivers/vhost/net.c
--- org/drivers/vhost/net.c	2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/vhost/net.c	2010-09-08 10:20:54.0 +0530
@@ -33,12 +33,6 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x8
 
-enum {
-	VHOST_NET_VQ_RX = 0,
-	VHOST_NET_VQ_TX = 1,
-	VHOST_NET_VQ_MAX = 2,
-};
-
 enum vhost_net_poll_state {
 	VHOST_NET_POLL_DISABLED = 0,
 	VHOST_NET_POLL_STARTED = 1,
@@ -47,12 +41,12 @@ enum vhost_net_poll_state {
 struct vhost_net {
 	struct vhost_dev dev;
-	struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
-	struct vhost_poll poll[VHOST_NET_VQ_MAX];
+	struct vhost_virtqueue *vqs;
+	struct vhost_poll *poll;
 	/* Tells us whether we are polling a socket for TX.
 	 * We only do this when socket buffer fills up.
 	 * Protected by tx vq lock. */
-	enum vhost_net_poll_state tx_poll_state;
+	enum vhost_net_poll_state *tx_poll_state;
 };
 
 /* Pop first len bytes from iovec. Return number of segments used.
 */
@@ -92,28 +86,28 @@ static void copy_iovec_hdr(const struct
 }
 
 /* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
+static void tx_poll_stop(struct vhost_net *net, int qnum)
 {
-	if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
+	if (likely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STARTED))
 		return;
-	vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
-	net->tx_poll_state = VHOST_NET_POLL_STOPPED;
+	vhost_poll_stop(&net->poll[qnum]);
+	net->tx_poll_state[qnum] = VHOST_NET_POLL_STOPPED;
 }
 
 /* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
+static void tx_poll_start(struct vhost_net *net, struct socket *sock, int qnum)
 {
-	if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
+	if (unlikely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STOPPED))
 		return;
-	vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
-	net->tx_poll_state = VHOST_NET_POLL_STARTED;
+	vhost_poll_start(&net->poll[qnum], sock->file);
+	net->tx_poll_state[qnum] = VHOST_NET_POLL_STARTED;
 }
 
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU.
 */
-static void handle_tx(struct vhost_net *net)
+static void handle_tx(struct vhost_virtqueue *vq)
 {
-	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+	struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
 	unsigned out, in, s;
 	int head;
 	struct msghdr msg = {
@@ -134,7 +128,7 @@ static void handle_tx(struct vhost_net *
 	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 	if (wmem >= sock->sk->sk_sndbuf) {
 		mutex_lock(&vq->mutex);
-		tx_poll_start(net, sock);
+		tx_poll_start(net, sock, vq->qnum);
 		mutex_unlock(&vq->mutex);
 		return;
 	}
@@ -144,7 +138,7 @@ static void handle_tx(struct vhost_net *
 	vhost_disable_notify(vq);
 
 	if (wmem < sock->sk->sk_sndbuf / 2)
-		tx_poll_stop(net);
+		tx_poll_stop(net, vq->qnum);
 	hdr_size = vq->vhost_hlen;
 
 	for (;;) {
@@ -159,7 +153,7 @@ static void handle_tx(struct vhost_net *
 		if (head == vq->num) {
 			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
-				tx_poll_start(net, sock);
+				tx_poll_start(net, sock, vq->qnum);
 				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
 				break;
 			}
@@ -189,7 +183,7 @@ static void handle_tx(struct vhost_net *
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
 			vhost_discard_vq_desc(vq, 1);
-			tx_poll_start(net, sock);
+
[RFC PATCH 0/4] Implement multiqueue virtio-net
Following patches implement Transmit mq in virtio-net. Also
included are the user qemu changes.

1. This feature was first implemented with a single vhost.
   Testing showed 3-8% performance gain for up to 8 netperf
   sessions (and sometimes 16), but BW dropped with more
   sessions. However, implementing per-txq vhost improved
   BW significantly all the way to 128 sessions.
2. For this mq TX patch, 1 daemon is created for RX and 'n'
   daemons for the 'n' TXQ's, for a total of (n+1) daemons.
   The (subsequent) RX mq patch changes that to a total of
   'n' daemons, where RX and TX vq's share 1 daemon.
3. Service Demand increases for TCP, but significantly
   improves for UDP.
4. Interoperability: Many combinations, but not all, of
   qemu, host, guest tested together.

Enabling mq on virtio:
----------------------

When the following options are passed to qemu:
        - smp > 1
        - vhost=on
        - mq=on (new option, default: off)
then #txqueues = #cpus. The #txqueues can be changed with the
optional 'numtxqs' option, e.g. for a smp=4 guest:
        vhost=on,mq=on           -> #txqueues = 4
        vhost=on,mq=on,numtxqs=8 -> #txqueues = 8
        vhost=on,mq=on,numtxqs=2 -> #txqueues = 2

Performance (guest -> local host):
----------------------------------

System configuration:
        Host:  8 Intel Xeon, 8 GB memory
        Guest: 4 cpus, 2 GB memory
All testing without any tuning, and TCP netperf with 64K I/O

                        TCP (#numtxqs=2)
N#  BW1    BW2 (%)         SD1    SD2 (%)         RSD1   RSD2 (%)
4   26387  40716 (54.30)   20     28 (40.00)      86     85 (-1.16)
8   24356  41843 (71.79)   88     129 (46.59)     372    362 (-2.68)
16  23587  40546 (71.89)   375    564 (50.40)     1558   1519 (-2.50)
32  22927  39490 (72.24)   1617   2171 (34.26)    6694   5722 (-14.52)
48  23067  39238 (70.10)   3931   5170 (31.51)    15823  13552 (-14.35)
64  22927  38750 (69.01)   7142   9914 (38.81)    28972  26173 (-9.66)
96  22568  38520 (70.68)   16258  27844 (71.26)   65944  73031 (10.74)

                        UDP (#numtxqs=8)
N#   BW1    BW2 (%)          SD1    SD2 (%)
4    29836  56761 (90.24)    67     63 (-5.97)
8    27666  63767 (130.48)   326    265 (-18.71)
16   25452  60665 (138.35)   1396   1269 (-9.09)
32   26172  63491 (142.59)   5617   4202 (-25.19)
48   26146  64629 (147.18)   12813  9316 (-27.29)
64   25575  65448 (155.90)   23063  16346 (-29.12)
128  26454  63772 (141.06)   91054  85051 (-6.59)

N#: Number of netperf sessions, 90 sec runs
BW1,SD1,RSD1: Bandwidth (sum across 2 runs in mbps), SD and Remote
              SD for original code
BW2,SD2,RSD2: Bandwidth (sum across 2 runs in mbps), SD and Remote
              SD for new code. e.g. BW2=40716 means average BW2 was
              20358 mbps.

Next steps:
-----------

1. mq RX patch is also complete - plan to submit once TX is OK.
2. Cache-align data structures: I didn't see any BW/SD improvement
   after making the sq's (and similarly for vhost) cache-aligned
   statically:
        struct virtnet_info {
                ...
                struct send_queue sq[16] ____cacheline_aligned_in_smp;
                ...
        };

Guest interrupts for a 4 TXQ device after a 5 min test:
# egrep "virtio0|CPU" /proc/interrupts
      CPU0    CPU1    CPU2    CPU3
40:   0       0       0       0       PCI-MSI-edge    virtio0-config
41:   126955  126912  126505  126940  PCI-MSI-edge    virtio0-input
42:   108583  107787  107853  107716  PCI-MSI-edge    virtio0-output.0
43:   300278  297653  299378  300554  PCI-MSI-edge    virtio0-output.1
44:   372607  374884  371092  372011  PCI-MSI-edge    virtio0-output.2
45:   162042  162261  163623  162923  PCI-MSI-edge    virtio0-output.3

Review/feedback appreciated.

Signed-off-by: Krishna Kumar
---
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 1/4] Add a new API to virtio-pci
Add virtio_get_queue_index() to get the queue index of a vq. This is
needed by the cb handler to locate the queue that should be processed.

Signed-off-by: Krishna Kumar
---
 drivers/virtio/virtio_pci.c |    9 +
 include/linux/virtio.h      |    3 +++
 2 files changed, 12 insertions(+)

diff -ruNp org/include/linux/virtio.h tx_only/include/linux/virtio.h
--- org/include/linux/virtio.h	2010-09-03 16:33:51.0 +0530
+++ tx_only/include/linux/virtio.h	2010-09-08 10:23:36.0 +0530
@@ -136,4 +136,7 @@ struct virtio_driver {
 int register_virtio_driver(struct virtio_driver *drv);
 void unregister_virtio_driver(struct virtio_driver *drv);
+
+/* return the internal queue index associated with the virtqueue */
+extern int virtio_get_queue_index(struct virtqueue *vq);
 #endif /* _LINUX_VIRTIO_H */
diff -ruNp org/drivers/virtio/virtio_pci.c tx_only/drivers/virtio/virtio_pci.c
--- org/drivers/virtio/virtio_pci.c	2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/virtio/virtio_pci.c	2010-09-08 10:23:16.0 +0530
@@ -359,6 +359,15 @@ static int vp_request_intx(struct virtio
 	return err;
 }
 
+/* Return the internal queue index associated with the virtqueue */
+int virtio_get_queue_index(struct virtqueue *vq)
+{
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	return info->queue_index;
+}
+EXPORT_SYMBOL(virtio_get_queue_index);
+
 static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 				  void (*callback)(struct virtqueue *vq),
 				  const char *name,
[RFC PATCH 2/4] Changes for virtio-net
Implement mq virtio-net driver.

Though struct virtio_net_config changes, it works with old qemu's
since the last element is not accessed unless qemu sets
VIRTIO_NET_F_NUMTXQS.

Signed-off-by: Krishna Kumar
---
 drivers/net/virtio_net.c   |  213 ++-
 include/linux/virtio_net.h |    6 
 2 files changed, 166 insertions(+), 53 deletions(-)

diff -ruNp org/include/linux/virtio_net.h tx_only/include/linux/virtio_net.h
--- org/include/linux/virtio_net.h	2010-09-03 16:33:51.0 +0530
+++ tx_only/include/linux/virtio_net.h	2010-09-08 10:39:22.0 +0530
@@ -7,6 +7,9 @@
 #include
 #include
 
+/* The maximum number of transmit queues supported */
+#define VIRTIO_MAX_TXQS	16
+
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
@@ -26,6 +29,7 @@
 #define VIRTIO_NET_F_CTRL_RX	18	/* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
+#define VIRTIO_NET_F_NUMTXQS	21	/* Device supports multiple TX queues */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
 
@@ -34,6 +38,8 @@ struct virtio_net_config {
 	__u8 mac[6];
 	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
 	__u16 status;
+	/* number of transmit queues */
+	__u16 numtxqs;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.
 If you don't
diff -ruNp org/drivers/net/virtio_net.c tx_only/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c	2010-09-03 16:33:51.0 +0530
+++ tx_only/drivers/net/virtio_net.c	2010-09-08 12:14:19.0 +0530
@@ -40,9 +40,20 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX	2
 
+/* Our representation of a send virtqueue */
+struct send_queue {
+	struct virtqueue *svq;
+
+	/* TX: fragments + linear part + virtio header */
+	struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
 struct virtnet_info {
 	struct virtio_device *vdev;
-	struct virtqueue *rvq, *svq, *cvq;
+	int numtxqs;	/* Number of tx queues */
+	struct send_queue *sq;
+	struct virtqueue *rvq;
+	struct virtqueue *cvq;
 	struct net_device *dev;
 	struct napi_struct napi;
 	unsigned int status;
@@ -62,9 +73,8 @@ struct virtnet_info {
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
-	/* fragments + linear part + virtio header */
+	/* RX: fragments + linear part + virtio header */
 	struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-	struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -120,12 +130,13 @@ static struct page *get_a_page(struct vi
 static void skb_xmit_done(struct virtqueue *svq)
 {
 	struct virtnet_info *vi = svq->vdev->priv;
+	int qnum = virtio_get_queue_index(svq) - 1; /* 0 is RX vq */
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(svq);
 
 	/* We were probably waiting for more output buffers.
 */
-	netif_wake_queue(vi->dev);
+	netif_wake_subqueue(vi->dev, qnum);
 }
 
 static void set_skb_frag(struct sk_buff *skb, struct page *page,
@@ -495,12 +506,13 @@ again:
 	return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static unsigned int free_old_xmit_skbs(struct virtnet_info *vi,
+				       struct virtqueue *svq)
 {
 	struct sk_buff *skb;
 	unsigned int len, tot_sgs = 0;
 
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	while ((skb = virtqueue_get_buf(svq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
@@ -510,7 +522,8 @@ static unsigned int free_old_xmit_skbs(s
 	return tot_sgs;
 }
 
-static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
+static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
+		    struct virtqueue *svq, struct scatterlist *tx_sg)
 {
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
@@ -548,12 +561,12 @@ static int xmit_skb(struct virtnet_info
 
 	/* Encode metadata header at front. */
 	if (vi->mergeable_rx_bufs)
-		sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
+		sg_set_buf(tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
 	else
-		sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+		sg_set_buf(tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
-	return virtqueue_add_b
Re: [PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function
On 09/07/2010 11:39 PM, Marcelo Tosatti wrote:
>> @@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>>              root_gfn = pdptr >> PAGE_SHIFT;
>>              if (mmu_check_root(vcpu, root_gfn))
>>                  return 1;
>> -        } else if (vcpu->arch.mmu.root_level == 0)
>> -            root_gfn = 0;
>> -        if (vcpu->arch.mmu.direct_map) {
>> -            direct = 1;
>> -            root_gfn = i << 30;
>>          }
>>          spin_lock(&vcpu->kvm->mmu_lock);
>>          kvm_mmu_free_some_pages(vcpu);
>>          sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
>> -                              PT32_ROOT_LEVEL, direct,
>> +                              PT32_ROOT_LEVEL, 0,
>>                                ACC_ALL, NULL);
>
> Should not write protect the gfn for nonpaging mode.

nonpaging mode should have direct_map set, so wouldn't enter this path
at all.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.