Re: KVM usability
"H. Peter Anvin" writes: > On 03/04/2010 12:13 PM, Zachary Amsden wrote: >> >> These are all basic things that are left completely undefined by qemu's >> lack of a top-level configuration file, and it's an inexcusable disgrace. >> > > There is a top-level configuration file for Qemu, at least in the > development tree. It's optional, still, but it's there now. It covers much but not all of the command line. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM-Test: Add kvm userspace unit test
The test uses the kvm test harness kvmctl to load binary test case files
and test various functions of the kvm kernel module.

Signed-off-by: sshang
---
 client/tests/kvm/tests/unit_test.py    |   29 +
 client/tests/kvm/tests_base.cfg.sample |    7 +++
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/unit_test.py

diff --git a/client/tests/kvm/tests/unit_test.py b/client/tests/kvm/tests/unit_test.py
new file mode 100644
index 000..9bc7441
--- /dev/null
+++ b/client/tests/kvm/tests/unit_test.py
@@ -0,0 +1,29 @@
+import os
+from autotest_lib.client.bin import utils
+from autotest_lib.client.common_lib import error
+
+def run_unit_test(test, params, env):
+    """
+    This is kvm userspace unit test, use kvm test harness kvmctl load binary
+    test case file to test various function of kvm kernel module.
+    The output of all unit test can be found in the test result dir.
+    """
+
+    case_list = params.get("case_list","access apic emulator hypercall irq"\
+              " port80 realmode sieve smptest tsc stringio vmexit").split()
+    srcdir = params.get("srcdir",test.srcdir)
+    user_dir = os.path.join(srcdir,"kvm_userspace/kvm/user")
+    os.chdir(user_dir)
+    test_fail_list = []
+
+    for i in case_list:
+        result_file = test.outputdir + "/" + i
+        testfile = i + ".flat"
+        results = utils.system("./kvmctl test/x86/bootstrap test/x86/" + \
+                   testfile + " > " + result_file,ignore_status=True)
+        if results != 0:
+            test_fail_list.append(i)
+
+    if test_fail_list:
+        raise error.TestFail("< " + " ".join(test_fail_list) + \
+                             " >")
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 040d0c3..0918c26 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -300,6 +300,13 @@ variants:
         shutdown_method = shell
         kill_vm = yes
         kill_vm_gracefully = no
+
+    - unit_test:
+        type = unit_test
+        case_list = access apic emulator hypercall msr port80 realmode sieve smptest tsc stringio vmexit
+        #srcdir should be same as build.cfg
+        srcdir =
+        vms = ''
 
     # Do not define test variants below shutdown
 
-- 
1.5.5.6
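
The run-and-collect-failures loop in the patch can be tried outside autotest. The following is only a rough sketch under stated assumptions: it uses the standard subprocess module in place of autotest's utils.system(), and run_cases(), make_command and fake_command() are hypothetical names that do not appear in the patch. A stand-in command replaces ./kvmctl so the sketch runs on any machine.

```python
import subprocess
import sys


def run_cases(case_list, make_command):
    """Run each case and return the names of those that exit non-zero.

    make_command is a hypothetical callback that maps a case name to an
    argv list; in the patch the command is ./kvmctl with a .flat image.
    """
    failed = []
    for name in case_list:
        # check=False plays the role of ignore_status=True in the patch:
        # a failing case is recorded instead of aborting the whole run.
        result = subprocess.run(make_command(name), check=False)
        if result.returncode != 0:
            failed.append(name)
    return failed


def fake_command(name):
    # Stand-in for kvmctl (an assumption for illustration only): exit 1
    # for the case named "irq", exit 0 for everything else.
    code = 1 if name == "irq" else 0
    return [sys.executable, "-c", "import sys; sys.exit(%d)" % code]


if __name__ == "__main__":
    print(run_cases(["access", "apic", "irq"], fake_command))
```

As in the patch, every case is run before anything is reported, so one failing case does not hide the results of the others.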
[PATCH] KVM: x86: Use native_store_idt() instead of kvm_get_idt()
This patch uses the generic Linux function native_store_idt() instead of
kvm_get_idt(), and also removes the now-unused function kvm_get_idt().

Signed-off-by: Wei Yongjun
---
 arch/x86/include/asm/kvm_host.h |    5 -
 arch/x86/kvm/vmx.c              |    2 +-
 2 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..ea1b6c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,11 +716,6 @@ static inline void kvm_load_ldt(u16 sel)
         asm("lldt %0" : : "rm"(sel));
 }
 
-static inline void kvm_get_idt(struct desc_ptr *table)
-{
-        asm("sidt %0" : "=m"(*table));
-}
-
 #ifdef CONFIG_X86_64
 static inline unsigned long read_msr(unsigned long msr)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..a08929a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2445,7 +2445,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
         vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
 
-        kvm_get_idt(&dt);
+        native_store_idt(&dt);
         vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
 
         asm("mov $.Lkvm_vmx_return, %0" : "=r"(kvm_vmx_return));
-- 
1.6.3.3
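
For readers unfamiliar with what sidt stores, here is a small illustration of the memory layout only; it is not kernel code. It assumes 64-bit mode, where the instruction writes a packed 10-byte value: a 16-bit limit followed by a 64-bit base address. The field names size and address follow struct desc_ptr as used in the diff (dt.address); DESC_PTR, unpack_desc_ptr() and the example values are made up for the sketch.

```python
import struct

# Packed layout assumed for 64-bit mode: 16-bit limit, then 64-bit base,
# little-endian, no padding (10 bytes in total).
DESC_PTR = struct.Struct("<HQ")


def unpack_desc_ptr(raw):
    """Split a 10-byte pseudo-descriptor into (size, address)."""
    size, address = DESC_PTR.unpack(raw)
    return size, address


if __name__ == "__main__":
    # Made-up example values, not read from real hardware.
    raw = DESC_PTR.pack(0x0FFF, 0xFFFFFFFF81DD6000)
    size, address = unpack_desc_ptr(raw)
    print(hex(size), hex(address))
```

The address half is the value the vmx.c hunk writes to HOST_IDTR_BASE.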
Re: pc-bios/bios.bin - where it comes from?
On 03/04/2010 04:46 PM, Michael Tokarev wrote:
> Hello.
>
> There are a few bugs filed about an.. interesting behaviour.
> For example:
>
>   http://www.mail-archive.com/kvm@vger.kernel.org/msg29834.html
>   https://bugs.launchpad.net/qemu/+bug/513273
>
> After quite some mix-n-matching, at least on my test machine,
> I can say that the issue gets triggered by seabios.  When using
> pc-bios/bios.bin everything is ok.  But when using any other
> bios.bin, even downloading seabios-0.5.1.tar.gz and building
> it - on a debian lenny system anyway - by running `make',
> the problem triggers.
>
> I tried different versions/variations of vgabios.bin (it's
> only -vga std which triggers the issue so far), including
> 0.6b and 0.6c built from sources, vgabios.bin from debian
> packages (0.6b and 0.6c), and the one included in
> qemu-0.12.3.tar.gz.  And my conclusion so far is that
> vgabios.bin has exactly _no_ effect on the issue.  But when
> using bios.bin from qemu-kvm-0.12.3.tar.gz, and _only_ that
> bios.bin, the problem goes away.

pc-bios/bios.bin gets built from roms/seabios.  We don't ship seabios
0.5.1 in 0.12.3, we ship 0.5.1-stable which is two commits ahead of 0.5.1.

> So the question arises: where that pc-bios/bios.bin comes
> from into qemu-0.12.3.tar.gz?  It is either built from some
> other sources (not from seabios-0.5.1), or built with some
> extra/different compiler/linker options, or built using
> different compiler/linker.
>
> This is partially confirmed on ubuntu as well, but, as far
> as I understand, there the behaviour is different with
> different versions of vgabios.

One of the reasons we include a git submodule and the source for the
bios is so that distributors don't have to deal with building the
packages independently.

Moral of the story is, just use the source we ship and don't try to be
more clever than that :-)

> In case it's not clear: I'm testing qemu-kvm-0.12.3;
> bios.bin is the same in qemu-0.12.3 and qemu-kvm-0.12.3.
>
> BTW, is there any reason preventing updating vgabios
> to 0.6c version - the latest released one?

There's no compelling improvement in 0.6c and updating vgabios is not
something I'm eager to do unless there's a strong justification.

Regards,

Anthony Liguori

> Thanks!
>
> /mjt
pc-bios/bios.bin - where it comes from?
Hello.

There are a few bugs filed about an.. interesting behaviour.
For example:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg29834.html
  https://bugs.launchpad.net/qemu/+bug/513273

After quite some mix-n-matching, at least on my test machine,
I can say that the issue gets triggered by seabios.  When using
pc-bios/bios.bin everything is ok.  But when using any other
bios.bin, even downloading seabios-0.5.1.tar.gz and building
it - on a debian lenny system anyway - by running `make',
the problem triggers.

I tried different versions/variations of vgabios.bin (it's
only -vga std which triggers the issue so far), including
0.6b and 0.6c built from sources, vgabios.bin from debian
packages (0.6b and 0.6c), and the one included in
qemu-0.12.3.tar.gz.  And my conclusion so far is that
vgabios.bin has exactly _no_ effect on the issue.  But when
using bios.bin from qemu-kvm-0.12.3.tar.gz, and _only_ that
bios.bin, the problem goes away.

So the question arises: where that pc-bios/bios.bin comes
from into qemu-0.12.3.tar.gz?  It is either built from some
other sources (not from seabios-0.5.1), or built with some
extra/different compiler/linker options, or built using
different compiler/linker.

This is partially confirmed on ubuntu as well, but, as far
as I understand, there the behaviour is different with
different versions of vgabios.

In case it's not clear: I'm testing qemu-kvm-0.12.3;
bios.bin is the same in qemu-0.12.3 and qemu-kvm-0.12.3.

BTW, is there any reason preventing updating vgabios
to 0.6c version - the latest released one?

Thanks!

/mjt
Re: KVM usability
On 03/04/2010 12:13 PM, Zachary Amsden wrote:
>
> These are all basic things that are left completely undefined by qemu's
> lack of a top-level configuration file, and it's an inexcusable disgrace.
>

There is a top-level configuration file for Qemu, at least in the
development tree.  It's optional, still, but it's there now.

        -hpa
Re: [Qemu-devel] [PATCH] Fix segfault with ram_size > 4095M without kvm
* Aurelien Jarno [2010-03-04 15:27]:
> On Tue, Feb 23, 2010 at 06:02:15PM +0100, Aurelien Jarno wrote:
> > Ryan Harper a écrit :
> > > Currently, x86_64-softmmu qemu segfaults when trying to use > 4095M memsize.
> > > This patch adds a simple check and error message (much like the 2047 limit on
> > > 32-bit hosts) on ram_size in the control path after we determine we're
> > > not using kvm
> > >
> > > Upstream qemu-kvm is affected if using the -no-kvm option; this patch address
> > > the segfault there as well.
> >
> > It looks like workarounding the real bug. At some point both
> > i386-softmmu (via PAE) and x86_64-softmmu were able to support > 4GB of
> > memory. I remember adding the support long time ago, and testing it with
> > 32GB of emulated RAM.
>
> I have looked into that, and actually one patch to get full support for > 4GB
> of memory was not merged:

Thanks for looking into this.

>
> diff --git a/exec.c b/exec.c
> index 8389c54..b0bb058 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -166,7 +166,7 @@ typedef struct PhysPageDesc {
>   */
>  #define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
>  #else
> -#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
> +#define L1_BITS (TARGET_PHYS_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
>  #endif
>
>  #define L1_SIZE (1 << L1_BITS)
>
> While this patch is acceptable for qemu i386, it creates a big L1 table
> for x86_64 or other 64-bit architectures, resulting in huge memory
> overhead.
>
> The recent multilevel tables patches from Richard Henderson should fix
> the problem for HEAD (I haven't found time to look at them in details).
>
> As this is not something we really want to backport, your patch makes
> sense in stable-0.12.

Anthony, do you want me to resend and rebase against 0.12-stable?

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
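
To see why widening L1_BITS is cheap for i386 but costly for 64-bit targets, the table size can be worked out by hand. The sketch below is only an illustration: l1_table_bytes() and ram_size_allowed() are hypothetical helpers, and L2_BITS = 10, TARGET_PAGE_BITS = 12, 8-byte pointers and the address widths in the loop are assumed values, not taken from this thread. The only figure from the thread is the 4095 MB limit.

```python
def l1_table_bytes(phys_bits, l2_bits=10, page_bits=12, pointer_bytes=8):
    """Size of a flat L1 table of pointers for a given address width.

    The defaults are assumptions for illustration, not values quoted
    from exec.c in this thread.
    """
    l1_bits = phys_bits - l2_bits - page_bits
    return (1 << l1_bits) * pointer_bytes


def ram_size_allowed(ram_size_bytes):
    """The limit described in the thread: at most 4095 MB without kvm."""
    return ram_size_bytes <= (4095 << 20)


if __name__ == "__main__":
    # Address widths picked for illustration only.
    for bits in (32, 36, 42, 52):
        print(bits, l1_table_bytes(bits))
    print(ram_size_allowed(4096 << 20))
```

Under these assumptions a 32-bit space needs 8 KiB and a 36-bit space 128 KiB, while a 52-bit space would need 8 GiB for the L1 table alone, which is the memory overhead that multilevel tables avoid.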
Re: [Qemu-devel] [PATCH] Fix segfault with ram_size > 4095M without kvm
On Tue, Feb 23, 2010 at 06:02:15PM +0100, Aurelien Jarno wrote:
> Ryan Harper a écrit :
> > Currently, x86_64-softmmu qemu segfaults when trying to use > 4095M memsize.
> > This patch adds a simple check and error message (much like the 2047 limit on
> > 32-bit hosts) on ram_size in the control path after we determine we're
> > not using kvm
> >
> > Upstream qemu-kvm is affected if using the -no-kvm option; this patch address
> > the segfault there as well.
>
> It looks like workarounding the real bug. At some point both
> i386-softmmu (via PAE) and x86_64-softmmu were able to support > 4GB of
> memory. I remember adding the support long time ago, and testing it with
> 32GB of emulated RAM.

I have looked into that, and actually one patch to get full support for > 4GB
of memory was not merged:

diff --git a/exec.c b/exec.c
index 8389c54..b0bb058 100644
--- a/exec.c
+++ b/exec.c
@@ -166,7 +166,7 @@ typedef struct PhysPageDesc {
  */
 #define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
 #else
-#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
+#define L1_BITS (TARGET_PHYS_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
 #endif

 #define L1_SIZE (1 << L1_BITS)

While this patch is acceptable for qemu i386, it creates a big L1 table
for x86_64 or other 64-bit architectures, resulting in huge memory
overhead.

The recent multilevel tables patches from Richard Henderson should fix
the problem for HEAD (I haven't found time to look at them in details).

As this is not something we really want to backport, your patch makes
sense in stable-0.12.

> > Signed-off-by: Ryan Harper
> > ---
> >  vl.c |    6 ++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/vl.c b/vl.c
> > index db7a178..a659e98 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -5760,6 +5760,12 @@ int main(int argc, char **argv, char **envp)
> >              fprintf(stderr, "failed to initialize KVM\n");
> >              exit(1);
> >          }
> > +    } else {
> > +        /* without kvm enabled, we can only support 4095 MB RAM */
> > +        if (ram_size > (4095UL << 20)) {
> > +            fprintf(stderr, "qemu: without kvm support at most 4095 MB RAM can be simulated\n");
> > +            exit(1);
> > +        }
> >      }
> >
> >      if (qemu_init_main_loop()) {
> >
>
> -- 
> Aurelien Jarno                          GPG: 1024D/F1BCDB73
> aurel...@aurel32.net                 http://www.aurel32.net

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: IVSHMEM and limits on shared memory
On Thu, Mar 4, 2010 at 1:12 PM, Khaled Ibrahim wrote:
>
>>
>> As a test, I removed anywhere my patch stored the size of the shared
>> memory region and hard coded the size of 512 MB into qemu_ram_alloc
>> and pci_register_bar, so that my patch never writes the size of the
>> memory region anywhere.  And I discovered that the value of 512MB
>> still shows up at the offset you mention, so it seems something else
>> is storing that value in the wrong location and corrupting memory.
>>
>> Can you try using the version from the git repo and see if the error recurs?
>
> Thank you Cam. I tried to build using git repo, but the build crashes while
> booting on my machine without the shared memory patch. I used
> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. Which git repo are you
> using? Can you send me a ivshmem patched qemu-kvm, or tell me which stable
> qemu-kvm repo should I use?

That's the correct repo.  Your VM crashes using the latest git repo?
That is unusual.  I'll send you a tar ball off-list of a patched
version of KVM.

>
> Thanks,
> -Khaled
Re: KVM usability
On 03/04/2010 02:13 PM, Zachary Amsden wrote:
> The biggest problem with virt-manager isn't virt-manager, it's that it
> is trying to do a nearly intractable task.  Because a qemu virtual
> machine is not a machine at all, just a disk image without the proper
> metadata to track the important properties of the machine, like what
> revision of PCI chipset, how many disk controllers the thing is using,
> what kind of graphics card, etc.
>
> These are all basic things that are left completely undefined by qemu's
> lack of a top-level configuration file, and it's an inexcusable disgrace.
>
> So virt-manager or any other management tool has the burden of creating
> and maintaining a bunch of metadata around this workhorse tool called
> qemu and invoking libvirt to figure out which set of 100,000 blasted
> command line options to pass on.
>
> That's why it falls short of expectations at times, not because
> virt-manager is crap, but because there is no well defined, well
> designed infrastructure for it to manage and the ad-hoc solution here is
> total crap.

And this is why we're doing QMP and qdev.  It's long overdue
infrastructure.

It's not just the problem that you describe though.  virt-manager is
limited by what libvirt provides and today libvirt does not expose
nearly enough qemu features for virt-manager to even attempt to solve
the problem on its own.

Regards,

Anthony Liguori

> Zach
Re: KVM usability
On 03/04/2010 10:00 AM, Lucas Meneghel Rodrigues wrote:
> On Tue, 2010-03-02 at 11:11 +0100, Peter Zijlstra wrote:
>> On Mon, 2010-03-01 at 09:14 -0600, Anthony Liguori wrote:
>>> The real question to ask is, why are you using qemu directly instead
>>> of using virt-manager?
>>
>> Because I suspect Ingo, like me, is a command line user, launching a gui
>> to start kvm when there is a kvm command around just sounds daft.
>>
>> Also, I just installed and tried it, virt-manager is a total piece of
>> shit,
>
> That statement is far from being fair. I use virt-manager quite a lot,
> since I want to keep track of what's going on on KVM virtualization for
> end users in Fedora. What's shipped with Fedora 12 is pretty decent in
> many regards, but as in any other software there's plenty of room for
> improvements.

The biggest problem with virt-manager isn't virt-manager, it's that it
is trying to do a nearly intractable task.  Because a qemu virtual
machine is not a machine at all, just a disk image without the proper
metadata to track the important properties of the machine, like what
revision of PCI chipset, how many disk controllers the thing is using,
what kind of graphics card, etc.

These are all basic things that are left completely undefined by qemu's
lack of a top-level configuration file, and it's an inexcusable disgrace.

So virt-manager or any other management tool has the burden of creating
and maintaining a bunch of metadata around this workhorse tool called
qemu and invoking libvirt to figure out which set of 100,000 blasted
command line options to pass on.

That's why it falls short of expectations at times, not because
virt-manager is crap, but because there is no well defined, well
designed infrastructure for it to manage and the ad-hoc solution here is
total crap.

Zach
RE: IVSHMEM and limits on shared memory
>
> As a test, I removed anywhere my patch stored the size of the shared
> memory region and hard coded the size of 512 MB into qemu_ram_alloc
> and pci_register_bar, so that my patch never writes the size of the
> memory region anywhere.  And I discovered that the value of 512MB
> still shows up at the offset you mention, so it seems something else
> is storing that value in the wrong location and corrupting memory.
>
> Can you try using the version from the git repo and see if the error recurs?

Thank you Cam. I tried to build using the git repo, but the build crashes
while booting on my machine without the shared memory patch. I used
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. Which git repo are you
using? Can you send me an ivshmem-patched qemu-kvm, or tell me which stable
qemu-kvm repo I should use?

Thanks,
-Khaled
Re: KVM usability
On Tue, 2010-03-02 at 11:11 +0100, Peter Zijlstra wrote:
> On Mon, 2010-03-01 at 09:14 -0600, Anthony Liguori wrote:
> > The real
> > question to ask is, why are you using qemu directly instead of using
> > virt-manager?
>
> Because I suspect Ingo, like me, is a command line user, launching a gui
> to start kvm when there is a kvm command around just sounds daft.
>
> Also, I just installed and tried it, virt-manager is a total piece of
> shit,

That statement is far from being fair. I use virt-manager quite a lot,
since I want to keep track of what's going on on KVM virtualization for
end users in Fedora. What's shipped with Fedora 12 is pretty decent in
many regards, but as in any other software there's plenty of room for
improvements.

> I wouldn't even know how to begin telling it how to start my
> freshly baked kernel with serial console on stdio and some block image I
> just created from the gentoo stage3 tarball.

Fair enough, it is convoluted to do what you want using virt-manager
(although possible), but mainly because this wasn't a use case for it.
You can't expect the application designers to support every single type
of work flow under the sun.

> That is, after 5 minutes clicking I have no idea how to even launch an
> ISO with the thing, I prefer reading the kvm manpage over using some
> mouse only gui crap like that.

For the 1st thing you wanted to do, I agree that it was cumbersome. But
to create a VM and make it boot an ISO available on your hard drive it
*is* trivial. There's a wizard to do it, because it's the main use case
of the thing.

If you want to point out problems on virt-manager that is fine, and the
developers will do what is possible to address problems, insults are not
necessary.
Re: [PATCH 08/10] Add -kvm option
On 03/02/2010 12:25 PM, Glauber Costa wrote:
> On Tue, Mar 02, 2010 at 01:31:05AM -0300, Marcelo Tosatti wrote:
>> On Fri, Feb 26, 2010 at 05:12:19PM -0300, Glauber Costa wrote:
>>> This option deprecates --enable-kvm. It is a more flexible option,
>>> that makes use of qemu-opts, and allow us to pass on options to enable or
>>> disable kernel irqchip, for example.
>>>
>>> Signed-off-by: Glauber Costa
>>
>> Really have to replace -enable-kvm? Can't you keep compatibility for it?
>
> We don't have to, but I'd rather deprecate it. I don't feel strongly, though.

It needs to stay.  For enabling/disabling, if you don't like
-enable-kvm, I'd suggest thinking about modeling it through CPU.  For
instance:

-cpu host,accel=kvm|tcg|kvm,tcg

Since we already specify CPUs in a global config, if you took this
approach, it would make it possible to tweak the default kvm vs. tcg
selection within the config file so a user could control whether vms
were created with tcg or kvm by default or whether it tried kvm and then
fell back to tcg.

Regards,

Anthony Liguori
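
The "try kvm, then fall back to tcg" behaviour suggested above amounts to walking an ordered preference list. The following is a minimal sketch of that idea and not QEMU code: pick_accelerator() is a hypothetical helper, and how QEMU would actually parse an accel= value or probe for KVM is not specified in this thread.

```python
def pick_accelerator(preference, available):
    """Return the first accelerator in `preference` that is available.

    `preference` is an ordered list such as ["kvm", "tcg"]; `available`
    is the set of accelerators the host can actually provide.  Both are
    hypothetical inputs for illustration.
    """
    for name in preference:
        if name in available:
            return name
    raise RuntimeError("no usable accelerator in %r" % (preference,))


if __name__ == "__main__":
    # A host without KVM falls back to tcg.
    print(pick_accelerator(["kvm", "tcg"], {"tcg"}))
```

A single-entry list such as ["kvm"] fails instead of falling back, which corresponds to the strict accel=kvm case.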
Re: [PATCH 08/10] Add -kvm option
On Thu, Mar 04, 2010 at 05:20:22PM +0100, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Glauber Costa wrote:
> >> This option deprecates --enable-kvm. It is a more flexible option,
> >> that makes use of qemu-opts, and allow us to pass on options to enable or
> >> disable kernel irqchip, for example.
> >>
> >
> > ...
> >
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index 3f49b44..f8fd86d 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
> >>  ETEXI
> >>
> >>  #ifdef CONFIG_KVM
> >> -DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
> >> -"-enable-kvm enable KVM full virtualization support\n")
> >> +HXCOMM Options deprecated by -kvm
> >> +DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
> >> +
> >> +DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
> >> +"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
> >> +"enable KVM full virtualization support\n")
> >> +
>
> Argh, never trust documentation: The magic option is "enabled", not
> "enable". :)
>
> >
> > I would prefer "irqchip=kernel|user" - shorter and even more verbose.
>
> And we should refuse to work if the user tries to enable in-kernel
> support without having io-threads enabled. That obviously fails silently
> so far.
>
> "info kvm" should also be extended to report the configuration in force.

I am waiting for marcelo to apply your patches (if he hasn't done already),
then I'll redo this. Agreed with this point, so plan on changing.

> >
> > Forgot if that was discussed already: Do we want "pit=kernel|user" as well?
> >
>

I guess we do.
Re: [PATCH 08/10] Add -kvm option
On 02/26/2010 02:12 PM, Glauber Costa wrote:
> This option deprecates --enable-kvm. It is a more flexible option,
> that makes use of qemu-opts, and allow us to pass on options to enable or
> disable kernel irqchip, for example.
>
> Signed-off-by: Glauber Costa

kernel vs. userspace irqchip shouldn't be a kvm option.

Ideally, it would be a -device thing but I think we've agreed that
-device won't cover platform devices.  So what we probably should do is
change the machine option to accept a qopts list, IOW:

-M pc,irqchip=user|kernel,pit=user|kernel,...

That certainly makes a lot more sense for non-x86 KVM targets (like s390
and ppc).  And certainly, there's nothing that says that every x86 KVM
target is going to have an APIC...

Regards,

Anthony Liguori

> ---
>  kvm-all.c       |    1 +
>  kvm.h           |    1 +
>  qemu-config.c   |   16 
>  qemu-config.h   |    1 +
>  qemu-options.hx |   11 +--
>  vl.c            |   11 +++
>  6 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 00e7411..0527e0f 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -52,6 +52,7 @@ typedef struct KVMSlot
>  typedef struct kvm_dirty_log KVMDirtyLog;
>
>  int kvm_allowed = 0;
> +int kvm_use_kernel_chip = 1;
>
>  struct KVMState
>  {
> diff --git a/kvm.h b/kvm.h
> index 7278874..480e651 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -20,6 +20,7 @@
>
>  #ifdef CONFIG_KVM
>  extern int kvm_allowed;
> +extern int kvm_use_kernel_chip;
>
>  #define kvm_enabled() (kvm_allowed)
>  #else
> diff --git a/qemu-config.c b/qemu-config.c
> index 246fae6..310838e 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -290,6 +290,21 @@ QemuOptsList qemu_cpudef_opts = {
>      },
>  };
>
> +QemuOptsList qemu_kvm_opts = {
> +    .name = "kvm",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_kvm_opts.head),
> +    .desc = {
> +        {
> +            .name = "irqchip-in-kernel",
> +            .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "enabled",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        { /* end if list */ }
> +    },
> +};
> +
>  static QemuOptsList *lists[] = {
>      &qemu_drive_opts,
>      &qemu_chardev_opts,
> @@ -300,6 +315,7 @@ static QemuOptsList *lists[] = {
>      &qemu_global_opts,
>      &qemu_mon_opts,
>      &qemu_cpudef_opts,
> +    &qemu_kvm_opts,
>      NULL,
>  };
>
> diff --git a/qemu-config.h b/qemu-config.h
> index b335c42..506e5fb 100644
> --- a/qemu-config.h
> +++ b/qemu-config.h
> @@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts;
>  extern QemuOptsList qemu_global_opts;
>  extern QemuOptsList qemu_mon_opts;
>  extern QemuOptsList qemu_cpudef_opts;
> +extern QemuOptsList qemu_kvm_opts;
>
>  int qemu_set_option(const char *str);
>  int qemu_global_option(const char *str);
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 3f49b44..f8fd86d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
>  ETEXI
>
>  #ifdef CONFIG_KVM
> -DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
> -"-enable-kvm enable KVM full virtualization support\n")
> +HXCOMM Options deprecated by -kvm
> +DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
> +
> +DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
> +"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
> +"enable KVM full virtualization support\n")
> +
>  #endif
>  STEXI
> +...@item -kvm enable=[on|off][,irqchip-in-kernel=on|off]
> +...@findex -kvm
>  @item -enable-kvm
>  @findex -enable-kvm
>  Enable KVM full virtualization support. This option is only available
> diff --git a/vl.c b/vl.c
> index 66e477a..8c94fee 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -5416,6 +5416,17 @@ int main(int argc, char **argv, char **envp)
>              case QEMU_OPTION_enable_kvm:
>                  kvm_allowed = 1;
>                  break;
> +            case QEMU_OPTION_kvm:
> +
> +                opts = qemu_opts_parse(&qemu_kvm_opts, optarg, NULL);
> +                if (!opts) {
> +                    fprintf(stderr, "parse error: %s\n", optarg);
> +                    exit(1);
> +                }
> +
> +                kvm_allowed = qemu_opt_get_bool(opts, "enabled", 1);
> +                kvm_use_kernel_chip = qemu_opt_get_bool(opts, "irqchip-in-kernel", 1);
> +                break;
>  #endif
>              case QEMU_OPTION_usb:
>                  usb_enabled = 1;
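
The quoted patch registers the option names "enabled" and "irqchip-in-kernel", while its help text advertises "enable=". A small stand-alone model of the option string makes that mismatch easy to see. This is only a sketch, assuming a plain name=on|off,... syntax: parse_kvm_opts(), KVM_OPTS and BOOL_WORDS are hypothetical names, and rejecting unknown names is a choice made for the sketch, not a statement about how qemu_opts_parse() behaves.

```python
# Option names taken from the quoted qemu_kvm_opts table; both default to
# on, as the third argument of qemu_opt_get_bool() is 1 in the vl.c hunk.
KVM_OPTS = {"enabled": True, "irqchip-in-kernel": True}
BOOL_WORDS = {"on": True, "off": False}


def parse_kvm_opts(optarg):
    """Parse 'name=on|off,...' into a dict, rejecting unknown names."""
    values = dict(KVM_OPTS)
    for item in filter(None, optarg.split(",")):
        name, sep, word = item.partition("=")
        if not sep or name not in KVM_OPTS or word not in BOOL_WORDS:
            raise ValueError("parse error: %s" % item)
        values[name] = BOOL_WORDS[word]
    return values


if __name__ == "__main__":
    print(parse_kvm_opts("enabled=on,irqchip-in-kernel=off"))
```

With this model, "enable=on" (the spelling in the help text) raises an error, whereas "enabled=on" is accepted.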
Re: [PATCH 2/4] KVM: Rework VCPU state writeback API
On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
> On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
> > The regression seems to be caused by seabios commit d7e998f. Kevin, the
> > failure can be seen on the attached screenshot, which happens on the
> > first reboot of WinXP 32 installation (after copying files etc).
>
> Sorry - I also noticed a bug in that commit recently.  I pushed the
> fix I had in my local tree.

Thanks, it does fix the issue here. Anthony can you please update
seabios? TIA
Re: qemu-kvm upstream segfaults when using -smp 1
Lucas Meneghel Rodrigues wrote:
> Hi folks:
>
> Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1:
>
> 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
> /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor
> unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive
> file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net
> nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024
> -smp 1 -drive
> file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
> -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp
> /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0
> -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file
> descriptor
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273
> Segmentation fault (core dumped) /usr/local/autotest/tests/kvm/qemu
> -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait
> -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net
> nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024
> -smp 1 -drive
> file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
> -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp
> /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0
> -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status
> 139)
>
> I have opened a bug about it on KVM's bug tracking system on sourceforge.
> Relevant software versions involved:
>
> Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is
> 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
> Kernel: 2.6.31.12-174.2.22.fc12.x86_64
>
> Please let me know if you need more information about it.
>

Should be fixed by this:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/47883

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
qemu-kvm upstream segfaults when using -smp 1
Hi folks: Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1: 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command: /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file descriptor 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 Segmentation fault (core dumped) /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 139) I have opened a bug about it on KVM's bug tracking system on sourceforge. Relevant software versions involved: Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e) Kernel: 2.6.31.12-174.2.22.fc12.x86_64 Please let me know if you need more information about it. 
Lucas
[ kvm-Bugs-2963581 ] qemu-kvm upstream crashes when using -smp 1
Bugs item #2963581, was opened at 2010-03-04 18:10 Message generated for change (Tracker Item Submitted) made by You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Lucas Meneghel Rodrigues () Assigned to: Nobody/Anonymous (nobody) Summary: qemu-kvm upstream crashes when using -smp 1 Initial Comment: qemu-kvm.git master is crashing when using -smp 1 Relevant versions: Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e) Kernel: 2.6.31.12-174.2.22.fc12.x86_64 Steps to reproduce 1 - Clone git repo git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git 2 - Build qemu-kvm from this repo 3 - Try to start it with -smp 1, reference command line: 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command: /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file descriptor 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 Segmentation fault (core dumped) /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 139) So we have a segmentation fault. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599
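A note for anyone decoding the log above: the "status 139" reported by /bin/sh follows the usual shell convention of 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of that arithmetic, assuming a POSIX-style shell; the helper name is made up for illustration and is not part of autotest or qemu:

```c
#include <assert.h>

/*
 * Hypothetical helper: map a shell-reported exit status to the number of
 * the signal that killed the process, or 0 if it exited normally.
 * Assumes the common "128 + signal number" shell convention.
 */
static int shell_status_to_signal(int status)
{
    assert(status >= 0 && status <= 255);
    return status > 128 ? status - 128 : 0;
}
```

Feeding it 139 yields signal 11, consistent with the "Segmentation fault (core dumped)" line in the log.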
Re: [PATCH 07/10] provide apic-kvm
Glauber Costa wrote: > This patch provides the file apic-kvm.c, which implements a schim over > the kvm in-kernel APIC. > > Signed-off-by: Glauber Costa > --- > Makefile.target |2 +- > hw/apic-kvm.c | 157 > + > hw/pc.c |6 ++- > hw/pc.h |2 + > kvm.h |5 ++ > target-i386/cpu.h |4 ++ > target-i386/kvm.c | 25 - > 7 files changed, 197 insertions(+), 4 deletions(-) > create mode 100644 hw/apic-kvm.c > > diff --git a/Makefile.target b/Makefile.target > index bc5263e..f00af07 100644 > --- a/Makefile.target > +++ b/Makefile.target > @@ -213,7 +213,7 @@ obj-i386-y += usb-uhci.o vmmouse.o vmport.o vmware_vga.o > hpet.o > obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o > obj-i386-y += ne2000-isa.o debugcon.o multiboot.o > > -obj-i386-$(CONFIG_KVM) += ioapic-kvm.o i8259-kvm.o > +obj-i386-$(CONFIG_KVM) += ioapic-kvm.o i8259-kvm.o apic-kvm.o > > # shared objects > obj-ppc-y = ppc.o ide/core.o ide/qdev.o ide/isa.o ide/pci.o ide/macio.o > diff --git a/hw/apic-kvm.c b/hw/apic-kvm.c > new file mode 100644 > index 000..089fa45 > --- /dev/null > +++ b/hw/apic-kvm.c > @@ -0,0 +1,157 @@ > +#include "hw.h" > +#include "pc.h" > +#include "pci.h" > +#include "msix.h" > +#include "qemu-timer.h" > +#include "host-utils.h" > +#include "kvm.h" > + > +#define APIC_LVT_NB 6 > +#define APIC_LVT_LINT0 3 > + > +struct qemu_lapic_state { > +uint32_t apicbase; > +uint8_t id; > +uint8_t arb_id; > +uint8_t tpr; > +uint32_t spurious_vec; > +uint8_t log_dest; > +uint8_t dest_mode; > +uint32_t isr[8]; /* in service register */ > +uint32_t tmr[8]; /* trigger mode register */ > +uint32_t irr[8]; /* interrupt request register */ > +uint32_t lvt[APIC_LVT_NB]; > +uint32_t esr; /* error register */ > +uint32_t icr[2]; > + > +uint32_t divide_conf; > +int count_shift; > +uint32_t initial_count; > +int64_t initial_count_load_time, next_time; > +uint32_t idx; > +int sipi_vector; > +int wait_for_sipi; > +}; > + > +typedef struct APICState { > +CPUState *cpu_env; > + > +/* KVM lapic structure is just 
a big array of regs. But it is what kvm
> + * functions expect. So have both the fields separated, for easy access,
> + * and the kvm structure, for ioctls communications */
> +union {
> +struct qemu_lapic_state dev;
> +struct kvm_lapic_state kvm_lapic_state;

That looks fishy to me on second sight: Is, e.g., loading the kvm_lapic_state from the kernel supposed to magically fill the (totally unaligned) qemu_lapic_state structure? I'm missing the translations of kvm_kernel_lapic_load_from_user/save_to_user here, or some effort to arrange qemu_lapic_state so that it robustly maps onto the register array passed to/from the kernel (if that is possible, haven't checked yet).

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
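To illustrate the concern: the kernel hands over the APIC state as a flat register image in which each 32-bit register sits at a 16-byte stride, so a packed C struct cannot simply be overlaid on it. A rough sketch of explicit translation, assuming the architectural xAPIC offsets (ID at 0x20, TPR at 0x80, ISR from 0x100) and a 1024-byte image. The struct and helper names are hypothetical and cut down to three fields; this is not the code from the patch or from qemu-kvm:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LAPIC_REGS_SIZE 1024        /* size of the flat register image */

/* Byte offsets of a few registers in the xAPIC register page. */
#define APIC_OFF_ID   0x020
#define APIC_OFF_TPR  0x080
#define APIC_OFF_ISR  0x100         /* 8 registers, 16 bytes apart */

/* Hypothetical, cut-down device state; not the struct from the patch. */
struct toy_lapic {
    uint8_t  id;
    uint8_t  tpr;
    uint32_t isr[8];
};

static uint32_t reg_read(const uint8_t *regs, unsigned off)
{
    uint32_t v;

    assert(off + sizeof(v) <= LAPIC_REGS_SIZE);
    memcpy(&v, regs + off, sizeof(v));      /* no alignment assumptions */
    return v;
}

static void reg_write(uint8_t *regs, unsigned off, uint32_t v)
{
    assert(off + sizeof(v) <= LAPIC_REGS_SIZE);
    memcpy(regs + off, &v, sizeof(v));
}

/* Flat register image -> unpacked fields. */
static void toy_lapic_load(struct toy_lapic *s, const uint8_t *regs)
{
    int i;

    s->id  = (uint8_t)(reg_read(regs, APIC_OFF_ID) >> 24);  /* bits 31:24 */
    s->tpr = (uint8_t)(reg_read(regs, APIC_OFF_TPR) & 0xff);
    for (i = 0; i < 8; i++)
        s->isr[i] = reg_read(regs, APIC_OFF_ISR + i * 0x10);
}

/* Unpacked fields -> flat register image. */
static void toy_lapic_save(const struct toy_lapic *s, uint8_t *regs)
{
    int i;

    reg_write(regs, APIC_OFF_ID, (uint32_t)s->id << 24);
    reg_write(regs, APIC_OFF_TPR, s->tpr);
    for (i = 0; i < 8; i++)
        reg_write(regs, APIC_OFF_ISR + i * 0x10, s->isr[i]);
}
```

With a load/save pair like this, the unpacked struct and the register image stay separate objects, so neither layout constrains the other.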
Re: [PATCH 00/10] uq/master: irqchip-in-kernel support
Glauber Costa wrote:
> Hi guys,
>
> This is the same in-kernel irqchip support already posted to qemu-devel,
> just rebased, retested, etc. It passes my basic tests, so it seems to be
> still in good shape.
>
> It is provided against uq/master as part of the integration efforts

Just as another heads-up: host->guest networking performance over slirp and non-virtio NICs suffers with this irqchip support the same way as in qemu-kvm. It's not a bug I expect to be directly related to these changes, but it is at least triggered by them and should now really be addressed.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH 08/10] Add -kvm option
Jan Kiszka wrote:
> Glauber Costa wrote:
>> This option deprecates --enable-kvm. It is a more flexible option,
>> that makes use of qemu-opts, and allows us to pass on options to enable or
>> disable kernel irqchip, for example.
>>
>
> ...
>
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 3f49b44..f8fd86d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
>> ETEXI
>>
>> #ifdef CONFIG_KVM
>> -DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
>> -"-enable-kvm enable KVM full virtualization support\n")
>> +HXCOMM Options deprecated by -kvm
>> +DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
>> +
>> +DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
>> +"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
>> +"enable KVM full virtualization support\n")
>> +

Argh, never trust documentation: The magic option is "enabled", not "enable". :)

>
> I would prefer "irqchip=kernel|user" - shorter and even more verbose.

And we should refuse to work if the user tries to enable in-kernel support without having io-threads enabled. That obviously fails silently so far. "info kvm" should also be extended to report the configuration in force.

>
> Forgot if that was discussed already: Do we want "pit=kernel|user" as well?
>

No comments on this?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
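As a sketch of the behaviour asked for here (reject a misspelled key such as "enable" instead of ignoring it, and refuse in-kernel irqchip without an io-thread), something along these lines would do. It is a standalone toy parser, not qemu-opts; the struct, the function names and the have_iothread flag are all assumptions made for the example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option set for "-kvm enabled=on|off,irqchip-in-kernel=on|off". */
struct kvm_opts {
    int enabled;
    int irqchip_in_kernel;
};

/* True if the len-byte token at s equals the NUL-terminated word. */
static int token_is(const char *s, size_t len, const char *word)
{
    return strlen(word) == len && memcmp(s, word, len) == 0;
}

/* Parse a comma-separated key=value list; returns 0 on success, -1 on error. */
static int parse_kvm_opts(const char *s, struct kvm_opts *o)
{
    assert(s != NULL && o != NULL);

    while (*s) {
        const char *comma = strchr(s, ',');
        size_t len = comma ? (size_t)(comma - s) : strlen(s);
        const char *eq = memchr(s, '=', len);
        size_t klen, vlen;
        int *target;

        if (!eq)
            return -1;                      /* no '=' in this element */
        klen = (size_t)(eq - s);
        vlen = len - klen - 1;

        if (token_is(s, klen, "enabled"))
            target = &o->enabled;
        else if (token_is(s, klen, "irqchip-in-kernel"))
            target = &o->irqchip_in_kernel;
        else
            return -1;                      /* unknown key, e.g. "enable" */

        if (token_is(eq + 1, vlen, "on"))
            *target = 1;
        else if (token_is(eq + 1, vlen, "off"))
            *target = 0;
        else
            return -1;                      /* value must be on or off */

        s += len;
        if (*s == ',')
            s++;
    }
    return 0;
}

/* Refuse in-kernel irqchip when no io-thread is available. */
static int validate_kvm_opts(const struct kvm_opts *o, int have_iothread)
{
    assert(o != NULL);
    return (o->irqchip_in_kernel && !have_iothread) ? -1 : 0;
}
```

The point is only that both failure modes produce an error the caller can report, rather than a silently ignored setting.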
Re: [PATCH] qemu-kvm: Fix boot CPU setup for the case it is unsupported
On 03/04/2010 02:00 AM, Jan Kiszka wrote:
> Commit 52b03dd702 incorrectly failed KVM initialization in case the
> kernel did not support KVM_CAP_SET_BOOT_CPU_ID. Fix this, and also
> improve error propagation of kvm_create_context at this chance.
>
> Signed-off-by: Jan Kiszka
> ---
>
> OK, it really was me. :)
>
> qemu-kvm-x86.c |9 +++--
> qemu-kvm.c |4 +++-
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 7a5925a..7d42fdc 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -672,7 +672,7 @@ static const VMStateDescription vmstate_kvmclock= {
>
> int kvm_arch_qemu_create_context(void)
> {
> -int i;
> +int i, r;
> struct utsname utsname;
>
> uname(&utsname);
> @@ -696,7 +696,12 @@ int kvm_arch_qemu_create_context(void)
> vmstate_register(0, &vmstate_kvmclock, &kvmclock_data);
> #endif
>
> -return kvm_set_boot_cpu_id(0);
> +r = kvm_set_boot_cpu_id(0);
> +if (r < 0 && r != -ENOSYS) {
> +return r;
> +}
> +
> +return 0;
> }
>
> static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 222ca97..e417f21 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -2091,8 +2091,10 @@ static int kvm_create_context(void)
> return -1;
> }
> r = kvm_arch_qemu_create_context();
> -if (r < 0)
> +if (r < 0) {
> kvm_finalize(kvm_state);
> +return -1;
> +}
> if (kvm_pit && !kvm_pit_reinject) {
> if (kvm_reinject_control(kvm_context, 0)) {
> fprintf(stderr, "failure to disable in-kernel PIT reinjection\n");
>

Works for me: FC12 host, FC12 guest.

David
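The pattern in the first hunk (an optional setup step whose -ENOSYS result means "not supported by this kernel, carry on", while any other negative errno is fatal) can be shown in isolation. This is a sketch only; the function-pointer wrapper and the three stubs are invented for the example and stand in for kvm_set_boot_cpu_id:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*optional_setup_fn)(void);

/*
 * Run an optional setup step. A return of -ENOSYS is taken to mean the
 * capability is missing and is not treated as an error; any other
 * negative value is propagated to the caller unchanged.
 */
static int run_optional_setup(optional_setup_fn fn)
{
    int r;

    assert(fn != NULL);
    r = fn();
    if (r < 0 && r != -ENOSYS)
        return r;
    return 0;
}

/* Stand-ins for the real call, one per outcome. */
static int stub_ok(void)          { return 0; }
static int stub_unsupported(void) { return -ENOSYS; }
static int stub_broken(void)      { return -EINVAL; }
```

Only the last stub makes run_optional_setup report a failure, which matches what the fix intends for kernels without KVM_CAP_SET_BOOT_CPU_ID.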
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> > Hi,
> >
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes. Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has the more is
> > the performance drop from l1 to l2.
> > Anyway, this post is to get feedback about the overall concept of these
> > patches. Please review and give feedback :-)
>
> Joerg,
>
> What perf gain does this bring ? (i'm not aware of the current
> overhead).

The benchmark was an allnoconfig kernel compile in tmpfs, which took, with the same guest image:

    as l1-guest with npt:                       2m23s
    as l2-guest with l1(nested)-l2(shadow):     around 8-9 minutes
    as l2-guest with l1(nested)-l2(shadow)
      without the recent msrpm optimization:    around 19 minutes
    as l2-guest with l1(nested)-l2(nested)
      [this patchset]:                          2m25s-2m30s

> Overall comments:
>
> Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
> pagetable, and pass that to the kvm tdp fault path (with the correct
> context setup)?

If I understand your suggestion correctly, I think that's exactly what's done in the patches. Some words about the design:

For nested-nested we need to shadow the l1-nested-ptable on the host. This is done using the vcpu->arch.mmu context, which holds the l1 paging modes while the l2 is running. On an npt-fault from the l2 we just instrument the shadow-ptable code. This is the common case, because it happens all the time while the l2 is running.

The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still work and translate virtual addresses of the l2 into physical addresses of the l1 (so it can be accessed with kvm functions). To do this we need to be aware of the L2 paging mode. It is stored in the vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa translations. It is not used to build shadow page tables or anything else. That's the reason only the parts necessary for gva_to_gpa translations of the nested_mmu context are initialized.

Since we cannot use mmu.gva_to_gpa to translate only between l2_gpa and l1_gpa, because this function is required to translate l2_gva to l1_gpa by other parts of kvm, the function which does this translation is moved to nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers are swapped between mmu and nested_mmu.

The nested_mmu.gva_to_gpa function is used in translate_gpa_nested, which is assigned to the newly introduced translate_gpa callback of the nested_mmu context.

This callback is used in the walk_addr function to translate every l2_gpa address we read from cr3 or the guest ptes into l1_gpa to read the next step from the guest memory.

In the old unnested case the translate_gpa callback would point to a function which just returns the gpa passed to it unmodified. The walk_addr function is generalized and now there are basically two versions of it:

* walk_addr which translates using vcpu->arch.mmu context
* walk_addr_nested which translates using vcpu->arch.nested_mmu context

That's pretty much how these patches work.

> You probably need to include a flag in base_role to differentiate
> between l1 / l2 shadow tables (say if they use the same cr3 value).

Not sure if this is necessary. It may be necessary when large pages come into play. Otherwise the host npt pages are distinguished from the shadow npt pages by the direct-flag.
Joerg
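To make the translate_gpa callback idea above concrete, here is a deliberately tiny model, with a single-level guest page table and a byte array standing in for the l1 nested table. None of these names or structures come from the patchset; it is a sketch of the shape of the design only, assuming every address read from cr3 or a pte goes through the callback before guest memory is touched:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1u << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES     8                                    /* frames of l1 memory */
#define TOY_WORDS      (TOY_NPAGES << (TOY_PAGE_SHIFT - 3)) /* 8-byte words */

typedef uint64_t gpa_t;

struct toy_mmu {
    /* Maps an address read from cr3 or a guest pte to an l1 gpa. */
    gpa_t (*translate_gpa)(const struct toy_mmu *mmu, gpa_t gpa);
    const uint8_t *l2_to_l1_frame;  /* toy stand-in for the l1 nested table */
};

/* Unnested case: the address already is an l1 gpa. */
static gpa_t translate_gpa_identity(const struct toy_mmu *mmu, gpa_t gpa)
{
    (void)mmu;
    return gpa;
}

/* Nested case: look the l2 frame up in the toy nested table. */
static gpa_t translate_gpa_nested(const struct toy_mmu *mmu, gpa_t l2_gpa)
{
    gpa_t frame = l2_gpa >> TOY_PAGE_SHIFT;

    assert(frame < TOY_NPAGES);
    return ((gpa_t)mmu->l2_to_l1_frame[frame] << TOY_PAGE_SHIFT) |
           (l2_gpa & TOY_PAGE_MASK);
}

/* Read one 8-byte pte from l1 memory at an l1 gpa. */
static uint64_t toy_read_pte(const uint64_t *l1_mem, gpa_t l1_gpa)
{
    assert((l1_gpa >> 3) < TOY_WORDS);
    return l1_mem[l1_gpa >> 3];
}

/*
 * Single-level walk: cr3 points at a table of frame numbers indexed by
 * the virtual page number. Both the table address and the final page
 * address pass through translate_gpa, so the result is always an l1 gpa.
 */
static gpa_t toy_walk_addr(const struct toy_mmu *mmu, const uint64_t *l1_mem,
                           gpa_t cr3, uint64_t gva)
{
    uint64_t vpn = gva >> TOY_PAGE_SHIFT;
    gpa_t table, page;

    assert(vpn < (1u << (TOY_PAGE_SHIFT - 3)));
    table = mmu->translate_gpa(mmu, cr3);
    page = toy_read_pte(l1_mem, table + vpn * 8) << TOY_PAGE_SHIFT;
    return mmu->translate_gpa(mmu, page | (gva & TOY_PAGE_MASK));
}
```

The same walker serves both cases; only the callback installed in the context differs, which is the property the two walk_addr variants in the description rely on.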
Re: Clock issue in Windows XP guests
On 04/03/2010 16:13, Zachary Amsden wrote:
> On 03/03/2010 11:43 PM, Gilles PIETRI wrote:
>> Hi,
>>
>> I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm 0.12.2.
>> I have multiple guests, and one of them is running Windows XP. If I
>> stare at the clock, I see that every now & then (~5s), it slows down a
>> bit, and then try to cope with it. If I run some NTP synchronization
>> software like ntpd, the offset is as high as 1s lost every 10s or so,
>> which makes it impossible to use anything time based on the guest
>> (audio stuff, mainly).
>>
>> I tried messing (as said on IRC) with the -rtc parameters, but to no
>> avail. I tried the driftfix=slew option found in the --help output,
>> but it says that driftfix is not a valid setting for rtc.. And anyway,
>> I have no idea what this does (I'll be reading about it probably...)
>>
>> I've seen something remotely connected to this on the proxmox forum,
>> but it was not that helpful (and proxmox runs qemu 0.11.x as it seems):
>> http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962
>>
>> I remember using the -rtc-td-hack (and in fact, just read again about
>> it here:
>> http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests),
>> but it's not there anymore in 0.12.x, and I have no idea what it used
>> to do (going on for some reading as well, when I have some time ;))
>>
>> Oh, and the guests running linux are working just fine, and have no
>> clock issue.
>>
>> Has anyone encountered such a problem?
>
> No, but I'd certainly like to fix it. Can you send basic host
> information like /proc/cpuinfo, dmesg from kernel?

It's a dual quad-core Dell R710; part of /proc/cpuinfo is below, and I attached the dmesg.

> If you run the one Windows XP guest alone, does it run better or still
> the same?

Hard to say, but I think I had the issue "from the start", when the guest was alone. I can't easily check that now that things are running, however.
/proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 26 model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz stepping: 5 cpu MHz : 1994.647 cache size : 4096 KB physical id : 1 siblings: 4 core id : 0 cpu cores : 4 apicid : 16 initial apicid : 16 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid bogomips: 3989.29 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: Thanks for your time, Gilou # dmesg [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 2.6.32.7-r710virt (r...@nsc045.local) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Wed Feb 3 10:04:00 CET 2010 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.7-r710virt root=UUID=2f33521a-a6c5-4a4c-926d-6d9d9aaa62b5 ro [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 000a (usable) [0.00] BIOS-e820: 0010 - cf699000 (usable) [0.00] BIOS-e820: cf699000 - cf6af000 (reserved) [0.00] BIOS-e820: cf6af000 - cf6ce000 (ACPI data) [0.00] BIOS-e820: cf6ce000 - d000 (reserved) [0.00] BIOS-e820: e000 - f000 (reserved) [0.00] BIOS-e820: fe00 - 0001 (reserved) [0.00] BIOS-e820: 0001 - 00043000 (usable) [0.00] DMI 2.6 present. 
[0.00] last_pfn = 0x43 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C write-protect [0.00] D-EBFFF uncachable [0.00] EC000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 00 mask FF8000 write-back [0.00] 1 base 008000 mask FFC000 write-back [0.00] 2 base 00C000 mask FFF000 write-back [0.00] 3 base 01 mask FF write-back [0.00] 4 base 02 mask FE write-back [0.00] 5 base 04 mask FFC000 write-back [0.00] 6 disabled [0.00] 7 disabled [0.00] e820 update range: d
Re: [Qemu-devel] [PATCH 0/6] [PULL] qemu-kvm.git uq/master queue
On 03/04/2010 09:05 AM, Marcelo Tosatti wrote:
> The following changes since commit 55b1e61f640bb2cf3bed0b4cc6d4ba1326c625d9:
>   Samuel Thibault (1):
>         (curses) Use more descriptive values
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
>
> Avi Kivity (1):
>       Allocate memory below 4GB as one chunk
>
> Jan Kiszka (4):
>       KVM: Rework of guest debug state writing
>       KVM: Rework VCPU state writeback API
>       KVM: x86: Restrict writeback of VCPU state
>       x86: Extend validity of bsp_to_cpu
>
> Marcelo Tosatti (1):
>       Add option to use file backed guest memory

Pulled. Thanks.

Regards,

Anthony Liguori

>  cpu-all.h             |3 +
>  exec.c                | 132
>  hw/apic.c             |2 -
>  hw/pc.c               | 14 ++
>  hw/ppc_newworld.c     |3 -
>  hw/ppc_oldworld.c     |3 -
>  hw/s390-virtio.c      |1 -
>  kvm-all.c             | 43 +++-
>  kvm.h                 | 26 +-
>  qemu-options.hx       | 16 ++
>  savevm.c              |4 ++
>  sysemu.h              |4 ++
>  target-i386/kvm.c     | 77 +++--
>  target-i386/machine.c | 11
>  target-ppc/kvm.c      |2 +-
>  target-ppc/machine.c  |4 --
>  target-s390x/kvm.c    |3 +-
>  vl.c                  | 41 +++
>  18 files changed, 300 insertions(+), 89 deletions(-)
Re: Clock issue in Windows XP guests
On 03/03/2010 11:43 PM, Gilles PIETRI wrote:
> Hi,
>
> I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm 0.12.2.
> I have multiple guests, and one of them is running Windows XP. If I
> stare at the clock, I see that every now & then (~5s), it slows down a
> bit, and then try to cope with it. If I run some NTP synchronization
> software like ntpd, the offset is as high as 1s lost every 10s or so,
> which makes it impossible to use anything time based on the guest
> (audio stuff, mainly).
>
> I tried messing (as said on IRC) with the -rtc parameters, but to no
> avail. I tried the driftfix=slew option found in the --help output,
> but it says that driftfix is not a valid setting for rtc.. And anyway,
> I have no idea what this does (I'll be reading about it probably...)
>
> I've seen something remotely connected to this on the proxmox forum,
> but it was not that helpful (and proxmox runs qemu 0.11.x as it seems):
> http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962
>
> I remember using the -rtc-td-hack (and in fact, just read again about
> it here:
> http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests),
> but it's not there anymore in 0.12.x, and I have no idea what it used
> to do (going on for some reading as well, when I have some time ;))
>
> Oh, and the guests running linux are working just fine, and have no
> clock issue.
>
> Has anyone encountered such a problem?

No, but I'd certainly like to fix it. Can you send basic host information like /proc/cpuinfo, dmesg from kernel?

If you run the one Windows XP guest alone, does it run better or still the same?

Thanks,

Zach
[PATCH 2/6] Add option to use file backed guest memory
Port qemu-kvm's -mem-path and -mem-prealloc options. These are useful for backing guest memory with huge pages via hugetlbfs. Signed-off-by: Marcelo Tosatti CC: john cooper --- cpu-all.h |3 + exec.c | 115 -- qemu-options.hx | 16 vl.c| 12 ++ 4 files changed, 141 insertions(+), 5 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 8488bfe..9823c24 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -847,6 +847,9 @@ extern uint8_t *phys_ram_dirty; extern ram_addr_t ram_size; extern ram_addr_t last_ram_offset; +extern const char *mem_path; +extern int mem_prealloc; + /* physical memory access */ /* MMIO pages are identified by a combination of an IO device index and diff --git a/exec.c b/exec.c index 6a3c912..f41518e 100644 --- a/exec.c +++ b/exec.c @@ -2529,6 +2529,99 @@ void qemu_flush_coalesced_mmio_buffer(void) kvm_flush_coalesced_mmio_buffer(); } +#if defined(__linux__) && !defined(TARGET_S390X) + +#include + +#define HUGETLBFS_MAGIC 0x958458f6 + +static long gethugepagesize(const char *path) +{ +struct statfs fs; +int ret; + +do { + ret = statfs(path, &fs); +} while (ret != 0 && errno == EINTR); + +if (ret != 0) { + perror("statfs"); + return 0; +} + +if (fs.f_type != HUGETLBFS_MAGIC) + fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path); + +return fs.f_bsize; +} + +static void *file_ram_alloc(ram_addr_t memory, const char *path) +{ +char *filename; +void *area; +int fd; +#ifdef MAP_POPULATE +int flags; +#endif +unsigned long hpagesize; + +hpagesize = gethugepagesize(path); +if (!hpagesize) { + return NULL; +} + +if (memory < hpagesize) { +return NULL; +} + +if (kvm_enabled() && !kvm_has_sync_mmu()) { +fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n"); +return NULL; +} + +if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) { + return NULL; +} + +fd = mkstemp(filename); +if (fd < 0) { + perror("mkstemp"); + free(filename); + return NULL; +} +unlink(filename); +free(filename); + +memory = (memory+hpagesize-1) & ~(hpagesize-1); + +/* 
+ * ftruncate is not supported by hugetlbfs in older + * hosts, so don't bother bailing out on errors. + * If anything goes wrong with it under other filesystems, + * mmap will fail. + */ +if (ftruncate(fd, memory)) + perror("ftruncate"); + +#ifdef MAP_POPULATE +/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED + * to sidestep this quirk. + */ +flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE; +area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0); +#else +area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); +#endif +if (area == MAP_FAILED) { + perror("file_ram_alloc: can't mmap RAM pages"); + close(fd); + return (NULL); +} +return area; +} +#endif + ram_addr_t qemu_ram_alloc(ram_addr_t size) { RAMBlock *new_block; @@ -2536,16 +2629,28 @@ ram_addr_t qemu_ram_alloc(ram_addr_t size) size = TARGET_PAGE_ALIGN(size); new_block = qemu_malloc(sizeof(*new_block)); +if (mem_path) { +#if defined (__linux__) && !defined(TARGET_S390X) +new_block->host = file_ram_alloc(size, mem_path); +if (!new_block->host) +exit(1); +#else +fprintf(stderr, "-mem-path option unsupported\n"); +exit(1); +#endif +} else { #if defined(TARGET_S390X) && defined(CONFIG_KVM) -/* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */ -new_block->host = mmap((void*)0x100, size, PROT_EXEC|PROT_READ|PROT_WRITE, - MAP_SHARED | MAP_ANONYMOUS, -1, 0); +/* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */ +new_block->host = mmap((void*)0x100, size, +PROT_EXEC|PROT_READ|PROT_WRITE, +MAP_SHARED | MAP_ANONYMOUS, -1, 0); #else -new_block->host = qemu_vmalloc(size); +new_block->host = qemu_vmalloc(size); #endif #ifdef MADV_MERGEABLE -madvise(new_block->host, size, MADV_MERGEABLE); +madvise(new_block->host, size, MADV_MERGEABLE); #endif +} new_block->offset = last_ram_offset; new_block->length = size; diff --git a/qemu-options.hx b/qemu-options.hx index 
7daa246..fd50add 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -314,6 +314,22 @@ a suffix of ``M'' or ``G'' can be used to signify a value in megabytes or gigabytes respectively. ETEXI +DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath, +"-mem-path FILE provide backing storage for guest RAM\n") +STEXI +...@item -mem
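A side note on the rounding step in file_ram_alloc above, `(memory+hpagesize-1) & ~(hpagesize-1)`: it rounds the requested size up to the next multiple of the huge page size, and the mask trick is only valid when that page size is a power of two. A small standalone sketch; the helper names are mine, not from the patch, and the 2 MB page size used when trying it out is just a typical x86 value:

```c
#include <assert.h>
#include <stdint.h>

static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Round size up to a multiple of pagesize. The mask form only works for
 * power-of-two page sizes, hence the assertion.
 */
static uint64_t round_up_to_page(uint64_t size, uint64_t pagesize)
{
    assert(is_power_of_two(pagesize));
    return (size + pagesize - 1) & ~(pagesize - 1);
}
```

A 5 MB request with 2 MB pages becomes 6 MB, while a request that is already a multiple of the page size is left unchanged.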
[PATCH 6/6] x86: Extend validity of bsp_to_cpu
From: Jan Kiszka

As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals cpu_index, bsp_to_cpu can also be based on the latter directly. This will help an early user of it: KVM while initializing mp_state.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
---
 hw/pc.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index bdc297f..e50a488 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -760,7 +760,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)

 int cpu_is_bsp(CPUState *env)
 {
-return env->cpuid_apic_id == 0;
+/* We hard-wire the BSP to the first CPU. */
+return env->cpu_index == 0;
 }

 static CPUState *pc_new_cpu(const char *cpu_model)
--
1.6.6.1
[PATCH 4/6] KVM: Rework VCPU state writeback API
From: Jan Kiszka

This grand cleanup drops all reset and vmsave/load related synchronization points in favor of four(!) generic hooks:

- cpu_synchronize_all_states in qemu_savevm_state_complete
  (initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
  (writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
  (writeback after system reset)

These writeback points + the existing one of VCPU exec after cpu_synchronize_state map on three levels of writeback:

- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE    (on init or vmload, all VCPUs stopped as well)

This level is passed to the arch-specific VCPU state writing function that will decide which concrete substates need to be written. That way, no writer of load, save or reset functions that interact with in-kernel KVM states will ever have to worry about synchronization again. That also means that a lot of reasons for races, segfaults and deadlocks are eliminated.

cpu_synchronize_state remains untouched, just as Anthony suggested. We continue to need it before reading or writing of VCPU states that are also tracked by in-kernel KVM subsystems.

Consequently, this patch removes many cpu_synchronize_state calls that are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
---
 exec.c                |   17 -
 hw/apic.c             |    2 --
 hw/ppc_newworld.c     |    3 ---
 hw/ppc_oldworld.c     |    3 ---
 hw/s390-virtio.c      |    1 -
 kvm-all.c             |   19 +--
 kvm.h                 |   25 -
 savevm.c              |    4
 sysemu.h              |    4
 target-i386/kvm.c     |    2 +-
 target-i386/machine.c |   11 ---
 target-ppc/kvm.c      |    2 +-
 target-ppc/machine.c  |    4
 target-s390x/kvm.c    |    3 +--
 vl.c                  |   29 +
 15 files changed, 77 insertions(+), 52 deletions(-)

diff --git a/exec.c b/exec.c
index f41518e..891e0ee 100644
--- a/exec.c
+++ b/exec.c
@@ -512,21 +512,6 @@ void cpu_exec_init_all(unsigned long tb_size)

 #if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)

-static void cpu_common_pre_save(void *opaque)
-{
-    CPUState *env = opaque;
-
-    cpu_synchronize_state(env);
-}
-
-static int cpu_common_pre_load(void *opaque)
-{
-    CPUState *env = opaque;
-
-    cpu_synchronize_state(env);
-    return 0;
-}
-
 static int cpu_common_post_load(void *opaque, int version_id)
 {
     CPUState *env = opaque;
@@ -544,8 +529,6 @@ static const VMStateDescription vmstate_cpu_common = {
     .version_id = 1,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
-    .pre_save = cpu_common_pre_save,
-    .pre_load = cpu_common_pre_load,
     .post_load = cpu_common_post_load,
     .fields = (VMStateField []) {
         VMSTATE_UINT32(halted, CPUState),
diff --git a/hw/apic.c b/hw/apic.c
index 87e7dc0..3c90f4c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -938,8 +938,6 @@ static void apic_reset(void *opaque)
     APICState *s = opaque;
     int bsp;

-    cpu_synchronize_state(s->cpu_env);
-
     bsp = cpu_is_bsp(s->cpu_env);
     s->apicbase = 0xfee00000 |
         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index bc86c85..d4f9013 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -167,9 +167,6 @@ static void ppc_core99_init (ram_addr_t ram_size,
         envs[i] = env;
     }

-    /* Make sure all register sets take effect */
-    cpu_synchronize_state(env);
-
     /* allocate RAM */
     ram_offset = qemu_ram_alloc(ram_size);
     cpu_register_physical_memory(0, ram_size, ram_offset);
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index 04a7835..93c95ba 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -165,9 +165,6 @@ static void ppc_heathrow_init (ram_addr_t ram_size,
         envs[i] = env;
     }

-    /* Make sure all register sets take effect */
-    cpu_synchronize_state(env);
-
     /* allocate RAM */
     if (ram_size > (2047 << 20)) {
         fprintf(stderr,
diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 3582728..ad3386f 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -185,7 +185,6 @@ static void s390_init(ram_addr_t ram_size,
             exit(1);
         }

-        cpu_synchronize_state(env);
         env->psw.addr = KERN_IMAGE_START;
         env->psw.mask = 0x0000000180000000ULL;
     }
diff --git a/kvm-all.c b/kvm-all.c
index 2f7e33a..534ead0 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -156,10 +156,6 @@ static void kvm_reset_vcpu(void *opaque)
     CPUState *env = opaque;

     kvm_arch_reset_vcpu(env);
-    if (kvm_arch_put_registers(env)) {
-        fprintf(stderr, "Fat
[PATCH 3/6] KVM: Rework of guest debug state writing
From: Jan Kiszka

So far we synchronized any dirty VCPU state back into the kernel before
updating the guest debug state. This was a tribute to a deficit in x86
kernels before 2.6.33. But as this is an arch-dependent issue, it is
better handled in the x86 part of KVM, removing the writeback point from
generic code. This also avoids overwriting the flushed state later on if
user space decides to change some more registers before resuming the
guest.

We furthermore need to reinject guest exceptions via the appropriate
mechanism. That is KVM_SET_GUEST_DEBUG for older kernels and
KVM_SET_VCPU_EVENTS for recent ones. Using both mechanisms at the same
time will cause state corruptions.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
---
 kvm-all.c         |   24 ++++++++++++++++--------
 kvm.h             |    1 +
 target-i386/kvm.c |   47 +++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1a02076..2f7e33a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -65,6 +65,7 @@ struct KVMState
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
+    int robust_singlestep;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -659,6 +660,12 @@ int kvm_init(int smp_cpus)
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif

+    s->robust_singlestep = 0;
+#ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
+    s->robust_singlestep =
+        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
+#endif
+
     ret = kvm_arch_init(s, smp_cpus);
     if (ret < 0)
         goto err;
@@ -917,6 +924,11 @@ int kvm_has_vcpu_events(void)
     return kvm_state->vcpu_events;
 }

+int kvm_has_robust_singlestep(void)
+{
+    return kvm_state->robust_singlestep;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
     if (!kvm_has_sync_mmu()) {
@@ -974,10 +986,6 @@ static void kvm_invoke_set_guest_debug(void *data)
     struct kvm_set_guest_debug_data *dbg_data = data;
     CPUState *env = dbg_data->env;

-    if (env->kvm_vcpu_dirty) {
-        kvm_arch_put_registers(env);
-        env->kvm_vcpu_dirty = 0;
-    }
     dbg_data->err = kvm_vcpu_ioctl(env, KVM_SET_GUEST_DEBUG, &dbg_data->dbg);
 }

@@ -985,12 +993,12 @@ int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap)
 {
     struct kvm_set_guest_debug_data data;

-    data.dbg.control = 0;
-    if (env->singlestep_enabled)
-        data.dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+    data.dbg.control = reinject_trap;
+    if (env->singlestep_enabled) {
+        data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+    }

     kvm_arch_update_guest_debug(env, &data.dbg);
-    data.dbg.control |= reinject_trap;

     data.env = env;
     on_vcpu(env, kvm_invoke_set_guest_debug, &data);
diff --git a/kvm.h b/kvm.h
index a74dfcb..a602e45 100644
--- a/kvm.h
+++ b/kvm.h
@@ -40,6 +40,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);

 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
+int kvm_has_robust_singlestep(void);

 void kvm_setup_guest_memory(void *start, size_t size);

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d2116a7..e0247ea 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -852,6 +852,37 @@ static int kvm_get_vcpu_events(CPUState *env)
     return 0;
 }

+static int kvm_guest_debug_workarounds(CPUState *env)
+{
+    int ret = 0;
+#ifdef KVM_CAP_SET_GUEST_DEBUG
+    unsigned long reinject_trap = 0;
+
+    if (!kvm_has_vcpu_events()) {
+        if (env->exception_injected == 1) {
+            reinject_trap = KVM_GUESTDBG_INJECT_DB;
+        } else if (env->exception_injected == 3) {
+            reinject_trap = KVM_GUESTDBG_INJECT_BP;
+        }
+        env->exception_injected = -1;
+    }
+
+    /*
+     * Kernels before KVM_CAP_X86_ROBUST_SINGLESTEP overwrote flags.TF
+     * injected via SET_GUEST_DEBUG while updating GP regs. Work around this
+     * by updating the debug state once again if single-stepping is on.
+     * Another reason to call kvm_update_guest_debug here is a pending debug
+     * trap raised by the guest. On kernels without SET_VCPU_EVENTS we have to
+     * reinject them via SET_GUEST_DEBUG.
+     */
+    if (reinject_trap ||
+        (!kvm_has_robust_singlestep() && env->singlestep_enabled)) {
+        ret = kvm_update_guest_debug(env, reinject_trap);
+    }
+#endif /* KVM_CAP_SET_GUEST_DEBUG */
+    return ret;
+}
+
 int kvm_arch_put_registers(CPUState *env)
 {
     int ret;
@@ -880,6 +911,11 @@ int kvm_arch_put_registers(CPUState *env)
     if (ret < 0)
         return ret;

+    /* must be last */
+    ret = kvm_guest_debug_workarounds(env);
+    if (ret < 0)
+        return ret;
+
     return 0;
 }

@@ -1123,10 +1159,13 @@ int kvm_arch_debug(struct kvm_debug_exit_arch *arch_info)
     } else if (kvm_find_sw_breakpoint(cpu_single_env, arch_info->pc))
[PATCH 5/6] KVM: x86: Restrict writeback of VCPU state
From: Jan Kiszka

Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
through a reset. And TSC as well as KVM wallclocks should only be
written on full sync, otherwise we risk dropping some time on state
read-modify-write.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   32 ++++++++++++++++++++------------
 1 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2c834df..40f8303 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -546,7 +546,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
     entry->data = value;
 }

-static int kvm_put_msrs(CPUState *env)
+static int kvm_put_msrs(CPUState *env, int level)
 {
     struct {
         struct kvm_msrs info;
@@ -560,7 +560,6 @@ static int kvm_put_msrs(CPUState *env)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star(env))
         kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-    kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
 #ifdef TARGET_X86_64
     /* FIXME if lm capable */
     kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -568,8 +567,12 @@ static int kvm_put_msrs(CPUState *env)
     kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
     kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
 #endif
-    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
-    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+    if (level == KVM_PUT_FULL_STATE) {
+        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
+                          env->system_time_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+    }

     msr_data.info.nmsrs = n;

@@ -782,7 +785,7 @@ static int kvm_get_mp_state(CPUState *env)
     return 0;
 }

-static int kvm_put_vcpu_events(CPUState *env)
+static int kvm_put_vcpu_events(CPUState *env, int level)
 {
 #ifdef KVM_CAP_VCPU_EVENTS
     struct kvm_vcpu_events events;
@@ -806,8 +809,11 @@ static int kvm_put_vcpu_events(CPUState *env)

     events.sipi_vector = env->sipi_vector;

-    events.flags =
-        KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+    events.flags = 0;
+    if (level >= KVM_PUT_RESET_STATE) {
+        events.flags |=
+            KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+    }

     return kvm_vcpu_ioctl(env, KVM_SET_VCPU_EVENTS, &events);
 #else
@@ -899,15 +905,17 @@ int kvm_arch_put_registers(CPUState *env, int level)
     if (ret < 0)
         return ret;

-    ret = kvm_put_msrs(env);
+    ret = kvm_put_msrs(env, level);
     if (ret < 0)
         return ret;

-    ret = kvm_put_mp_state(env);
-    if (ret < 0)
-        return ret;
+    if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_mp_state(env);
+        if (ret < 0)
+            return ret;
+    }

-    ret = kvm_put_vcpu_events(env);
+    ret = kvm_put_vcpu_events(env, level);
     if (ret < 0)
         return ret;

--
1.6.6.1
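
The level gating in this patch can be pictured outside of QEMU with a small standalone sketch. The KVM_PUT_* names are taken from the patch series; their numeric values, the SUBSTATE_* flags and the function name are assumptions made purely for illustration and are not QEMU definitions. Only the ordering of the levels matters here.

```c
#include <assert.h>

/* Writeback levels as named in the series; the values are assumed. */
enum {
    KVM_PUT_RUNTIME_STATE = 1, /* during runtime, other VCPUs keep running */
    KVM_PUT_RESET_STATE   = 2, /* synchronous system reset, VCPUs stopped */
    KVM_PUT_FULL_STATE    = 3  /* machine init or vmload, VCPUs stopped */
};

/* Hypothetical flags, one per group of substates the patch treats alike. */
enum {
    SUBSTATE_REGS       = 1 << 0, /* written at every level */
    SUBSTATE_MP_STATE   = 1 << 1, /* reset level and above */
    SUBSTATE_NMI_SIPI   = 1 << 2, /* reset level and above */
    SUBSTATE_TSC_KVMCLK = 1 << 3  /* full sync only */
};

/* Return the set of substates that would be written back at 'level'. */
static unsigned sketch_substates_for_level(int level)
{
    unsigned mask = SUBSTATE_REGS;

    assert(level >= KVM_PUT_RUNTIME_STATE && level <= KVM_PUT_FULL_STATE);

    if (level >= KVM_PUT_RESET_STATE) {
        mask |= SUBSTATE_MP_STATE | SUBSTATE_NMI_SIPI;
    }
    if (level == KVM_PUT_FULL_STATE) {
        mask |= SUBSTATE_TSC_KVMCLK;
    }
    return mask;
}
```

A runtime writeback therefore leaves the TSC and the KVM clock MSRs alone, which is the read-modify-write time loss the commit message is concerned about.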
[PATCH 1/6] Allocate memory below 4GB as one chunk
From: Avi Kivity

Instead of allocating a separate chunk for the first 640KB and another
for 1MB+, allocate one large chunk. This plays well in terms of
alignment and size with large pages.

Signed-off-by: Avi Kivity
Signed-off-by: Marcelo Tosatti
---
 hw/pc.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 4f6a522..bdc297f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -833,18 +833,11 @@ static void pc_init1(ram_addr_t ram_size,
     vmport_init();

     /* allocate RAM */
-    ram_addr = qemu_ram_alloc(0xa0000);
+    ram_addr = qemu_ram_alloc(below_4g_mem_size);
     cpu_register_physical_memory(0, 0xa0000, ram_addr);
-
-    /* Allocate, even though we won't register, so we don't break the
-     * phys_ram_base + PA assumption. This range includes vga (0xa0000 - 0xc0000),
-     * and some bios areas, which will be registered later
-     */
-    ram_addr = qemu_ram_alloc(0x100000 - 0xa0000);
-    ram_addr = qemu_ram_alloc(below_4g_mem_size - 0x100000);
     cpu_register_physical_memory(0x100000,
                  below_4g_mem_size - 0x100000,
-                 ram_addr);
+                 ram_addr + 0x100000);

     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
--
1.6.6.1
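
The effect of the single allocation can be sketched as a small address translation helper. This is only an illustration under stated assumptions: the function name and the two macros are made up here, with the macros standing for the 640KB and 1MB boundaries mentioned in the commit message (0xa0000 and 0x100000). With one chunk, the offset into the RAM block is simply the base plus the guest-physical address, and the range between 640KB and 1MB stays unregistered as ordinary RAM.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LOW_MEM_END    0xa0000u  /* 640KB: end of conventional memory */
#define SKETCH_HIGH_MEM_START 0x100000u /* 1MB: RAM resumes above the hole */

/*
 * Translate a guest-physical address below 4GB into an offset inside the
 * single RAM chunk. Returns 0 and stores the offset on success, or -1 for
 * addresses that this layout does not register as ordinary RAM.
 */
static int sketch_low_ram_offset(uint64_t gpa, uint64_t below_4g_mem_size,
                                 uint64_t ram_addr, uint64_t *offset)
{
    assert(offset);
    assert(below_4g_mem_size >= SKETCH_HIGH_MEM_START);

    if (gpa < SKETCH_LOW_MEM_END ||
        (gpa >= SKETCH_HIGH_MEM_START && gpa < below_4g_mem_size)) {
        *offset = ram_addr + gpa; /* same offset as the physical address */
        return 0;
    }
    return -1;
}
```

Because both registered ranges share one base, no dummy allocation is needed to keep offsets and physical addresses aligned.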
[PATCH 0/6] [PULL] qemu-kvm.git uq/master queue
The following changes since commit 55b1e61f640bb2cf3bed0b4cc6d4ba1326c625d9:
  Samuel Thibault (1):
        (curses) Use more descriptive values

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Avi Kivity (1):
      Allocate memory below 4GB as one chunk

Jan Kiszka (4):
      KVM: Rework of guest debug state writing
      KVM: Rework VCPU state writeback API
      KVM: x86: Restrict writeback of VCPU state
      x86: Extend validity of bsp_to_cpu

Marcelo Tosatti (1):
      Add option to use file backed guest memory

 cpu-all.h             |    3 +
 exec.c                |  132
 hw/apic.c             |    2 -
 hw/pc.c               |   14 ++
 hw/ppc_newworld.c     |    3 -
 hw/ppc_oldworld.c     |    3 -
 hw/s390-virtio.c      |    1 -
 kvm-all.c             |   43 +++-
 kvm.h                 |   26 +-
 qemu-options.hx       |   16 ++
 savevm.c              |    4 ++
 sysemu.h              |    4 ++
 target-i386/kvm.c     |   77 +++--
 target-i386/machine.c |   11
 target-ppc/kvm.c      |    2 +-
 target-ppc/machine.c  |    4 --
 target-s390x/kvm.c    |    3 +-
 vl.c                  |   41 +++
 18 files changed, 300 insertions(+), 89 deletions(-)
Re: [PATCH -v3] Add savevm/loadvm support for MCE
On Wed, Mar 03, 2010 at 04:52:46PM +0800, Huang Ying wrote:
> MCE registers are saved/load into/from CPUState in
> kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon
> reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE.
>
> v3:
>
>  - use msrs[] in kvm_arch_load/save_regs and get_msr_entry directly.
>
> v2:
>
>  - Rebased on new CPU registers save/load framework.
>
> Signed-off-by: Huang Ying
> ---
>  qemu-kvm-x86.c |   36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)

Applied, thanks.
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> Hi,
>
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes. Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the
> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2.
> Anyway, this post is to get feedback about the overall concept of these
> patches. Please review and give feedback :-)

Joerg,

What perf gain does this bring? (I'm not aware of the current
overhead.)

Overall comments:

Can't you translate l2_gpa -> l1_gpa by walking the current l1 nested
pagetable, and pass that to the kvm tdp fault path (with the correct
context setup)?

You probably need to include a flag in base_role to differentiate
between l1 / l2 shadow tables (say if they use the same cr3 value).
Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp
On Thu, Mar 04, 2010 at 12:35:45PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
> >>>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
> >>>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
> >>>>>> will help an early user of it: KVM while initializing mp_state.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka
> >>>>>> ---
> >>>>>>  hw/pc.c |    3 ++-
> >>>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/pc.c b/hw/pc.c
> >>>>>> index b90a79e..58c32ea 100644
> >>>>>> --- a/hw/pc.c
> >>>>>> +++ b/hw/pc.c
> >>>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
> >>>>>>
> >>>>>>  int cpu_is_bsp(CPUState *env)
> >>>>>>  {
> >>>>>> -    return env->cpuid_apic_id == 0;
> >>>>>> +    /* We hard-wire the BSP to the first CPU. */
> >>>>>> +    return env->cpu_index == 0;
> >>>>>>  }
> >>>>> We should not assume that. The function was written like that
> >>>>> specifically so the code around it will not rely on this assumption.
> >>>>> Now you change that specifically to write code that will do incorrect
> >>>>> assumptions. I don't see the logic here.
> >>>> The logic is that we do not support any other mapping yet - with or
> >>>> without this change. Without it, we complicate the APIC initialization
> >>>> for (so far) no good reason. Once we want to support different BSP
> >>>> assignments, we need to go through the code and rework some parts anyway.
> >>>>
> >>> As far as I remember the only part that was missing was a command line to
> >>> specify apic IDs for each CPU and what CPU is BSP. The code was ready
> >>> otherwise. It's very sad if this was broken by other modifications. But
> >>> changes like that actually push us back from our goal. Why not rework
> >>> code so it will work with correct cpu_is_bsp() function instead of
> >>> introducing this hack?
> >> If you can confirm that there is a serious use case behind it, I will
> >> look into this again. But so far, I did not find it.
> >>
> > First of all, it is a correctness issue. We should emulate the x86
> > platform and nothing there says that BSP apic id is zero. Second, part
> > of the CPU topology information is encoded in the apic id, i.e. when
> > socket/core/ht topology is used we can't just arbitrarily specify apic
> > ids for each logical cpu, we should follow the rules described in the
> > SDM. For instance, when more than 16 CPUs are present AMD advises to
> > start numbering apic ids from 16 and leave the first 16 IDs for IOAPICs.
> > And third, the introduction of this hack shows that something is done
> > wrong in other places of the code. Somewhere initialization order is
> > incorrect.
>
> Well, it looks like we need to answer two questions: How shall the user
> specify the BSP? And how to reliably map this on QEMU's internal
> cpu_index? Depending on this, apic numbering may or may not be an
> orthogonal issue.
>
Two good questions :) We can extend the -cpu option to let us specify a
base apic id for each socket. Apic ids of logical cpus are derived from
this base apic id depending on where in the hierarchy the logical cpu
resides. The cpu_index thing in QEMU is pretty messy. The way non-x86
arches use it makes it hard to clean up, so this is why I didn't want to
rely on it at all and use the apic id instead. Thinking about it,
cpu_is_bsp() should really check the BSP bit in the apic base register.

> BTW, do real systems allow to hot plug BSP as well? Or how is the case
> handled when you unplug the BSP and then reboot the box?
>
Did you mean hot unplug BSP? The OS determines which CPU is the BSP by
checking the BSP bit in the APIC base register. My guess is that there is
some pin on the CPU whose value is mirrored as the BSP bit in the APIC
base register. The board may have some logic to check which sockets are
populated and choose one of them as BSP by pulling its pin up. But this
is only a guess.

--
Gleb.
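
The suggestion that cpu_is_bsp() should look at the BSP bit in the APIC base register can be sketched as follows. This is not the code from the thread: the struct and the sketch_* functions are hypothetical stand-ins, and only the bit positions follow the architectural layout of IA32_APIC_BASE (bit 8 for BSP, bit 11 for the global enable). The point is that the check depends on neither cpu_index nor the APIC ID being zero.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the IA32_APIC_BASE MSR. */
#define MSR_IA32_APICBASE_BSP    (1u << 8)
#define MSR_IA32_APICBASE_ENABLE (1u << 11)
#define SKETCH_APIC_DEFAULT_BASE 0xfee00000u

/* Hypothetical, reduced stand-in for QEMU's per-CPU state. */
struct sketch_cpu {
    int      cpu_index;     /* QEMU-internal enumeration order */
    uint32_t cpuid_apic_id; /* APIC ID as seen by the guest */
    uint64_t apicbase;      /* contents of IA32_APIC_BASE */
};

/* Board-level reset: the board decides which CPU carries the BSP bit. */
static void sketch_apic_reset(struct sketch_cpu *cpu, int is_boot_cpu)
{
    assert(cpu);
    cpu->apicbase = SKETCH_APIC_DEFAULT_BASE | MSR_IA32_APICBASE_ENABLE |
                    (is_boot_cpu ? MSR_IA32_APICBASE_BSP : 0u);
}

/* BSP check based on the register contents only. */
static int sketch_cpu_is_bsp(const struct sketch_cpu *cpu)
{
    assert(cpu);
    return (cpu->apicbase & MSR_IA32_APICBASE_BSP) != 0;
}
```

In this sketch a CPU with cpu_index 1 and APIC ID 17 can be the BSP, as long as the board marked it as such at reset.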
Re: [PATCH 2/2] QEMU-KVM: Ask kernel about supported svm features
Joerg Roedel wrote:
> On Wed, Mar 03, 2010 at 11:58:49PM +0100, Alexander Graf wrote:
>
>> On 03.03.2010 at 20:15, Joerg Roedel wrote:
>>
>>> This patch adds code to ask the kernel about the svm
>>> features it supports for its guests and propagates them to
>>> the guest. The new capability is necessary because the old
>>> behavior of the kernel was to just return the host svm
>>> features but every svm-feature needs emulation in the nested
>>> svm kernel code. The new capability indicates that the
>>> kernel is aware of that when returning svm cpuid
>>> information.
>>>
>> Do we really need that complexity?
>>
>
> Yes :-)
>
>> By default the kernel masks out unsupported cpuid features anyway. So
>> if we don't have npt guest support (enabled), the kernel module should
>> just mask it out.
>>
>
> The kernel does not mask out unsupported features. I also don't think
> this would be a good idea because userspace won't be aware of that
> change.
> Fact is, we need a way to report the npt-emulation feature to userspace
> because old kvm versions don't support it. So we can't pass the npt bit
> unconditionally. The get_supported_cpuid ioctl is the way of choice
> here.
> But the current way get_supported_cpuid works for function 0x8000000a is
> broken because it reports the host features. This was the reason to
> introduce the new capability.
>

That's what I mean by masking. It used to happen implicitly, but has
apparently been changed to directly asking the kernel for its
capabilities.

>> IOW, always passing npt should work. No capability should make it
>> get masked out.
>>
>
> No, as stated above always passing npt-bit into the kernel and letting
> it mask out there isn't a good way to go (not only because this will
> break if you use new qemu-kvm on old kernel-space).
>

Ah, so we did have a bug in old KVM kernel modules. Sigh.

Alex
Re: [PATCH 2/2] QEMU-KVM: Ask kernel about supported svm features
On Wed, Mar 03, 2010 at 11:58:49PM +0100, Alexander Graf wrote:
>
> On 03.03.2010 at 20:15, Joerg Roedel wrote:
>
> >This patch adds code to ask the kernel about the svm
> >features it supports for its guests and propagates them to
> >the guest. The new capability is necessary because the old
> >behavior of the kernel was to just return the host svm
> >features but every svm-feature needs emulation in the nested
> >svm kernel code. The new capability indicates that the
> >kernel is aware of that when returning svm cpuid
> >information.
>
> Do we really need that complexity?

Yes :-)

> By default the kernel masks out unsupported cpuid features anyway. So
> if we don't have npt guest support (enabled), the kernel module should
> just mask it out.

The kernel does not mask out unsupported features. I also don't think
this would be a good idea because userspace won't be aware of that
change.
Fact is, we need a way to report the npt-emulation feature to userspace
because old kvm versions don't support it. So we can't pass the npt bit
unconditionally. The get_supported_cpuid ioctl is the way of choice
here.
But the current way get_supported_cpuid works for function 0x8000000a is
broken because it reports the host features. This was the reason to
introduce the new capability.

> IOW, always passing npt should work. No capability should make it
> get masked out.

No, as stated above always passing npt-bit into the kernel and letting
it mask out there isn't a good way to go (not only because this will
break if you use new qemu-kvm on old kernel-space).

Joerg
Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp
Gleb Natapov wrote:
> On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
>>>> Gleb Natapov wrote:
>>>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
>>>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
>>>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
>>>>>> will help an early user of it: KVM while initializing mp_state.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka
>>>>>> ---
>>>>>>  hw/pc.c |    3 ++-
>>>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/pc.c b/hw/pc.c
>>>>>> index b90a79e..58c32ea 100644
>>>>>> --- a/hw/pc.c
>>>>>> +++ b/hw/pc.c
>>>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
>>>>>>
>>>>>>  int cpu_is_bsp(CPUState *env)
>>>>>>  {
>>>>>> -    return env->cpuid_apic_id == 0;
>>>>>> +    /* We hard-wire the BSP to the first CPU. */
>>>>>> +    return env->cpu_index == 0;
>>>>>>  }
>>>>> We should not assume that. The function was written like that
>>>>> specifically so the code around it will not rely on this assumption.
>>>>> Now you change that specifically to write code that will do incorrect
>>>>> assumptions. I don't see the logic here.
>>>> The logic is that we do not support any other mapping yet - with or
>>>> without this change. Without it, we complicate the APIC initialization
>>>> for (so far) no good reason. Once we want to support different BSP
>>>> assignments, we need to go through the code and rework some parts anyway.
>>>>
>>> As far as I remember the only part that was missing was a command line to
>>> specify apic IDs for each CPU and what CPU is BSP. The code was ready
>>> otherwise. It's very sad if this was broken by other modifications. But
>>> changes like that actually push us back from our goal. Why not rework
>>> code so it will work with correct cpu_is_bsp() function instead of
>>> introducing this hack?
>> If you can confirm that there is a serious use case behind it, I will
>> look into this again. But so far, I did not find it.
>>
> First of all, it is a correctness issue. We should emulate the x86
> platform and nothing there says that BSP apic id is zero. Second, part
> of the CPU topology information is encoded in the apic id, i.e. when
> socket/core/ht topology is used we can't just arbitrarily specify apic
> ids for each logical cpu, we should follow the rules described in the
> SDM. For instance, when more than 16 CPUs are present AMD advises to
> start numbering apic ids from 16 and leave the first 16 IDs for IOAPICs.
> And third, the introduction of this hack shows that something is done
> wrong in other places of the code. Somewhere initialization order is
> incorrect.

Well, it looks like we need to answer two questions: How shall the user
specify the BSP? And how to reliably map this on QEMU's internal
cpu_index? Depending on this, apic numbering may or may not be an
orthogonal issue.

BTW, do real systems allow to hot plug BSP as well? Or how is the case
handled when you unplug the BSP and then reboot the box?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On Thu, Mar 04, 2010 at 12:44:48AM +0100, Alexander Graf wrote:
>
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
>
> > Hi,
> >
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes. Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has the more is
> > the performance drop from l1 to l2.
> > Anyway, this post is to get feedback about the overall concept of these
> > patches. Please review and give feedback :-)
>
> Nice job! It's great to see you finally got around to it :-).
>
> Have you tracked what slows down SMP l2 guests yet? So far I've been
> assuming that IPIs just completely kill the performance, but I guess
> it shouldn't be that bad, especially now where you have sped up the
> #VMEXIT path that much.

I have not yet looked deeper into this issue. I also suspect lockholder
preemption to be the cause for this. I did the test with a populated
nested page table too and the slowdown is still there. But that's all
guessing, I need to do some research for the exact reasons.

Joerg
Re: [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace
On Thu, Mar 04, 2010 at 12:37:42AM +0100, Alexander Graf wrote:
>
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
>
> > This patch implements the reporting of the nested paging
> > feature support to userspace.
> >
> > Signed-off-by: Joerg Roedel
> > ---
> >  arch/x86/kvm/svm.c |   10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index fe1398e..ce71023 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -3289,6 +3289,16 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> >
> >  static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> >  {
> > +        switch (func) {
> > +        case 0x8000000A:
> > +                if (!npt_enabled)
> > +                        break;
>
> if (!nested)
>         break;

True, but shouldn't matter much because if nested is off the guest will
not see the svm bit. It would only see that the processor could do
nested paging if it had svm support ;-)

Joerg
Clock issue in Windows XP guests
Hi,

I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm 0.12.2.
I have multiple guests, and one of them is running Windows XP. If I
stare at the clock, I see that every now and then (about every 5s) it
slows down a bit and then tries to compensate. If I run some NTP
synchronization software like ntpd, the offset is as high as 1s lost
every 10s or so, which makes it impossible to use anything time-based on
the guest (audio stuff, mainly).

I tried messing (as suggested on IRC) with the -rtc parameters, but to
no avail. I tried the driftfix=slew option found in the --help output,
but it says that driftfix is not a valid setting for rtc. And anyway, I
have no idea what this does (I'll probably be reading about it...).

I've seen something remotely connected to this on the proxmox forum, but
it was not that helpful (and proxmox runs qemu 0.11.x, it seems):
http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962

I remember using -rtc-td-hack (and in fact, just read about it again
here:
http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests),
but it's not there anymore in 0.12.x, and I have no idea what it used to
do (I'll go do some reading on that as well, when I have some time ;))

Oh, and the guests running Linux are working just fine, and have no
clock issue.

Has anyone encountered such a problem?

Regards,
Gilou
[PATCH] qemu-kvm: Fix boot CPU setup for the case it is unsupported
Commit 52b03dd702 incorrectly failed KVM initialization in case the
kernel did not support KVM_CAP_SET_BOOT_CPU_ID. Fix this, and also
improve error propagation of kvm_create_context while at it.

Signed-off-by: Jan Kiszka
---

OK, it really was me. :)

 qemu-kvm-x86.c |    9 +++++++--
 qemu-kvm.c     |    4 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 7a5925a..7d42fdc 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -672,7 +672,7 @@ static const VMStateDescription vmstate_kvmclock= {

 int kvm_arch_qemu_create_context(void)
 {
-    int i;
+    int i, r;
     struct utsname utsname;

     uname(&utsname);
@@ -696,7 +696,12 @@ int kvm_arch_qemu_create_context(void)
     vmstate_register(0, &vmstate_kvmclock, &kvmclock_data);
 #endif

-    return kvm_set_boot_cpu_id(0);
+    r = kvm_set_boot_cpu_id(0);
+    if (r < 0 && r != -ENOSYS) {
+        return r;
+    }
+
+    return 0;
 }

 static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 222ca97..e417f21 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2091,8 +2091,10 @@ static int kvm_create_context(void)
         return -1;
     }
     r = kvm_arch_qemu_create_context();
-    if (r < 0)
+    if (r < 0) {
         kvm_finalize(kvm_state);
+        return -1;
+    }
     if (kvm_pit && !kvm_pit_reinject) {
         if (kvm_reinject_control(kvm_context, 0)) {
             fprintf(stderr, "failure to disable in-kernel PIT reinjection\n");
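
The error handling pattern of this fix, treating "not supported" as a non-fatal outcome while still propagating real failures, can be tried in isolation. The sketch below does not use the qemu-kvm functions: the function pointer type, the three stubs and sketch_setup_boot_cpu are hypothetical names introduced only to exercise the three cases.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a call like kvm_set_boot_cpu_id(). */
typedef int (*sketch_set_boot_cpu_fn)(int cpu_id);

/* Stubs modelling a supporting kernel, an old kernel and a real error. */
static int sketch_stub_supported(int cpu_id)   { (void)cpu_id; return 0; }
static int sketch_stub_unsupported(int cpu_id) { (void)cpu_id; return -ENOSYS; }
static int sketch_stub_failing(int cpu_id)     { (void)cpu_id; return -EINVAL; }

/* Set the boot CPU, tolerating kernels that lack the capability. */
static int sketch_setup_boot_cpu(sketch_set_boot_cpu_fn set_boot_cpu_id)
{
    int r;

    assert(set_boot_cpu_id);
    r = set_boot_cpu_id(0);
    if (r < 0 && r != -ENOSYS) {
        return r; /* genuine failure: pass it up */
    }
    return 0;     /* success, or feature absent and default kept */
}
```

Only the -EINVAL stub makes the setup fail; the -ENOSYS case falls through to success, which is the behaviour the patch restores for kernels without KVM_CAP_SET_BOOT_CPU_ID.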
Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp
On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
> >>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
> >>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
> >>>> will help an early user of it: KVM while initializing mp_state.
> >>>>
> >>>> Signed-off-by: Jan Kiszka
> >>>> ---
> >>>>  hw/pc.c |    3 ++-
> >>>>  1 files changed, 2 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/hw/pc.c b/hw/pc.c
> >>>> index b90a79e..58c32ea 100644
> >>>> --- a/hw/pc.c
> >>>> +++ b/hw/pc.c
> >>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
> >>>>
> >>>>  int cpu_is_bsp(CPUState *env)
> >>>>  {
> >>>> -    return env->cpuid_apic_id == 0;
> >>>> +    /* We hard-wire the BSP to the first CPU. */
> >>>> +    return env->cpu_index == 0;
> >>>>  }
> >>>>
> >>> We should not assume that. The function was written like that
> >>> specifically so the code around it will not rely on this assumption.
> >>> Now you change that specifically to write code that will make incorrect
> >>> assumptions. I don't see the logic here.
> >> The logic is that we do not support any other mapping yet - with or
> >> without this change. Without it, we complicate the APIC initialization
> >> for (so far) no good reason. Once we want to support different BSP
> >> assignments, we need to go through the code and rework some parts anyway.
> >>
> > As far as I remember the only part that was missing was a command line to
> > specify apic IDs for each CPU and what CPU is BSP. The code was ready
> > otherwise. It's very sad if this was broken by other modifications. But
> > changes like that actually push us back from our goal. Why not rework
> > code so it will work with correct cpu_is_bsp() function instead of
> > introducing this hack?
>
> If you can confirm that there is a serious use case behind it, I will
> look into this again. But so far, I did not find it.
>
First of all, it is a correctness issue. We should emulate the x86
platform, and nothing there says that the BSP's APIC ID is zero.

Second, part of the CPU topology information is encoded in the APIC ID,
i.e. when socket/core/HT topology is used we can't just arbitrarily
specify APIC IDs for each logical CPU; we should follow the rules
described in the SDM. For instance, when more than 16 CPUs are present,
AMD advises starting the APIC ID numbering from 16 and leaving the first
16 IDs for IOAPICs.

And third, the introduction of this hack shows that something is done
wrong in other places of the code: somewhere the initialization order is
incorrect.

--
			Gleb.
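The point about topology being encoded in the APIC ID can be made concrete with a small sketch. It assumes the scheme described in the SDM as commonly understood: the thread, core and package numbers each occupy a bit field whose width is rounded up to a power of two. The function names and the 2-socket, 3-core, 2-thread layout are made up for illustration and are not QEMU code.

```python
def field_width(count):
    """Bits needed to number `count` items (0 bits when count is 1)."""
    return (count - 1).bit_length()


def apic_id(cores_per_pkg, threads_per_core, pkg, core, thread):
    """Pack (package, core, thread) into an APIC ID, thread field lowest."""
    thread_bits = field_width(threads_per_core)
    core_bits = field_width(cores_per_pkg)
    return (pkg << (core_bits + thread_bits)) | (core << thread_bits) | thread


def enumerate_cpus(packages, cores_per_pkg, threads_per_core):
    """Return (cpu_index, apic_id) pairs in creation order."""
    pairs = []
    for pkg in range(packages):
        for core in range(cores_per_pkg):
            for thread in range(threads_per_core):
                aid = apic_id(cores_per_pkg, threads_per_core,
                              pkg, core, thread)
                pairs.append((len(pairs), aid))
    return pairs


if __name__ == "__main__":
    # Hypothetical layout: 2 packages, 3 cores each, 2 threads per core.
    for cpu_index, aid in enumerate_cpus(2, 3, 2):
        print("cpu_index=%d apic_id=%d" % (cpu_index, aid))
```

With three cores per package the core field needs two bits, so the second package starts at APIC ID 8 while its first CPU has cpu_index 6. As soon as a count is not a power of two, the identity cpuid_apic_id == cpu_index no longer holds for every CPU, although CPU 0 still gets APIC ID 0 in this particular scheme.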
Re: segfault at start with latest qemu-kvm.git
Jan Kiszka wrote:
> David S. Ahern wrote:
>> On 03/03/2010 04:20 PM, Jan Kiszka wrote:
>>> David S. Ahern wrote:
>>>> On 03/03/2010 04:08 PM, Jan Kiszka wrote:
>>>>> David S. Ahern wrote:
>>>>>> With latest qemu-kvm.git I am getting a segfault at start:
>>>>>>
>>>>>> /tmp/qemu-kvm-test/bin/qemu-system-x86_64 -m 1024 -smp 2 \
>>>>>>   -drive file=/images/f12-x86_64.img,if=virtio,cache=none,boot=on
>>>>>>
>>>>>> kvm_create_vcpu: Invalid argument
>>>>>> Segmentation fault (core dumped)
>>>>>>
>>>>>>
>>>>>> git bisect points to:
>>>>>>
>>>>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>>>>>> [52b03dd70261934688cb00768c4b1e404716a337] qemu-kvm: Move
>>>>>> kvm_set_boot_cpu_id
>>>>>>
>>>>>>
>>>>>> $ git show
>>>>>> commit 7811d4e8ec057d25db68f900be1f09a142faca49
>>>>>> Author: Marcelo Tosatti
>>>>>> Date:   Mon Mar 1 21:36:31 2010 -0300
>>>>>>
>>>>>>
>>>>>> If I manually back out the patch it will boot fine.
>>>>>>
>>>>> Problem persists after removing the build directory and doing a fresh
>>>>> configure && make? I'm asking before taking the bug (which would be
>>>>> mine, likely) as I recently spent some hours "debugging" a volatile
>>>>> build system issue.
>>>>>
>>>>> Jan
>>>>>
>>>> Before sending the email I pulled a fresh clone in a completely
>>>> different directory (/tmp) to determine if it was something I
>>>> introduced. I then went back to my usual location, unapplied the
>>>> patch and it worked fine.
>>> OK, that reason can be excluded. What's your host kernel kvm version?
>>>
>>> (Of course, the issue does not show up here. But virtio currently does
>>> not boot for me - independent of my patch.)
>>>
>>> Jan
>>>
>> Fedora Core 12,
>>
>> Linux daahern-lx 2.6.31.12-174.2.22.fc12.x86_64 #1 SMP Fri Feb 19
>> 18:55:03 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>>
>
> Reproduced after switching back to kvm-kmod-2.6.31, will debug.
>

Subtle memory corruption: qemu_malloc is returning a pointer that
happens to become kvm_state twice. I bet my patch just exchanges some of
the involved parties and exposes the issue more prominently.

Trying to understand malloc's issue now...

Jan
Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp
Gleb Natapov wrote:
> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
>>>> will help an early user of it: KVM while initializing mp_state.
>>>>
>>>> Signed-off-by: Jan Kiszka
>>>> ---
>>>>  hw/pc.c |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/hw/pc.c b/hw/pc.c
>>>> index b90a79e..58c32ea 100644
>>>> --- a/hw/pc.c
>>>> +++ b/hw/pc.c
>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
>>>>
>>>>  int cpu_is_bsp(CPUState *env)
>>>>  {
>>>> -    return env->cpuid_apic_id == 0;
>>>> +    /* We hard-wire the BSP to the first CPU. */
>>>> +    return env->cpu_index == 0;
>>>>  }
>>>>
>>> We should not assume that. The function was written like that
>>> specifically so the code around it will not rely on this assumption.
>>> Now you change that specifically to write code that will make incorrect
>>> assumptions. I don't see the logic here.
>> The logic is that we do not support any other mapping yet - with or
>> without this change. Without it, we complicate the APIC initialization
>> for (so far) no good reason. Once we want to support different BSP
>> assignments, we need to go through the code and rework some parts anyway.
>>
> As far as I remember the only part that was missing was a command line to
> specify apic IDs for each CPU and what CPU is BSP. The code was ready
> otherwise. It's very sad if this was broken by other modifications. But
> changes like that actually push us back from our goal. Why not rework
> code so it will work with correct cpu_is_bsp() function instead of
> introducing this hack?

If you can confirm that there is a serious use case behind it, I will
look into this again. But so far, I did not find it.

Jan
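The two versions of cpu_is_bsp() in the patch agree only while cpuid_apic_id equals cpu_index. A minimal model of that premise, with a hypothetical `Cpu` tuple standing in for the two CPUState fields and a made-up `apic_id_base` parameter (not an existing QEMU option), might look like this:

```python
from collections import namedtuple

# Hypothetical stand-in for the two CPUState fields the patch touches.
Cpu = namedtuple("Cpu", ["cpu_index", "cpuid_apic_id"])


def cpu_is_bsp_by_apic_id(cpu):
    """Predicate before the patch: the BSP is the CPU with APIC ID 0."""
    return cpu.cpuid_apic_id == 0


def cpu_is_bsp_by_index(cpu):
    """Predicate after the patch: the BSP is the first CPU created."""
    return cpu.cpu_index == 0


def make_cpus(count, apic_id_base=0):
    """Create `count` CPUs whose APIC IDs are numbered from `apic_id_base`."""
    return [Cpu(i, apic_id_base + i) for i in range(count)]


if __name__ == "__main__":
    # Base 0 is the current identity mapping; base 16 is an assumed layout.
    for base in (0, 16):
        cpus = make_cpus(4, apic_id_base=base)
        print("base", base,
              "by apic id:", [c.cpu_index for c in cpus
                              if cpu_is_bsp_by_apic_id(c)],
              "by index:", [c.cpu_index for c in cpus
                            if cpu_is_bsp_by_index(c)])
```

With the identity mapping both predicates select CPU 0. Once APIC IDs start at a non-zero base, the APIC-ID test selects no CPU at all while the index test still selects CPU 0, so neither form settles the question by itself of which CPU should be the BSP under a different mapping.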