Re: KVM usability

2010-03-04 Thread Markus Armbruster
"H. Peter Anvin"  writes:

> On 03/04/2010 12:13 PM, Zachary Amsden wrote:
>> 
>> These are all basic things that are left completely undefined by qemu's 
>> lack of a top-level configuration file, and it's an inexcusable disgrace.
>> 
>
> There is a top-level configuration file for Qemu, at least in the
> development tree.  It's optional, still, but it's there now.

It covers much but not all of the command line.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM-Test: Add kvm userspace unit test

2010-03-04 Thread sshang
  The test uses the kvm test harness kvmctl to load binary test case files
that exercise various functions of the kvm kernel module.

Signed-off-by: sshang 
---
 client/tests/kvm/tests/unit_test.py|   29 +
 client/tests/kvm/tests_base.cfg.sample |7 +++
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/unit_test.py

diff --git a/client/tests/kvm/tests/unit_test.py b/client/tests/kvm/tests/unit_test.py
new file mode 100644
index 000..9bc7441
--- /dev/null
+++ b/client/tests/kvm/tests/unit_test.py
@@ -0,0 +1,29 @@
+import os
+from autotest_lib.client.bin import utils
+from autotest_lib.client.common_lib import error
+
+def run_unit_test(test, params, env):
+    """
+    This is the kvm userspace unit test. It uses the kvm test harness kvmctl
+    to load binary test case files that exercise various functions of the kvm
+    kernel module. The output of each unit test is saved in the test result dir.
+    """
+
+    case_list = params.get("case_list", "access apic emulator hypercall irq"
+                           " port80 realmode sieve smptest tsc stringio vmexit").split()
+    srcdir = params.get("srcdir", test.srcdir)
+    user_dir = os.path.join(srcdir, "kvm_userspace/kvm/user")
+    os.chdir(user_dir)
+    test_fail_list = []
+
+    for i in case_list:
+        result_file = test.outputdir + "/" + i
+        testfile = i + ".flat"
+        results = utils.system("./kvmctl test/x86/bootstrap test/x86/" +
+                               testfile + " > " + result_file, ignore_status=True)
+        if results != 0:
+            test_fail_list.append(i)
+
+    if test_fail_list:
+        raise error.TestFail("< " + " ".join(test_fail_list) +
+                             " >")
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 040d0c3..0918c26 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -300,6 +300,13 @@ variants:
 shutdown_method = shell
 kill_vm = yes
 kill_vm_gracefully = no
+
+- unit_test:
+type = unit_test
+case_list = access apic emulator hypercall msr port80 realmode sieve smptest tsc stringio vmexit
+#srcdir should be same as build.cfg
+srcdir = 
+vms = ''
 # Do not define test variants below shutdown
 
 
-- 
1.5.5.6
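
For readers without an autotest tree at hand, the pass/fail bookkeeping in the patch above can be sketched in plain Python. This is only an illustration under stated assumptions: `run_cases` and `summarize` are hypothetical names, the default runner simply treats each case name as a shell command, and the demo uses a stand-in runner so nothing here depends on kvmctl being installed.

```python
import subprocess

def run_cases(case_list, run=None):
    """Run each case and return the names whose exit status was non-zero."""
    if run is None:
        # Hypothetical default: treat each case name as a shell command.
        run = lambda name: subprocess.call(name, shell=True)
    return [name for name in case_list if run(name) != 0]

def summarize(failed):
    """Build a one-line summary similar in spirit to the TestFail message."""
    return "failed cases: " + " ".join(failed) if failed else "all cases passed"

if __name__ == "__main__":
    # Stand-in runner so the sketch runs without kvmctl installed.
    fake_status = {"access": 0, "apic": 1, "tsc": 0, "vmexit": 2}
    failed = run_cases("access apic tsc vmexit".split(), run=fake_status.get)
    print(summarize(failed))
```

As in the patch, every case runs even after an earlier one fails, and the failures are reported together at the end.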



[PATCH] KVM: x86: Use native_store_idt() instead of kvm_get_idt()

2010-03-04 Thread Wei Yongjun
This patch uses the generic Linux function native_store_idt()
instead of kvm_get_idt(), and removes the now-unused
function kvm_get_idt().

Signed-off-by: Wei Yongjun 
---
 arch/x86/include/asm/kvm_host.h |5 -
 arch/x86/kvm/vmx.c  |2 +-
 2 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..ea1b6c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,11 +716,6 @@ static inline void kvm_load_ldt(u16 sel)
asm("lldt %0" : : "rm"(sel));
 }
 
-static inline void kvm_get_idt(struct desc_ptr *table)
-{
-   asm("sidt %0" : "=m"(*table));
-}
-
 #ifdef CONFIG_X86_64
 static inline unsigned long read_msr(unsigned long msr)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..a08929a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2445,7 +2445,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
 
-   kvm_get_idt(&dt);
+   native_store_idt(&dt);
vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
 
asm("mov $.Lkvm_vmx_return, %0" : "=r"(kvm_vmx_return));
-- 
1.6.3.3




Re: pc-bios/bios.bin - where it comes from?

2010-03-04 Thread Anthony Liguori

On 03/04/2010 04:46 PM, Michael Tokarev wrote:
> Hello.
>
> There are a few bugs filed about an.. interesting
> behaviour.  For example:
>
>   http://www.mail-archive.com/kvm@vger.kernel.org/msg29834.html
>   https://bugs.launchpad.net/qemu/+bug/513273
>
> After quite some mix-n-matching, at least on my test machine,
> I can say that the issue gets triggered by seabios.  When
> using pc-bios/bios.bin everything is ok.  But when using
> any other bios.bin, even downloading seabios-0.5.1.tar.gz
> and building it - on a debian lenny system anyway - by
> running `make', the problem triggers.
>
> I tried different versions/variations of vgabios.bin
> (it's only -vga std which triggers the issue so far),
> including 0.6b and 0.6c built from sources, vgabios.bin
> from debian packages (0.6b and 0.6c), and the one
> included in qemu-0.12.3.tar.gz.  And my conclusion
> so far is that vgabios.bin has exactly _no_ effect on
> the issue.
>
> But when using bios.bin from qemu-kvm-0.12.3.tar.gz,
> and _only_ that bios.bin, the problem goes away.

pc-bios/bios.bin gets built from roms/seabios.

We don't ship seabios 0.5.1 in 0.12.3, we ship 0.5.1-stable which is two
commits ahead of 0.5.1.

> So the question arises: where does the pc-bios/bios.bin
> in qemu-0.12.3.tar.gz come from?  It is either
> built from some other sources (not from seabios-0.5.1),
> or built with some extra/different compiler/linker options,
> or built using different compiler/linker.
>
> This is partially confirmed on ubuntu as well, but,
> as far as I understand, there the behaviour is different
> with different versions of vgabios.

One of the reasons we include a git submodule and the source for the
bios is so that distributors don't have to deal with building the
packages independently.  Moral of the story is, just use the source we
ship and don't try to be more clever than that :-)

> In case it's not clear: I'm testing qemu-kvm-0.12.3;
> bios.bin is the same in qemu-0.12.3 and qemu-kvm-0.12.3.
>
> BTW, is there any reason preventing updating vgabios
> to 0.6c version - the latest released one?

There's no compelling improvement in 0.6c and updating vgabios is not
something I'm eager to do unless there's a strong justification.

Regards,

Anthony Liguori

> Thanks!
>
> /mjt


pc-bios/bios.bin - where it comes from?

2010-03-04 Thread Michael Tokarev
Hello.

There are a few bugs filed about an.. interesting
behaviour.  For example:

 http://www.mail-archive.com/kvm@vger.kernel.org/msg29834.html
 https://bugs.launchpad.net/qemu/+bug/513273

After quite some mix-n-matching, at least on my test machine,
I can say that the issue gets triggered by seabios.  When
using pc-bios/bios.bin everything is ok.  But when using
any other bios.bin, even downloading seabios-0.5.1.tar.gz
and building it - on a debian lenny system anyway - by
running `make', the problem triggers.

I tried different versions/variations of vgabios.bin
(it's only -vga std which triggers the issue so far),
including 0.6b and 0.6c built from sources, vgabios.bin
from debian packages (0.6b and 0.6c), and the one
included in qemu-0.12.3.tar.gz.  And my conclusion
so far is that vgabios.bin has exactly _no_ effect on
the issue.

But when using bios.bin from qemu-kvm-0.12.3.tar.gz,
and _only_ that bios.bin, the problem goes away.

So the question arises: where does the pc-bios/bios.bin
in qemu-0.12.3.tar.gz come from?  It is either
built from some other sources (not from seabios-0.5.1),
or built with some extra/different compiler/linker options,
or built using different compiler/linker.

This is partially confirmed on ubuntu as well, but,
as far as I understand, there the behaviour is different
with different versions of vgabios.

In case it's not clear: I'm testing qemu-kvm-0.12.3;
bios.bin is the same in qemu-0.12.3 and qemu-kvm-0.12.3.

BTW, is there any reason preventing updating vgabios
to 0.6c version - the latest released one?

Thanks!

/mjt


Re: KVM usability

2010-03-04 Thread H. Peter Anvin
On 03/04/2010 12:13 PM, Zachary Amsden wrote:
> 
> These are all basic things that are left completely undefined by qemu's 
> lack of a top-level configuration file, and it's an inexcusable disgrace.
> 

There is a top-level configuration file for Qemu, at least in the
development tree.  It's optional, still, but it's there now.

-hpa


Re: [Qemu-devel] [PATCH] Fix segfault with ram_size > 4095M without kvm

2010-03-04 Thread Ryan Harper
* Aurelien Jarno  [2010-03-04 15:27]:
> On Tue, Feb 23, 2010 at 06:02:15PM +0100, Aurelien Jarno wrote:
> > Ryan Harper a écrit :
> > > Currently, x86_64-softmmu qemu segfaults when trying to use > 4095M 
> > > memsize.
> > > This patch adds a simple check and error message (much like the 2047 
> > > limit on
> > > 32-bit hosts) on ram_size in the control path after we determine we're
> > > not using kvm
> > > 
> > > Upstream qemu-kvm is affected if using the -no-kvm option; this patch 
> > > address
> > > the segfault there as well.
> > 
> > It looks like this is working around the real bug rather than fixing it.
> > At some point both i386-softmmu (via PAE) and x86_64-softmmu were able to
> > support > 4GB of memory. I remember adding the support a long time ago,
> > and testing it with 32GB of emulated RAM.
> 
> I have looked into that, and actually one patch to get full support for
>  > 4GB of memory was not merged:

Thanks for looking into this.

> 
> diff --git a/exec.c b/exec.c
> index 8389c54..b0bb058 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -166,7 +166,7 @@ typedef struct PhysPageDesc {
>   */
>  #define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
>  #else
> -#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
> +#define L1_BITS (TARGET_PHYS_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
>  #endif
> 
>  #define L1_SIZE (1 << L1_BITS)
> 
> While this patch is acceptable for qemu i386, it creates a big L1 table
> for x86_64 or other 64-bit architectures, resulting in huge memory 
> overhead.
> 
> The recent multilevel tables patches from Richard Henderson should fix 
> the problem for HEAD (I haven't found time to look at them in detail).
> 
> As this is not something we really want to backport, your patch makes
> sense in stable-0.12.

Anthony, do you want me to resend and rebase against 0.12-stable?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


Re: [Qemu-devel] [PATCH] Fix segfault with ram_size > 4095M without kvm

2010-03-04 Thread Aurelien Jarno
On Tue, Feb 23, 2010 at 06:02:15PM +0100, Aurelien Jarno wrote:
> Ryan Harper a écrit :
> > Currently, x86_64-softmmu qemu segfaults when trying to use > 4095M memsize.
> > This patch adds a simple check and error message (much like the 2047 limit 
> > on
> > 32-bit hosts) on ram_size in the control path after we determine we're
> > not using kvm
> > 
> > Upstream qemu-kvm is affected if using the -no-kvm option; this patch 
> > address
> > the segfault there as well.
> 
> It looks like this is working around the real bug rather than fixing it.
> At some point both i386-softmmu (via PAE) and x86_64-softmmu were able to
> support > 4GB of memory. I remember adding the support a long time ago,
> and testing it with 32GB of emulated RAM.

I have looked into that, and actually one patch to get full support for
 > 4GB of memory was not merged:

diff --git a/exec.c b/exec.c
index 8389c54..b0bb058 100644
--- a/exec.c
+++ b/exec.c
@@ -166,7 +166,7 @@ typedef struct PhysPageDesc {
  */
 #define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
 #else
-#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
+#define L1_BITS (TARGET_PHYS_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
 #endif
 
 #define L1_SIZE (1 << L1_BITS)

While this patch is acceptable for qemu i386, it creates a big L1 table
for x86_64 or other 64-bit architectures, resulting in huge memory 
overhead.

The recent multilevel tables patches from Richard Henderson should fix 
the problem for HEAD (I haven't found time to look at them in detail).

As this is not something we really want to backport, your patch makes
sense in stable-0.12.
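
To put rough numbers on the memory overhead mentioned above, here is a small back-of-the-envelope sketch. It is not taken from the QEMU sources: the values L2_BITS = 10, TARGET_PAGE_BITS = 12, and 8-byte table slots are assumptions for illustration, and the 36-bit and 52-bit physical address widths are example inputs rather than figures quoted in this thread.

```python
def l1_entries(addr_space_bits, l2_bits=10, page_bits=12):
    """Number of L1 slots: 1 << (addr_space_bits - l2_bits - page_bits)."""
    l1_bits = addr_space_bits - l2_bits - page_bits
    return 1 << l1_bits

def l1_table_bytes(addr_space_bits, pointer_bytes=8, l2_bits=10, page_bits=12):
    """Size of a flat L1 table, assuming one pointer-sized slot per entry."""
    return l1_entries(addr_space_bits, l2_bits, page_bits) * pointer_bytes

if __name__ == "__main__":
    # 32 is the hard-coded value the patch replaces; 36 and 52 are example widths.
    for bits in (32, 36, 52):
        print(bits, l1_entries(bits), l1_table_bytes(bits))
```

Under these assumptions a 32-bit space needs 1024 slots (8 KiB), a 36-bit space 16384 slots (128 KiB), and a 52-bit space 2^30 slots (8 GiB), which is why a flat L1 table is tolerable for i386 but not for 64-bit targets.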


> > Signed-off-by: Ryan Harper 
> > ---
> >  vl.c |6 ++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index db7a178..a659e98 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -5760,6 +5760,12 @@ int main(int argc, char **argv, char **envp)
> >  fprintf(stderr, "failed to initialize KVM\n");
> >  exit(1);
> >  }
> > +} else {
> > +/* without kvm enabled, we can only support 4095 MB RAM */
> > +if (ram_size > (4095UL << 20)) {
> > +fprintf(stderr, "qemu: without kvm support at most 4095 MB RAM can be simulated\n");
> > +exit(1);
> > +}
> >  }
> >  
> >  if (qemu_init_main_loop()) {
> 
> 
> -- 
> Aurelien Jarno  GPG: 1024D/F1BCDB73
> aurel...@aurel32.net http://www.aurel32.net
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


Re: IVSHMEM and limits on shared memory

2010-03-04 Thread Cam Macdonell
On Thu, Mar 4, 2010 at 1:12 PM, Khaled Ibrahim  wrote:
>
>>
>> As a test, I removed anywhere my patch stored the size of the shared
>> memory region and hard coded the size of 512 MB into qemu_ram_alloc
>> and pci_register_bar, so that my patch never writes the size of the
>> memory region anywhere. And I discovered that the value of 512MB
>> still shows up at the offset you mention, so it seems something else
>> is storing that value in the wrong location and corrupting memory.
>>
>> Can you try using the version from the git repo and see if the error recurs?
>
> Thank you Cam. I tried to build using the git repo, but the build crashes while 
> booting on my machine without the shared memory patch. I used 
> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. Which git repo are you 
> using? Can you send me an ivshmem-patched qemu-kvm, or tell me which stable 
> qemu-kvm repo I should use?

That's the correct repo.

Your VM crashes using the latest git repo?  That is unusual.  I'll
send you a tar ball off-list of a patched version of KVM.

>
> Thanks,
> -Khaled
>


Re: KVM usability

2010-03-04 Thread Anthony Liguori

On 03/04/2010 02:13 PM, Zachary Amsden wrote:
> The biggest problem with virt-manager isn't virt-manager, it's that it
> is trying to do a nearly intractable task.  Because a qemu virtual
> machine is not a machine at all, just a disk image without the proper
> metadata to track the important properties of the machine, like what
> revision of PCI chipset, how many disk controllers the thing is using,
> what kind of graphics card, etc.
>
> These are all basic things that are left completely undefined by
> qemu's lack of a top-level configuration file, and it's an inexcusable
> disgrace.
>
> So virt-manager or any other management tool has the burden of
> creating and maintaining a bunch of metadata around this workhorse
> tool called qemu and invoking libvirt to figure out which set of
> 100,000 blasted command line options to pass on.
>
> That's why it falls short of expectations at times, not because
> virt-manager is crap, but because there is no well defined, well
> designed infrastructure for it to manage and the ad-hoc solution here
> is total crap.

And this is why we're doing QMP and qdev.  It's long overdue
infrastructure.  It's not just the problem that you describe though.
virt-manager is limited by what libvirt provides and today libvirt does
not expose nearly enough qemu features for virt-manager to even attempt
to solve the problem on its own.

Regards,

Anthony Liguori

> Zach




Re: KVM usability

2010-03-04 Thread Zachary Amsden

On 03/04/2010 10:00 AM, Lucas Meneghel Rodrigues wrote:
> On Tue, 2010-03-02 at 11:11 +0100, Peter Zijlstra wrote:
> > On Mon, 2010-03-01 at 09:14 -0600, Anthony Liguori wrote:
> > > The real
> > > question to ask is, why are you using qemu directly instead of using
> > > virt-manager?
> >
> > Because I suspect Ingo, like me, is a command line user, launching a gui
> > to start kvm when there is a kvm command around just sounds daft.
> >
> > Also, I just installed and tried it, virt-manager is a total piece of
> > shit,
>
> That statement is far from being fair. I use virt-manager quite a lot,
> since I want to keep track of what's going on on KVM virtualization for
> end users in Fedora. What's shipped with Fedora 12 is pretty decent in
> many regards, but as in any other software there's plenty of room for
> improvements.

The biggest problem with virt-manager isn't virt-manager, it's that it 
is trying to do a nearly intractable task.  Because a qemu virtual 
machine is not a machine at all, just a disk image without the proper 
metadata to track the important properties of the machine, like what 
revision of PCI chipset, how many disk controllers the thing is using, 
what kind of graphics card, etc.


These are all basic things that are left completely undefined by qemu's 
lack of a top-level configuration file, and it's an inexcusable disgrace.


So virt-manager or any other management tool has the burden of creating 
and maintaining a bunch of metadata around this workhorse tool called 
qemu and invoking libvirt to figure out which set of 100,000 blasted 
command line options to pass on.


That's why it falls short of expectations at times, not because 
virt-manager is crap, but because there is no well defined, well 
designed infrastructure for it to manage and the ad-hoc solution here is 
total crap.


Zach


RE: IVSHMEM and limits on shared memory

2010-03-04 Thread Khaled Ibrahim

>
> As a test, I removed anywhere my patch stored the size of the shared
> memory region and hard coded the size of 512 MB into qemu_ram_alloc
> and pci_register_bar, so that my patch never writes the size of the
> memory region anywhere. And I discovered that the value of 512MB
> still shows up at the offset you mention, so it seems something else
> is storing that value in the wrong location and corrupting memory.
>
> Can you try using the version from the git repo and see if the error recurs?

Thank you Cam. I tried to build using the git repo, but the build crashes while 
booting on my machine without the shared memory patch. I used 
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. Which git repo are you 
using? Can you send me an ivshmem-patched qemu-kvm, or tell me which stable 
qemu-kvm repo I should use?

Thanks,
-Khaled
  


Re: KVM usability

2010-03-04 Thread Lucas Meneghel Rodrigues
On Tue, 2010-03-02 at 11:11 +0100, Peter Zijlstra wrote:
> On Mon, 2010-03-01 at 09:14 -0600, Anthony Liguori wrote:
> > The real 
> > question to ask is, why are you using qemu directly instead of using 
> > virt-manager? 
> 
> Because I suspect Ingo, like me, is a command line user, launching a gui
> to start kvm when there is a kvm command around just sounds daft.
> 
> Also, I just installed and tried it, virt-manager is a total piece of
> shit, 

That statement is far from being fair. I use virt-manager quite a lot,
since I want to keep track of what's going on on KVM virtualization for
end users in Fedora. What's shipped with Fedora 12 is pretty decent in
many regards, but as in any other software there's plenty of room for
improvements.

> I wouldn't even know how to begin telling it how to start my
> freshly baked kernel with serial console on stdio and some block image I
> just created from the gentoo stage3 tarball.

Fair enough, it is convoluted to do what you want using virt-manager
(although possible), but mainly because this wasn't a use case for it.
You can't expect the application designers to support every single type
of work flow under the sun.

> That is, after 5 minutes clicking I have no idea how to even launch an
> ISO with the thing, I prefer reading the kvm manpage over using some
> mouse only gui crap like that.

For the 1st thing you wanted to do, I agree that it was cumbersome. But
to create a VM and make it boot an ISO available on your hard drive it
*is* trivial. There's a wizard to do it, because it's the main use case
of the thing.

If you want to point out problems on virt-manager that is fine, and the
developers will do what is possible to address problems, insults are not
necessary.



Re: [PATCH 08/10] Add -kvm option

2010-03-04 Thread Anthony Liguori

On 03/02/2010 12:25 PM, Glauber Costa wrote:
> On Tue, Mar 02, 2010 at 01:31:05AM -0300, Marcelo Tosatti wrote:
> > On Fri, Feb 26, 2010 at 05:12:19PM -0300, Glauber Costa wrote:
> > > This option deprecates --enable-kvm. It is a more flexible option,
> > > that makes use of qemu-opts, and allows us to pass on options to enable or
> > > disable kernel irqchip, for example.
> > >
> > > Signed-off-by: Glauber Costa
> >
> > Really have to replace -enable-kvm? Can't you keep compatibility for it?
>
> We don't have to, but I'd rather deprecate it. I don't feel strongly, though.

It needs to stay.

For enabling/disabling, if you don't like -enable-kvm, I'd suggest 
thinking about modeling it through CPU.  For instance:


-cpu host,accel=kvm|tcg|kvm,tcg

Since we already specify CPUs in a global config, if you took this 
approach, it would make it possible to tweak the default kvm vs. tcg 
selection within the config file so a user could control whether vms 
were created with tcg or kvm by default or whether it tried kvm and then 
fell back to tcg.
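
The "try kvm, then fall back to tcg" selection described above can be sketched as follows. This is a hypothetical illustration, not QEMU code: `pick_accel` is an invented name, and it receives only the value of the proposed `accel=` setting (for example `kvm,tcg`) together with a set of accelerators assumed to be usable on the host.

```python
def pick_accel(spec, available):
    """Return the first accelerator in a comma-separated preference list
    that is present in `available`; raise ValueError if none is."""
    for name in spec.split(","):
        name = name.strip()
        if name in available:
            return name
    raise ValueError("no requested accelerator available: %r" % spec)

if __name__ == "__main__":
    # Host with KVM: the first preference wins.
    print(pick_accel("kvm,tcg", {"kvm", "tcg"}))
    # Host without KVM: fall back to the second preference.
    print(pick_accel("kvm,tcg", {"tcg"}))
```

A config file could then carry `kvm,tcg` as the default, while a user who wants a hard failure when KVM is missing would specify `kvm` alone.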


Regards,

Anthony Liguori




Re: [PATCH 08/10] Add -kvm option

2010-03-04 Thread Glauber Costa
On Thu, Mar 04, 2010 at 05:20:22PM +0100, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Glauber Costa wrote:
> >> This option deprecates --enable-kvm. It is a more flexible option,
> >> that makes use of qemu-opts, and allows us to pass on options to enable or
> >> disable kernel irqchip, for example.
> >>
> > 
> > ...
> > 
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index 3f49b44..f8fd86d 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
> >>  ETEXI
> >>  
> >>  #ifdef CONFIG_KVM
> >> -DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
> >> -"-enable-kvm enable KVM full virtualization support\n")
> >> +HXCOMM Options deprecated by -kvm
> >> +DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
> >> +
> >> +DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
> >> +"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
> >> +"enable KVM full virtualization support\n")
> >> +
> 
> Argh, never trust documentation: The magic option is "enabled", not
> "enable". :)
> 
> > 
> > I would prefer "irqchip=kernel|user" - shorter and even more verbose.
> 
> And we should refuse to work if the user tries to enable in-kernel
> support without having io-threads enabled. That obviously fails silently
> so far.
> 
> "info kvm" should also be extended to report the configuration in force.
I am waiting for Marcelo to apply your patches (if he hasn't already), then
I'll redo this. I agree with this point and plan to change it.

> 
> > 
> > Forgot if that was discussed already: Do we want "pit=kernel|user" as well?
> > 
> 
I guess we do.


Re: [PATCH 08/10] Add -kvm option

2010-03-04 Thread Anthony Liguori

On 02/26/2010 02:12 PM, Glauber Costa wrote:

This option deprecates --enable-kvm. It is a more flexible option,
that makes use of qemu-opts, and allows us to pass on options to enable or
disable kernel irqchip, for example.

Signed-off-by: Glauber Costa
   


kernel vs. userspace irqchip shouldn't be a kvm option.

Ideally, it would be a -device thing but I think we've agreed that 
-device won't cover platform devices.  So what we probably should do is 
change the machine option to accept a qopts list,  IOW:


-M pc,irqchip=user|kernel,pit=user|kernel,...

That certainly makes a lot more sense for non-x86 KVM targets (like s390 
and ppc).  And certainly, there's nothing that says that every x86 KVM 
target is going to have an APIC...
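
To make the proposed syntax concrete, here is a rough sketch of how a machine string such as `pc,irqchip=user,pit=kernel` could be split into a machine name and per-device placement choices. It is purely illustrative: `parse_machine_opts` is a hypothetical helper, the accepted keys and values are taken only from the example line above, and none of this reflects how QEMU's option parser is actually implemented.

```python
def parse_machine_opts(arg, allowed=("irqchip", "pit"), choices=("user", "kernel")):
    """Split 'machine,key=value,...' into (machine, {key: value}).

    Unknown keys, unknown values, or items without '=' raise ValueError.
    """
    machine, _, rest = arg.partition(",")
    opts = {}
    for item in filter(None, rest.split(",")):
        key, sep, value = item.partition("=")
        if not sep or key not in allowed or value not in choices:
            raise ValueError("bad machine option: %r" % item)
        opts[key] = value
    return machine, opts

if __name__ == "__main__":
    print(parse_machine_opts("pc,irqchip=user,pit=kernel"))
    print(parse_machine_opts("pc"))
```

Keeping the device placement on the machine rather than on a kvm-specific switch means a target without an APIC simply would not list `irqchip` among its allowed keys.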


Regards,

Anthony Liguori


---
  kvm-all.c   |1 +
  kvm.h   |1 +
  qemu-config.c   |   16 
  qemu-config.h   |1 +
  qemu-options.hx |   11 +--
  vl.c|   11 +++
  6 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 00e7411..0527e0f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -52,6 +52,7 @@ typedef struct KVMSlot
  typedef struct kvm_dirty_log KVMDirtyLog;

  int kvm_allowed = 0;
+int kvm_use_kernel_chip = 1;

  struct KVMState
  {
diff --git a/kvm.h b/kvm.h
index 7278874..480e651 100644
--- a/kvm.h
+++ b/kvm.h
@@ -20,6 +20,7 @@

  #ifdef CONFIG_KVM
  extern int kvm_allowed;
+extern int kvm_use_kernel_chip;

  #define kvm_enabled() (kvm_allowed)
  #else
diff --git a/qemu-config.c b/qemu-config.c
index 246fae6..310838e 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -290,6 +290,21 @@ QemuOptsList qemu_cpudef_opts = {
  },
  };

+QemuOptsList qemu_kvm_opts = {
+.name = "kvm",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_kvm_opts.head),
+.desc = {
+{
+.name = "irqchip-in-kernel",
+.type = QEMU_OPT_BOOL,
+},{
+.name = "enabled",
+.type = QEMU_OPT_BOOL,
+},
+{ /* end if list */ }
+},
+};
+
  static QemuOptsList *lists[] = {
  &qemu_drive_opts,
  &qemu_chardev_opts,
@@ -300,6 +315,7 @@ static QemuOptsList *lists[] = {
  &qemu_global_opts,
  &qemu_mon_opts,
  &qemu_cpudef_opts,
+&qemu_kvm_opts,
  NULL,
  };

diff --git a/qemu-config.h b/qemu-config.h
index b335c42..506e5fb 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts;
  extern QemuOptsList qemu_global_opts;
  extern QemuOptsList qemu_mon_opts;
  extern QemuOptsList qemu_cpudef_opts;
+extern QemuOptsList qemu_kvm_opts;

  int qemu_set_option(const char *str);
  int qemu_global_option(const char *str);
diff --git a/qemu-options.hx b/qemu-options.hx
index 3f49b44..f8fd86d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
  ETEXI

  #ifdef CONFIG_KVM
-DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
-"-enable-kvm enable KVM full virtualization support\n")
+HXCOMM Options deprecated by -kvm
+DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
+
+DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
+"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
+"enable KVM full virtualization support\n")
+
  #endif
  STEXI
+@item -kvm enable=[on|off][,irqchip-in-kernel=on|off]
+@findex -kvm
  @item -enable-kvm
  @findex -enable-kvm
  Enable KVM full virtualization support. This option is only available
diff --git a/vl.c b/vl.c
index 66e477a..8c94fee 100644
--- a/vl.c
+++ b/vl.c
@@ -5416,6 +5416,17 @@ int main(int argc, char **argv, char **envp)
  case QEMU_OPTION_enable_kvm:
  kvm_allowed = 1;
  break;
+case QEMU_OPTION_kvm:
+
+opts = qemu_opts_parse(&qemu_kvm_opts, optarg, NULL);
+if (!opts) {
+fprintf(stderr, "parse error: %s\n", optarg);
+exit(1);
+}
+
+kvm_allowed = qemu_opt_get_bool(opts, "enabled", 1);
+kvm_use_kernel_chip = qemu_opt_get_bool(opts, "irqchip-in-kernel", 1);
+break;
  #endif
  case QEMU_OPTION_usb:
  usb_enabled = 1;
   




Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-04 Thread Marcelo Tosatti
On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
> On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
> > The regression seems to be caused by seabios commit d7e998f. Kevin, the
> > failure can be seen on the attached screenshot, which happens on the
> > first reboot of WinXP 32 installation (after copying files etc).
> 
> Sorry - I also noticed a bug in that commit recently.  I pushed the
> fix I had in my local tree.

Thanks, it does fix the issue here. Anthony can you please update
seabios?

TIA



Re: qemu-kvm upstream segfaults when using -smp 1

2010-03-04 Thread Jan Kiszka
Lucas Meneghel Rodrigues wrote:
> Hi folks:
> 
> Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1:
> 
> 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
> /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
> unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
> file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
> nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
> -smp 1 -drive 
> file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
>  -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
> /usr/local/autotest/tests/kvm/images/tftpboot  -boot d -bootp /pxelinux.0 
> -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file 
> descriptor
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 
> Segmentation fault  (core dumped) /usr/local/autotest/tests/kvm/qemu 
> -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait 
> -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
> nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
> -smp 1 -drive 
> file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
>  -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
> /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 
> -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 
> 139)
> 
> I have opened a bug about it on KVM's bug tracking system on sourceforge. 
> Relevant software versions involved:
> 
> Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
> 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
> Kernel: 2.6.31.12-174.2.22.fc12.x86_64
> 
> Please let me know if you need more information about it. 
> 

Should be fixed by this:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/47883

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


qemu-kvm upstream segfaults when using -smp 1

2010-03-04 Thread Lucas Meneghel Rodrigues
Hi folks:

Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1:

03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
/usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
-smp 1 -drive 
file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
 -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
/usr/local/autotest/tests/kvm/images/tftpboot  -boot d -bootp /pxelinux.0 -boot 
n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file 
descriptor
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 
Segmentation fault  (core dumped) /usr/local/autotest/tests/kvm/qemu -name 
'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
-smp 1 -drive 
file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
 -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
/usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot 
n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 
139)

I have opened a bug about it on KVM's bug tracking system on sourceforge. 
Relevant software versions involved:

Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
Kernel: 2.6.31.12-174.2.22.fc12.x86_64

Please let me know if you need more information about it. 

Lucas



[ kvm-Bugs-2963581 ] qemu-kvm upstream crashes when using -smp 1

2010-03-04 Thread SourceForge.net
Bugs item #2963581, was opened at 2010-03-04 18:10
Message generated for change (Tracker Item Submitted) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Meneghel Rodrigues ()
Assigned to: Nobody/Anonymous (nobody)
Summary: qemu-kvm upstream crashes when using -smp 1

Initial Comment:
qemu-kvm.git master is crashing when using -smp 1

Relevant versions:
Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
Kernel: 2.6.31.12-174.2.22.fc12.x86_64

Steps to reproduce
1 - Clone git repo git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
2 - Build qemu-kvm from this repo
3 - Try to start it with -smp 1, reference command line:
03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
/usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
-smp 1 -drive 
file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
 -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
/usr/local/autotest/tests/kvm/images/tftpboot  -boot d -bootp /pxelinux.0 -boot 
n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file 
descriptor
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 
Segmentation fault  (core dumped) /usr/local/autotest/tests/kvm/qemu -name 
'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
-smp 1 -drive 
file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
 -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
/usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot 
n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 
139)

So we have a segmentation fault.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599


Re: [PATCH 07/10] provide apic-kvm

2010-03-04 Thread Jan Kiszka
Glauber Costa wrote:
> This patch provides the file apic-kvm.c, which implements a shim over
> the kvm in-kernel APIC.
> 
> Signed-off-by: Glauber Costa 
> ---
>  Makefile.target   |2 +-
>  hw/apic-kvm.c |  157 
> +
>  hw/pc.c   |6 ++-
>  hw/pc.h   |2 +
>  kvm.h |5 ++
>  target-i386/cpu.h |4 ++
>  target-i386/kvm.c |   25 -
>  7 files changed, 197 insertions(+), 4 deletions(-)
>  create mode 100644 hw/apic-kvm.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index bc5263e..f00af07 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -213,7 +213,7 @@ obj-i386-y += usb-uhci.o vmmouse.o vmport.o vmware_vga.o 
> hpet.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += ne2000-isa.o debugcon.o multiboot.o
>  
> -obj-i386-$(CONFIG_KVM) += ioapic-kvm.o i8259-kvm.o
> +obj-i386-$(CONFIG_KVM) += ioapic-kvm.o i8259-kvm.o apic-kvm.o
>  
>  # shared objects
>  obj-ppc-y = ppc.o ide/core.o ide/qdev.o ide/isa.o ide/pci.o ide/macio.o
> diff --git a/hw/apic-kvm.c b/hw/apic-kvm.c
> new file mode 100644
> index 000..089fa45
> --- /dev/null
> +++ b/hw/apic-kvm.c
> @@ -0,0 +1,157 @@
> +#include "hw.h"
> +#include "pc.h"
> +#include "pci.h"
> +#include "msix.h"
> +#include "qemu-timer.h"
> +#include "host-utils.h"
> +#include "kvm.h"
> +
> +#define APIC_LVT_NB  6
> +#define APIC_LVT_LINT0   3
> +
> +struct qemu_lapic_state {
> +uint32_t apicbase;
> +uint8_t id;
> +uint8_t arb_id;
> +uint8_t tpr;
> +uint32_t spurious_vec;
> +uint8_t log_dest;
> +uint8_t dest_mode;
> +uint32_t isr[8];  /* in service register */
> +uint32_t tmr[8];  /* trigger mode register */
> +uint32_t irr[8]; /* interrupt request register */
> +uint32_t lvt[APIC_LVT_NB];
> +uint32_t esr; /* error register */
> +uint32_t icr[2];
> +
> +uint32_t divide_conf;
> +int count_shift;
> +uint32_t initial_count;
> +int64_t initial_count_load_time, next_time;
> +uint32_t idx;
> +int sipi_vector;
> +int wait_for_sipi;
> +};
> +
> +typedef struct APICState {
> +CPUState *cpu_env;
> +
> +/* KVM lapic structure is just a big array of regs. But it is what kvm
> + * functions expect. So have both the fields separated, for easy access,
> + * and the kvm stucture, for ioctls communications */
> +union {
> +struct qemu_lapic_state dev;
> +struct kvm_lapic_state kvm_lapic_state;

That looks fishy to me on second sight: Is, e.g., loading the
kvm_lapic_state from the kernel supposed to magically fill the (totally
unaligned) qemu_lapic_state structure? I'm missing the translations of
kvm_kernel_lapic_load_from_user/save_to_user here or some effort to
arrange qemu_lapic_state in a way that it robustly maps on the register
array passed to/from the kernel (if that is possible, haven't checked yet).
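
To illustrate the kind of explicit translation meant here, a minimal sketch
follows. It assumes the kernel exchanges a flat 1 KiB register page in host
byte order and uses the architectural local APIC register offsets. The names
lapic_reg_page, lapic_fields, lapic_fields_from_page and lapic_fields_to_page
are made up for this example and are not taken from the patch or from the
kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the register page passed to/from the kernel. */
#define LAPIC_REG_PAGE_SIZE 0x400

struct lapic_reg_page {
    uint8_t regs[LAPIC_REG_PAGE_SIZE];
};

/* Hypothetical subset of the per-field state kept by the device model. */
struct lapic_fields {
    uint8_t  id;
    uint8_t  tpr;
    uint32_t spurious_vec;
    uint32_t isr[8];
};

/* Architectural local APIC register offsets; registers are 16 bytes apart. */
enum {
    LAPIC_REG_ID  = 0x020,
    LAPIC_REG_TPR = 0x080,
    LAPIC_REG_SVR = 0x0f0,
    LAPIC_REG_ISR = 0x100
};

/* Read one 32-bit register; assumes host byte order in the page. */
static uint32_t reg_read(const struct lapic_reg_page *p, unsigned off)
{
    uint32_t v;
    memcpy(&v, &p->regs[off], sizeof(v));
    return v;
}

/* Write one 32-bit register; assumes host byte order in the page. */
static void reg_write(struct lapic_reg_page *p, unsigned off, uint32_t v)
{
    memcpy(&p->regs[off], &v, sizeof(v));
}

/* Register page -> separate fields (after loading state from the kernel). */
void lapic_fields_from_page(struct lapic_fields *f,
                            const struct lapic_reg_page *p)
{
    int i;

    f->id = (uint8_t)(reg_read(p, LAPIC_REG_ID) >> 24);
    f->tpr = (uint8_t)(reg_read(p, LAPIC_REG_TPR) & 0xff);
    f->spurious_vec = reg_read(p, LAPIC_REG_SVR);
    for (i = 0; i < 8; i++)
        f->isr[i] = reg_read(p, LAPIC_REG_ISR + i * 0x10);
}

/* Separate fields -> register page (before handing state to the kernel). */
void lapic_fields_to_page(const struct lapic_fields *f,
                          struct lapic_reg_page *p)
{
    int i;

    reg_write(p, LAPIC_REG_ID, (uint32_t)f->id << 24);
    reg_write(p, LAPIC_REG_TPR, f->tpr);
    reg_write(p, LAPIC_REG_SVR, f->spurious_vec);
    for (i = 0; i < 8; i++)
        reg_write(p, LAPIC_REG_ISR + i * 0x10, f->isr[i]);
}
```

With helpers like these the two representations no longer need to share
storage, so the layout of the field structure stops mattering.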

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 00/10] uq/master: irqchip-in-kernel support

2010-03-04 Thread Jan Kiszka
Glauber Costa wrote:
> Hi guys,
> 
> This is the same in-kernel irqchip support already posted to qemu-devel,
> just rebased, retested, etc. It passes my basic tests, so it seems to be
> still in good shape.
> 
> It is provided against uq/master as part of the integration efforts

Just as another heads-up:

host->guest networking performance over slirp and non-virtio NICs
suffers with this irqchip support the same way as in qemu-kvm. It's not
a bug I expect to be directly related to these changes, but it is at
least triggered by them and should now really be addressed.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 08/10] Add -kvm option

2010-03-04 Thread Jan Kiszka
Jan Kiszka wrote:
> Glauber Costa wrote:
>> This option deprecates --enable-kvm. It is a more flexible option
>> that makes use of qemu-opts and allows us to pass on options to enable or
>> disable kernel irqchip, for example.
>>
> 
> ...
> 
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 3f49b44..f8fd86d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1793,10 +1793,17 @@ Set the filename for the BIOS.
>>  ETEXI
>>  
>>  #ifdef CONFIG_KVM
>> -DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, \
>> -"-enable-kvm enable KVM full virtualization support\n")
>> +HXCOMM Options deprecated by -kvm
>> +DEF("enable-kvm", 0, QEMU_OPTION_enable_kvm, "")
>> +
>> +DEF("kvm", HAS_ARG, QEMU_OPTION_kvm, \
>> +"-kvm enable=on|off,irqchip-in-kernel=on|off\n" \
>> +"enable KVM full virtualization support\n")
>> +

Argh, never trust documentation: The magic option is "enabled", not
"enable". :)

> 
> I would prefer "irqchip=kernel|user" - shorter and even more verbose.

And we should refuse to work if the user tries to enable in-kernel
support without having io-threads enabled. That obviously fails silently
so far.
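
For illustration, strict parsing of such an option string could look like the
sketch below. It is plain C and deliberately does not use the qemu-opts API;
parse_kvm_opts_sketch() and struct kvm_opts_sketch are hypothetical names.
The keys "enabled" and "irqchip-in-kernel" and their default of on are taken
from the qemu_opt_get_bool() calls in the patch. Unknown keys such as the
documented "enable" are rejected instead of being ignored silently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result structure for the sketch. */
struct kvm_opts_sketch {
    int enabled;
    int irqchip_in_kernel;
};

/* True if the first klen bytes of key spell exactly name. */
static int key_is(const char *key, size_t klen, const char *name)
{
    return klen == strlen(name) && memcmp(key, name, klen) == 0;
}

/* Accept only "on" or "off"; anything else is an error. */
static int parse_on_off(const char *val, size_t vlen, int *out)
{
    if (key_is(val, vlen, "on")) {
        *out = 1;
        return 0;
    }
    if (key_is(val, vlen, "off")) {
        *out = 0;
        return 0;
    }
    return -1;
}

/* Parse "key=on|off[,key=on|off...]"; returns 0 on success, -1 on error. */
int parse_kvm_opts_sketch(const char *arg, struct kvm_opts_sketch *o)
{
    /* Both options default to on, as in the patch. */
    o->enabled = 1;
    o->irqchip_in_kernel = 1;

    while (*arg) {
        const char *comma = strchr(arg, ',');
        size_t len = comma ? (size_t)(comma - arg) : strlen(arg);
        const char *eq = memchr(arg, '=', len);
        size_t klen;
        int *target;

        if (!eq)
            return -1;
        klen = (size_t)(eq - arg);

        if (key_is(arg, klen, "enabled"))
            target = &o->enabled;
        else if (key_is(arg, klen, "irqchip-in-kernel"))
            target = &o->irqchip_in_kernel;
        else
            return -1;          /* unknown key, e.g. "enable" */

        if (parse_on_off(eq + 1, len - klen - 1, target) < 0)
            return -1;

        arg += len;
        if (*arg == ',')
            arg++;
    }
    return 0;
}
```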

"info kvm" should also be extended to report the configuration in force.

> 
> Forgot if that was discussed already: Do we want "pit=kernel|user" as well?
> 

No comments on this?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] qemu-kvm: Fix boot CPU setup for the case it is unsupported

2010-03-04 Thread David S. Ahern




On 03/04/2010 02:00 AM, Jan Kiszka wrote:
> Commit 52b03dd702 incorrectly failed KVM initialization in case the
> kernel did not support KVM_CAP_SET_BOOT_CPU_ID. Fix this, and also
> improve error propagation of kvm_create_context while at it.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> OK, it really was me. :)
> 
>  qemu-kvm-x86.c |9 +++--
>  qemu-kvm.c |4 +++-
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 7a5925a..7d42fdc 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -672,7 +672,7 @@ static const VMStateDescription vmstate_kvmclock= {
>  
>  int kvm_arch_qemu_create_context(void)
>  {
> -int i;
> +int i, r;
>  struct utsname utsname;
>  
>  uname(&utsname);
> @@ -696,7 +696,12 @@ int kvm_arch_qemu_create_context(void)
>  vmstate_register(0, &vmstate_kvmclock, &kvmclock_data);
>  #endif
>  
> -return kvm_set_boot_cpu_id(0);
> +r = kvm_set_boot_cpu_id(0);
> +if (r < 0 && r != -ENOSYS) {
> +return r;
> +}
> +
> +return 0;
>  }
>  
>  static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 222ca97..e417f21 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -2091,8 +2091,10 @@ static int kvm_create_context(void)
>  return -1;
>  }
>  r = kvm_arch_qemu_create_context();
> -if (r < 0)
> +if (r < 0) {
>  kvm_finalize(kvm_state);
> +return -1;
> +}
>  if (kvm_pit && !kvm_pit_reinject) {
>  if (kvm_reinject_control(kvm_context, 0)) {
>  fprintf(stderr, "failure to disable in-kernel PIT 
> reinjection\n");
> 

Works for me: FC12 host, FC12 guest.
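
The error handling in the quoted hunk boils down to the pattern below, shown
as a standalone sketch: an optional setup step that reports -ENOSYS is
treated as "not supported, carry on", while any other negative return value
is propagated. run_optional_setup() and the three setup_* stand-ins are
hypothetical names for this example only.

```c
#include <assert.h>
#include <errno.h>

typedef int (*optional_setup_fn)(void);

/* Run an optional setup step; a missing feature (-ENOSYS) is not fatal. */
int run_optional_setup(optional_setup_fn fn)
{
    int r = fn();

    if (r < 0 && r != -ENOSYS)
        return r;   /* real failure: propagate to the caller */
    return 0;       /* success, or feature simply unsupported */
}

/* Hypothetical stand-ins for a kernel capability call. */
int setup_unsupported(void) { return -ENOSYS; }
int setup_broken(void)      { return -EINVAL; }
int setup_ok(void)          { return 0; }
```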

David


Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-04 Thread Joerg Roedel
On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> > Hi,
> > 
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes.  Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest than
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has, the larger
> > the performance drop from l1 to l2.
> > Anyway, this post is to get feedback about the overall concept of these
> > patches.  Please review and give feedback :-)
> 
> Joerg,
> 
> What perf gain does this bring ? (i'm not aware of the current
> overhead).

The benchmark was an allnoconfig kernel compile in tmpfs. With the same
guest image it took:

as l1-guest with npt:

2m23s

as l2-guest with l1(nested)-l2(shadow):

around 8-9 minutes

as l2-guest with l1(nested)-l2(shadow) without the recent msrpm
optimization:

around 19 minutes

as l2-guest with l1(nested)-l2(nested) [this patchset]:

2m25s-2m30s

> Overall comments:
> 
> Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
> pagetable, and pass that to the kvm tdp fault path (with the correct
> context setup)?

If I understand your suggestion correctly, I think that's exactly what's
done in the patches. Some words about the design:

For nested-nested we need to shadow the l1-nested-ptable on the host.
This is done using the vcpu->arch.mmu context, which holds the l1 paging
modes while the l2 is running. On an npt-fault from the l2 we just
instrument the shadow-ptable code. This is the common case, because it
happens all the time while the l2 is running.

The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still
work and translate virtual addresses of the l2 into physical addresses
of the l1 (so it can be accessed with kvm functions).

To do this we need to be aware of the L2 paging mode. It is stored in
the vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa
translations. It is not used to build shadow page tables or anything
else. That's the reason only the parts necessary for gva_to_gpa
translations of the nested_mmu context are initialized.

Other parts of kvm require mmu.gva_to_gpa to translate l2_gva to l1_gpa,
so we cannot use it for the plain l2_gpa to l1_gpa translation. The
function which does that translation is therefore moved to
nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers
are swapped between mmu and nested_mmu.

The nested_mmu.gva_to_gpa function is used in translate_gpa_nested which
is assigned to the newly introduced translate_gpa callback of nested_mmu
context.

This callback is used in the walk_addr function to translate every
l2_gpa address we read from cr3 or the guest ptes into l1_gpa to read
the next step from the guest memory.

In the old unnested case the translate_gpa callback would point to a
function which just returns the gpa passed to it unmodified. The
walk_addr function is generalized and now there are basically two
versions of it:

* walk_addr which translates using vcpu->arch.mmu context
* walk_addr_nested which translates using vcpu->arch.nested_mmu
  context

That's pretty much how these patches work.
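
As a toy model of the translate_gpa callback idea (not the code from the
patches), the sketch below walks a two-level table stored in a small array.
Every table address is passed through the context's translate_gpa callback
before it is read: the unnested context uses an identity function, the
nested one adds a fixed offset as a stand-in for a real l2_gpa to l1_gpa
translation. All names (mmu_ctx, walk_addr_toy, toy_mem, l2_base) and the
linear mapping are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_t;
#define GPA_INVALID ((gpa_t)-1)

struct mmu_ctx;
typedef gpa_t (*translate_gpa_fn)(const struct mmu_ctx *ctx, gpa_t gpa);

/* Hypothetical, heavily reduced paging context. */
struct mmu_ctx {
    translate_gpa_fn translate_gpa;
    gpa_t l2_base;          /* toy l2_gpa -> l1_gpa mapping: add this offset */
};

/* Unnested case: the gpa is returned unmodified. */
gpa_t translate_gpa_identity(const struct mmu_ctx *ctx, gpa_t gpa)
{
    (void)ctx;
    return gpa;
}

/* Nested case (toy): l2_gpa -> l1_gpa via a fixed offset. */
gpa_t translate_gpa_nested_toy(const struct mmu_ctx *ctx, gpa_t gpa)
{
    return gpa + ctx->l2_base;
}

/* Toy l1 guest memory: 64 eight-byte words addressed by l1_gpa. */
#define TOY_MEM_WORDS 64
uint64_t toy_mem[TOY_MEM_WORDS];

static int toy_read(gpa_t l1_gpa, uint64_t *val)
{
    if (l1_gpa % 8 != 0 || l1_gpa / 8 >= TOY_MEM_WORDS)
        return -1;
    *val = toy_mem[l1_gpa / 8];
    return 0;
}

/*
 * Two-level walk: each entry holds the address of the next table (or, at
 * the last level, the result). Table addresses are translated through the
 * callback before the entry is read from toy memory.
 */
gpa_t walk_addr_toy(const struct mmu_ctx *ctx, gpa_t root,
                    const unsigned idx[2])
{
    gpa_t table = root;
    int level;

    for (level = 0; level < 2; level++) {
        uint64_t entry;
        gpa_t where = ctx->translate_gpa(ctx, table + 8 * (gpa_t)idx[level]);

        if (toy_read(where, &entry) < 0)
            return GPA_INVALID;
        table = entry;
    }
    return table;
}
```

The walker itself is identical in both cases; only the callback stored in
the context differs.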

> You probably need to include a flag in base_role to differentiate
> between l1 / l2 shadow tables (say if they use the same cr3 value).

Not sure if this is necessary. It may be necessary when large pages come
into play. Otherwise the host npt pages are distinguished from the shadow
npt pages by the direct-flag.

Joerg




Re: Clock issue in Windows XP guests

2010-03-04 Thread Gilles PIETRI

On 04/03/2010 16:13, Zachary Amsden wrote:
> On 03/03/2010 11:43 PM, Gilles PIETRI wrote:
>> Hi,
>>
>> I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm
>> 0.12.2. I have multiple guests, and one of them is running Windows XP.
>> If I stare at the clock, I see that every now & then (~5s), it slows
>> down a bit, and then tries to cope with it. If I run some NTP
>> synchronization software like ntpd, the offset is as high as 1s lost
>> every 10s or so, which makes it impossible to use anything time based
>> on the guest (audio stuff, mainly).
>>
>> I tried messing (as said on IRC) with the -rtc parameters, but to no
>> avail. I tried the driftfix=slew option found in the --help output,
>> but it says that driftfix is not a valid setting for rtc.. And anyway,
>> I have no idea what this does (I'll be reading about it probably...)
>>
>> I've seen something remotely connected to this on the proxmox forum,
>> but it was not that helpful (and proxmox runs qemu 0.11.x as it seems):
>> http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962
>>
>> I remember using the -rtc-td-hack (and in fact, just read again about
>> it here:
>> http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests),
>> but it's not there anymore in 0.12.x, and I have no idea what it used
>> to do (going on for some reading as well, when I have some time ;))
>>
>> Oh, and the guests running linux are working just fine, and have no
>> clock issue.
>>
>> Has anyone encountered such a problem?
>
> No, but I'd certainly like to fix it.  Can you send basic host
> information like /proc/cpuinfo, dmesg from kernel?

It's a dual quad-core Dell R710; part of the /proc/cpuinfo is
below, and I attached the dmesg.

> If you run the one Windows XP guest alone, does it run better or still
> the same?

Hard to say, but I think I had the issue "from the start", when the
guest was alone. I can't check that easily now that things are running,
however.

/proc/cpuinfo

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 26
model name  : Intel(R) Xeon(R) CPU   E5504  @ 2.00GHz
stepping: 5
cpu MHz : 1994.647
cache size  : 4096 KB
physical id : 1
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 16
initial apicid  : 16
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 
ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi 
flexpriority ept vpid

bogomips: 3989.29
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Thanks for your time,

Gilou
# dmesg 
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32.7-r710virt (r...@nsc045.local) (gcc version 
4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Wed Feb 3 10:04:00 CET 2010
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.7-r710virt 
root=UUID=2f33521a-a6c5-4a4c-926d-6d9d9aaa62b5 ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 000a (usable)
[0.00]  BIOS-e820: 0010 - cf699000 (usable)
[0.00]  BIOS-e820: cf699000 - cf6af000 (reserved)
[0.00]  BIOS-e820: cf6af000 - cf6ce000 (ACPI data)
[0.00]  BIOS-e820: cf6ce000 - d000 (reserved)
[0.00]  BIOS-e820: e000 - f000 (reserved)
[0.00]  BIOS-e820: fe00 - 0001 (reserved)
[0.00]  BIOS-e820: 0001 - 00043000 (usable)
[0.00] DMI 2.6 present.
[0.00] last_pfn = 0x43 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C write-protect
[0.00]   D-EBFFF uncachable
[0.00]   EC000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask FF8000 write-back
[0.00]   1 base 008000 mask FFC000 write-back
[0.00]   2 base 00C000 mask FFF000 write-back
[0.00]   3 base 01 mask FF write-back
[0.00]   4 base 02 mask FE write-back
[0.00]   5 base 04 mask FFC000 write-back
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] e820 update range: d

Re: [Qemu-devel] [PATCH 0/6] [PULL] qemu-kvm.git uq/master queue

2010-03-04 Thread Anthony Liguori

On 03/04/2010 09:05 AM, Marcelo Tosatti wrote:
> The following changes since commit 55b1e61f640bb2cf3bed0b4cc6d4ba1326c625d9:
>    Samuel Thibault (1):
>  (curses) Use more descriptive values
>
> are available in the git repository at:
>
>    git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
>
> Avi Kivity (1):
>    Allocate memory below 4GB as one chunk
>
> Jan Kiszka (4):
>    KVM: Rework of guest debug state writing
>    KVM: Rework VCPU state writeback API
>    KVM: x86: Restrict writeback of VCPU state
>    x86: Extend validity of bsp_to_cpu
>
> Marcelo Tosatti (1):
>    Add option to use file backed guest memory

Pulled.  Thanks.

Regards,

Anthony Liguori

>   cpu-all.h |3 +
>   exec.c|  132 
>   hw/apic.c |2 -
>   hw/pc.c   |   14 ++
>   hw/ppc_newworld.c |3 -
>   hw/ppc_oldworld.c |3 -
>   hw/s390-virtio.c  |1 -
>   kvm-all.c |   43 +++-
>   kvm.h |   26 +-
>   qemu-options.hx   |   16 ++
>   savevm.c  |4 ++
>   sysemu.h  |4 ++
>   target-i386/kvm.c |   77 +++--
>   target-i386/machine.c |   11 
>   target-ppc/kvm.c  |2 +-
>   target-ppc/machine.c  |4 --
>   target-s390x/kvm.c|3 +-
>   vl.c  |   41 +++
>   18 files changed, 300 insertions(+), 89 deletions(-)




Re: Clock issue in Windows XP guests

2010-03-04 Thread Zachary Amsden

On 03/03/2010 11:43 PM, Gilles PIETRI wrote:
> Hi,
>
> I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm
> 0.12.2. I have multiple guests, and one of them is running Windows XP.
> If I stare at the clock, I see that every now & then (~5s), it slows
> down a bit, and then tries to cope with it. If I run some NTP
> synchronization software like ntpd, the offset is as high as 1s lost
> every 10s or so, which makes it impossible to use anything time based
> on the guest (audio stuff, mainly).
>
> I tried messing (as said on IRC) with the -rtc parameters, but to no
> avail. I tried the driftfix=slew option found in the --help output,
> but it says that driftfix is not a valid setting for rtc.. And anyway,
> I have no idea what this does (I'll be reading about it probably...)
>
> I've seen something remotely connected to this on the proxmox forum,
> but it was not that helpful (and proxmox runs qemu 0.11.x as it seems):
> http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962
>
> I remember using the -rtc-td-hack (and in fact, just read again about
> it here:
> http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests),
> but it's not there anymore in 0.12.x, and I have no idea what it used
> to do (going on for some reading as well, when I have some time ;))
>
> Oh, and the guests running linux are working just fine, and have no
> clock issue.
>
> Has anyone encountered such a problem?

No, but I'd certainly like to fix it.  Can you send basic host
information like /proc/cpuinfo, dmesg from kernel?

If you run the one Windows XP guest alone, does it run better or still
the same?


Thanks,

Zach


[PATCH 2/6] Add option to use file backed guest memory

2010-03-04 Thread Marcelo Tosatti
Port qemu-kvm's -mem-path and -mem-prealloc options. These are useful
for backing guest memory with huge pages via hugetlbfs.
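
For readers unfamiliar with the mechanics, here is a minimal, Linux-only
sketch of the two building blocks the patch relies on: querying the block
size of the filesystem behind a path with statfs(), and rounding the
requested RAM size up to a multiple of that size. round_up_to_page() and
backing_block_size() are hypothetical helper names for this example; the
rounding assumes the page size is a power of two.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Filesystem magic reported by statfs() for hugetlbfs. */
#define SKETCH_HUGETLBFS_MAGIC 0x958458f6UL

/* Round size up to a multiple of pagesize (pagesize must be a power of 2). */
size_t round_up_to_page(size_t size, size_t pagesize)
{
    return (size + pagesize - 1) & ~(pagesize - 1);
}

/*
 * Return the block size of the filesystem containing path, or 0 on error.
 * If on_hugetlbfs is non-NULL, it is set to 1 when the path lives on
 * hugetlbfs and to 0 otherwise.
 */
long backing_block_size(const char *path, int *on_hugetlbfs)
{
    struct statfs fs;
    int ret;

    /* Retry if the call was interrupted by a signal. */
    do {
        ret = statfs(path, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0)
        return 0;

    if (on_hugetlbfs)
        *on_hugetlbfs = ((unsigned long)fs.f_type == SKETCH_HUGETLBFS_MAGIC);

    return (long)fs.f_bsize;
}
```

On a hugetlbfs mount with 2 MB pages, a 1 byte request would thus be rounded
up to 2097152 bytes before the file is mapped.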

Signed-off-by: Marcelo Tosatti 
CC: john cooper 
---
 cpu-all.h   |3 +
 exec.c  |  115 --
 qemu-options.hx |   16 
 vl.c|   12 ++
 4 files changed, 141 insertions(+), 5 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 8488bfe..9823c24 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -847,6 +847,9 @@ extern uint8_t *phys_ram_dirty;
 extern ram_addr_t ram_size;
 extern ram_addr_t last_ram_offset;
 
+extern const char *mem_path;
+extern int mem_prealloc;
+
 /* physical memory access */
 
 /* MMIO pages are identified by a combination of an IO device index and
diff --git a/exec.c b/exec.c
index 6a3c912..f41518e 100644
--- a/exec.c
+++ b/exec.c
@@ -2529,6 +2529,99 @@ void qemu_flush_coalesced_mmio_buffer(void)
 kvm_flush_coalesced_mmio_buffer();
 }
 
+#if defined(__linux__) && !defined(TARGET_S390X)
+
+#include <sys/vfs.h>
+
+#define HUGETLBFS_MAGIC   0x958458f6
+
+static long gethugepagesize(const char *path)
+{
+struct statfs fs;
+int ret;
+
+do {
+   ret = statfs(path, &fs);
+} while (ret != 0 && errno == EINTR);
+
+if (ret != 0) {
+   perror("statfs");
+   return 0;
+}
+
+if (fs.f_type != HUGETLBFS_MAGIC)
+   fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+
+return fs.f_bsize;
+}
+
+static void *file_ram_alloc(ram_addr_t memory, const char *path)
+{
+char *filename;
+void *area;
+int fd;
+#ifdef MAP_POPULATE
+int flags;
+#endif
+unsigned long hpagesize;
+
+hpagesize = gethugepagesize(path);
+if (!hpagesize) {
+   return NULL;
+}
+
+if (memory < hpagesize) {
+return NULL;
+}
+
+if (kvm_enabled() && !kvm_has_sync_mmu()) {
+fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
+return NULL;
+}
+
+if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
+   return NULL;
+}
+
+fd = mkstemp(filename);
+if (fd < 0) {
+   perror("mkstemp");
+   free(filename);
+   return NULL;
+}
+unlink(filename);
+free(filename);
+
+memory = (memory+hpagesize-1) & ~(hpagesize-1);
+
+/*
+ * ftruncate is not supported by hugetlbfs in older
+ * hosts, so don't bother bailing out on errors.
+ * If anything goes wrong with it under other filesystems,
+ * mmap will fail.
+ */
+if (ftruncate(fd, memory))
+   perror("ftruncate");
+
+#ifdef MAP_POPULATE
+/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
+area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
+#else
+area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+#endif
+if (area == MAP_FAILED) {
+   perror("file_ram_alloc: can't mmap RAM pages");
+   close(fd);
+   return (NULL);
+}
+return area;
+}
+#endif
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
 RAMBlock *new_block;
@@ -2536,16 +2629,28 @@ ram_addr_t qemu_ram_alloc(ram_addr_t size)
 size = TARGET_PAGE_ALIGN(size);
 new_block = qemu_malloc(sizeof(*new_block));
 
+if (mem_path) {
+#if defined (__linux__) && !defined(TARGET_S390X)
+new_block->host = file_ram_alloc(size, mem_path);
+if (!new_block->host)
+exit(1);
+#else
+fprintf(stderr, "-mem-path option unsupported\n");
+exit(1);
+#endif
+} else {
 #if defined(TARGET_S390X) && defined(CONFIG_KVM)
-/* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */
-new_block->host = mmap((void*)0x100, size, 
PROT_EXEC|PROT_READ|PROT_WRITE,
-   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+/* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */
+new_block->host = mmap((void*)0x100, size,
+PROT_EXEC|PROT_READ|PROT_WRITE,
+MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 #else
-new_block->host = qemu_vmalloc(size);
+new_block->host = qemu_vmalloc(size);
 #endif
 #ifdef MADV_MERGEABLE
-madvise(new_block->host, size, MADV_MERGEABLE);
+madvise(new_block->host, size, MADV_MERGEABLE);
 #endif
+}
 new_block->offset = last_ram_offset;
 new_block->length = size;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 7daa246..fd50add 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -314,6 +314,22 @@ a suffix of ``M'' or ``G'' can be used to signify a value 
in megabytes or
 gigabytes respectively.
 ETEXI
 
+DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
+"-mem-path FILE  provide backing storage for guest RAM\n")
+STEXI
+...@item -mem

[PATCH 6/6] x86: Extend validity of bsp_to_cpu

2010-03-04 Thread Marcelo Tosatti
From: Jan Kiszka 

As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
cpu_index, bsp_to_cpu can also be based on the latter directly. This
will help an early user of it: KVM while initializing mp_state.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 hw/pc.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index bdc297f..e50a488 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -760,7 +760,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
 
 int cpu_is_bsp(CPUState *env)
 {
-return env->cpuid_apic_id == 0;
+/* We hard-wire the BSP to the first CPU. */
+return env->cpu_index == 0;
 }
 
 static CPUState *pc_new_cpu(const char *cpu_model)
-- 
1.6.6.1



[PATCH 4/6] KVM: Rework VCPU state writeback API

2010-03-04 Thread Marcelo Tosatti
From: Jan Kiszka 

This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:

- cpu_synchronize_all_states in qemu_savevm_state_complete
  (initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
  (writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
  (writeback after system reset)

These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:

- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well)

This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.

cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.

Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
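
As a rough illustration of the mapping described above, the sketch below ties each writeback point to its level. Only the KVM_PUT_* names come from the text; the numeric values, the SyncPoint tags, the FakeVcpu type and the "runtime writes only dirty state" policy are assumptions made for this example.

```c
#include <assert.h>

/* Level names come from the commit message; the numeric values are an
 * assumption, chosen so that a more complete writeback compares greater. */
enum {
    KVM_PUT_RUNTIME_STATE = 1,
    KVM_PUT_RESET_STATE   = 2,
    KVM_PUT_FULL_STATE    = 3,
};

/* Hypothetical tags for the writeback points listed above. */
typedef enum {
    SYNC_VCPU_EXEC,   /* before VCPU exec, after cpu_synchronize_state */
    SYNC_POST_RESET,  /* cpu_synchronize_all_post_reset */
    SYNC_POST_INIT,   /* cpu_synchronize_all_post_init (init or vmload) */
} SyncPoint;

/* Hypothetical stand-in for the per-VCPU bookkeeping. */
typedef struct {
    int dirty;           /* user space modified the cached state */
    int last_put_level;  /* 0 = nothing written yet */
} FakeVcpu;

static int writeback_level(SyncPoint point)
{
    switch (point) {
    case SYNC_VCPU_EXEC:  return KVM_PUT_RUNTIME_STATE;
    case SYNC_POST_RESET: return KVM_PUT_RESET_STATE;
    case SYNC_POST_INIT:  return KVM_PUT_FULL_STATE;
    }
    assert(0 && "unknown sync point");
    return KVM_PUT_FULL_STATE;
}

/* Assumed policy: at runtime only dirty state is written back, while
 * reset and init always write, because all VCPUs are stopped anyway. */
static void writeback(FakeVcpu *cpu, SyncPoint point)
{
    if (point == SYNC_VCPU_EXEC && !cpu->dirty) {
        return;
    }
    cpu->last_put_level = writeback_level(point);
    cpu->dirty = 0;
}
```

The point of the ordering is that an arch-specific writer can compare against a level (for example "at least reset") instead of knowing which call site triggered the writeback.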

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 exec.c|   17 -
 hw/apic.c |2 --
 hw/ppc_newworld.c |3 ---
 hw/ppc_oldworld.c |3 ---
 hw/s390-virtio.c  |1 -
 kvm-all.c |   19 +--
 kvm.h |   25 -
 savevm.c  |4 
 sysemu.h  |4 
 target-i386/kvm.c |2 +-
 target-i386/machine.c |   11 ---
 target-ppc/kvm.c  |2 +-
 target-ppc/machine.c  |4 
 target-s390x/kvm.c|3 +--
 vl.c  |   29 +
 15 files changed, 77 insertions(+), 52 deletions(-)

diff --git a/exec.c b/exec.c
index f41518e..891e0ee 100644
--- a/exec.c
+++ b/exec.c
@@ -512,21 +512,6 @@ void cpu_exec_init_all(unsigned long tb_size)
 
 #if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
 
-static void cpu_common_pre_save(void *opaque)
-{
-CPUState *env = opaque;
-
-cpu_synchronize_state(env);
-}
-
-static int cpu_common_pre_load(void *opaque)
-{
-CPUState *env = opaque;
-
-cpu_synchronize_state(env);
-return 0;
-}
-
 static int cpu_common_post_load(void *opaque, int version_id)
 {
 CPUState *env = opaque;
@@ -544,8 +529,6 @@ static const VMStateDescription vmstate_cpu_common = {
 .version_id = 1,
 .minimum_version_id = 1,
 .minimum_version_id_old = 1,
-.pre_save = cpu_common_pre_save,
-.pre_load = cpu_common_pre_load,
 .post_load = cpu_common_post_load,
 .fields  = (VMStateField []) {
 VMSTATE_UINT32(halted, CPUState),
diff --git a/hw/apic.c b/hw/apic.c
index 87e7dc0..3c90f4c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -938,8 +938,6 @@ static void apic_reset(void *opaque)
 APICState *s = opaque;
 int bsp;
 
-cpu_synchronize_state(s->cpu_env);
-
 bsp = cpu_is_bsp(s->cpu_env);
 s->apicbase = 0xfee00000 |
 (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index bc86c85..d4f9013 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -167,9 +167,6 @@ static void ppc_core99_init (ram_addr_t ram_size,
 envs[i] = env;
 }
 
-/* Make sure all register sets take effect */
-cpu_synchronize_state(env);
-
 /* allocate RAM */
 ram_offset = qemu_ram_alloc(ram_size);
 cpu_register_physical_memory(0, ram_size, ram_offset);
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index 04a7835..93c95ba 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -165,9 +165,6 @@ static void ppc_heathrow_init (ram_addr_t ram_size,
 envs[i] = env;
 }
 
-/* Make sure all register sets take effect */
-cpu_synchronize_state(env);
-
 /* allocate RAM */
 if (ram_size > (2047 << 20)) {
 fprintf(stderr,
diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 3582728..ad3386f 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -185,7 +185,6 @@ static void s390_init(ram_addr_t ram_size,
 exit(1);
 }
 
-cpu_synchronize_state(env);
 env->psw.addr = KERN_IMAGE_START;
 env->psw.mask = 0x00018000ULL;
 }
diff --git a/kvm-all.c b/kvm-all.c
index 2f7e33a..534ead0 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -156,10 +156,6 @@ static void kvm_reset_vcpu(void *opaque)
 CPUState *env = opaque;
 
 kvm_arch_reset_vcpu(env);
-if (kvm_arch_put_registers(env)) {
-fprintf(stderr, "Fat

[PATCH 3/6] KVM: Rework of guest debug state writing

2010-03-04 Thread Marcelo Tosatti
From: Jan Kiszka 

So far we synchronized any dirty VCPU state back into the kernel before
updating the guest debug state. This was a concession to a deficit in x86
kernels before 2.6.33. But as this is an arch-dependent issue, it is
better to handle it in the x86 part of KVM and remove the writeback point
from generic code. This also avoids overwriting the flushed state later on
if user space decides to change some more registers before resuming the
guest.

We furthermore need to reinject guest exceptions via the appropriate
mechanism. That is KVM_SET_GUEST_DEBUG for older kernels and
KVM_SET_VCPU_EVENTS for recent ones. Using both mechanisms at the same
time will cause state corruptions.
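
The decision described in the two paragraphs above can be sketched as a pair of pure functions. This is only a model: DebugCtx and the INJECT_* constants are hypothetical stand-ins (the latter for the KVM_GUESTDBG_INJECT_* flags), and the vectors 1 and 3 are the x86 #DB and #BP exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for KVM_GUESTDBG_INJECT_DB / KVM_GUESTDBG_INJECT_BP. */
enum { INJECT_NONE = 0, INJECT_DB = 1, INJECT_BP = 2 };

/* Hypothetical snapshot of the facts the decision depends on. */
typedef struct {
    bool has_vcpu_events;        /* kernel offers KVM_SET_VCPU_EVENTS */
    bool has_robust_singlestep;  /* kernel keeps flags.TF across register writes */
    bool singlestep_enabled;
    int  exception_injected;     /* pending vector, or -1 for none */
} DebugCtx;

/* Older kernels: reinject #DB/#BP via KVM_SET_GUEST_DEBUG.
 * Recent kernels: KVM_SET_VCPU_EVENTS carries it, so nothing to do here. */
static int pick_reinject_trap(const DebugCtx *ctx)
{
    assert(ctx != NULL);
    if (ctx->has_vcpu_events) {
        return INJECT_NONE;
    }
    if (ctx->exception_injected == 1) {
        return INJECT_DB;
    }
    if (ctx->exception_injected == 3) {
        return INJECT_BP;
    }
    return INJECT_NONE;
}

/* The debug state is rewritten only when a trap must be reinjected or
 * when single-stepping on a kernel without robust single-step support. */
static bool needs_guest_debug_update(const DebugCtx *ctx)
{
    return pick_reinject_trap(ctx) != INJECT_NONE ||
           (!ctx->has_robust_singlestep && ctx->singlestep_enabled);
}
```

Keeping the two mechanisms mutually exclusive in one place is what avoids the state corruption mentioned above.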

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 kvm-all.c |   24 
 kvm.h |1 +
 target-i386/kvm.c |   47 +++
 3 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1a02076..2f7e33a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -65,6 +65,7 @@ struct KVMState
 int broken_set_mem_region;
 int migration_log;
 int vcpu_events;
+int robust_singlestep;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
 struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -659,6 +660,12 @@ int kvm_init(int smp_cpus)
 s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
 
+s->robust_singlestep = 0;
+#ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
+s->robust_singlestep =
+kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
+#endif
+
 ret = kvm_arch_init(s, smp_cpus);
 if (ret < 0)
 goto err;
@@ -917,6 +924,11 @@ int kvm_has_vcpu_events(void)
 return kvm_state->vcpu_events;
 }
 
+int kvm_has_robust_singlestep(void)
+{
+return kvm_state->robust_singlestep;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
@@ -974,10 +986,6 @@ static void kvm_invoke_set_guest_debug(void *data)
 struct kvm_set_guest_debug_data *dbg_data = data;
 CPUState *env = dbg_data->env;
 
-if (env->kvm_vcpu_dirty) {
-kvm_arch_put_registers(env);
-env->kvm_vcpu_dirty = 0;
-}
 dbg_data->err = kvm_vcpu_ioctl(env, KVM_SET_GUEST_DEBUG, &dbg_data->dbg);
 }
 
@@ -985,12 +993,12 @@ int kvm_update_guest_debug(CPUState *env, unsigned long 
reinject_trap)
 {
 struct kvm_set_guest_debug_data data;
 
-data.dbg.control = 0;
-if (env->singlestep_enabled)
-data.dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+data.dbg.control = reinject_trap;
 
+if (env->singlestep_enabled) {
+data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+}
 kvm_arch_update_guest_debug(env, &data.dbg);
-data.dbg.control |= reinject_trap;
 data.env = env;
 
 on_vcpu(env, kvm_invoke_set_guest_debug, &data);
diff --git a/kvm.h b/kvm.h
index a74dfcb..a602e45 100644
--- a/kvm.h
+++ b/kvm.h
@@ -40,6 +40,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t 
size);
 
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
+int kvm_has_robust_singlestep(void);
 
 void kvm_setup_guest_memory(void *start, size_t size);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d2116a7..e0247ea 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -852,6 +852,37 @@ static int kvm_get_vcpu_events(CPUState *env)
 return 0;
 }
 
+static int kvm_guest_debug_workarounds(CPUState *env)
+{
+int ret = 0;
+#ifdef KVM_CAP_SET_GUEST_DEBUG
+unsigned long reinject_trap = 0;
+
+if (!kvm_has_vcpu_events()) {
+if (env->exception_injected == 1) {
+reinject_trap = KVM_GUESTDBG_INJECT_DB;
+} else if (env->exception_injected == 3) {
+reinject_trap = KVM_GUESTDBG_INJECT_BP;
+}
+env->exception_injected = -1;
+}
+
+/*
+ * Kernels before KVM_CAP_X86_ROBUST_SINGLESTEP overwrote flags.TF
+ * injected via SET_GUEST_DEBUG while updating GP regs. Work around this
+ * by updating the debug state once again if single-stepping is on.
+ * Another reason to call kvm_update_guest_debug here is a pending debug
+ * trap raise by the guest. On kernels without SET_VCPU_EVENTS we have to
+ * reinject them via SET_GUEST_DEBUG.
+ */
+if (reinject_trap ||
+(!kvm_has_robust_singlestep() && env->singlestep_enabled)) {
+ret = kvm_update_guest_debug(env, reinject_trap);
+}
+#endif /* KVM_CAP_SET_GUEST_DEBUG */
+return ret;
+}
+
 int kvm_arch_put_registers(CPUState *env)
 {
 int ret;
@@ -880,6 +911,11 @@ int kvm_arch_put_registers(CPUState *env)
 if (ret < 0)
 return ret;
 
+/* must be last */
+ret = kvm_guest_debug_workarounds(env);
+if (ret < 0)
+return ret;
+
 return 0;
 }
 
@@ -1123,10 +1159,13 @@ int kvm_arch_debug(struct kvm_debug_exit_arch 
*arch_info)
 } else if (kvm_find_sw_breakpoint(cpu_single_env, arch_info->pc))

[PATCH 5/6] KVM: x86: Restrict writeback of VCPU state

2010-03-04 Thread Marcelo Tosatti
From: Jan Kiszka 

Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
through a reset. Likewise, the TSC and the KVM wallclocks should only be
written on full sync; otherwise we risk dropping some time on a state
read-modify-write.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   32 
 1 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2c834df..40f8303 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -546,7 +546,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
 entry->data = value;
 }
 
-static int kvm_put_msrs(CPUState *env)
+static int kvm_put_msrs(CPUState *env, int level)
 {
 struct {
 struct kvm_msrs info;
@@ -560,7 +560,6 @@ static int kvm_put_msrs(CPUState *env)
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
 if (kvm_has_msr_star(env))
kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
 #ifdef TARGET_X86_64
 /* FIXME if lm capable */
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -568,8 +567,12 @@ static int kvm_put_msrs(CPUState *env)
 kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
 kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
 #endif
-kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
-kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
+if (level == KVM_PUT_FULL_STATE) {
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
+  env->system_time_msr);
+kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+}
 
 msr_data.info.nmsrs = n;
 
@@ -782,7 +785,7 @@ static int kvm_get_mp_state(CPUState *env)
 return 0;
 }
 
-static int kvm_put_vcpu_events(CPUState *env)
+static int kvm_put_vcpu_events(CPUState *env, int level)
 {
 #ifdef KVM_CAP_VCPU_EVENTS
 struct kvm_vcpu_events events;
@@ -806,8 +809,11 @@ static int kvm_put_vcpu_events(CPUState *env)
 
 events.sipi_vector = env->sipi_vector;
 
-events.flags =
-KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+events.flags = 0;
+if (level >= KVM_PUT_RESET_STATE) {
+events.flags |=
+KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+}
 
 return kvm_vcpu_ioctl(env, KVM_SET_VCPU_EVENTS, &events);
 #else
@@ -899,15 +905,17 @@ int kvm_arch_put_registers(CPUState *env, int level)
 if (ret < 0)
 return ret;
 
-ret = kvm_put_msrs(env);
+ret = kvm_put_msrs(env, level);
 if (ret < 0)
 return ret;
 
-ret = kvm_put_mp_state(env);
-if (ret < 0)
-return ret;
+if (level >= KVM_PUT_RESET_STATE) {
+ret = kvm_put_mp_state(env);
+if (ret < 0)
+return ret;
+}
 
-ret = kvm_put_vcpu_events(env);
+ret = kvm_put_vcpu_events(env, level);
 if (ret < 0)
 return ret;
 
-- 
1.6.6.1
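
For readers skimming the diff above, the gating it introduces can be summarized as a small table-like function. The SUB_* flags and the numeric level values are assumptions invented for this sketch; the patch itself works on MSR lists and event flags directly.

```c
#include <assert.h>

/* Level names as used in the patch; numeric values assumed. */
enum {
    KVM_PUT_RUNTIME_STATE = 1,
    KVM_PUT_RESET_STATE   = 2,
    KVM_PUT_FULL_STATE    = 3,
};

/* Hypothetical flags naming the groups of state the patch distinguishes. */
enum {
    SUB_BASE_MSRS  = 1 << 0,  /* SYSENTER, STAR, ...: always written */
    SUB_MP_STATE   = 1 << 1,  /* kvm_put_mp_state */
    SUB_NMI_SIPI   = 1 << 2,  /* nmi_pending and sipi_vector */
    SUB_TSC_CLOCKS = 1 << 3,  /* TSC, KVM system time and wall clock MSRs */
};

static unsigned substates_for_level(int level)
{
    unsigned mask = SUB_BASE_MSRS;

    assert(level >= KVM_PUT_RUNTIME_STATE && level <= KVM_PUT_FULL_STATE);

    if (level >= KVM_PUT_RESET_STATE) {
        mask |= SUB_MP_STATE | SUB_NMI_SIPI;
    }
    if (level == KVM_PUT_FULL_STATE) {
        mask |= SUB_TSC_CLOCKS;  /* writing these at runtime would lose time */
    }
    return mask;
}
```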



[PATCH 1/6] Allocate memory below 4GB as one chunk

2010-03-04 Thread Marcelo Tosatti
From: Avi Kivity 

Instead of allocating a separate chunk for the first 640KB and another
for 1MB+, allocate one large chunk.  This plays well in terms of alignment
and size with large pages.

Signed-off-by: Avi Kivity 
Signed-off-by: Marcelo Tosatti 
---
 hw/pc.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 4f6a522..bdc297f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -833,18 +833,11 @@ static void pc_init1(ram_addr_t ram_size,
 vmport_init();
 
 /* allocate RAM */
-ram_addr = qemu_ram_alloc(0xa0000);
+ram_addr = qemu_ram_alloc(below_4g_mem_size);
 cpu_register_physical_memory(0, 0xa0000, ram_addr);
-
-/* Allocate, even though we won't register, so we don't break the
- * phys_ram_base + PA assumption. This range includes vga (0xa0000 - 0xc0000),
- * and some bios areas, which will be registered later
- */
-ram_addr = qemu_ram_alloc(0x100000 - 0xa0000);
-ram_addr = qemu_ram_alloc(below_4g_mem_size - 0x100000);
 cpu_register_physical_memory(0x100000,
  below_4g_mem_size - 0x100000,
- ram_addr);
+ ram_addr + 0x100000);
 
 /* above 4giga memory allocation */
 if (above_4g_mem_size > 0) {
-- 
1.6.6.1
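
The arithmetic behind the patch above can be checked in isolation. The sketch below assumes the two boundaries named in the commit message (640KB = 0xa0000 and 1MB = 0x100000); the Mapping type and map_below_4g() are hypothetical and only model what gets registered, not QEMU's allocator.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_MEM_END    0xa0000ULL   /* 640KB: end of conventional memory */
#define HIGH_MEM_START 0x100000ULL  /* 1MB: start of extended memory */

/* Hypothetical record of one registered guest-physical range. */
typedef struct {
    uint64_t guest_phys;  /* guest physical start address */
    uint64_t size;        /* length of the range in bytes */
    uint64_t ram_offset;  /* offset into the single RAM chunk */
} Mapping;

/* With one chunk of below_4g_mem_size bytes at ram_addr, both registered
 * ranges use the same "guest physical + ram_addr" offset; the hole between
 * 640KB and 1MB is allocated but registered later by other code. */
static void map_below_4g(uint64_t ram_addr, uint64_t below_4g_mem_size,
                         Mapping out[2])
{
    assert(below_4g_mem_size > HIGH_MEM_START);

    out[0].guest_phys = 0;
    out[0].size       = LOW_MEM_END;
    out[0].ram_offset = ram_addr;

    out[1].guest_phys = HIGH_MEM_START;
    out[1].size       = below_4g_mem_size - HIGH_MEM_START;
    out[1].ram_offset = ram_addr + HIGH_MEM_START;
}
```

Because the offset is identical for both ranges, the chunk stays contiguous in guest physical terms, which is what makes it friendly to large pages.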



[PATCH 0/6] [PULL] qemu-kvm.git uq/master queue

2010-03-04 Thread Marcelo Tosatti
The following changes since commit 55b1e61f640bb2cf3bed0b4cc6d4ba1326c625d9:
  Samuel Thibault (1):
(curses) Use more descriptive values

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Avi Kivity (1):
  Allocate memory below 4GB as one chunk

Jan Kiszka (4):
  KVM: Rework of guest debug state writing
  KVM: Rework VCPU state writeback API
  KVM: x86: Restrict writeback of VCPU state
  x86: Extend validity of bsp_to_cpu

Marcelo Tosatti (1):
  Add option to use file backed guest memory

 cpu-all.h |3 +
 exec.c|  132 
 hw/apic.c |2 -
 hw/pc.c   |   14 ++
 hw/ppc_newworld.c |3 -
 hw/ppc_oldworld.c |3 -
 hw/s390-virtio.c  |1 -
 kvm-all.c |   43 +++-
 kvm.h |   26 +-
 qemu-options.hx   |   16 ++
 savevm.c  |4 ++
 sysemu.h  |4 ++
 target-i386/kvm.c |   77 +++--
 target-i386/machine.c |   11 
 target-ppc/kvm.c  |2 +-
 target-ppc/machine.c  |4 --
 target-s390x/kvm.c|3 +-
 vl.c  |   41 +++
 18 files changed, 300 insertions(+), 89 deletions(-)


Re: [PATCH -v3] Add savevm/loadvm support for MCE

2010-03-04 Thread Marcelo Tosatti
On Wed, Mar 03, 2010 at 04:52:46PM +0800, Huang Ying wrote:
> MCE registers are saved/load into/from CPUState in
> kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon
> reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE.
> 
> v3:
> 
>  - use msrs[] in kvm_arch_load/save_regs and get_msr_entry directly.
> 
> v2:
> 
>  - Rebased on new CPU registers save/load framework.
> 
> Signed-off-by: Huang Ying 
> ---
>  qemu-kvm-x86.c |   36 
>  1 file changed, 36 insertions(+)

Applied, thanks.



Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-04 Thread Marcelo Tosatti
On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
> Hi,
> 
> here are the patches that implement nested paging support for nested
> svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> in the first round to get feedback about the general direction of the
> changes.  Nevertheless I am proud to report that with these patches the
> famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> in the l1 guest when l2 is single-processor. With SMP guests the
> situation is very different. The more vcpus the guest has the more is
> the performance drop from l1 to l2. 
> Anyway, this post is to get feedback about the overall concept of these
> patches.  Please review and give feedback :-)

Joerg,

What perf gain does this bring? (I'm not aware of the current
overhead.)

Overall comments:

Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
pagetable, and pass that to the kvm tdp fault path (with the correct
context setup)?

You probably need to include a flag in base_role to differentiate
between l1 / l2 shadow tables (say if they use the same cr3 value).



Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp

2010-03-04 Thread Gleb Natapov
On Thu, Mar 04, 2010 at 12:35:45PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
> >>>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
> >>>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
> >>>>>> will help an early user of it: KVM while initializing mp_state.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka 
> >>>>>> ---
> >>>>>>  hw/pc.c |3 ++-
> >>>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/pc.c b/hw/pc.c
> >>>>>> index b90a79e..58c32ea 100644
> >>>>>> --- a/hw/pc.c
> >>>>>> +++ b/hw/pc.c
> >>>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
> >>>>>>  
> >>>>>>  int cpu_is_bsp(CPUState *env)
> >>>>>>  {
> >>>>>> -return env->cpuid_apic_id == 0;
> >>>>>> +/* We hard-wire the BSP to the first CPU. */
> >>>>>> +return env->cpu_index == 0;
> >>>>>>  }
> >>>>> We should not assume that. The function was written like that
> >>>>> specifically so the code around it will not rely on this assumption.
> >>>>> Now you change that specifically to write code that will do incorrect
> >>>>> assumptions. I don't see the logic here.
> >>>> The logic is that we do not support any other mapping yet - with or
> >>>> without this change. Without it, we complicate the APIC initialization
> >>>> for (so far) no good reason. Once we want to support different BSP
> >>>> assignments, we need to go through the code and rework some parts anyway.
> >>>>
> >>> As far as I remember the only part that was missing was a command line to
> >>> specify apic IDs for each CPU and what CPU is BSP. The code was ready
> >>> otherwise. It's very sad if this was broken by other modifications. But
> >>> changes like that actually push us back from our goal. Why not rework
> >>> code so it will work with correct cpu_is_bsp() function instead of
> >>> introducing this hack?
> >> If you can confirm that there is a serious use case behind it, I will
> >> look into this again. But so far, I did not find it.
> >>
> > First of all, it is a correctness issue. We should emulate the x86 platform
> > and nothing there says that the BSP apic id is zero. Second, part of the CPU
> > topology information is encoded in the apic id, i.e. when socket/core/ht
> > topology is used we can't just arbitrarily specify apic ids for each logical
> > cpu; we should follow the rules described in the SDM. For instance, when more
> > than 16 CPUs are present AMD advises starting apic id numbering from 16 and
> > leaving the first 16 IDs for IOAPICs. And third, the introduction of this hack
> > shows that something is done wrong in other places of the code. Somewhere
> > the initialization order is incorrect.
> 
> Well, it looks like we need to answer two questions: How shall the user
> specify the BSP? And how to reliably map this on QEMU's internal
> cpu_index? Depending on this, apic numbering may or may not be an
> orthogonal issue.
> 
Two good questions :) We can extend the -cpu option to let us specify a base
apic id for each socket. Apic ids of logical cpus are derived from this
base apic id depending on where in the hierarchy the logical cpu resides.
The cpu_index thing in QEMU is pretty messy. The way non-x86 arches use it
makes it hard to clean up, which is why I didn't want to rely on it at
all and used the apic id instead. Thinking about it, cpu_is_bsp() should
really check the BSP bit in the apic base register.
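
A minimal sketch of that last idea, assuming the architectural layout of the IA32_APIC_BASE MSR (bit 8 is the BSP flag, bit 11 the global enable). The helper names apicbase_is_bsp() and apicbase_selfcheck() are made up for illustration and are not part of any patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_APICBASE_BSP    (1u << 8)   /* set on the bootstrap processor */
#define MSR_IA32_APICBASE_ENABLE (1u << 11)  /* APIC global enable */

/* Hypothetical replacement for the index-based check: ask the APIC base
 * register itself instead of relying on cpu_index or the apic id. */
static bool apicbase_is_bsp(uint64_t apicbase)
{
    return (apicbase & MSR_IA32_APICBASE_BSP) != 0;
}

/* Usage example: a BSP and an AP that share the same default base address. */
static void apicbase_selfcheck(void)
{
    uint64_t base = 0xfee00000ULL | MSR_IA32_APICBASE_ENABLE;

    assert(apicbase_is_bsp(base | MSR_IA32_APICBASE_BSP));
    assert(!apicbase_is_bsp(base));
}
```

This only moves the question to whoever sets the bit at reset time, so the initialization order concern raised earlier in the thread still applies.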


> BTW, do real systems allow to hot plug BSP as well? Or how is the case
> handled when you unplug the BSP and then reboot the box?
> 
Did you mean hot-unplugging the BSP? The OS determines which CPU is the BSP
by checking the BSP bit in the APIC base register. My guess is that there is
some pin on the CPU whose value is mirrored as the BSP bit in the APIC base
register. The board may have some logic to check which sockets are populated
and choose one of them as BSP by pulling its pin up. But this is only a guess.

--
Gleb.


Re: [PATCH 2/2] QEMU-KVM: Ask kernel about supported svm features

2010-03-04 Thread Alexander Graf
Joerg Roedel wrote:
> On Wed, Mar 03, 2010 at 11:58:49PM +0100, Alexander Graf wrote:
>   
>> Am 03.03.2010 um 20:15 schrieb Joerg Roedel :
>>
>> 
>>> This patch adds code to ask the kernel about the svm
>>> features it supports for its guests and propagates them to
>>> the guest. The new capability is necessary because the old
>>> behavior of the kernel was to just return the host svm
>>> features but every svm-feature needs emulation in the nested
>>> svm kernel code. The new capability indicates that the
>>> kernel is aware of that when returning svm cpuid
>>> information.
>>>   
>> Do we really need that complexity?
>> 
>
> Yes :-)
>
>   
>> By default the kernel masks out unsupported cpuid features anyway. So
>> if we don't have npt guest support (enabled), the kernel module should
>> just mask it out.
>> 
>
> The kernel does not mask out unsupported features. I also don't think
> this would be a good idea because userspace won't be aware of that
> change.
> Fact is, we need a way to report the npt-emulation feature to userspace
> because old kvm versions don't support it. So we can't pass the npt bit
> unconditionally. The get_supported_cpuid ioctl is the way of choice
> here.
> But the current way get_supported_cpuid works for function 0x8000000A is
> broken because it reports the host features. This was the reason to
> introduce the new capability.
>   

That's what I mean by masking. It used to happen implicitly, but has
been changed to directly asking the kernel for its capabilities apparently.

>> IOW, always passing npt should work. No capability should make it
>> get masked out.
>> 
>
> No, as stated above always passing npt-bit into the kernel and letting
> it mask out there isn't a good way to go (not only because this will
> break if you use new qemu-kvm on old kernel-space).
>   

Ah, so we did have a bug in old KVM kernel modules. Sigh.


Alex


Re: [PATCH 2/2] QEMU-KVM: Ask kernel about supported svm features

2010-03-04 Thread Joerg Roedel
On Wed, Mar 03, 2010 at 11:58:49PM +0100, Alexander Graf wrote:
> 
> Am 03.03.2010 um 20:15 schrieb Joerg Roedel :
> 
> >This patch adds code to ask the kernel about the svm
> >features it supports for its guests and propagates them to
> >the guest. The new capability is necessary because the old
> >behavior of the kernel was to just return the host svm
> >features but every svm-feature needs emulation in the nested
> >svm kernel code. The new capability indicates that the
> >kernel is aware of that when returning svm cpuid
> >information.
> 
> Do we really need that complexity?

Yes :-)

> By default the kernel masks out unsupported cpuid features anyway. So
> if we don't have npt guest support (enabled), the kernel module should
> just mask it out.

The kernel does not mask out unsupported features. I also don't think
this would be a good idea because userspace won't be aware of that
change.
Fact is, we need a way to report the npt-emulation feature to userspace
because old kvm versions don't support it. So we can't pass the npt bit
unconditionally. The get_supported_cpuid ioctl is the way of choice
here.
But the current way get_supported_cpuid works for function 0x8000000A is
broken because it reports the host features. This was the reason to
introduce the new capability.
 
> IOW, always passing npt should work. No capability should make it
> get masked out.

No, as stated above, always passing the npt bit into the kernel and letting
it mask it out there isn't a good way to go (not only because this will
break if you use new qemu-kvm on old kernel-space).

Joerg




Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp

2010-03-04 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
>>>> Gleb Natapov wrote:
>>>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
>>>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
>>>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
>>>>>> will help an early user of it: KVM while initializing mp_state.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka 
>>>>>> ---
>>>>>>  hw/pc.c |3 ++-
>>>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/pc.c b/hw/pc.c
>>>>>> index b90a79e..58c32ea 100644
>>>>>> --- a/hw/pc.c
>>>>>> +++ b/hw/pc.c
>>>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
>>>>>>  
>>>>>>  int cpu_is_bsp(CPUState *env)
>>>>>>  {
>>>>>> -return env->cpuid_apic_id == 0;
>>>>>> +/* We hard-wire the BSP to the first CPU. */
>>>>>> +return env->cpu_index == 0;
>>>>>>  }
>>>>> We should not assume that. The function was written like that
>>>>> specifically so the code around it will not rely on this assumption.
>>>>> Now you change that specifically to write code that will do incorrect
>>>>> assumptions. I don't see the logic here.
>>>> The logic is that we do not support any other mapping yet - with or
>>>> without this change. Without it, we complicate the APIC initialization
>>>> for (so far) no good reason. Once we want to support different BSP
>>>> assignments, we need to go through the code and rework some parts anyway.
>>>>
>>> As far as I remember the only part that was missing was a command line to
>>> specify apic IDs for each CPU and what CPU is BSP. The code was ready
>>> otherwise. It's very sad if this was broken by other modifications. But
>>> changes like that actually push us back from our goal. Why not rework
>>> code so it will work with correct cpu_is_bsp() function instead of
>>> introducing this hack?
>> If you can confirm that there is a serious use case behind it, I will
>> look into this again. But so far, I did not find it.
>>
> First of all, it is a correctness issue. We should emulate the x86 platform
> and nothing there says that the BSP apic id is zero. Second, part of the CPU
> topology information is encoded in the apic id, i.e. when socket/core/ht
> topology is used we can't just arbitrarily specify apic ids for each logical
> cpu; we should follow the rules described in the SDM. For instance, when more
> than 16 CPUs are present AMD advises starting apic id numbering from 16 and
> leaving the first 16 IDs for IOAPICs. And third, the introduction of this hack
> shows that something is done wrong in other places of the code. Somewhere
> the initialization order is incorrect.

Well, it looks like we need to answer two questions: How shall the user
specify the BSP? And how to reliably map this on QEMU's internal
cpu_index? Depending on this, apic numbering may or may not be an
orthogonal issue.

BTW, do real systems allow to hot plug BSP as well? Or how is the case
handled when you unplug the BSP and then reboot the box?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-04 Thread Joerg Roedel
On Thu, Mar 04, 2010 at 12:44:48AM +0100, Alexander Graf wrote:
> 
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
> 
> > Hi,
> > 
> > here are the patches that implement nested paging support for nested
> > svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
> > in the first round to get feedback about the general direction of the
> > changes.  Nevertheless I am proud to report that with these patches the
> > famous kernel-compile benchmark runs only 4% slower in the l2 guest as
> > in the l1 guest when l2 is single-processor. With SMP guests the
> > situation is very different. The more vcpus the guest has the more is
> > the performance drop from l1 to l2. 
> > Anyway, this post is to get feedback about the overall concept of these
> > patches.  Please review and give feedback :-)
> 
> Nice job! It's great to see you finally got around to it :-).
> 
> Have you tracked what slows down SMP l2 guests yet? So far I've been
> assuming that IPIs just completely kill the performance, but I guess
> it shouldn't be that bad, especially now where you have sped up the
> #VMEXIT path that much.

I have not yet looked deeper into this issue. I also suspect lockholder
preemption to be the cause of this. I did the test with a
populated nested page table too and the slowdown is still there. But
that's all guesswork; I need to do some research into the exact reasons.

Joerg




Re: [PATCH 17/18] KVM: SVM: Report Nested Paging support to userspace

2010-03-04 Thread Joerg Roedel
On Thu, Mar 04, 2010 at 12:37:42AM +0100, Alexander Graf wrote:
> 
> On 03.03.2010, at 20:12, Joerg Roedel wrote:
> 
> > This patch implements the reporting of the nested paging
> > feature support to userspace.
> > 
> > Signed-off-by: Joerg Roedel 
> > ---
> > arch/x86/kvm/svm.c |   10 ++
> > 1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index fe1398e..ce71023 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -3289,6 +3289,16 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> > 
> > static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 
> > *entry)
> > {
> > +   switch (func) {
> > +   case 0x8000000A:
> > +   if (!npt_enabled)
> > +   break;
> 
> if (!nested)
>   break;

True, but shouldn't matter much because if nested is off the guest will
not see the svm bit. It would only see that the processor could do
nested paging if it had svm support ;-)

Joerg




Clock issue in Windows XP guests

2010-03-04 Thread Gilles PIETRI

Hi,

I have a host running a 2.6.32.7 kernel, and I'm using qemu-kvm 0.12.2.
I have multiple guests, and one of them is running Windows XP. If I
watch the clock, I see that every now and then (~5s) it slows down a
bit and then tries to catch up. If I run NTP synchronization
software like ntpd, the offset is as high as 1s lost every 10s or so,
which makes it impossible to use anything time-based on the guest (audio
stuff, mainly).


As suggested on IRC, I tried changing the -rtc parameters, but to no
avail. I tried the driftfix=slew option found in the --help output, but
it says that driftfix is not a valid setting for rtc. And anyway, I
have no idea what this does (I'll probably be reading about it...)


I've seen something remotely connected to this on the proxmox forum, but 
it was not that helpful (and proxmox runs qemu 0.11.x as it seems):

http://forum.proxmox.com/threads/2050-Slow-clock-time-drift-in-windows-guests?p=17962

I remember using the -rtc-td-hack (and in fact, just read again about it 
here: 
http://forum.proxmox.com/threads/2381-Recommended-clock-source-for-KVM-guests), 
but it's not there anymore in 0.12.x, and I have no idea what it used to 
do (going on for some reading as well, when I have some time ;))


Oh, and the guests running Linux work just fine and have no clock
issue.


Has anyone encountered such a problem?

Regards,

Gilou


[PATCH] qemu-kvm: Fix boot CPU setup for the case it is unsupported

2010-03-04 Thread Jan Kiszka
Commit 52b03dd702 incorrectly failed KVM initialization in case the
kernel did not support KVM_CAP_SET_BOOT_CPU_ID. Fix this, and while at
it also improve error propagation in kvm_create_context.

Signed-off-by: Jan Kiszka 
---

OK, it really was me. :)

 qemu-kvm-x86.c |    9 +++++++--
 qemu-kvm.c     |    4 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 7a5925a..7d42fdc 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -672,7 +672,7 @@ static const VMStateDescription vmstate_kvmclock= {
 
 int kvm_arch_qemu_create_context(void)
 {
-    int i;
+    int i, r;
     struct utsname utsname;
 
     uname(&utsname);
@@ -696,7 +696,12 @@ int kvm_arch_qemu_create_context(void)
     vmstate_register(0, &vmstate_kvmclock, &kvmclock_data);
 #endif
 
-    return kvm_set_boot_cpu_id(0);
+    r = kvm_set_boot_cpu_id(0);
+    if (r < 0 && r != -ENOSYS) {
+        return r;
+    }
+
+    return 0;
 }
 
 static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 222ca97..e417f21 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2091,8 +2091,10 @@ static int kvm_create_context(void)
         return -1;
     }
     r = kvm_arch_qemu_create_context();
-    if (r < 0)
+    if (r < 0) {
         kvm_finalize(kvm_state);
+        return -1;
+    }
     if (kvm_pit && !kvm_pit_reinject) {
         if (kvm_reinject_control(kvm_context, 0)) {
             fprintf(stderr, "failure to disable in-kernel PIT reinjection\n");
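
The error-handling pattern in the patch above can be tried in isolation
with the following sketch. All function names ending in _stub or _sketch
are hypothetical stand-ins, and the finalized flag merely stands in for
the kvm_finalize(kvm_state) call; nothing here is the real qemu-kvm code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for kvm_set_boot_cpu_id(): simply hands back the
 * result the "kernel" would give, e.g. -ENOSYS if the capability is
 * missing. */
static int set_boot_cpu_id_stub(int kernel_result)
{
    return kernel_result;
}

/* Shape of the fixed arch hook: a missing optional capability is not an
 * error; any other negative value is propagated to the caller. */
static int arch_create_context_sketch(int kernel_result)
{
    int r = set_boot_cpu_id_stub(kernel_result);

    if (r < 0 && r != -ENOSYS) {
        return r;
    }
    return 0;
}

/* Shape of the fixed caller: clean up AND stop on failure. Without the
 * braces and the return, execution would continue after the cleanup. */
static int create_context_sketch(int kernel_result, int *finalized)
{
    int r;

    assert(finalized != NULL);
    *finalized = 0;

    r = arch_create_context_sketch(kernel_result);
    if (r < 0) {
        *finalized = 1;   /* stands in for kvm_finalize(kvm_state) */
        return -1;
    }
    return 0;
}
```

Calling create_context_sketch(-ENOSYS, &flag) succeeds without running the
cleanup path, while any other negative result triggers cleanup and returns
-1.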





Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp

2010-03-04 Thread Gleb Natapov
On Thu, Mar 04, 2010 at 09:23:46AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
> >>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
> >>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
> >>>> will help an early user of it: KVM while initializing mp_state.
> >>>>
> >>>> Signed-off-by: Jan Kiszka
> >>>> ---
> >>>>  hw/pc.c |    3 ++-
> >>>>  1 files changed, 2 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/hw/pc.c b/hw/pc.c
> >>>> index b90a79e..58c32ea 100644
> >>>> --- a/hw/pc.c
> >>>> +++ b/hw/pc.c
> >>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
> >>>>
> >>>>  int cpu_is_bsp(CPUState *env)
> >>>>  {
> >>>> -    return env->cpuid_apic_id == 0;
> >>>> +    /* We hard-wire the BSP to the first CPU. */
> >>>> +    return env->cpu_index == 0;
> >>>>  }
> >>> We should not assume that. The function was written like that
> >>> specifically so the code around it will not rely on this assumption.
> >>> Now you change that specifically to write code that will do incorrect
> >>> assumptions. I don't see the logic here.
> >> The logic is that we do not support any other mapping yet - with or
> >> without this change. Without it, we complicate the APIC initialization
> >> for (so far) no good reason. Once we want to support different BSP
> >> assignments, we need to go through the code and rework some parts anyway.
> >>
> > As far as I remember the only part that was missing was a command line to
> > specify apic IDs for each CPU and what CPU is BSP. The code was ready
> > otherwise. It's very sad if this was broken by other modifications. But
> > changes like that actually push us back from our goal. Why not rework
> > code so it will work with correct cpu_is_bsp() function instead of
> > introducing this hack?
> 
> If you can confirm that there is a serious use case behind it, I will
> look into this again. But so far, I did not find it.
> 
First of all, it is a correctness issue. We should emulate the x86 platform,
and nothing there says that the BSP APIC ID is zero. Second, part of the CPU
topology information is encoded in the APIC ID, i.e. when socket/core/HT
topology is used we can't just arbitrarily assign APIC IDs to each logical
CPU; we should follow the rules described in the SDM. For instance, when more
than 16 CPUs are present, AMD advises starting APIC ID numbering from 16 and
leaving the first 16 IDs for IOAPICs. And third, the introduction of this
hack shows that something is done wrong in other places of the code:
somewhere the initialization order is incorrect.

--
Gleb.
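
To make the APIC ID argument concrete, here is a small sketch under stated
assumptions: the struct, the explicit is_bsp flag and both helper names
are hypothetical and do not exist in hw/pc.c. It shows an APIC ID packed
from socket/core/thread fields and a BSP test that assumes neither APIC ID
0 nor CPU index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced CPU state for illustration only. */
struct cpu_sketch {
    int      cpu_index;   /* creation order, 0..n-1 */
    uint32_t apic_id;     /* ID as seen by the guest */
    bool     is_bsp;      /* explicit BSP marker (an assumption) */
};

/*
 * Pack socket/core/thread numbers into an APIC ID using the given field
 * widths. Assumes core_bits + thread_bits is well below 32.
 */
static uint32_t make_apic_id(uint32_t socket, uint32_t core, uint32_t thread,
                             unsigned core_bits, unsigned thread_bits)
{
    return (socket << (core_bits + thread_bits)) |
           (core << thread_bits) |
           thread;
}

/* BSP test that relies on neither apic_id == 0 nor cpu_index == 0. */
static bool cpu_is_bsp_sketch(const struct cpu_sketch *cpu)
{
    assert(cpu != NULL);
    return cpu->is_bsp;
}
```

With numbering that starts at 16, the first CPU would carry APIC ID 16, so
a test of the form apic_id == 0 would report no BSP at all, while a test
on cpu_index ties the BSP to creation order.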


Re: segfault at start with latest qemu-kvm.git

2010-03-04 Thread Jan Kiszka
Jan Kiszka wrote:
> David S. Ahern wrote:
>> On 03/03/2010 04:20 PM, Jan Kiszka wrote:
>>> David S. Ahern wrote:

>>>> On 03/03/2010 04:08 PM, Jan Kiszka wrote:
>>>>> David S. Ahern wrote:
>>>>>> With latest qemu-kvm.git I am getting a segfault at start:
>>>>>>
>>>>>> /tmp/qemu-kvm-test/bin/qemu-system-x86_64 -m 1024 -smp 2 \
>>>>>>   -drive file=/images/f12-x86_64.img,if=virtio,cache=none,boot=on
>>>>>>
>>>>>> kvm_create_vcpu: Invalid argument
>>>>>> Segmentation fault (core dumped)
>>>>>>
>>>>>>
>>>>>> git bisect points to:
>>>>>>
>>>>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>>>>>> [52b03dd70261934688cb00768c4b1e404716a337] qemu-kvm: Move
>>>>>> kvm_set_boot_cpu_id
>>>>>>
>>>>>>
>>>>>> $ git show
>>>>>> commit 7811d4e8ec057d25db68f900be1f09a142faca49
>>>>>> Author: Marcelo Tosatti
>>>>>> Date:   Mon Mar 1 21:36:31 2010 -0300
>>>>>>
>>>>>>
>>>>>> If I manually back out the patch it will boot fine.
>>>>>>
>>>>> Problem persists after removing the build directory and doing a fresh
>>>>> configure && make? I'm asking before taking the bug (which would be
>>>>> mine, likely) as I recently spent some hours "debugging" a volatile
>>>>> build system issue.
>>>>>
>>>>> Jan
>>>>>
>>>> Before sending the email I pulled a fresh clone in a completely
>>>> different directory (/tmp) to determine if it was something I
>>>> introduced. I then went back to my usual location, unapplied the patch
>>>> and it worked fine.
>>> OK, that reason can be excluded. What's your host kernel kvm version?
>>>
>>> (Of course, the issue does not show up here. But virtio currently does
>>> not boot for me - independent of my patch.)
>>>
>>> Jan
>>>
>> Fedora Core 12,
>>
>> Linux daahern-lx 2.6.31.12-174.2.22.fc12.x86_64 #1 SMP Fri Feb 19
>> 18:55:03 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>>
> 
> Reproduced after switching back to kvm-kmod-2.6.31, will debug.
> 

Subtle memory corruption: qemu_malloc is returning a pointer that
happens to become kvm_state twice. I bet my patch just exchanges some of
the involved parties and exposes the issue more prominently. Trying to
understand malloc's issue now...

Jan





Re: [PATCH v4 03/10] x86: Extend validity of cpu_is_bsp

2010-03-04 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Mar 04, 2010 at 12:34:22AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Mon, Mar 01, 2010 at 06:17:22PM +0100, Jan Kiszka wrote:
>>>> As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
>>>> cpu_index, cpu_is_bsp can also be based on the latter directly. This
>>>> will help an early user of it: KVM while initializing mp_state.
>>>>
>>>> Signed-off-by: Jan Kiszka
>>>> ---
>>>>  hw/pc.c |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/hw/pc.c b/hw/pc.c
>>>> index b90a79e..58c32ea 100644
>>>> --- a/hw/pc.c
>>>> +++ b/hw/pc.c
>>>> @@ -767,7 +767,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
>>>>
>>>>  int cpu_is_bsp(CPUState *env)
>>>>  {
>>>> -    return env->cpuid_apic_id == 0;
>>>> +    /* We hard-wire the BSP to the first CPU. */
>>>> +    return env->cpu_index == 0;
>>>>  }
>>> We should not assume that. The function was written like that
>>> specifically so the code around it will not rely on this assumption.
>>> Now you change that specifically to write code that will do incorrect
>>> assumptions. I don't see the logic here.
>> The logic is that we do not support any other mapping yet - with or
>> without this change. Without it, we complicate the APIC initialization
>> for (so far) no good reason. Once we want to support different BSP
>> assignments, we need to go through the code and rework some parts anyway.
>>
> As far as I remember the only part that was missing was a command line to
> specify apic IDs for each CPU and what CPU is BSP. The code was ready
> otherwise. It's very sad if this was broken by other modifications. But
> changes like that actually push us back from our goal. Why not rework
> code so it will work with correct cpu_is_bsp() function instead of
> introducing this hack?

If you can confirm that there is a serious use case behind it, I will
look into this again. But so far, I did not find it.

Jan


