Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Avi Kivity

 On 09/11/2010 05:20 PM, Joerg Roedel wrote:

On Sat, Sep 11, 2010 at 03:43:02PM +0200, Alexander Graf wrote:

@@ -305,6 +322,8 @@ static x86_def_t builtin_x86_defs[] = {
 CPUID_EXT3_OSVW, CPUID_EXT3_IBS */
 .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
 CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
+.svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | CPUID_SVM_NRIPSAVE |
+CPUID_SVM_VMCBCLEAN,

Does that phenom already do all those? It does NPT, but I'm not sure
about NRIPSAVE for example.

Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
Phenoms don't have it. For the SVM features it is not that important
what the host hardware supports but what KVM can emulate. VMCBCLEAN can
be emulated without supporting it in the host for example.


Well, let's have a phenom2 type for those new features (and any other 
features the phenom 2 has).  What's the point of using the name of 
existing hardware if it doesn't match that hardware?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-12 Thread Avi Kivity

 On 09/10/2010 10:48 PM, Anthony Liguori wrote:
I agree, is there any reason not to enable compiling less into the 
binary?

There are folks interested in eliminating as much as possible to reduce
the attack surface and auditing requirements, for example.


It's not a bad idea, it's just that what --disable-cpu-emulation does 
is evil.  Being that I wrote the implementation, I'm quite confident 
in declare it as such :-)




Oh, I thought you were against the idea in itself for some reason.

I'll patch it for 0.13, but any ideas on how it should be rework for master?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Alexander Graf

On 12.09.2010, at 08:05, Avi Kivity wrote:

 On 09/11/2010 05:20 PM, Joerg Roedel wrote:
 On Sat, Sep 11, 2010 at 03:43:02PM +0200, Alexander Graf wrote:
 @@ -305,6 +322,8 @@ static x86_def_t builtin_x86_defs[] = {
 CPUID_EXT3_OSVW, CPUID_EXT3_IBS */
 .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
 CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
 +.svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | 
 CPUID_SVM_NRIPSAVE |
 +CPUID_SVM_VMCBCLEAN,
 Does that phenom already do all those? It does NPT, but I'm not sure
 about NRIPSAVE for example.
 Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
 Phenoms don't have it. For the SVM features it is not that important
 what the host hardware supports but what KVM can emulate. VMCBCLEAN can
 be emulated without supporting it in the host for example.
 
 Well, let's have a phenom2 type for those new features (and any other 
 features the phenom 2 has).  What's the point of using the name of existing 
 hardware if it doesn't match that hardware?

Isn't that what cpu=host is there for? I don't want to see cpu type cluttering 
like we have it on ppc. I added the core2duo type for Mac OS X guests for which 
those are basically the oldest supported CPUs.

For the Phenom type, I honestly don't remember why, but there was also a good 
reason to add it. In fact, I use it today to have nested virt without -cpu host 
on hardware that's too new for my guests.

Either way, I don't think we need a phenom2 type. The features additional are 
minor enough to not really matter and all use cases I can come up with require 
either -cpu host (local virt) or -cpu phenom (migration).


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add missing tcg_prologue_init() for --disable-cpu-emulation build

2010-09-12 Thread Avi Kivity
Add missing tcg_prologue_init() and tcg_ctx, remove code_gen_max_block_size.

Fixes ./configure --disable-cpu-emulation.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/fake-exec.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/target-i386/fake-exec.c b/target-i386/fake-exec.c
index dfa202d..e6f8363 100644
--- a/target-i386/fake-exec.c
+++ b/target-i386/fake-exec.c
@@ -12,22 +12,20 @@
  */
 #include exec.h
 #include cpu.h
+#include tcg.h
 
 int code_copy_enabled = 0;
 
 CCTable cc_table[CC_OP_NB];
 
+TCGContext tcg_ctx;
+
 void cpu_dump_statistics (CPUState *env, FILE*f,
   int (*cpu_fprintf)(FILE *f, const char *fmt, ...),
   int flags)
 {
 }
 
-unsigned long code_gen_max_block_size(void)
-{
-return 32;
-}
-
 void cpu_gen_init(void)
 {
 }
@@ -48,3 +46,7 @@ int cpu_x86_gen_code(CPUState *env, TranslationBlock *tb, int 
*gen_code_size_ptr
 void optimize_flags_init(void)
 {
 }
+
+void tcg_prologue_init(TCGContext *ctx)
+{
+}
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Don't launch guest if -no-kvm when tcg is not configured in

2010-09-12 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 vl.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 2a32cc5..22a3616 100644
--- a/vl.c
+++ b/vl.c
@@ -2473,6 +2473,10 @@ int main(int argc, char **argv, char **envp)
 break;
case QEMU_OPTION_no_kvm:
kvm_allowed = 0;
+#ifdef CONFIG_NO_CPU_EMULATION
+fprintf(stderr, cpu emulation not configured\n);
+exit(1);
+#endif
break;
 #ifdef CONFIG_KVM
case QEMU_OPTION_no_kvm_irqchip: {
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Avi Kivity

 On 09/12/2010 09:16 AM, Alexander Graf wrote:


Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
Phenoms don't have it. For the SVM features it is not that important
what the host hardware supports but what KVM can emulate. VMCBCLEAN can
be emulated without supporting it in the host for example.

Well, let's have a phenom2 type for those new features (and any other features 
the phenom 2 has).  What's the point of using the name of existing hardware if 
it doesn't match that hardware?

Isn't that what cpu=host is there for? I don't want to see cpu type cluttering 
like we have it on ppc. I added the core2duo type for Mac OS X guests for which 
those are basically the oldest supported CPUs.


-cpu host is to all supported features into your guest.
-cpu phenom is to pretend you are running on a phenom cpu.  This is 
useful for a migration farm for which the greatest common denominator is 
a phenom.


Those are separate use cases.


For the Phenom type, I honestly don't remember why, but there was also a good 
reason to add it. In fact, I use it today to have nested virt without -cpu host 
on hardware that's too new for my guests.


Curious, what guests balk at modern hardware but are fine with phenom?


Either way, I don't think we need a phenom2 type. The features additional are 
minor enough to not really matter and all use cases I can come up with require 
either -cpu host (local virt) or -cpu phenom (migration).


I'm fine with this (or with adding phenom2).  But don't make phenom 
contain flags that real phenoms don't have.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/29] KVM: MMU: Track NX state in struct kvm_mmu

2010-09-12 Thread Avi Kivity

 On 09/10/2010 06:31 PM, Joerg Roedel wrote:

With Nested Paging emulation the NX state between the two
MMU contexts may differ. To make sure that always the right
fault error code is recorded this patch moves the NX state
into struct kvm_mmu so that the code can distinguish between
L1 and L2 NX state.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_mmu {
u64 *lm_root;
u64 rsvd_bits_mask[2][4];

+   bool nx;
+
u64 pdptrs[4]; /* pae */
  };


This is redundant with mmu-base_role.nx.  Please post a follow on patch 
to use that instead.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/29] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa

2010-09-12 Thread Avi Kivity

 On 09/10/2010 06:30 PM, Joerg Roedel wrote:

This patch implements logic to make sure that either a
page-fault/page-fault-vmexit or a nested-page-fault-vmexit
is propagated back to the guest.

@@ -338,6 +338,22 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu)
kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
  }

+void kvm_propagate_fault(struct kvm_vcpu *vcpu)
+{
+   u32 nested, error;
+
+   error   = vcpu-arch.fault.error_code;
+   nested  = error   PFERR_NESTED_MASK;
+   error   = error  ~PFERR_NESTED_MASK;
+
+   vcpu-arch.fault.error_code = error;
+
+   if (mmu_is_nested(vcpu)  !nested)
+   vcpu-arch.nested_mmu.inject_page_fault(vcpu);
+   else
+   vcpu-arch.mmu.inject_page_fault(vcpu);
+}
+


Tacking a non-architectural bit on top of the error code is not very 
nice.  Can you move it to a separate vcpu-arch.fault.nested?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/29] Nested Paging Virtualization for KVM v4

2010-09-12 Thread Avi Kivity

 On 09/10/2010 06:30 PM, Joerg Roedel wrote:

Hi Avi, Marcelo,

as promised here is -v4 of my patch-set for virtualizing nested paging in KVM.
I addresses all of your review comments in this version and fixed a misbehavior
in the nested page-walker code where it would have reported the wrong
error-code on an emulated nested page fault.

Here is the complete list of changes:

* Fixed the bug in the gpa_to_gfn function
* Made sure that the right fault values are kept in the two-dimensional
  page-table walker
* Fixed the return code of x86_decode_insn so that a page-fault within
  that function can be handled
* Set vcpu-arch.mmu.direct_map to true for nonpaging mode too
* Changed code so that the right access-mode is used on nested
  page-table walks
* Made the NX mode a capability of the MMU context to distinguish
  between l1-nx and l2-nx (1 additional patch)
* Fixed the bug that KVM always reports the SVM flag as supported to
  userspace, it should only be reported on AMD hardware when nesting is
  enabled

As the patch-set before this one was tested with the same set of combinations
too. I found no regressions to current avi/master. This patch-set applies on
current avi/master plus the three fixes I sent last week.

As with the last version of the patch-set I also pushed this one to a tree on
korg. Find it in

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git 
npt-virt-v4

Please review these patches and/or apply them :-) As usual I appreciate your
feedback.


All applied, thanks.  I'll go play with my new toy now.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-09-12 Thread Avi Kivity

 On 09/12/2010 02:42 AM, Alexander Graf wrote:

On 24.08.2010, at 15:48, Alexander Graf wrote:


The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can on the fly add new devices
in the guest.

Keep in mind that this needs a patch for qemu to actually leverage the
functionality.

Signed-off-by: Alexander Grafag...@suse.de

ping (on the patch set)?



Actually Marcelo applied it.  But the natural place for it is Rusty's 
virtio tree.  Rusty, if you want to take it, let me know and I'll drop 
it from kvm.git.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] svm: Add test for selective cr0 intercept

2010-09-12 Thread Avi Kivity

 On 09/10/2010 06:34 PM, Joerg Roedel wrote:

This patch adds a test to check if the selective cr0
intercept emulation of the kvm svm emulation works.

Signed-off-by: Joerg Roedeljoerg.roe...@amd.com
---
  x86/svm.c |   37 -
  1 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/x86/svm.c b/x86/svm.c
index 2f1c900..e65360e 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -357,6 +357,40 @@ static bool check_asid_zero(struct test *test)
  return test-vmcb-control.exit_code == SVM_EXIT_ERR;
  }

+static void sel_cr0_prepare(struct test *test)
+{
+vmcb_ident(test-vmcb);
+test-vmcb-control.intercept |= (1ULL  INTERCEPT_SELECTIVE_CR0);
+}
+
+static bool sel_cr0_finished(struct test *test)
+{
+   return true;
+}


Coding style - kvm-unit-tests uses the qemu style.  Please drop the tabs 
from new code.



+
+static void sel_cr0_test(struct test *test)
+{
+   unsigned long cr0;
+
+   /* read cr0, clear CD, and write back */


set CD.  Better to ^= it to be sure to trigger.


+   cr0  = read_cr0();
+   cr0 |= (1UL  30);
+   write_cr0(cr0);


How about a test that ^= TS to see that we don't intercept unnecessarily?


+
+   /*
+* If we are here the test failed, not sure what to do now because we
+* are not in guest-mode anymore so we can't trigger an intercept.
+* Trigger a tripple-fault for now.
+*/
+   printf(sel_cr0 test failed. Can not recover from this - exiting\n);
+   exit(1);


Don't understand - we're still in guest mode (only running very 
slowly...).  All you have to do is fall off the end here, and you'll 
exit with VMMCALL.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] svm: Run tests with NPT enabled if available

2010-09-12 Thread Avi Kivity

 On 09/10/2010 06:34 PM, Joerg Roedel wrote:

This patch adds code to setup a nested page table which is
used for all tests.




+
+printf(NPT detected - running all tests with NPT enabled\n);
+
+/*
+ * Nested paging supported - Build a nested page table
+ * Build the page-table bottom-up and map everything with 2M pages
+ */
+
+address = 0;
+
+/* PTE level */


Conflicts with previous comment - these aren't 2M pages.


+for (i = 0; i  2048; ++i) {
+page = alloc_page();
+
+for (j = 0; j  512; ++j, address += 4096)
+page[j] = address | 0x067ULL;


{ }


  static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
@@ -56,6 +110,11 @@ static void vmcb_ident(struct vmcb *vmcb)
  save-g_pat = rdmsr(MSR_IA32_CR_PAT);
  save-dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
  ctrl-intercept = (1ULL  INTERCEPT_VMRUN) | (1ULL  INTERCEPT_VMMCALL);
+
+if (npt_supported()) {
+ctrl-nested_ctl = 1;
+ctrl-nested_cr3 = (u64)pml4e;
+}
  }

  struct test {



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: high load with usb device

2010-09-12 Thread Avi Kivity

 On 09/10/2010 07:17 PM, Michael Tokarev wrote:

Note the changed subject line.

I just did a few tests with linux guest (amd64 2.6.35 kernel).

And it shows the same behavour as win7 (unlike winXP), namely,
high host CPU load when guest is idle.


Not for me - F12 idle guest takes 3.5% cpu.

What does 'top' in the guest show now?  Connect with ssh to avoid 
triggering the GUI.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] svm: Add test for selective cr0 intercept

2010-09-12 Thread Joerg Roedel
On Sun, Sep 12, 2010 at 11:06:34AM +0200, Avi Kivity wrote:
  On 09/10/2010 06:34 PM, Joerg Roedel wrote:
 +/*
 + * If we are here the test failed, not sure what to do now because we
 + * are not in guest-mode anymore so we can't trigger an intercept.
 + * Trigger a tripple-fault for now.
 + */
 +printf(sel_cr0 test failed. Can not recover from this - exiting\n);
 +exit(1);

 Don't understand - we're still in guest mode (only running very  
 slowly...).  All you have to do is fall off the end here, and you'll  
 exit with VMMCALL.

The bug I fixed was, that the guest continues to run in l1 mode with the
l2 rip, rsp, and rax. So if the bug is there, it continues to run in
this function, but with no chance to intercept anymore because the guest
is not longer in emulated guest mode.

I agree that this is a test specific to that bug. I should probably add
another test to check if the correct intercepts are reported.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Alexander Graf

Am 12.09.2010 um 10:01 schrieb Avi Kivity a...@redhat.com:

 On 09/12/2010 09:16 AM, Alexander Graf wrote:
 
 Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
 Phenoms don't have it. For the SVM features it is not that important
 what the host hardware supports but what KVM can emulate. VMCBCLEAN can
 be emulated without supporting it in the host for example.
 Well, let's have a phenom2 type for those new features (and any other 
 features the phenom 2 has).  What's the point of using the name of existing 
 hardware if it doesn't match that hardware?
 Isn't that what cpu=host is there for? I don't want to see cpu type 
 cluttering like we have it on ppc. I added the core2duo type for Mac OS X 
 guests for which those are basically the oldest supported CPUs.
 
 -cpu host is to all supported features into your guest.
 -cpu phenom is to pretend you are running on a phenom cpu.  This is useful 
 for a migration farm for which the greatest common denominator is a phenom.
 
 Those are separate use cases.

Exactly. And the benefit from phenom - phenom2 is so minor that it doesn't 
make sense.

 
 For the Phenom type, I honestly don't remember why, but there was also a 
 good reason to add it. In fact, I use it today to have nested virt without 
 -cpu host on hardware that's too new for my guests.
 
 Curious, what guests balk at modern hardware but are fine with phenom?

Sles11 GA ;).

 
 Either way, I don't think we need a phenom2 type. The features additional 
 are minor enough to not really matter and all use cases I can come up with 
 require either -cpu host (local virt) or -cpu phenom (migration).
 
 I'm fine with this (or with adding phenom2).  But don't make phenom contain 
 flags that real phenoms don't have.

Those were my words :).


Alex
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Avi Kivity

 On 09/12/2010 12:06 PM, Alexander Graf wrote:

For the Phenom type, I honestly don't remember why, but there was also a good 
reason to add it. In fact, I use it today to have nested virt without -cpu host 
on hardware that's too new for my guests.

Curious, what guests balk at modern hardware but are fine with phenom?

Sles11 GA ;).


Still curious, how does -cpu host break it?


Either way, I don't think we need a phenom2 type. The features additional are 
minor enough to not really matter and all use cases I can come up with require 
either -cpu host (local virt) or -cpu phenom (migration).

I'm fine with this (or with adding phenom2).  But don't make phenom contain 
flags that real phenoms don't have.

Those were my words :).


Then we are in agreement.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Alexander Graf

Am 12.09.2010 um 12:22 schrieb Avi Kivity a...@redhat.com:

 On 09/12/2010 12:06 PM, Alexander Graf wrote:
 For the Phenom type, I honestly don't remember why, but there was also a 
 good reason to add it. In fact, I use it today to have nested virt without 
 -cpu host on hardware that's too new for my guests.
 Curious, what guests balk at modern hardware but are fine with phenom?
 Sles11 GA ;).
 
 Still curious, how does -cpu host break it?

Uh, something in the power management module. Just give it a try on an Istambul 
system. Just use the 11.1 iso - the kernels are close enough.

Alex

 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-12 Thread Michael S. Tsirkin
On Thu, Sep 09, 2010 at 03:15:53PM +0530, Krishna Kumar2 wrote:
  Krishna Kumar2/India/IBM wrote on 09/08/2010 10:17:49 PM:
 
 Some more results and likely cause for single netperf
 degradation below.
 
 
  Guest - Host (single netperf):
  I am getting a drop of almost 20%. I am trying to figure out
  why.
 
  Host - guest (single netperf):
  I am getting an improvement of almost 15%. Again - unexpected.
 
  Guest - Host TCP_RR: I get an average 7.4% increase in #packets
  for runs upto 128 sessions. With fewer netperf (under 8), there
  was a drop of 3-7% in #packets, but beyond that, the #packets
  improved significantly to give an average improvement of 7.4%.
 
  So it seems that fewer sessions is having negative effect for
  some reason on the tx side. The code path in virtio-net has not
  changed much, so the drop in some cases is quite unexpected.
 
 The drop for the single netperf seems to be due to multiple vhost.
 I changed the patch to start *single* vhost:
 
 Guest - Host (1 netperf, 64K): BW: 10.79%, SD: -1.45%
 Guest - Host (1 netperf) : Latency: -3%, SD: 3.5%
 
 Single vhost performs well but hits the barrier at 16 netperf
 sessions:
 
 SINGLE vhost (Guest - Host):
   1 netperf:BW: 10.7% SD: -1.4%
   4 netperfs:   BW: 3%SD: 1.4%
   8 netperfs:   BW: 17.7% SD: -10%
   16 netperfs:  BW: 4.7%  SD: -7.0%
   32 netperfs:  BW: -6.1% SD: -5.7%
 BW and SD both improves (guest multiple txqs help). For 32
 netperfs, SD improves.
 
 But with multiple vhosts, guest is able to send more packets
 and BW increases much more (SD too increases, but I think
 that is expected).

Why is this expected?

 From the earlier results:
 
 N#  BW1 BW2(%)  SD1 SD2(%)  RSD1RSD2(%)
 ___
 4   26387   40716 (54.30)   20  28   (40.00)86  85
 (-1.16)
 8   24356   41843 (71.79)   88  129  (46.59)372 362
 (-2.68)
 16  23587   40546 (71.89)   375 564  (50.40)15581519
 (-2.50)
 32  22927   39490 (72.24)   16172171 (34.26)66945722
 (-14.52)
 48  23067   39238 (70.10)   39315170 (31.51)15823   13552
 (-14.35)
 64  22927   38750 (69.01)   71429914 (38.81)28972   26173
 (-9.66)
 96  22568   38520 (70.68)   16258   27844 (71.26)   65944   73031
 (10.74)
 ___
 (All tests were done without any tuning)
 
 From my testing:
 
 1. Single vhost improves mq guest performance upto 16
netperfs but degrades after that.
 2. Multiple vhost degrades single netperf guest
performance, but significantly improves performance
for any number of netperf sessions.
 
 Likely cause for the 1 stream degradation with multiple
 vhost patch:
 
 1. Two vhosts run handling the RX and TX respectively.
I think the issue is related to cache ping-pong esp
since these run on different cpus/sockets.

Right. With TCP I think we are better off handling
TX and RX for a socket by the same vhost, so that
packet and its ack are handled by the same thread.
Is this what happens with RX multiqueue patch?
How do we select an RX queue to put the packet on?


 2. I (re-)modified the patch to share RX with TX[0]. The
performance drop is the same, but the reason is the
guest is not using txq[0] in most cases (dev_pick_tx),
so vhost's rx and tx are running on different threads.
But whenever the guest uses txq[0], only one vhost
runs and the performance is similar to original.
 
 I went back to my *submitted* patch and started a guest
 with numtxq=16 and pinned every vhost to cpus #01. Now
 whether guest used txq[0] or txq[n], the performance is
 similar or better (between 10-27% across 10 runs) than
 original code. Also, -6% to -24% improvement in SD.
 
 I will start a full test run of original vs submitted
 code with minimal tuning (Avi also suggested the same),
 and re-send. Please let me know if you need any other
 data.
 
 Thanks,
 
 - KK
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Avi Kivity

 On 09/12/2010 01:14 PM, Alexander Graf wrote:

Uh, something in the power management module. Just give it a try on an Istambul 
system. Just use the 11.1 iso - the kernels are close enough.


Oh, probably some silly msr.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-12 Thread Michael S. Tsirkin
On Fri, Sep 10, 2010 at 03:37:36PM +0200, Dhaval Giani wrote:
 Hi,
 
 I have been trying to get vhost+macvtap to work for me. I run it as
 
 /root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
 stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
 nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
 tap,id=macvtap0,vhost=on,fd=3 3 /dev/tap5
 
 in 2.6.35, which worked just fine. On the other hand, with 2.6.36, i
 don't have working networking. I am using the same image and same
 macaddress. The qemu is the version from
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .

BTW, by now, all these patches are merged so upstream qemu-kvm should work
just fine for you as well.

 Any suggestions will be welcome!
 
 Thanks,
 Dhaval

You are running this as non-root user, correct?
This could be the permission issue that got fixed
by 87d6a412bd1ed82c14cabd4b408003b23bbd2880.
Could you please check the latest master from Linus,
and let me and the list know? Thanks!

Another thing to try if this does *not* help:

enable CONFIG_DYNAMIC_DEBUG in kernel,
rebuild the kernel,
mount debugfs:

mount -t debugfs none /sys/kernel/debug

and then enable debug for vhost_net as described in
Documentation/dynamic-debug-howto.txt:

echo 'module vhost_net +p'  /sys/kernel/debug/dynamic_debug/control

Then start qemu, and after running a test, run dmesg and see if there
are any messages from vhost_net. If yes please send them to
me and to the list.

Thanks!

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Nadav Har'El
Hi,

Continuing to work on the nested VMX patches,

On Mon, Jun 14, 2010, Avi Kivity wrote about Re: [PATCH 18/24] Exiting from L2 
to L1:
 On 06/13/2010 03:31 PM, Nadav Har'El wrote:
...
 +/* prepare_vmcs_12 is called when the nested L2 guest exits and we want to
 + * prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12), and this
 + * function updates it to reflect the state of the registers during the 
 exit,
...
 +vmcs12-tsc_offset = vmcs_read64(TSC_OFFSET);

 
 TSC_OFFSET cannot have changed.

Right. I cleaned up this function now, to only copy the fields that could
have changed, namely fields listed as guest-state or exit-information fields
in the spec. Control fields like this TSC_OFFSET and more examples you found
below, indeed could not have changed while L2 was running or during the exit.

 +vmcs12-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);

 Without msr bitmaps, cannot change.

I added a TODO before this (and a couple of others) for future optimization.
I'm not even convinced how much quicker it is to check the MSR bitmap before
doing vmcs_read64, vs just to going ahead and vmreading it in any case.

 +vmcs12-vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);

 Can this change?

Well, according to the spec, (SDM vol 3B), VMCS link pointer is a guest-state
field, but it is listed as being for future expansion. I guess that with
current hardware, it cannot change, but for future hardware it might. I'm
not sure if it's wiser to ignore this field for now (and shave a bit off
the l2-l1 switch time), or just copy it anyway, as I do now.
What would you prefer?

 +if (vmcs_config.vmentry_ctrl  VM_ENTRY_LOAD_IA32_PAT)
 +vmcs12-guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);

 
 Should check for VM_EXIT_SAVE_IA32_PAT, no?

You're absolutely right. Fixed.

 +vmcs12-vm_entry_intr_info_field =
 +vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);

 
 Autocleared, no need to read.

Well, we need to clear the valid bit on exit, so we don't mistakenly inject
the same interrupt twice. There were two ways to do it: 1. clear it ourselves,
or 2. copy the value from vmcs02 where the processor already cleared it.
There are pros and cons for each approach, but I'll change like you suggest,
to clear it ourselves:

vmcs12-vm_entry_intr_info_field = ~INTR_INFO_VALID_MASK;

 +vmcs12-vm_instruction_error = vmcs_read32(VM_INSTRUCTION_ERROR);

 We don't want to pass this to the guest?

I didn't quite understand your question, but now that I look at it, I see
that VM_INSTRUCTION_ERROR has nothing to do with exits, but is only modified
when running VMX instructions, and our emulation of VMX instructions already
sets it appropriately, so no sense of copying it here.

 
 +vmcs12-vm_exit_reason  = vmcs_read32(VM_EXIT_REASON);
 +vmcs12-vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 +vmcs12-vm_exit_intr_error_code = 
 vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
 +vmcs12-idt_vectoring_info_field =
 +vmcs_read32(IDT_VECTORING_INFO_FIELD);
 +vmcs12-idt_vectoring_error_code =
 +vmcs_read32(IDT_VECTORING_ERROR_CODE);
 +vmcs12-vm_exit_instruction_len = 
 vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 +vmcs12-vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);

 
 For the above, if the host handles the exit, we must not clobber guest 
 fields.  A subsequent guest vmread will see the changed values even 
 though from its point of view a vmexit has not occurred.
 
 But no, that can't happen, since a vmread needs to have a vmexit first 
 to happen.  Still, best to delay this.+

All this code is in prepare_vmcs_12, which only gets called if we exit from
L2 to L1 - it doesn't get called when we exit from L2 to L0 (when the host
handles the exit).

As long as a certain field gets written to on *every* exit, and not just
on some of them, I believe we can safely copy the current values in vmcs02
to vmcs12, knowing these are current values from the current exit, and not
some old values we shouldn't be copying.

You may have a point (if that was your point?) that some fields are not
always written to - e.g., for most exits vm_exit_intr_info doesn't get
written to and just one valid bit is cleared. As the code is now, we copy
vmcs02's field, which might have been written earlier (e.g., during
an exit to L0) and not now, and an observant L1 might notice this value
it should not have seen. However, I don't see any problem with that, because
the valid bit would be correctly turned off, and the spec says that all
other bits are undefined (for irrelevant exits) and copying-old-values is one
legal setting for undefined bits...

 +/* If any of the CRO_GUEST_HOST_MASK bits are off, the L2 guest may
 + * have changed some cr0 bits without us ever saving them in the 
 shadow
 + * vmcs. So we need to save these changes now.
...
 +
 +vmcs12-guest_cr4 = vmcs_readl(GUEST_CR4);
 
 Can't we have the same issue with 

Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Avi Kivity

 On 09/12/2010 04:05 PM, Nadav Har'El wrote:

Hi,

Continuing to work on the nested VMX patches,


Great.  Hopefully you can commit some time to it or you'll spend a lot 
of cycles just catching up.


Joerg just merged nested npt; as far as I can tell it is 100% in line 
with nested ept, but please take a look as well.





+   vmcs12-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);


Without msr bitmaps, cannot change.

I added a TODO before this (and a couple of others) for future optimization.
I'm not even convinced how much quicker it is to check the MSR bitmap before
doing vmcs_read64, vs just to going ahead and vmreading it in any case.


IIRC we don't support msr bitmaps now, so no need to check anything.

In general, avoid vmcs reads as much as possible.  Just think of your 
code running in a guest!



+vmcs12-vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);

Can this change?

Well, according to the spec, (SDM vol 3B), VMCS link pointer is a guest-state
field, but it is listed as being for future expansion. I guess that with
current hardware, it cannot change, but for future hardware it might. I'm
not sure if it's wiser to ignore this field for now (and shave a bit off
the l2-l1 switch time), or just copy it anyway, as I do now.
What would you prefer?


If it changes in the future, it can only be under the influence of some 
control or at least guest-discoverable capability.  Since we don't 
expose such control or capability, there's no need to copy it.



+   vmcs12-vm_entry_intr_info_field =
+   vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);


Autocleared, no need to read.

Well, we need to clear the valid bit on exit, so we don't mistakenly inject
the same interrupt twice.


I don't think so.  We write this field as part of guest entry (that is, 
after the decision re which vmcs to use, yes?), so guest entry will 
always follow writing this field.  Since guest entry clears the field, 
reading it after an exit will necessarily return 0.


What can happen is that the contents of the field is transferred to the 
IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field.


(question: on a failed vmentry, is this field cleared?)


There were two ways to do it: 1. clear it ourselves,
or 2. copy the value from vmcs02 where the processor already cleared it.
There are pros and cons for each approach, but I'll change like you suggest,
to clear it ourselves:

vmcs12-vm_entry_intr_info_field= ~INTR_INFO_VALID_MASK;


That's really a temporary variable, I don't think you need to touch it.


+   vmcs12-vm_exit_reason  = vmcs_read32(VM_EXIT_REASON);
+   vmcs12-vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+   vmcs12-vm_exit_intr_error_code =
vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
+   vmcs12-idt_vectoring_info_field =
+   vmcs_read32(IDT_VECTORING_INFO_FIELD);
+   vmcs12-idt_vectoring_error_code =
+   vmcs_read32(IDT_VECTORING_ERROR_CODE);
+   vmcs12-vm_exit_instruction_len =
vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   vmcs12-vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);


For the above, if the host handles the exit, we must not clobber guest
fields.  A subsequent guest vmread will see the changed values even
though from its point of view a vmexit has not occurred.

But no, that can't happen, since a vmread needs to have a vmexit first
to happen.  Still, best to delay this.+

All this code is in prepare_vmcs_12, which only gets called if we exit from
L2 to L1 - it doesn't get called when we exit from L2 to L0 (when the host
handles the exit).


Ok.  That answers my concern.


As long as a certain field gets written to on *every* exit, and not just
on some of them, I believe we can safely copy the current values in vmcs02
to vmcs12, knowing these are current values from the current exit, and not
some old values we shouldn't be copying.

You may have a point (if that was your point?) that some fields are not
always written to - e.g., for most exits vm_exit_intr_info doesn't get
written to and just one valid bit is cleared. As the code is now, we copy
vmcs02's field, which might have been written earlier (e.g., during
an exit to L0) and not now, and an observant L1 might notice this value
it should not have seen. However, I don't see any problem with that, because
the valid bit would be correctly turned off, and the spec says that all
other bits are undefined (for irrelevant exits) and copying-old-values is one
legal setting for undefined bits...


No, I wasn't worried about that, simply misunderstood the code.


+   /* If any of the CRO_GUEST_HOST_MASK bits are off, the L2 guest may
+* have changed some cr0 bits without us ever saving them in the
shadow
+* vmcs. So we need to save these changes now.

...

+
+   vmcs12-guest_cr4 = vmcs_readl(GUEST_CR4);

Can't we have the same issue with cr4?

I guess we can. I didn't think this (giving guest full control over cr4 bits)
was happening in KVM, but 

Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Joerg Roedel
On Sun, Sep 12, 2010 at 10:01:28AM +0200, Avi Kivity wrote:
  On 09/12/2010 09:16 AM, Alexander Graf wrote:

 Either way, I don't think we need a phenom2 type. The features
 additional are minor enough to not really matter and all use cases I
 can come up with require either -cpu host (local virt) or -cpu phenom
 (migration).

 I'm fine with this (or with adding phenom2).  But don't make phenom  
 contain flags that real phenoms don't have.

How about features that are not supported by the hardware but can be
supported in emulation? The VMCBCLEAN feature is one of those which
makes a lot of sense to reduce the emulated world-switch times. I guess
its ok to enable those with -cpu host?

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features

2010-09-12 Thread Avi Kivity

 On 09/12/2010 04:30 PM, Joerg Roedel wrote:



Either way, I don't think we need a phenom2 type. The features
additional are minor enough to not really matter and all use cases I
can come up with require either -cpu host (local virt) or -cpu phenom
(migration).

I'm fine with this (or with adding phenom2).  But don't make phenom
contain flags that real phenoms don't have.

How about features that are not supported by the hardware but can be
supported in emulation? The VMCBCLEAN feature is one of those which
makes a lot of sense to reduce the emulated world-switch times. I guess
its ok to enable those with -cpu host?




It's a good question.  We do that with x2apic, so yes.

Userspace can do four things with KVM_GET_SUPPORTED_CPUID:
- feed it right back to KVM_SET_CPUID2, getting maximum features and 
hopefully performance
- use it to mask the real cpuid which it fetches directly, getting 
something as similar as possible to the host
- use it to mask a predefined cpu type, getting something as similar as 
possible to that cpu type
- use it to verify that a predefined cpu type is supported, getting 
somthing exactly the same as that cpu type


-cpu host is the first option.  If the need comes for the second option, 
we can provide it.


I'll note that KVM_GET_SUPPORTED_CPUID can return values not present in 
the host cpuid in the documentation.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-12 Thread Dhaval Giani
On Sun, Sep 12, 2010 at 2:05 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Fri, Sep 10, 2010 at 03:37:36PM +0200, Dhaval Giani wrote:
 Hi,

 I have been trying to get vhost+macvtap to work for me. I run it as

 /root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
 stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
 nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
 tap,id=macvtap0,vhost=on,fd=3 3 /dev/tap5

 in 2.6.35, which worked just fine. On the other hand, with 2.6.36, i
 don't have working networking. I am using the same image and same
 macaddress. The qemu is the version from
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .

 BTW, by now, all these patches are merged so upstream qemu-kvm should work
 just fine for you as well.

 Any suggestions will be welcome!

 Thanks,
 Dhaval

 You are running this as non-root user, correct?

nope as root.

 This could be the permission issue that got fixed
 by 87d6a412bd1ed82c14cabd4b408003b23bbd2880.
 Could you please check the latest master from Linus,
 and let me and the list know? Thanks!


this is with git of friday evening CEST.

 Another thing to try if this does *not* help:

 enable CONFIG_DYNAMIC_DEBUG in kernel,
 rebuild the kernel,
 mount debugfs:

        mount -t debugfs none /sys/kernel/debug

 and then enable debug for vhost_net as described in
 Documentation/dynamic-debug-howto.txt:


I will give this a run on monday morning when i am at the lab again.

        echo 'module vhost_net +p'  /sys/kernel/debug/dynamic_debug/control

 Then start qemu, and after running a test, run dmesg and see if there
 are any messages from vhost_net. If yes please send them to
 me and to the list.

 Thanks!



thanks!
Dhaval
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] svm: Run tests with NPT enabled if available

2010-09-12 Thread Joerg Roedel
On Sun, Sep 12, 2010 at 11:10:50AM +0200, Avi Kivity wrote:
  On 09/10/2010 06:34 PM, Joerg Roedel wrote:
 This patch adds code to setup a nested page table which is
 used for all tests.


 +
 +printf(NPT detected - running all tests with NPT enabled\n);
 +
 +/*
 + * Nested paging supported - Build a nested page table
 + * Build the page-table bottom-up and map everything with 2M pages
 + */
 +
 +address = 0;
 +
 +/* PTE level */

 Conflicts with previous comment - these aren't 2M pages.

Oh right, I used 2M pages in the first version but figured out then that
2MB are not granular enough to write all the NPT tests. Thus I changed
it to 4k pages and forgot to update the comment. I will resend this one.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Document that KVM_GET_SUPPORTED_CPUID may return emulated values

2010-09-12 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 Documentation/kvm/api.txt |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 24d6341..b336266 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1042,8 +1042,9 @@ number is just right, the 'nent' field is adjusted to the 
number of valid
 entries in the 'entries' array, which is then filled.
 
 The entries returned are the host cpuid as returned by the cpuid instruction,
-with unknown or unsupported features masked out.  The fields in each entry
-are defined as follows:
+with unknown or unsupported features masked out.  Some features (for example,
+x2apic), may not be present in the host cpu, but are exposed by kvm if it can
+emulate them efficiently. The fields in each entry are defined as follows:
 
   function: the eax value used to obtain the entry
   index: the ecx value used to obtain the entry (for entries that are
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] svm: Add test for selective cr0 intercept

2010-09-12 Thread Avi Kivity

 On 09/12/2010 11:36 AM, Joerg Roedel wrote:

On Sun, Sep 12, 2010 at 11:06:34AM +0200, Avi Kivity wrote:

  On 09/10/2010 06:34 PM, Joerg Roedel wrote:

+   /*
+* If we are here the test failed, not sure what to do now because we
+* are not in guest-mode anymore so we can't trigger an intercept.
+* Trigger a tripple-fault for now.
+*/
+   printf(sel_cr0 test failed. Can not recover from this - exiting\n);
+   exit(1);

Don't understand - we're still in guest mode (only running very
slowly...).  All you have to do is fall off the end here, and you'll
exit with VMMCALL.

The bug I fixed was, that the guest continues to run in l1 mode with the
l2 rip, rsp, and rax. So if the bug is there, it continues to run in
this function, but with no chance to intercept anymore because the guest
is not longer in emulated guest mode.


I see.  Of course it is right to test for specific regressions, not just 
random features.



I agree that this is a test specific to that bug. I should probably add
another test to check if the correct intercepts are reported.


The more, the merrier!

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-12 Thread Michael S. Tsirkin
On Sun, Sep 12, 2010 at 04:39:29PM +0200, Dhaval Giani wrote:
 On Sun, Sep 12, 2010 at 2:05 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Fri, Sep 10, 2010 at 03:37:36PM +0200, Dhaval Giani wrote:
  Hi,
 
  I have been trying to get vhost+macvtap to work for me. I run it as
 
  /root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
  stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
  nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
  tap,id=macvtap0,vhost=on,fd=3 3 /dev/tap5
 
  in 2.6.35, which worked just fine. On the other hand, with 2.6.36, i
  don't have working networking. I am using the same image and same
  macaddress. The qemu is the version from
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .
 
  BTW, by now, all these patches are merged so upstream qemu-kvm should work
  just fine for you as well.
 
  Any suggestions will be welcome!
 
  Thanks,
  Dhaval
 
  You are running this as non-root user, correct?
 
 nope as root.
 
  This could be the permission issue that got fixed
  by 87d6a412bd1ed82c14cabd4b408003b23bbd2880.
  Could you please check the latest master from Linus,
  and let me and the list know? Thanks!
 
 
 this is with git of friday evening CEST.
 
  Another thing to try if this does *not* help:
 
  enable CONFIG_DYNAMIC_DEBUG in kernel,
  rebuild the kernel,
  mount debugfs:
 
         mount -t debugfs none /sys/kernel/debug
 
  and then enable debug for vhost_net as described in
  Documentation/dynamic-debug-howto.txt:
 
 
 I will give this a run on monday morning when i am at the lab again.
 
         echo 'module vhost_net +p'  /sys/kernel/debug/dynamic_debug/control
 
  Then start qemu, and after running a test, run dmesg and see if there
  are any messages from vhost_net. If yes please send them to
  me and to the list.
 
  Thanks!
 
 
 
 thanks!
 Dhaval

Another thing to try check is generic net core issues.

For this, try running tcpdump on both tap in host
and on virtio net device in guest. Then
send packets to host from guest and back, and check whether
they appears on virtio and on tap.

Thanks!

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-12 Thread Anthony Liguori

On 09/12/2010 01:11 AM, Avi Kivity wrote:

 On 09/10/2010 10:48 PM, Anthony Liguori wrote:
I agree, is there any reason not to enable compiling less into the 
binary?

There are folks interested in eliminating as much as possible to reduce
the attack surface and auditing requirements, for example.


It's not a bad idea, it's just that what --disable-cpu-emulation does 
is evil.  Being that I wrote the implementation, I'm quite confident 
in declare it as such :-)




Oh, I thought you were against the idea in itself for some reason.

I'll patch it for 0.13, but any ideas on how it should be rework for 
master?


Glauber's old Accel interface was close to the right approach.  We need 
to abstract the exec.c interfaces to use a function pointer table and 
have a TCG and KVM implementation.  The function pointer tables can then 
be registered by a module_init() and we can simply not include the kvm 
or TCG files are build time to disable the functionality.


Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] qemu-kvm-0.13.0-rc1

2010-09-12 Thread Avi Kivity

 On 09/12/2010 05:31 PM, Anthony Liguori wrote:

On 09/12/2010 01:11 AM, Avi Kivity wrote:

 On 09/10/2010 10:48 PM, Anthony Liguori wrote:
I agree, is there any reason not to enable compiling less into the 
binary?
There are folks interested in eliminating as much as possible to 
reduce

the attack surface and auditing requirements, for example.


It's not a bad idea, it's just that what --disable-cpu-emulation 
does is evil.  Being that I wrote the implementation, I'm quite 
confident in declare it as such :-)




Oh, I thought you were against the idea in itself for some reason.

I'll patch it for 0.13, but any ideas on how it should be rework for 
master?


Glauber's old Accel interface was close to the right approach.  We 
need to abstract the exec.c interfaces to use a function pointer table 
and have a TCG and KVM implementation.  The function pointer tables 
can then be registered by a module_init() and we can simply not 
include the kvm or TCG files are build time to disable the functionality.


Yes, I remember it now.

Glauber, can you bring those patches back from the land of the dead?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Nadav Har'El
On Sun, Sep 12, 2010, Avi Kivity wrote about Re: [PATCH 18/24] Exiting from L2 
to L1:
 Great.  Hopefully you can commit some time to it or you'll spend a lot 
 of cycles just catching up.

Right. I will.

 Joerg just merged nested npt; as far as I can tell it is 100% in line 
 with nested ept, but please take a look as well.

Indeed. Making nested EPT work based on the nested NPT work is one of the
first things I plan to do after the basic nested VMX patches are accepted.
As you know, we already have a version of nested EPT working in our testing
code, but I'll need to tweak it a bit to use the common nested NPT code.

 +  vmcs12-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
 
 Without msr bitmaps, cannot change.
 I added a TODO before this (and a couple of others) for future 
 optimization.
 I'm not even convinced how much quicker it is to check the MSR bitmap 
 before
 doing vmcs_read64, vs just to going ahead and vmreading it in any case.
 
 IIRC we don't support msr bitmaps now, so no need to check anything.

I do think we support msr bitmaps... E.g., we have
nested_vmx_exit_handled_msr() to check whether L1 requires an exit for a
certain MSR access.  Where don't we support them? (but I'm not denying the
possiblity that this support still has holes or bugs).

 In general, avoid vmcs reads as much as possible.  Just think of your 
 code running in a guest!

Yes. On the other hand, I don't want to be sorry in the future when I want
to support some feature, but because I wanted to shave off 1% of the L2-L1
switching time, and 0.01% of the total run time (and I'm just making
numbers up...), I now need to find a dozen places where things need to change
to support this feature. On the other hand, this will likely happen anyway ;-)

 
 +vmcs12-vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
 Can this change?
..
 If it changes in the future, it can only be under the influence of some 
 control or at least guest-discoverable capability.  Since we don't 
 expose such control or capability, there's no need to copy it.

You convinced me. Removed it.

 +  vmcs12-vm_entry_intr_info_field =
 +  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
 
 Autocleared, no need to read.
 Well, we need to clear the valid bit on exit, so we don't mistakenly 
 inject
 the same interrupt twice.
 
 I don't think so.  We write this field as part of guest entry (that is, 
 after the decision re which vmcs to use, yes?), so guest entry will 
 always follow writing this field.  Since guest entry clears the field, 
 reading it after an exit will necessarily return 0.

Well, you obviously know the KVM code much better than I do, but from what
I saw, I thought (maybe I misunderstood) that in normal (non-nested) KVM,
this field only gets written on injection, not on every entry, so the code
relies on the fact that the processor turns off the valid bit during exit,
to avoid the same event being injected when the same field value is used for
another entry. I can only find code which resets this field in vmx_vcpu_reset(),
but that doesn't get called on every entry, right? Or am I missing something?

 What can happen is that the contents of the field is transferred to the 
 IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field.
 
 (question: on a failed vmentry, is this field cleared?)

I don't know the answer :-)

 There were two ways to do it: 1. clear it ourselves,
 or 2. copy the value from vmcs02 where the processor already cleared it.
 There are pros and cons for each approach, but I'll change like you 
 suggest,
 to clear it ourselves:
 
  vmcs12-vm_entry_intr_info_field= ~INTR_INFO_VALID_MASK;
 
 That's really a temporary variable, I don't think you need to touch it.

But we need to emulate the correct VMX behavior. According to the spec, the
valid bit on this field needs to be cleared on vmexit, so we need to do it
also on emulated exits from L2 to L1. If we're sure that we already cleared it
earlier, then fine, but if not (and like I said, I failed to find this code),
we need to do it now, on exit - either by explictly clearing the bit or by
copying a value where the processor cleared this bit (arguably, the former is
more correct emulation).

 I didn't mean register independent helper; one function for cr0 and one 
 function for cr4 so the reader can just see the name and pretend to 
 understand what it does, instead of seeing a bunch of incomprehensible 
 bitwise operations.

Ok, done:

/*
 * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
 * because L2 may have changed some cr0 bits directly (see CRO_GUEST_HOST_MASK)
 * without L0 trapping the change and updating vmcs12.
 * This function returns the value we should put in vmcs12.guest_cr0. It's not
 * enough to just return the current (vmcs02) GUEST_CR0. This may not be the
 * guest cr0 that L1 thought it was giving its L2 guest - it is possible that
 * L1 wished to allow its guest to set a cr0 bit directly, but we (L0) asked
 * to trap 

Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Avi Kivity

 On 09/12/2010 07:05 PM, Nadav Har'El wrote:



+   vmcs12-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);


Without msr bitmaps, cannot change.

I added a TODO before this (and a couple of others) for future
optimization.
I'm not even convinced how much quicker it is to check the MSR bitmap
before
doing vmcs_read64, vs just to going ahead and vmreading it in any case.

IIRC we don't support msr bitmaps now, so no need to check anything.

I do think we support msr bitmaps... E.g., we have
nested_vmx_exit_handled_msr() to check whether L1 requires an exit for a
certain MSR access.  Where don't we support them? (but I'm not denying the
possiblity that this support still has holes or bugs).


I was just talking from memory.  If you do support them, that's great.

Note that kvm itself doesn't support give the guest control of 
DEBUGCTLMSR, so you should just be able to read it from the shadow value 
(which strangely doesn't exist - I'll post a fix).



In general, avoid vmcs reads as much as possible.  Just think of your
code running in a guest!

Yes. On the other hand, I don't want to be sorry in the future when I want
to support some feature, but because I wanted to shave off 1% of the L2-L1
switching time, and 0.01% of the total run time (and I'm just making
numbers up...), I now need to find a dozen places where things need to change
to support this feature. On the other hand, this will likely happen anyway ;-)


Well, with msrs you have two cases: those which the guest controls and 
those which are shadowed.  So all we need is a systematic way for 
dealing with the two types.



+   vmcs12-vm_entry_intr_info_field =
+   vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);


Autocleared, no need to read.

Well, we need to clear the valid bit on exit, so we don't mistakenly
inject
the same interrupt twice.

I don't think so.  We write this field as part of guest entry (that is,
after the decision re which vmcs to use, yes?), so guest entry will
always follow writing this field.  Since guest entry clears the field,
reading it after an exit will necessarily return 0.

Well, you obviously know the KVM code much better than I do, but from what
I saw, I thought (maybe I misunderstood) that in normal (non-nested) KVM,
this field only gets written on injection, not on every entry, so the code
relies on the fact that the processor turns off the valid bit during exit,
to avoid the same event being injected when the same field value is used for
another entry.


Correct.


I can only find code which resets this field in vmx_vcpu_reset(),
but that doesn't get called on every entry, right? Or am I missing something?


prepare_vmcs12() is called in response for a 2-1 vmexit which is first 
trapped by 0, yes?  Because it's called immediately after a vmexit, 
VM_ENTRY_INTR_INFO_FIELD is guaranteed to have been cleared by the 
processor.


There are two cases where VM_ENTRY_INTR_INFO_FIELD can potentially not 
be cleared by hardware:


1. if we call prepare_vmcs12() between injection and entry.  This cannot 
happen AFAICT.
2. if the vmexit was really a failed 1-2 vmentry, and if the processor 
doesn't clear VM_ENTRY_INTR_INFO_FIELD in response to vm entry failures 
(need to check scripture)


If neither of these are valid, the code can be removed.  If only the 
second, we might make it conditional.



What can happen is that the contents of the field is transferred to the
IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field.

(question: on a failed vmentry, is this field cleared?)

I don't know the answer :-)


Sheng?


There were two ways to do it: 1. clear it ourselves,
or 2. copy the value from vmcs02 where the processor already cleared it.
There are pros and cons for each approach, but I'll change like you
suggest,
to clear it ourselves:

vmcs12-vm_entry_intr_info_field= ~INTR_INFO_VALID_MASK;

That's really a temporary variable, I don't think you need to touch it.

But we need to emulate the correct VMX behavior. According to the spec, the
valid bit on this field needs to be cleared on vmexit, so we need to do it
also on emulated exits from L2 to L1. If we're sure that we already cleared it
earlier, then fine, but if not (and like I said, I failed to find this code),
we need to do it now, on exit - either by explictly clearing the bit or by
copying a value where the processor cleared this bit (arguably, the former is
more correct emulation).


Sorry, I misread it as vmx-idt_vectoring_info which is a temporary 
variable used to cache IDT_VECTORING_INFO.  Ignore my remark.



I didn't mean register independent helper; one function for cr0 and one
function for cr4 so the reader can just see the name and pretend to
understand what it does, instead of seeing a bunch of incomprehensible
bitwise operations.

Ok, done:

/*
  * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
  * because L2 may have changed some cr0 bits directly (see CRO_GUEST_HOST_MASK)
  * without L0 trapping 

Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Nadav Har'El
On Sun, Sep 12, 2010, Avi Kivity wrote about Re: [PATCH 18/24] Exiting from L2 
to L1:
 +vmcs12-vm_entry_intr_info_field =
 +vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
...
 prepare_vmcs12() is called in response for a 2-1 vmexit which is first 
 trapped by 0, yes?  Because it's called immediately after a vmexit, 
 VM_ENTRY_INTR_INFO_FIELD is guaranteed to have been cleared by the 
 processor.

Indeed - it is cleared in the real vmcs, i.e., vmcs02, but we also need
to clear it in the emulated vmcs that L1 sees for L2, i.e., vmcs12.

The original code (quoted above) just copied the vmcs02 value to vmcs12,
which was (mostly) fine because the vmcs02 value has a correctly-cleared
bit - but you asked to avoid the vmread.

So the second option is to just explicitly remove the valid bit from
vmcs12-vm_entry_intro_info_field, which I do now.
vmcs12-vm_entry_intro_info_field was set by L1 before it entered L2, and now
that L2 is exiting back to L1, we need to clear the valid bit.

The more I think about it, the more I become convinced that the second
option is indeed better than the first option (the original code in the
patch).

 There are two cases where VM_ENTRY_INTR_INFO_FIELD can potentially not 
 be cleared by hardware:
...
 If neither of these are valid, the code can be removed.  If only the 
 second, we might make it conditional.

Again, unless I'm misunderstanding what you mean, the hardware only
modified vmcs02 (the hardware vmcs), not vmcs12. We need to modify vmcs12
as well, to remove the valid bit. If we don't, when L1 enters into the same
L2 again, the same old value will be copied again from vmcs12 to vmcs02,
and cause an injection of the same interrupt again.

And by the way, I haven't said this enough, but thanks for your continued
reviews and all your very useful corrections for these patches!

Nadav.

-- 
Nadav Har'El|   Sunday, Sep 12 2010, 5 Tishri 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |I before E except after C. We live in a
http://nadav.harel.org.il   |weird society!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM-test: Add a new macaddress pool algorithm v3

2010-09-12 Thread Lucas Meneghel Rodrigues
From: Amos Kong ak...@redhat.com

Old method uses the addresses in the config files which could lead serious
problem when multiple tests running in different hosts.
This patch adds a new macaddress pool algorithm, it generates the mac prefix
based on mac address of the host, and fix it to correspond to IEEE802.
When user have set the mac_prefix in the config file, we should use it instead
of the dynamic generated mac prefix.
Add a parameter like 'preserve_mac', to preserve the original mac address, for
things like migration.

MAC addresses are recorded into a dictionary 'address_pool' in following
format: {{'20100310-165222-Wt7l:0' : 'AE:9D:94:6A:9b:f9'},...}
  20100310-165222-Wt7l : instance attribute of VM
  0: index of NIC
  AE:9D:94:6A:9b:f9: mac address
Use 'vm instance' + 'nic index' as the key, macaddress is the value.

Changes from v2:
- Instead of basing ourselves in a physical interface address to get
an address prefix, just generate one with the prefix 0x9a (convention)
randomly and add it to the address pool. If there's already one
prefix, keep it there and just return it.
- Made messages more consistent and informative
- Made function names consistent all across the board
- Fixed some single line spacing between functions

Changs from v1:
- Use 'vm instance' + 'nic index' as the key of address_pool, address is value.
- Put 'mac_lock' and 'address_pool' to '/tmp', for sharing them to other
  autotest instances running on the same host.
- Change function names for less confusion.
- Do not copy 'vm.instance' in vm.clone()
- Split 'adding get_ifname function' to another patch

Signed-off-by: Jason Wang jasow...@redhat.com
Signed-off-by: Feng Yang fy...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/kvm_utils.py  |  114 
 client/tests/kvm/kvm_vm.py |   83 ++--
 client/tests/kvm/tests_base.cfg.sample |2 +-
 3 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index fb2d1c2..bb5c868 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -5,6 +5,7 @@ KVM test utility functions.
 
 
 import time, string, random, socket, os, signal, re, logging, commands, cPickle
+import fcntl, shelve
 from autotest_lib.client.bin import utils
 from autotest_lib.client.common_lib import error, logging_config
 import kvm_subprocess
@@ -82,6 +83,119 @@ def get_sub_dict_names(dict, keyword):
 
 # Functions related to MAC/IP addresses
 
+def _generate_mac_address_prefix():
+
+Generate a MAC address prefix. By convention we will set KVM autotest
+MAC addresses to start with 0x9a.
+
+l = [0x9a, random.randint(0x00, 0x7f), random.randint(0x00, 0x7f),
+ random.randint(0x00, 0xff)]
+prefix = ':'.join(map(lambda x: %02x % x, l)) + :
+return prefix
+
+
+def generate_mac_address_prefix():
+
+Generate a random MAC address prefix and add it to the MAC pool dictionary.
+If there's a MAC prefix there already, do not update the MAC pool and just
+return what's in there.
+
+lock_file = open(/tmp/mac_lock, 'w')
+fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+mac_pool = shelve.open(/tmp/address_pool, writeback=False)
+
+if mac_pool.get('prefix'):
+prefix = mac_pool.get('prefix')
+logging.debug('Retrieved previously generated MAC prefix for this '
+  'host: %s', prefix)
+else:
+prefix = _generate_mac_address_prefix()
+mac_pool['prefix'] = prefix
+logging.debug('Generated MAC address prefix for this host: %s', prefix)
+
+mac_pool.close()
+fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
+lock_file.close()
+
+return prefix
+
+
+def generate_mac_address(root_dir, instance_vm, nic_index, prefix=None):
+
+Random generate a MAC address and add it to the MAC pool.
+
+Try to generate macaddress based on the mac address prefix, add it to a
+dictionary 'address_pool'.
+key = VM instance + nic index, value = mac address
+{['20100310-165222-Wt7l:0'] : 'AE:9D:94:6A:9b:f9'}
+
+@param root_dir: Root dir for kvm.
+@param instance_vm: Here we use instance of vm.
+@param nic_index: The index of nic.
+@param prefix: Prefix of MAC address.
+@return: MAC address string.
+
+if prefix is None:
+prefix = generate_mac_address_prefix()
+
+lock_file = open(/tmp/mac_lock, 'w')
+fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
+mac_pool = shelve.open(/tmp/address_pool, writeback=False)
+found = False
+key = %s:%s % (instance_vm, nic_index)
+
+if mac_pool.get(key):
+found = True
+mac = mac_pool.get(key)
+
+while not found:
+suffix = %02x:%02x % (random.randint(0x00,0xfe),
+random.randint(0x00,0xfe))
+mac = prefix + suffix
+mac_list = 

Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-09-12 Thread Rusty Russell
On Sun, 12 Sep 2010 06:30:43 pm Avi Kivity wrote:
   On 09/12/2010 02:42 AM, Alexander Graf wrote:
  On 24.08.2010, at 15:48, Alexander Graf wrote:
 
  The one big missing feature in s390-virtio was hotplugging. This is no 
  more.
  This patch implements hotplug add support, so you can on the fly add new 
  devices
  in the guest.
 
  Keep in mind that this needs a patch for qemu to actually leverage the
  functionality.
 
  Signed-off-by: Alexander Grafag...@suse.de
  ping (on the patch set)?
 
 
 Actually Marcelo applied it.  But the natural place for it is Rusty's 
 virtio tree.  Rusty, if you want to take it, let me know and I'll drop 
 it from kvm.git.

I thought it would be in the s390 tree, which is why I didn't take it...

But I'm *always* happy to let do the work!

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-12 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 09/12/2010 05:10:25 PM:

  SINGLE vhost (Guest - Host):
 1 netperf:BW: 10.7% SD: -1.4%
 4 netperfs:   BW: 3%SD: 1.4%
 8 netperfs:   BW: 17.7% SD: -10%
16 netperfs:  BW: 4.7%  SD: -7.0%
32 netperfs:  BW: -6.1% SD: -5.7%
  BW and SD both improves (guest multiple txqs help). For 32
  netperfs, SD improves.
 
  But with multiple vhosts, guest is able to send more packets
  and BW increases much more (SD too increases, but I think
  that is expected).

 Why is this expected?

Results with the original kernel:
_
#   BW  SD  RSD
__
1   20903   1   6
2   21963   6   25
4   22042   23  102
8   21674   97  419
16  22281   379 1663
24  22521   857 3748
32  22976   15286594
40  23197   239010239
48  22973   354215074
64  23809   648627244
80  23564   10169   43118
96  22977   14954   62948
128 23649   27067   113892


With higher number of threads running in parallel, SD
increased. In this case most threads run in parallel
only till __dev_xmit_skb (#numtxqs=1). With mq TX patch,
higher number of threads run in parallel through
ndo_start_xmit. I *think* the increase in SD is to do
with higher # of threads running for larger code path
From the numbers I posted with the patch (cut-n-paste
only the % parts), BW increased much more than the SD,
sometimes more than twice the increase in SD.

N#  BW% SD%  RSD%
4   54.30   40.00-1.16
8   71.79   46.59-2.68
16  71.89   50.40-2.50
32  72.24   34.26-14.52
48  70.10   31.51-14.35
64  69.01   38.81-9.66
96  70.68   71.2610.74

I also think SD calculation gets skewed for guest-local
host testing. For this test, I ran a guest with numtxqs=16.
The first result below is with my patch, which creates 16
vhosts. The second result is with a modified patch which
creates only 2 vhosts (testing with #netperfs = 64):

#vhosts  BW% SD%RSD%
16   20.79   186.01 149.74
230.89   34.55  18.44

The remote SD increases with the number of vhost threads,
but that number seems to correlate with guest SD. So though
BW% increased slightly from 20% to 30%, SD fell drastically
from 186% to 34%. I think it could be a calculation skew
with host SD, which also fell from 150% to 18%.

I am planning to submit 2nd patch rev with restricted
number of vhosts.

  Likely cause for the 1 stream degradation with multiple
  vhost patch:
 
  1. Two vhosts run handling the RX and TX respectively.
 I think the issue is related to cache ping-pong esp
 since these run on different cpus/sockets.

 Right. With TCP I think we are better off handling
 TX and RX for a socket by the same vhost, so that
 packet and its ack are handled by the same thread.
 Is this what happens with RX multiqueue patch?
 How do we select an RX queue to put the packet on?

My (unsubmitted) RX patch doesn't do this yet, that is
something I will check.

Thanks,

- KK

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/4] Add a new API to virtio-pci

2010-09-12 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 09/12/2010 05:16:37 PM:

 Michael S. Tsirkin m...@redhat.com
 09/12/2010 05:16 PM

 On Thu, Sep 09, 2010 at 07:19:33PM +0530, Krishna Kumar2 wrote:
  Unfortunately I need a
  constant in vhost for now.

 Maybe not even that: you create multiple vhost-net
 devices so vhost-net in kernel does not care about these
 either, right? So this can be just part of vhost_net.h
 in qemu.

Sorry, I didn't understand what you meant.

I can remove all socks[] arrays/constants by pre-allocating
sockets in vhost_setup_vqs. Then I can remove all socks
parameters in vhost_net_stop, vhost_net_release and
vhost_net_reset_owner.

Does this make sense?

Thanks,

- KK

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/24] Exiting from L2 to L1

2010-09-12 Thread Sheng Yang
On Monday 13 September 2010 01:21:29 Avi Kivity wrote:
   On 09/12/2010 07:05 PM, Nadav Har'El wrote:
  I don't think so.  We write this field as part of guest entry (that is,
  after the decision re which vmcs to use, yes?), so guest entry will
  always follow writing this field.  Since guest entry clears the field,
  reading it after an exit will necessarily return 0.
  
  Well, you obviously know the KVM code much better than I do, but from
  what I saw, I thought (maybe I misunderstood) that in normal
  (non-nested) KVM, this field only gets written on injection, not on
  every entry, so the code relies on the fact that the processor turns off
  the valid bit during exit, to avoid the same event being injected when
  the same field value is used for another entry.
 
 Correct.
 
  I can only find code which resets this field in vmx_vcpu_reset(),
  but that doesn't get called on every entry, right? Or am I missing
  something?
 
 prepare_vmcs12() is called in response for a 2-1 vmexit which is first
 trapped by 0, yes?  Because it's called immediately after a vmexit,
 VM_ENTRY_INTR_INFO_FIELD is guaranteed to have been cleared by the
 processor.
 
 There are two cases where VM_ENTRY_INTR_INFO_FIELD can potentially not
 be cleared by hardware:
 
 1. if we call prepare_vmcs12() between injection and entry.  This cannot
 happen AFAICT.
 2. if the vmexit was really a failed 1-2 vmentry, and if the processor
 doesn't clear VM_ENTRY_INTR_INFO_FIELD in response to vm entry failures
 (need to check scripture)
 
 If neither of these are valid, the code can be removed.  If only the
 second, we might make it conditional.
 
  What can happen is that the contents of the field is transferred to the
  IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field.
  
  (question: on a failed vmentry, is this field cleared?)
  
  I don't know the answer :-)
 
 Sheng?

According to SDM 23.7 VM-ENTRY FAILURES DURING OR AFTER LOADING
GUEST STATE:

Although this process resembles that of a VM exit, many steps taken during a VM 
exit do not occur for these VM-entry failures:
• Most VM-exit information fields are not updated (see step 1 above).
• The valid bit in the VM-entry interruption-information field is *not* cleared.
• The guest-state area is not modified.
• No MSRs are saved into the VM-exit MSR-store area.

So VM entry failure would result in _keep_ valid bit of 
VM_ENTRY_INTR_INFO_FIELD.

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html