Re: [PATCH 0/24] Nested VMX, v5

2010-10-17 Thread Avi Kivity

 On 10/17/2010 02:39 PM, Nadav Har'El wrote:

On Sun, Oct 17, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
>  >patch. In short, try running the L0 kernel with the "nosmp" option,
>  What are the problems with smp?

Unfortunately, there appears to be a bug which causes KVM with nested VMX to
hang when SMP is enabled, even if you don't try to use more than one CPU for
the guest. I still need to debug this to figure out why.


Well, that seems pretty critical.


>  >   give the
>  >"-cpu host" option to qemu,
>
>  Why is this needed?

Qemu has a list of cpu types, and for each type it lists its features. The
problem is that Qemu doesn't list the "VMX" feature for any of the CPUs, even
those (like Core 2 Duo) that do have it. I have a trivial patch to qemu to add the "VMX"
feature to those CPUs, which is harmless even if KVM doesn't support nested
VMX (qemu will drop features which KVM doesn't support). But until I send
such a patch to qemu, the easiest workaround is just to use "-cpu host" -
which will (among other things) tell qemu to emulate a machine which has vmx,
just like the host does.

(I also explained this in the intro to v6 of the patch).


Ok.  I think we can get that patch merged, just so you don't have to 
re-explain it over and over again.  Please post it to qemu-devel.



>
>  >and the "nested=1 ept=0 vpid=0" options to the
>  >kvm-intel module in L0.
>
>  Why are those needed?  Seems trivial to support a nonept guest on an ept
>  host - all you do is switch cr3 during vmentry and vmexit.

nested=1 is needed because you asked for it *not* to be the default :-)

You're right, ept=1 on the host *could* be supported even before nested ept
is supported (this is the mode we called "shadow on ept" in the paper).
But at the moment, I believe it doesn't work correctly. I'll add making this
case work to my TODO list.

I'm not sure why vpid=0 is needed (but I verified that you get a failed entry
if you don't use it). I understood that there was some discussion on what is
the proper way to do nested vpid, and that in the meantime it isn't supported,
but I agree that it should have been possible to use vpid normally to run L1's
but avoid using it when running L2's. Again, I'll need to debug this issue
to understand how difficult it would be to fix this case.


My feeling is the smp and vpid failures are due to bugs.  vpid=0 in 
particular forces a tlb flush on every exit which might mask your true 
bug.  smp might be due to host vcpu migration.  Are we vmclearing the 
right vmcs?
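
The invariant to check for the smp case is roughly the following - an
illustrative sketch with made-up helper names (vmcs_clear()/vmcs_load() stand
in for the VMCLEAR/VMPTRLD wrappers), not the actual vmx.c code: a vmcs that
was last loaded on one physical cpu has to be vmcleared on that cpu, e.g. via
an IPI, before it is loaded anywhere else.

struct tracked_vmcs {
	struct vmcs *vmcs;
	int loaded_cpu;			/* physical cpu it is loaded on, or -1 */
};

static void do_remote_clear(void *info)
{
	struct tracked_vmcs *t = info;

	vmcs_clear(t->vmcs);		/* VMCLEAR runs on the old cpu */
	t->loaded_cpu = -1;
}

/* Caller is assumed to run with preemption disabled, as on vcpu load. */
static void track_vmcs_load(struct tracked_vmcs *t)
{
	int cpu = raw_smp_processor_id();

	if (t->loaded_cpu != -1 && t->loaded_cpu != cpu)
		smp_call_function_single(t->loaded_cpu, do_remote_clear, t, 1);
	vmcs_load(t->vmcs);		/* VMPTRLD on the new cpu */
	t->loaded_cpu = cpu;
}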


ept=1 may not be due to a bug per se, but my feeling is that it should 
be very easy to implement.  In particular nsvm started out on npt (but 
not nnpt) and had issues with shadow-on-shadow (IIRC).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-10-17 Thread Nadav Har'El
On Sun, Oct 17, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> >patch. In short, try running the L0 kernel with the "nosmp" option,
> What are the problems with smp?

Unfortunately, there appears to be a bug which causes KVM with nested VMX to
hang when SMP is enabled, even if you don't try to use more than one CPU for
the guest. I still need to debug this to figure out why.

> >  give the
> >"-cpu host" option to qemu,
> 
> Why is this needed?

Qemu has a list of cpu types, and for each type it lists its features. The
problem is that Qemu doesn't list the "VMX" feature for any of the CPUs, even
those (like Core 2 Duo) that do have it. I have a trivial patch to qemu to add the "VMX"
feature to those CPUs, which is harmless even if KVM doesn't support nested
VMX (qemu will drop features which KVM doesn't support). But until I send
such a patch to qemu, the easiest workaround is just to use "-cpu host" -
which will (among other things) tell qemu to emulate a machine which has vmx,
just like the host does.

(I also explained this in the intro to v6 of the patch).
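
For what it's worth, here is a trivial guest-side check (not part of the patch
set) that can be compiled and run inside L1 to confirm that VMX actually shows
up in the guest's CPUID - bit 5 of ECX in leaf 1 is the architectural VMX flag:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
		fprintf(stderr, "CPUID leaf 1 not available\n");
		return 1;
	}
	if (ecx & (1u << 5))			/* CPUID.1:ECX.VMX[bit 5] */
		printf("VMX is exposed to this guest\n");
	else
		printf("no VMX - check -cpu host and nested=1 in L0\n");
	return 0;
}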

> 
> >and the "nested=1 ept=0 vpid=0" options to the
> >kvm-intel module in L0.
> 
> Why are those needed?  Seems trivial to support a nonept guest on an ept 
> host - all you do is switch cr3 during vmentry and vmexit.

nested=1 is needed because you asked for it *not* to be the default :-)

You're right, ept=1 on the host *could* be supported even before nested ept
is supported (this is the mode we called "shadow on ept" in the paper).
But at the moment, I believe it doesn't work correctly. I'll add making this
case work to my TODO list.

I'm not sure why vpid=0 is needed (but I verified that you get a failed entry
if you don't use it). I understood that there was some discussion on what is
the proper way to do nested vpid, and that in the meantime it isn't supported,
but I agree that it should have been possible to use vpid normally to run L1's
but avoid using it when running L2's. Again, I'll need to debug this issue
to understand how difficult it would be to fix this case.

Nadav.

-- 
Nadav Har'El|  Sunday, Oct 17 2010, 9 Heshvan 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Strike not only while the iron is hot,
http://nadav.harel.org.il   |make the iron hot by striking it.


Re: [PATCH 0/24] Nested VMX, v5

2010-10-17 Thread Avi Kivity

 On 10/17/2010 02:03 PM, Nadav Har'El wrote:

On Tue, Jun 15, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
>  I've tried to test the patches, but I see a vm-entry failure code 7 on
>  the very first vmentry.  Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).

Hi, as you can see, I posted a new set of patches, which apply to the current
trunk. Can you please give it another try? Thanks!

Please make sure you follow the instructions in the introduction to the
patch. In short, try running the L0 kernel with the "nosmp" option,


What are the problems with smp?


  give the
"-cpu host" option to qemu,


Why is this needed?


and the "nested=1 ept=0 vpid=0" options to the
kvm-intel module in L0.


Why are those needed?  Seems trivial to support a nonept guest on an ept 
host - all you do is switch cr3 during vmentry and vmexit.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-10-17 Thread Nadav Har'El
On Tue, Jun 15, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> I've tried to test the patches, but I see a vm-entry failure code 7 on 
> the very first vmentry.  Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).

Hi, as you can see, I posted a new set of patches, which apply to the current
trunk. Can you please give it another try? Thanks!

Please make sure you follow the instructions in the introduction to the
patch. In short, try running the L0 kernel with the "nosmp" option, give the
"-cpu host" option to qemu, and the "nested=1 ept=0 vpid=0" options to the
kvm-intel module in L0.

Thanks,
Nadav.

-- 
Nadav Har'El|  Sunday, Oct 17 2010, 9 Heshvan 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |This space is for sale - inquire inside.
http://nadav.harel.org.il   |


Re: [PATCH 0/24] Nested VMX, v5

2010-07-14 Thread Sheng Yang
On Sunday 13 June 2010 20:22:33 Nadav Har'El wrote:
> Hi Avi,
> 
> This is a followup of our nested VMX patches that Orit Wasserman posted in
> December. We've addressed most of the comments and concerns that you and
> others on the mailing list had with the previous patch set. We hope you'll
> find these patches easier to understand, and suitable for applying to KVM.
> 
> 
> The following 24 patches implement nested VMX support. The patches enable a
> guest to use the VMX APIs in order to run its own nested guests. I.e., it
> allows running hypervisors (that use VMX) under KVM. We describe the theory
> behind this work, our implementation, and its performance characteristics,
> in IBM Research report H-0282, "The Turtles Project: Design and
> Implementation of Nested Virtualization", available at:
> 
>   http://bit.ly/a0o9te
> 
> The current patches support running Linux under a nested KVM using shadow
> page table (with bypass_guest_pf disabled). They support multiple nested
> hypervisors, which can run multiple guests. Only 64-bit nested hypervisors
> are supported. SMP is supported. Additional patches for running Windows
> under nested KVM, and Linux under nested VMware server, and support for
> nested EPT, are currently running in the lab, and will be sent as
> follow-on patchsets.

Hi Nadav

Do you have a tree or code base and instructions to try this patchset? I've
spent some time on it, but can't get it right...

--
regards
Yang, Sheng

> 
> These patches were written by:
>  Abel Gordon, abelg  il.ibm.com
>  Nadav Har'El, nyh  il.ibm.com
>  Orit Wasserman, oritw  il.ibm.com
>  Ben-Ami Yassor, benami  il.ibm.com
>  Muli Ben-Yehuda, muli  il.ibm.com
> 
> With contributions by:
>  Anthony Liguori, aliguori  us.ibm.com
>  Mike Day, mdday  us.ibm.com
> 
> This work was inspired by the nested SVM support by Alexander Graf and
> Joerg Roedel.
> 
> 
> Changes since v4:
> * Rebased to the current KVM tree.
> * Support for lazy FPU loading.
> * Implemented about 90 requests and suggestions made on the mailing list
>   regarding the previous version of this patch set.
> * Split the changes into many more, and better documented, patches.
> 
> --
> Nadav Har'El
> IBM Haifa Research Lab


Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Avi Kivity

On 07/11/2010 06:39 PM, Nadav Har'El wrote:

On Sun, Jul 11, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
   

nesting-aware L1 guest hypervisors to actually use that internal structure
to modify vmcs12 directly, without vmread/vmwrite and exits.

   

No, they can't, since (for writes) L0 might cache the information and
not read it again.  For reads, L0 might choose to update vmcs12 on demand.
 

Well, in the current version of the nested code, all L0 does on an L1 vmwrite
is to update the in-memory vmcs12 structure. It doesn't update vmcs02,
nor cache anything, nor remember what has changed and what hasn't. So replacing
it with a direct write to the memory structure should be fine...
   


Note you said "current version".  What if this later changes?

So, we cannot allow a guest to access vmcs12 directly.  There has to be 
a protocol that allows the guest to know what it can touch and what it 
can't (or, tell the host what the guest touched and what it hasn't).  
Otherwise, we lose the ability to optimize.



Of course, this situation isn't optimal, and we *should* optimize the number of
unnecessary vmwrites on L2 entry and exit (and we actually tried some of this
in our tech report), but it's not in the current patch set.  When we do these
kinds of optimizations, you're right that:

   

A pv vmread/write needs to communicate with L0 about what fields are
valid (likely using available and dirty bitmaps).
 


It's right even before we do these optimizations, so a pv guest written 
before the optimizations can run on an optimized host.
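
To make that concrete, such a protocol could look roughly like the sketch
below (the shared-page layout, field indexing and the vmcs02 sync helper are
all hypothetical - the point is only the available ("valid") and dirty bitmaps):

#define NR_PV_FIELDS	128

struct pv_vmcs12 {			/* one page shared between L1 and L0 */
	unsigned long field[NR_PV_FIELDS];
	unsigned long dirty[NR_PV_FIELDS / BITS_PER_LONG];	/* guest-written */
	unsigned long valid[NR_PV_FIELDS / BITS_PER_LONG];	/* host-maintained */
};

/* L1 side: replaces a trapping VMWRITE for whitelisted fields. */
static void pv_vmwrite(struct pv_vmcs12 *v, int idx, unsigned long value)
{
	v->field[idx] = value;
	set_bit(idx, v->dirty);
}

/* L1 side: only bypass VMREAD if the host has marked the field valid. */
static int pv_vmread(struct pv_vmcs12 *v, int idx, unsigned long *value)
{
	if (!test_bit(idx, v->valid))
		return -1;		/* fall back to a real, trapping VMREAD */
	*value = v->field[idx];
	return 0;
}

/* L0 side, on emulated vmentry: sync only what the guest actually dirtied. */
static void sync_dirty_fields(struct pv_vmcs12 *v)
{
	int idx;

	for_each_set_bit(idx, v->dirty, NR_PV_FIELDS) {
		copy_field_to_vmcs02(idx, v->field[idx]);	/* hypothetical */
		clear_bit(idx, v->dirty);
	}
}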


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Nadav Har'El
On Sun, Jul 11, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> >nesting-
> >aware L1 guest hypervisors to actually use that internal structure to 
> >modify
> >vmcs12 directly, without vmread/vmwrite and exits.
> >   
> 
> No, they can't, since (for writes) L0 might cache the information and 
> not read it again.  For reads, L0 might choose to update vmcs12 on demand.

Well, in the current version of the nested code, all L0 does on an L1 vmwrite
is to update the in-memory vmcs12 structure. It doesn't update vmcs02,
nor cache anything, nor remember what has changed and what hasn't. So replacing
it with a direct write to the memory structure should be fine...

Of course, this situation isn't optimal, and we *should* optimize the number of
unnecessary vmwrites on L2 entry and exit (and we actually tried some of this
in our tech report), but it's not in the current patch set.  When we do these
kinds of optimizations, you're right that:

> A pv vmread/write needs to communicate with L0 about what fields are 
> valid (likely using available and dirty bitmaps).


-- 
Nadav Har'El|   Sunday, Jul 11 2010, 1 Av 5770
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If marriage was illegal, only outlaws
http://nadav.harel.org.il   |would have in-laws.


Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Avi Kivity

On 07/11/2010 11:27 AM, Nadav Har'El wrote:




1: Basically there are two different types of VMCS: one is defined by hardware,
whose layout is unknown to the VMM. Another one is defined by the VMM (this patch)
and used for vmcs12.
The former one uses "struct vmcs" to describe its data instances, but the
latter one doesn't have a clear definition (or struct vmcs12?). I suggest we
can have a distinct struct for this, for example "struct sw_vmcs"
(software vmcs), or "struct vvmcs" (virtual vmcs).
 

I decided (but let me know if you have reservations) to use the name
"struct vmcs_fields" for the memory structure that contains the long list of
vmcs fields. I think this name describes the structure's content well.
   


I liked vvmcs myself...


As in the last version of the patches, this list of vmcs fields will not on
its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also
needs to contain a couple of additional fields in its beginning, and we also
need a few more runtime fields.
   


... for the spec-compliant vmcs in L1's memory.


2: vmcsxy (vmcs12, vmcs02, vmcs01) are names for instances of either
"struct vmcs" or "struct sw_vmcs", but not for the structs themselves. A clear
distinction between data structure and instance helps, IMO.
 

I agree with you that using the name "vmcs12" for both the type (struct vmcs12)
and instance of another type (struct vmcs_fields *vmcs12) is somewhat strange,
but I can only think of two alternatives:

1. Invent a new name for "struct vmcs12", say "struct sw_vmcs" as you
suggested. But I think it will just make things less clear, because we
replace the self-explanatory name vmcs12 by a less clear name.

2. Stop separating "struct vmcs_fields" (formerly struct shadow_vmcs) and
"struct vmcs12" which contains it and a few more fields - and instead
put everything in one structure (and call that sw_vmcs or whatever).
   


I like this.


These extra fields will not be useful for vmcs01, but it's not a terrible
waste (because vmcs01 already doesn't use a lot of these fields).
   


You don't really need vmcs01 to be a vvmcs (or sw_vmcs).  IIRC you only 
need it when copying around vmcses, which you can avoid completely by 
initializing vmcs01 and vmcs02 using common initialization routines for 
the host part.
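
Something along these lines (an illustrative sketch only; the HOST_* encodings
are the architectural host-state fields, and the same routine would be called
while setting up both vmcs01 and vmcs02):

/* Called with the vmcs being initialized already loaded (VMPTRLD done). */
static void init_host_state_area(void)
{
	vmcs_writel(HOST_CR0, read_cr0());
	vmcs_writel(HOST_CR3, read_cr3());
	vmcs_writel(HOST_CR4, read_cr4());
	/* ...plus host segment selectors, TR/GDT/IDT bases, HOST_RIP, etc. */
}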



Personally, I find these two alternatives even less appealing than the
current alternative (with "struct vmcs12" describing vmcs12's type, and
it contains a struct vmcs_fields inside). What do you think?
   


IMO, vmcs_fields is artificial.  As soon as you eliminate the vmcs copy, 
you won't have any use for it, and then you can fold it into its container.



5: Guest VMPTRLD emulation. The current patch creates a vmcs02 instance each
time the guest executes VMPTRLD, and frees the instance at VMCLEAR. The code may
fail if the number of (un-VMCLEARed) vmcs exceeds a certain threshold, to avoid
denial of service. That is fine, but it brings additional complexity and may cost
a lot of memory. I think we can emulate using the concept of a "cached vmcs"
here, in case the L1 VMM doesn't do VMCLEAR in time.  The L0 VMM can simply flush
those vmcs02 back to guest memory, i.e. vmcs12, as needed. For example, if the
number of cached vmcs02 exceeds 10, we can automatically flush.
 

Right. I've already discussed this idea over the list with Avi Kivity, and
it is on my todo list and definitely should be done.
The current approach is simpler, because I don't need to add special code for
rebuilding a forgotten vmcs02 from vmcs12 - the current prepare_vmcs02 only
updates some of the fields, and I'll need to do some testing to figure out
what exactly is missing for a full rebuild.
   


You already support "full rebuild" - that's what happens when you first 
see a vmcs, when you launch a guest.



I think the current code is "good enough" as an ad-interim solution, because
users that follow the spec will not forget to VMCLEAR anyway (and if they
do, only they will suffer). And I wouldn't say that "a lot of memory" is
involved - at worst, an L1 can now cause 256 pages, or 1 MB, to be wasted on
this. More normally, an L1 will only have a few L2 guests, and only spend
a few pages for this - certainly much much less than he'd spend on actually
holding the L2's memory.
   


It's perfectly legitimate for a guest to disappear a vmcs.  It might 
swap it to disk, or move it to a separate NUMA node.  While I don't 
expect the first, the second will probably happen sometime.



--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Avi Kivity

On 07/11/2010 03:49 PM, Nadav Har'El wrote:


In any case, the obvious problem with this whole idea on VMX is that it
requires a modified guest hypervisor, which reduces its usefulness.
This is why we didn't think we should "advertise" the ability to bypass
vmread/vmwrite in L1 and write directly to the vmcs12's. But Avi Kivity
already asked me to add a document about the vmcs12 internal structure,
and once I've done that, I guess you can now consider it "fair" for nesting-
aware L1 guest hypervisors to actually use that internal structure to modify
vmcs12 directly, without vmread/vmwrite and exits.
   


No, they can't, since (for writes) L0 might cache the information and 
not read it again.  For reads, L0 might choose to update vmcs12 on demand.


A pv vmread/write needs to communicate with L0 about what fields are 
valid (likely using available and dirty bitmaps).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Nadav Har'El
On Sun, Jul 11, 2010, Alexander Graf wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> Thinking about this - it would be perfectly legal to split the VMCS into two 
> separate structs, right? You could have one struct that you map directly into 
> the guest, so modifications to that struct don't trap. Of course the l1 guest 
> shouldn't be able to modify all fields of the VMCS, so you'd still keep a 
> second struct around with shadow fields. While at it, also add a bitmap to 
> store the dirtiness status of your fields in, if you need that.
> 
> That way a nesting aware guest could use a PV memory write instead of the 
> (slow) instruction emulation. That should dramatically speed up nesting vmx.

Hi,

We already tried this idea, and described the results in our tech report
(see http://www.mulix.org/pubs/turtles/h-0282.pdf). 

We didn't do things quite as cleanly as you suggested - we didn't split the
structure and make only part of it available directly to the guest. Rather,
we only did what we had to do to get the performance improvement: we modified
L1 to access the VMCS directly, assuming the nested code's vmcs12 structure layout,
instead of calling vmread/vmwrite.
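
In other words, the experiment boiled down to replacing the trapping
instructions in L1 with plain memory accesses, along these lines (the offsets
are whatever this particular L0's internal vmcs12 layout happens to be, which
is exactly why we don't consider it a stable interface):

/* In the modified L1 guest hypervisor: no exit to L0 on either path. */
static inline void pv_vmcs_write64(void *vmcs12_page, unsigned long offset,
				   unsigned long long value)
{
	*(unsigned long long *)((char *)vmcs12_page + offset) = value;
}

static inline unsigned long long pv_vmcs_read64(void *vmcs12_page,
						unsigned long offset)
{
	return *(unsigned long long *)((char *)vmcs12_page + offset);
}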

As you can see in the various benchmarks in section 4 (Evaluation) of the
report, the so-called PV vmread/vmwrite method had a noticeable, though perhaps
not as dramatic as you hoped, effect. For example, for the kernbench benchmark,
nested kvm overhead (over single-level kvm virtualization) came down from
14.5% to 10.3%, and for the specjbb benchmark, the overhead came down from
7.8% to 6.3%. In a microbenchmark less representative of real-life workloads,
we were able to measure a halving of the overhead by adding the PV
vmread/vmwrite.

In any case, the obvious problem with this whole idea on VMX is that it
requires a modified guest hypervisor, which reduces its usefulness.
This is why we didn't think we should "advertise" the ability to bypass
vmread/vmwrite in L1 and write directly to the vmcs12's. But Avi Kivity
already asked me to add a document about the vmcs12 internal structure,
and once I've done that, I guess you can now consider it "fair" for nesting-
aware L1 guest hypervisors to actually use that internal structure to modify
vmcs12 directly, without vmread/vmwrite and exits.

By the way, I see on the KVM Forum 2010 schedule that Eddie Dong will be
talking about "Examining KVM as Nested Virtualization Friendly Guest".
I'm looking forward to reading the proceedings (unfortunately, I won't be
able to travel to the actual meeting).

Nadav.


-- 
Nadav Har'El|  Sunday, Jul 11 2010, 29 Tammuz 5770
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |I used to work in a pickle factory, until
http://nadav.harel.org.il   |I got canned.


Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Alexander Graf

On 11.07.2010, at 10:27, Nadav Har'El wrote:

> On Fri, Jul 09, 2010, Dong, Eddie wrote about "RE: [PATCH 0/24] Nested VMX, v5":
>>  Thanks for the posting and in general the patches are well written.
>> I like the concept of VMCSxy and I feel it is pretty clear (better than my
>> previous naming as well), but there are some confusing parts inside, especially
>> the term "shadow", which I find quite hard to follow.
> 
> Hi, and thanks for the excellent ideas. As you saw, I indeed started to
> convert and converge the old terminology (including that ambiguous term
> "shadow") into the new names vmcs01, vmcs02, vmcs12 - names which we
> introduced in our technical report.
> But I have not gone all the way with these changes. I should have, and I'll
> do it now.
> 
>> 1: Basically there are two different types of VMCS: one is defined by hardware,
>> whose layout is unknown to the VMM. Another one is defined by the VMM (this patch)
>> and used for vmcs12.
>> The former one uses "struct vmcs" to describe its data instances, but the
>> latter one doesn't have a clear definition (or struct vmcs12?). I suggest we
>> can have a distinct struct for this, for example "struct sw_vmcs"
>> (software vmcs), or "struct vvmcs" (virtual vmcs).
> 
> I decided (but let me know if you have reservations) to use the name
> "struct vmcs_fields" for the memory structure that contains the long list of
> vmcs fields. I think this name describes the structure's content well.
> 
> As in the last version of the patches, this list of vmcs fields will not on
> its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also
> needs to contain a couple of additional fields in its beginning, and we also
> need a few more runtime fields.

Thinking about this - it would be perfectly legal to split the VMCS into two 
separate structs, right? You could have one struct that you map directly into 
the guest, so modifications to that struct don't trap. Of course the l1 guest 
shouldn't be able to modify all fields of the VMCS, so you'd still keep a 
second struct around with shadow fields. While at it, also add a bitmap to 
store the dirtiness status of your fields in, if you need that.

That way a nesting aware guest could use a PV memory write instead of the 
(slow) instruction emulation. That should dramatically speed up nesting vmx.


Alex



Re: [PATCH 0/24] Nested VMX, v5

2010-07-11 Thread Nadav Har'El
On Fri, Jul 09, 2010, Dong, Eddie wrote about "RE: [PATCH 0/24] Nested VMX, v5":
>   Thanks for the posting and in general the patches are well written.
> I like the concept of VMCSxy and I feel it is pretty clear (better than my
> previous naming as well), but there are some confusing parts inside, especially
> the term "shadow", which I find quite hard to follow.

Hi, and thanks for the excellent ideas. As you saw, I indeed started to
convert and converge the old terminology (including that ambiguous term
"shadow") into the new names vmcs01, vmcs02, vmcs12 - names which we
introduced in our technical report.
But I have not gone all the way with these changes. I should have, and I'll
do it now.

> 1: Basically there are two different types of VMCS: one is defined by hardware,
> whose layout is unknown to the VMM. Another one is defined by the VMM (this patch)
> and used for vmcs12.
> The former one uses "struct vmcs" to describe its data instances, but the
> latter one doesn't have a clear definition (or struct vmcs12?). I suggest we
> can have a distinct struct for this, for example "struct sw_vmcs"
> (software vmcs), or "struct vvmcs" (virtual vmcs).

I decided (but let me know if you have reservations) to use the name
"struct vmcs_fields" for the memory structure that contains the long list of
vmcs fields. I think this name describes the structure's content well.

As in the last version of the patches, this list of vmcs fields will not on
its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also
needs to contain a couple of additional fields in its beginning, and we also
need a few more runtime fields.
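
Concretely, the layout under discussion is roughly the following sketch (the
runtime fields shown are only illustrative; the two fields at the start are the
revision identifier and VMX-abort indicator that the spec requires at the
beginning of any VMCS region):

struct vmcs_fields {
	/* the long list of architectural vmcs fields: controls, guest state,
	 * host state, read-only exit information, ... for example: */
	u64 io_bitmap_a;
	u64 io_bitmap_b;
	u32 pin_based_vm_exec_control;
	/* ... */
};

struct vmcs12 {
	u32 revision_id;		/* VMCS revision identifier */
	u32 abort;			/* VMX-abort indicator */
	struct vmcs_fields fields;	/* the list above */
	/* plus a few runtime fields used only by the nested code, e.g.: */
	u32 launch_state;		/* set by VMLAUNCH, cleared by VMCLEAR */
};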

>   2: vmcsxy (vmcs12, vmcs02, vmcs01) are names for instances of either
> "struct vmcs" or "struct sw_vmcs", but not for the structs themselves. A clear
> distinction between data structure and instance helps, IMO.

I agree with you that using the name "vmcs12" for both the type (struct vmcs12)
and instance of another type (struct vmcs_fields *vmcs12) is somewhat strange,
but I can only think of two alternatives:

1. Invent a new name for "struct vmcs12", say "struct sw_vmcs" as you
   suggested. But I think it will just make things less clear, because we
   replace the self-explanatory name vmcs12 by a less clear name.

2. Stop separating "struct vmcs_fields" (formerly struct shadow_vmcs) and
   "struct vmcs12" which contains it and a few more fields - and instead 
   put everything in one structure (and call that sw_vmcs or whatever).
   These extra fields will not be useful for vmcs01, but it's not a terrible
   waste (because vmcs01 already doesn't use a lot of these fields).

Personally, I find these two alternatives even less appealing than the
current alternative (with "struct vmcs12" describing vmcs12's type, and
it contains a struct vmcs_fields inside). What do you think?

> 3: We may use a prefix or suffix in addition to vmcsxy to explicitly state the
> format of that instance. For example, vmcs02 in the current patch is for hardware
> use, hence it is an instance of "struct vmcs", but vmcs01 is an instance of
> "struct sw_vmcs". Prefixes and postfixes help make this easier to understand.

I agree. After changing the old name struct shadow_vmcs to vmcs_fields, now
I can use a name like vmcs01_fields for the old l1_shadow_vmcs (memory copy
of vmcs01's fields) and vmcs01 for the old l1_vmcs (the actual hardware VMCS
used to run L1). This is indeed more readable, thanks.

> 4: Rename l2_vmcs to vmcs02, l1_shadow_vmcs to vmcs01, l1_vmcs to
> vmcs02, with prefix/postfix can strengthen above concept of vmcsxy.

Good ideas.

Renamed l2_vmcs, l2_vmcs_list, and the like, to vmcs02.

Renamed l1_shadow_vmcs to vmcs01_fields, and l1_vmcs to vmcs01 (NOT vmcs02).

Renamed l2_shadow_vmcs, l2svmcs, nested_vmcs, and the like, to vmcs12
(I decided not to use the longer name vmcs12_fields, because I don't think it
adds any clarity). I also renamed get_shadow_vmcs to get_vmcs12_fields.

> 5: Guest VMPTRLD emulation. The current patch creates a vmcs02 instance each
> time the guest executes VMPTRLD, and frees the instance at VMCLEAR. The code may
> fail if the number of (un-VMCLEARed) vmcs exceeds a certain threshold, to avoid
> denial of service. That is fine, but it brings additional complexity and may cost
> a lot of memory. I think we can emulate using the concept of a "cached vmcs"
> here, in case the L1 VMM doesn't do VMCLEAR in time.  The L0 VMM can simply flush
> those vmcs02 back to guest memory, i.e. vmcs12, as needed. For example, if the
> number of cached vmcs02 exceeds 10, we can automatically flush.

Right. I've already discussed this idea over the list with Avi Kivity, and
it is on my todo list and definitely should be done.
The current approach is simpler, because I don't need to add special code for
rebuilding a forgotten vmcs02 from vmcs12 - the current prepare_vmcs02 only
updates some of the fields, and I'll need to do some testing to figure out
what exactly is missing for a full rebuild.

I think the current code is "good enough" as an ad-interim solution, because
users that follow the spec will not forget to VMCLEAR anyway (and if they
do, only they will suffer). And I wouldn't say that "a lot of memory" is
involved - at worst, an L1 can now cause 256 pages, or 1 MB, to be wasted on
this. More normally, an L1 will only have a few L2 guests, and only spend
a few pages for this - certainly much much less than he'd spend on actually
holding the L2's memory.

RE: [PATCH 0/24] Nested VMX, v5

2010-07-09 Thread Dong, Eddie
Nadav Har'El wrote:
> Hi Avi,
> 
> This is a followup of our nested VMX patches that Orit Wasserman
> posted in December. We've addressed most of the comments and concerns
> that you and others on the mailing list had with the previous patch
> set. We hope you'll find these patches easier to understand, and
> suitable for applying to KVM. 
> 
> 
> The following 24 patches implement nested VMX support. The patches
> enable a guest to use the VMX APIs in order to run its own nested
> guests. I.e., it allows running hypervisors (that use VMX) under KVM.
> We describe the theory behind this work, our implementation, and its
> performance characteristics, 
> in IBM Research report H-0282, "The Turtles Project: Design and
> Implementation of Nested Virtualization", available at:
> 
>   http://bit.ly/a0o9te
> 
> The current patches support running Linux under a nested KVM using
> shadow page table (with bypass_guest_pf disabled). They support
> multiple nested hypervisors, which can run multiple guests. Only
> 64-bit nested hypervisors are supported. SMP is supported. Additional
> patches for running Windows under nested KVM, and Linux under nested
> VMware server, and support for nested EPT, are currently running in
> the lab, and will be sent as follow-on patchsets. 
> 

Nadav & All:
Thanks for the posting and in general the patches are well written. I
like the concept of VMCSxy and I feel it is pretty clear (better than my
previous naming as well), but there are some confusing parts inside, especially
the term "shadow", which I find quite hard to follow.

Comments from me:
1: Basically there are two different types of VMCS: one is defined by
hardware, whose layout is unknown to the VMM. Another one is defined by the VMM
(this patch) and used for vmcs12.

The former one uses "struct vmcs" to describe its data instances,
but the latter one doesn't have a clear definition (or struct vmcs12?). I
suggest we can have a distinct struct for this, for example "struct
sw_vmcs" (software vmcs), or "struct vvmcs" (virtual vmcs).

2: vmcsxy (vmcs12, vmcs02, vmcs01) are names for instances of either "struct
vmcs" or "struct sw_vmcs", but not for the structs themselves. A clear
distinction between data structure and instance helps, IMO.

3: We may use a prefix or suffix in addition to vmcsxy to explicitly state
the format of that instance. For example, vmcs02 in the current patch is for
hardware use, hence it is an instance of "struct vmcs", but vmcs01 is an
instance of "struct sw_vmcs". Prefixes and postfixes help make this easier to
understand.

4: Rename l2_vmcs to vmcs02, l1_shadow_vmcs to vmcs01, l1_vmcs to 
vmcs02, with prefix/postfix can strengthen above concept of vmcsxy.


5: Guest VMPTRLD emulation. The current patch creates a vmcs02 instance each
time the guest executes VMPTRLD, and frees the instance at VMCLEAR. The code may
fail if the number of (un-VMCLEARed) vmcs exceeds a certain threshold, to avoid
denial of service. That is fine, but it brings additional complexity and may
cost a lot of memory. I think we can emulate using the concept of a "cached
vmcs" here, in case the L1 VMM doesn't do VMCLEAR in time.  The L0 VMM can
simply flush those vmcs02 back to guest memory, i.e. vmcs12, as needed. For
example, if the number of cached vmcs02 exceeds 10, we can automatically flush.
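
To make the idea concrete, a rough sketch of such a bounded cache (the pool
structure, helper names and threshold are all illustrative, not from the patch
set):

#define VMCS02_POOL_SIZE	10

struct vmcs02_entry {
	struct list_head list;		/* LRU order, most recently used first */
	gpa_t vmcs12_gpa;		/* guest address of the vmcs12 it backs */
	struct vmcs *vmcs02;
};

struct vmcs02_pool {
	struct list_head lru;
	int num;
};

static struct vmcs02_entry *get_vmcs02(struct vmcs02_pool *pool, gpa_t gpa)
{
	struct vmcs02_entry *e;

	list_for_each_entry(e, &pool->lru, list)
		if (e->vmcs12_gpa == gpa)
			goto out;

	if (pool->num >= VMCS02_POOL_SIZE) {
		/* Recycle the least recently used entry, writing its state
		 * back into the corresponding vmcs12 in guest memory first. */
		e = list_entry(pool->lru.prev, struct vmcs02_entry, list);
		flush_vmcs02_to_vmcs12(e);		/* hypothetical */
		e->vmcs12_gpa = gpa;
	} else {
		e = alloc_vmcs02_entry(gpa);		/* hypothetical */
		list_add(&e->list, &pool->lru);
		pool->num++;
	}
out:
	list_move(&e->list, &pool->lru);		/* mark most recently used */
	return e;
}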


Thx, Eddie






Re: [PATCH 0/24] Nested VMX, v5

2010-06-15 Thread Avi Kivity

On 06/14/2010 04:03 PM, Nadav Har'El wrote:



Let's try to get this merged quickly.
 

I'll start fixing the individual patches and resending them individually, and
when I've fixed everything I'll resubmit the whole lot. I hope that this time
I can do it in a matter of days, not months.
   


I've tried to test the patches, but I see a vm-entry failure code 7 on 
the very first vmentry.  Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).


If you can post a git tree with the next round, that will make it easier 
for people experimenting with the patches.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/24] Nested VMX, v5

2010-06-14 Thread Nadav Har'El
On Mon, Jun 14, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> Overall, very nice.  The finer split and better documentation really 
> help reviewing, thanks.

Thank you for the review and all the accurate comments!

> Let's try to get this merged quickly.

I'll start fixing the individual patches and resending them individually, and
when I've fixed everything I'll resubmit the whole lot. I hope that this time
I can do it in a matter of days, not months.

Thanks,
Nadav.

-- 
Nadav Har'El|   Monday, Jun 14 2010, 2 Tammuz 5770
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |An egotist is a person of low taste, more
http://nadav.harel.org.il   |interested in himself than in me.


Re: [PATCH 0/24] Nested VMX, v5

2010-06-14 Thread Avi Kivity

On 06/13/2010 03:22 PM, Nadav Har'El wrote:

Hi Avi,

This is a followup of our nested VMX patches that Orit Wasserman posted in
December. We've addressed most of the comments and concerns that you and
others on the mailing list had with the previous patch set. We hope you'll
find these patches easier to understand, and suitable for applying to KVM.


The following 24 patches implement nested VMX support. The patches enable a
guest to use the VMX APIs in order to run its own nested guests. I.e., it
allows running hypervisors (that use VMX) under KVM. We describe the theory
behind this work, our implementation, and its performance characteristics,
in IBM Research report H-0282, "The Turtles Project: Design and Implementation
of Nested Virtualization", available at:

http://bit.ly/a0o9te

The current patches support running Linux under a nested KVM using shadow
page table (with bypass_guest_pf disabled). They support multiple nested
hypervisors, which can run multiple guests. Only 64-bit nested hypervisors
are supported. SMP is supported. Additional patches for running Windows under
nested KVM, and Linux under nested VMware server, and support for nested EPT,
are currently running in the lab, and will be sent as follow-on patchsets.

These patches were written by:
  Abel Gordon, abelg  il.ibm.com
  Nadav Har'El, nyh  il.ibm.com
  Orit Wasserman, oritw  il.ibm.com
  Ben-Ami Yassor, benami  il.ibm.com
  Muli Ben-Yehuda, muli  il.ibm.com

With contributions by:
  Anthony Liguori, aliguori  us.ibm.com
  Mike Day, mdday  us.ibm.com

This work was inspired by the nested SVM support by Alexander Graf and Joerg
Roedel.


Changes since v4:
* Rebased to the current KVM tree.
* Support for lazy FPU loading.
* Implemented about 90 requests and suggestions made on the mailing list
   regarding the previous version of this patch set.
* Split the changes into many more, and better documented, patches.

   


Overall, very nice.  The finer split and better documentation really 
help reviewing, thanks.


Let's try to get this merged quickly.

--
error compiling committee.c: too many arguments to function
