Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Nadav Har'El
On Tue, May 24, 2011, Tian, Kevin wrote about RE: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  +static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
  +{
  +   vmcs_clear(loaded_vmcs-vmcs);
  +   loaded_vmcs-cpu = -1;
  +   loaded_vmcs-launched = 0;
  +}
  +
 
 call it vmcs_init instead since you now remove original vmcs_init invocation,
 which more reflect the necessity of adding VMCLEAR for a new vmcs?

The best name for this function, I think, would have been loaded_vmcs_clear,
because this function isn't necessarily used to init - it's also called to
VMCLEAR an old vmcs (and flush its content back to memory) - in that sense
it is definitely not a vmcs_init.

Unfortunately, I already have a whole chain of functions with this name :(
the existing loaded_vmcs_clear() does an IPI to the CPU which has this VMCS
loaded, and causes it to run __loaded_vmcs_clear(), which in turn calls
the above loaded_vmcs_init(). I wish I could call all three functions
loaded_vmcs_clear(), but of course I can't. If anyone reading this has a good
suggestion on how to name these three functions, please let me know.

  +static void __loaded_vmcs_clear(void *arg)
   {
  -   struct vcpu_vmx *vmx = arg;
  +   struct loaded_vmcs *loaded_vmcs = arg;
  int cpu = raw_smp_processor_id();
  
  -   if (vmx-vcpu.cpu == cpu)
  -   vmcs_clear(vmx-vmcs);
  -   if (per_cpu(current_vmcs, cpu) == vmx-vmcs)
  +   if (loaded_vmcs-cpu != cpu)
  +   return; /* cpu migration can race with cpu offline */
 
 what do you mean by cpu migration here? why does 'cpu offline'
 matter here regarding to the cpu change?

__loaded_vmcs_clear() is typically called in one of two cases: cpu migration
means that a guest that used to run on one CPU, and had its VMCS loaded there,
suddenly needs to run on a different CPU, so we need to clear the VMCS on
the old CPU. cpu offline means that we want to take a certain CPU offline,
and before we do that we should VMCLEAR all VMCSs which were loaded on it.

The (vmx-cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally never
happen: In the cpu offline path, we only call it for the loaded_vmcss which
we know for sure are loaded on the current cpu. In the cpu migration path,
loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which ensures that
equality.

But, there can be a race condition (this was actually explained to me a while
back by Avi - I never seen this happening in practice): Imagine that cpu
migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to
VMCLEAR this vmcs. But before that old CPU gets a chance to act on that IPI,
a decision is made to take it offline, and all loaded_vmcs loaded on it
(including the one in question) are cleared. When that CPU acts on this IPI,
it notices that vmx-cpu.cpu==-1, i.e., != cpu, so it doesn't need to do
anything (in the new version of the code, I made this more explicit, by
returning immediately in this case).

At least this is the theory. As I said, I didn't see this problem in practice
(unsurprising, since I never offlined any CPU). Maybe Avi or someone else can
comment more about this (vmx-cpu.cpu == cpu) check, which existed before
my patch - in __vcpu_clear().

  +static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
  +{
  +   if (!loaded_vmcs-vmcs)
  +   return;
  +   loaded_vmcs_clear(loaded_vmcs);
  +   free_vmcs(loaded_vmcs-vmcs);
  +   loaded_vmcs-vmcs = NULL;
  +}
 
 prefer to not do cleanup work through loaded_vmcs since it's just a pointer
 to a loaded vmcs structure. Though you can carefully arrange the nested
 vmcs cleanup happening before it, it's not very clean and a bit error prone
 simply from this function itself. It's clearer to directly cleanup vmcs01, and
 if you want an assertion could be added to make sure loaded_vmcs doesn't
 point to any stale vmcs02 structure after nested cleanup step.

I'm afraid I didn't understand what you meant here... Basically, this
free_loaded_vmcs() is just a shortcut for loaded_vmcs_clear() and free_vmcs(),
as doing both is needed in 3 places: nested_free_vmcs02,
nested_free_all_saved_vmcss, vmx_free_vcpu. The same function is needed
for both vmcs01 and vmcs02 VMCSs - in both cases when we don't need them any
more we need to VMCLEAR them and then free the VMCS memory. Note that this
function does *not* free the loaded_vmcs structure itself.

What's wrong with this?
Would you prefer that I remove this function and explictly call
loaded_vmcs_clear() and then free_vmcs() in all three places?

Thanks,
Nadav.

-- 
Nadav Har'El|  Tuesday, May 24 2011, 20 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Linux: Because a PC is a terrible thing
http://nadav.harel.org.il   |to waste.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info

RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Tian, Kevin
 From: Nadav Har'El [mailto:n...@math.technion.ac.il]
 Sent: Tuesday, May 24, 2011 3:57 PM
 
 On Tue, May 24, 2011, Tian, Kevin wrote about RE: [PATCH 08/31] nVMX: Fix
 local_vcpus_link handling:
   +static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
   +{
   + vmcs_clear(loaded_vmcs-vmcs);
   + loaded_vmcs-cpu = -1;
   + loaded_vmcs-launched = 0;
   +}
   +
 
  call it vmcs_init instead since you now remove original vmcs_init 
  invocation,
  which more reflect the necessity of adding VMCLEAR for a new vmcs?
 
 The best name for this function, I think, would have been loaded_vmcs_clear,
 because this function isn't necessarily used to init - it's also called to
 VMCLEAR an old vmcs (and flush its content back to memory) - in that sense
 it is definitely not a vmcs_init.
 
 Unfortunately, I already have a whole chain of functions with this name :(
 the existing loaded_vmcs_clear() does an IPI to the CPU which has this VMCS
 loaded, and causes it to run __loaded_vmcs_clear(), which in turn calls
 the above loaded_vmcs_init(). I wish I could call all three functions
 loaded_vmcs_clear(), but of course I can't. If anyone reading this has a good
 suggestion on how to name these three functions, please let me know.

how about loaded_vmcs_reset?

 
   +static void __loaded_vmcs_clear(void *arg)
{
   - struct vcpu_vmx *vmx = arg;
   + struct loaded_vmcs *loaded_vmcs = arg;
 int cpu = raw_smp_processor_id();
  
   - if (vmx-vcpu.cpu == cpu)
   - vmcs_clear(vmx-vmcs);
   - if (per_cpu(current_vmcs, cpu) == vmx-vmcs)
   + if (loaded_vmcs-cpu != cpu)
   + return; /* cpu migration can race with cpu offline */
 
  what do you mean by cpu migration here? why does 'cpu offline'
  matter here regarding to the cpu change?
 
 __loaded_vmcs_clear() is typically called in one of two cases: cpu migration
 means that a guest that used to run on one CPU, and had its VMCS loaded
 there,
 suddenly needs to run on a different CPU, so we need to clear the VMCS on
 the old CPU. cpu offline means that we want to take a certain CPU offline,
 and before we do that we should VMCLEAR all VMCSs which were loaded on it.

So here you need explicitly differentiate a vcpu and a real cpu. For the 1st 
case
it's just 'vcpu migration, and the latter it's the real 'cpu offline'. 'cpu 
migration' 
is generally a RAS feature in mission critical world. :-) 

 
 The (vmx-cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally never
 happen: In the cpu offline path, we only call it for the loaded_vmcss which
 we know for sure are loaded on the current cpu. In the cpu migration path,
 loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which ensures
 that
 equality.
 
 But, there can be a race condition (this was actually explained to me a while
 back by Avi - I never seen this happening in practice): Imagine that cpu
 migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to
 VMCLEAR this vmcs. But before that old CPU gets a chance to act on that IPI,
 a decision is made to take it offline, and all loaded_vmcs loaded on it
 (including the one in question) are cleared. When that CPU acts on this IPI,
 it notices that vmx-cpu.cpu==-1, i.e., != cpu, so it doesn't need to do
 anything (in the new version of the code, I made this more explicit, by
 returning immediately in this case).

the reverse also holds true. Right between the point where cpu_offline hits
a loaded_vmcs and the point where it calls __loaded_vmcs_clear, it's possible
that the vcpu is migrated to another cpu, and it's likely that migration path
(vmx_vcpu_load) has invoked loaded_vmcs_clear but hasn't delete this vmcs
from old cpu's linked list. This way later when __loaded_vmcs_clear is
invoked on the offlined cpu, there's still chance to observe cpu as -1.

 
 At least this is the theory. As I said, I didn't see this problem in practice
 (unsurprising, since I never offlined any CPU). Maybe Avi or someone else can
 comment more about this (vmx-cpu.cpu == cpu) check, which existed before
 my patch - in __vcpu_clear().

I agree this check is necessary, but just want you to make the comment clear
with right term.

 
   +static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
   +{
   + if (!loaded_vmcs-vmcs)
   + return;
   + loaded_vmcs_clear(loaded_vmcs);
   + free_vmcs(loaded_vmcs-vmcs);
   + loaded_vmcs-vmcs = NULL;
   +}
 
  prefer to not do cleanup work through loaded_vmcs since it's just a pointer
  to a loaded vmcs structure. Though you can carefully arrange the nested
  vmcs cleanup happening before it, it's not very clean and a bit error prone
  simply from this function itself. It's clearer to directly cleanup vmcs01, 
  and
  if you want an assertion could be added to make sure loaded_vmcs doesn't
  point to any stale vmcs02 structure after nested cleanup step.
 
 I'm afraid I didn't understand what you meant here... Basically, this
 free_loaded_vmcs() is just a shortcut for loaded_vmcs_clear

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Roedel, Joerg
On Mon, May 23, 2011 at 12:51:55PM -0400, Avi Kivity wrote:
 On 05/23/2011 07:43 PM, Roedel, Joerg wrote:
  On Mon, May 23, 2011 at 11:49:17AM -0400, Avi Kivity wrote:
 
Joerg, is
  
 if (unlikely(cpu != vcpu-cpu)) {
 svm-asid_generation = 0;
 mark_all_dirty(svm-vmcb);
 }
  
susceptible to cpu offline/online?
 
  I don't think so. This should be safe for cpu offline/online as long as
  the cpu-number value is not reused for another physical cpu. But that
  should be the case afaik.
 
 
 Why not? offline/online does reuse cpu numbers AFAIK (and it must, if 
 you have a fully populated machine and offline/online just one cpu).

Yes, you are right. There is a slight possibility that the asid is not
updated when a vcpu has asid_generation == 1 and hasn't been running on
another cpu while this given cpu was offlined/onlined. Very unlikely,
but we can not rule it out.

Probably we should make the local_vcpu_list from vmx generic, use it
from svm  and fix it this way.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Nadav Har'El
On Tue, May 24, 2011, Roedel, Joerg wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 Probably we should make the local_vcpu_list from vmx generic, use it
 from svm  and fix it this way.

The point is, local_vcpu_list is now gone, replaced by a loaded_vmcss_on_cpu,
and vcpu-cpu is not set to -1 for any vcpu when a CPU is offlined - also in
VMX...

-- 
Nadav Har'El|  Tuesday, May 24 2011, 20 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |In case of emergency, this box may be
http://nadav.harel.org.il   |used as a quotation device.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Roedel, Joerg
On Tue, May 24, 2011 at 05:28:38AM -0400, Nadav Har'El wrote:
 On Tue, May 24, 2011, Roedel, Joerg wrote about Re: [PATCH 08/31] nVMX: Fix 
 local_vcpus_link handling:
  Probably we should make the local_vcpu_list from vmx generic, use it
  from svm  and fix it this way.
 
 The point is, local_vcpu_list is now gone, replaced by a loaded_vmcss_on_cpu,
 and vcpu-cpu is not set to -1 for any vcpu when a CPU is offlined - also in
 VMX...

loaded_vmcss_on_cpu sound similar, probably this can be generalized. Is
this code already upstream or is this changed with your nVMX patch-set?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Avi Kivity

On 05/24/2011 12:57 PM, Roedel, Joerg wrote:

On Tue, May 24, 2011 at 05:28:38AM -0400, Nadav Har'El wrote:
  On Tue, May 24, 2011, Roedel, Joerg wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
Probably we should make the local_vcpu_list from vmx generic, use it
from svm  and fix it this way.

  The point is, local_vcpu_list is now gone, replaced by a loaded_vmcss_on_cpu,
  and vcpu-cpu is not set to -1 for any vcpu when a CPU is offlined - also in
  VMX...

loaded_vmcss_on_cpu sound similar, probably this can be generalized.


It's not the same: there is a main:1 relationship between vmcss and 
vcpus (like vmcbs and vcpus).


However, it may be that the general case for svm also needs to treat 
individual vmcbs differently.




Is
this code already upstream or is this changed with your nVMX patch-set?



Not upstream yet (however generalization, if needed, will be done after 
it's upstream).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Nadav Har'El
On Tue, May 24, 2011, Roedel, Joerg wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 loaded_vmcss_on_cpu sound similar, probably this can be generalized.

I don't think so - now that a VCPU may have several VMCSs (L1, L2), each
of those may be loaded on a different cpu so we have a list of VMCSs
(the new loaded_vmcs structure), not vcpus. When we offline a CPU, we recall
all VMCSs loaded on it from this list, and clear them; We mark cpu=-1 for
each of those vmcs, but vcpu-cpu remains untouched (and not set to -1)
for all the vcpus.

 Is this code already upstream or is this changed with your nVMX patch-set?

Avi asked me to send the patch that does this *before* nvmx. But he did not
yet merge it.


-- 
Nadav Har'El|  Tuesday, May 24 2011, 20 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If you notice this notice, you'll notice
http://nadav.harel.org.il   |it's not worth noticing but is noticable.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Avi Kivity

On 05/24/2011 11:20 AM, Tian, Kevin wrote:


  The (vmx-cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally never
  happen: In the cpu offline path, we only call it for the loaded_vmcss which
  we know for sure are loaded on the current cpu. In the cpu migration path,
  loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which ensures
  that
  equality.

  But, there can be a race condition (this was actually explained to me a while
  back by Avi - I never seen this happening in practice): Imagine that cpu
  migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to
  VMCLEAR this vmcs. But before that old CPU gets a chance to act on that IPI,
  a decision is made to take it offline, and all loaded_vmcs loaded on it
  (including the one in question) are cleared. When that CPU acts on this IPI,
  it notices that vmx-cpu.cpu==-1, i.e., != cpu, so it doesn't need to do
  anything (in the new version of the code, I made this more explicit, by
  returning immediately in this case).

the reverse also holds true. Right between the point where cpu_offline hits
a loaded_vmcs and the point where it calls __loaded_vmcs_clear, it's possible
that the vcpu is migrated to another cpu, and it's likely that migration path
(vmx_vcpu_load) has invoked loaded_vmcs_clear but hasn't delete this vmcs
from old cpu's linked list. This way later when __loaded_vmcs_clear is
invoked on the offlined cpu, there's still chance to observe cpu as -1.


I don't think it's possible.  Both calls are done with interrupts disabled.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Tian, Kevin
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, May 24, 2011 7:06 PM
 
 On 05/24/2011 11:20 AM, Tian, Kevin wrote:
  
The (vmx-cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally
 never
happen: In the cpu offline path, we only call it for the loaded_vmcss 
   which
we know for sure are loaded on the current cpu. In the cpu migration
 path,
loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which
 ensures
that
equality.
  
But, there can be a race condition (this was actually explained to me a
 while
back by Avi - I never seen this happening in practice): Imagine that cpu
migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to
VMCLEAR this vmcs. But before that old CPU gets a chance to act on that
 IPI,
a decision is made to take it offline, and all loaded_vmcs loaded on it
(including the one in question) are cleared. When that CPU acts on this
 IPI,
it notices that vmx-cpu.cpu==-1, i.e., != cpu, so it doesn't need to do
anything (in the new version of the code, I made this more explicit, by
returning immediately in this case).
 
  the reverse also holds true. Right between the point where cpu_offline hits
  a loaded_vmcs and the point where it calls __loaded_vmcs_clear, it's 
  possible
  that the vcpu is migrated to another cpu, and it's likely that migration 
  path
  (vmx_vcpu_load) has invoked loaded_vmcs_clear but hasn't delete this vmcs
  from old cpu's linked list. This way later when __loaded_vmcs_clear is
  invoked on the offlined cpu, there's still chance to observe cpu as -1.
 
 I don't think it's possible.  Both calls are done with interrupts disabled.

If that's the case then there's another potential issue. Deadlock may happen
when calling smp_call_function_single with interrupt disabled. 

Thanks
Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Avi Kivity

On 05/24/2011 02:20 PM, Tian, Kevin wrote:

  I don't think it's possible.  Both calls are done with interrupts disabled.

If that's the case then there's another potential issue. Deadlock may happen
when calling smp_call_function_single with interrupt disabled.


We don't do that.  vcpu migration calls vcpu_clear() with interrupts 
enabled, which then calls smp_call_function_single(), which calls 
__vcpu_clear() with interrupts disabled.  vmclear_local_vcpus() is 
called from interrupts disabled (and calls __vcpu_clear() directly).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Tian, Kevin
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, May 24, 2011 7:27 PM
 
 On 05/24/2011 02:20 PM, Tian, Kevin wrote:
I don't think it's possible.  Both calls are done with interrupts 
   disabled.
 
  If that's the case then there's another potential issue. Deadlock may happen
  when calling smp_call_function_single with interrupt disabled.
 
 We don't do that.  vcpu migration calls vcpu_clear() with interrupts
 enabled, which then calls smp_call_function_single(), which calls
 __vcpu_clear() with interrupts disabled.  vmclear_local_vcpus() is
 called from interrupts disabled (and calls __vcpu_clear() directly).
 

OK, that's clear to me now. 

Thanks
Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Avi Kivity

On 05/24/2011 02:30 PM, Tian, Kevin wrote:


  We don't do that.  vcpu migration calls vcpu_clear() with interrupts
  enabled, which then calls smp_call_function_single(), which calls
  __vcpu_clear() with interrupts disabled.  vmclear_local_vcpus() is
  called from interrupts disabled (and calls __vcpu_clear() directly).


OK, that's clear to me now.


Are there still open issues about the patch?

(Nadav, please post patches in the future in new threads so they're 
easier to find)


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Tian, Kevin
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, May 24, 2011 7:37 PM
 
 On 05/24/2011 02:30 PM, Tian, Kevin wrote:
  
We don't do that.  vcpu migration calls vcpu_clear() with interrupts
enabled, which then calls smp_call_function_single(), which calls
__vcpu_clear() with interrupts disabled.  vmclear_local_vcpus() is
called from interrupts disabled (and calls __vcpu_clear() directly).
  
 
  OK, that's clear to me now.
 
 Are there still open issues about the patch?
 
 (Nadav, please post patches in the future in new threads so they're
 easier to find)
 

I'm fine with this patch except that Nadav needs to clarify the comment
in __loaded_vmcs_clear (regarding to 'cpu migration' and 'cpu offline' part
which I replied in another mail)

Thanks
Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-24 Thread Nadav Har'El
On Tue, May 24, 2011, Tian, Kevin wrote about RE: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 I'm fine with this patch except that Nadav needs to clarify the comment
 in __loaded_vmcs_clear (regarding to 'cpu migration' and 'cpu offline' part
 which I replied in another mail)

I added a single letter, v, to my comment ;-)

Please note that the same code existed previously, and didn't have any comment.
If you find this short comment more confusing (or wrong) than helpful, then I
can just remove it.

Avi, I'll send a new version of patch 1 in a few minutes, in a new thread
this time ;-) Please let me know when (or if) you are prepared to apply the
rest of the patches, so I can send a new version, rebased to the current
trunk and with all the fixes people asked for in the last few days.


-- 
Nadav Har'El|  Tuesday, May 24 2011, 20 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Ms Piggy's last words: I'm pink,
http://nadav.harel.org.il   |therefore I'm ham.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Avi Kivity

On 05/22/2011 11:57 AM, Nadav Har'El wrote:

Hi Avi and Marcelo, here is a the new first patch to the nvmx patch set,
which overhauls the handling of vmcss on cpus, as you asked.

As you guessed, the nested entry and exit code becomes much simpler and
cleaner, with the whole VMCS switching code on entry, for example, reduced
to:
cpu = get_cpu();
vmx-loaded_vmcs = vmcs02;
vmx_vcpu_put(vcpu);
vmx_vcpu_load(vcpu, cpu);
vcpu-cpu = cpu;
put_cpu();


That's wonderful, it indicates the code is much better integrated.  
Perhaps later we can refine it  to have separate _load and _put for 
host-related and guest-related parts (I think they already exist in the 
code, except they are always called together), but that is an 
optimization, and not the most important one by far.



You can apply this patch separately from the rest of the patch set, if you
wish. I'm sending just this one, like you asked - and can send the rest of
the patches when you ask me to.


Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx-loaded_vmcs is a pointer to the
currently active VMCS.

--- .before/arch/x86/kvm/x86.c  2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/x86.c   2011-05-22 11:41:57.0 +0300
@@ -2119,7 +2119,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
if (need_emulate_wbinvd(vcpu)) {
if (kvm_x86_ops-has_wbinvd_exit())
cpumask_set_cpu(cpu, vcpu-arch.wbinvd_dirty_mask);
-   else if (vcpu-cpu != -1  vcpu-cpu != cpu)
+   else if (vcpu-cpu != -1  vcpu-cpu != cpu
+ cpu_online(vcpu-cpu))
smp_call_function_single(vcpu-cpu,
wbinvd_ipi, NULL, 1);
}


Is this a necessary part of this patch?  Or an semi-related bugfix?

I think that it can't actually trigger before this patch due to luck.  
svm doesn't clear vcpu-cpu on cpu offline, but on the other hand it 
-has_wbinvd_exit().


Joerg, is

if (unlikely(cpu != vcpu-cpu)) {
svm-asid_generation = 0;
mark_all_dirty(svm-vmcb);
}

susceptible to cpu offline/online?


@@ -971,22 +992,22 @@ static void vmx_vcpu_load(struct kvm_vcp

if (!vmm_exclusive)
kvm_cpu_vmxon(phys_addr);
-   else if (vcpu-cpu != cpu)
-   vcpu_clear(vmx);
+   else if (vmx-loaded_vmcs-cpu != cpu)
+   loaded_vmcs_clear(vmx-loaded_vmcs);

-   if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
-   per_cpu(current_vmcs, cpu) = vmx-vmcs;
-   vmcs_load(vmx-vmcs);
+   if (per_cpu(current_vmcs, cpu) != vmx-loaded_vmcs-vmcs) {
+   per_cpu(current_vmcs, cpu) = vmx-loaded_vmcs-vmcs;
+   vmcs_load(vmx-loaded_vmcs-vmcs);
}

-   if (vcpu-cpu != cpu) {
+   if (vmx-loaded_vmcs-cpu != cpu) {
struct desc_ptr *gdt =__get_cpu_var(host_gdt);
unsigned long sysenter_esp;

kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
local_irq_disable();
-   list_add(vmx-local_vcpus_link,
-   per_cpu(vcpus_on_cpu, cpu));
+   list_add(vmx-loaded_vmcs-loaded_vmcss_on_cpu_link,
+   per_cpu(loaded_vmcss_on_cpu, cpu));
local_irq_enable();

/*
@@ -999,13 +1020,15 @@ static void vmx_vcpu_load(struct kvm_vcp
rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
}
+   vmx-loaded_vmcs-cpu = cpu;


This should be within the if () block.


@@ -4344,11 +4369,13 @@ static struct kvm_vcpu *vmx_create_vcpu(
goto uninit_vcpu;
}

-   vmx-vmcs = alloc_vmcs();
-   if (!vmx-vmcs)
+   vmx-loaded_vmcs =vmx-vmcs01;
+   vmx-loaded_vmcs-vmcs = alloc_vmcs();
+   if (!vmx-loaded_vmcs-vmcs)
goto free_msrs;
-
-   vmcs_init(vmx-vmcs);
+   vmcs_init(vmx-loaded_vmcs-vmcs);
+   vmx-loaded_vmcs-cpu = -1;
+   

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Gleb Natapov
On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
 (regarding interrupts, I think we can do that work post-merge.  But
 I'd like to see Kevin's comments addressed)
 
To be fair this wasn't addressed for almost two years now.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Roedel, Joerg
On Mon, May 23, 2011 at 11:49:17AM -0400, Avi Kivity wrote:

 Joerg, is
 
  if (unlikely(cpu != vcpu-cpu)) {
  svm-asid_generation = 0;
  mark_all_dirty(svm-vmcb);
  }
 
 susceptible to cpu offline/online?

I don't think so. This should be safe for cpu offline/online as long as
the cpu-number value is not reused for another physical cpu. But that
should be the case afaik.

Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Avi Kivity

On 05/23/2011 07:43 PM, Roedel, Joerg wrote:

On Mon, May 23, 2011 at 11:49:17AM -0400, Avi Kivity wrote:

  Joerg, is

   if (unlikely(cpu != vcpu-cpu)) {
   svm-asid_generation = 0;
   mark_all_dirty(svm-vmcb);
   }

  susceptible to cpu offline/online?

I don't think so. This should be safe for cpu offline/online as long as
the cpu-number value is not reused for another physical cpu. But that
should be the case afaik.



Why not? offline/online does reuse cpu numbers AFAIK (and it must, if 
you have a fully populated machine and offline/online just one cpu).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Nadav Har'El
Hi, and thanks again for the reviews,

On Mon, May 23, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  if (need_emulate_wbinvd(vcpu)) {
  if (kvm_x86_ops-has_wbinvd_exit())
  cpumask_set_cpu(cpu, vcpu-arch.wbinvd_dirty_mask);
 -else if (vcpu-cpu != -1  vcpu-cpu != cpu)
 +else if (vcpu-cpu != -1  vcpu-cpu != cpu
 +  cpu_online(vcpu-cpu))
  smp_call_function_single(vcpu-cpu,
  wbinvd_ipi, NULL, 1);
  }
 
 Is this a necessary part of this patch?  Or an semi-related bugfix?
 
 I think that it can't actually trigger before this patch due to luck.  
 svm doesn't clear vcpu-cpu on cpu offline, but on the other hand it 
 -has_wbinvd_exit().

Well, this was Marcelo's patch:  When I suggested that we might have problems
because vcpu-cpu now isn't cleared to -1 when a cpu is offlined, he looked
at the code and said that he thinks this is the only place that will have
problems, and offered this patch, which I simply included in mine. I'm afraid
to admit I don't understand that part of the code, so I can't judge if this
is important or not. I'll drop it from my patch for now (and you can apply
Marcelo's patch separately).

 +if (vmx-loaded_vmcs-cpu != cpu) {
  struct desc_ptr *gdt =__get_cpu_var(host_gdt);
  unsigned long sysenter_esp;
 
  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
  local_irq_disable();
 -list_add(vmx-local_vcpus_link,
 -per_cpu(vcpus_on_cpu, cpu));
 +list_add(vmx-loaded_vmcs-loaded_vmcss_on_cpu_link,
 +per_cpu(loaded_vmcss_on_cpu, cpu));
  local_irq_enable();
 
  /*
 @@ -999,13 +1020,15 @@ static void vmx_vcpu_load(struct kvm_vcp
  rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
  vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 
  */
  }
 +vmx-loaded_vmcs-cpu = cpu;
 This should be within the if () block.

Makes sense :-) Done.

 +vmcs_init(vmx-loaded_vmcs-vmcs);
 +vmx-loaded_vmcs-cpu = -1;
 +vmx-loaded_vmcs-launched = 0;
 
 Perhaps a loaded_vmcs_init() to encapsulate initialization of these 
 three fields, you'll probably reuse it later.

It's good you pointed this out, because it made me suddenly realise that I
forgot to VMCLEAR the new vmcs02's I allocate. In practice it never made a
difference, but better safe than sorry.

I had to restructure some of the code a bit to be able to properly use this
new function (in 3 places - __loaded_vmcs_clear, nested_get_current_vmcs02,
vmx_create_cpu).

 Please repost separately after the fix, I'd like to apply it before the 
 rest of the series.

I am adding a new version of this patch at the end of this mail.

 (regarding interrupts, I think we can do that work post-merge.  But I'd 
 like to see Kevin's comments addressed)

I replied to his comments. Done some of the things he asked, and asked for
more info on why/where he believes the current code is incorrect where I
didn't understand what problems he pointed to, and am now waiting for him
to reply.


--- 8 -- 8 -- 8 -- 8 --- 8 ---

Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx-loaded_vmcs is a pointer to the
currently active VMCS.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
 arch/x86/kvm/vmx.c |  150 ---
 1 file changed, 86 insertions(+), 64 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2011-05-23 21:46:14.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-05-23 21:46:14.0 +0300
@@ -116,6 +116,18 @@ struct vmcs {
char data[0];
 };
 
+/*
+ * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
+ * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
+ * loaded on this CPU (so we can clear them if the CPU goes down).
+ */
+struct loaded_vmcs {
+   struct vmcs *vmcs;
+   int cpu;
+   int launched

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Nadav Har'El
On Mon, May 23, 2011, Gleb Natapov wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
  (regarding interrupts, I think we can do that work post-merge.  But
  I'd like to see Kevin's comments addressed)
  
 To be fair this wasn't addressed for almost two years now.

Gleb, I assume by this you meant the idt-vectoring information issue, not
Kevin's comments (which I only saw a couple of days ago)?

-- 
Nadav Har'El|   Monday, May 23 2011, 20 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Someone offered you a cute little quote
http://nadav.harel.org.il   |for your signature? JUST SAY NO!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Gleb Natapov
On Mon, May 23, 2011 at 09:59:01PM +0300, Nadav Har'El wrote:
 On Mon, May 23, 2011, Gleb Natapov wrote about Re: [PATCH 08/31] nVMX: Fix 
 local_vcpus_link handling:
  On Mon, May 23, 2011 at 06:49:17PM +0300, Avi Kivity wrote:
   (regarding interrupts, I think we can do that work post-merge.  But
   I'd like to see Kevin's comments addressed)
   
  To be fair this wasn't addressed for almost two years now.
 
 Gleb, I assume by this you meant the idt-vectoring information issue, not
 Kevin's comments (which I only saw a couple of days ago)?
 
Yes, of course.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Tian, Kevin
 From: Avi Kivity
 Sent: Monday, May 23, 2011 11:49 PM
 (regarding interrupts, I think we can do that work post-merge.  But I'd
 like to see Kevin's comments addressed)

My earlier comment has been addressed by Nadav with his explanation.

Thanks
Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-23 Thread Tian, Kevin
 From: Nadav Har'El
 Sent: Tuesday, May 24, 2011 2:51 AM
 
  +  vmcs_init(vmx-loaded_vmcs-vmcs);
  +  vmx-loaded_vmcs-cpu = -1;
  +  vmx-loaded_vmcs-launched = 0;
 
  Perhaps a loaded_vmcs_init() to encapsulate initialization of these
  three fields, you'll probably reuse it later.
 
 It's good you pointed this out, because it made me suddenly realise that I
 forgot to VMCLEAR the new vmcs02's I allocate. In practice it never made a
 difference, but better safe than sorry.

yes, that's what spec requires. You need VMCLEAR on any new VMCS which
does implementation specific initialization in that VMCS region.

 
 I had to restructure some of the code a bit to be able to properly use this
 new function (in 3 places - __loaded_vmcs_clear, nested_get_current_vmcs02,
 vmx_create_cpu).
 
  Please repost separately after the fix, I'd like to apply it before the
  rest of the series.
 
 I am adding a new version of this patch at the end of this mail.
 
  (regarding interrupts, I think we can do that work post-merge.  But I'd
  like to see Kevin's comments addressed)
 
 I replied to his comments. Done some of the things he asked, and asked for
 more info on why/where he believes the current code is incorrect where I
 didn't understand what problems he pointed to, and am now waiting for him
 to reply.

As I replied in another thread, I believe this has been explained clearly by 
Nadav.

 
 
 --- 8 -- 8 -- 8 -- 8 --- 8 ---
 
 Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.
 
 In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
 because (at least in theory) the processor might not have written all of its
 content back to memory. Since a patch from June 26, 2008, this is done using
 a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.
 
 The problem is that with nested VMX, we no longer have the concept of a
 vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
 L2s), and each of those may be have been last loaded on a different cpu.
 
 So instead of linking the vcpus, we link the VMCSs, using a new structure
 loaded_vmcs. This structure contains the VMCS, and the information
 pertaining
 to its loading on a specific cpu (namely, the cpu number, and whether it
 was already launched on this cpu once). In nested we will also use the same
 structure to hold L2 VMCSs, and vmx-loaded_vmcs is a pointer to the
 currently active VMCS.
 
 Signed-off-by: Nadav Har'El n...@il.ibm.com
 ---
  arch/x86/kvm/vmx.c |  150 ---
  1 file changed, 86 insertions(+), 64 deletions(-)
 
 --- .before/arch/x86/kvm/vmx.c2011-05-23 21:46:14.0 +0300
 +++ .after/arch/x86/kvm/vmx.c 2011-05-23 21:46:14.0 +0300
 @@ -116,6 +116,18 @@ struct vmcs {
   char data[0];
  };
 
 +/*
 + * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
 + * remember whether it was VMLAUNCHed, and maintain a linked list of all
 VMCSs
 + * loaded on this CPU (so we can clear them if the CPU goes down).
 + */
 +struct loaded_vmcs {
 + struct vmcs *vmcs;
 + int cpu;
 + int launched;
 + struct list_head loaded_vmcss_on_cpu_link;
 +};
 +
  struct shared_msr_entry {
   unsigned index;
   u64 data;
 @@ -124,9 +136,7 @@ struct shared_msr_entry {
 
  struct vcpu_vmx {
   struct kvm_vcpu   vcpu;
 - struct list_head  local_vcpus_link;
   unsigned long host_rsp;
 - int   launched;
   u8fail;
   u8cpl;
   bool  nmi_known_unmasked;
 @@ -140,7 +150,14 @@ struct vcpu_vmx {
   u64   msr_host_kernel_gs_base;
   u64   msr_guest_kernel_gs_base;
  #endif
 - struct vmcs  *vmcs;
 + /*
 +  * loaded_vmcs points to the VMCS currently used in this vcpu. For a
 +  * non-nested (L1) guest, it always points to vmcs01. For a nested
 +  * guest (L2), it points to a different VMCS.
 +  */
 + struct loaded_vmcsvmcs01;
 + struct loaded_vmcs   *loaded_vmcs;
 + bool  __launched; /* temporary, used in
 vmx_vcpu_run */
   struct msr_autoload {
   unsigned nr;
   struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
 @@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
 
  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 -static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
 +/*
 + * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is
 needed
 + * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded
 on it.
 + */
 +static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
  static unsigned long *vmx_io_bitmap_a;
 @@ -501,6 +522,13 @@ static void vmcs_clear(struct vmcs *vmcs
   

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-22 Thread Nadav Har'El
On Wed, May 18, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 Humpf, right. OK, you can handle the x86.c usage with
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
...

Hi Avi and Marcelo, here is a the new first patch to the nvmx patch set,
which overhauls the handling of vmcss on cpus, as you asked.

As you guessed, the nested entry and exit code becomes much simpler and
cleaner, with the whole VMCS switching code on entry, for example, reduced
to:
cpu = get_cpu();
vmx-loaded_vmcs = vmcs02;
vmx_vcpu_put(vcpu);
vmx_vcpu_load(vcpu, cpu);
vcpu-cpu = cpu;
put_cpu();

You can apply this patch separately from the rest of the patch set, if you
wish. I'm sending just this one, like you asked - and can send the rest of
the patches when you ask me to.


Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx-loaded_vmcs is a pointer to the
currently active VMCS.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
 arch/x86/kvm/vmx.c |  129 ++-
 arch/x86/kvm/x86.c |3 -
 2 files changed, 80 insertions(+), 52 deletions(-)

--- .before/arch/x86/kvm/x86.c  2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/x86.c   2011-05-22 11:41:57.0 +0300
@@ -2119,7 +2119,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu 
if (need_emulate_wbinvd(vcpu)) {
if (kvm_x86_ops-has_wbinvd_exit())
cpumask_set_cpu(cpu, vcpu-arch.wbinvd_dirty_mask);
-   else if (vcpu-cpu != -1  vcpu-cpu != cpu)
+   else if (vcpu-cpu != -1  vcpu-cpu != cpu
+cpu_online(vcpu-cpu))
smp_call_function_single(vcpu-cpu,
wbinvd_ipi, NULL, 1);
}
--- .before/arch/x86/kvm/vmx.c  2011-05-22 11:41:57.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-05-22 11:41:58.0 +0300
@@ -116,6 +116,18 @@ struct vmcs {
char data[0];
 };
 
+/*
+ * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
+ * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
+ * loaded on this CPU (so we can clear them if the CPU goes down).
+ */
+struct loaded_vmcs {
+   struct vmcs *vmcs;
+   int cpu;
+   int launched;
+   struct list_head loaded_vmcss_on_cpu_link;
+};
+
 struct shared_msr_entry {
unsigned index;
u64 data;
@@ -124,9 +136,7 @@ struct shared_msr_entry {
 
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
-   struct list_head  local_vcpus_link;
unsigned long host_rsp;
-   int   launched;
u8fail;
u8cpl;
bool  nmi_known_unmasked;
@@ -140,7 +150,14 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
 #endif
-   struct vmcs  *vmcs;
+   /*
+* loaded_vmcs points to the VMCS currently used in this vcpu. For a
+* non-nested (L1) guest, it always points to vmcs01. For a nested
+* guest (L2), it points to a different VMCS.
+*/
+   struct loaded_vmcsvmcs01;
+   struct loaded_vmcs   *loaded_vmcs;
+   bool  __launched; /* temporary, used in vmx_vcpu_run */
struct msr_autoload {
unsigned nr;
struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
@@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
-static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
+/*
+ * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
+ * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
+ */
+static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
 static unsigned long *vmx_io_bitmap_a;
@@ -514,25 +535,25 @@ static void

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Avi Kivity

On 05/17/2011 10:52 PM, Marcelo Tosatti wrote:

On Tue, May 17, 2011 at 10:30:30PM +0300, Nadav Har'El wrote:
  On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  this is what I planned to do, until it dawned on me that I can't, because 
cpu
  isn't part of vmx (where the vmcs and launched sit in the standard 
KVM), but
  ...
vcpu-cpu remains there. There is a new -cpu field on struct vmcs, just
as saved_vmcs has in the current patches, to note the cpu which the VMCS
was last loaded.

  So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are supposed
  to always contain the same value. Are you fine with that?

Yes. Avi?


Yes.

They have different meanings.  vcpu-cpu means where the task that runs 
the vcpu is running (or last ran).  vmcs-cpu means which cpu has the 
vmcs cached.  They need not be the same when we have multiple vmcs's for 
a vcpu; but vmx-vmcs-cpu will chase vcpu-cpu as it changes.


Please post this patch separately instead of reposting the entire 
series, we can apply it independently.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Avi Kivity

On 05/18/2011 08:52 AM, Nadav Har'El wrote:

On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  On Tue, May 17, 2011 at 10:30:30PM +0300, Nadav Har'El wrote:
So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are 
supposed
to always contain the same value. Are you fine with that?

  Yes. Avi?

Oops, it's even worse than I said, because if the new vmclear_local_vmcss
clears the vmcs currently used on some vcpu, it will update vmcs.cpu on that
vcpu to -1, but will *not* update vmx.vcpu.cpu, which remain its old value,
and potentially cause problems when it is used (e.g., in x86.c) instead
of vmx.vmcs.cpu.



I did a quick audit and it seems fine.  If it isn't, we'll fix it when 
we see the problem.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Nadav Har'El
On Wed, May 18, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 I did a quick audit and it seems fine.  If it isn't, we'll fix it when 
 we see the problem.

Ok, then, I'm working on the code with the new approach.

My fear was that some CPU 7 is taken down, but vcpu.cpu remains 7 (not set to
-1). If cpu 7 nevers comes up again, it's not a problem because when we run
the same vcpu again on a different cpu, it's not 7 so we do what needs to be
done on CPU switch. But, what if CPU 7 does come up again later, and we find
ourselves running again on CPU 7, but it's not the same CPU 7 and we don't
know it? Is this case at all possible?

-- 
Nadav Har'El|Wednesday, May 18 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Enjoy the new millennium; it might be
http://nadav.harel.org.il   |your last.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Avi Kivity

On 05/18/2011 12:02 PM, Nadav Har'El wrote:

On Wed, May 18, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  I did a quick audit and it seems fine.  If it isn't, we'll fix it when
  we see the problem.

Ok, then, I'm working on the code with the new approach.

My fear was that some CPU 7 is taken down, but vcpu.cpu remains 7 (not set to
-1). If cpu 7 nevers comes up again, it's not a problem because when we run
the same vcpu again on a different cpu, it's not 7 so we do what needs to be
done on CPU switch. But, what if CPU 7 does come up again later, and we find
ourselves running again on CPU 7, but it's not the same CPU 7 and we don't
know it? Is this case at all possible?


It's certainly possible, but it's independent of this patch.

It's even handled, see kvm_arch_hardware_enable().

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Marcelo Tosatti
On Wed, May 18, 2011 at 08:52:36AM +0300, Nadav Har'El wrote:
 On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: 
 Fix local_vcpus_link handling:
  On Tue, May 17, 2011 at 10:30:30PM +0300, Nadav Har'El wrote:
   So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are 
   supposed
   to always contain the same value. Are you fine with that?
  
  Yes. Avi?
 
 Oops, it's even worse than I said, because if the new vmclear_local_vmcss
 clears the vmcs currently used on some vcpu, it will update vmcs.cpu on that
 vcpu to -1, but will *not* update vmx.vcpu.cpu, which remain its old value,
 and potentially cause problems when it is used (e.g., in x86.c) instead
 of vmx.vmcs.cpu.

Humpf, right. OK, you can handle the x86.c usage with

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64edf57..b5fd9b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2118,7 +2118,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (need_emulate_wbinvd(vcpu)) {
if (kvm_x86_ops-has_wbinvd_exit())
cpumask_set_cpu(cpu, vcpu-arch.wbinvd_dirty_mask);
-   else if (vcpu-cpu != -1  vcpu-cpu != cpu)
+   else if (vcpu-cpu != -1  vcpu-cpu != cpu  
cpu_online(vcpu-cpu))
smp_call_function_single(vcpu-cpu,
wbinvd_ipi, NULL, 1);
}

Note this is not just about the code being nicer, but simplicity is
crucial, the code is tricky enough with one linked list.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-18 Thread Nadav Har'El
On Wed, May 18, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 Note this is not just about the code being nicer, but simplicity is
 crucial, the code is tricky enough with one linked list.

Unfortunately, it's not obvious that the method you suggested (and which, like
I said, was the first method I considered as well, and rejected) will be
simpler or less tricky, with its two cpu variables, vmcs pointing to l1_vmcs
even in the non-nested case, and a bunch of other issues. The main benefit of
the code as I already posted it was that it didn't add *any* complexity or
changed anything to the non-nested case. The code I'm writing now based on
your suggestions is more risky in the sense that it *may* break some things
completely unrelated to nested.

In any case, like I said, I'm working on a version using your and Avi's
suggestions, and will send it for your review shortly.

Thanks for all the ideas,
Nadav.

-- 
Nadav Har'El|Wednesday, May 18 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Sign seen in restaurant: We Reserve The
http://nadav.harel.org.il   |Right To Serve Refuse To Anyone!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Marcelo Tosatti
On Mon, May 16, 2011 at 10:48:01PM +0300, Nadav Har'El wrote:
 In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
 because (at least in theory) the processor might not have written all of its
 content back to memory. Since a patch from June 26, 2008, this is done using
 a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.
 
 The problem is that with nested VMX, we no longer have the concept of a
 vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
 L2s), and each of those may be have been last loaded on a different cpu.
 
 Our solution is to hold, in addition to vcpus_on_cpu, a second linked list
 saved_vmcss_on_cpu, which holds the current list of saved VMCSs, i.e.,
 VMCSs which are loaded on this CPU but are not the vmx-vmcs of any of
 the vcpus. These saved VMCSs include L1's VMCS while L2 is running
 (saved_vmcs01), and L2 VMCSs not currently used - because L1 is running or
 because the vmcs02_pool contains more than one entry.
 
 When we will switch between L1's and L2's VMCSs, they need to be moved
 between vcpus_on_cpu and saved_vmcs_on_cpu lists and vice versa. A new
 function, nested_maintain_per_cpu_lists(), takes care of that.
 
 Signed-off-by: Nadav Har'El n...@il.ibm.com
 ---
  arch/x86/kvm/vmx.c |   67 +++
  1 file changed, 67 insertions(+)
 
 --- .before/arch/x86/kvm/vmx.c2011-05-16 22:36:47.0 +0300
 +++ .after/arch/x86/kvm/vmx.c 2011-05-16 22:36:47.0 +0300
 @@ -181,6 +181,7 @@ struct saved_vmcs {
   struct vmcs *vmcs;
   int cpu;
   int launched;
 + struct list_head local_saved_vmcss_link; /* see saved_vmcss_on_cpu */
  };
  
  /* Used to remember the last vmcs02 used for some recently used vmcs12s */
 @@ -315,7 +316,20 @@ static int vmx_set_tss_addr(struct kvm *
  
  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 +/*
 + * We maintain a per-CPU linked-list vcpus_on_cpu, holding for each CPU a 
 list
 + * of vcpus whose VMCS are loaded on that CPU. This is needed when a CPU is
 + * brought down, and we need to VMCLEAR all VMCSs loaded on it.
 + *
 + * With nested VMX, we have additional VMCSs which are not the current
 + * vmx-vmcs of any vcpu, but may also be loaded on some CPU: While L2 is
 + * running, L1's VMCS is loaded but not the VMCS of any vcpu; While L1 is
 + * running, a previously used L2 VMCS might still be around and loaded on 
 some
 + * CPU, somes even more than one such L2 VMCSs is kept (see 
 VMCS02_POOL_SIZE).
 + * The list of these additional VMCSs is kept on cpu saved_vmcss_on_cpu.
 + */
  static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
 +static DEFINE_PER_CPU(struct list_head, saved_vmcss_on_cpu);
  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
  
  static unsigned long *vmx_io_bitmap_a;
 @@ -1818,6 +1832,7 @@ static int hardware_enable(void *garbage
   return -EBUSY;
  
   INIT_LIST_HEAD(per_cpu(vcpus_on_cpu, cpu));
 + INIT_LIST_HEAD(per_cpu(saved_vmcss_on_cpu, cpu));
   rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
  
   test_bits = FEATURE_CONTROL_LOCKED;
 @@ -1860,10 +1875,13 @@ static void kvm_cpu_vmxoff(void)
   asm volatile (__ex(ASM_VMX_VMXOFF) : : : cc);
  }
  
 +static void vmclear_local_saved_vmcss(void);
 +
  static void hardware_disable(void *garbage)
  {
   if (vmm_exclusive) {
   vmclear_local_vcpus();
 + vmclear_local_saved_vmcss();
   kvm_cpu_vmxoff();
   }
   write_cr4(read_cr4()  ~X86_CR4_VMXE);
 @@ -4248,6 +4266,8 @@ static void __nested_free_saved_vmcs(voi
   vmcs_clear(saved_vmcs-vmcs);
   if (per_cpu(current_vmcs, saved_vmcs-cpu) == saved_vmcs-vmcs)
   per_cpu(current_vmcs, saved_vmcs-cpu) = NULL;
 + list_del(saved_vmcs-local_saved_vmcss_link);
 + saved_vmcs-cpu = -1;
  }
  
  /*
 @@ -4265,6 +4285,21 @@ static void nested_free_saved_vmcs(struc
   free_vmcs(saved_vmcs-vmcs);
  }
  
 +/*
 + * VMCLEAR all the currently unused (not vmx-vmcs on any vcpu) saved_vmcss
 + * which were loaded on the current CPU. See also vmclear_load_vcpus(), which
 + * does the same for VMCS currently used in vcpus.
 + */
 +static void vmclear_local_saved_vmcss(void)
 +{
 + int cpu = raw_smp_processor_id();
 + struct saved_vmcs *v, *n;
 +
 + list_for_each_entry_safe(v, n, per_cpu(saved_vmcss_on_cpu, cpu),
 +  local_saved_vmcss_link)
 + __nested_free_saved_vmcs(v);
 +}
 +
  /* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
  static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
  {
 @@ -5143,6 +5178,38 @@ static void vmx_set_supported_cpuid(u32 
  {
  }
  
 +/*
 + * Maintain the vcpus_on_cpu and saved_vmcss_on_cpu lists of vcpus and
 + * inactive saved_vmcss on nested entry (L1-L2) or nested exit (L2-L1).
 + *
 + * nested_maintain_per_cpu_lists should be called after the VMCS 

Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Avi Kivity

On 05/17/2011 04:19 PM, Marcelo Tosatti wrote:

  +/*
  + * Maintain the vcpus_on_cpu and saved_vmcss_on_cpu lists of vcpus and
  + * inactive saved_vmcss on nested entry (L1-L2) or nested exit (L2-L1).
  + *
  + * nested_maintain_per_cpu_lists should be called after the VMCS was 
switched
  + * to the new one, with parameters giving both the new on (after the entry
  + * or exit) and the old one, in that order.
  + */
  +static void nested_maintain_per_cpu_lists(struct vcpu_vmx *vmx,
  + struct saved_vmcs *new_vmcs,
  + struct saved_vmcs *old_vmcs)
  +{
  + /*
  +  * When a vcpus's old vmcs is saved, we need to drop it from
  +  * vcpus_on_cpu and put it on saved_vmcss_on_cpu.
  +  */
  + if (old_vmcs-cpu != -1) {
  + list_del(vmx-local_vcpus_link);
  + list_add(old_vmcs-local_saved_vmcss_link,
  + per_cpu(saved_vmcss_on_cpu, old_vmcs-cpu));
  + }

This new handling of vmcs could be simplified (local_vcpus_link must be
manipulated with interrupts disabled, BTW).

What about having a per-CPU VMCS list instead of per-CPU vcpu list?
local_vmcs_link list node could be in struct saved_vmcs (and
a current_saved_vmcs pointer in struct vcpu_vmx).

vmx_vcpu_load would then add to this list at

 if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
 per_cpu(current_vmcs, cpu) = vmx-vmcs;
 vmcs_load(vmx-vmcs);
 }


Right, that's the easiest thing to do.

Perhaps even easier (avoids duplication):

struct raw_vmcs {
u32 revision_id;
u32 abort;
char data[0];
};

struct vmcs {
struct raw_vmcs *raw_vmcs;
struct list_head local_vmcs_link;
};

struct vcpu_vmx {
...
struct vmcs *vmcs;  /* often points at l1_vmcs */
struct vmcs l1_vmcs;
...
};

static DEFINE_PER_CPU(struct list_head, vmcss_on_cpu);

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Nadav Har'El
On Tue, May 17, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 (local_vcpus_link must be manipulated with interrupts disabled, BTW).

Thanks, I'll look into that.

 What about having a per-CPU VMCS list instead of per-CPU vcpu list?
 Perhaps even easier (avoids duplication):
 
 struct raw_vmcs {
 u32 revision_id;
 u32 abort;
 char data[0];
 };
 
 struct vmcs {
 struct raw_vmcs *raw_vmcs;
 struct list_head local_vmcs_link;
 };
 
 struct vcpu_vmx {
 ...
 struct vmcs *vmcs;  /* often points at l1_vmcs */
 struct vmcs l1_vmcs;
 ...
 };
 
 static DEFINE_PER_CPU(struct list_head, vmcss_on_cpu);

This is an interesting suggestion. My initial plan was to do something similar
to this, and I agree it could have been nicer code, but I had to change it
after bumping into too many obstacles.

For example, currently, vmclear_local_vcpus() not only VMCLEARs the vmcss,
it also sets vmx-vcpu.cpu = -1, xmv-launched=0 for the vcpus holding these
VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being not
currently loaded (cpu=-1)?


-- 
Nadav Har'El|  Tuesday, May 17 2011, 13 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |I'm a peripheral visionary: I see into
http://nadav.harel.org.il   |the future, but mostly off to the sides.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Marcelo Tosatti
On Tue, May 17, 2011 at 05:35:32PM +0300, Nadav Har'El wrote:
 On Tue, May 17, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
 local_vcpus_link handling:
  static DEFINE_PER_CPU(struct list_head, vmcss_on_cpu);
 
 This is an interesting suggestion. My initial plan was to do something similar
 to this, and I agree it could have been nicer code, but I had to change it
 after bumping into too many obstacles.
 
 For example, currently, vmclear_local_vcpus() not only VMCLEARs the vmcss,
 it also sets vmx-vcpu.cpu = -1, xmv-launched=0 for the vcpus holding these
 VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being not
 currently loaded (cpu=-1)?

Do it in vcpu_clear, its just an optimization not necessary in
vmclear_local_vcpus path.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Avi Kivity

On 05/17/2011 05:35 PM, Nadav Har'El wrote:



What about having a per-CPU VMCS list instead of per-CPU vcpu list?

Perhaps even easier (avoids duplication):

struct raw_vmcs {
 u32 revision_id;
 u32 abort;
 char data[0];
};

struct vmcs {
 struct raw_vmcs *raw_vmcs;
 struct list_head local_vmcs_link;
};

struct vcpu_vmx {
 ...
 struct vmcs *vmcs;  /* often points at l1_vmcs */
 struct vmcs l1_vmcs;
 ...
};

static DEFINE_PER_CPU(struct list_head, vmcss_on_cpu);

This is an interesting suggestion. My initial plan was to do something similar
to this, and I agree it could have been nicer code, but I had to change it
after bumping into too many obstacles.

For example, currently, vmclear_local_vcpus() not only VMCLEARs the vmcss,
it also sets vmx-vcpu.cpu = -1, xmv-launched=0 for the vcpus holding these
VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being not
currently loaded (cpu=-1)?



-launched and -cpu simply move into struct vmcs.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Nadav Har'El
On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  For example, currently, vmclear_local_vcpus() not only VMCLEARs the vmcss,
  it also sets vmx-vcpu.cpu = -1, xmv-launched=0 for the vcpus holding these
  VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being 
  not
  currently loaded (cpu=-1)?
 
 Do it in vcpu_clear, its just an optimization not necessary in
 vmclear_local_vcpus path.

Well, what if (say) we're running L2, and L1's vmcs is saved in saved_vmcs01
and is not the current vmcs of the vcpu, and then we shut down the CPU on
which this saved_vmcs01 was loaded. We need not only to VMCLEAR this vmcs,
we need to also remember that this vmcs is not loaded, so when we nested_vmexit
back to L1, we know we need to load the vmcs again.

There's solution to this (which Avi also mentioned in his email) - it is to
use everywhere my saved_vmcs type (which I'd rename loaded vmcs), which
includes the vmcs *and* the cpu (and possibly launched).
If the cpu field was part of vmx, this was easy - but cpu is a field of
vcpu, not vmx, so I have problems encapsulating both vmcs and cpu in
one structure everywhere.

These are the kind of problems I wrapped my head with, until I gave up and
came up with the current solution...

-- 
Nadav Har'El|  Tuesday, May 17 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Bigamy: Having one wife too many.
http://nadav.harel.org.il   |Monogamy: The same thing!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Nadav Har'El
On Tue, May 17, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being 
 not
 currently loaded (cpu=-1)?
 
 
 -launched and -cpu simply move into struct vmcs.

As I explained in the sister thread (this discussion is becoming a tree ;-))
this is what I planned to do, until it dawned on me that I can't, because cpu
isn't part of vmx (where the vmcs and launched sit in the standard KVM), but
rather part of vcpu... When I gave up trying to solve these interdependencies
and avoiding modifying half of KVM, I came up with the current solution.

Maybe I'm missing something - I'd be happy if we do find a solution that
simplifies this code.


-- 
Nadav Har'El|  Tuesday, May 17 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Why do we drive on a parkway and park on
http://nadav.harel.org.il   |a driveway?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Marcelo Tosatti
On Tue, May 17, 2011 at 09:11:32PM +0300, Nadav Har'El wrote:
 On Tue, May 17, 2011, Avi Kivity wrote about Re: [PATCH 08/31] nVMX: Fix 
 local_vcpus_link handling:
  VMCSs.  If we had only a list of VMCSs, how can we mark the vcpus as being 
  not
  currently loaded (cpu=-1)?
  
  
  -launched and -cpu simply move into struct vmcs.
 
 As I explained in the sister thread (this discussion is becoming a tree ;-))
 this is what I planned to do, until it dawned on me that I can't, because 
 cpu
 isn't part of vmx (where the vmcs and launched sit in the standard KVM), but
 rather part of vcpu... When I gave up trying to solve these 
 interdependencies
 and avoiding modifying half of KVM, I came up with the current solution.
 
 Maybe I'm missing something - I'd be happy if we do find a solution that
 simplifies this code.

vcpu-cpu remains there. There is a new -cpu field on struct vmcs, just
as saved_vmcs has in the current patches, to note the cpu which the VMCS 
was last loaded.

As mentioned there is no need to set vcpu-cpu = -1 in __vcpu_clear,
the IPI handler, that can be done in vcpu_clear.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Nadav Har'El
On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
  this is what I planned to do, until it dawned on me that I can't, because 
  cpu
  isn't part of vmx (where the vmcs and launched sit in the standard KVM), but
...
 vcpu-cpu remains there. There is a new -cpu field on struct vmcs, just
 as saved_vmcs has in the current patches, to note the cpu which the VMCS 
 was last loaded.

So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are supposed
to always contain the same value. Are you fine with that?

 As mentioned there is no need to set vcpu-cpu = -1 in __vcpu_clear,
 the IPI handler, that can be done in vcpu_clear.

Right, this is true.

-- 
Nadav Har'El|  Tuesday, May 17 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |A mathematician is a device for turning
http://nadav.harel.org.il   |coffee into theorems -- P. Erdos
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Marcelo Tosatti
On Tue, May 17, 2011 at 10:30:30PM +0300, Nadav Har'El wrote:
 On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: 
 Fix local_vcpus_link handling:
   this is what I planned to do, until it dawned on me that I can't, because 
   cpu
   isn't part of vmx (where the vmcs and launched sit in the standard KVM), 
   but
 ...
  vcpu-cpu remains there. There is a new -cpu field on struct vmcs, just
  as saved_vmcs has in the current patches, to note the cpu which the VMCS 
  was last loaded.
 
 So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are supposed
 to always contain the same value. Are you fine with that?

Yes. Avi?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-17 Thread Nadav Har'El
On Tue, May 17, 2011, Marcelo Tosatti wrote about Re: [PATCH 08/31] nVMX: Fix 
local_vcpus_link handling:
 On Tue, May 17, 2011 at 10:30:30PM +0300, Nadav Har'El wrote:
  So we'll have two fields, vmx.vcpu.cpu and vmx.vmcs.cpu, which are supposed
  to always contain the same value. Are you fine with that?
 
 Yes. Avi?

Oops, it's even worse than I said, because if the new vmclear_local_vmcss
clears the vmcs currently used on some vcpu, it will update vmcs.cpu on that
vcpu to -1, but will *not* update vmx.vcpu.cpu, which remain its old value,
and potentially cause problems when it is used (e.g., in x86.c) instead
of vmx.vmcs.cpu.

-- 
Nadav Har'El|Wednesday, May 18 2011, 14 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |An egotist is a person of low taste, more
http://nadav.harel.org.il   |interested in himself than in me.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/31] nVMX: Fix local_vcpus_link handling

2011-05-16 Thread Nadav Har'El
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu vcpus_on_cpu linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.

Our solution is to hold, in addition to vcpus_on_cpu, a second linked list
saved_vmcss_on_cpu, which holds the current list of saved VMCSs, i.e.,
VMCSs which are loaded on this CPU but are not the vmx-vmcs of any of
the vcpus. These saved VMCSs include L1's VMCS while L2 is running
(saved_vmcs01), and L2 VMCSs not currently used - because L1 is running or
because the vmcs02_pool contains more than one entry.

When we will switch between L1's and L2's VMCSs, they need to be moved
between vcpus_on_cpu and saved_vmcs_on_cpu lists and vice versa. A new
function, nested_maintain_per_cpu_lists(), takes care of that.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
 arch/x86/kvm/vmx.c |   67 +++
 1 file changed, 67 insertions(+)

--- .before/arch/x86/kvm/vmx.c  2011-05-16 22:36:47.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-05-16 22:36:47.0 +0300
@@ -181,6 +181,7 @@ struct saved_vmcs {
struct vmcs *vmcs;
int cpu;
int launched;
+   struct list_head local_saved_vmcss_link; /* see saved_vmcss_on_cpu */
 };
 
 /* Used to remember the last vmcs02 used for some recently used vmcs12s */
@@ -315,7 +316,20 @@ static int vmx_set_tss_addr(struct kvm *
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+/*
+ * We maintain a per-CPU linked-list vcpus_on_cpu, holding for each CPU a list
+ * of vcpus whose VMCS are loaded on that CPU. This is needed when a CPU is
+ * brought down, and we need to VMCLEAR all VMCSs loaded on it.
+ *
+ * With nested VMX, we have additional VMCSs which are not the current
+ * vmx-vmcs of any vcpu, but may also be loaded on some CPU: While L2 is
+ * running, L1's VMCS is loaded but not the VMCS of any vcpu; While L1 is
+ * running, a previously used L2 VMCS might still be around and loaded on some
+ * CPU, somes even more than one such L2 VMCSs is kept (see VMCS02_POOL_SIZE).
+ * The list of these additional VMCSs is kept on cpu saved_vmcss_on_cpu.
+ */
 static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
+static DEFINE_PER_CPU(struct list_head, saved_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
 static unsigned long *vmx_io_bitmap_a;
@@ -1818,6 +1832,7 @@ static int hardware_enable(void *garbage
return -EBUSY;
 
INIT_LIST_HEAD(per_cpu(vcpus_on_cpu, cpu));
+   INIT_LIST_HEAD(per_cpu(saved_vmcss_on_cpu, cpu));
rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
 
test_bits = FEATURE_CONTROL_LOCKED;
@@ -1860,10 +1875,13 @@ static void kvm_cpu_vmxoff(void)
asm volatile (__ex(ASM_VMX_VMXOFF) : : : cc);
 }
 
+static void vmclear_local_saved_vmcss(void);
+
 static void hardware_disable(void *garbage)
 {
if (vmm_exclusive) {
vmclear_local_vcpus();
+   vmclear_local_saved_vmcss();
kvm_cpu_vmxoff();
}
write_cr4(read_cr4()  ~X86_CR4_VMXE);
@@ -4248,6 +4266,8 @@ static void __nested_free_saved_vmcs(voi
vmcs_clear(saved_vmcs-vmcs);
if (per_cpu(current_vmcs, saved_vmcs-cpu) == saved_vmcs-vmcs)
per_cpu(current_vmcs, saved_vmcs-cpu) = NULL;
+   list_del(saved_vmcs-local_saved_vmcss_link);
+   saved_vmcs-cpu = -1;
 }
 
 /*
@@ -4265,6 +4285,21 @@ static void nested_free_saved_vmcs(struc
free_vmcs(saved_vmcs-vmcs);
 }
 
+/*
+ * VMCLEAR all the currently unused (not vmx-vmcs on any vcpu) saved_vmcss
+ * which were loaded on the current CPU. See also vmclear_load_vcpus(), which
+ * does the same for VMCS currently used in vcpus.
+ */
+static void vmclear_local_saved_vmcss(void)
+{
+   int cpu = raw_smp_processor_id();
+   struct saved_vmcs *v, *n;
+
+   list_for_each_entry_safe(v, n, per_cpu(saved_vmcss_on_cpu, cpu),
+local_saved_vmcss_link)
+   __nested_free_saved_vmcs(v);
+}
+
 /* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
 static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
 {
@@ -5143,6 +5178,38 @@ static void vmx_set_supported_cpuid(u32 
 {
 }
 
+/*
+ * Maintain the vcpus_on_cpu and saved_vmcss_on_cpu lists of vcpus and
+ * inactive saved_vmcss on nested entry (L1-L2) or nested exit (L2-L1).
+ *
+ * nested_maintain_per_cpu_lists should be called after the VMCS was switched
+ * to the new one, with parameters giving both the new on (after the entry
+ * or exit) and the old one, in that order.
+