Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Gleb Natapov
On Sat, Oct 09, 2010 at 08:30:18PM +0200, Avi Kivity wrote:
  On 10/07/2010 07:47 PM, Gleb Natapov wrote:
 On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote:
On 10/04/2010 05:56 PM, Gleb Natapov wrote:
   If a guest accesses swapped out memory do not swap it in from vcpu thread
   context. Schedule work to do swapping and put vcpu into halted state
   instead.
   
   Interrupts will still be delivered to the guest and if an interrupt causes
   a reschedule the guest will continue to run another task.
   
   
   +
   +static bool can_do_async_pf(struct kvm_vcpu *vcpu)
   +{
   +if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
   + kvm_event_needs_reinjection(vcpu)))
   +return false;
   +
   +return kvm_x86_ops->interrupt_allowed(vcpu);
   +}
 
   Strictly speaking, if the cpu can handle NMIs it can take an apf?
 
 We can always do apf, but if vcpu can't do anything why bother. For NMI
 watchdog yes, maybe it is worth allowing apf if NMI is allowed.
 
 Actually it's very dangerous - the IRET from APF will re-enable
 NMIs.  So without the guest enabling apf-in-nmi we shouldn't allow
 it.
 
Good point.

 Not worth the complexity IMO.
 
   @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
   
   +kvm_check_async_pf_completion(vcpu);
   +if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
   +/* Page is swapped out. Do synthetic halt */
   +r = 1;
   +goto out;
   +}
   +
 
   Why do it here in the fast path?  Can't you halt the cpu when
   starting the page fault?
 Page fault may complete before guest re-entry. We do not want to halt vcpu
 in this case.
 
 So unhalt on completion.
 
I want to avoid touching vcpu state from work if possible. Work code does
not contain arch dependent code right now and mp_state is an x86 thing.

 
   I guess the apf threads can't touch mp_state, but they can have a
   KVM_REQ to trigger the check.
 This will require KVM_REQ check on fast path, so what's the difference
 performance wise.
 
 We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't
 cost anything extra.
if (vcpu->requests) does not clear the req bit, so what will have to be added
is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)) which is even more
expensive than my check (but not so expensive to worry about).
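
For reference, the generic helper in question looked roughly like this at the
time (a sketch, not part of this patch):

static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
{
	if (test_bit(req, &vcpu->requests)) {
		clear_bit(req, &vcpu->requests);  /* test-and-clear, unlike a bare read */
		return true;
	}
	return false;
}

So the extra cost over a plain if (vcpu->requests) test is the clear_bit() on
the (rare) entries where the bit is actually set.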

 
   
   @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
  {
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
   +|| !list_empty_careful(&vcpu->async_pf.done)
|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
|| vcpu->arch.nmi_pending ||
(kvm_arch_interrupt_allowed(vcpu)
 
   Unrelated, shouldn't kvm_arch_vcpu_runnable() look at
   vcpu->requests?  Specifically KVM_REQ_EVENT?
 I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue
 here.
 
 No, the nmi and interrupt queues are only filled when the lapic is
 polled via KVM_REQ_EVENT.  I'll prepare a patch.
I don't think you are correct. nmi_pending is filled before setting
KVM_REQ_EVENT and kvm_cpu_has_interrupt() checks directly in apic/pic.

 
   +
   +TRACE_EVENT(
   +kvm_async_pf_not_present,
   +TP_PROTO(u64 gva),
   +TP_ARGS(gva),
 
   Do you actually have a gva with tdp?  With nested virtualization,
   how do you interpret this gva?
  With tdp it is a gpa, just like tdp_page_fault gets a gpa whereas the shadow
  page version gets a gva. Nested virtualization is too complex to interpret.
 
 It's not good to have a tracepoint that depends on cpu mode (without
 recording that mode). I think we have the same issue in
 trace_kvm_page_fault though.
We have mmu_is_nested(). I'll just disable apf while vcpu is in nested
mode for now.

 
   +
   +TRACE_EVENT(
   +kvm_async_pf_completed,
   +TP_PROTO(unsigned long address, struct page *page, u64 gva),
   +TP_ARGS(address, page, gva),
 
   What does address mean?  There's also gva?
 
 hva.
 
 Is hva helpful here?  Generally gpa is better, but may not be
 available since it's ambiguous.
 
 
 
   +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
   +{
   +/* cancel outstanding work queue item */
   +while (!list_empty(&vcpu->async_pf.queue)) {
   +struct kvm_async_pf *work =
   +list_entry(vcpu->async_pf.queue.next,
   +   typeof(*work), queue);
   +cancel_work_sync(&work->work);
   +list_del(&work->queue);
   +if (!work->page) /* work was canceled */
   +kmem_cache_free(async_pf_cache, work);
   +}
 
   Are you holding any lock here?
 
   If not, what protects vcpu->async_pf.queue?
 Nothing. It is accessed only from vcpu thread.
 

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Gleb Natapov
On Sat, Oct 09, 2010 at 08:32:14PM +0200, Avi Kivity wrote:
  On 10/09/2010 08:30 PM, Avi Kivity wrote:
 So that info cpu will interfere with apf? Migration should work
 in the regular way. apf state should not be migrated since it has no meaning
 on the destination. I'll make sure synthetic halt state will not
 interfere with migration.
 
 
 If you deliver an apf, the guest expects a completion.
 
 
 btw, the token generation scheme resets as well.  Draining the queue
 fixes that as well.
 
I don't see what's there to fix. Can you explain what problem you see in
the way current code works?

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Avi Kivity

 On 10/10/2010 09:29 AM, Gleb Natapov wrote:

On Sat, Oct 09, 2010 at 08:30:18PM +0200, Avi Kivity wrote:
   On 10/07/2010 07:47 PM, Gleb Natapov wrote:
  On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote:
  On 10/04/2010 05:56 PM, Gleb Natapov wrote:
 If a guest accesses swapped out memory do not swap it in from vcpu 
thread
 context. Schedule work to do swapping and put vcpu into halted state
 instead.
 
 Interrupts will still be delivered to the guest and if an interrupt causes
 a reschedule the guest will continue to run another task.
 
 
 +
 +static bool can_do_async_pf(struct kvm_vcpu *vcpu)
 +{
 +  if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
 +   kvm_event_needs_reinjection(vcpu)))
 +  return false;
 +
 +  return kvm_x86_ops->interrupt_allowed(vcpu);
 +}
  
 Strictly speaking, if the cpu can handle NMIs it can take an apf?
  
  We can always do apf, but if vcpu can't do anything why bother. For NMI
  watchdog yes, maybe it is worth allowing apf if NMI is allowed.

  Actually it's very dangerous - the IRET from APF will re-enable
  NMIs.  So without the guest enabling apf-in-nmi we shouldn't allow
  it.

Good point.

  Not worth the complexity IMO.

 @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu 
*vcpu)
if (unlikely(r))
goto out;
 
 +  kvm_check_async_pf_completion(vcpu);
 +  if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
 +  /* Page is swapped out. Do synthetic halt */
 +  r = 1;
 +  goto out;
 +  }
 +
  
 Why do it here in the fast path?  Can't you halt the cpu when
 starting the page fault?
  Page fault may complete before guest re-entry. We do not want to halt vcpu
  in this case.

  So unhalt on completion.

I want to avoid touching vcpu state from work if possible. Work code does
not contain arch dependent code right now and mp_state is an x86 thing.



Use a KVM_REQ.



  
 I guess the apf threads can't touch mp_state, but they can have a
 KVM_REQ to trigger the check.
  This will require KVM_REQ check on fast path, so what's the difference
  performance wise.

  We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't
  cost anything extra.
if (vcpu->requests) does not clear the req bit, so what will have to be added
is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)) which is even more
expensive than my check (but not so expensive to worry about).


It's only expensive when it happens.  Most entries will have the bit clear.



 
 @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
 +  || !list_empty_careful(&vcpu->async_pf.done)
|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
|| vcpu->arch.nmi_pending ||
(kvm_arch_interrupt_allowed(vcpu)
  
 Unrelated, shouldn't kvm_arch_vcpu_runnable() look at
vcpu->requests?  Specifically KVM_REQ_EVENT?
  I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue
  here.

  No, the nmi and interrupt queues are only filled when the lapic is
  polled via KVM_REQ_EVENT.  I'll prepare a patch.
I don't think you are correct. nmi_pending is filled before setting
KVM_REQ_EVENT and kvm_cpu_has_interrupt() checks directly in apic/pic.


Right.



 +
 +TRACE_EVENT(
 +  kvm_async_pf_not_present,
 +  TP_PROTO(u64 gva),
 +  TP_ARGS(gva),
  
 Do you actually have a gva with tdp?  With nested virtualization,
 how do you interpret this gva?
  With tdp it is a gpa, just like tdp_page_fault gets a gpa whereas the shadow
  page version gets a gva. Nested virtualization is too complex to interpret.

  It's not good to have a tracepoint that depends on cpu mode (without
  recording that mode). I think we have the same issue in
  trace_kvm_page_fault though.
We have mmu_is_nested(). I'll just disable apf while vcpu is in nested
mode for now.


What if we get the apf in non-nested mode and it completes in nested mode?



 +
 +  /* do alloc nowait since if we are going to sleep anyway we
 + may as well sleep faulting in page */
 /*
  * multi
  * line
  * comment
  */
  
 (but a good one, this is subtle)
  
 I missed where you halt the vcpu.  Can you point me at the function?
  
 Note this is a synthetic halt and must not be visible to live
 migration, or we risk live migrating a halted state which doesn't
 really exist.
  
 Might be simplest to drain the apf queue on any of the save/restore 
ioctls.
  
  So that info cpu will interfere with apf? Migration should work
   in the regular way. apf state should not be migrated since it has no meaning
  on the destination. I'll make sure synthetic halt state will not
  

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Avi Kivity

 On 10/10/2010 05:55 PM, Avi Kivity wrote:

There is a special completion that tells the guest to wake all sleeping tasks
on the vcpu. It is delivered after migration on the destination.




Yes, I saw.

What if you can't deliver it?  Is it possible that some other vcpu 
will start receiving apfs that alias the old ones?  Or is the 
broadcast global?




And, is the broadcast used only for migrations?


--
error compiling committee.c: too many arguments to function



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Gleb Natapov
On Sun, Oct 10, 2010 at 05:55:25PM +0200, Avi Kivity wrote:
 
  @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu 
  *vcpu)
  if (unlikely(r))
  goto out;
  
  +   kvm_check_async_pf_completion(vcpu);
   +   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
  +   /* Page is swapped out. Do synthetic halt */
  +   r = 1;
  +   goto out;
  +   }
  +
   
  Why do it here in the fast path?  Can't you halt the cpu when
  starting the page fault?
   Page fault may complete before guest re-entry. We do not want to halt 
  vcpu
   in this case.
 
   So unhalt on completion.
 
 I want to avoid touching vcpu state from work if possible. Work code does
 not contain arch dependent code right now and mp_state is an x86 thing.
 
 
 Use a KVM_REQ.
 
Completion happens asynchronously. CPU may not even be halted at that
point. Actually completion does unhalt the vcpu. It puts completed work into
the vcpu->async_pf.done list and wakes the vcpu thread if it sleeps. The next
invocation of kvm_arch_vcpu_runnable() will return true since vcpu->async_pf.done
is not empty and the vcpu will be unhalted in the usual way by kvm_vcpu_block().
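
To illustrate the path described here, kvm_vcpu_block() at the time looked
roughly like this (a trimmed sketch; the real loop also checks pending timers):

void kvm_vcpu_block(struct kvm_vcpu *vcpu)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
		if (kvm_arch_vcpu_runnable(vcpu)) {
			/* a non-empty async_pf.done list gets us here */
			kvm_make_request(KVM_REQ_UNHALT, vcpu);
			break;
		}
		if (signal_pending(current))
			break;
		schedule();
	}
	finish_wait(&vcpu->wq, &wait);
}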

 
   
  I guess the apf threads can't touch mp_state, but they can have a
  KVM_REQ to trigger the check.
   This will require KVM_REQ check on fast path, so what's the difference
   performance wise.
 
   We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't
   cost anything extra.
 if (vcpu->requests) does not clear the req bit, so what will have to be added
 is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)) which is even more
 expensive than my check (but not so expensive to worry about).
 
 It's only expensive when it happens.  Most entries will have the bit clear.
kvm_check_async_pf_completion() (the one that detects if the vcpu should be
halted) is called after vcpu->requests processing. This is done in order
to delay the completion check as late as possible, in the hope of getting the
completion before the next vcpu entry and skipping the apf entirely, so I do
it at the last possible moment before event injection.

 
  +
  +TRACE_EVENT(
  +   kvm_async_pf_not_present,
  +   TP_PROTO(u64 gva),
  +   TP_ARGS(gva),
   
  Do you actually have a gva with tdp?  With nested virtualization,
  how do you interpret this gva?
   With tdp it is a gpa, just like tdp_page_fault gets a gpa whereas the shadow
   page version gets a gva. Nested virtualization is too complex to interpret.
 
   It's not good to have a tracepoint that depends on cpu mode (without
   recording that mode). I think we have the same issue in
   trace_kvm_page_fault though.
 We have mmu_is_nested(). I'll just disable apf while vcpu is in nested
 mode for now.
 
 What if we get the apf in non-nested mode and it completes in nested mode?
 
I am not yet sure we have any problem with nested mode at all. I am
looking at it. If we do, we can skip the prefault when in nested mode.
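
A minimal sketch of that gating, assuming the mmu_is_nested() helper mentioned
above (this is not what the patch currently does):

static bool can_do_async_pf(struct kvm_vcpu *vcpu)
{
	/* assumption: simply refuse apf while the vcpu is in nested mode */
	if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
		     kvm_event_needs_reinjection(vcpu) ||
		     mmu_is_nested(vcpu)))
		return false;

	return kvm_x86_ops->interrupt_allowed(vcpu);
}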

 
  +
  +   /* do alloc nowait since if we are going to sleep anyway we
  +  may as well sleep faulting in page */
  /*
   * multi
   * line
   * comment
   */
   
  (but a good one, this is subtle)
   
  I missed where you halt the vcpu.  Can you point me at the function?
   
  Note this is a synthetic halt and must not be visible to live
  migration, or we risk live migrating a halted state which doesn't
  really exist.
   
  Might be simplest to drain the apf queue on any of the save/restore 
  ioctls.
   
   So that info cpu will interfere with apf? Migration should work
   in the regular way. apf state should not be migrated since it has no meaning
   on the destination. I'll make sure synthetic halt state will not
   interfere with migration.
 
   If you deliver an apf, the guest expects a completion.
 
  There is a special completion that tells the guest to wake all sleeping tasks
  on the vcpu. It is delivered after migration on the destination.
 
 
 Yes, I saw.
 
 What if you can't deliver it?  Is it possible that some other vcpu
How can this happen? If I can't deliver it I can't deliver
non-broadcast apfs either.

 will start receiving apfs that alias the old ones?  Or is the
 broadcast global?
 
Broadcast is not global but tokens are unique per vcpu so another vcpu will
not be able to receive apfs that alias the old ones (if I understand
what you mean correctly).
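
For context, one way to make tokens vcpu-unique (roughly the scheme the series
converges on; the field names here are an assumption, not quoted from this
patch):

/* tag each token with the vcpu id so tokens never alias across vcpus */
arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;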

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-10 Thread Gleb Natapov
On Sun, Oct 10, 2010 at 05:56:31PM +0200, Avi Kivity wrote:
  On 10/10/2010 05:55 PM, Avi Kivity wrote:
 There is a special completion that tells the guest to wake all sleeping tasks
 on the vcpu. It is delivered after migration on the destination.
 
 
 
 Yes, I saw.
 
 What if you can't deliver it?  Is it possible that some other vcpu
 will start receiving apfs that alias the old ones?  Or is the
 broadcast global?
 
 
 And, is the broadcast used only for migrations?
 
Any time apf is enabled on a vcpu the broadcast is sent to that vcpu. The guest
should be careful here and not write to the apf MSR without a reason.

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-09 Thread Avi Kivity

 On 10/07/2010 07:47 PM, Gleb Natapov wrote:

On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote:
   On 10/04/2010 05:56 PM, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.
  
  Interrupts will still be delivered to the guest and if an interrupt causes
  a reschedule the guest will continue to run another task.
  
  
  +
  +static bool can_do_async_pf(struct kvm_vcpu *vcpu)
  +{
  + if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
  +  kvm_event_needs_reinjection(vcpu)))
  + return false;
  +
  + return kvm_x86_ops->interrupt_allowed(vcpu);
  +}

  Strictly speaking, if the cpu can handle NMIs it can take an apf?

We can always do apf, but if vcpu can't do anything why bother. For NMI
watchdog yes, maybe it is worth allowing apf if NMI is allowed.


Actually it's very dangerous - the IRET from APF will re-enable NMIs.  
So without the guest enabling apf-in-nmi we shouldn't allow it.


Not worth the complexity IMO.


  @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
  
  + kvm_check_async_pf_completion(vcpu);
  + if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
  + /* Page is swapped out. Do synthetic halt */
  + r = 1;
  + goto out;
  + }
  +

  Why do it here in the fast path?  Can't you halt the cpu when
  starting the page fault?
Page fault may complete before guest re-entry. We do not want to halt vcpu
in this case.


So unhalt on completion.



  I guess the apf threads can't touch mp_state, but they can have a
  KVM_REQ to trigger the check.
This will require KVM_REQ check on fast path, so what's the difference
performance wise.


We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't cost 
anything extra.



  
  @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
  + || !list_empty_careful(&vcpu->async_pf.done)
|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
|| vcpu->arch.nmi_pending ||
(kvm_arch_interrupt_allowed(vcpu)

  Unrelated, shouldn't kvm_arch_vcpu_runnable() look at
  vcpu->requests?  Specifically KVM_REQ_EVENT?
I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue
here.


No, the nmi and interrupt queues are only filled when the lapic is 
polled via KVM_REQ_EVENT.  I'll prepare a patch.



  +
  +TRACE_EVENT(
  + kvm_async_pf_not_present,
  + TP_PROTO(u64 gva),
  + TP_ARGS(gva),

  Do you actually have a gva with tdp?  With nested virtualization,
  how do you interpret this gva?
With tdp it is a gpa, just like tdp_page_fault gets a gpa whereas the shadow
page version gets a gva. Nested virtualization is too complex to interpret.


It's not good to have a tracepoint that depends on cpu mode (without 
recording that mode). I think we have the same issue in 
trace_kvm_page_fault though.



  +
  +TRACE_EVENT(
  + kvm_async_pf_completed,
  + TP_PROTO(unsigned long address, struct page *page, u64 gva),
  + TP_ARGS(address, page, gva),

  What does address mean?  There's also gva?

hva.


Is hva helpful here?  Generally gpa is better, but may not be available 
since it's ambiguous.





  +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
  +{
  + /* cancel outstanding work queue item */
  + while (!list_empty(&vcpu->async_pf.queue)) {
  + struct kvm_async_pf *work =
  + list_entry(vcpu->async_pf.queue.next,
  +typeof(*work), queue);
  + cancel_work_sync(&work->work);
  + list_del(&work->queue);
  + if (!work->page) /* work was canceled */
  + kmem_cache_free(async_pf_cache, work);
  + }

  Are you holding any lock here?

  If not, what protects vcpu->async_pf.queue?
Nothing. It is accessed only from vcpu thread.

  If yes, cancel_work_sync() will need to acquire it too (in case work
  is running now and needs to take the lock, and cancel_work_sync()
  needs to wait for it) - deadlock.

Work never touches this list.


So, an apf is always in ->queue and when completed also in ->done?

Is it not cleaner to list_move the apf from ->queue to ->done?  Saves a 
->link.


Can be done later.
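
A sketch of that cleanup, assuming the completion side keeps holding
async_pf.lock as it does in this patch:

spin_lock(&vcpu->async_pf.lock);
/* one list_head does double duty: leave ->queue, join ->done */
list_move_tail(&work->queue, &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);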


  +
  + /* do alloc nowait since if we are going to sleep anyway we
  +may as well sleep faulting in page */
  /*
   * multi
   * line
   * comment
   */

  (but a good one, this is subtle)

  I missed where you halt the vcpu.  Can you point me at the function?

  Note this is a synthetic halt and must not be visible to live
  migration, or we risk live migrating a halted state which doesn't
  really exist.

  Might be simplest to drain the apf queue on 

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-09 Thread Avi Kivity

 On 10/09/2010 08:30 PM, Avi Kivity wrote:

So that info cpu will interfere with apf? Migration should work
in the regular way. apf state should not be migrated since it has no meaning
on the destination. I'll make sure synthetic halt state will not
interfere with migration.



If you deliver an apf, the guest expects a completion.



btw, the token generation scheme resets as well.  Draining the queue 
fixes that as well.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Avi Kivity

 On 10/04/2010 05:56 PM, Gleb Natapov wrote:

If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if an interrupt causes
a reschedule the guest will continue to run another task.


+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+   if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+kvm_event_needs_reinjection(vcpu)))
+   return false;
+
+   return kvm_x86_ops->interrupt_allowed(vcpu);
+}


Strictly speaking, if the cpu can handle NMIs it can take an apf?


@@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;

+   kvm_check_async_pf_completion(vcpu);
+   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
+   /* Page is swapped out. Do synthetic halt */
+   r = 1;
+   goto out;
+   }
+


Why do it here in the fast path?  Can't you halt the cpu when starting 
the page fault?


I guess the apf threads can't touch mp_state, but they can have a 
KVM_REQ to trigger the check.



if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
inject_pending_event(vcpu);

@@ -5781,6 +5798,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)

kvm_make_request(KVM_REQ_EVENT, vcpu);

+   kvm_clear_async_pf_completion_queue(vcpu);
+   memset(vcpu->arch.apf.gfns, 0xff, sizeof vcpu->arch.apf.gfns);


An ordinary for loop is less tricky, even if it means one more line.



@@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
  {
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
+   || !list_empty_careful(&vcpu->async_pf.done)
|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
|| vcpu->arch.nmi_pending ||
(kvm_arch_interrupt_allowed(vcpu)


Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu->requests?  
Specifically KVM_REQ_EVENT?



+static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   u32 key = kvm_async_pf_hash_fn(gfn);
+
+   while (vcpu->arch.apf.gfns[key] != -1)
+   key = kvm_async_pf_next_probe(key);


Not sure what that -1 converts to on i386 where gfn_t is u64.
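
(A form that sidesteps the question is an explicit all-ones sentinel, matching
the memset(0xff) initialization elsewhere in the patch; the macro name below
is illustrative only:)

#define APF_GFN_FREE	(~(gfn_t)0)	/* all-ones, well-defined for a u64 on any arch */

	while (vcpu->arch.apf.gfns[key] != APF_GFN_FREE)
		key = kvm_async_pf_next_probe(key);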

+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work)
+{
+   vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+   if (work == kvm_double_apf)
+   trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu));
+   else {
+   trace_kvm_async_pf_not_present(work->gva);
+
+   kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
+   }
+}


Just have vcpu as the argument for tracepoints to avoid unconditional 
kvm_rip_read (slow on Intel), and call kvm_rip_read() in 
TP_fast_assign().  Similarly you can pass work instead of work->gva, 
though that's not nearly as important.
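
A sketch of what that would look like for the doublefault event (the event
shape is assumed, following the standard TRACE_EVENT() pattern):

TRACE_EVENT(
	kvm_async_pf_doublefault,
	TP_PROTO(struct kvm_vcpu *vcpu),
	TP_ARGS(vcpu),

	TP_STRUCT__entry(
		__field(unsigned long, rip)
	),

	TP_fast_assign(
		/* rip is only read when the tracepoint actually fires */
		__entry->rip = kvm_rip_read(vcpu);
	),

	TP_printk("rip %#lx", __entry->rip)
);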



+
+TRACE_EVENT(
+   kvm_async_pf_not_present,
+   TP_PROTO(u64 gva),
+   TP_ARGS(gva),


Do you actually have a gva with tdp?  With nested virtualization, how do 
you interpret this gva?

+
+TRACE_EVENT(
+   kvm_async_pf_completed,
+   TP_PROTO(unsigned long address, struct page *page, u64 gva),
+   TP_ARGS(address, page, gva),


What does address mean?  There's also gva?


+
+   TP_STRUCT__entry(
+   __field(unsigned long, address)
+   __field(struct page*, page)
+   __field(u64, gva)
+   ),
+
+   TP_fast_assign(
+   __entry->address = address;
+   __entry->page = page;
+   __entry->gva = gva;
+   ),


Recording a struct page * in a tracepoint?  Userspace can read this 
entry, better to do page_to_pfn() here.
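
A sketch of the suggested change, converting in TP_fast_assign() so the
recorded field is a plain pfn rather than a kernel pointer:

TP_STRUCT__entry(
	__field(unsigned long, address)
	__field(unsigned long, pfn)
	__field(u64, gva)
),

TP_fast_assign(
	__entry->address = address;
	__entry->pfn = page_to_pfn(page);	/* readable from userspace */
	__entry->gva = gva;
),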




+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+   /* cancel outstanding work queue item */
+   while (!list_empty(&vcpu->async_pf.queue)) {
+   struct kvm_async_pf *work =
+   list_entry(vcpu->async_pf.queue.next,
+  typeof(*work), queue);
+   cancel_work_sync(&work->work);
+   list_del(&work->queue);
+   if (!work->page) /* work was canceled */
+   kmem_cache_free(async_pf_cache, work);
+   }


Are you holding any lock here?

If not, what protects vcpu->async_pf.queue?
If yes, cancel_work_sync() will need to acquire it too (in case work is 
running now and needs to take the lock, and cancel_work_sync() needs to 
wait for it) - deadlock.



+
+   /* do alloc nowait since if we are going to sleep anyway we
+  may as well sleep faulting in page */

/*
 * multi
 * line
 * comment
 */

(but a good one, this is subtle)

I missed where you halt the vcpu.  Can you point me at the function?

Note this 

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Avi Kivity

 On 10/07/2010 11:50 AM, Avi Kivity wrote:


I missed where you halt the vcpu.  Can you point me at the function?


Found it.

Multiplexing a guest state with a host state is a bad idea.  Better have 
separate state.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Avi Kivity

 On 10/06/2010 12:52 PM, Gleb Natapov wrote:

On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote:
   On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
  On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
 If a guest accesses swapped out memory do not swap it in from vcpu 
thread
 context. Schedule work to do swapping and put vcpu into halted state
 instead.
  
  Interrupts will still be delivered to the guest and if an interrupt causes
  a reschedule the guest will continue to run another task.
  
  Signed-off-by: Gleb Natapov <g...@redhat.com>
 ---
  arch/x86/include/asm/kvm_host.h |   17 +++
  arch/x86/kvm/Kconfig|1 +
  arch/x86/kvm/Makefile   |1 +
  arch/x86/kvm/mmu.c  |   51 +-
  arch/x86/kvm/paging_tmpl.h  |4 +-
  arch/x86/kvm/x86.c  |  109 +++-
  include/linux/kvm_host.h|   31 ++
  include/trace/events/kvm.h  |   88 
  virt/kvm/Kconfig|3 +
  virt/kvm/async_pf.c |  220 
+++
  virt/kvm/async_pf.h |   36 +++
  virt/kvm/kvm_main.c |   57 --
  12 files changed, 603 insertions(+), 15 deletions(-)
  create mode 100644 virt/kvm/async_pf.c
  create mode 100644 virt/kvm/async_pf.h
  
  
 +  async_pf_cache = NULL;
 +}
 +
 +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
 +{
  +  INIT_LIST_HEAD(&vcpu->async_pf.done);
  +  INIT_LIST_HEAD(&vcpu->async_pf.queue);
  +  spin_lock_init(&vcpu->async_pf.lock);
 +}
 +
 +static void async_pf_execute(struct work_struct *work)
 +{
 +  struct page *page;
 +  struct kvm_async_pf *apf =
 +  container_of(work, struct kvm_async_pf, work);
  +  struct mm_struct *mm = apf->mm;
  +  struct kvm_vcpu *vcpu = apf->vcpu;
  +  unsigned long addr = apf->addr;
  +  gva_t gva = apf->gva;
 +
 +  might_sleep();
 +
 +  use_mm(mm);
  +  down_read(&mm->mmap_sem);
  +  get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
  +  up_read(&mm->mmap_sem);
  +  unuse_mm(mm);
  +
  +  spin_lock(&vcpu->async_pf.lock);
  +  list_add_tail(&apf->link, &vcpu->async_pf.done);
  +  apf->page = page;
  +  spin_unlock(&vcpu->async_pf.lock);
  
   This can fail, and apf->page becomes NULL.

  Does it even become NULL?  On error, get_user_pages() won't update
  the pages argument, so page becomes garbage here.

apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in
kvm_setup_async_pf() to be extra sure.



But you assign apf->page = page;, overriding it with garbage here.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Rik van Riel

On 10/07/2010 05:50 AM, Avi Kivity wrote:


+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+ kvm_event_needs_reinjection(vcpu)))
+ return false;
+
+ return kvm_x86_ops->interrupt_allowed(vcpu);
+}


Strictly speaking, if the cpu can handle NMIs it can take an apf?


Strictly speaking, yes.

However, it may not be able to DO anything with it, since
it won't be able to reschedule the context it's running :)

--
All rights reversed


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Avi Kivity

 On 10/07/2010 03:24 PM, Rik van Riel wrote:

On 10/07/2010 05:50 AM, Avi Kivity wrote:


+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+ kvm_event_needs_reinjection(vcpu)))
+ return false;
+
+ return kvm_x86_ops->interrupt_allowed(vcpu);
+}


Strictly speaking, if the cpu can handle NMIs it can take an apf?


Strictly speaking, yes.

However, it may not be able to DO anything with it, since
it won't be able to reschedule the context it's running :)



I was thinking about keeping the watchdog nmi handler responsive.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Gleb Natapov
On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote:
  On 10/04/2010 05:56 PM, Gleb Natapov wrote:
 If a guest accesses swapped out memory do not swap it in from vcpu thread
 context. Schedule work to do swapping and put vcpu into halted state
 instead.
 
 Interrupts will still be delivered to the guest and if an interrupt causes
 a reschedule the guest will continue to run another task.
 
 
 +
 +static bool can_do_async_pf(struct kvm_vcpu *vcpu)
 +{
 +if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
 + kvm_event_needs_reinjection(vcpu)))
 +return false;
 +
 +return kvm_x86_ops->interrupt_allowed(vcpu);
 +}
 
 Strictly speaking, if the cpu can handle NMIs it can take an apf?
 
We can always do apf, but if vcpu can't do anything why bother. For NMI
watchdog yes, maybe it is worth allowing apf if NMI is allowed.

 @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  if (unlikely(r))
  goto out;
 
 +kvm_check_async_pf_completion(vcpu);
 +if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
 +/* Page is swapped out. Do synthetic halt */
 +r = 1;
 +goto out;
 +}
 +
 
 Why do it here in the fast path?  Can't you halt the cpu when
 starting the page fault?
Page fault may complete before guest re-entry. We do not want to halt vcpu
in this case.
 
 I guess the apf threads can't touch mp_state, but they can have a
 KVM_REQ to trigger the check.
This will require KVM_REQ check on fast path, so what's the difference
performance wise.

 
  if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
  inject_pending_event(vcpu);
 
 @@ -5781,6 +5798,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 
  kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 +kvm_clear_async_pf_completion_queue(vcpu);
 +memset(vcpu->arch.apf.gfns, 0xff, sizeof vcpu->arch.apf.gfns);
 
 An ordinary for loop is less tricky, even if it means one more line.
 
 
 @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
   int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
   {
 return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
 +|| !list_empty_careful(&vcpu->async_pf.done)
 || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
 || vcpu->arch.nmi_pending ||
  (kvm_arch_interrupt_allowed(vcpu)
 
 Unrelated, shouldn't kvm_arch_vcpu_runnable() look at
 vcpu->requests?  Specifically KVM_REQ_EVENT?
I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue
here.

 
 +static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 +{
 +u32 key = kvm_async_pf_hash_fn(gfn);
 +
 +while (vcpu->arch.apf.gfns[key] != -1)
 +key = kvm_async_pf_next_probe(key);
 
 Not sure what that -1 converts to on i386 where gfn_t is u64.
Will check.

 +
 +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 + struct kvm_async_pf *work)
 +{
 +vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
 +
 +if (work == kvm_double_apf)
 +trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu));
 +else {
 +trace_kvm_async_pf_not_present(work->gva);
 +
 +kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 +}
 +}
 
 Just have vcpu as the argument for tracepoints to avoid
 unconditional kvm_rip_read (slow on Intel), and call kvm_rip_read()
 in TP_fast_assign().  Similarly you can pass work instead of
 work->gva, though that's not nearly as important.
 
Will do.

 +
 +TRACE_EVENT(
 +kvm_async_pf_not_present,
 +TP_PROTO(u64 gva),
 +TP_ARGS(gva),
 
 Do you actually have a gva with tdp?  With nested virtualization,
 how do you interpret this gva?
With tdp it is a gpa, just like tdp_page_fault gets a gpa whereas the shadow
page version gets a gva. Nested virtualization is too complex to interpret.

 +
 +TRACE_EVENT(
 +kvm_async_pf_completed,
 +TP_PROTO(unsigned long address, struct page *page, u64 gva),
 +TP_ARGS(address, page, gva),
 
 What does address mean?  There's also gva?
 
hva.

 +
 +TP_STRUCT__entry(
 +__field(unsigned long, address)
 +__field(struct page*, page)
 +__field(u64, gva)
 +),
 +
 +TP_fast_assign(
 +__entry->address = address;
 +__entry->page = page;
 +__entry->gva = gva;
 +),
 
 Recording a struct page * in a tracepoint?  Userspace can read this
 entry, better to do page_to_pfn() here.

OK.
 
 
 +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 +{
 +/* cancel outstanding work queue item */
 +while (!list_empty(&vcpu->async_pf.queue)) {
 +struct kvm_async_pf *work =
 +list_entry(vcpu->async_pf.queue.next,
 +   typeof(*work), queue);
 +cancel_work_sync(&work->work);
 +list_del(&work->queue);
 +if (!work->page) /* work was canceled */
 +

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-07 Thread Gleb Natapov
On Thu, Oct 07, 2010 at 11:54:22AM +0200, Avi Kivity wrote:
  On 10/06/2010 12:52 PM, Gleb Natapov wrote:
 On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote:
On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
   On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu 
  thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.
   
   Interrupts will still be delivered to the guest and if an interrupt causes
   a reschedule the guest will continue to run another task.
   
   Signed-off-by: Gleb Natapov <g...@redhat.com>
  ---
   arch/x86/include/asm/kvm_host.h |   17 +++
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/mmu.c  |   51 +-
   arch/x86/kvm/paging_tmpl.h  |4 +-
   arch/x86/kvm/x86.c  |  109 +++-
   include/linux/kvm_host.h|   31 ++
   include/trace/events/kvm.h  |   88 
   virt/kvm/Kconfig|3 +
   virt/kvm/async_pf.c |  220 
  +++
   virt/kvm/async_pf.h |   36 +++
   virt/kvm/kvm_main.c |   57 --
   12 files changed, 603 insertions(+), 15 deletions(-)
   create mode 100644 virt/kvm/async_pf.c
   create mode 100644 virt/kvm/async_pf.h
   
   
  +async_pf_cache = NULL;
  +}
  +
  +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
  +{
   +INIT_LIST_HEAD(&vcpu->async_pf.done);
   +INIT_LIST_HEAD(&vcpu->async_pf.queue);
   +spin_lock_init(&vcpu->async_pf.lock);
  +}
  +
  +static void async_pf_execute(struct work_struct *work)
  +{
  +struct page *page;
  +struct kvm_async_pf *apf =
  +container_of(work, struct kvm_async_pf, work);
   +struct mm_struct *mm = apf->mm;
   +struct kvm_vcpu *vcpu = apf->vcpu;
   +unsigned long addr = apf->addr;
   +gva_t gva = apf->gva;
  +
  +might_sleep();
  +
  +use_mm(mm);
   +down_read(&mm->mmap_sem);
   +get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
   +up_read(&mm->mmap_sem);
   +unuse_mm(mm);
   +
   +spin_lock(&vcpu->async_pf.lock);
   +list_add_tail(&apf->link, &vcpu->async_pf.done);
   +apf->page = page;
   +spin_unlock(&vcpu->async_pf.lock);
   
    This can fail, and apf->page becomes NULL.
 
   Does it even become NULL?  On error, get_user_pages() won't update
   the pages argument, so page becomes garbage here.
 
  apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in
 kvm_setup_async_pf() to be extra sure.
 
 
  But you assign apf->page = page;, overriding it with garbage here.
 
Ah, right you are.
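
A sketch of the fix being agreed on here: check the return value so ->page
reliably ends up NULL on failure (the exact form is an assumption):

static void async_pf_execute(struct work_struct *work)
{
	struct page *page = NULL;	/* get_user_pages() won't touch it on error */
	struct kvm_async_pf *apf =
		container_of(work, struct kvm_async_pf, work);
	struct mm_struct *mm = apf->mm;
	struct kvm_vcpu *vcpu = apf->vcpu;
	unsigned long addr = apf->addr;
	int npages;

	use_mm(mm);
	down_read(&mm->mmap_sem);
	npages = get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
	up_read(&mm->mmap_sem);
	unuse_mm(mm);

	spin_lock(&vcpu->async_pf.lock);
	apf->page = (npages == 1) ? page : NULL;	/* NULL now means "failed" */
	list_add_tail(&apf->link, &vcpu->async_pf.done);
	spin_unlock(&vcpu->async_pf.lock);
}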

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Avi Kivity

 On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:

On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.

   Interrupts will still be delivered to the guest and if an interrupt causes
   a reschedule the guest will continue to run another task.

   Signed-off-by: Gleb Natapov <g...@redhat.com>
  ---
   arch/x86/include/asm/kvm_host.h |   17 +++
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/mmu.c  |   51 +-
   arch/x86/kvm/paging_tmpl.h  |4 +-
   arch/x86/kvm/x86.c  |  109 +++-
   include/linux/kvm_host.h|   31 ++
   include/trace/events/kvm.h  |   88 
   virt/kvm/Kconfig|3 +
   virt/kvm/async_pf.c |  220 
+++
   virt/kvm/async_pf.h |   36 +++
   virt/kvm/kvm_main.c |   57 --
   12 files changed, 603 insertions(+), 15 deletions(-)
   create mode 100644 virt/kvm/async_pf.c
   create mode 100644 virt/kvm/async_pf.h


  + async_pf_cache = NULL;
  +}
  +
  +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
  +{
  + INIT_LIST_HEAD(&vcpu->async_pf.done);
  + INIT_LIST_HEAD(&vcpu->async_pf.queue);
  + spin_lock_init(&vcpu->async_pf.lock);
  +}
  +
  +static void async_pf_execute(struct work_struct *work)
  +{
  + struct page *page;
  + struct kvm_async_pf *apf =
  + container_of(work, struct kvm_async_pf, work);
  + struct mm_struct *mm = apf->mm;
  + struct kvm_vcpu *vcpu = apf->vcpu;
  + unsigned long addr = apf->addr;
  + gva_t gva = apf->gva;
  +
  + might_sleep();
  +
  + use_mm(mm);
  + down_read(&mm->mmap_sem);
  + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
  + up_read(&mm->mmap_sem);
  + unuse_mm(mm);
  +
  + spin_lock(&vcpu->async_pf.lock);
  + list_add_tail(&apf->link, &vcpu->async_pf.done);
  + apf->page = page;
  + spin_unlock(&vcpu->async_pf.lock);

This can fail, and apf->page becomes NULL.


Does it even become NULL?  On error, get_user_pages() won't update the 
pages argument, so page becomes garbage here.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Gleb Natapov
On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote:
  On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
   If a guest accesses swapped out memory do not swap it in from vcpu thread
   context. Schedule work to do swapping and put vcpu into halted state
   instead.
 
    Interrupts will still be delivered to the guest and if an interrupt causes
    a reschedule the guest will continue to run another task.
 
    Signed-off-by: Gleb Natapov <g...@redhat.com>
   ---
arch/x86/include/asm/kvm_host.h |   17 +++
arch/x86/kvm/Kconfig|1 +
arch/x86/kvm/Makefile   |1 +
arch/x86/kvm/mmu.c  |   51 +-
arch/x86/kvm/paging_tmpl.h  |4 +-
arch/x86/kvm/x86.c  |  109 +++-
include/linux/kvm_host.h|   31 ++
include/trace/events/kvm.h  |   88 
virt/kvm/Kconfig|3 +
virt/kvm/async_pf.c |  220 
  +++
virt/kvm/async_pf.h |   36 +++
virt/kvm/kvm_main.c |   57 --
12 files changed, 603 insertions(+), 15 deletions(-)
create mode 100644 virt/kvm/async_pf.c
create mode 100644 virt/kvm/async_pf.h
 
 
   + async_pf_cache = NULL;
   +}
   +
   +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
   +{
   + INIT_LIST_HEAD(&vcpu->async_pf.done);
   + INIT_LIST_HEAD(&vcpu->async_pf.queue);
   + spin_lock_init(&vcpu->async_pf.lock);
   +}
   +
   +static void async_pf_execute(struct work_struct *work)
   +{
   + struct page *page;
   + struct kvm_async_pf *apf =
   + container_of(work, struct kvm_async_pf, work);
   + struct mm_struct *mm = apf->mm;
   + struct kvm_vcpu *vcpu = apf->vcpu;
   + unsigned long addr = apf->addr;
   + gva_t gva = apf->gva;
   +
   + might_sleep();
   +
   + use_mm(mm);
   + down_read(&mm->mmap_sem);
   + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
   + up_read(&mm->mmap_sem);
   + unuse_mm(mm);
   +
   + spin_lock(&vcpu->async_pf.lock);
   + list_add_tail(&apf->link, &vcpu->async_pf.done);
   + apf->page = page;
   + spin_unlock(&vcpu->async_pf.lock);
 
 This can fail, and apf->page becomes NULL.
 
 Does it even become NULL?  On error, get_user_pages() won't update
 the pages argument, so page becomes garbage here.
 
apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in
kvm_setup_async_pf() to be extra sure.

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 11:59:16AM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.
  
  Interrupts will still be delivered to the guest and if an interrupt causes
  a reschedule the guest will continue to run another task.
  
 Signed-off-by: Gleb Natapov <g...@redhat.com>
  ---
   arch/x86/include/asm/kvm_host.h |   17 +++
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/mmu.c  |   51 +-
   arch/x86/kvm/paging_tmpl.h  |4 +-
   arch/x86/kvm/x86.c  |  109 +++-
   include/linux/kvm_host.h|   31 ++
   include/trace/events/kvm.h  |   88 
   virt/kvm/Kconfig|3 +
   virt/kvm/async_pf.c |  220 
  +++
   virt/kvm/async_pf.h |   36 +++
   virt/kvm/kvm_main.c |   57 --
   12 files changed, 603 insertions(+), 15 deletions(-)
   create mode 100644 virt/kvm/async_pf.c
   create mode 100644 virt/kvm/async_pf.h
  
 
  +   async_pf_cache = NULL;
  +}
  +
  +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
  +{
 +   INIT_LIST_HEAD(&vcpu->async_pf.done);
 +   INIT_LIST_HEAD(&vcpu->async_pf.queue);
 +   spin_lock_init(&vcpu->async_pf.lock);
  +}
  +
  +static void async_pf_execute(struct work_struct *work)
  +{
  +   struct page *page;
  +   struct kvm_async_pf *apf =
  +   container_of(work, struct kvm_async_pf, work);
 +   struct mm_struct *mm = apf->mm;
 +   struct kvm_vcpu *vcpu = apf->vcpu;
 +   unsigned long addr = apf->addr;
 +   gva_t gva = apf->gva;
  +
  +   might_sleep();
  +
  +   use_mm(mm);
 +   down_read(&mm->mmap_sem);
 +   get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
 +   up_read(&mm->mmap_sem);
 +   unuse_mm(mm);
 +
 +   spin_lock(&vcpu->async_pf.lock);
 +   list_add_tail(&apf->link, &vcpu->async_pf.done);
 +   apf->page = page;
 +   spin_unlock(&vcpu->async_pf.lock);
 
 This can fail, and apf->page becomes NULL.
 
 +   if (list_empty_careful(&vcpu->async_pf.done))
 +   return;
 +
 +   spin_lock(&vcpu->async_pf.lock);
 +   work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
 +   list_del(&work->link);
 +   spin_unlock(&vcpu->async_pf.lock);
 +
 +   kvm_arch_async_page_present(vcpu, work);
 +
 +free:
 +   list_del(&work->queue);
 +   vcpu->async_pf.queued--;
 +   put_page(work->page);
  +   kmem_cache_free(async_pf_cache, work);
  +}
 
 Better handle it here (and other sites).
Yeah. We should just reenter the guest and let the usual code path handle the
error on the next guest access.
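
A sketch of that handling, assuming a NULL ->page now marks a failed
get_user_pages() as discussed earlier in the thread:

	spin_lock(&vcpu->async_pf.lock);
	work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
	list_del(&work->link);
	spin_unlock(&vcpu->async_pf.lock);

	/* on failure skip the "page ready" notification; the guest simply
	 * faults again on its next access and takes the normal slow path */
	if (work->page)
		kvm_arch_async_page_present(vcpu, work);

	list_del(&work->queue);
	vcpu->async_pf.queued--;
	if (work->page)
		put_page(work->page);	/* never put_page(NULL) */
	kmem_cache_free(async_pf_cache, work);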

--
Gleb.


Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-05 Thread Marcelo Tosatti
On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
 If a guest accesses swapped out memory do not swap it in from vcpu thread
 context. Schedule work to do swapping and put vcpu into halted state
 instead.
 
 Interrupts will still be delivered to the guest and if an interrupt causes
 a reschedule the guest will continue to run another task.
 
 Signed-off-by: Gleb Natapov <g...@redhat.com>
 ---
  arch/x86/include/asm/kvm_host.h |   17 +++
  arch/x86/kvm/Kconfig|1 +
  arch/x86/kvm/Makefile   |1 +
  arch/x86/kvm/mmu.c  |   51 +-
  arch/x86/kvm/paging_tmpl.h  |4 +-
  arch/x86/kvm/x86.c  |  109 +++-
  include/linux/kvm_host.h|   31 ++
  include/trace/events/kvm.h  |   88 
  virt/kvm/Kconfig|3 +
  virt/kvm/async_pf.c |  220 
 +++
  virt/kvm/async_pf.h |   36 +++
  virt/kvm/kvm_main.c |   57 --
  12 files changed, 603 insertions(+), 15 deletions(-)
  create mode 100644 virt/kvm/async_pf.c
  create mode 100644 virt/kvm/async_pf.h
 

 + async_pf_cache = NULL;
 +}
 +
 +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
 +{
 + INIT_LIST_HEAD(&vcpu->async_pf.done);
 + INIT_LIST_HEAD(&vcpu->async_pf.queue);
 + spin_lock_init(&vcpu->async_pf.lock);
 +}
 +
 +static void async_pf_execute(struct work_struct *work)
 +{
 + struct page *page;
 + struct kvm_async_pf *apf =
 + container_of(work, struct kvm_async_pf, work);
 + struct mm_struct *mm = apf->mm;
 + struct kvm_vcpu *vcpu = apf->vcpu;
 + unsigned long addr = apf->addr;
 + gva_t gva = apf->gva;
 +
 + might_sleep();
 +
 + use_mm(mm);
 + down_read(&mm->mmap_sem);
 + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
 + up_read(&mm->mmap_sem);
 + unuse_mm(mm);
 +
 + spin_lock(&vcpu->async_pf.lock);
 + list_add_tail(&apf->link, &vcpu->async_pf.done);
 + apf->page = page;
 + spin_unlock(&vcpu->async_pf.lock);

This can fail, and apf->page becomes NULL.

 + if (list_empty_careful(&vcpu->async_pf.done))
 + return;
 +
 + spin_lock(&vcpu->async_pf.lock);
 + work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
 + list_del(&work->link);
 + spin_unlock(&vcpu->async_pf.lock);
 +
 + kvm_arch_async_page_present(vcpu, work);
 +
 +free:
 + list_del(&work->queue);
 + vcpu->async_pf.queued--;
 + put_page(work->page);
 + kmem_cache_free(async_pf_cache, work);
 +}

Better handle it here (and other sites).


[PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-04 Thread Gleb Natapov
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if an interrupt causes
a reschedule the guest will continue to run another task.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   17 +++
 arch/x86/kvm/Kconfig|1 +
 arch/x86/kvm/Makefile   |1 +
 arch/x86/kvm/mmu.c  |   51 +-
 arch/x86/kvm/paging_tmpl.h  |4 +-
 arch/x86/kvm/x86.c  |  109 +++-
 include/linux/kvm_host.h|   31 ++
 include/trace/events/kvm.h  |   88 
 virt/kvm/Kconfig|3 +
 virt/kvm/async_pf.c |  220 +++
 virt/kvm/async_pf.h |   36 +++
 virt/kvm/kvm_main.c |   57 --
 12 files changed, 603 insertions(+), 15 deletions(-)
 create mode 100644 virt/kvm/async_pf.c
 create mode 100644 virt/kvm/async_pf.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e209078..5f154d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -83,6 +83,8 @@
 #define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
+#define ASYNC_PF_PER_VCPU 64
+
 extern spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
@@ -412,6 +414,10 @@ struct kvm_vcpu_arch {
u64 hv_vapic;
 
cpumask_var_t wbinvd_dirty_mask;
+
+   struct {
+   gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
+   } apf;
 };
 
 struct kvm_arch {
@@ -585,7 +591,12 @@ struct kvm_x86_ops {
const struct trace_print_flags *exit_reasons_str;
 };
 
+struct kvm_arch_async_pf {
+   gfn_t gfn;
+};
+
 extern struct kvm_x86_ops *kvm_x86_ops;
+extern struct kvm_async_pf *kvm_double_apf;
 
 int kvm_mmu_module_init(void);
 void kvm_mmu_module_exit(void);
@@ -823,4 +834,10 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ddc131f..50f6364 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
+   select KVM_ASYNC_PF
select USER_RETURN_NOTIFIER
select KVM_MMIO
---help---
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..c53bf19 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,6 +9,7 @@ kvm-y   += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
   i8254.o timer.o
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c94c432..4d49b5e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -18,9 +18,11 @@
  *
  */
 
+#include "irq.h"
 #include "mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -2575,6 +2577,49 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
 error_code & PFERR_WRITE_MASK, gfn);
 }
 
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+{
+   struct kvm_arch_async_pf arch;
+   arch.gfn = gfn;
+
+   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+}
+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+   if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+kvm_event_needs_reinjection(vcpu)))
+   return false;
+
+   return kvm_x86_ops->interrupt_allowed(vcpu);
+}
+
+static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
+pfn_t *pfn)
+{
+   bool async;
+
+   *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+
+   if (!async)
+   return false; /* *pfn has correct page already */
+
+   put_page(pfn_to_page(*pfn));
+
+   if (can_do_async_pf(vcpu)) {
+   trace_kvm_try_async_get_page(async, *pfn);
+   if (kvm_find_async_pf_gfn(vcpu, gfn)) {
+   vcpu->async_pf.work = kvm_double_apf;
+   return 

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-04 Thread Rik van Riel

On 10/04/2010 11:56 AM, Gleb Natapov wrote:

If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if an interrupt causes
a reschedule the guest will continue to run another task.

Signed-off-by: Gleb Natapov <g...@redhat.com>


This seems quite different from the last version, but it
looks fine to me.

Acked-by: Rik van Riel <r...@redhat.com>


--
All rights reversed