Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Sat, Oct 09, 2010 at 08:30:18PM +0200, Avi Kivity wrote: On 10/07/2010 07:47 PM, Gleb Natapov wrote: On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote: On 10/04/2010 05:56 PM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ +if (unlikely(!irqchip_in_kernel(vcpu->kvm) || + kvm_event_needs_reinjection(vcpu))) +return false; + +return kvm_x86_ops->interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? We can always do apf, but if the vcpu can't do anything, why bother. For the NMI watchdog yes, maybe it is worth allowing apf if NMI is allowed. Actually it's very dangerous - the IRET from the APF will re-enable NMIs. So without the guest enabling apf-in-nmi we shouldn't allow it. Good point. Not worth the complexity IMO. @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; +kvm_check_async_pf_completion(vcpu); +if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) { +/* Page is swapped out. Do synthetic halt */ +r = 1; +goto out; +} + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? The page fault may complete before guest re-entry. We do not want to halt the vcpu in this case. So unhalt on completion. I want to avoid touching vcpu state from work if possible. The work code does not contain arch-dependent code right now, and mp_state is an x86 thing. I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. This will require a KVM_REQ check on the fast path, so what's the difference performance-wise? We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't cost anything extra.
if (vcpu->requests) does not clear the req bit, so what will have to be added is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)), which is even more expensive than my check (but not so expensive as to worry about). @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE +|| !list_empty_careful(&vcpu->async_pf.done) || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED || vcpu->arch.nmi_pending || (kvm_arch_interrupt_allowed(vcpu) Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu->requests? Specifically KVM_REQ_EVENT? I think KVM_REQ_EVENT is covered by checking the nmi and interrupt queues here. No, the nmi and interrupt queues are only filled when the lapic is polled via KVM_REQ_EVENT. I'll prepare a patch. I don't think you are correct. nmi_pending is filled before setting KVM_REQ_EVENT, and kvm_cpu_has_interrupt() checks directly in the apic/pic. + +TRACE_EVENT( +kvm_async_pf_not_present, +TP_PROTO(u64 gva), +TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? With tdp it is a gpa, just like tdp_page_fault gets a gpa where the shadow paging version gets a gva. Nested virtualization is too complex to interpret. It's not good to have a tracepoint that depends on cpu mode (without recording that mode). I think we have the same issue in trace_kvm_page_fault though. We have mmu_is_nested(). I'll just disable apf while the vcpu is in nested mode for now. + +TRACE_EVENT( +kvm_async_pf_completed, +TP_PROTO(unsigned long address, struct page *page, u64 gva), +TP_ARGS(address, page, gva), What does address mean? There's also gva? hva. Is hva helpful here? Generally gpa is better, but may not be available since it's ambiguous.
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) +{ +/* cancel outstanding work queue item */ +while (!list_empty(&vcpu->async_pf.queue)) { +struct kvm_async_pf *work = +list_entry(vcpu->async_pf.queue.next, + typeof(*work), queue); +cancel_work_sync(&work->work); +list_del(&work->queue); +if (!work->page) /* work was canceled */ +kmem_cache_free(async_pf_cache, work); +} Are you holding any lock here? If not, what protects vcpu->async_pf.queue? Nothing. It is accessed only from the vcpu thread.
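The cost argument above — a cheap `if (vcpu->requests)` any-bit gate versus a per-request `kvm_check_request()` test-and-clear — can be sketched in userspace. This is an illustrative model of the pattern, not the kernel's actual implementation; the names are made up:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bits, mirroring the KVM_REQ_* style discussed above. */
#define REQ_APF_HALT  0
#define REQ_EVENT     1

struct vcpu_sketch {
    unsigned long requests;   /* bitmask of pending requests */
};

static void make_request(int req, struct vcpu_sketch *v)
{
    v->requests |= 1UL << req;
}

/* Cheap gate: true if any request is pending; clears nothing. */
static bool has_requests(struct vcpu_sketch *v)
{
    return v->requests != 0;
}

/* Test-and-clear for one request, the kvm_check_request() pattern:
 * returns true at most once per make_request(). */
static bool check_request(int req, struct vcpu_sketch *v)
{
    if (v->requests & (1UL << req)) {
        v->requests &= ~(1UL << req);
        return true;
    }
    return false;
}
```

The fast path pays only the `has_requests()` test on most entries; the per-request clear is reached only when a bit is actually raised, which is the point Avi is making.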
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Sat, Oct 09, 2010 at 08:32:14PM +0200, Avi Kivity wrote: On 10/09/2010 08:30 PM, Avi Kivity wrote: So that info cpu will interfere with apf? Migration should work in the regular way. apf state should not be migrated since it has no meaning on the destination. I'll make sure the synthetic halt state will not interfere with migration. If you deliver an apf, the guest expects a completion. btw, the token generation scheme resets as well. Draining the queue fixes that as well. I don't see what's there to fix. Can you explain what problem you see in the way the current code works? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/10/2010 09:29 AM, Gleb Natapov wrote: On Sat, Oct 09, 2010 at 08:30:18PM +0200, Avi Kivity wrote: On 10/07/2010 07:47 PM, Gleb Natapov wrote: On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote: On 10/04/2010 05:56 PM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu-kvm) || + kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops-interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? We can always do apf, but if vcpu can't do anything hwy bother. For NMI watchdog yes, may be it is worth to allow apf if nmi is allowed. Actually it's very dangerous - the IRET from APF will re-enable NMIs. So without the guest enabling apf-in-nmi we shouldn't allow it. Good point. Not worth the complexity IMO. @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; + kvm_check_async_pf_completion(vcpu); + if (vcpu-arch.mp_state == KVM_MP_STATE_HALTED) { + /* Page is swapped out. Do synthetic halt */ + r = 1; + goto out; + } + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? Page fault may complete before guest re-entry. We do not want to halt vcpu in this case. So unhalt on completion. I want to avoid touching vcpu state from work if possible. Work code does not contain arch dependent code right now and mp_state is x86 thing Use a KVM_REQ. I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. This will require KVM_REQ check on fast path, so what's the difference performance wise. 
We already have a KVM_REQ check (if (vcpu-requests)) so it doesn't cost anything extra. if (vcpu-requests) does not clear req bit, so what will have to be added is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)) which is even more expensive then my check (but not so expensive to worry about). It's only expensive when it happens. Most entries will have the bit clear. @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu-arch.mp_state == KVM_MP_STATE_RUNNABLE + || !list_empty_careful(vcpu-async_pf.done) || vcpu-arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED || vcpu-arch.nmi_pending || (kvm_arch_interrupt_allowed(vcpu) Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu-requests? Specifically KVM_REQ_EVENT? I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue here. No, the nmi and interrupt queues are only filled when the lapic is polled via KVM_REQ_EVENT. I'll prepare a patch. I don't think you are correct. nmi_pending is filled before setting KVM_REQ_EVENT and kvm_cpu_has_interrupt() checks directly in apic/pic. Right. + +TRACE_EVENT( + kvm_async_pf_not_present, + TP_PROTO(u64 gva), + TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? With tdp it is gpa just like tdp_page_fault gets gpa where shadow page version gets gva. Nested virtualization is too complex to interpret. It's not good to have a tracepoint that depends on cpu mode (without recording that mode). I think we have the same issue in trace_kvm_page_fault though. We have mmu_is_nested(). I'll just disable apf while vcpu is in nested mode for now. What if we get the apf in non-nested mode and it completes in nested mode? + + /* do alloc nowait since if we are going to sleep anyway we + may as well sleep faulting in page */ /* * multi * line * comment */ (but a good one, this is subtle) I missed where you halt the vcpu. Can you point me at the function? 
Note this is a synthetic halt and must not be visible to live migration, or we risk live migrating a halted state which doesn't really exist. Might be simplest to drain the apf queue on any of the save/restore ioctls. So that info cpu will interfere with apf? Migration should work in regular way. apf state should not be migrated since it has no meaning on the destination. I'll make sure synthetic halt state will not
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/10/2010 05:55 PM, Avi Kivity wrote: There is a special completion that tells the guest to wake all sleeping tasks on the vcpu. It is delivered after migration on the destination. Yes, I saw. What if you can't deliver it? Is it possible that some other vcpu will start receiving apfs that alias the old ones? Or is the broadcast global? And, is the broadcast used only for migrations? -- error compiling committee.c: too many arguments to function
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Sun, Oct 10, 2010 at 05:55:25PM +0200, Avi Kivity wrote: @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; + kvm_check_async_pf_completion(vcpu); + if (vcpu-arch.mp_state == KVM_MP_STATE_HALTED) { + /* Page is swapped out. Do synthetic halt */ + r = 1; + goto out; + } + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? Page fault may complete before guest re-entry. We do not want to halt vcpu in this case. So unhalt on completion. I want to avoid touching vcpu state from work if possible. Work code does not contain arch dependent code right now and mp_state is x86 thing Use a KVM_REQ. Completion happens asynchronously. CPU may not be even halted at that point. Actually completion does unhalt vcpu. It puts completed work into vcpu-async_pf.done list and wakes vcpu thread if it sleeps. Next invocation of kvm_arch_vcpu_runnable() will return true since vcpu-async_pf.done is not empty and vcpu will be unhalted in usual way by kvm_vcpu_block(). I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. This will require KVM_REQ check on fast path, so what's the difference performance wise. We already have a KVM_REQ check (if (vcpu-requests)) so it doesn't cost anything extra. if (vcpu-requests) does not clear req bit, so what will have to be added is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)) which is even more expensive then my check (but not so expensive to worry about). It's only expensive when it happens. Most entries will have the bit clear. kvm_check_async_pf_completion() (the one that detects if vcpu should be halted) is called after vcpu-requests processing. It is done in order to delay completion checking as far as possible in hope to get completion before next vcpu entry and skip sending apf, so I do it at the last possible moment before event injection. 
+ +TRACE_EVENT( + kvm_async_pf_not_present, + TP_PROTO(u64 gva), + TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? With tdp it is gpa just like tdp_page_fault gets gpa where shadow page version gets gva. Nested virtualization is too complex to interpret. It's not good to have a tracepoint that depends on cpu mode (without recording that mode). I think we have the same issue in trace_kvm_page_fault though. We have mmu_is_nested(). I'll just disable apf while vcpu is in nested mode for now. What if we get the apf in non-nested mode and it completes in nested mode? I am not yet sure we have any problem with nested mode at all. I am looking at it. If we have we can skip prefault if in nested. + + /* do alloc nowait since if we are going to sleep anyway we + may as well sleep faulting in page */ /* * multi * line * comment */ (but a good one, this is subtle) I missed where you halt the vcpu. Can you point me at the function? Note this is a synthetic halt and must not be visible to live migration, or we risk live migrating a halted state which doesn't really exist. Might be simplest to drain the apf queue on any of the save/restore ioctls. So that info cpu will interfere with apf? Migration should work in regular way. apf state should not be migrated since it has no meaning on the destination. I'll make sure synthetic halt state will not interfere with migration. If you deliver an apf, the guest expects a completion. There is special completion that tells guest to wake all sleeping tasks on vcpu. It is delivered after migration on the destination. Yes, I saw. What if you can't deliver it? is it possible that some other vcpu How can this happen? If I can't deliverer it I can't deliver non-broadcast apfs too. will start receiving apfs that alias the old ones? Or is the broadcast global? 
Broadcast is not global, but tokens are unique per vcpu, so another vcpu will not be able to receive apfs that alias the old ones (if I understand what you mean correctly). -- Gleb.
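Gleb's per-vcpu token uniqueness can be sketched by folding the vcpu id into the token so two vcpus can never hand the guest aliasing tokens. This is an illustrative scheme only; the actual bit layout in the patch may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token scheme: high bits carry the vcpu id, low 12 bits a
 * per-vcpu sequence number (0 reserved as "no token"). Tokens from
 * different vcpus therefore never collide. */
static uint32_t next_token(uint32_t vcpu_id, uint32_t *seq)
{
    *seq = (*seq + 1) & 0xfff;
    if (*seq == 0)
        *seq = 1;             /* skip the reserved value */
    return (vcpu_id << 12) | *seq;
}
```

With such a layout, a broadcast "wake everything" completion only has to be correct per vcpu, matching the discussion above.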
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Sun, Oct 10, 2010 at 05:56:31PM +0200, Avi Kivity wrote: On 10/10/2010 05:55 PM, Avi Kivity wrote: There is a special completion that tells the guest to wake all sleeping tasks on the vcpu. It is delivered after migration on the destination. Yes, I saw. What if you can't deliver it? Is it possible that some other vcpu will start receiving apfs that alias the old ones? Or is the broadcast global? And, is the broadcast used only for migrations? Any time apf is enabled on a vcpu, the broadcast is sent to that vcpu. The guest should be careful here and not write to the apf MSR without a reason. -- Gleb.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/07/2010 07:47 PM, Gleb Natapov wrote: On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote: On 10/04/2010 05:56 PM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu-kvm) || + kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops-interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? We can always do apf, but if vcpu can't do anything hwy bother. For NMI watchdog yes, may be it is worth to allow apf if nmi is allowed. Actually it's very dangerous - the IRET from APF will re-enable NMIs. So without the guest enabling apf-in-nmi we shouldn't allow it. Not worth the complexity IMO. @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; + kvm_check_async_pf_completion(vcpu); + if (vcpu-arch.mp_state == KVM_MP_STATE_HALTED) { + /* Page is swapped out. Do synthetic halt */ + r = 1; + goto out; + } + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? Page fault may complete before guest re-entry. We do not want to halt vcpu in this case. So unhalt on completion. I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. This will require KVM_REQ check on fast path, so what's the difference performance wise. We already have a KVM_REQ check (if (vcpu-requests)) so it doesn't cost anything extra. 
@@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu-arch.mp_state == KVM_MP_STATE_RUNNABLE + || !list_empty_careful(vcpu-async_pf.done) || vcpu-arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED || vcpu-arch.nmi_pending || (kvm_arch_interrupt_allowed(vcpu) Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu-requests? Specifically KVM_REQ_EVENT? I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue here. No, the nmi and interrupt queues are only filled when the lapic is polled via KVM_REQ_EVENT. I'll prepare a patch. + +TRACE_EVENT( + kvm_async_pf_not_present, + TP_PROTO(u64 gva), + TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? With tdp it is gpa just like tdp_page_fault gets gpa where shadow page version gets gva. Nested virtualization is too complex to interpret. It's not good to have a tracepoint that depends on cpu mode (without recording that mode). I think we have the same issue in trace_kvm_page_fault though. + +TRACE_EVENT( + kvm_async_pf_completed, + TP_PROTO(unsigned long address, struct page *page, u64 gva), + TP_ARGS(address, page, gva), What does address mean? There's also gva? hva. Is hva helpful here? Generally gpa is better, but may not be available since it's ambiguous. +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) +{ + /* cancel outstanding work queue item */ + while (!list_empty(vcpu-async_pf.queue)) { + struct kvm_async_pf *work = + list_entry(vcpu-async_pf.queue.next, +typeof(*work), queue); + cancel_work_sync(work-work); + list_del(work-queue); + if (!work-page) /* work was canceled */ + kmem_cache_free(async_pf_cache, work); + } Are you holding any lock here? If not, what protects vcpu-async_pf.queue? Nothing. It is accessed only from vcpu thread. 
If yes, cancel_work_sync() will need to acquire it too (in case the work is running now and needs to take the lock, and cancel_work_sync() needs to wait for it) - deadlock. Work never touches this list. So, an apf is always in ->queue, and when completed also in ->done? Is it not cleaner to list_move the apf from ->queue to ->done? Saves a ->link. Can be done later. + + /* do alloc nowait since if we are going to sleep anyway we +may as well sleep faulting in page */ /* * multi * line * comment */ (but a good one, this is subtle) I missed where you halt the vcpu. Can you point me at the function? Note this is a synthetic halt and must not be visible to live migration, or we risk live migrating a halted state which doesn't really exist. Might be simplest to drain the apf queue on
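The list_move suggestion above — one list_head per work item, moved from ->queue to ->done on completion instead of linking the item on both lists at once — looks like this with a minimal list.h-style intrusive list. Userspace sketch; the helpers mimic the kernel API but are written out here:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked intrusive list in the style of the kernel's
 * list.h, enough to show list_move_tail(): delete from one list, add to
 * the tail of another, using the same embedded link. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
    n->prev = n->next = n;     /* leave the node self-linked */
}

static void list_move_tail(struct list_head *n, struct list_head *h)
{
    list_del(n);
    list_add_tail(n, h);
}

static int list_empty(const struct list_head *h) { return h->next == h; }
```

A work item then needs only one link field: completion moves it from the pending queue to the done list in one step.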
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/09/2010 08:30 PM, Avi Kivity wrote: So that info cpu will interfere with apf? Migration should work in regular way. apf state should not be migrated since it has no meaning on the destination. I'll make sure synthetic halt state will not interfere with migration. If you deliver an apf, the guest expects a completion. btw, the token generation scheme resets as well. Draining the queue fixes that as well. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/04/2010 05:56 PM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu->kvm) || +kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops->interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; + kvm_check_async_pf_completion(vcpu); + if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) { + /* Page is swapped out. Do synthetic halt */ + r = 1; + goto out; + } + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { inject_pending_event(vcpu); @@ -5781,6 +5798,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_clear_async_pf_completion_queue(vcpu); + memset(vcpu->arch.apf.gfns, 0xff, sizeof vcpu->arch.apf.gfns); An ordinary for loop is less tricky, even if it means one more line. @@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE + || !list_empty_careful(&vcpu->async_pf.done) || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED || vcpu->arch.nmi_pending || (kvm_arch_interrupt_allowed(vcpu) Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu->requests? Specifically KVM_REQ_EVENT?
+static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + u32 key = kvm_async_pf_hash_fn(gfn); + + while (vcpu->arch.apf.gfns[key] != -1) + key = kvm_async_pf_next_probe(key); Not sure what that -1 converts to on i386 where gfn_t is u64. + +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, +struct kvm_async_pf *work) +{ + vcpu->arch.mp_state = KVM_MP_STATE_HALTED; + + if (work == kvm_double_apf) + trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu)); + else { + trace_kvm_async_pf_not_present(work->gva); + + kvm_add_async_pf_gfn(vcpu, work->arch.gfn); + } +} Just have vcpu as the argument for tracepoints to avoid an unconditional kvm_rip_read (slow on Intel), and call kvm_rip_read() in tp_fast_assign(). Similarly you can pass work instead of work->gva, though that's not nearly as important. + +TRACE_EVENT( + kvm_async_pf_not_present, + TP_PROTO(u64 gva), + TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? + +TRACE_EVENT( + kvm_async_pf_completed, + TP_PROTO(unsigned long address, struct page *page, u64 gva), + TP_ARGS(address, page, gva), What does address mean? There's also gva? + + TP_STRUCT__entry( + __field(unsigned long, address) + __field(struct page*, page) + __field(u64, gva) + ), + + TP_fast_assign( + __entry->address = address; + __entry->page = page; + __entry->gva = gva; + ), Recording a struct page * in a tracepoint? Userspace can read this entry, better to do the page_to_pfn() here. +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) +{ + /* cancel outstanding work queue item */ + while (!list_empty(&vcpu->async_pf.queue)) { + struct kvm_async_pf *work = + list_entry(vcpu->async_pf.queue.next, +typeof(*work), queue); + cancel_work_sync(&work->work); + list_del(&work->queue); + if (!work->page) /* work was canceled */ + kmem_cache_free(async_pf_cache, work); + } Are you holding any lock here? If not, what protects vcpu->async_pf.queue?
If yes, cancel_work_sync() will need to acquire it too (in case the work is running now and needs to take the lock, and cancel_work_sync() needs to wait for it) - deadlock. + + /* do alloc nowait since if we are going to sleep anyway we + may as well sleep faulting in page */ /* * multi * line * comment */ (but a good one, this is subtle) I missed where you halt the vcpu. Can you point me at the function? Note this
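On the earlier question about `gfns[key] != -1` when gfn_t is u64: in C the int `-1` sign-extends and then converts to the all-ones u64 value on every architecture, i386 included, so the sentinel compare is well-defined (and matches the `memset(..., 0xff, ...)` initialization). A userspace sketch of the probe scheme; the table size and hash here are illustrative, not the patch's:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define APF_HASH_SIZE 8   /* power of two, as the masked probe assumes */

/* Open-addressed table with (gfn_t)-1 as the "empty" sentinel, the
 * pattern quoted in the review above. add_gfn() assumes the table is
 * never full, as the original while-loop does. */
static uint32_t hash_fn(gfn_t gfn)       { return (uint32_t)gfn & (APF_HASH_SIZE - 1); }
static uint32_t next_probe(uint32_t key) { return (key + 1) & (APF_HASH_SIZE - 1); }

static void add_gfn(gfn_t *table, gfn_t gfn)
{
    uint32_t key = hash_fn(gfn);
    while (table[key] != (gfn_t)-1)   /* linear probe past occupied slots */
        key = next_probe(key);
    table[key] = gfn;
}

static int find_gfn(const gfn_t *table, gfn_t gfn)
{
    uint32_t key = hash_fn(gfn);
    uint32_t start = key;
    do {
        if (table[key] == gfn)
            return 1;
        if (table[key] == (gfn_t)-1)  /* hit an empty slot: not present */
            return 0;
        key = next_probe(key);
    } while (key != start);
    return 0;
}
```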
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/07/2010 11:50 AM, Avi Kivity wrote: I missed where you halt the vcpu. Can you point me at the function? Found it. Multiplexing a guest state with a host state is a bad idea. Better have separate state. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/06/2010 12:52 PM, Gleb Natapov wrote: On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote: On 10/05/2010 04:59 PM, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapovg...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h + async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(vcpu-async_pf.done); + INIT_LIST_HEAD(vcpu-async_pf.queue); + spin_lock_init(vcpu-async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ + struct page *page; + struct kvm_async_pf *apf = + container_of(work, struct kvm_async_pf, work); + struct mm_struct *mm = apf-mm; + struct kvm_vcpu *vcpu = apf-vcpu; + unsigned long addr = apf-addr; + gva_t gva = apf-gva; + + might_sleep(); + + use_mm(mm); + down_read(mm-mmap_sem); + get_user_pages(current, mm, addr, 1, 1, 0,page, NULL); + up_read(mm-mmap_sem); + unuse_mm(mm); + + spin_lock(vcpu-async_pf.lock); + list_add_tail(apf-link,vcpu-async_pf.done); + apf-page = page; + spin_unlock(vcpu-async_pf.lock); This can fail, and apf-page become NULL. Does it even become NULL? On error, get_user_pages() won't update the pages argument, so page becomes garbage here. 
apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in kvm_setup_async_pf() to be extra sure. But you assign apf->page = page;, overwriting it with garbage here. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
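The failure mode under discussion — get_user_pages() returning an error without writing its output parameter, so an uninitialized local is then stored as apf->page — can be modeled in userspace. The names below are illustrative stand-ins, not the kernel API, and initializing the local to NULL is one possible fix (checking the return value is another):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for get_user_pages(): on error it returns a negative value and
 * leaves *pages untouched, which is the subtlety Avi points out. */
static int fake_get_user_pages(int should_fail, void **pages)
{
    if (should_fail)
        return -14;          /* -EFAULT-style error: *pages not written */
    *pages = (void *)0x1000; /* pretend we pinned a page */
    return 1;
}

/* The buggy shape leaves `page` uninitialized and stores it
 * unconditionally. Initializing it to NULL means a failed pin is
 * recorded as NULL, which the completion path already treats as
 * "work was cancelled / no page". */
static void *pin_page(int should_fail)
{
    void *page = NULL;       /* the one-line fix */
    fake_get_user_pages(should_fail, &page);
    return page;             /* NULL on failure instead of garbage */
}
```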
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/07/2010 05:50 AM, Avi Kivity wrote: +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu->kvm) || + kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops->interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? Strictly speaking, yes. However, it may not be able to DO anything with it, since it won't be able to reschedule the context it's running :) -- All rights reversed
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/07/2010 03:24 PM, Rik van Riel wrote: On 10/07/2010 05:50 AM, Avi Kivity wrote: +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu->kvm) || + kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops->interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? Strictly speaking, yes. However, it may not be able to DO anything with it, since it won't be able to reschedule the context it's running :) I was thinking about keeping the watchdog nmi handler responsive. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote: On 10/04/2010 05:56 PM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ +if (unlikely(!irqchip_in_kernel(vcpu-kvm) || + kvm_event_needs_reinjection(vcpu))) +return false; + +return kvm_x86_ops-interrupt_allowed(vcpu); +} Strictly speaking, if the cpu can handle NMIs it can take an apf? We can always do apf, but if vcpu can't do anything hwy bother. For NMI watchdog yes, may be it is worth to allow apf if nmi is allowed. @@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; +kvm_check_async_pf_completion(vcpu); +if (vcpu-arch.mp_state == KVM_MP_STATE_HALTED) { +/* Page is swapped out. Do synthetic halt */ +r = 1; +goto out; +} + Why do it here in the fast path? Can't you halt the cpu when starting the page fault? Page fault may complete before guest re-entry. We do not want to halt vcpu in this case. I guess the apf threads can't touch mp_state, but they can have a KVM_REQ to trigger the check. This will require KVM_REQ check on fast path, so what's the difference performance wise. if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { inject_pending_event(vcpu); @@ -5781,6 +5798,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) kvm_make_request(KVM_REQ_EVENT, vcpu); +kvm_clear_async_pf_completion_queue(vcpu); +memset(vcpu-arch.apf.gfns, 0xff, sizeof vcpu-arch.apf.gfns); An ordinary for loop is less tricky, even if it means one more line. 
@@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu-arch.mp_state == KVM_MP_STATE_RUNNABLE +|| !list_empty_careful(vcpu-async_pf.done) || vcpu-arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED || vcpu-arch.nmi_pending || (kvm_arch_interrupt_allowed(vcpu) Unrelated, shouldn't kvm_arch_vcpu_runnable() look at vcpu-requests? Specifically KVM_REQ_EVENT? I think KVM_REQ_EVENT is covered by checking nmi and interrupt queue here. +static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) +{ +u32 key = kvm_async_pf_hash_fn(gfn); + +while (vcpu-arch.apf.gfns[key] != -1) +key = kvm_async_pf_next_probe(key); Not sure what that -1 converts to on i386 where gfn_t is u64. Will check. + +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, + struct kvm_async_pf *work) +{ +vcpu-arch.mp_state = KVM_MP_STATE_HALTED; + +if (work == kvm_double_apf) +trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu)); +else { +trace_kvm_async_pf_not_present(work-gva); + +kvm_add_async_pf_gfn(vcpu, work-arch.gfn); +} +} Just have vcpu as the argument for tracepoints to avoid unconditional kvm_rip_read (slow on Intel), and call kvm_rip_read() in tp_fast_assign(). Similarly you can pass work instead of work-gva, though that's not nearly as important. Will do. + +TRACE_EVENT( +kvm_async_pf_not_present, +TP_PROTO(u64 gva), +TP_ARGS(gva), Do you actually have a gva with tdp? With nested virtualization, how do you interpret this gva? With tdp it is gpa just like tdp_page_fault gets gpa where shadow page version gets gva. Nested virtualization is too complex to interpret. + +TRACE_EVENT( +kvm_async_pf_completed, +TP_PROTO(unsigned long address, struct page *page, u64 gva), +TP_ARGS(address, page, gva), What does address mean? There's also gva? hva. 
+ +TP_STRUCT__entry( +__field(unsigned long, address) +__field(struct page*, page) +__field(u64, gva) +), + +TP_fast_assign( +__entry->address = address; +__entry->page = page; +__entry->gva = gva; +), Recording a struct page * in a tracepoint? Userspace can read this entry, so better to do the page_to_pfn() here. OK. +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) +{ +/* cancel outstanding work queue item */ +while (!list_empty(&vcpu->async_pf.queue)) { +struct kvm_async_pf *work = +list_entry(vcpu->async_pf.queue.next, + typeof(*work), queue); +cancel_work_sync(&work->work); +list_del(&work->queue); +if (!work->page) /* work was canceled */ +
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Thu, Oct 07, 2010 at 11:54:22AM +0200, Avi Kivity wrote: On 10/06/2010 12:52 PM, Gleb Natapov wrote: On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote: On 10/05/2010 04:59 PM, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h +async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ +INIT_LIST_HEAD(&vcpu->async_pf.done); +INIT_LIST_HEAD(&vcpu->async_pf.queue); +spin_lock_init(&vcpu->async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ +struct page *page; +struct kvm_async_pf *apf = +container_of(work, struct kvm_async_pf, work); +struct mm_struct *mm = apf->mm; +struct kvm_vcpu *vcpu = apf->vcpu; +unsigned long addr = apf->addr; +gva_t gva = apf->gva; + +might_sleep(); + +use_mm(mm); +down_read(&mm->mmap_sem); +get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL); +up_read(&mm->mmap_sem); +unuse_mm(mm); + +spin_lock(&vcpu->async_pf.lock); +list_add_tail(&apf->link, &vcpu->async_pf.done); +apf->page = page; +spin_unlock(&vcpu->async_pf.lock); This can fail, and apf->page becomes NULL. Does it even become NULL? 
On error, get_user_pages() won't update the pages argument, so page becomes garbage here. apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in kvm_setup_async_pf() to be extra sure. But you assign apf->page = page;, overriding it with garbage here. Ah, right you are. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/05/2010 04:59 PM, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h + async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(&vcpu->async_pf.done); + INIT_LIST_HEAD(&vcpu->async_pf.queue); + spin_lock_init(&vcpu->async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ + struct page *page; + struct kvm_async_pf *apf = + container_of(work, struct kvm_async_pf, work); + struct mm_struct *mm = apf->mm; + struct kvm_vcpu *vcpu = apf->vcpu; + unsigned long addr = apf->addr; + gva_t gva = apf->gva; + + might_sleep(); + + use_mm(mm); + down_read(&mm->mmap_sem); + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL); + up_read(&mm->mmap_sem); + unuse_mm(mm); + + spin_lock(&vcpu->async_pf.lock); + list_add_tail(&apf->link, &vcpu->async_pf.done); + apf->page = page; + spin_unlock(&vcpu->async_pf.lock); This can fail, and apf->page becomes NULL. Does it even become NULL? On error, get_user_pages() won't update the pages argument, so page becomes garbage here. 
-- error compiling committee.c: too many arguments to function
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote: On 10/05/2010 04:59 PM, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h + async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(&vcpu->async_pf.done); + INIT_LIST_HEAD(&vcpu->async_pf.queue); + spin_lock_init(&vcpu->async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ + struct page *page; + struct kvm_async_pf *apf = + container_of(work, struct kvm_async_pf, work); + struct mm_struct *mm = apf->mm; + struct kvm_vcpu *vcpu = apf->vcpu; + unsigned long addr = apf->addr; + gva_t gva = apf->gva; + + might_sleep(); + + use_mm(mm); + down_read(&mm->mmap_sem); + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL); + up_read(&mm->mmap_sem); + unuse_mm(mm); + + spin_lock(&vcpu->async_pf.lock); + list_add_tail(&apf->link, &vcpu->async_pf.done); + apf->page = page; + spin_unlock(&vcpu->async_pf.lock); This can fail, and apf->page becomes NULL. Does it even become NULL? On error, get_user_pages() won't update the pages argument, so page becomes garbage here. 
apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in kvm_setup_async_pf() to be extra sure. -- Gleb.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Tue, Oct 05, 2010 at 11:59:16AM -0300, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h + async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(&vcpu->async_pf.done); + INIT_LIST_HEAD(&vcpu->async_pf.queue); + spin_lock_init(&vcpu->async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ + struct page *page; + struct kvm_async_pf *apf = + container_of(work, struct kvm_async_pf, work); + struct mm_struct *mm = apf->mm; + struct kvm_vcpu *vcpu = apf->vcpu; + unsigned long addr = apf->addr; + gva_t gva = apf->gva; + + might_sleep(); + + use_mm(mm); + down_read(&mm->mmap_sem); + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL); + up_read(&mm->mmap_sem); + unuse_mm(mm); + + spin_lock(&vcpu->async_pf.lock); + list_add_tail(&apf->link, &vcpu->async_pf.done); + apf->page = page; + spin_unlock(&vcpu->async_pf.lock); This can fail, and apf->page becomes NULL. 
+ if (list_empty_careful(&vcpu->async_pf.done)) + return; + + spin_lock(&vcpu->async_pf.lock); + work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link); + list_del(&work->link); + spin_unlock(&vcpu->async_pf.lock); + + kvm_arch_async_page_present(vcpu, work); + +free: + list_del(&work->queue); + vcpu->async_pf.queued--; + put_page(work->page); + kmem_cache_free(async_pf_cache, work); +} Better handle it here (and other sites). Yeah. We should just re-enter the guest and let the usual code path handle the error on the next guest access. -- Gleb.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h + async_pf_cache = NULL; +} + +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(&vcpu->async_pf.done); + INIT_LIST_HEAD(&vcpu->async_pf.queue); + spin_lock_init(&vcpu->async_pf.lock); +} + +static void async_pf_execute(struct work_struct *work) +{ + struct page *page; + struct kvm_async_pf *apf = + container_of(work, struct kvm_async_pf, work); + struct mm_struct *mm = apf->mm; + struct kvm_vcpu *vcpu = apf->vcpu; + unsigned long addr = apf->addr; + gva_t gva = apf->gva; + + might_sleep(); + + use_mm(mm); + down_read(&mm->mmap_sem); + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL); + up_read(&mm->mmap_sem); + unuse_mm(mm); + + spin_lock(&vcpu->async_pf.lock); + list_add_tail(&apf->link, &vcpu->async_pf.done); + apf->page = page; + spin_unlock(&vcpu->async_pf.lock); This can fail, and apf->page becomes NULL. 
+ if (list_empty_careful(&vcpu->async_pf.done)) + return; + + spin_lock(&vcpu->async_pf.lock); + work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link); + list_del(&work->link); + spin_unlock(&vcpu->async_pf.lock); + + kvm_arch_async_page_present(vcpu, work); + +free: + list_del(&work->queue); + vcpu->async_pf.queued--; + put_page(work->page); + kmem_cache_free(async_pf_cache, work); +} Better handle it here (and other sites).
[PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h | 17 +++ arch/x86/kvm/Kconfig|1 + arch/x86/kvm/Makefile |1 + arch/x86/kvm/mmu.c | 51 +- arch/x86/kvm/paging_tmpl.h |4 +- arch/x86/kvm/x86.c | 109 +++- include/linux/kvm_host.h| 31 ++ include/trace/events/kvm.h | 88 virt/kvm/Kconfig|3 + virt/kvm/async_pf.c | 220 +++ virt/kvm/async_pf.h | 36 +++ virt/kvm/kvm_main.c | 57 -- 12 files changed, 603 insertions(+), 15 deletions(-) create mode 100644 virt/kvm/async_pf.c create mode 100644 virt/kvm/async_pf.h diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e209078..5f154d3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -83,6 +83,8 @@ #define KVM_NR_FIXED_MTRR_REGION 88 #define KVM_NR_VAR_MTRR 8 +#define ASYNC_PF_PER_VCPU 64 + extern spinlock_t kvm_lock; extern struct list_head vm_list; @@ -412,6 +414,10 @@ struct kvm_vcpu_arch { u64 hv_vapic; cpumask_var_t wbinvd_dirty_mask; + + struct { + gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)]; + } apf; }; struct kvm_arch { @@ -585,7 +591,12 @@ struct kvm_x86_ops { const struct trace_print_flags *exit_reasons_str; }; +struct kvm_arch_async_pf { + gfn_t gfn; +}; + extern struct kvm_x86_ops *kvm_x86_ops; +extern struct kvm_async_pf *kvm_double_apf; int kvm_mmu_module_init(void); void kvm_mmu_module_exit(void); @@ -823,4 +834,10 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); +void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, +struct kvm_async_pf *work); +void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, +struct kvm_async_pf *work); +extern bool 
kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ddc131f..50f6364 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -28,6 +28,7 @@ config KVM select HAVE_KVM_IRQCHIP select HAVE_KVM_EVENTFD select KVM_APIC_ARCHITECTURE + select KVM_ASYNC_PF select USER_RETURN_NOTIFIER select KVM_MMIO ---help--- diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 31a7035..c53bf19 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -9,6 +9,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ coalesced_mmio.o irq_comm.o eventfd.o \ assigned-dev.o) kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o) +kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o) kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ i8254.o timer.o diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c94c432..4d49b5e 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -18,9 +18,11 @@ * */ +#include "irq.h" #include "mmu.h" #include "x86.h" #include "kvm_cache_regs.h" +#include "x86.h" #include <linux/kvm_host.h> #include <linux/types.h> @@ -2575,6 +2577,49 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, error_code & PFERR_WRITE_MASK, gfn); } +int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) +{ + struct kvm_arch_async_pf arch; + arch.gfn = gfn; + + return kvm_setup_async_pf(vcpu, gva, gfn, &arch); +} + +static bool can_do_async_pf(struct kvm_vcpu *vcpu) +{ + if (unlikely(!irqchip_in_kernel(vcpu->kvm) || +kvm_event_needs_reinjection(vcpu))) + return false; + + return kvm_x86_ops->interrupt_allowed(vcpu); +} + +static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, +pfn_t *pfn) +{ + bool async; + + *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async); + + if (!async) + return false; /* *pfn has correct page already */ + + put_page(pfn_to_page(*pfn)); + + if 
(can_do_async_pf(vcpu)) { + trace_kvm_try_async_get_page(async, *pfn); + if (kvm_find_async_pf_gfn(vcpu, gfn)) { + vcpu->async_pf.work = kvm_double_apf; + return
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/04/2010 11:56 AM, Gleb Natapov wrote: If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. Signed-off-by: Gleb Natapov g...@redhat.com This seems quite different from the last version, but it looks fine to me. Acked-by: Rik van Riel r...@redhat.com -- All rights reversed