Re: [kvm-devel] Remove APIC lock
Dong, Eddie wrote:

> Avi: apic->lock is used in many places to avoid race conditions with the apic timer callback function, which may run on a different pCPU. This patch migrates the apic timer to the same CPU the VP runs on, so the lock is no longer necessary. thx, eddie

This is now applied to a new branch, lapic6. I split it into removal of the lock, removal of kvm_lapic_get_regs(), and timer migration. Eventually the first two will be folded into the preceding patches. Please separate patches in the future.

-- error compiling committee.c: too many arguments to function

- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] Remove APIC lock
Avi Kivity wrote:

> Just noticed it is changed to a mutex, but it seems the same here :-) If the process is switched to another task, it is OK since it won't access the local APIC. A current VP access to the APIC will take the mutex first (see below). Or are you talking about another corner case?

> apic access from process context is protected by kvm->lock, but apic access from the hrtimer is not. Consider this scenario:
> - guest accesses apic
> - apic code starts modifying apic data
> - preemption
> - timer fires
> - apic_timer_fn() corrupts apic data
> (I'm not even sure preemption is required here.) I think that in Xen this can't happen because it is not preemptible and timers are processed when exiting back to the guest.

For this situation, even without preemption, the problem is still there. But maybe you are referring to the old code; the latest code is already preemption-free, since apic_timer_fn doesn't change any APIC state. It only increments apic->timer.pending. When the guest changes or disables the APIC timer configuration, the hrtimer must be canceled first, so there is no race with the old hrtimer.

> The apic can be protected by vcpu->mutex; platform-wide things (pic, ioapic) should be protected by kvm->lock. This will work if we move all apic processing to process context like I proposed in a previous mail.

In this patch, since the hrtimer always runs on the same pCPU as the guest VP (when the VP is active), each time the hrtimer fires (it comes from a hardware IRQ), we already VM Exit to the kernel (a similar function to kvm_vcpu_kick, but with no need to call it explicitly) and then do the IRQ injection at vmx_intr_assist time.

> Yes, the two solutions are very similar. But I think mine protects against a race:
> - scheduler starts migrating the vcpu from cpu 0 to cpu 1
> - hrtimer fires on cpu 0, but apic_timer_fn is not called yet
> - the vcpu on cpu 1 migrates the hrtimer

When CPU 1 does the hrtimer migration, hrtimer_cancel will wait for an in-flight timer to complete and then remove it; see hrtimer_try_to_cancel.
> - the vcpu enters guest mode on cpu 1
> - cpu 0 calls apic_timer_fn

The timer is either already canceled, or completes before the migration above finishes.

> In this case, there will be no wakeup. So I think you do need to call kvm_vcpu_kick(), which will usually do nothing. We also need to make sure all the non-atomic code in __apic_timer_fn() is executed in process context (it can use the pending count to decide how much to add).

I think today it is all atomic ops there; maybe I missed something.

> So I think there are three separate issues here:
> - hrtimer migration: it helps performance, but doesn't help locking
> - changing __apic_timer_fn() to only do atomic operations, and doing the non-atomic operations in process context under vcpu->mutex
> - removing the apic lock

thx, eddie
Re: [kvm-devel] Remove APIC lock
Dong, Eddie wrote:

> Avi Kivity wrote: Just noticed it is changed to a mutex, but it seems the same here :-) If the process is switched to another task, it is OK since it won't access the local APIC. A current VP access to the APIC will take the mutex first (see below). Or are you talking about another corner case?

> apic access from process context is protected by kvm->lock, but apic access from the hrtimer is not. Consider this scenario:
> - guest accesses apic
> - apic code starts modifying apic data
> - preemption
> - timer fires
> - apic_timer_fn() corrupts apic data
> (I'm not even sure preemption is required here.) I think that in Xen this can't happen because it is not preemptible and timers are processed when exiting back to the guest.

> For this situation, even without preemption, the problem is still there. But maybe you are referring to the old code; the latest code is already preemption-free, since apic_timer_fn doesn't change any APIC state. It only increments apic->timer.pending.

I see the following:

static int __apic_timer_fn(struct kvm_lapic *apic)
{
	int result = 0;
	wait_queue_head_t *q = &apic->vcpu->wq;

	atomic_inc(&apic->timer.pending);
	if (waitqueue_active(q))
		wake_up_interruptible(q);
	if (apic_lvtt_period(apic)) {
		result = 1;
		apic->timer.dev.expires = ktime_add_ns(
			apic->timer.dev.expires,
			apic->timer.period);
	}
	return result;
}

So, timer.dev.expires is protected by hrtimer internal locking? Tricky, but it should work.

> The apic can be protected by vcpu->mutex; platform-wide things (pic, ioapic) should be protected by kvm->lock. This will work if we move all apic processing to process context like I proposed in a previous mail.

> In this patch, since the hrtimer always runs on the same pCPU as the guest VP (when the VP is active), each time the hrtimer fires (it comes from a hardware IRQ), we already VM Exit to the kernel (a similar function to kvm_vcpu_kick, but with no need to call it explicitly) and then do the IRQ injection at vmx_intr_assist time.

Yes, the two solutions are very similar.
But I think mine protects against a race:
- scheduler starts migrating the vcpu from cpu 0 to cpu 1
- hrtimer fires on cpu 0, but apic_timer_fn is not called yet
- the vcpu on cpu 1 migrates the hrtimer

> When CPU 1 does the hrtimer migration, hrtimer_cancel will wait for an in-flight timer to complete and then remove it; see hrtimer_try_to_cancel.

Okay. I'm satisfied that it's safe now. I'll add some comments later and commit.

-- Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] Remove APIC lock
> What about preemption:
> - vcpu executes lapic code in qemu process context

Don't understand. The LAPIC is in the kernel; how can qemu access it? If you mean qemu calling an APIC KVM syscall, then it already disables preemption & takes kvm->lock.

> - process is preempted
> - timer fires, touches lapic code

See above.

> Furthermore, I question the necessity of this. Taking a spinlock is a couple dozen cycles on modern processors. Entering the guest is a couple thousand. So what are we saving?

I am stuck, can you explain more? We didn't enter the guest more than before.

> (migrating the timer is a good thing though). A different approach might be to wake up the vcpu like we do for irr, with kvm_vcpu_kick(), and let the timer code be handled in process context, so no locks need be taken.

This can be done too, but will it be much more efficient than timer migration? Usually the guest APIC timer is around 1ms, which means we have to VM Exit the guest at 1000 HZ with a cost of thousands of cycles each time, while process migration can only happen at the scheduler quantum (10ms?) and depends heavily on the scheduler policy. I guess the cost of hrtimer migration won't exceed 1000 cycles on a modern processor, and since today's migration cost already exceeds hundreds of cycles, adding this one won't change much. So I think hrtimer migration is much cheaper than kvm_vcpu_kick(), but I may have missed something.

Eddie
Re: [kvm-devel] Remove APIC lock
Dong, Eddie wrote:

> What about preemption:
> - vcpu executes lapic code in qemu process context

> Don't understand. The LAPIC is in the kernel; how can qemu access it? If you mean qemu calling an APIC KVM syscall, then it already disables preemption & takes kvm->lock.

I meant qemu executing the KVM_VCPU_RUN ioctl. kvm->lock does not disable preemption (it is a mutex). Do we really take kvm->lock for local accesses? That's a significant problem, much more than the timer.

> - process is preempted
> - timer fires, touches lapic code

> See above.

> Furthermore, I question the necessity of this. Taking a spinlock is a couple dozen cycles on modern processors. Entering the guest is a couple thousand. So what are we saving?

> I am stuck, can you explain more? We didn't enter the guest more than before.

What I'm saying is that the performance improvement is negligible, because the VT switch dominates the time. Taking the lock is fast in comparison. However, I do like the simplification that removing the lock brings.

> (migrating the timer is a good thing though). A different approach might be to wake up the vcpu like we do for irr, with kvm_vcpu_kick(), and let the timer code be handled in process context, so no locks need be taken.

> This can be done too, but will it be much more efficient than timer migration? Usually the guest APIC timer is around 1ms, which means we have to VM Exit the guest at 1000 HZ with a cost of thousands of cycles each time, while process migration can only happen at the scheduler quantum (10ms?) and depends heavily on the scheduler policy. I guess the cost of hrtimer migration won't exceed 1000 cycles on a modern processor, and since today's migration cost already exceeds hundreds of cycles, adding this one won't change much. So I think hrtimer migration is much cheaper than kvm_vcpu_kick(), but I may have missed something.

I meant in addition to timer migration (I really like the timer migration part -- it's much more important than lock removal for performance).
kvm_vcpu_kick() is needed to wake up from halt, or if we have races between the timer and task migration.

I'm thinking about something like the remote tlb flush logic:
- the hrtimer callback sets a bit in vcpu->requests and calls kvm_vcpu_kick()
- if the hrtimer and the vcpu are co-located, kvm_vcpu_kick() does nothing
- if the guest is running on another cpu, it sends an IPI
- if the vcpu is halted, it wakes it up
- when we reenter the guest, we examine vcpu->requests; if KVM_REQ_LAPIC is set we know the timer expired and we run the lapic code (where we can examine the pending count)
- we also need to update the hlt sleep to look at vcpu->requests

So, to summarize:
- timer migration always helps, regardless of what we do
- lock removal helps, IMO, not performance, but stability and maintainability
- kvm is preemptible except in vcpu_load() and guest entry, so if we do lock removal we need to do everything in task context (nothing in interrupts except raising bits like irr)

Opinions?
Re: [kvm-devel] Remove APIC lock
Avi Kivity wrote:

> Do we really take kvm->lock for local accesses? That's a significant problem, much more than the timer.

Actually, we must take the lock during emulation. But maybe we can change it to a reader/writer lock, and we can certainly drop it during lapic access.
Re: [kvm-devel] Remove APIC lock
Avi Kivity wrote:

> Dong, Eddie wrote: What about preemption:
> - vcpu executes lapic code in qemu process context

> Don't understand. The LAPIC is in the kernel; how can qemu access it? If you mean qemu calling an APIC KVM syscall, then it already disables preemption & takes kvm->lock.

> I meant qemu executing the KVM_VCPU_RUN ioctl. kvm->lock does not disable preemption (it is a mutex).

Just noticed it is changed to a mutex, but it seems the same here :-) If the process is switched to another task, it is OK since it won't access the local APIC. A current VP access to the APIC will take the mutex first (see below). Or are you talking about another corner case?

> Do we really take kvm->lock for local accesses? That's a significant problem, much more than the timer.

Today all APIC/IOAPIC accesses come from the shadow page fault path, which already takes kvm->lock. KVM_IRQ_LINE takes it too. (I just noticed the save/restore part missed this one; I will add it later if we agree here.) PIC access comes from kernel_pio, which takes the mutex too. Another missing place is vmx_intr_assist, which needs to take the mutex too. Will add later.

> - process is preempted
> - timer fires, touches lapic code

> See above.

> Furthermore, I question the necessity of this. Taking a spinlock is a couple dozen cycles on modern processors. Entering the guest is a couple thousand. So what are we saving?

> I am stuck, can you explain more? We didn't enter the guest more than before.

> What I'm saying is that the performance improvement is negligible, because the VT switch dominates the time. Taking the lock is fast in comparison. However, I do like the simplification that removing the lock brings.

Yes, I did this for simplification only; I didn't expect any performance difference. I hit a recursive lock-taking case while debugging, so I just wanted to revisit it now.

> (migrating the timer is a good thing though). A different approach might be to wake up the vcpu like we do for irr, with kvm_vcpu_kick(), and let the timer code be handled in process context, so no locks need be taken.
> This can be done too, but will it be much more efficient than timer migration? Usually the guest APIC timer is around 1ms, which means we have to VM Exit the guest at 1000 HZ with a cost of thousands of cycles each time, while process migration can only happen at the scheduler quantum (10ms?) and depends heavily on the scheduler policy. I guess the cost of hrtimer migration won't exceed 1000 cycles on a modern processor, and since today's migration cost already exceeds hundreds of cycles, adding this one won't change much. So I think hrtimer migration is much cheaper than kvm_vcpu_kick(), but I may have missed something.

> I meant in addition to timer migration (I really like the timer migration part -- it's much more important than lock removal for performance).

> kvm_vcpu_kick() is needed to wake up from halt, or if we have races between the timer and task migration.

:-) Actually we solved this issue in the previous patch and this one, naturally. In the patch adding back the APIC timer IRQ, we wake up the halted vCPU. In this patch, since the hrtimer always runs on the same pCPU as the guest VP (when the VP is active), each time the hrtimer fires (it comes from a hardware IRQ), we already VM Exit to the kernel (a similar function to kvm_vcpu_kick, but with no need to call it explicitly) and then do the IRQ injection at vmx_intr_assist time.

> I'm thinking about something like the remote tlb flush logic:
> - the hrtimer callback sets a bit in vcpu->requests and calls kvm_vcpu_kick()
> - if the hrtimer and the vcpu are co-located, kvm_vcpu_kick() does nothing
> - if the guest is running on another cpu, it sends an IPI
> - if the vcpu is halted, it wakes it up
> - when we reenter the guest, we examine vcpu->requests; if KVM_REQ_LAPIC is set we know the timer expired and we run the lapic code (where we can examine the pending count)
> - we also need to update the hlt sleep to look at vcpu->requests

> So, to summarize:
> - timer migration always helps, regardless of what we do
> - lock removal helps, IMO, not performance, but stability and maintainability
> - kvm is preemptible except in vcpu_load() and guest entry, so if we do lock removal we need to do everything in task context (nothing in interrupts except raising bits like irr)

> Opinions?
Re: [kvm-devel] Remove APIC lock
Dong, Eddie wrote:

> Avi Kivity wrote: Dong, Eddie wrote: What about preemption:
> - vcpu executes lapic code in qemu process context

> Don't understand. The LAPIC is in the kernel; how can qemu access it? If you mean qemu calling an APIC KVM syscall, then it already disables preemption & takes kvm->lock.

> I meant qemu executing the KVM_VCPU_RUN ioctl. kvm->lock does not disable preemption (it is a mutex).

> Just noticed it is changed to a mutex, but it seems the same here :-) If the process is switched to another task, it is OK since it won't access the local APIC. A current VP access to the APIC will take the mutex first (see below). Or are you talking about another corner case?

apic access from process context is protected by kvm->lock, but apic access from the hrtimer is not. Consider this scenario:
- guest accesses apic
- apic code starts modifying apic data
- preemption
- timer fires
- apic_timer_fn() corrupts apic data

(I'm not even sure preemption is required here.) I think that in Xen this can't happen because it is not preemptible and timers are processed when exiting back to the guest.

> Do we really take kvm->lock for local accesses? That's a significant problem, much more than the timer.

> Today all APIC/IOAPIC accesses come from the shadow page fault path, which already takes kvm->lock. KVM_IRQ_LINE takes it too. (I just noticed the save/restore part missed this one; I will add it later if we agree here.) PIC access comes from kernel_pio, which takes the mutex too. Another missing place is vmx_intr_assist, which needs to take the mutex too. Will add later.

The apic can be protected by vcpu->mutex; platform-wide things (pic, ioapic) should be protected by kvm->lock. This will work if we move all apic processing to process context like I proposed in a previous mail.

> I meant in addition to timer migration (I really like the timer migration part -- it's much more important than lock removal for performance).

> kvm_vcpu_kick() is needed to wake up from halt, or if we have races between the timer and task migration.
> :-) Actually we solved this issue in the previous patch and this one, naturally. In the patch adding back the APIC timer IRQ, we wake up the halted vCPU. In this patch, since the hrtimer always runs on the same pCPU as the guest VP (when the VP is active), each time the hrtimer fires (it comes from a hardware IRQ), we already VM Exit to the kernel (a similar function to kvm_vcpu_kick, but with no need to call it explicitly) and then do the IRQ injection at vmx_intr_assist time.

Yes, the two solutions are very similar. But I think mine protects against a race:
- scheduler starts migrating the vcpu from cpu 0 to cpu 1
- hrtimer fires on cpu 0, but apic_timer_fn is not called yet
- the vcpu on cpu 1 migrates the hrtimer
- the vcpu enters guest mode on cpu 1
- cpu 0 calls apic_timer_fn

In this case, there will be no wakeup. So I think you do need to call kvm_vcpu_kick(), which will usually do nothing. We also need to make sure all the non-atomic code in __apic_timer_fn() is executed in process context (it can use the pending count to decide how much to add).

So I think there are three separate issues here:
- hrtimer migration: it helps performance, but doesn't help locking
- changing __apic_timer_fn() to only do atomic operations, and doing the non-atomic operations in process context under vcpu->mutex
- removing the apic lock

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] Remove APIC lock
On Fri, 2007-08-24 at 21:08 +0800, Dong, Eddie wrote:

> Avi: apic->lock is used in many places to avoid race conditions with the apic timer callback function, which may run on a different pCPU. This patch migrates the apic timer to the same CPU the VP runs on, so the lock is no longer necessary.

What about sources that can inject interrupts besides the timer? (E.g. in-kernel PV drivers)
Re: [kvm-devel] Remove APIC lock
Gregory Haskins wrote:

> On Fri, 2007-08-24 at 21:08 +0800, Dong, Eddie wrote: Avi: apic->lock is used in many places to avoid race conditions with the apic timer callback function, which may run on a different pCPU. This patch migrates the apic timer to the same CPU the VP runs on, so the lock is no longer necessary.

> What about sources that can inject interrupts besides the timer? (E.g. in-kernel PV drivers)

Injecting an IRQ is OK, since it is just an operation on the IRR register, for which we can use atomic operations. Xen also does it that way.

Eddie
Re: [kvm-devel] Remove APIC lock
[EMAIL PROTECTED] wrote:

> Gregory Haskins wrote: What about sources that can inject interrupts besides the timer? (E.g. in-kernel PV drivers)

> Injecting an IRQ is OK, since it is just an operation on the IRR register, for which we can use atomic operations. Xen also does it that way.

Oh, I spoke too quickly. Xen has since evolved to protect both the APIC & IOAPIC with a bigger irqlock, and the PIC uses a per-chip lock. In our case, the PIC/IOAPIC now use kvm->lock, so the APIC works with kvm->lock too. But this lock may be too big for PV drivers. We may need to think of a solution covering both the APIC & IOAPIC in the future.

Eddie
Re: [kvm-devel] Remove APIC lock
On Fri, 2007-08-24 at 22:24 +0800, Dong, Eddie wrote:

> Injecting an IRQ is OK, since it is just an operation on the IRR register, for which we can use atomic operations. Xen also does it that way. Oh, I spoke too quickly. Xen has since evolved to protect both the APIC & IOAPIC with a bigger irqlock, and the PIC uses a per-chip lock. In our case, the PIC/IOAPIC now use kvm->lock, so the APIC works with kvm->lock too. But this lock may be too big for PV drivers. We may need to think of a solution covering both the APIC & IOAPIC in the future.

Yeah, I would highly recommend you make this more fine-grained. For example, I had a vcpu->irq.lock and a per-chip lock (e.g. one per apic, per pic, and one per ioapic, though I never got there). Event injection is a hot spot, so coarse locking is probably going to cause performance deficiencies.