Re: [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further

2019-05-22 Thread Wanpeng Li
On Mon, 20 May 2019 at 16:18, Wanpeng Li wrote:
>
> Advance lapic timer tries to hide the hypervisor overhead between the
> host emulated timer firing and the guest becoming aware that the timer
> has fired. However, it only hides the time between
> apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of
> the real position of vmentry which is mentioned in the original commit
> d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer
> expiration"). There are 700+ cpu cycles between the end of
> wait_lapic_expire and the world switch on my haswell desktop.
>
> This patchset tries to narrow the last gap (wait_lapic_expire -> world
> switch): it takes the real overhead time between
> apic_timer_fn/handle_preemption_timer and the world switch into
> consideration when adaptively tuning timer advancement. The patchset can
> reduce latency by about 40% (~1600+ cycles to ~1000+ cycles on a haswell
> desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits.
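
For readers following the thread, the adaptive tuning described in the
quoted text can be illustrated with a minimal userspace sketch. This is
not the code from the series: the struct, the function names and the
three constants are hypothetical, chosen only to show the idea. The
delta is assumed to be (guest TSC at the measurement point) minus (TSC
deadline), in cycles; the point of the series is that this measurement
is taken just before the world switch rather than at the end of
wait_lapic_expire.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical constants for this sketch; not taken from the series. */
#define ADJUST_STEP      8     /* change by at most 1/8 of the current advance */
#define ADJUST_DONE_CYC  100   /* |delta| below this many cycles: stop tuning */
#define ADVANCE_MAX_NS   5000  /* give up if the advance would exceed this */

struct timer_tune {
    uint32_t advance_ns;  /* how early the host timer is programmed to fire */
    bool done;            /* true once tuning has converged or given up */
};

/* Convert TSC cycles to nanoseconds for a TSC running at tsc_khz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t tsc_khz)
{
    return cycles * 1000000ULL / tsc_khz;
}

/*
 * delta_cycles < 0: the timer was delivered too early, shrink the advance.
 * delta_cycles >= 0: the timer was delivered too late, grow the advance.
 */
static void tune_timer_advance(struct timer_tune *t, int64_t delta_cycles,
                               uint32_t tsc_khz)
{
    uint64_t mag = delta_cycles < 0 ? (uint64_t)(-delta_cycles)
                                    : (uint64_t)delta_cycles;
    uint64_t ns = cycles_to_ns(mag, tsc_khz);
    uint32_t step = t->advance_ns / ADJUST_STEP;
    uint32_t adj = ns < step ? (uint32_t)ns : step;

    if (delta_cycles < 0)
        t->advance_ns -= adj;
    else
        t->advance_ns += adj;

    /* Close enough to the deadline: stop adjusting. */
    if (mag < ADJUST_DONE_CYC)
        t->done = true;

    /* An implausibly large advance: disable it and stop adjusting. */
    if (t->advance_ns > ADVANCE_MAX_NS) {
        t->advance_ns = 0;
        t->done = true;
    }
}
```

With a 2 GHz TSC and a starting advance of 1000ns, a sample that is 700
cycles late (350ns) is clamped to one step of 125ns, so the advance
becomes 1125ns and tuning continues.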

Tested on a Skylake server with nohz=off and idle=poll in the guest:
the average cyclictest latency drops from 3us to 2us.

Regards,
Wanpeng Li


Re: [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further

2019-05-20 Thread Paolo Bonzini
On 20/05/19 10:18, Wanpeng Li wrote:
> Advance lapic timer tries to hide the hypervisor overhead between the
> host emulated timer firing and the guest becoming aware that the timer
> has fired. However, it only hides the time between
> apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of
> the real position of vmentry which is mentioned in the original commit
> d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer
> expiration"). There are 700+ cpu cycles between the end of
> wait_lapic_expire and the world switch on my haswell desktop.
> 
> This patchset tries to narrow the last gap (wait_lapic_expire -> world
> switch): it takes the real overhead time between
> apic_timer_fn/handle_preemption_timer and the world switch into
> consideration when adaptively tuning timer advancement. The patchset can
> reduce latency by about 40% (~1600+ cycles to ~1000+ cycles on a haswell
> desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits.
> 
> v3 -> v4:
>  * create timer_advance_ns debugfs entry iff lapic_in_kernel() 
>  * keep if (guest_tsc < tsc_deadline) before the call to __wait_lapic_expire()
> 
> v2 -> v3:
>  * expose 'kvm_timer.timer_advance_ns' to userspace
>  * move the tracepoint below guest_exit_irqoff()
>  * move wait_lapic_expire() before flushing the L1
> 
> v1 -> v2:
>  * fix indent in patch 1/4
>  * remove the wait_lapic_expire() tracepoint and expose by debugfs
>  * move the call to wait_lapic_expire() into vmx.c and svm.c
> 
> Wanpeng Li (5):
>   KVM: LAPIC: Extract adaptive tune timer advancement logic
>   KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
>   KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
>   KVM: LAPIC: Delay trace advance expire delta
>   KVM: LAPIC: Optimize timer latency further
> 
>  arch/x86/kvm/debugfs.c | 18 +++
>  arch/x86/kvm/lapic.c   | 60 +-
>  arch/x86/kvm/lapic.h   |  3 ++-
>  arch/x86/kvm/svm.c     |  4
>  arch/x86/kvm/vmx/vmx.c |  4
>  arch/x86/kvm/x86.c     |  9
>  6 files changed, 68 insertions(+), 30 deletions(-)
> 

Queued, thanks (2-3 for 5.2, the rest for 5.3).

Paolo