Re: KVM arm realtime performance optimization

Christoffer Dall Mon, 10 Dec 2018 05:19:18 -0800

On Mon, Dec 10, 2018 at 05:36:09AM +0000, Steven Miao (Arm Technology China) 
wrote:
> 
> From: kvmarm-boun...@lists.cs.columbia.edu 
> <kvmarm-boun...@lists.cs.columbia.edu> On Behalf Of Steven Miao (Arm 
> Technology China)
> Sent: Thursday, December 6, 2018 3:05 PM
> To: kvmarm@lists.cs.columbia.edu
> Subject: KVM arm realtime performance optimization
> 
> Hi Everyone,
> 
> I'm currently testing KVM arm realtime performance on a hikey960 board. My 
> benchmark is cyclictest, which I use to measure thread wake-up latency on 
> both the host Linux OS and the KVM guest Linux OS.
> 
> Host OS:
> 
> hikey960:/mnt/debian/usr/src/linux#  cyclictest -p 99 -t 4 -m -n -a 0-3 -l 100000
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.00 0.00 0.00 1/165 3270
> 
> T: 0 ( 3266) P:99 I:1000 C: 100000 Min:      4 Act:   15 Avg:   15 Max:     139
> T: 1 ( 3267) P:99 I:1500 C:  66736 Min:      4 Act:   15 Avg:   15 Max:     239
> T: 2 ( 3268) P:99 I:2000 C:  50051 Min:      4 Act:   19 Avg:   15 Max:      43
> T: 3 ( 3269) P:99 I:2500 C:  40039 Min:      5 Act:   15 Avg:   16 Max:      74
> 
> Guest OS:
> root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n -a 0-3 -l 100000
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.13 0.05 0.01 1/70 293
> 
> T: 0 (  290) P:99 I:1000 C: 100000 Min:      7 Act:   44 Avg:   85 Max:   16111
> T: 1 (  291) P:99 I:1500 C:  66665 Min:      7 Act:   81 Avg:   90 Max:   15306
> T: 2 (  292) P:99 I:2000 C:  49995 Min:      7 Act:   88 Avg:   87 Max:   16703
> T: 3 (  293) P:99 I:2500 C:  39992 Min:      8 Act:   72 Avg:   97 Max:   14976
> 
> 
> RT performance on the KVM guest OS is poor compared to the host OS. The 
> average wake-up latency on the guest is about 6 to 7 times that on the host.
> I've tried some configurations to improve RT in KVM, like:
> 1 Use CPU isolation
> 2 Use an RT-preempt kernel on both the host OS and the guest OS
> 3 Avoid CPU frequency changes on the host
> 4 Configure NO_HZ_FULL for the guest OS
> 
> There is a small improvement after applying the above configurations, but 
> the RT performance is still very poor.
> 
> 5 Make the guest OS poll when idle instead of executing WFI, to avoid the 
> trap and being switched out
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 2dc0f84..53aef78 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -83,7 +83,7 @@ void arch_cpu_idle(void)
>          * tricks
>          */
>         trace_cpu_idle_rcuidle(1, smp_processor_id());
> -       cpu_do_idle();
> +       cpu_relax();
>         local_irq_enable();
>         trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  }
> 
> root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n  -l 100000
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.07 0.03 0.00 1/99 328
> 
> T: 0 (  325) P:99 I:1000 C: 100000 Min:      3 Act:    6 Avg:   13 Max:    4999
> T: 1 (  326) P:99 I:1500 C:  66659 Min:      5 Act:    7 Avg:   14 Max:    3449
> T: 2 (  327) P:99 I:2000 C:  49989 Min:      4 Act:    7 Avg:    9 Max:   11471
> T: 3 (  328) P:99 I:2500 C:  39986 Min:      4 Act:   14 Avg:   14 Max:   11253
> 
> Method 5 improves guest RT performance a lot: the average thread wake-up 
> latency on the guest OS is almost the same as on the host OS, but the max 
> wake-up latency is still very poor.
> 
> Does anyone have ideas for improving RT performance on a KVM guest OS? 
> Although method 5 improves guest RT performance a lot, I don't think it is 
> a good idea.
> 
This is a known problem and there have been presentations about similar
problems on x86 in past KVM Forums.


The first thing to do is analyze the critical path that adds latency to
a wakeup.  One way to do that is to instrument the path by adding time
counter reads to the path and figuring out what takes time.
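
To make the instrumentation idea concrete, here is a minimal userspace
sketch, not kernel code. It assumes clock_gettime(CLOCK_MONOTONIC) as the
time source; in-kernel instrumentation would more likely read the
architected counter directly. All names (path_trace, trace_stamp,
trace_slowest_stage) are hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_STAMPS 16

/* Hypothetical record of timestamps taken along one wakeup path. */
struct path_trace {
    const char *label[MAX_STAMPS];
    uint64_t    t_ns[MAX_STAMPS];
    int         count;
};

/* Assumption: a monotonic clock stands in for a raw time counter read. */
static uint64_t read_counter_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Record "we reached this point" together with the current time. */
static void trace_stamp(struct path_trace *tr, const char *label)
{
    assert(tr->count < MAX_STAMPS);
    tr->label[tr->count] = label;
    tr->t_ns[tr->count] = read_counter_ns();
    tr->count++;
}

/*
 * Return the index of the stamp that ends the longest interval,
 * or -1 if fewer than two stamps were recorded.
 */
static int trace_slowest_stage(const struct path_trace *tr)
{
    int worst = -1;
    uint64_t worst_delta = 0;

    for (int i = 1; i < tr->count; i++) {
        uint64_t delta = tr->t_ns[i] - tr->t_ns[i - 1];

        if (worst < 0 || delta > worst_delta) {
            worst = i;
            worst_delta = delta;
        }
    }
    return worst;
}

/* Print the time spent between each pair of consecutive stamps. */
static void trace_report(const struct path_trace *tr)
{
    for (int i = 1; i < tr->count; i++)
        printf("%s -> %s: %" PRIu64 " ns\n", tr->label[i - 1],
               tr->label[i], tr->t_ns[i] - tr->t_ns[i - 1]);
}
```

Calling trace_stamp() at each interesting point on the path and then
trace_report() once the wakeup has completed gives a per-stage breakdown.
Keeping the stamps in memory and printing afterwards avoids adding printf
latency to the path being measured.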

One thing you can look at is having a configurable grace period in KVM's
block function before the process actually goes to sleep (calling
kvm_vcpu_put and entering the host scheduler), and see if that helps anything.
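
The general shape of such a grace period can be sketched in userspace as
follows. This is only an illustration of the poll-then-block pattern, not
KVM's implementation: wake_pending, grace_ns and poll_before_block are
hypothetical names, and clock_gettime() is assumed as the time source.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

enum wait_result {
    WOKE_WHILE_POLLING, /* wakeup arrived in the grace period, no sleep needed */
    MUST_BLOCK          /* grace period expired, caller should really sleep */
};

/* Assumption: a monotonic clock is good enough to bound the polling time. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin for at most grace_ns nanoseconds, checking whether a wakeup is
 * already pending. The flag is checked at least once, so grace_ns == 0
 * degenerates to "block unless a wakeup is already pending".
 */
static enum wait_result poll_before_block(atomic_bool *wake_pending,
                                          uint64_t grace_ns)
{
    uint64_t deadline = now_ns() + grace_ns;

    do {
        if (atomic_load_explicit(wake_pending, memory_order_acquire))
            return WOKE_WHILE_POLLING;
    } while (now_ns() < deadline);

    return MUST_BLOCK;
}
```

The caller would only take the expensive sleep path when this returns
MUST_BLOCK. The trade-off is that a longer grace period burns more host CPU
time while polling, which is why making it configurable matters.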

At the end of the day, virtualization is going to add a lot of latency
when you have to switch the entire state of your CPU, and in terms of
virtual RT, you end up with a very high minimal latency.


Thanks,

    Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm