Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Raghavendra K T Wed, 03 Oct 2012 05:26:27 -0700

* Avi Kivity <a...@redhat.com> [2012-09-24 17:41:19]:

> On 09/21/2012 08:24 PM, Raghavendra K T wrote:
> > On 09/21/2012 06:32 PM, Rik van Riel wrote:
> >> On 09/21/2012 08:00 AM, Raghavendra K T wrote:
> >>> From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
> >>>
> >>> When the total number of VCPUs in the system is less than or equal to
> >>> the number of physical CPUs, PLE exits become costly, since each VCPU
> >>> can have a dedicated PCPU and trying to find a target VCPU to yield_to
> >>> just burns time in the PLE handler.
> >>>
> >>> This patch reduces the overhead by simply returning in such scenarios,
> >>> based on the length of the current CPU's runqueue.
> >>
> >> I am not convinced this is the way to go.
> >>
> >> The VCPU that is holding the lock, and is not releasing it,
> >> probably got scheduled out. That implies that VCPU is on a
> >> runqueue with at least one other task.
> > 
> > I see your point here, we have two cases:
> > 
> > case 1)
> > 
> > rq1 : vcpu1->wait(lockA) (spinning)
> > rq2 : vcpu2->holding(lockA) (running)
> > 
> > Here, ideally, vcpu1 should not enter the PLE handler, since it would
> > surely get the lock within ple_window cycles (assuming ple_window is
> > tuned perfectly for that workload).
> > 
> > Maybe this explains why we are not seeing a benefit with kernbench.
> > 
> > On the other hand, since we cannot have a ple_window tuned perfectly
> > for all types of workloads, we gain for those workloads that may need
> > more than 4096 cycles. Is that what we are seeing in the benefited
> > cases?
> 
> Maybe we need to increase the ple window regardless.  4096 cycles is 2
> microseconds or less (call it t_spin).  The overhead from
> kvm_vcpu_on_spin() and the associated task switches is at least a few
> microseconds, increasing as contention is added (call it t_yield).  The
> time for a natural context switch is several milliseconds (call it
> t_slice).  There is also the time the lock holder owns the lock,
> assuming no contention (t_hold).
> 
> If t_yield > t_spin, then in the undercommitted case it dominates
> t_spin.  If t_hold > t_spin we lose badly.
> 
> If t_spin > t_yield, then the undercommitted case doesn't suffer as much
> as most of the spinning happens in the guest instead of the host, so it
> can pick up the unlock timely.  We don't lose too much in the
> overcommitted case provided the values aren't too far apart (say a
> factor of 3).
> 
> Obviously t_spin must be significantly smaller than t_slice, otherwise
> it accomplishes nothing.
> 
> Regarding t_hold: if it is small, then a larger t_spin helps avoid false
> exits.  If it is large, then we're not very sensitive to t_spin.  It
> doesn't matter if it takes us 2 usec or 20 usec to yield, if we end up
> yielding for several milliseconds.
> 
> So I think it's worth trying again with ple_window of 20000-40000.
>


Hi Avi,

I ran different benchmarks with increased ple_window values, and the
results do not look encouraging for increasing ple_window.
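
For a sense of scale, the ple_window values tried here translate into
spin times roughly as sketched below. This is only an illustration: the
2.5 GHz cycle rate is an assumed figure (the clock speed of the test
machine is not stated in this thread), 8k/16k/32k are assumed to mean
8192/16384/32768 cycles, and cycles_to_usec is a hypothetical helper,
not a kernel interface.

```python
# Rough conversion of ple_window (in cycles) to spin time.
# ASSUMPTION: 2.5 GHz cycle rate; the real test machine's clock is not given.
ASSUMED_HZ = 2.5e9


def cycles_to_usec(cycles, hz=ASSUMED_HZ):
    """Return the time, in microseconds, that `cycles` cycles take at `hz`."""
    return cycles / hz * 1e6


# ASSUMPTION: 8k/16k/32k are taken to mean 8192/16384/32768 cycles.
for ple_window in (4096, 8192, 16384, 32768):
    print(f"ple_window={ple_window:6d} -> {cycles_to_usec(ple_window):7.3f} usec")
```

At the assumed rate, 4096 cycles comes out under 2 usec, which is in
line with the estimate quoted above.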

Results:
16-core PLE machine with a 16-vcpu guest.

base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple_window = 8k
base_pleopt_16k = base kernel + ple_window = 16k
base_pleopt_32k = base kernel + ple_window = 32k


Percentage improvements of benchmarks w.r.t base_pleopt with ple_window = 4096

                base_pleopt_8k  base_pleopt_16k base_pleopt_32k
-----------------------------------------------------------------
kernbench_1x    -5.54915        -15.94529       -44.31562
kernbench_2x    -7.89399        -17.75039       -37.73498
-----------------------------------------------------------------
sysbench_1x     0.45955         -0.98778        0.05252
sysbench_2x     1.44071         -0.81625        1.35620
sysbench_3x     0.45549         1.51795         -0.41573
-----------------------------------------------------------------
hackbench_1x    -3.80272        -13.91456       -40.79059
hackbench_2x    -4.78999        -7.61382        -7.24475
-----------------------------------------------------------------
ebizzy_1x       -2.54626        -16.86050       -38.46109
ebizzy_2x       -8.75526        -19.29116       -48.33314
-----------------------------------------------------------------


I also collected perf top output to analyse the difference. The
difference comes from TLB flushing (and also spinlocks).

Ebizzy run for 4k ple_window
-  87.20%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 52.89% release_pages
         + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 86.03% default_send_IPI_mask_allbutself_phys
      + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
     smp_call_function_many


Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 53.13% release_pages
         + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
     smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 90.76% default_send_IPI_mask_allbutself_phys
      + 9.24% default_send_IPI_mask_sequence_phys


Below is the detailed result:                   
patch = base_pleopt_8k 
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990      43.2780     0.5180    -5.54915
    89.2983     1.2406      96.3475     1.8891    -7.89399
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558       9.8555     0.1246     0.45955
    19.7611     0.4290      19.4764     0.0835     1.44071
    29.1775     0.9903      29.0446     0.8641     0.45549
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787      80.0921     2.9696    -3.80272
   239.2490     1.5660     250.7090     2.6074    -4.78999
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053    4147.8750   206.1840    -2.54626
  2197.2500    93.1048    2004.8750    85.7995    -8.75526
+-----------+-----------+-----------+------------+-----------+

patch = base_pleopt_16k
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990      47.5407     0.5739   -15.94529
    89.2983     1.2406     105.1491     1.2244   -17.75039
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558       9.9988     0.1106    -0.98778
    19.7611     0.4290      19.9224     0.9016    -0.81625
    29.1775     0.9903      28.7346     0.2788     1.51795
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787      87.8942     2.2132   -13.91456
   239.2490     1.5660     257.4650     5.3674    -7.61382
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053    3538.6250   101.1165   -16.86050
  2197.2500    93.1048    1773.3750    91.8414   -19.29116
+-----------+-----------+-----------+------------+-----------+

patch = base_pleopt_32k
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990      59.1733     0.8102   -44.31562
    89.2983     1.2406     122.9950     1.5534   -37.73498
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558       9.8958     0.0593     0.05252
    19.7611     0.4290      19.4931     0.1767     1.35620
    29.1775     0.9903      29.2988     1.0420    -0.41573
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787     108.6312    13.1500   -40.79059
   239.2490     1.5660     256.5820     2.2722    -7.24475
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053    2619.2500    80.8150   -38.46109
  2197.2500    93.1048    1135.2500    22.2887   -48.33314
+-----------+-----------+-----------+------------+-----------+
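
The %improve column can be reproduced from the base and patch columns.
The sketch below assumes a sign convention that is inferred from the
numbers rather than stated in this mail: kernbench, sysbench and
hackbench are treated as lower-is-better, and ebizzy as
higher-is-better. The function name pct_improve is illustrative only.

```python
def pct_improve(base, patch, higher_is_better=False):
    """Percentage improvement of `patch` relative to `base`.

    ASSUMPTION: the sign convention is inferred from the tables in this
    mail (lower-is-better by default, higher-is-better for ebizzy).
    """
    delta = (patch - base) if higher_is_better else (base - patch)
    return delta / base * 100.0


# Rows taken from the base_pleopt_8k tables.
print(round(pct_improve(41.0027, 43.2780), 3))                          # kernbench 1x
print(round(pct_improve(4256.25, 4147.875, higher_is_better=True), 3))  # ebizzy 1x
```

A negative value means the larger ple_window performed worse than the
4096 baseline under this convention.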
